Using multiple tokenizers in Solr

Declare a second fieldType (under a different name) that uses the NGram tokenizer. Then declare one field that uses this NGram fieldType and another field that uses the standard "text" fieldType, and use copyField to copy one into the other. See "Indexing same data in multiple fields".
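
The two-field setup described above could be sketched roughly as follows. The names "text_ngram", "title" and "title_ngram" are hypothetical, the gram sizes are only illustrative, and the "text" fieldType is assumed to already exist in your schema.xml:

```xml
<!-- Hypothetical second fieldType that tokenizes into n-grams at index time. -->
<fieldType name="text_ngram" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.NGramTokenizerFactory" minGramSize="3" maxGramSize="15"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <!-- Assumption: query terms are not split into grams, only lowercased. -->
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- One field per analysis chain; the field names are placeholders. -->
<field name="title" type="text" indexed="true" stored="true"/>
<field name="title_ngram" type="text_ngram" indexed="true" stored="false"/>

<!-- Copy the raw input of "title" into the n-gram field at index time. -->
<copyField source="title" dest="title_ngram"/>
```

Note that copyField copies the original input value, not the analyzed tokens, so each field runs its own analyzer on the same text.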


An alternative would be to apply the EdgeNGramFilterFactory to the existing field and stay with your current tokenizer (WhitespaceTokenizerFactory), e.g.

<filter class="solr.EdgeNGramFilterFactory" minGramSize="3" maxGramSize="15" />

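This would leave the rest of your schema unchanged, i.e. you would not need an additional field with a different tokenizer (NGramTokenizerFactory). Note that, unlike the NGram tokenizer, the edge variant only emits grams anchored at the start of each token, so it matches prefixes rather than arbitrary substrings.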

Your fieldType would then look something like this:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="3" maxGramSize="15"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
  </analyzer>
  ...
</fieldType>
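
The snippet above only shows the index-time analyzer. As a rough sketch, and purely as an assumption about what a matching query-time analyzer might look like (it is not taken from the original schema), one option is to repeat the same chain without the EdgeNGram filter, so that query terms are not expanded into prefixes themselves:

```xml
<!-- Assumed query-time analyzer; it would sit inside the same <fieldType name="text"> element. -->
<!-- It repeats the index-time filters from above, minus EdgeNGramFilterFactory. -->
<analyzer type="query">
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
  <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
</analyzer>
```

Whether this matches well depends on how the query-time tokens line up with the grams produced at index time, so it is worth checking both chains on the Analysis page of the Solr admin UI before relying on it.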
