
EdgeNGramFilterFactory seems to drop tokens that are shorter than the
minGramSize parameter. See the minGramSize="4" maxGramSize="6" example
on the page below.

So I think you should set minGramSize to 2 or 1 if you want to keep 72 and
other short tokens.
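
As a rough sketch of what I mean (untested, and assuming the rest of your
index analyzer stays as it is), only the EdgeNGramFilterFactory line needs
to change. With minGramSize="3", a two-character token such as 72 yields no
grams at all, while the stemmed token hour yields hou and hour. With
minGramSize="1", 72 should yield 7 and 72, and hour should yield h, ho, hou
and hour:

```xml
<!-- Index-time analyzer only; the query-time analyzer is left unchanged. -->
<!-- minGramSize lowered from 3 to 1 so that short tokens such as 72 are kept. -->
<filter class="solr.EdgeNGramFilterFactory" minGramSize="1"
        maxGramSize="30"/>
<!-- Possible alternative (an assumption on my side, please check the docs for
     your Solr version): newer releases have a preserveOriginal="true" option
     that keeps the original token even when it is shorter than minGramSize. -->
```

Note that a smaller minGramSize will produce more terms in the index, and
you will need to reindex after changing the field type.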


On Thu, Jul 4, 2019 at 17:20, Shamik Bandopadhyay wrote:

> Hi,
>    I'm using EdgeNGramFilterFactory to support partial search. Here's my
> field definition.
> <fieldType name="adsktext" class="solr.TextField"
> positionIncrementGap="100" autoGeneratePhraseQueries="true">
> <analyzer type="index">
> <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> <filter class="solr.WordDelimiterGraphFilterFactory" generateWordParts="1"
> generateNumberParts="1" catenateWords="0" catenateNumbers="1"
> catenateAll="0" splitOnCaseChange="0"/>
> <filter class="solr.StopFilterFactory" ignoreCase="true"
> words="stopwords.txt" />
> <filter class="solr.LowerCaseFilterFactory"/>
> <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
> <filter class="solr.PorterStemFilterFactory"/>
> <filter class="solr.EdgeNGramFilterFactory" minGramSize="3"
> maxGramSize="30"/>
> <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
> </analyzer>
> <analyzer type="query">
> <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> <filter class="solr.WordDelimiterGraphFilterFactory" generateWordParts="1"
> generateNumberParts="1" catenateWords="0" catenateNumbers="1"
> catenateAll="0" splitOnCaseChange="0"/>
> <filter class="solr.StopFilterFactory" ignoreCase="true"
> words="stopwords.txt" />
> <filter class="solr.SynonymGraphFilterFactory"
> synonyms="synonyms/synonyms.txt" ignoreCase="true" expand="true"/>
> <filter class="solr.LowerCaseFilterFactory"/>
> <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
> <filter class="solr.PorterStemFilterFactory"/>
> <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
> </analyzer>
> </fieldType>
> I run into an issue when I search for numeric terms. For example, if
> I search for "72 hours", EdgeNGramFilterFactory ignores 72 and only stores
> hou and hour in the index. Since I'm using the AND operator, the query
> fails to match 72 hours. I could enable EdgeNGramFilterFactory in the query
> chain, but I thought that would be unnecessary overhead. Is there a reason
> why 72 is ignored, and what would be the best way to address this scenario?
> Any pointers will be appreciated.
> Thanks,
> Shamik

