Hi,

EdgeNGramFilterFactory seems to drop tokens that are shorter than the minGramSize parameter. See the minGramSize="4" maxGramSize="6" example on the page below.
https://lucene.apache.org/solr/guide/8_1/filter-descriptions.html#edge-n-gram-filter
So I think you should set minGramSize to 2 or 1 if you want to keep 72 or other short tokens.

Thanks,
Yasufumi

On Thu, Jul 4, 2019 at 17:20, Shamik Bandopadhyay <sham...@gmail.com> wrote:
> Hi,
>
> I'm using EdgeNGramFilterFactory to support partial search. Here's my
> field definition.
>
> <fieldType name="adsktext" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
>   <analyzer type="index">
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.WordDelimiterGraphFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="1" catenateAll="0" splitOnCaseChange="0"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
>     <filter class="solr.PorterStemFilterFactory"/>
>     <filter class="solr.EdgeNGramFilterFactory" minGramSize="3" maxGramSize="30"/>
>     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.WordDelimiterGraphFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="1" catenateAll="0" splitOnCaseChange="0"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
>     <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms/synonyms.txt" ignoreCase="true" expand="true"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
>     <filter class="solr.PorterStemFilterFactory"/>
>     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>   </analyzer>
> </fieldType>
>
> I run into an issue when I try numeric terms in a search. For example, if
> I search for "72 hours", EdgeNGramFilterFactory ignores 72 and only stores
> hou and hour in the index. Since I'm using the AND operator, the query fails
> to match 72 hours. I can enable EdgeNGramFilterFactory in the query chain, but
> I thought that would be an unnecessary overhead. Is there a reason why 72
> is ignored, and what would be the best way to address this scenario?
>
> Any pointers will be appreciated.
>
> Thanks,
> Shamik
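
For what it's worth, the change suggested at the top of this reply might look like the sketch below. It assumes that only the index-time EdgeNGramFilterFactory line changes and that every other filter in the quoted fieldType stays as posted. The value minGramSize="2" is just one of the two values mentioned above, not something tested against this schema.

```xml
<!-- Index analyzer only; the query analyzer in the quoted fieldType is left unchanged. -->
<!-- Assumption: minGramSize is lowered from 3 to 2 so that two-character tokens such as "72" are no longer dropped. -->
<filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="30"/>
```

Lowering minGramSize also makes longer tokens emit shorter prefixes (for example "ho" in addition to "hou" and "hour"), so the index will grow somewhat. Existing documents would need to be reindexed before the new grams become searchable.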