Hi,

EdgeNGramFilterFactory seems to drop tokens that are shorter than the
minGramSize parameter. See the minGramSize="4" maxGramSize="6" example on
the page below.
https://lucene.apache.org/solr/guide/8_1/filter-descriptions.html#edge-n-gram-filter

So I think you should set minGramSize to 2 (or 1) if you want to keep 72
and other short tokens.
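
As a rough sketch of that change, assuming everything else in the index
analyzer of the quoted field definition stays the same, only the
minGramSize value on the existing EdgeNGramFilterFactory line would change:

```xml
<!-- Index-time analyzer only; the query analyzer is left as it is. -->
<!-- Sketch: minGramSize lowered from 3 to 2 so that a 2-character token such as 72 still yields a gram. -->
<filter class="solr.EdgeNGramFilterFactory" minGramSize="2"
        maxGramSize="30"/>
```

With this setting "72" should be indexed as 72, and "hours" (stemmed to
hour) as ho, hou and hour. The extra 2-character grams make the index
somewhat larger, and existing documents need to be reindexed before the
change takes effect.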

Thanks,
Yasufumi

On Thu, Jul 4, 2019 at 17:20, Shamik Bandopadhyay <sham...@gmail.com> wrote:

> Hi,
>
>    I'm using EdgeNGramFilterFactory to support partial search. Here's my
> field definition.
>
> <fieldType name="adsktext" class="solr.TextField"
> positionIncrementGap="100" autoGeneratePhraseQueries="true">
> <analyzer type="index">
> <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> <filter class="solr.WordDelimiterGraphFilterFactory" generateWordParts="1"
> generateNumberParts="1" catenateWords="0" catenateNumbers="1"
> catenateAll="0" splitOnCaseChange="0"/>
> <filter class="solr.StopFilterFactory" ignoreCase="true"
> words="stopwords.txt" />
> <filter class="solr.LowerCaseFilterFactory"/>
> <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
> <filter class="solr.PorterStemFilterFactory"/>
> <filter class="solr.EdgeNGramFilterFactory" minGramSize="3"
> maxGramSize="30"/>
> <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
> </analyzer>
> <analyzer type="query">
> <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> <filter class="solr.WordDelimiterGraphFilterFactory" generateWordParts="1"
> generateNumberParts="1" catenateWords="0" catenateNumbers="1"
> catenateAll="0" splitOnCaseChange="0"/>
> <filter class="solr.StopFilterFactory" ignoreCase="true"
> words="stopwords.txt" />
> <filter class="solr.SynonymGraphFilterFactory"
> synonyms="synonyms/synonyms.txt" ignoreCase="true" expand="true"/>
> <filter class="solr.LowerCaseFilterFactory"/>
> <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
> <filter class="solr.PorterStemFilterFactory"/>
> <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
> </analyzer>
> </fieldType>
>
> I run into an issue when I search for numeric terms. For example, if
> I search for "72 hours", EdgeNGramFilterFactory ignores 72 and only stores
> hou and hour in the index. Since I'm using the AND operator, the query
> fails to match 72 hours. I could enable EdgeNGramFilterFactory in the query
> chain, but I thought that would be unnecessary overhead. Is there a reason
> why 72 is ignored, and what would be the best way to address this scenario?
>
> Any pointers will be appreciated.
>
> Thanks,
> Shamik
>
