Re: Numeric value ignored by EdgeNGramFilterFactory

Zheng Lin Edwin Yeo Thu, 04 Jul 2019 20:27:00 -0700

Hi,

You can use the "Analysis" page in the Solr Admin UI to enter your value
and see what the tokenizer and each of the filters do to it.


Regards,
Edwin

On Thu, 4 Jul 2019 at 17:28, Yasufumi Mizoguchi <yasufumi0...@gmail.com>
wrote:

> Hi,
>
> EdgeNGramFilterFactory seems to drop tokens shorter than the minGramSize
> parameter. See the minGramSize="4" maxGramSize="6" example on the page below.
>
> https://lucene.apache.org/solr/guide/8_1/filter-descriptions.html#edge-n-gram-filter
>
> So I think you should set minGramSize to 2 or 1 if you want to keep 72 and
> other short tokens.
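>
> A minimal sketch of that change, assuming only the index-time
> EdgeNGramFilterFactory line of the adsktext field type is edited and the
> rest of the chain stays as posted (2 is just one of the two values
> suggested here):
>
> ```xml
> <!-- Assumption: replaces the existing minGramSize="3" line in the
>      index analyzer; all other filters are left untouched. -->
> <filter class="solr.EdgeNGramFilterFactory" minGramSize="2"
>         maxGramSize="30"/>
> ```
>
> With this setting the two-character token 72 should pass through the
> filter. Documents indexed before the change would need to be reindexed
> for it to take effect.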
>
> Thanks,
> Yasufumi
>
> On Thu, 4 Jul 2019 at 17:20, Shamik Bandopadhyay <sham...@gmail.com> wrote:
>
> > Hi,
> >
> >    I'm using EdgeNGramFilterFactory to support partial search. Here's my
> > field definition.
> >
> > <fieldType name="adsktext" class="solr.TextField"
> >     positionIncrementGap="100" autoGeneratePhraseQueries="true">
> >   <analyzer type="index">
> >     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >     <filter class="solr.WordDelimiterGraphFilterFactory"
> >         generateWordParts="1" generateNumberParts="1" catenateWords="0"
> >         catenateNumbers="1" catenateAll="0" splitOnCaseChange="0"/>
> >     <filter class="solr.StopFilterFactory" ignoreCase="true"
> >         words="stopwords.txt"/>
> >     <filter class="solr.LowerCaseFilterFactory"/>
> >     <filter class="solr.KeywordMarkerFilterFactory"
> >         protected="protwords.txt"/>
> >     <filter class="solr.PorterStemFilterFactory"/>
> >     <filter class="solr.EdgeNGramFilterFactory" minGramSize="3"
> >         maxGramSize="30"/>
> >     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
> >   </analyzer>
> >   <analyzer type="query">
> >     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >     <filter class="solr.WordDelimiterGraphFilterFactory"
> >         generateWordParts="1" generateNumberParts="1" catenateWords="0"
> >         catenateNumbers="1" catenateAll="0" splitOnCaseChange="0"/>
> >     <filter class="solr.StopFilterFactory" ignoreCase="true"
> >         words="stopwords.txt"/>
> >     <filter class="solr.SynonymGraphFilterFactory"
> >         synonyms="synonyms/synonyms.txt" ignoreCase="true" expand="true"/>
> >     <filter class="solr.LowerCaseFilterFactory"/>
> >     <filter class="solr.KeywordMarkerFilterFactory"
> >         protected="protwords.txt"/>
> >     <filter class="solr.PorterStemFilterFactory"/>
> >     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
> >   </analyzer>
> > </fieldType>
> >
> > I run into an issue when I search for numeric terms. For example, if
> > I search for "72 hours", EdgeNGramFilterFactory ignores 72 and only
> > stores hou and hour in the index. Since I'm using the AND operator, the
> > query fails to match 72 hours. I can enable EdgeNGramFilterFactory in the
> > query chain, but I thought that would be unnecessary overhead. Is there a
> > reason why 72 is ignored, and what would be the best way to address this
> > scenario?
> >
> > Any pointers will be appreciated.
> >
> > Thanks,
> > Shamik
> >
>
