Re: correct location in chain for EdgeNGramFilterFactory ?

Erick Erickson Tue, 24 Apr 2012 13:22:40 -0700

Well, what effect do you _want_?

I'd probably put it after the PorterStemFilterFactory. Where you've
marked it, it'll form a bunch of ngrams, then WordDelimiterFilterFactory
will try to break them up according to _its_ rules, and eventually
you'll be sending absolute gibberish to the stemmer. I mean, what is
the stemmer going to make of (starting out with "running")
ru, run, runn, runni, runnin, running?
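
Something like the following for the index-time analyzer, as a rough,
untested sketch. It is your chain as posted with the EdgeNGramFilterFactory
moved to the end. The minGramSize and maxGramSize values are my assumptions
for illustration only, so pick whatever suits your data.

```xml
<analyzer type="index">
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.StopFilterFactory" ignoreCase="true"
          words="stopwords.txt" enablePositionIncrements="true"/>
  <filter class="solr.PatternReplaceFilterFactory" pattern="\."
          replacement="" replace="all"/>
  <filter class="solr.WordDelimiterFilterFactory"
          generateWordParts="1" generateNumberParts="1" catenateWords="1"
          catenateNumbers="1" catenateAll="1" splitOnCaseChange="1"
          preserveOriginal="1"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.KeywordMarkerFilterFactory"
          protected="protwords.txt"/>
  <filter class="solr.PorterStemFilterFactory"/>
  <!-- Edge ngrams are built last, from the already-stemmed tokens.
       Gram sizes below are assumed values, not a recommendation. -->
  <filter class="solr.EdgeNGramFilterFactory"
          minGramSize="2" maxGramSize="15"/>
</analyzer>
```

With this ordering the stemmer only ever sees whole words, and the ngrams
are formed from whatever it emits.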


I suggest you spend some time with admin/analysis with various
orderings to understand better how all the parts interact.

Best
Erick

On Tue, Apr 24, 2012 at 11:20 AM, geeky2 <gee...@hotmail.com> wrote:
> hello all,
>
> i want to experiment with the EdgeNGramFilterFactory at index time.
>
> i believe this needs to go in post tokenization - but i am doing a pattern
> replace as well as other things.
>
> should the EdgeNGramFilterFactory go in right after the pattern replace?
>
>
>
>
>    <fieldType name="text_en_splitting" class="solr.TextField"
> positionIncrementGap="100">
>      <analyzer type="index">
>        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>
>
>        <filter class="solr.StopFilterFactory" ignoreCase="true"
> words="stopwords.txt" enablePositionIncrements="true"/>
>        <filter class="solr.PatternReplaceFilterFactory" pattern="\."
> replacement="" replace="all"/>
>
> *put EdgeNGramFilterFactory here ===> ?*
>
>        <filter class="solr.WordDelimiterFilterFactory"
> generateWordParts="1" generateNumberParts="1" catenateWords="1"
> catenateNumbers="1" catenateAll="1" splitOnCaseChange="1"
> preserveOriginal="1"/>
>        <filter class="solr.LowerCaseFilterFactory"/>
>        <filter class="solr.KeywordMarkerFilterFactory"
> protected="protwords.txt"/>
>        <filter class="solr.PorterStemFilterFactory"/>
>      </analyzer>
>      <analyzer type="query">
>        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>        <filter class="solr.StopFilterFactory" ignoreCase="true"
> words="stopwords.txt" enablePositionIncrements="true"/>
>        <filter class="solr.PatternReplaceFilterFactory" pattern="\."
> replacement="" replace="all"/>
>        <filter class="solr.WordDelimiterFilterFactory"
> generateWordParts="1" generateNumberParts="1" catenateWords="0"
> catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"
> preserveOriginal="1"/>
>        <filter class="solr.LowerCaseFilterFactory"/>
>        <filter class="solr.KeywordMarkerFilterFactory"
> protected="protwords.txt"/>
>        <filter class="solr.PorterStemFilterFactory"/>
>      </analyzer>
>    </fieldType>
>
> thanks for any help,
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/correct-location-in-chain-for-EdgeNGramFilterFactory-tp3935589p3935589.html
> Sent from the Solr - User mailing list archive at Nabble.com.
