Re: Turning on KeywordRepeat and RemoveDups on an existing fieldType.

Jack Krupansky Mon, 05 May 2014 14:45:26 -0700

I haven't personally used this technique, but I gather that the intent is that the unstemmed term will have a lower term frequency (be more unique) than the stemmed term, since a number of different source terms may all produce the same stemmed term.

To answer your question, no, you don't need a separate field or type for this feature, but it will tend to generate a lot more terms in your index since it will index a stemmed term as two terms.


Only use the repeat/remove filters for the index analyzer.
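
For what it's worth, here is a rough, untested sketch of how the index analyzer from the field type quoted below might look with the two filters added. It assumes the ordering I understand to be usual: KeywordRepeatFilterFactory goes before the stemmer (so the stemmer sees one copy of each token marked as a keyword and leaves it alone), and RemoveDuplicatesTokenFilterFactory goes after it (so a token whose stem is identical to the original is not indexed twice at the same position). Everything else is copied from the existing definition, and the query analyzer is left as it is.

```xml
<analyzer type="index">
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="0"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.StopFilterFactory" ignoreCase="true" words="keyword_stopwords.txt" enablePositionIncrements="true"/>
  <!-- Assumed placement: emit each token twice, one copy flagged as a keyword -->
  <filter class="solr.KeywordRepeatFilterFactory"/>
  <!-- The stemmer skips the keyword-flagged copy and stems the other one -->
  <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
  <!-- Assumed placement: drop the second copy when stemming did not change the token -->
  <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
</analyzer>
```

The Analysis page in the Solr admin UI should show whether each stemmable word now produces two tokens at the same position.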

You will need to reindex to see the full effect immediately, but you can also reindex incrementally (as you replace existing documents) if you don't mind the difference in relevancy taking an extended time to become apparent.


-- Jack Krupansky

-----Original Message-----
From: Michael Tracey
Sent: Monday, May 5, 2014 4:52 PM
To: solr-user@lucene.apache.org
Subject: Turning on KeywordRepeat and RemoveDups on an existing fieldType.

As per the stemming docs (https://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#Stemming), I want to score the original term higher than the stemmed version by adding:


  <filter class="solr.KeywordRepeatFilterFactory"/>
  <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>

to a field type that is already created (with stemming). I have 100M documents in this index, and it gets slowly reindexed every month as records change. My question is, can I add this to the existing fieldType, or do I need to make a new fieldType, copyField the data over to it, and switch my code after it's all reindexed? I'd rather be able to just add the lines to my fieldType because I don't think I have enough disk space on my cloud members to hold my primary fulltext field twice.

Just in case it helps, I'm running 4.4.0 and the field I'm wanting to mod looks like this:

<fieldType name="keywordText" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="keyword_stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="keyword_stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
  </analyzer>
</fieldType>

Thanks,

M.
