Turning on KeywordRepeat and RemoveDups on an existing fieldType.

Michael Tracey Mon, 05 May 2014 13:53:46 -0700

As per the stemming docs (https://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#Stemming), I want to score the original term higher than the stemmed version by adding:

   <filter class="solr.KeywordRepeatFilterFactory"/>
   <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>

to a field type that already exists (and already does stemming). This index holds 100M documents and is slowly reindexed every month as records change. My question is: can I add these filters to the existing fieldType, or do I need to create a new fieldType, copyField the data into it, and switch my code over once everything has been reindexed? I'd rather just add the lines to my existing fieldType, because I don't think my cloud members have enough disk space to hold my primary fulltext field twice.

In case it helps, I'm running Solr 4.4.0, and the fieldType I want to modify looks like this:

    <fieldType name="keywordText" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
                generateNumberParts="1" catenateWords="1" catenateNumbers="1"
                catenateAll="0" splitOnCaseChange="0"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true"
                words="keyword_stopwords.txt" enablePositionIncrements="true"/>
        <filter class="solr.SnowballPorterFilterFactory" language="English"
                protected="protwords.txt"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
                ignoreCase="true" expand="true"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true"
                words="keyword_stopwords.txt" enablePositionIncrements="true"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
                generateNumberParts="1" catenateWords="0" catenateNumbers="0"
                catenateAll="0" splitOnCaseChange="0"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.SnowballPorterFilterFactory" language="English"
                protected="protwords.txt"/>
      </analyzer>
    </fieldType>
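A sketch of the proposed change, following the ordering in the wiki's stemming example: KeywordRepeatFilterFactory goes immediately before the stemmer, and RemoveDuplicatesTokenFilterFactory goes after it. Putting the two filters in the index analyzer only is an assumption made for this sketch; the query analyzer is copied unchanged.

```xml
<fieldType name="keywordText" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="1" catenateNumbers="1"
            catenateAll="0" splitOnCaseChange="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="keyword_stopwords.txt" enablePositionIncrements="true"/>
    <!-- Emits every token twice at the same position; one copy is marked as a
         keyword so that a keyword-aware stemmer passes it through unchanged -->
    <filter class="solr.KeywordRepeatFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"
            protected="protwords.txt"/>
    <!-- Where stemming left a token unchanged, both copies are identical;
         this drops the duplicate at the same position -->
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <!-- Assumption for this sketch: query-time analysis is left as it is today -->
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="keyword_stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="0" catenateNumbers="0"
            catenateAll="0" splitOnCaseChange="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"
            protected="protwords.txt"/>
  </analyzer>
</fieldType>
```

Because only the index analyzer changes in this sketch, a given document would carry both the original and the stemmed terms only after it has been reindexed under the new definition.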

Thanks,

M.
