KStem Token Filter
------------------

                 Key: SOLR-379
                 URL: https://issues.apache.org/jira/browse/SOLR-379
             Project: Solr
          Issue Type: New Feature
          Components: search
            Reporter: Pieter Berkel
            Priority: Minor


A Lucene / Solr implementation of the KStem stemmer.  Full credit goes to Harry 
Wagner for adapting the Lucene version found here:
http://ciir.cs.umass.edu/cgi-bin/downloads/downloads.cgi

Background discussion of this stemmer (including licensing issues) can be found 
in this thread:
http://www.nabble.com/Embedded-about-50--faster-for-indexing-tf4325720.html#a12376295

I've made some minor changes to KStemFilterFactory so that it compiles cleanly 
against trunk:
1) removed some unnecessary imports
2) changed the init() method parameters introduced by SOLR-215
3) moved KStemFilterFactory into package org.apache.solr.analysis

Once compiled and included in your Solr war (or as a jar in your lib directory), 
the KStem filter can be used in your schema very easily:

      <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.StandardFilterFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.KStemFilterFactory" cacheSize="20000"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
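
For context, here is a rough sketch of how the analyzer above might sit inside a 
complete field type in schema.xml. The field type name "text_kstem", the field 
name "body" and the mirrored query-time analyzer are assumptions made for 
illustration; they are not part of the patch. Only the KStemFilterFactory class 
and its cacheSize attribute come from the example above.

```xml
<!-- Hypothetical field type: the name "text_kstem" is illustrative only. -->
<fieldType name="text_kstem" class="solr.TextField" positionIncrementGap="100">
  <!-- Index-time chain, same filters as the example above. -->
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KStemFilterFactory" cacheSize="20000"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <!-- Assumed query-time chain: stems query terms the same way, so they
       match the stemmed terms written to the index. -->
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KStemFilterFactory" cacheSize="20000"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Hypothetical field using the type above. -->
<field name="body" type="text_kstem" indexed="true" stored="true"/>
```

Running the same stemmer at index and query time is the usual choice, since a 
stemmed index will generally not match unstemmed query terms.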


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.