KStem Token Filter
------------------

                 Key: SOLR-379
                 URL: https://issues.apache.org/jira/browse/SOLR-379
             Project: Solr
          Issue Type: New Feature
          Components: search
            Reporter: Pieter Berkel
            Priority: Minor

A Lucene / Solr implementation of the KStem stemmer. Full credit goes to Harry Wagner for adapting the Lucene version found here:

http://ciir.cs.umass.edu/cgi-bin/downloads/downloads.cgi

Background discussion of this stemmer (including licensing issues) can be found in this thread:

http://www.nabble.com/Embedded-about-50--faster-for-indexing-tf4325720.html#a12376295

I've made some minor changes to KStemFilterFactory so that it compiles cleanly against trunk:

1) removed some unnecessary imports
2) updated the init() method parameters to follow the change introduced by SOLR-215
3) moved KStemFilterFactory into package org.apache.solr.analysis

Once compiled and included in your Solr war (or added as a jar in your lib directory), the KStem filter can be used in your schema very easily:

    <analyzer type="index">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
      <filter class="solr.StandardFilterFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.KStemFilterFactory" cacheSize="20000"/>
      <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    </analyzer>
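
For context, an analyzer like the one above normally sits inside a fieldType definition in schema.xml. The following is a minimal sketch, not part of the attached patch: the field type name "text_kstem" and the field name "body" are hypothetical, and the query-side analyzer is an assumption, added because stemmed index terms only match if queries are stemmed the same way.

```xml
<!-- Sketch only: "text_kstem" and "body" are hypothetical names, not part of the patch. -->
<!-- The fieldType goes in the <types> section of schema.xml. -->
<fieldType name="text_kstem" class="solr.TextField" positionIncrementGap="100">
  <!-- Index-side chain, copied from the example in the issue description. -->
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KStemFilterFactory" cacheSize="20000"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <!-- Assumed query-side chain: mirrors the index side so both produce the same stems. -->
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KStemFilterFactory" cacheSize="20000"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

<!-- The field goes in the <fields> section and refers to the type by name. -->
<field name="body" type="text_kstem" indexed="true" stored="true"/>
```

If a single analyzer element without a type attribute is used instead, Solr applies the same chain at both index and query time, which avoids keeping two copies in sync.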