[ 
https://issues.apache.org/jira/browse/SOLR-379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pieter Berkel updated SOLR-379:
-------------------------------

    Attachment: KStemSolr.zip

I've attached a zip file containing the KStem source rather than a patch, as 
I'm not sure how this code will eventually be integrated with Solr.

Since I did not write this code and am unsure of its legal status, I have not 
granted the ASF license, although recent discussion suggests the license 
included with KStem is compatible with the Apache license.

Hopefully we'll be able to resolve the above issues fairly quickly.


> KStem Token Filter
> ------------------
>
>                 Key: SOLR-379
>                 URL: https://issues.apache.org/jira/browse/SOLR-379
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>            Reporter: Pieter Berkel
>            Priority: Minor
>         Attachments: KStemSolr.zip
>
>
> A Lucene / Solr implementation of the KStem stemmer.  Full credit goes to 
> Harry Wagner for adapting the Lucene version found here:
> http://ciir.cs.umass.edu/cgi-bin/downloads/downloads.cgi
> Background discussion of this stemmer (including licensing issues) can be 
> found in this thread:
> http://www.nabble.com/Embedded-about-50--faster-for-indexing-tf4325720.html#a12376295
> I've made some minor changes to KStemFilterFactory so that it compiles 
> cleanly against trunk:
> 1) removed some unnecessary imports
> 2) changed the init() method parameters introduced by SOLR-215
> 3) moved KStemFilterFactory into package org.apache.solr.analysis
> Once compiled and included in your Solr war (or as a jar in your lib 
> directory), the KStem filter can be used in your schema very easily:
>       <analyzer type="index">
>         <tokenizer class="solr.StandardTokenizerFactory"/>
>         <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
>         <filter class="solr.StandardFilterFactory"/>
>         <filter class="solr.LowerCaseFilterFactory"/>
>         <filter class="solr.KStemFilterFactory" cacheSize="20000"/>
>         <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>       </analyzer>
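
For context, a minimal sketch of how the analyzer above might sit inside a complete field type in schema.xml. The type name "text_kstem" and the positionIncrementGap value are assumptions made for illustration and are not part of the issue. A query analyzer using the same stemming filter is included because stemmed index terms generally only match when query terms are stemmed the same way:

```xml
<!-- "text_kstem" is a hypothetical name chosen for this sketch -->
<fieldType name="text_kstem" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KStemFilterFactory" cacheSize="20000"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <!-- Assumed to mirror the index chain so queries are stemmed identically -->
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KStemFilterFactory" cacheSize="20000"/>
  </analyzer>
</fieldType>
```

A field would then reference the type by name, for example type="text_kstem" on its field definition.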

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
