Re: Stemming & Analyzers

Roger Marin Fri, 20 Aug 2010 09:39:32 -0700

OK, so the plugin is working now: it changes the analyzer to the
SnowballAnalyzer. But when I parse a query, some letters end up being
stripped. For instance, if I search for "exchanges" it gets turned into
"exchang", and of course I get no results. What could be the cause of
this? As far as I can see, the SnowballAnalyzer is being loaded and used
for crawling. How can I make sure that this analyzer is used by Nutch for
both querying and crawling? Do I need to modify any Nutch classes, or do I
need something extra in my plugin code? This is really confusing; I hope
someone can help me.


Here's the code for the snowball analyzer plugin I'm using:

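import java.io.Reader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.StopAnalyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.util.Version;
import org.apache.nutch.analysis.NutchAnalyzer;

public class SnowballAnalyzer extends NutchAnalyzer {

    // Copy Lucene's default English stop words into an array,
    // which is what the SnowballAnalyzer constructor below expects.
    private static final String[] STOP_WORDS;
    static {
        STOP_WORDS = new String[StopAnalyzer.ENGLISH_STOP_WORDS_SET.size()];
        int i = 0;
        for (Object o : StopAnalyzer.ENGLISH_STOP_WORDS_SET) {
            STOP_WORDS[i++] = o.toString();
        }
    }

    // Shared Lucene Snowball analyzer that stems English terms.
    private static final Analyzer ANALYZER =
        new org.apache.lucene.analysis.snowball.SnowballAnalyzer(
            Version.LUCENE_CURRENT, "English", STOP_WORDS);

    /** Creates a new instance of SnowballAnalyzer */
    public SnowballAnalyzer() {
    }

    // Delegate tokenization to the shared Lucene analyzer.
    public TokenStream tokenStream(String fieldName, Reader reader) {
        return ANALYZER.tokenStream(fieldName, reader);
    }
}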
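
To make the symptom concrete, here is a minimal sketch of why a stemmed query term like "exchang" finds nothing unless the indexed terms were stemmed the same way. It does not use Nutch or Lucene at all: toyStem is a made-up suffix stripper standing in for the real Snowball stemmer, and StemMismatchDemo and index are hypothetical names used only for this illustration.

```java
import java.util.HashSet;
import java.util.Set;

class StemMismatchDemo {

    // Toy stand-in for a stemmer: strips a trailing "es" or "s".
    // This is NOT the Snowball algorithm, only an illustration.
    public static String toyStem(String term) {
        if (term.endsWith("es")) {
            return term.substring(0, term.length() - 2);
        }
        if (term.endsWith("s")) {
            return term.substring(0, term.length() - 1);
        }
        return term;
    }

    // Builds the set of indexed terms, optionally stemming each word first.
    public static Set<String> index(String[] words, boolean stem) {
        Set<String> terms = new HashSet<String>();
        for (String word : words) {
            terms.add(stem ? toyStem(word) : word);
        }
        return terms;
    }

    public static void main(String[] args) {
        String[] doc = {"stock", "exchanges"};

        // The query side stems "exchanges" to "exchang".
        String query = toyStem("exchanges");

        // Index built without stemming: the stemmed query term is absent.
        System.out.println(index(doc, false).contains(query)); // prints false

        // Index built with the same stemming: the query term matches.
        System.out.println(index(doc, true).contains(query));  // prints true
    }
}
```

If the real setup behaves like the first case, that would suggest the terms in the index were not produced by the same analyzer as the query, but that is only a guess from the symptom.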

Thanks.

On 19 August 2010 21:09, Roger Marin <[email protected]> wrote:

> Hello,
>
> Is it possible to change the Lucene analyzer that Nutch uses by default? I
> would like to use the Snowball analyzer to search and crawl. I tried
> creating a plugin based on the analysis-fr and analysis-de plugins, but it
> didn't work; I'm not sure if I need to create a plugin for querying too.
> I would also like to allow stemming, but I cannot find any info on this. Do
> I need to modify source code? Configuration files?
>
> I appreciate any help you can give me, thanks.
>
>
> Roger Marin
>
