
Re: sanity check on how stemming, stopwords, and snowball analyzer works together

2007-10-15, Mark Miller

It depends on the order of the filters in your Analyzer: you want to be sure the stop word filter comes before the stemming filter. The reason the MoreLikeThis class does not do what you want is that it first applies the Analyzer (which stems) and THEN applies its custom stop word…
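Why the order matters can be shown with a small self-contained sketch. It does not use Lucene: `toyStem` is a hypothetical, deliberately naive suffix stripper standing in for the Snowball stemmer, and the tokens and stop list are invented for illustration. When the stop list is checked after stemming (the MoreLikeThis situation described above), the unstemmed entry "interesting" never matches the stemmed token "interest".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Toy illustration of filter order; NOT Lucene's StopFilter or the Snowball stemmer.
class FilterOrderDemo {

    // Hypothetical naive stemmer: strips the first matching suffix if at least 3 chars remain.
    static String toyStem(String word) {
        for (String suffix : new String[] {"ing", "ed", "s"}) {
            if (word.endsWith(suffix) && word.length() - suffix.length() >= 3) {
                return word.substring(0, word.length() - suffix.length());
            }
        }
        return word;
    }

    // Stop filter first, then stemmer: the stop list is compared with the original words.
    static List<String> stopThenStem(List<String> tokens, Set<String> stopWords) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            if (!stopWords.contains(t)) {
                out.add(toyStem(t));
            }
        }
        return out;
    }

    // Stemmer first, then stop filter: the stop list is compared with stemmed words,
    // as happens when stop words are applied after a stemming analyzer.
    static List<String> stemThenStop(List<String> tokens, Set<String> stopWords) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            String stemmed = toyStem(t);
            if (!stopWords.contains(stemmed)) {
                out.add(stemmed);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("the", "interesting", "results");
        Set<String> stopWords = Set.of("the", "interesting");
        System.out.println(stopThenStem(tokens, stopWords)); // [result]
        System.out.println(stemThenStop(tokens, stopWords)); // [interest, result]
    }
}
```

With a real analyzer the same effect appears whenever a stemmed form differs from the surface form listed as a stop word.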

Re: sanity check on how stemming, stopwords, and snowball analyzer works together

2007-10-15, Donna L Gresh

I wasn't sure that this suggestion would work:

> Instead add the stopwords to the analyzer that you pass to MoreLikeThis. That way you can ensure that the analyzer applies the stopword list before stemming

because I don't want to provide all the variants of the stopword list. If I do this, only the one pr…

Re: sanity check on how stemming, stopwords, and snowball analyzer works together

2007-10-15, Mark Miller

Sounds right to me. The other option, I think, is not to use the MoreLikeThis stopword functionality at all. Instead, add the stopwords to the analyzer that you pass to MoreLikeThis. That way you can ensure that the analyzer applies the stopword list before stemming (the MoreLikeThis stopword…
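The alternative of putting the stopwords inside the analyzer, so that MoreLikeThis only ever sees already filtered and stemmed terms, can be modelled the same way. This is a sketch, not Lucene code: `analyze` is a hypothetical stand-in for an analyzer whose stop filter sits before its stemmer, `toyStem` is the same naive suffix stripper, and `termCounts` merely stands in for MoreLikeThis's term selection (the real class scores candidate terms rather than just counting them). The point is that the consumer needs no stop list of its own.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Toy model of "stopwords inside the analyzer"; NOT the Lucene Analyzer or MoreLikeThis API.
class StopInAnalyzerDemo {

    // Hypothetical naive stemmer: strips the first matching suffix if at least 3 chars remain.
    static String toyStem(String word) {
        for (String suffix : new String[] {"ing", "ed", "s"}) {
            if (word.endsWith(suffix) && word.length() - suffix.length() >= 3) {
                return word.substring(0, word.length() - suffix.length());
            }
        }
        return word;
    }

    // Tokenize, lowercase, drop stopwords, then stem: the stop filter runs BEFORE stemming.
    static List<String> analyze(String text, Set<String> stopWords) {
        List<String> out = new ArrayList<>();
        for (String raw : text.split("\\W+")) {
            if (raw.isEmpty()) continue; // split yields a leading "" if text starts with a non-word char
            String token = raw.toLowerCase(Locale.ROOT);
            if (stopWords.contains(token)) continue;
            out.add(toyStem(token));
        }
        return out;
    }

    // Hypothetical stand-in for a MoreLikeThis-style consumer: it only sees analyzer output.
    static Map<String, Integer> termCounts(List<String> terms) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String t : terms) {
            counts.merge(t, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        Set<String> stopWords = Set.of("the", "interesting", "paper");
        String blob = "The interesting paper reports interesting results; results vary.";
        List<String> terms = analyze(blob, stopWords);
        System.out.println(terms);             // [report, result, result, vary]
        System.out.println(termCounts(terms)); // {report=1, result=2, vary=1}
    }
}
```

Because the stop list is matched against unstemmed tokens here, it has to name the surface forms that should be dropped, which is the trade-off raised in the reply above.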

sanity check on how stemming, stopwords, and snowball analyzer works together

2007-10-15, Donna L Gresh

Could those "in the know" comment on my current understanding of stemming and stopwords when using the snowball analyzer? In my application, I am using the MoreLikeThis class to find documents similar to an input "text blob". There are words in the input text blob that are "uninteresting" for my ap…