Re: Stemming Problem

2010-05-19 Thread Larry Hendrix
Thanks for the advice. I want to keep the capitalization because in our application we are mining specific contact and company names from news articles. About 99% of the time, if we match a contact or company and it's capitalized, we avoid a false match. --Larry On May 18, 2010, at 7:46 PM, Eric

Re: Stemming Problem

2010-05-18 Thread Erick Erickson
You can construct your own analyzer by creating it from a pre-existing Tokenizer (e.g. WhitespaceTokenizer) and any number of TokenFilters (e.g. TokenFilter). You can string any number of TokenFilters together to get many different effects. But I have to ask, why do you want to keep capitalization?
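
To make the tokenizer-plus-filters idea concrete, here is a rough sketch in plain Java. It deliberately does not use the Lucene API: FilterChainSketch, whitespaceTokenize, toyStem and analyze are hypothetical names invented for illustration, and toyStem is a crude stand-in for a real stemmer. It only shows the shape of the approach: split on whitespace so capitalization and punctuation survive, then pass each token through whatever chain of filters you choose.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

// Illustration only: plain Java, NOT the Lucene Tokenizer/TokenFilter API.
class FilterChainSketch {

    // Splits on runs of whitespace only, so case and punctuation are left untouched.
    static List<String> whitespaceTokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Toy stand-in for a stemming filter: strips a trailing "ing" or a plural "s".
    // A real stemmer is far more careful than this.
    static String toyStem(String token) {
        if (token.length() > 5 && token.endsWith("ing")) {
            return token.substring(0, token.length() - 3);
        }
        if (token.length() > 3 && token.endsWith("s") && !token.endsWith("ss")) {
            return token.substring(0, token.length() - 1);
        }
        return token;
    }

    // Tokenizes the text, then applies every filter to each token, in order.
    static List<String> analyze(String text, List<UnaryOperator<String>> filters) {
        List<String> out = new ArrayList<>();
        for (String token : whitespaceTokenize(text)) {
            String current = token;
            for (UnaryOperator<String> filter : filters) {
                current = filter.apply(current);
            }
            out.add(current);
        }
        return out;
    }

    public static void main(String[] args) {
        UnaryOperator<String> stem = FilterChainSketch::toyStem;
        List<UnaryOperator<String>> chain = List.of(stem);
        // prints [Acme, Corp, is, hir, engineer]
        System.out.println(analyze("Acme Corp is hiring engineers", chain));
    }
}
```

Since no lower-casing step is in the chain, "Acme" and "Corp" come out with their capitals intact. Adding or removing an effect is just a matter of adding or removing an entry in the filter list.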

RE: Stemming Problem

2010-05-18 Thread Christopher Condit
Hi Larry-
> Right now I'm using Lucene with a basic Whitespace Analyzer but I'm having
> problems with stemming. Does anyone have a recommendation for other
> text analyzers that handle stemming and also keep capitalization, stop words,
> and punctuation?
Have you tried the SnowballFilter? You co