Thanks for the advice. I want to keep the capitalization because in our
application we are mining specific contact and company names from news
articles. About 99% of the time, if we match a contact or company name and
the match is capitalized, we avoid a false match.
--Larry
On May 18, 2010, at 7:46 PM, Eric
You can construct your own analyzer from a
pre-existing Tokenizer (e.g. WhitespaceTokenizer)
and any number of TokenFilters (subclasses of
TokenFilter). Chaining TokenFilters together
lets you get many different effects.
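The tokenizer-plus-filter-chain idea can be sketched in plain Java without Lucene. This is a minimal illustration, not Lucene's API: the `Tokenizer` and `TokenFilter` interfaces, `stopFilter`, and `TOY_PLURAL_STEMMER` below are hypothetical stand-ins, and the stemmer is a naive suffix strip, not Snowball. It shows that each filter receives the previous filter's output, and that a chain which never lowercases its tokens keeps the original capitalization.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.function.UnaryOperator;

class AnalyzerChainSketch {
    // Hypothetical stand-ins for Lucene's Tokenizer/TokenFilter (NOT the real API):
    // a tokenizer turns text into tokens, a filter maps one token list to another.
    interface Tokenizer { List<String> tokenize(String text); }
    interface TokenFilter extends UnaryOperator<List<String>> {}

    // Split on runs of whitespace; token case is left untouched.
    static final Tokenizer WHITESPACE = text -> {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    };

    // Drop stop words, comparing case-insensitively but keeping survivors' original case.
    static TokenFilter stopFilter(Set<String> stopWords) {
        return tokens -> {
            List<String> out = new ArrayList<>();
            for (String t : tokens) {
                if (!stopWords.contains(t.toLowerCase(Locale.ROOT))) out.add(t);
            }
            return out;
        };
    }

    // Toy plural stemmer (hypothetical, NOT Snowball): strip one trailing "s"
    // from tokens longer than 3 chars that don't end in "ss". Case is preserved.
    static final TokenFilter TOY_PLURAL_STEMMER = tokens -> {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            boolean plural = t.length() > 3 && t.endsWith("s") && !t.endsWith("ss");
            out.add(plural ? t.substring(0, t.length() - 1) : t);
        }
        return out;
    };

    // Run the tokenizer, then each filter in order on the previous filter's output.
    static List<String> analyze(String text) {
        List<TokenFilter> chain = List.of(
                stopFilter(Set.of("the", "a", "an", "of")),
                TOY_PLURAL_STEMMER);
        List<String> tokens = WHITESPACE.tokenize(text);
        for (TokenFilter filter : chain) {
            tokens = filter.apply(tokens);
        }
        return tokens;
    }
}
```

For example, `analyze("The Acme Corporations hired John Smith")` yields `[Acme, Corporation, hired, John, Smith]`. Order matters: inserting a lowercasing filter anywhere in the chain would discard the capitalization that the later stages (and the name matching) depend on.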
But I have to ask: why do you want to keep capitalization?
Hi Larry-
> Right now I'm using Lucene with a basic WhitespaceAnalyzer, but I'm having
> problems with stemming. Does anyone have a recommendation for other
> text analyzers that handle stemming and also keep capitalization, stop words,
> and punctuation?
Have you tried the SnowballFilter? You co