: Thursday, March 14, 2002 9:51 PM
To: Lucene Users List
Subject: Re: Need pointers on using a very small part of Lucene
Robert,
>
> I just have one more question - how do I remove repeated words? Does
> anyone have a filter for doing this?
>
> For example, here's the result of o
> What I want to do is pass to a Lucene method some text, and have it return
> the text that it would normally put into the index.
The part of Lucene that does this is called the Analyzer. There are
quite a few Analyzers in the Lucene distribution, depending on the text you
plan to process, so
Robert,
>
> I just have one more question - how do I remove repeated words? Does
> anyone have a filter for doing this?
>
> For example, here's the result of one of my files being worked on:
> "todai customer.formattedmailingaddress3 dear customer.dearnam respond
> request inform productlongnam s
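On the repeated-words question, here is a minimal sketch of one way to do it, assuming the analyzer output is already available as a whitespace-separated string like the one quoted above. It is plain Java and does not use any Lucene API; the class name UniqueTerms, the method removeRepeats, and the sample input are made up for illustration. It keeps the first occurrence of each term and drops later repeats.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class UniqueTerms {
    // Hypothetical helper, not part of Lucene: keeps the first occurrence of
    // each term and drops later repeats, preserving the original order.
    public static List<String> removeRepeats(String analyzedText) {
        // LinkedHashSet ignores duplicates but remembers insertion order.
        Set<String> seen = new LinkedHashSet<>();
        for (String term : analyzedText.trim().split("\\s+")) {
            if (!term.isEmpty()) {
                seen.add(term);
            }
        }
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        // Made-up sample input, loosely modelled on the analyzed text above.
        String analyzed = "dear customer respond request customer inform request";
        System.out.println(String.join(" ", removeRepeats(analyzed)));
    }
}
```

The same idea could probably be wrapped in a custom filter at the end of the analyzer chain, but that part is not shown here since it depends on the Lucene version in use.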
I must say, Lucene is pretty damn cool.
I now have it working and filtering stuff using a custom analyzer I built
named FragmentAnalyzer. It works like a StandardAnalyzer but also uses the
PorterStemFilter.
I'm very impressed with its speed.
I just have one more question - how do I remove repea
On Thu, 14 Mar 2002, Robert A. Decker wrote:
> Yes, unique terms. I've started looking at the StandardAnalyzer, and
> related classes, and I'll see if I can use them for what I want.
>
> Also, I'd like to massage the text a bit more than just the unique
> terms. For example, common words shou
Yes, unique terms. I've started looking at the StandardAnalyzer, and
related classes, and I'll see if I can use them for what I want.
Also, I'd like to massage the text a bit more than just the unique
terms. For example, common words should be removed (some of which are
found in the StandardAn
Hi
I am a little confused by your request.
Asking for the text that Lucene would normally put into the index
doesn't really make sense, since Lucene is term based.
What data are you trying to get? The set of unique terms for each document?
If you are trying to use Lucene to normalize the da