RE: Need pointers on using a very small part of Lucene

2002-03-15 Thread Alex Murzaku
: Thursday, March 14, 2002 9:51 PM
To: Lucene Users List
Subject: Re: Need pointers on using a very small part of Lucene
Robert,
>
> I just have one more question - how do I remove repeated words? Does
> anyone have a filter for doing this?
>
> For example, here's the result of o

Re: Need pointers on using a very small part of Lucene

2002-03-15 Thread Brian Goetz
> What I want to do is pass to a Lucene method some text, and have it return
> the text that it would normally put into the index.
The part of Lucene that does this is called the Analyzer. There are quite a few Analyzers in the Lucene distribution, depending on the text you plan to process, so
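
To make the idea of analysis concrete, here is a minimal sketch in plain Java of the kind of chain an analyzer applies: split the text into tokens, lowercase them, and drop common words. It is a toy stand-in and does not use Lucene's classes or API. The class name `ToyAnalyzer`, the method `analyze`, and the stop-word list are all assumptions made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Toy illustration only: this is NOT Lucene's Analyzer API.
final class ToyAnalyzer {
    // Hypothetical stop-word list for the example; not Lucene's actual list.
    private static final Set<String> STOP_WORDS =
            Set.of("a", "an", "and", "the", "of", "to", "is");

    // Returns the terms that would be kept, in their original order.
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        // Split on any run of characters that are neither letters nor digits.
        for (String raw : text.split("[^\\p{L}\\p{N}]+")) {
            if (raw.isEmpty()) {
                continue; // split() can yield an empty leading element
            }
            String term = raw.toLowerCase(Locale.ROOT);
            if (!STOP_WORDS.contains(term)) {
                terms.add(term);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        // prints [speed, analyzer, impressive]
        System.out.println(analyze("The speed of the Analyzer is impressive"));
    }
}
```

A real Lucene analyzer does the same kind of work with its own tokenizer and filter classes, and a stemming step could be added as one more stage in the loop.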

Re: Need pointers on using a very small part of Lucene

2002-03-14 Thread Kelvin Tan
Robert,
>
> I just have one more question - how do I remove repeated words? Does
> anyone have a filter for doing this?
>
> For example, here's the result of one of my files being worked on:
> "todai customer.formattedmailingaddress3 dear customer.dearnam respond
> request inform productlongnam s

Re: Need pointers on using a very small part of Lucene

2002-03-14 Thread Robert A. Decker
I must say, Lucene is pretty damn cool. I now have it working and filtering stuff using a custom analyzer I built named FragmentAnalyzer. It works like a StandardAnalyzer but also uses the PorterStemFilter. I'm very impressed with its speed. I just have one more question - how do I remove repea
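
One possible way to remove repeated words, sketched in plain Java: run the analyzed output through a `LinkedHashSet`, which keeps only the first occurrence of each term and preserves order. This is post-processing on the analyzer's output, not a Lucene filter, and it assumes the terms are separated by whitespace. The class name `RepeatedWordFilter` and the method `removeRepeats` are hypothetical.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper, independent of Lucene: deduplicates analyzed terms.
final class RepeatedWordFilter {
    // Keeps the first occurrence of each whitespace-separated term, in order.
    static String removeRepeats(String analyzedText) {
        Set<String> seen = new LinkedHashSet<>();
        for (String term : analyzedText.trim().split("\\s+")) {
            if (!term.isEmpty()) {
                seen.add(term); // add() ignores terms already present
            }
        }
        return String.join(" ", seen);
    }

    public static void main(String[] args) {
        // prints: dear customer request inform
        System.out.println(removeRepeats("dear customer dear request customer inform"));
    }
}
```

Because the comparison is on exact strings, this works best after stemming and lowercasing, so that variants of a word have already collapsed to the same term.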

Re: Need pointers on using a very small part of Lucene

2002-03-14 Thread Joshua O'Madadhain
On Thu, 14 Mar 2002, Robert A. Decker wrote:
> Yes, unique terms. I've started looking at the StandardAnalyzer, and
> related classes, and I'll see if I can use them for what I want.
>
> Also, I'd like to massage the text a bit more than just the unique
> terms. For example, common words shou

Re: Need pointers on using a very small part of Lucene

2002-03-14 Thread Robert A. Decker
Yes, unique terms. I've started looking at the StandardAnalyzer, and related classes, and I'll see if I can use them for what I want. Also, I'd like to massage the text a bit more than just the unique terms. For example, common words should be removed (some of which are found in the StandardAn

Re: Need pointers on using a very small part of Lucene

2002-03-14 Thread Peter Carlson
Hi, I am a little confused by your request. Asking for the text that Lucene would normally put into the index doesn't really make sense, since Lucene is term-based. What data are you trying to get? The set of unique terms for each document? If you are trying to use Lucene to normalize the da