Term/Phrase frequencies

2010-05-06 Thread manjula wijewickrema
Hi, I am new to Lucene. Is it possible to get the term or phrase frequencies of an input document through Lucene? Thanks, Manjula

Re: Term/Phrase frequencies

2010-05-06 Thread Erick Erickson
Terms are relatively easy; see TermFreqVector in the JavaDocs. Phrases aren't as easy. Before you go there, though, what is the high-level problem you're trying to solve? Possibly this is an XY problem (see http://people.apache.org/~hossman/#xyproblem). Best Erick On Thu, May 6, 2010 at 6:39 AM,
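To make the term case concrete without depending on a particular Lucene version, here is a plain-Java sketch of the information a term frequency vector holds for one document: each distinct term mapped to the number of times it occurs. It does not use Lucene's API. The class name `TermFrequencies` is hypothetical, and the regex tokenizer is a simplifying assumption that is far cruder than a Lucene analyzer (no stop words, no stemming).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

public class TermFrequencies {
    // Naive tokenizer (an assumption, not Lucene's analysis chain):
    // lowercase, split on runs of non-letter/non-digit characters, drop empties.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Maps each distinct term to its count in this one document,
    // sorted alphabetically by term (TreeMap).
    static Map<String, Integer> termFrequencies(String text) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String token : tokenize(text)) {
            counts.merge(token, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        String doc = "The quick brown fox jumps over the lazy dog. The dog sleeps.";
        System.out.println(termFrequencies(doc));
    }
}
```

For the sample sentence, "the" is counted three times and "dog" twice; every other term appears once.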

RE: IndexWriter and memory usage

2010-05-06 Thread Woolf, Ross
Sorry to be so long in getting back on this. The patch you provided has improved the situation, but we are still seeing some memory loss. The following are some images from the heap dump; I'll share with you what we are seeing now. This first image shows the memory pattern. Our first commit tak

Re: problem in Lucene's ranking function

2010-05-06 Thread José Ramón Pérez Agüera
Thank you very much for your answer, but even if the problem is solved at the boolean layer, it remains in the ranking function, so the quality of the ranking would be very low, since the term frequency function is not computed properly. jose On Wed, May 5, 2010 at 4:11 PM, Yonik Seele

Re: Term/Phrase frequencies

2010-05-06 Thread manjula wijewickrema
Hi Erick, Thanks for the reply. What I want to do is identify the key terms and key phrases of a document according to their number of occurrences in the document. The output should be the highest-frequency words and (two- or three-word) phrases. Can I use Lucene for this purpose? Thanks Manjula On Th
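A minimal plain-Java sketch of the approach described here: count word n-grams (runs of n consecutive words, with n = 2 or 3 for phrases) alongside single terms, then keep the most frequent ones. This is not Lucene code; the names `PhraseFrequencies`, `ngramCounts`, and `topK` are hypothetical, and the naive tokenizer lets phrases run across sentence boundaries.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Comparator;

public class PhraseFrequencies {
    // Naive tokenizer (assumption): lowercase, split on non-letter/non-digit runs.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Counts every run of n consecutive tokens (a word n-gram), joined by spaces.
    static Map<String, Integer> ngramCounts(List<String> tokens, int n) {
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i + n <= tokens.size(); i++) {
            String phrase = String.join(" ", tokens.subList(i, i + n));
            counts.merge(phrase, 1, Integer::sum);
        }
        return counts;
    }

    // Returns the k most frequent entries; ties are broken alphabetically.
    static List<Map.Entry<String, Integer>> topK(Map<String, Integer> counts, int k) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort(Map.Entry.<String, Integer>comparingByValue(Comparator.reverseOrder())
                .thenComparing(Map.Entry.<String, Integer>comparingByKey()));
        return new ArrayList<>(entries.subList(0, Math.min(k, entries.size())));
    }

    public static void main(String[] args) {
        String doc = "Lucene in action. Lucene in action covers search. Search with Lucene in action.";
        List<String> tokens = tokenize(doc);
        for (int n = 1; n <= 3; n++) {
            System.out.println(n + "-grams: " + topK(ngramCounts(tokens, n), 3));
        }
    }
}
```

Raw counts like these tend to be dominated by very common words ("the", "of", "in"), so in practice some stop-word filtering, or weighting by how rare a term is across a collection, is usually needed before the top entries look like key phrases.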