Hi Erik,
Thanks for the reply. What I want to do is identify the key terms and key
phrases of a document according to their number of occurrences in the
document. The output should be the highest-frequency words and (two- or
three-word) phrases. Can I use Lucene for this purpose?
Thanks
Manjula
On Th
Thank you very much for your answer. Even if I try to solve the
problem at the boolean layer, it remains in the ranking function,
so the quality of the ranking would be very low, since the term
frequency is not computed properly.
jose
On Wed, May 5, 2010 at 4:11 PM, Yonik Seele
Sorry to be so long in getting back on this. The patch you provided has
improved the situation but we are still seeing some memory loss. The following
are some images from the heap dump. I'll share with you what we are seeing now.
This first image shows the memory pattern. Our first commit tak
Terms are relatively easy; see TermFreqVector in the JavaDocs.
Phrases aren't as easy. Before you go there, though, what is the
high-level problem you're trying to solve? Possibly this is an XY problem
(see http://people.apache.org/~hossman/#xyproblem).
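To make the phrase-counting question concrete, here is a minimal sketch in plain Java that does not use any Lucene API. It counts every run of two or three consecutive words and lists the most frequent ones. The class name PhraseCounter, the methods countNGrams and top, and the naive tokenizer (lowercase, split on anything that is not a letter or digit) are all assumptions made for illustration; a Lucene Analyzer would normally handle tokenization, and this sketch lets phrases run across sentence boundaries.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of Lucene: counts word n-grams in a string.
public class PhraseCounter {

    // Counts every run of minN..maxN consecutive words in the text.
    public static Map<String, Integer> countNGrams(String text, int minN, int maxN) {
        // Naive tokenization (an assumption): lowercase, split on non-letter/digit runs.
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        Map<String, Integer> counts = new HashMap<>();
        for (int n = minN; n <= maxN; n++) {
            // Slide a window of n words over the token list.
            for (int i = 0; i + n <= tokens.size(); i++) {
                String gram = String.join(" ", tokens.subList(i, i + n));
                counts.merge(gram, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Returns up to k entries, highest count first; ties are ordered alphabetically.
    public static List<Map.Entry<String, Integer>> top(Map<String, Integer> counts, int k) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        return entries.subList(0, Math.min(k, entries.size()));
    }

    public static void main(String[] args) {
        String doc = "Lucene is a search library. A search library indexes text.";
        // Two- and three-word phrases only; use countNGrams(doc, 1, 1) for single terms.
        Map<String, Integer> phrases = countNGrams(doc, 2, 3);
        for (Map.Entry<String, Integer> e : top(phrases, 3)) {
            System.out.println(e.getKey() + " = " + e.getValue());
        }
    }
}
```

Raw counts like these will be dominated by stop words ("of the", "in a"), so some filtering is likely needed before the output is useful as key phrases.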
Best
Erick
On Thu, May 6, 2010 at 6:39 AM,
Hi,
I am new to Lucene. If I want to know the term or phrase frequency of an
input document, will it be possible through Lucene?
Thanks,
Manjula