Re: How to get word importance in a Lucene index

2010-07-23 Thread Xaida
Thanks! I am not sure; I have to study this class in more depth today. It is a bit complex, and I am not an advanced enough user to understand all of it. But this part of the description is important to me: "An efficient, effective "more-like-this" query generator would be a great contribution, if anyone'

Re: How to get word importance in a Lucene index

2010-07-23 Thread Xaida
Hi! Thanks for the reply! I will try to explain better; sorry if it was unclear. I have a collection of a user's text documents. It is not too big. The goal is to get the most "important" concepts, which would in a way represent the user's interests. That is what I mean when I say important :) So let's say, in my collection

How to get word importance in a Lucene index

2010-07-22 Thread Xaida
Hi all! Hmm, I need to find out how important a word is in the entire document collection that is indexed in the Lucene index. I need to extract some "representative words", let's say concepts that are common and can represent the whole collection, or collection "keywords". I did the fulltext index
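
One possible reading of "importance" in the question above is document frequency: the number of documents in the collection that contain a term. The sketch below is plain Java over in-memory strings and does not use Lucene; the class name, the method names, the whitespace/punctuation tokenization, and the stop-word set are all assumptions made for illustration. Inside Lucene, per-term document counts are available through IndexReader.docFreq(Term).

```java
import java.util.*;

public class CollectionKeywords {

    // Counts, for each term, how many documents contain it at least once.
    // Tokenization here is a naive split on non-word characters (an assumption).
    static Map<String, Integer> documentFrequency(List<String> docs) {
        Map<String, Integer> df = new HashMap<>();
        for (String doc : docs) {
            Set<String> seen = new HashSet<>();
            for (String tok : doc.toLowerCase(Locale.ROOT).split("\\W+")) {
                if (!tok.isEmpty()) {
                    seen.add(tok);
                }
            }
            // Each document contributes at most 1 to a term's count.
            for (String term : seen) {
                df.merge(term, 1, Integer::sum);
            }
        }
        return df;
    }

    // Returns up to n terms with the highest document frequency,
    // skipping stop words. Ties are broken alphabetically.
    static List<String> topByDocFreq(List<String> docs, Set<String> stopWords, int n) {
        List<Map.Entry<String, Integer>> entries =
                new ArrayList<>(documentFrequency(docs).entrySet());
        entries.sort((a, b) -> {
            int c = Integer.compare(b.getValue(), a.getValue());
            return c != 0 ? c : a.getKey().compareTo(b.getKey());
        });
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Integer> e : entries) {
            if (result.size() >= n) {
                break;
            }
            if (!stopWords.contains(e.getKey())) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList(
                "Lucene index search", "search the index", "index terms");
        Set<String> stop = new HashSet<>(Arrays.asList("the"));
        System.out.println(topByDocFreq(docs, stop, 2));
    }
}
```

Raw document frequency favours very common words, so a stop-word list or an upper cutoff is usually needed before the top terms read as "keywords"; which filter suits this collection is not something the thread settles.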

Applying term frequency thresholds at indexing time

2010-05-24 Thread Xaida
Hi guys! Is there a way to define a threshold on the terms I want to store in the index (before they are indexed)? I need to store the terms with the highest frequencies. I did it with term vectors and a cutoff ratio that cuts off the least frequently occurring terms, but all this is, of course, work
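
The cutoff-ratio idea mentioned above can be sketched as follows. This is plain Java over an already computed term-to-frequency map, not the poster's code and not a Lucene API; the class name, the method name, and the meaning of keepRatio (the fraction of distinct terms to retain) are assumptions made for illustration.

```java
import java.util.*;

public class TermCutoff {

    // Keeps the most frequent terms. keepRatio is the fraction of distinct
    // terms to retain (hypothetical parameter), clamped to the range [0, 1].
    static List<String> topTerms(Map<String, Integer> freqs, double keepRatio) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(freqs.entrySet());
        // Sort by descending frequency; ties are broken alphabetically
        // so the result is deterministic.
        entries.sort((a, b) -> {
            int c = Integer.compare(b.getValue(), a.getValue());
            return c != 0 ? c : a.getKey().compareTo(b.getKey());
        });
        double ratio = Math.max(0.0, Math.min(1.0, keepRatio));
        int keep = (int) Math.ceil(entries.size() * ratio);
        List<String> result = new ArrayList<>();
        for (int i = 0; i < keep; i++) {
            result.add(entries.get(i).getKey());
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> freqs = new HashMap<>();
        freqs.put("lucene", 5);
        freqs.put("index", 3);
        freqs.put("the", 1);
        freqs.put("foo", 1);
        // Retain the top half of the distinct terms.
        System.out.println(topTerms(freqs, 0.5));
    }
}
```

Terms that survive the cutoff could then be the only ones passed on for indexing, for example through a filtering step in the analysis chain; how that step is wired into Lucene depends on the version in use and is left out here.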

Implementing Analyzer with Pling Stemmer

2010-05-23 Thread Xaida
Hi guys! For my project, my professor has advised me to use the PlingStemmer to index the terms obtained from Lucene. http://www.mpi-inf.mpg.de/yago-naga/javatools/doc/javatools/parsers/PlingStemmer.html I see it is a new approach, and I certainly understand the benefits of incorpora