How to get Term Weights (document term matrix)?

2006-11-02 Thread Soeren Pekrul
Hello, I would like to extract and store the document term matrix externally. I iterate the terms and the documents for each term: TermEnum terms=IndexReader.terms(); while(terms.next()) { TermDocs docs=IndexReader.termDocs(terms.term()); while(docs.next()) { //s
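
The nested loop described above (outer loop over terms, inner loop over the documents containing each term) can be sketched without Lucene. This is only an illustration under stated assumptions: `TermMatrixSketch`, `buildPostings`, and `toMatrix` are hypothetical names, the "index" is an in-memory map rather than an `IndexReader`, and the cell value is the raw term frequency.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical stand-in for an index: term -> (document id -> term frequency). */
final class TermMatrixSketch {

    /** Builds sorted postings from lower-cased, whitespace-tokenized documents. */
    static TreeMap<String, TreeMap<Integer, Integer>> buildPostings(List<String> docs) {
        TreeMap<String, TreeMap<Integer, Integer>> postings = new TreeMap<>();
        for (int docId = 0; docId < docs.size(); docId++) {
            for (String token : docs.get(docId).toLowerCase().split("\\s+")) {
                if (token.isEmpty()) {
                    continue; // split() can yield an empty leading token
                }
                postings.computeIfAbsent(token, t -> new TreeMap<>())
                        .merge(docId, 1, Integer::sum);
            }
        }
        return postings;
    }

    /** Rows are documents, columns are terms in sorted order, cells are term frequencies. */
    static int[][] toMatrix(TreeMap<String, TreeMap<Integer, Integer>> postings, int numDocs) {
        int[][] matrix = new int[numDocs][postings.size()];
        int column = 0;
        // Outer loop: every term. Inner loop: every document containing that term.
        for (Map.Entry<String, TreeMap<Integer, Integer>> term : postings.entrySet()) {
            for (Map.Entry<Integer, Integer> doc : term.getValue().entrySet()) {
                matrix[doc.getKey()][column] = doc.getValue();
            }
            column++;
        }
        return matrix;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("apple banana apple", "banana cherry");
        TreeMap<String, TreeMap<Integer, Integer>> postings = buildPostings(docs);
        System.out.println(postings.keySet());
        for (int[] row : toMatrix(postings, docs.size())) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

Cells for term/document pairs that never occur stay at zero, so the matrix can be written out row by row once both loops finish.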

Re: How to get Term Weights (document term matrix)?

2006-11-03 Thread Chris Hostetter
cene.apache.org
: To: java-user@lucene.apache.org
: Subject: How to get Term Weights (document term matrix)?
:
: Hello,
:
: I would like to extract and store the document term matrix externally. I
: iterate the terms and the documents for each term:
: TermEnum terms=IndexReader.terms();
: while(t

Re: How to get Term Weights (document term matrix)?

2006-11-03 Thread Soeren Pekrul
Chris Hostetter wrote: I don't really know what a "term matrix" is, but when you ask about "weight", is it possible you are just looking for the TermDocs.freq() of the term/doc pair? Thank you Chris, that was also my first idea. I wanted to get the document frequency indexreader.docFreq(
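
If "weight" is taken to mean a combination of the per-document term frequency and the index-wide document frequency mentioned above, one common textbook choice is tf-idf. The sketch below assumes the plain formula tf * ln(N / df); it is not claimed to be the scoring formula Lucene uses, and `TfIdfSketch` and `weight` are hypothetical names.

```java
/** Hypothetical helper: textbook tf-idf from the two frequencies discussed above. */
final class TfIdfSketch {

    /**
     * @param termFreq occurrences of the term in one document
     * @param docFreq  number of documents containing the term
     * @param numDocs  total number of documents
     * @return termFreq * ln(numDocs / docFreq), or 0.0 if any input is not positive
     */
    static double weight(int termFreq, int docFreq, int numDocs) {
        if (termFreq <= 0 || docFreq <= 0 || numDocs <= 0) {
            return 0.0;
        }
        return termFreq * Math.log((double) numDocs / docFreq);
    }

    public static void main(String[] args) {
        // A term occurring twice in a document and present in 1 of 2 documents.
        System.out.println(weight(2, 1, 2));
        // A term present in every document carries no weight under this formula.
        System.out.println(weight(1, 2, 2));
    }
}
```

Under this definition a term that appears in every document gets weight zero regardless of how often it occurs, which is one reason the meaning of "weight" has to be fixed before extracting the matrix.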

Re: How to get Term Weights (document term matrix)?

2006-11-03 Thread Chris Hostetter
: It seems that there is no simple function to ask the weight for a term
: in a document directly. So I decide not to iterate the documents of a

As I said: it depends on what you mean by "term weight" ...

: term or the terms of a document. I'm iterating the terms of the index,
: searching for th

Re: How to get Term Weights (document term matrix)?

2006-11-04 Thread Soeren Pekrul
Chris Hostetter wrote: You really, *REALLY* don't want to be doing this using the "Hits" class like in your example ... 1) this will re-execute your search behind the scenes many many times 2) the scores returned by "Hits" are pseudo-normalized ... they will be meaningless for any sort
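
To see why pseudo-normalized scores are a poor basis for a stored matrix, consider the following sketch. It assumes the normalization amounts to dividing every score by the top score of the same query whenever that top score exceeds 1.0; the thread does not spell this out, so treat it as an assumption. `ScoreNormSketch` and `normalize` are hypothetical names, not Lucene API.

```java
/** Hypothetical illustration of per-query score normalization. */
final class ScoreNormSketch {

    /** Scales scores by 1/max when the top score exceeds 1.0; otherwise copies them unchanged. */
    static float[] normalize(float[] rawScores) {
        float max = 0f;
        for (float score : rawScores) {
            max = Math.max(max, score);
        }
        float norm = max > 1.0f ? 1.0f / max : 1.0f;
        float[] result = new float[rawScores.length];
        for (int i = 0; i < rawScores.length; i++) {
            result[i] = rawScores[i] * norm;
        }
        return result;
    }

    public static void main(String[] args) {
        // The same raw score (2.0) appears in the results of two different queries.
        float[] queryA = normalize(new float[] {4.0f, 2.0f});
        float[] queryB = normalize(new float[] {8.0f, 2.0f});
        System.out.println(queryA[1] + " vs " + queryB[1]);
    }
}
```

The identical raw score ends up as a different value for each query because the scale factor depends on the top hit, so normalized values from separate searches cannot be compared cell against cell.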