Re: TFIDF Implementation

2004-12-15 Thread David Spencer
Christoph Kiefer wrote: David, Bruce, Otis, Thank you all for the quick replies. I looked through the BooksLikeThis example. I also agree, it's a very good and effective way to find similar docs in the index. Nevertheless, what I need is really a similarity matrix holding all TF*IDF values. For ill

Re: TFIDF Implementation

2004-12-15 Thread Christoph Kiefer
David, Bruce, Otis, Thank you all for the quick replies. I looked through the BooksLikeThis example. I also agree, it's a very good and effective way to find similar docs in the index. Nevertheless, what I need is really a similarity matrix holding all TF*IDF values. For illustration I quick and di
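
As a rough illustration of the kind of matrix asked for here, the following is a minimal sketch, not Lucene code and not the code from the truncated message. It assumes documents are already tokenized, weights each term as raw term frequency times log(N/df) (a textbook variant, not Lucene's scoring formula), and compares rows by cosine similarity. The names `tfidf_matrix`, `cosine` and `similarity_matrix` are hypothetical.

```python
import math
from collections import Counter


def tfidf_matrix(docs):
    """Return (vocab, matrix): one row per document, one column per term.

    docs is a list of token lists. Weight = tf * log(N / df), which is an
    assumed textbook weighting, not Lucene's own similarity formula.
    """
    n_docs = len(docs)
    counts = [Counter(tokens) for tokens in docs]
    # Document frequency: in how many documents each term occurs.
    df = Counter()
    for c in counts:
        df.update(c.keys())
    vocab = sorted(df)
    matrix = [[c[t] * math.log(n_docs / df[t]) for t in vocab] for c in counts]
    return vocab, matrix


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def similarity_matrix(matrix):
    """Pairwise cosine similarities between all TF*IDF rows."""
    return [[cosine(u, v) for v in matrix] for u in matrix]


if __name__ == "__main__":
    docs = [
        ["lucene", "index", "search"],
        ["lucene", "term", "vector"],
        ["cooking", "recipes"],
    ]
    vocab, weights = tfidf_matrix(docs)
    for row in similarity_matrix(weights):
        print(["%.3f" % x for x in row])
```

Note that a term occurring in every document gets weight zero under this weighting, and the full matrix grows quadratically with the number of documents, so for a large index a per-document query in the style of BooksLikeThis may be the more practical route.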

Re: TFIDF Implementation

2004-12-14 Thread David Spencer
Bruce Ritchie wrote: You can also see 'Books like this' example from here https://secure.manning.com/catalog/view.php?book=hatcher2&item=source Well done, uses a term vector, instead of reparsing the orig doc, to form the similarity query. Also I like the way you exclude the source doc in th

Re: TFIDF Implementation

2004-12-14 Thread David Spencer
Bruce Ritchie wrote: From the code I looked at, those calls don't recalculate on every call. I was referring to this fragment below from BooksLikeThis.docsLike(), and was mentioning it as the javadoc http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/index/TermFreqVector.html does n

RE: TFIDF Implementation

2004-12-14 Thread Bruce Ritchie
> > From the code I looked at, those calls don't recalculate on > every call. > > I was referring to this fragment below from BooksLikeThis.docsLike(), > and was mentioning it as the javadoc > http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/index/TermFreqVector.html > does not

RE: TFIDF Implementation

2004-12-14 Thread Bruce Ritchie
> > You can also see 'Books like this' example from here > > > https://secure.manning.com/catalog/view.php?book=hatcher2&item=source > > Well done, uses a term vector, instead of reparsing the orig > doc, to form the similarity query. Also I like the way you > exclude the source doc in the q

RE: TFIDF Implementation

2004-12-14 Thread Otis Gospodnetic
You can also see 'Books like this' example from here https://secure.manning.com/catalog/view.php?book=hatcher2&item=source Otis --- Bruce Ritchie <[EMAIL PROTECTED]> wrote: > Christoph, > > I'm not entirely certain if this is what you want, but a while back > David Spencer did code up a 'More L

Re: TFIDF Implementation

2004-12-14 Thread David Spencer
Otis Gospodnetic wrote: You can also see 'Books like this' example from here https://secure.manning.com/catalog/view.php?book=hatcher2&item=source Well done, uses a term vector, instead of reparsing the orig doc, to form the similarity query. Also I like the way you exclude the source doc in the
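
The approach praised here (build the similarity query from a stored term vector rather than reparsing the original document, and leave the source document out of the results) can be sketched roughly as follows. This is not the BooksLikeThis code and does not use the Lucene API: plain `Counter` objects stand in for stored term vectors, the score is a simple sum of matching term frequencies, and `docs_like` and `max_terms` are hypothetical names.

```python
from collections import Counter


def docs_like(source_id, term_vectors, max_terms=5):
    """Rank documents similar to source_id, excluding source_id itself.

    term_vectors maps doc id -> Counter of term frequencies (an assumed
    stand-in for term vectors stored in an index).
    """
    source = term_vectors[source_id]
    # Use the source document's most frequent terms as the "query".
    query_terms = [term for term, _ in source.most_common(max_terms)]
    scores = {}
    for doc_id, vector in term_vectors.items():
        if doc_id == source_id:
            continue  # exclude the source document from its own results
        score = sum(vector[term] for term in query_terms)
        if score > 0:
            scores[doc_id] = score
    # Highest score first; ties broken by doc id for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))


if __name__ == "__main__":
    vectors = {
        "a": Counter({"lucene": 2, "search": 1}),
        "b": Counter({"lucene": 1, "index": 1}),
        "c": Counter({"cooking": 3}),
    }
    print(docs_like("a", vectors))
```

Because the query terms come straight from the stored frequencies, no analyzer has to be run over the original text again.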

Re: TFIDF Implementation

2004-12-14 Thread David Spencer
Bruce Ritchie wrote: Christoph, I'm not entirely certain if this is what you want, but a while back David Spencer did code up a 'More Like This' class which can be used for generating similarities between documents. I can't seem to find this class in the sandbox Ot oh, sorry, I'll try to get this

RE: TFIDF Implementation

2004-12-14 Thread Bruce Ritchie
Christoph, I'm not entirely certain if this is what you want, but a while back David Spencer did code up a 'More Like This' class which can be used for generating similarities between documents. I can't seem to find this class in the sandbox so I've attached it here. Just repackage and test.