Is there a simple, efficient way to compute similarity of documents
indexed with Lucene?
My first, naive idea is to use the entire contents of one document as a
query to the second document, and use the score as a similarity
measurement. But I think I'm probably way off base with that.
Can
> My first, naive idea is to use the entire contents of one document as
> a query to the second document,
Sorry, I meant use the entire contents of one document as a query *on
the rest of the corpus*.
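
To make the idea concrete, here is a minimal sketch in plain Python (not Lucene) of treating one document as a query against the rest of the corpus and ranking by score. It assumes a simple tf-idf weighting with cosine similarity, which is only in the spirit of Lucene's scoring and is not its actual formula. The names tokenize, tfidf_vectors, cosine and most_similar are made up for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Crude stand-in for a real analyzer: lowercase and split on whitespace.
    return text.lower().split()


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document, weighted by tf * idf."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    # Document frequency: how many documents contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: tf[t] * (1.0 + math.log(n / df[t])) for t in tf})
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(docs, query_index):
    """Rank every other document by similarity to docs[query_index]."""
    vectors = tfidf_vectors(docs)
    query = vectors[query_index]
    scored = [(cosine(query, v), i)
              for i, v in enumerate(vectors) if i != query_index]
    return sorted(scored, reverse=True)


corpus = [
    "lucene indexes documents for full text search",
    "full text search over documents indexed with lucene",
    "the weather today is sunny with a light breeze",
]
# List of (score, document index) pairs, best match first.
ranking = most_similar(corpus, 0)
```

With a real index the query would only touch documents sharing at least one term with the source document, which is what keeps the approach practical on a large corpus.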
--
Matt Chaput
Word Monkey
Side Effects Software Inc.
A goddamned ray of sunshine all the
Matt,
Erik and I have some code for this in Lucene in Action, but David
Spencer did this since the book was published:
http://www.lucenebook.com/blog/announcements/more_like_this.html
Otis
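
As a rough illustration of the "more like this" approach: rather than sending every word of the source document as a query, pick only its most distinctive terms (high frequency in the document, low frequency in the corpus) and query with those. The sketch below is plain Python, not the Lucene code linked above. The function name, its parameters and the smoothed idf formula are all assumptions chosen for the example.

```python
import math
from collections import Counter


def interesting_terms(doc_tokens, corpus_tokens, max_terms=5,
                      min_term_freq=1, min_doc_freq=1):
    """Pick the highest tf-idf terms of one document to use as a query.

    Parameter names are illustrative only, not a real library API.
    """
    n = len(corpus_tokens)
    # Document frequency: how many documents contain each term.
    df = Counter(t for tokens in corpus_tokens for t in set(tokens))
    tf = Counter(doc_tokens)
    scored = []
    for term, freq in tf.items():
        # Skip terms that are too rare in the document or in the corpus.
        if freq < min_term_freq or df[term] < min_doc_freq:
            continue
        idf = math.log((n + 1) / (df[term] + 1)) + 1.0
        scored.append((freq * idf, term))
    # Highest score first; ties are broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [term for _, term in scored[:max_terms]]


corpus = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
    "the cat chased cat toys".split(),
]
terms = interesting_terms(corpus[2], corpus, max_terms=3)
# Combine the selected terms into a simple OR query string.
query = " OR ".join(terms)
```

The common word "the" appears in every document, so it scores lowest and is cut off by max_terms. That is the main benefit over querying with the full document text.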
--- Matt Chaput [EMAIL PROTECTED] wrote:
> Is there a simple, efficient way to compute similarity of
Otis Gospodnetic wrote:
> Matt,
> Erik and I have some code for this in Lucene in Action, but David
> Spencer did this since the book was published:
> http://www.lucenebook.com/blog/announcements/more_like_this.html
If you want an informal way of doing it you're right, just feed the
words of the
> Matt,
> Erik and I have some code for this in Lucene in Action, but David
> Spencer did this since the book was published:
> http://www.lucenebook.com/blog/announcements/more_like_this.html
> Otis
Awesome awesome awesome! Thanks very much.
--
Matt Chaput
Word Monkey
Side Effects Software Inc.