Document comparison

2005-02-18 Thread Matt Chaput
Is there a simple, efficient way to compute similarity of documents indexed with Lucene? My first, naive idea is to use the entire contents of one document as a query to the second document, and use the score as a similarity measurement. But I think I'm probably way off base with that. Can

Re: Document comparison

2005-02-18 Thread Matt Chaput
> My first, naive idea is to use the entire contents of one document as a query to the second document,

Sorry, I meant use the entire contents of one document as a query *on the rest of the corpus*.

-- Matt Chaput Word Monkey Side Effects Software Inc. A goddamned ray of sunshine all the
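To make the idea in the message above concrete, here is a minimal sketch in plain Python of treating one document as a query against the rest of a corpus and ranking the other documents by score. It does not use Lucene and does not reproduce Lucene's scoring formula. The tokenizer, the tf-idf weighting, the cosine measure, and the names `tokenize`, `tfidf_vectors`, `cosine`, and `rank_similar` are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on non-alphanumerics (a crude stand-in for an analyzer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Build one sparse tf-idf vector (term -> weight) per document."""
    counts = [Counter(tokenize(d)) for d in docs]
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter()
    for c in counts:
        df.update(c.keys())
    # Assumed idf variant; always positive, smaller for common terms.
    idf = {t: math.log(1 + n / df[t]) for t in df}
    return [{t: tf * idf[t] for t, tf in c.items()} for c in counts]


def cosine(a, b):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_similar(docs, query_index):
    """Score every other document against docs[query_index], best first."""
    vectors = tfidf_vectors(docs)
    query = vectors[query_index]
    scored = [
        (i, cosine(query, v))
        for i, v in enumerate(vectors)
        if i != query_index
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Calling `rank_similar(corpus, 0)` on a list of strings returns `(index, score)` pairs for every document except the first, highest score first. Because every document is compared against every other, this sketch only suits small corpora; an inverted index avoids that full scan.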

Re: Document comparison

2005-02-18 Thread Otis Gospodnetic
Matt,

Erik and I have some code for this in Lucene in Action, but David Spencer did this since the book was published: http://www.lucenebook.com/blog/announcements/more_like_this.html

Otis

--- Matt Chaput [EMAIL PROTECTED] wrote:
> Is there a simple, efficient way to compute similarity of

Re: Document comparison

2005-02-18 Thread David Spencer
Otis Gospodnetic wrote:
> Matt, Erik and I have some code for this in Lucene in Action, but David Spencer did this since the book was published: http://www.lucenebook.com/blog/announcements/more_like_this.html

If you want an informal way of doing it you're right, just feed the words of the

Re: Document comparison

2005-02-18 Thread Matt Chaput
> Matt, Erik and I have some code for this in Lucene in Action, but David Spencer did this since the book was published: http://www.lucenebook.com/blog/announcements/more_like_this.html
> Otis

Awesome awesome awesome! Thanks very much.

-- Matt Chaput Word Monkey Side Effects Software Inc. A