Reduction based "more like this"?

2007-02-09 Thread karl wettin
I just woke up thinking it would be cool to attempt reducing the data of all documents using PCA (or so) and storing the output in a new field per introduced dimension, in order to find similar documents by placing a simple proximity query. Did anyone attempt something like this? I did not
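A minimal sketch of the idea in plain Python, under several assumptions not taken from the thread. Each document's "data" is a term-frequency vector over a fixed vocabulary. PCA is computed by power iteration with deflation on the covariance matrix. Each reduced dimension becomes a value named `pca_0`, `pca_1`, … (hypothetical names, not a Lucene API). A brute-force nearest-neighbour scan by Euclidean distance stands in for the "simple proximity query".

```python
import math
import random

def mean_center(rows):
    """Subtract the per-column mean from every row."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows], means

def covariance(centered):
    """Sample covariance matrix of mean-centred rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def mat_vec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def top_components(cov, k, iters=200, seed=0):
    """Top-k eigenvectors of a symmetric PSD matrix via power iteration + deflation."""
    rng = random.Random(seed)
    d = len(cov)
    cov = [row[:] for row in cov]
    comps = []
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(d)]
        for _ in range(iters):
            w = mat_vec(cov, v)
            norm = math.sqrt(sum(x * x for x in w)) or 1.0
            v = [x / norm for x in w]
        lam = sum(a * b for a, b in zip(v, mat_vec(cov, v)))
        comps.append(v)
        # Deflate: remove the found component so the next pass finds the next one.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return comps

def pca_reduce(rows, k):
    """Project mean-centred rows onto the top-k principal components."""
    centered, means = mean_center(rows)
    comps = top_components(covariance(centered), k)
    projected = [[sum(a * b for a, b in zip(r, c)) for c in comps] for r in centered]
    return projected, means, comps

def as_fields(vec):
    # Hypothetical field naming: one value per reduced dimension.
    return {f"pca_{i}": x for i, x in enumerate(vec)}

def more_like_this(projected, query_idx, n=2):
    """Stand-in for a proximity query: nearest documents in the reduced space."""
    q = projected[query_idx]
    others = [i for i in range(len(projected)) if i != query_idx]
    return sorted(others, key=lambda i: math.dist(projected[i], q))[:n]

# Toy term-frequency vectors (hypothetical data).
# Vocabulary: lucene, index, query, cat, dog, pet
TF_VECTORS = [
    [3, 2, 1, 0, 0, 0],
    [2, 3, 2, 0, 0, 0],
    [0, 0, 0, 2, 3, 1],
    [0, 0, 0, 3, 2, 2],
    [1, 2, 3, 0, 0, 0],
]

projected, means, comps = pca_reduce(TF_VECTORS, 2)
print(as_fields(projected[0]))
print(more_like_this(projected, 0))
```

Storing one value per dimension keeps every reduced coordinate individually queryable. The trade-off is that any new document has to be projected with the same `means` and `comps`, so the components have to be kept alongside the index.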

Re: Reduction based "more like this"?

2007-02-09 Thread mark harwood
adds extra complexity/cost but might be an interesting avenue to explore for some apps when selecting distinguishing characteristics or weighting query results. Cheers Mark

Re: Reduction based "more like this"?

2007-02-09 Thread Bill Janssen
> For example, given terms "female", "John" and "London" - all 3 may
> have equal IDF but should a document representing a female in London
> be given equal weighting to a document representing the rarer example
> of a female who happens to be called "John"?

Not to mention multi-word phrase tokeni
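The quoted argument can be made concrete with a toy count (hypothetical data, not from the thread). Three terms have identical document frequency, and therefore identical IDF, yet their pairwise combinations are not equally rare. The `joint_idf` helper is an illustrative assumption that treats "all terms present" as one pseudo-term. Both helpers use the textbook log(N/df) form, not Lucene's own similarity formula.

```python
import math

# Toy corpus (hypothetical data): each document is a set of lower-cased terms.
DOCS = [
    {"female", "london"},
    {"female", "london"},
    {"female", "john"},
    {"female"},
    {"john", "london"},
    {"john", "london"},
    {"john"},
    {"male"},
]

def idf(term, docs=DOCS):
    """Textbook IDF: log(N / df) for a single term."""
    df = sum(1 for d in docs if term in d)
    return math.log(len(docs) / df)

def joint_idf(terms, docs=DOCS):
    """IDF of a term combination, counting only documents containing all terms."""
    df = sum(1 for d in docs if all(t in d for t in terms))
    return math.log(len(docs) / df)

for t in ("female", "john", "london"):
    print(t, round(idf(t), 3))
print("female+london", round(joint_idf(["female", "london"]), 3))
print("female+john", round(joint_idf(["female", "john"]), 3))
```

Each single term gets the same weight here, while the "female" + "John" combination is the rarest. Per-term IDF alone cannot see that difference.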