More like this query questions

2013-03-06 Thread Misty Nodine
I have been looking over the more like this code. It looks like the more like this query uses only the first of the fields and fails to consider the rest. Thus, if I have title and body indexed for some document, it will do the more like this based only

RE: TopDocCollector vs TopScoreDocCollector (semantics changed in 4.0, not backward compatible)

2013-03-06 Thread saisantoshi
Thanks for the response and really appreciate your help. I have read the documentation but could not get it in the first read as I was new to Lucene. I have changed it to AtomicReader and it seems to be working fine. One last clarification is do we also need to use AtomicReader for the following b

Re: Getting a similarity score for an arbitrary pair of documents or a query and a document

2013-03-06 Thread Emmanuel Espina
Have you already checked Solr's more like this? http://wiki.apache.org/solr/MoreLikeThisHandler and http://wiki.apache.org/solr/MoreLikeThis You describe a problem similar to the use case of that component, and if there is something to hack, it is Solr's more like this. Lucene's similarity is a low le

Re: Split index and store

2013-03-06 Thread Emmanuel Espina
I understand and it sounds ok. The "store" index would be like an ordinary database where you search by value. Another approach you could consider is to compress the field before indexing. That is, you compress with http://docs.oracle.com/javase/1.5.0/docs/api/java/util/zip/GZIPInputStream.html and
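
The compression idea in the preview above can be sketched with the JDK alone. This is only a minimal illustration, not code from the thread: the class name `GzipFieldSketch` and both method names are hypothetical, and how the resulting bytes are attached to a Lucene document is left out. Note that `GZIPOutputStream` does the compressing, while the linked `GZIPInputStream` reads the value back.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper: gzip a field value before storing it, gunzip after loading it.
public class GzipFieldSketch {

    // Compress the UTF-8 bytes of a field value.
    static byte[] compress(String text) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        // Closing the GZIPOutputStream writes the gzip trailer into the buffer.
        try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
            gzip.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return buffer.toByteArray();
    }

    // Restore the original field value from its gzipped bytes.
    static String decompress(byte[] compressed) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPInputStream gzip =
                 new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] chunk = new byte[4096];
            int read;
            while ((read = gzip.read(chunk)) != -1) {
                buffer.write(chunk, 0, read);
            }
        }
        return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

The compressed bytes would be stored as a binary value and decompressed when the document is read back; very short values may grow slightly because of the gzip header.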

Getting a similarity score for an arbitrary pair of documents or a query and a document

2013-03-06 Thread Michael O'Leary
Is there an api in Lucene for finding the similarity score for two documents that have been randomly pulled from an index? What about for a query and a randomly selected document? I realize this isn't the standard purpose of Lucene, but I was given a task to compare similarity scores for the Simil
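
As a point of comparison for the question above, here is a minimal sketch of scoring an arbitrary pair of texts outside the index. It does not use Lucene's Similarity classes or reproduce their scores; it is plain term-frequency cosine similarity, and the class name `CosineSketch`, the method names, and the simple tokenization are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in: raw term-frequency cosine similarity between two texts.
public class CosineSketch {

    // Count lower-cased tokens, splitting on runs of non-word characters.
    static Map<String, Integer> termFrequencies(String text) {
        Map<String, Integer> tf = new HashMap<>();
        for (String token : text.toLowerCase().split("\\W+")) {
            if (!token.isEmpty()) {
                tf.merge(token, 1, Integer::sum);
            }
        }
        return tf;
    }

    // Euclidean length of a term-frequency vector.
    static double norm(Map<String, Integer> vector) {
        double sum = 0.0;
        for (int count : vector.values()) {
            sum += (double) count * count;
        }
        return Math.sqrt(sum);
    }

    // Cosine of the angle between two term-frequency vectors; 0.0 if either is empty.
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0.0;
        for (Map.Entry<String, Integer> entry : a.entrySet()) {
            Integer other = b.get(entry.getKey());
            if (other != null) {
                dot += (double) entry.getValue() * other;
            }
        }
        double normA = norm(a);
        double normB = norm(b);
        return (normA == 0.0 || normB == 0.0) ? 0.0 : dot / (normA * normB);
    }
}
```

The same function covers the query-versus-document case by treating the query text as a very short document.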