On Feb 26, 2009, at 6:04 AM, CIF Search wrote:

We have a distributed index consisting of several shards. There could be
some documents repeated across shards. We want to remove the duplicate
records from the documents returned from the shards, and re-order the
results by grouping them on the basis of a clustering algorithm and
reranking the documents within a cluster on the basis of log of a particular
returned field value.


I think you would have to implement your own QueryComponent. However, you may be able to get away with implementing/using Solr's FunctionQuery capabilities.

FieldCollapsing is also a likely source of inspiration/help (http://www.lucidimagination.com/search/?q=Field+Collapsing#/ s:email,issues)

As a side note, have you looked at http://issues.apache.org/jira/browse/SOLR-769 ?

You might also have a look at the de-duplication patch that is working its way through dev: http://wiki.apache.org/solr/Deduplication
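
As a rough illustration of the merge-side logic a custom component would have to carry out, here is a plain-Java sketch. It deliberately uses no Solr classes: the Doc type, its field names, and the dedupe/rerank helpers are all hypothetical, and the cluster label is assumed to come from whatever clustering step is already in place. It only shows the two operations described in the question: drop repeated documents by unique key, then order documents within each cluster by the log of a field value.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ShardMergeSketch {

    /** Hypothetical merged-result row; the field names are illustrative only. */
    public static final class Doc {
        final String id;      // unique key shared across shards (assumed)
        final String cluster; // label assigned by some clustering step (assumed)
        final double score;   // relevance score reported by the shard
        final double value;   // field whose log drives the in-cluster order; assumed > 0

        public Doc(String id, String cluster, double score, double value) {
            this.id = id;
            this.cluster = cluster;
            this.score = score;
            this.value = value;
        }
    }

    /** Keep one document per id, preferring the higher-scoring copy. */
    public static List<Doc> dedupe(List<Doc> merged) {
        Map<String, Doc> byId = new LinkedHashMap<>();
        for (Doc d : merged) {
            Doc seen = byId.get(d.id);
            if (seen == null || d.score > seen.score) {
                byId.put(d.id, d); // replacing a value keeps the first-seen position
            }
        }
        return new ArrayList<>(byId.values());
    }

    /** Group by cluster in first-seen order, then sort each group by log(value), descending. */
    public static List<Doc> rerank(List<Doc> docs) {
        Map<String, List<Doc>> groups = new LinkedHashMap<>();
        for (Doc d : docs) {
            groups.computeIfAbsent(d.cluster, k -> new ArrayList<>()).add(d);
        }
        List<Doc> out = new ArrayList<>();
        for (List<Doc> group : groups.values()) {
            group.sort(Comparator.comparingDouble((Doc d) -> Math.log(d.value)).reversed());
            out.addAll(group);
        }
        return out;
    }
}
```

Calling rerank(dedupe(merged)) on the combined shard results gives the de-duplicated, cluster-grouped list. Note that log is monotonic, so if the log of the field is the only sort key the order is the same as sorting by the raw value; the log only changes the outcome once it is combined with another signal such as the score.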



How do we go about achieving this? Should we write this logic by
implementing QueryResponseWriter? Also, if we remove duplicate records, the total number of records actually returned is less than what was
asked for in the query.

Regards,
CI

--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene:
http://www.lucidimagination.com/search
