Solr 3.3. has a feature "Grouping". Is it practically same as deduplication?

Here is my use case for duplicates removal -

We have many documents with similar (upto 99%) content. Upon some search
queries, almost all of them come up on first page results. Of all these
documents, essentially one is original and the other are duplicates. We are
able to find the original content on a basis of number of factors - who
uploaded it, when, how many viral shares.....It is also possible that the
duplicates are uploaded earlier (and hence exist in search index) while the
original is uploaded later (and gets added later to index).

AFAIK, Deduplication targets index time. Is there a means I can specify the
original which should be returned and the duplicates which could be removed
from coming up.?


*Pranav Prakash*

"temet nosce"

Twitter <http://twitter.com/pranavprakash> | Blog <http://blog.myblive.com> |
Google <http://www.google.com/profiles/pranny>

Reply via email to