Hello all, I have a collection of a few million documents, many of
which are duplicates. They have been clustered with a simple
algorithm. I have a field called 'duplicate' which is 0 or 1, and
text fields called 'description', 'tags' and 'meta'. Documents are
clustered on different criteria, so the text I search against can be
very different among members of a cluster.

I'm currently using a dismax handler to search across the text fields
with different boosts, and a filter query to restrict results to
masters (duplicate:0).
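
As a rough sketch of the setup described above, the request parameters
might be built like this. The boost values, the helper name
build_master_query and the sample search text are placeholders for
illustration only, not the actual configuration; defType, q, qf and fq
are the standard Solr request parameters.

```python
from urllib.parse import urlencode


def build_master_query(text, boosts=None):
    """Build a dismax query string restricted to master documents."""
    # Hypothetical boosts: the real values are not given in this post.
    if boosts is None:
        boosts = {"description": 2.0, "tags": 1.5, "meta": 1.0}

    # qf lists the searched fields with their boosts, e.g. "tags^1.5".
    qf = " ".join(f"{field}^{boost}" for field, boost in boosts.items())

    params = {
        "defType": "dismax",   # use the dismax query parser
        "q": text,             # the user's search text
        "qf": qf,              # fields to search, with boosts
        "fq": "duplicate:0",   # filter query: masters only
    }
    return urlencode(params)


if __name__ == "__main__":
    print(build_master_query("red bicycle"))
```

Because the restriction to masters sits in fq, it only filters the
result set and does not affect scoring.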

My question, then, is: how do I best query for documents which are
masters, OR which match the text but are not included in the matched
set of masters?

Does this make sense?
