We would need example data and a sample query to help you. You can use "group" (result grouping) to group by a field and remove dupes.
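As a rough sketch of what a grouped request could look like: the snippet below only builds the query string from Solr's standard result grouping parameters (group, group.field, group.limit, group.ngroups). The helper name buildGroupQuery, the /select path and the field name are assumptions for illustration, not something from your setup.

```javascript
// Sketch only: build the query string for a grouped Solr search.
// buildGroupQuery, "/select" and the field name are hypothetical.
function buildGroupQuery(q, groupField) {
  const params = new URLSearchParams();
  params.set("q", q);
  params.set("group", "true");            // turn on result grouping
  params.set("group.field", groupField);  // one group per distinct field value
  params.set("group.limit", "1");         // return one document per group
  params.set("group.ngroups", "true");    // also report the number of groups
  return "/select?" + params.toString();
}

console.log(buildGroupQuery("field1:DOG", "field1"));
```

Each group in the response carries its own numFound, which can serve as the duplicate count for that value.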
If you want to remove dupes you can do something like:

    q=field1:DOG AND NOT field2:DOG AND NOT field3:DOG

That excludes documents where DOG also appears in field2 or field3. If you don't care which field it is in, you can use dismax/edismax with qf, or you can just use OR:

    q=field1:DOG OR field2:DOG OR field3:DOG

If you have a set of values that you want to de-duplicate at INDEX time, you can do that in SQL (if the data comes from SQL), or write code in the DIH:

    // DIH script transformer; the function name is arbitrary
    function dedupeFields(row) {
      var x  = row.get("field1");
      var x1 = row.get("field2");
      var x2 = row.get("field3");
      // blank out field2/field3 when they repeat field1 (null-safe)
      if (x != null && x.equals(x1)) { row.put("field2", ""); }
      if (x != null && x.equals(x2)) { row.put("field3", ""); }
      return row;
    }

That way you eliminate the dupes at index time...

Bill

On Tue, Jan 13, 2015 at 2:29 PM, tedsolr <tsm...@sciquest.com> wrote:

> I have a complicated problem to solve, and I don't know enough about
> lucene/solr to phrase the question properly. This is kind of a shot in the
> dark. My requirement is to return search results always in completely
> "collapsed" form, rolling up duplicates with a count. Duplicates are defined
> by whatever fields are requested. If the search requests fields A, B, C,
> then all matched documents that have identical values for those 3 fields are
> "dupes". The field list may change with every new search request. What I do
> know is the super set of all fields that may be part of the field list at
> index time.
>
> I know this can't be done with configuration alone. It doesn't seem
> performant to retrieve all 1M+ docs and post process in Java. A very smart
> person told me that a custom hit collector should be able to do the
> filtering for me. So, maybe I create a custom search handler that somehow
> exposes this custom hit collector that can use FieldCache or DocValues to
> examine all the matches and filter the results in the way I've described
> above.
>
> So assuming this is a viable solution path, can anyone suggest some helpful
> posts, code fragments, books for me to review?
> I admit to being out of my depth, but this requirement isn't going away.
> I'm grasping for straws right now.
>
> thanks
> (using Solr 4.9)
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Engage-custom-hit-collector-for-special-search-processing-tp4179348.html
> Sent from the Solr - User mailing list archive at Nabble.com.

--
Bill Bell
billnb...@gmail.com
cell 720-256-8076