Matt - what you are describing is search-time grouping, a la SOLR-236 (field collapsing). The deduplication feature I mentioned is for index-time near-duplicate detection, which is a different thing.
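
To illustrate the search-time side, here is a rough client-side sketch of "group by filename" applied to a list of result docs after they come back from a query. It is not the SOLR-236 patch and it makes no Solr calls; the `filename` field name, the `group_by_field` helper, and the sample docs are all assumptions made up for the example.

```python
from collections import OrderedDict


def group_by_field(docs, field="filename"):
    """Collapse ranked result docs that share the same value for `field`.

    `docs` is assumed to be a list of dicts already sorted by relevance.
    The first doc seen for each value is kept as the representative hit,
    and `more` counts the remaining docs that share that value.
    """
    groups = OrderedDict()
    for doc in docs:
        # Docs missing the field are all grouped together under None.
        groups.setdefault(doc.get(field), []).append(doc)

    return [
        {"value": value, "top": members[0], "more": len(members) - 1}
        for value, members in groups.items()
    ]


if __name__ == "__main__":
    # Hypothetical results: one Solr doc per XML section.
    results = [
        {"id": "1", "filename": "a.xml"},
        {"id": "2", "filename": "b.xml"},
        {"id": "3", "filename": "a.xml"},
    ]
    for group in group_by_field(results):
        print(group["value"], group["top"]["id"], "+%d more" % group["more"])
```

Note that this only groups the docs on the page you fetched, so counts and paging will be off for large result sets; that is the part that needs doing inside the search itself.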

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----
> From: Matt Mitchell <goodie...@gmail.com>
> To: solr-user@lucene.apache.org
> Sent: Tuesday, May 26, 2009 1:40:01 PM
> Subject: Re: grouping response docs together
>
> Thanks Otis. I'll give the dedup a test drive today.
>
> I'll explain what I'm trying to do a little better though because I don't
> think I have yet!
>
> So, I'm indexing an XML file. There are different "sections" in the XML
> file. Each of those sections gets a solr doc (the xml text-only is indexed).
> Each solr doc also has a field to specify the source filename. What I'd like
> to have happen is, when I do a search, I want my search results to combine
> all documents that have the same filename... I want to "group by" filename
> if that makes sense. Or at the very least, show only one and indicate that
> there are more.
>
> Matt
>
> On Tue, May 26, 2009 at 12:58 PM, Otis Gospodnetic <otis_gospodne...@yahoo.com> wrote:
>
> > Matt,
> >
> > The Deduplication feature in Solr does support near-duplicate scenario. It
> > comes with a few components to help you detect near-duplicates, and you
> > should be able to write a custom near-dupe detection component and plug it
> > in.
> >
> > Otis
> > --
> > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> >
> > ----- Original Message ----
> > > From: Matt Mitchell
> > > To: solr-user@lucene.apache.org
> > > Sent: Monday, May 25, 2009 3:30:42 PM
> > > Subject: Re: grouping response docs together
> > >
> > > Thanks guys. I looked at the dedup stuff, but the documents I'm adding
> > > aren't really duplicates. They're very similar, but different.
> > >
> > > I checked out the field collapsing feature patch, applied the patch but
> > > can't get it to build successfully. Will this patch work with a nightly
> > > build?
> > >
> > > Thanks!
> > >
> > > On Fri, May 15, 2009 at 7:47 PM, Otis Gospodnetic <otis_gospodne...@yahoo.com> wrote:
> > >
> > > > Matt - you may also want to detect near duplicates at index time:
> > > >
> > > > http://wiki.apache.org/solr/Deduplication
> > > >
> > > > Otis
> > > > --
> > > > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> > > >
> > > > ----- Original Message ----
> > > > > From: Matt Mitchell
> > > > > To: solr-user@lucene.apache.org
> > > > > Sent: Friday, May 15, 2009 6:52:48 PM
> > > > > Subject: grouping response docs together
> > > > >
> > > > > Is there a built-in mechanism for grouping similar documents together
> > > > > in the response? I'd like to make it look like there is only one
> > > > > document with multiple "hits".
> > > > >
> > > > > Matt