Matt - what you are describing is search-time grouping, a la SOLR-236 (field collapsing).
The Deduplication feature I mentioned is for index-time near-duplicate detection.
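
To illustrate the search-time side, here is a minimal client-side sketch of "group by filename" applied after the response comes back. It is not SOLR-236 and not a Solr API; the function name `group_by_field` and the `id`, `filename`, and `score` fields are assumptions made up for the example, and the docs are assumed to arrive as plain dicts already sorted by relevance.

```python
from collections import OrderedDict


def group_by_field(docs, field="filename"):
    """Collapse a ranked list of docs into one entry per distinct field value.

    The first (highest-ranked) doc seen for each value is kept as "top";
    "more" counts the other docs that share the same value.
    """
    groups = OrderedDict()
    for doc in docs:
        key = doc.get(field)
        if key not in groups:
            groups[key] = {"top": doc, "more": 0}
        else:
            groups[key]["more"] += 1
    return list(groups.values())


# Hypothetical search results, one doc per XML "section".
results = [
    {"id": "a-1", "filename": "a.xml", "score": 2.1},
    {"id": "b-1", "filename": "b.xml", "score": 1.7},
    {"id": "a-2", "filename": "a.xml", "score": 1.2},
]

grouped = group_by_field(results)
for g in grouped:
    print(g["top"]["filename"], g["top"]["id"], "+%d more" % g["more"])
```

Note that this only groups the docs in the page of results you actually fetched, so the "more" count is incomplete once matches spill onto later pages. That limitation is the reason to do the grouping inside Solr at search time instead.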


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



----- Original Message ----
> From: Matt Mitchell <goodie...@gmail.com>
> To: solr-user@lucene.apache.org
> Sent: Tuesday, May 26, 2009 1:40:01 PM
> Subject: Re: grouping response docs together
> 
> Thanks Otis. I'll give the dedup a test drive today.
> 
> I'll explain what I'm trying to do a little better though because I don't
> think I have yet!
> 
> So, I'm indexing an XML file. There are different "sections" in the XML
> file. Each of those sections gets a solr doc (the xml text-only is indexed).
> Each solr doc also has a field to specify the source filename. What I'd like
> to have happen is, when I do a search, I want my search results to combine
> all documents that have the same filename... I want to "group by" filename
> if that makes sense. Or at the very least, show only one and indicate that
> there are more.
> 
> Matt
> 
> On Tue, May 26, 2009 at 12:58 PM, Otis Gospodnetic <
> otis_gospodne...@yahoo.com> wrote:
> 
> >
> > Matt,
> >
> > The Deduplication feature in Solr does support the near-duplicate scenario.  It
> > comes with a few components to help you detect near-duplicates, and you
> > should be able to write a custom near-dupe detection component and plug it
> > in.
> >
> >
> > Otis
> > --
> > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> >
> >
> >
> > ----- Original Message ----
> > > From: Matt Mitchell 
> > > To: solr-user@lucene.apache.org
> > > Sent: Monday, May 25, 2009 3:30:42 PM
> > > Subject: Re: grouping response docs together
> > >
> > > Thanks guys. I looked at the dedup stuff, but the documents I'm adding
> > > aren't really duplicates. They're very similar, but different.
> > >
> > > I checked out the field collapsing feature patch, applied the patch but
> > > can't get it to build successfully. Will this patch work with a nightly
> > > build?
> > >
> > > Thanks!
> > >
> > > On Fri, May 15, 2009 at 7:47 PM, Otis Gospodnetic <
> > > otis_gospodne...@yahoo.com> wrote:
> > >
> > > >
> > > > Matt - you may also want to detect near duplicates at index time:
> > > >
> > > > http://wiki.apache.org/solr/Deduplication
> > > >
> > > >  Otis
> > > > --
> > > > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> > > >
> > > >
> > > >
> > > > ----- Original Message ----
> > > > > From: Matt Mitchell
> > > > > To: solr-user@lucene.apache.org
> > > > > Sent: Friday, May 15, 2009 6:52:48 PM
> > > > > Subject: grouping response docs together
> > > > >
> > > > > > Is there a built-in mechanism for grouping similar documents together
> > > > > > in the response? I'd like to make it look like there is only one
> > > > > > document with multiple "hits".
> > > > >
> > > > > Matt
> > > >
> > > >
> >
> >
