Thanks Otis. I'll give the dedup a test drive today.

I'll explain what I'm trying to do a little better, though, because I don't
think I have yet!

So, I'm indexing an XML file. There are different "sections" in the XML
file, and each of those sections becomes its own Solr doc (only the XML text
content is indexed). Each Solr doc also has a field that holds the source
filename. What I'd like is for my search results to combine all documents
that have the same filename... I want to "group by" filename, if that makes
sense. Or, at the very least, show only one and indicate that there are more.
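Until server-side grouping is available, one stopgap is to do the "show one and indicate there are more" step on the client. The sketch below assumes the results have already been parsed into a list of dicts (e.g. from a `wt=json` response) and that the filename field is called `filename`; that field name and both helper functions are hypothetical. It keeps the first (highest-ranked) doc per filename and counts the rest:

```python
def group_by_field(docs, field="filename"):
    """Group result docs by `field`, keeping groups in the order their
    first (highest-ranked) doc appears. Relies on dicts preserving
    insertion order (Python 3.7+)."""
    groups = {}
    for doc in docs:
        groups.setdefault(doc.get(field), []).append(doc)
    return groups


def collapse(docs, field="filename"):
    """Return (top_doc, number_of_other_docs) for each distinct `field` value."""
    return [(group[0], len(group) - 1)
            for group in group_by_field(docs, field).values()]


if __name__ == "__main__":
    # Hypothetical result list, already in relevance order.
    docs = [
        {"id": "a1", "filename": "a.xml"},
        {"id": "b1", "filename": "b.xml"},
        {"id": "a2", "filename": "a.xml"},
    ]
    for top, more in collapse(docs):
        print(top["id"], top["filename"], f"(+{more} more)" if more else "")
```

This only groups what is on the current page of results, so counts can come up short and a page may show fewer distinct files than `rows`. The field collapsing patch is aimed at doing this inside Solr instead.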

Matt

On Tue, May 26, 2009 at 12:58 PM, Otis Gospodnetic <
otis_gospodne...@yahoo.com> wrote:

>
> Matt,
>
> The Deduplication feature in Solr does support the near-duplicate scenario. It
> comes with a few components to help you detect near-duplicates, and you
> should be able to write a custom near-dupe detection component and plug it
> in.
>
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
>
>
> ----- Original Message ----
> > From: Matt Mitchell <goodie...@gmail.com>
> > To: solr-user@lucene.apache.org
> > Sent: Monday, May 25, 2009 3:30:42 PM
> > Subject: Re: grouping response docs together
> >
> > Thanks guys. I looked at the dedup stuff, but the documents I'm adding
> > aren't really duplicates. They're very similar, but different.
> >
> > I checked out the field collapsing feature patch, applied the patch but
> > can't get it to build successfully. Will this patch work with a nightly
> > build?
> >
> > Thanks!
> >
> > On Fri, May 15, 2009 at 7:47 PM, Otis Gospodnetic <
> > otis_gospodne...@yahoo.com> wrote:
> >
> > >
> > > Matt - you may also want to detect near duplicates at index time:
> > >
> > > http://wiki.apache.org/solr/Deduplication
> > >
> > >  Otis
> > > --
> > > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> > >
> > >
> > >
> > > ----- Original Message ----
> > > > From: Matt Mitchell
> > > > To: solr-user@lucene.apache.org
> > > > Sent: Friday, May 15, 2009 6:52:48 PM
> > > > Subject: grouping response docs together
> > > >
> > > > Is there a built-in mechanism for grouping similar documents together
> > > > in the response? I'd like to make it look like there is only one
> > > > document with multiple "hits".
> > > >
> > > > Matt
> > >
> > >
>
>
