RE: Merging documents from a distributed search

Markus Jelsma Thu, 03 Sep 2015 12:11:08 -0700

Hello - another current thread also covers this issue; you may want to 
check it out:
http://lucene.472066.n3.nabble.com/Merging-documents-from-a-distributed-search-td4226802.html


 
 
-----Original message-----
> From:Markus Jelsma <markus.jel...@openindex.io>
> Sent: Thursday 3rd September 2015 10:27
> To: solr-user@lucene.apache.org
> Subject: RE: Merging documents from a distributed search
> 
> Hello - We're doing something similar and ended up overriding QueryComponent 
> (https://issues.apache.org/jira/browse/SOLR-7968), which first needs its 
> private members made protected. We could do a RankQuery and use its 
> cool MergeStrategy, but we would also need RankQuery to provide an entry 
> point for QueryComponent.createMainQuery(). That would be ideal because we 
> could then use the Collector there for local deduplication, and a combination 
> of createMainQuery and mergeIds to do the distributed deduplication.
> 
> Markus
>  
> -----Original message-----
> > From:Joel Bernstein <joels...@gmail.com>
> > Sent: Wednesday 2nd September 2015 23:46
> > To: solr-user@lucene.apache.org
> > Subject: Re: Merging documents from a distributed search
> > 
> > The merge strategy probably won't work for the type of distributed collapse
> > you're describing.
> > 
> > You may want to begin exploring the Streaming API, which supports real-time
> > map/reduce operations:
> > 
> > http://joelsolr.blogspot.com/2015/03/parallel-computing-with-solrcloud.html
> > 
> > Joel Bernstein
> > http://joelsolr.blogspot.com/
> > 
> > On Wed, Sep 2, 2015 at 5:12 PM, tedsolr <tsm...@sciquest.com> wrote:
> > 
> > > I've read at http://heliosearch.org/solrs-mergestrategy/ that the
> > > AnalyticsQuery component only works for a single instance of Solr. I'm
> > > planning to "migrate" to SolrCloud soon and I have a custom
> > > AnalyticsQuery module that collapses what I consider to be duplicate
> > > documents, keeping stats like a "count" of the dupes. For my purposes
> > > "dupes" are determined at run time and vary by the search request. Once
> > > a collection has multiple shards I will not be able to prevent "dupes"
> > > from appearing across those shards. A custom merge strategy should allow
> > > me to merge my stats, but I don't see how I can drop duplicate docs at
> > > that point.
> > >
> > > If shard1 returns docs A & B and shard2 returns docs B & C (letters
> > > denoting what I consider to be unique docs), can my implementation of a
> > > merge strategy return only docs A, B, & C, rather than A, B, B, & C?
> > >
> > > thanks!
> > > solr 5.2.1
> > >
> > >
> > 
>
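
For illustration, the A, B / B, C example from the original question can be
sketched in plain Java. This is only a minimal sketch of the merge step and
does not use any Solr API: the DedupMerge and Group classes, the "key" field
(standing in for whatever run-time signature defines a dupe) and the "count"
field are all hypothetical names. It also assumes each shard has already
collapsed its own dupes locally and returns the key of every group it found,
which is not guaranteed when shards only return their top N documents.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: none of these types are Solr classes.
public class DedupMerge {

    /** One collapsed group as a shard might report it: a dedup key plus a dupe count. */
    public static final class Group {
        public final String key;  // assumed run-time signature that defines a "dupe"
        public final long count;  // number of docs the shard collapsed into this group

        public Group(String key, long count) {
            this.key = key;
            this.count = count;
        }
    }

    /**
     * Merges per-shard groups. The first occurrence of a key fixes its
     * position in the output; counts for repeated keys are summed.
     */
    public static List<Group> merge(List<List<Group>> shardResponses) {
        // LinkedHashMap keeps keys in first-seen order.
        Map<String, Long> totals = new LinkedHashMap<>();
        for (List<Group> shard : shardResponses) {
            for (Group g : shard) {
                totals.merge(g.key, g.count, Long::sum);
            }
        }
        List<Group> merged = new ArrayList<>();
        for (Map.Entry<String, Long> e : totals.entrySet()) {
            merged.add(new Group(e.getKey(), e.getValue()));
        }
        return merged;
    }
}
```

Given shard1 = [A, B] and shard2 = [B, C], merge returns A, B, C in that
order, with B's count being the sum of the two shard counts. Re-sorting the
merged list by score and paging over it are left out of the sketch.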
