In addition to my previous posting:
To keep this in sync, we could do one of two things:
Wait for every server to confirm that everyone uses the same values to
compute the score, and then apply them.
Or: let's say that we collect the new values every 15 minutes. To merge
them and send them over the network, we declare that this will need 3
additional minutes (we want to keep the network traffic for such actions
very low, so we do not send everything instantly).
Okay, and now we add 2 additional minutes of grace, in case 3 were not
enough or something needs a little more time than we thought. After those
2 minutes, every node has to apply the new values.
Pro: If one node breaks, we do not delay the application of the new
values on the remaining nodes.
Con: We need two HashMaps, and both will have roughly the same size. That
means we will waste some RAM for this operation if we do not write the
values to disk (which I do not suggest). A rough sketch follows below.
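
To make the second option concrete, here is a minimal sketch of the
double-buffer idea in plain Java. It is only an illustration under my
assumptions above; GlobalDfCache, stage, and applyPending are
hypothetical names, not actual Solr code:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicReference;

public class GlobalDfCache {
    // The "live" map is read by the scorer; "pending" collects the
    // next snapshot while it arrives over the network.
    private final AtomicReference<Map<String, Long>> live =
            new AtomicReference<>(new ConcurrentHashMap<>());
    private volatile Map<String, Long> pending = new ConcurrentHashMap<>();

    // Called while the merged values are being received.
    public void stage(String term, long docFreq) {
        pending.put(term, docFreq);
    }

    // Called by a scheduler once the agreed deadline has passed
    // (15 min collect + 3 min merge/send + 2 min grace).
    public void applyPending() {
        Map<String, Long> next = pending;
        pending = new ConcurrentHashMap<>();
        live.set(next); // atomic swap: readers never see a half-filled map
    }

    // Constant-time lookup used when computing the idf.
    public long docFreq(String term) {
        return live.get().getOrDefault(term, 0L);
    }
}

The swap is a single reference assignment, so the RAM cost is exactly the
two maps mentioned above, and the old map becomes garbage right after the
swap.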

Thoughts?

- Mitch

MitchK wrote:
> 
> Yonik,
> 
> why do we not send the output of the TermsComponent of every node in the
> cluster to a Hadoop instance?
> Since TermsComponent does the map part of the map-reduce concept, Hadoop
> only needs to do the reduce. Maybe we do not even need Hadoop for this.
> After reducing, every node in the cluster gets the current values needed
> to compute the idf.
> We can store this information in a HashMap-based SolrCache (or something
> like that) to provide constant-time access. To keep the values up to
> date, we can repeat this every x minutes.
> 
> If we have that, it does not matter whether we use doc_X from shard_A or
> shard_B, since they will all have the same scores.
> 
> Even if we have large indices with 10 million or more unique terms, this
> will only need a few megabytes of network traffic.
> 
> Kind regards,
> - Mitch
> 
> 
> Yonik Seeley-2-2 wrote:
>> 
>> As the comments suggest, it's not a bug, but just the best we can do
>> for now since our priority queues don't support removal of arbitrary
>> elements.  I guess we could rebuild the current priority queue if we
>> detect a duplicate, but that will have an obvious performance impact.
>> Any other suggestions?
>> 
>> -Yonik
>> http://www.lucidimagination.com
>> 
> 
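
As a rough illustration of the reduce step from my earlier message above:
summing the per-shard document frequencies into one global map. The shard
output format and all names here are hypothetical, not the real
TermsComponent API:

import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DfReducer {
    // Each shard reports term -> docFreq (the "map" part, which
    // TermsComponent already provides per node).
    public static Map<String, Long> reduce(List<Map<String, Long>> shardCounts) {
        Map<String, Long> global = new HashMap<>();
        for (Map<String, Long> shard : shardCounts) {
            for (Map.Entry<String, Long> e : shard.entrySet()) {
                global.merge(e.getKey(), e.getValue(), Long::sum);
            }
        }
        return global; // broadcast this to every node for idf computation
    }
}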
