Andrzej,

thank you for sharing your experiences.



> b) use consistent hashing as the mapping schema to assign documents to a 
> changing number of shards. There are many explanations of this schema on 
> the net, here's one that is very simple: 
> 
Boom. 
With the given explanation, I understand it as follows:
You can use Hadoop and run a MapReduce job per CSV file.
On the reducer side, the reducer takes the id of the current doc
and computes a hash of it.
It then looks up that hash in a SortedSet, picks the next-nearest server,
and checks in a map whether that server has free capacity. That's cool.

But it doesn't solve the problem entirely - correct me if I am wrong: if
you add a new server, call it IP3-1, and IP3-1 is nearer on the ring to
the current doc x, then doc x will be indexed at IP3-1 - even though IP2-1
still holds the older version.
Am I right?
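To make sure I understand the schema, here is how I would sketch such a ring in Java. The server names are just the ones from my example above, and the plain `hashCode` is only for illustration (a real setup would use a stronger hash and virtual nodes) - nothing here is Solr-specific:

```java
import java.util.SortedMap;
import java.util.TreeMap;

// Toy consistent-hashing ring: servers and doc ids are hashed onto the
// same integer ring; a doc belongs to the first server at or after its
// hash position, wrapping around at the end.
public class HashRing {
    private final SortedMap<Integer, String> ring = new TreeMap<>();

    public void addServer(String server) {
        ring.put(hash(server), server);
    }

    public String serverFor(String docId) {
        int h = hash(docId);
        // tailMap gives all ring positions >= h; if empty, wrap to the start.
        SortedMap<Integer, String> tail = ring.tailMap(h);
        int key = tail.isEmpty() ? ring.firstKey() : tail.firstKey();
        return ring.get(key);
    }

    private int hash(String key) {
        return key.hashCode() & 0x7fffffff; // keep positions non-negative
    }

    public static void main(String[] args) {
        HashRing ring = new HashRing();
        ring.addServer("IP1-1");
        ring.addServer("IP2-1");
        String before = ring.serverFor("doc-x");
        ring.addServer("IP3-1");
        String after = ring.serverFor("doc-x");
        // After adding IP3-1, doc-x either keeps its old owner or moves
        // to IP3-1 - exactly the case I am asking about, because the old
        // owner may still hold the older version of the doc.
        System.out.println(before + " -> " + after);
    }
}
```

The property worth noting is that adding a server only moves the docs whose hash falls on the arc the new server takes over; all other docs keep their owner. But for the docs on that arc, the old server still has the stale copy until you migrate or reindex that slice.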

Thank you for sharing the paper. I will have a look for more like this. 



> In this case the lack of good docs and user-level API can be blamed on 
> the fact that this functionality is still under heavy development. 
> 
I don't mean only user-level documentation, but also documentation inside
a class wherever something complicated is going on.

- Mitch
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/anyone-use-hadoop-solr-tp485333p1426728.html
Sent from the Solr - User mailing list archive at Nabble.com.
