My response to this was mangled by my email client - sorry - hopefully
this one comes through a little easier to read ;)
On 03/09/2010 04:28 PM, Shawn Heisey wrote:
I attended the Webinar on March 4th. Many thanks to Yonik for putting
that on. That has led to some questions about the best way to bring
fault tolerance to our distributed search. High level question:
Should I go with SolrCloud, or stick with 1.4 and use load balancing?
I hope the rest of this email isn't too disjointed to follow.
We are using virtual machines on 8-core servers with 32GB of RAM to
house all this. For initial deployment, there are two of these, but
we will have a total of four once we migrate off our current indexing
solution. We won't be able to bring fault tolerance into the mix
until we have all four hosts, but I need to know what direction we are
going before initial deployment.
One choice is to stick with version 1.4 for stability and use load
balancing on the shards. I had already planned to have a pair of load
balancer VMs to handle redundancy on what I'm calling the broker
(explained further down), so it would not be a major step to have them
handle the shards as well.
I have been looking into SolrCloud. I tried to just swap out the .war
file with one compiled from the cloud branch, but that didn't work. A
little digging showed that the cloud branch uses a core for the
collection.
Hmm - not sure I completely follow - a collection is currently made up
of n cores. Unless you override the name of the collection that a core
should participate in, it defaults to the core name.
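To make that a bit more concrete: on the cloud branch you can set the
collection a core belongs to right in solr.xml, and if you leave it off
the core name is used as the collection name. Off the top of my head
(names here are just placeholders) it's something like:

  <cores adminPath="/admin/cores">
    <core name="core1" instanceDir="core1" collection="collection1"/>
    <core name="core2" instanceDir="core2" collection="collection1"/>
  </cores>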
I already have cores defined so I can build indexes and swap them into
place quickly. A big question - can I continue to use this multi-core
approach with SolrCloud, or does it supplant cores with its collection
logic?
You should be able to do anything you were doing pre-SolrCloud, I believe.
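For example, a typical build/swap setup - two cores in solr.xml and a
CoreAdmin SWAP call when the rebuild finishes - should carry over as-is
(core names below are made up):

  <cores adminPath="/admin/cores">
    <core name="live" instanceDir="live"/>
    <core name="build" instanceDir="build"/>
  </cores>

  http://localhost:8983/solr/admin/cores?action=SWAP&core=live&other=build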
Due to the observed high CPU requirements involved in sorting results
from multiple shards into a final result, I have so far opted to go
with an architecture that puts an empty index into a broker core,
which lives on its own VM host separate from the large static shards.
This core's solrconfig.xml has a list of all the shards that get
queried. My application has no idea that it's talking to anything
other than a single Solr instance. Once we get the caches warmed,
performance is quite good.
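(That matches what I'd expect - the broker pattern is usually just the
shards param set as a default on the search handler in the broker core's
solrconfig.xml, something like the below, where the host names are made
up and each entry could just as easily point at a load balancer VIP
instead of a shard directly:

  <requestHandler name="standard" class="solr.SearchHandler" default="true">
    <lst name="defaults">
      <str name="shards">shard1:8983/solr,shard2:8983/solr,inc:8983/solr</str>
    </lst>
  </requestHandler>
)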
The VM host with the broker will also have another VM with the shard
where all new data goes, a concept we call the incremental. On a
nightly basis, some of the documents in the incremental will be
redistributed to the static shards and everything will get reoptimized.
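(and I'd guess the nightly reoptimize is just an optimize message posted
to each core's update handler, e.g. sending

  <optimize waitSearcher="true"/>

to http://host:8983/solr/corename/update )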
How would you recommend I pursue fault tolerance? I had already
planned to set up a load balancer VM to handle redundancy for the
broker, so it would not be a HUGE step to have it load balance the
shards too.
The SolrCloud stuff has search-side fault tolerance built in - it should
get better over time (more features, e.g. partial results).
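To give you a feel for it: on the cloud branch you point the nodes at
ZooKeeper (or let one node embed it) and distributed requests are then
driven off the cluster state rather than a hand-maintained shards list -
roughly (params and ports from memory, the SolrCloud wiki page has the
current walkthrough):

  java -Dbootstrap_confdir=./solr/conf -Dcollection.configName=myconf -DzkRun -jar start.jar

  http://localhost:8983/solr/collection1/select?q=*:*&distrib=true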
That's kind of a cop-out answer to all you have there - but I got to the
end of this email and I'm feeling a bit tired.
Perhaps follow up with some clarifications/extensions on what you are
looking for and we can try some more responses.
--
- Mark
http://www.lucidimagination.com