SolrCloud has not tackled multi-data-center deployments yet.

I don’t think either (a) or (b) is a very good option yet.

Honestly, I think the best current bet is to use something like Apache Flume to 
send data to both data centers. It will handle splitting the stream, retrying 
failed sends, and keeping the two sides in sync. It doesn’t satisfy all use 
cases, though.
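
To make the idea concrete, here is a minimal Python sketch of the fan-out 
pattern. It is not Flume and not anything Solr ships: the FanOut class and the 
sender callables are hypothetical names made up for illustration. In Flume 
itself the rough equivalent would be one source with a replicating channel 
selector feeding one channel and sink per data center. The sketch assumes each 
sender raises an exception when its data center is unreachable.

```python
from collections import deque
from typing import Callable, Deque, Dict

Doc = dict
# Hypothetical sender: delivers one document to one data center and
# raises an exception if the delivery fails.
Sender = Callable[[Doc], None]


class FanOut:
    """Copy every document into one queue per data center, then drain
    each queue independently so a slow or down site cannot block the other."""

    def __init__(self, senders: Dict[str, Sender]) -> None:
        self.senders = senders
        self.queues: Dict[str, Deque[Doc]] = {name: deque() for name in senders}

    def publish(self, doc: Doc) -> None:
        # Split the stream: each data center gets its own copy.
        for queue in self.queues.values():
            queue.append(doc)

    def drain(self, name: str) -> int:
        """Send queued documents to one data center in order.

        Stops at the first failure and returns how many were delivered.
        Undelivered documents stay queued, so calling drain again is the retry.
        """
        queue = self.queues[name]
        sent = 0
        while queue:
            try:
                self.senders[name](queue[0])
            except Exception:
                break  # destination unavailable; keep the document for later
            queue.popleft()
            sent += 1
        return sent

    def drain_all(self) -> Dict[str, int]:
        return {name: self.drain(name) for name in self.senders}
```

Because the queues here are in memory, a crash loses whatever is pending. A 
real setup would want durable buffering, which is the main reason to reach for 
an existing tool rather than writing this yourself.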

At some point, multi data center support will happen.

I can’t remember where ZooKeeper’s support for multiple data centers stands, 
but that, combined with some logic to favor nodes in your own data center, 
might be a viable route.
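
By "logic to favor nodes in your data center" I mean something along these 
lines on the client side. This is only a sketch: SolrCloud does not provide it, 
and the order_replicas function, the host names, and the data center labels are 
all hypothetical. It assumes you maintain your own mapping of replica URL to 
data center.

```python
from typing import Dict, List


def order_replicas(replicas: Dict[str, str], local_dc: str) -> List[str]:
    """Return replica URLs with the local data center's nodes first.

    `replicas` maps replica URL -> data center label (a mapping you maintain
    yourself). Remote nodes are kept at the end as a fallback, and the
    original order is preserved within each group.
    """
    local = [url for url, dc in replicas.items() if dc == local_dc]
    remote = [url for url, dc in replicas.items() if dc != local_dc]
    return local + remote


# Hypothetical hosts, for illustration only.
replicas = {
    "http://solr-a1:8983/solr": "dc-a",
    "http://solr-b1:8983/solr": "dc-b",
    "http://solr-a2:8983/solr": "dc-a",
}
preferred = order_replicas(replicas, "dc-b")
```

A client would try the URLs in the returned order, so it only crosses data 
centers when no local node answers.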

- Mark

http://about.me/markrmiller

On Feb 3, 2014, at 11:48 AM, Darrell Burgan <darrell.bur...@infor.com> wrote:

> Hello, we are using Solr in a SolrCloud configuration, with two Solr 
> instances running with three ZooKeepers in a single data center. We presently 
> have a single search index with about 35 million entries in it, about 60GB 
> disk space on each of the two Solr servers (120GB total). I would expect our 
> usage of Solr to grow to include other search indexes, and likely larger data 
> volumes.
>  
> I’m writing because we need to grow beyond a single data center, with 
> two (potentially incompatible) goals:
>  
> 1. We need to be able to have a hot disaster recovery site, in a 
> completely separate data center, that has a near-realtime replica of the 
> search index.
> 
> 2. We’d like to have the option to have multiple active/active data 
> centers that each see and update the same search index, distributed across 
> data centers.
>  
> The options I’m aware of from reading archives:
>  
> a. Simply set up the remote Solr instances as active parts of the same 
> SolrCloud cluster. This will essentially involve us standing up multiple 
> ZooKeepers in the second data center, and multiple Solr instances, and they 
> will all keep each other in sync magically. This will also solve both of our 
> goals. However, I’m concerned about performance and whether SolrCloud is 
> smart enough to route local search queries only to local Solr servers? 
> Also, how does such a cluster tolerate and recover from network partitions?
> 
> b. The remote Solr instances form their own completely unrelated 
> SolrCloud cluster. I have to invent some kind of replication logic of my own 
> to sync data between them. This replication would have to be bidirectional to 
> satisfy both of our goals. I strongly dislike this option since the 
> application really should not concern itself with data distribution. But I’ll 
> do it if I must.
>  
> So my questions are:
>  
> - Can anyone give me any guidance as to option a? Anyone using this 
> in a real production setting? Words of wisdom? Does it work?
> 
> - Are there any other options that I’m not considering?
> 
> - What is Solr’s answer to such configurations (we can’t be alone in 
> needing one)? Any big enhancements coming on the Solr road map to deal with 
> this?
>  
> Thanks!
> Darrell Burgan
>  
>  
> 
> Darrell Burgan | Chief Architect, PeopleAnswers
> office: 214 445 2172 | mobile: 214 564 4450 | fax: 972 692 5386 | 
> darrell.bur...@infor.com | http://www.infor.com
