I was just about to jump into this conversation to mention Solandra and, go
figure, Solandra's committer comes in. :-)  It was nice to meet you at Strata, Jake.

I haven't dug into the code yet, but Solandra strikes me as a killer way to
scale Solr. I'm looking forward to playing with it, particularly looking at
disk requirements and performance measurements.

~ David Smiley

On Mar 9, 2011, at 3:14 PM, Jake Luciani wrote:

> Hi Otis,
> 
> Have you considered using Solandra with Quorum writes
> to achieve master/master with CA semantics?
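> 
> From the indexer's side it is just plain SolrJ against whichever node you
> like, since every Solandra node accepts writes and Cassandra replicates the
> index data underneath.  A rough, untested sketch (host and core names are
> made up):
> 
>   import org.apache.solr.client.solrj.SolrServer;
>   import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
>   import org.apache.solr.common.SolrInputDocument;
> 
>   public class SolandraWriteSketch {
>     public static void main(String[] args) throws Exception {
>       // Any Solandra node can accept writes (there is no single master),
>       // so the indexer can point at either node, or at a VIP across both.
>       // With quorum writes, Cassandra does not acknowledge an update until
>       // a majority of replicas have it, so losing one node does not lose
>       // acknowledged documents.
>       SolrServer solr = new CommonsHttpSolrServer(
>           "http://solandra1:8983/solandra/core1");  // hypothetical host/core
> 
>       SolrInputDocument doc = new SolrInputDocument();
>       doc.addField("id", "doc-1");
>       doc.addField("title", "hello");
>       solr.add(doc);
>       solr.commit();
>     }
>   }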
> 
> -Jake
> 
> 
> On Wed, Mar 9, 2011 at 2:48 PM, Otis Gospodnetic <otis_gospodne...@yahoo.com> wrote:
> 
>> Hi,
>> 
>> ---- Original Message ----
>> 
>>> From: Robert Petersen <rober...@buy.com>
>>> 
>>> Can't you skip the SAN and keep the indexes locally?  Then you would
>>> have two redundant copies of the index and no lock issues.
>> 
>> I could, but then I'd have the issue of keeping them in sync, which seems
>> more fragile.  I think SAN makes things simpler overall.
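>> 
>> (Just to spell out what "keeping them in sync" means on the indexer side:
>> every add has to be sent to both masters, and any partial failure has to
>> be queued and replayed somewhere, or the two copies silently drift apart.
>> A rough, untested sketch with made-up host names:
>> 
>>   import java.util.Arrays;
>>   import java.util.List;
>> 
>>   import org.apache.solr.client.solrj.SolrServer;
>>   import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
>>   import org.apache.solr.common.SolrInputDocument;
>> 
>>   public class DualMasterIndexer {
>>     public static void main(String[] args) throws Exception {
>>       // Every document must go to *both* masters for them to stay in sync.
>>       List<SolrServer> masters = Arrays.<SolrServer>asList(
>>           new CommonsHttpSolrServer("http://master01:8983/solr"),
>>           new CommonsHttpSolrServer("http://master02:8983/solr"));
>> 
>>       SolrInputDocument doc = new SolrInputDocument();
>>       doc.addField("id", "doc-1");
>> 
>>       for (SolrServer master : masters) {
>>         try {
>>           master.add(doc);
>>           master.commit();
>>         } catch (Exception e) {
>>           // Partial failure: one master now has the doc and the other
>>           // doesn't.  This is the fragile part; the indexer needs its own
>>           // queue-and-replay logic to repair the divergence.
>>           System.err.println("Indexing failed on one master: " + e);
>>         }
>>       }
>>     }
>>   }
>> 
>> That replay logic is exactly the moving part I'd rather not own.)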
>> 
>>> Also, can't master02 just be a slave to master01 (in the master farm and
>>> separate from the slave farm) until such time as master01 fails?  Then
>> 
>> No, because it wouldn't be in sync.  It would always be N minutes behind,
>> and when the primary master fails, the secondary would not have all the
>> docs - data loss.
>> 
>>> master02 would start receiving the new documents with an index complete
>>> up to the last replication at least, and the other slaves would be
>>> directed by the LB to poll master02 also...
>> 
>> Yeah, "complete up to the last replication" is the problem.  It's a data
>> gap
>> that now needs to be filled somehow.
>> 
>> Otis
>> ----
>> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
>> Lucene ecosystem search :: http://search-lucene.com/
>> 
>> 
>>> -----Original Message-----
>>> From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com]
>>> Sent: Wednesday, March 09, 2011 9:47 AM
>>> To: solr-user@lucene.apache.org
>>> Subject: Re: True master-master fail-over without data gaps (choosing CA
>>> in CAP)
>>> 
>>> Hi,
>>> 
>>> 
>>> ----- Original Message ----
>>>> From: Walter Underwood <wun...@wunderwood.org>
>>> 
>>>> On Mar 9, 2011, at 9:02 AM, Otis Gospodnetic wrote:
>>>> 
>>>>> You mean it's not possible to have 2 masters that are in nearly
>>>>> real-time sync?
>>>>> How about with DRBD?  I know people use DRBD to keep 2 Hadoop NNs
>>>>> (their edit logs) in sync to avoid the current NN SPOF, for example,
>>>>> so I'm thinking this could be doable with Solr masters, too, no?
>>>> 
>>>> If you add fault-tolerance, you run into the CAP Theorem.  Consistency,
>>>> availability, partition tolerance: choose two.  You cannot have it all.
>>> 
>>> Right, so I'll take Consistency and Availability, and I'll put my 2
>>> masters in the same rack (which has redundant switches, power supply,
>>> etc.) and thus minimize/avoid partitioning.
>>> Assuming the above actually works, I think my Q remains:
>>> 
>>> How do you set up 2 Solr masters so they are in near real-time sync?
>>> DRBD?
>>> 
>>> But here is maybe a simpler scenario that more people may be considering:
>>> 
>>> Imagine 2 masters on 2 different servers in 1 rack, pointing to the same
>>> index on the shared storage (SAN) that also happens to live in the same
>>> rack.  The 2 Solr masters are behind 1 LB VIP that the indexer talks to.
>>> The VIP is configured so that all requests always get routed to the
>>> primary master (because only 1 master can be modifying an index at a
>>> time), except when this primary is down, in which case the requests are
>>> sent to the secondary master.
>>> 
>>> So in this case my Q is around automation of this, around Lucene index
>>> locks, around the need for manual intervention, and such.
>>> Concretely, if you have these 2 master instances, the primary master has
>>> the Lucene index lock in the index dir.  When the secondary master needs
>>> to take over (i.e., when it starts receiving documents via LB), it needs
>>> to be able to write to that same index.  But what if that lock is still
>>> around?  One could use the Native lock to make the lock disappear if the
>>> primary master's JVM exited unexpectedly, and in that case everything
>>> *should* work and be completely transparent, right?  That is, the
>>> secondary will start getting new docs, it will use its IndexWriter to
>>> write to that same shared index, which won't be locked for writes because
>>> the lock is gone, and everyone will be happy.  Did I miss something
>>> important here?
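>>> 
>>> To make it concrete, the failover path I have in mind is roughly this
>>> (untested sketch against the Lucene 3.x API; the index path and field
>>> names are made up):
>>> 
>>>   import java.io.File;
>>> 
>>>   import org.apache.lucene.analysis.standard.StandardAnalyzer;
>>>   import org.apache.lucene.document.Document;
>>>   import org.apache.lucene.document.Field;
>>>   import org.apache.lucene.index.IndexWriter;
>>>   import org.apache.lucene.store.FSDirectory;
>>>   import org.apache.lucene.store.LockObtainFailedException;
>>>   import org.apache.lucene.store.NativeFSLockFactory;
>>>   import org.apache.lucene.util.Version;
>>> 
>>>   public class SecondaryMasterTakeover {
>>>     public static void main(String[] args) throws Exception {
>>>       // Shared index directory on the SAN, visible to both masters.
>>>       File indexDir = new File("/mnt/san/solr/index");  // made-up path
>>> 
>>>       // NativeFSLockFactory backs write.lock with an OS-level lock, so
>>>       // the lock is released if the holding JVM exits unexpectedly.
>>>       FSDirectory dir = FSDirectory.open(indexDir, new NativeFSLockFactory());
>>> 
>>>       try {
>>>         IndexWriter writer = new IndexWriter(dir,
>>>             new StandardAnalyzer(Version.LUCENE_30),
>>>             IndexWriter.MaxFieldLength.UNLIMITED);
>>> 
>>>         // Getting here means the primary no longer holds the lock, so the
>>>         // secondary can start applying the docs the LB sends its way.
>>>         Document doc = new Document();
>>>         doc.add(new Field("id", "doc-1",
>>>             Field.Store.YES, Field.Index.NOT_ANALYZED));
>>>         writer.addDocument(doc);
>>>         writer.commit();
>>>         writer.close();
>>>       } catch (LockObtainFailedException e) {
>>>         // The primary's JVM is still alive (just unresponsive to the LB)
>>>         // and still holds the native lock, so the secondary is refused.
>>>         System.err.println("Primary still holds the index lock: " + e);
>>>       }
>>>     }
>>>   }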
>>> 
>>> Assuming the above is correct, what if the lock is *not* gone because the
>>> primary master's JVM is actually not dead, although maybe unresponsive,
>>> so the LB thinks the primary master is dead?  Then the LB will route
>>> indexing requests to the secondary master, which will attempt to write to
>>> the index, but be denied because of the lock.  So a human needs to jump
>>> in, remove the lock, and manually reindex failed docs if the upstream
>>> component doesn't buffer docs that failed to get indexed and doesn't
>>> retry indexing them automatically.  Is this correct, or is there a way to
>>> avoid humans here?
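>>> 
>>> The "human jumps in" step would boil down to something like this (again
>>> only a sketch, Lucene 3.x API, made-up path); the reason I hesitate to
>>> automate it is that unlock() just removes the lock and has no way of
>>> knowing whether the primary is truly dead:
>>> 
>>>   import java.io.File;
>>> 
>>>   import org.apache.lucene.index.IndexWriter;
>>>   import org.apache.lucene.store.FSDirectory;
>>>   import org.apache.lucene.store.NativeFSLockFactory;
>>> 
>>>   public class StaleLockCleanup {
>>>     public static void main(String[] args) throws Exception {
>>>       File indexDir = new File("/mnt/san/solr/index");  // made-up path
>>>       FSDirectory dir = FSDirectory.open(indexDir, new NativeFSLockFactory());
>>> 
>>>       if (IndexWriter.isLocked(dir)) {
>>>         // Only after confirming, out of band, that the primary master's
>>>         // JVM is really gone: unlock() forcibly removes the write lock
>>>         // and cannot tell whether another IndexWriter still uses the index.
>>>         IndexWriter.unlock(dir);
>>>         System.out.println("Removed stale write lock; secondary can take over.");
>>>       }
>>> 
>>>       // The upstream indexer still has to replay whatever documents
>>>       // failed while the lock was stuck, or they are lost.
>>>     }
>>>   }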
>>> 
>>> Thanks,
>>> Otis
>>> ----
>>> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
>>> Lucene ecosystem search :: http://search-lucene.com/
>>> 
>> 
> 
> 
> 
> -- 
> http://twitter.com/tjake
