Can't you skip the SAN and keep the indexes locally?  Then you would
have two redundant copies of the index and no lock issues.

Also, can't master02 just be a slave to master01 (in the master farm,
separate from the slave farm) until such time as master01 fails?  Then
master02 would start receiving the new documents with an index complete
up to the last replication at least, and the LB would direct the other
slaves to poll master02 as well...
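
If master02 runs as a plain slave in the master farm, the promotion
step could mostly be scripted against the ReplicationHandler's HTTP
commands.  A rough sketch (Solr 1.4+ Java-based replication assumed;
host names and URLs below are made up):

import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;

/**
 * Sketch of promoting master02 when master01 dies, via the
 * ReplicationHandler's HTTP commands.  Host names are hypothetical.
 */
public class PromoteMaster {

    // Issue a ReplicationHandler command and drain the response.
    static void replicationCommand(String solrBaseUrl, String command)
            throws IOException {
        URL url = new URL(solrBaseUrl + "/replication?command=" + command);
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        InputStream in = conn.getInputStream();
        try {
            while (in.read() != -1) { /* discard body */ }
        } finally {
            in.close();
        }
    }

    public static void main(String[] args) throws IOException {
        String master02 = "http://master02:8983/solr";   // hypothetical
        String slave01  = "http://slave01:8983/solr";    // hypothetical

        // 1. The promoted master stops polling the dead master01.
        replicationCommand(master02, "disablepoll");

        // 2. One-off fetch on each slave; masterUrl overrides the
        //    configured master for this request only.
        replicationCommand(slave01, "fetchindex&masterUrl="
                + URLEncoder.encode(master02 + "/replication", "UTF-8"));

        // From here the LB points the slaves' polling (and the
        // indexer VIP) at master02.
    }
}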

-----Original Message-----
From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com] 
Sent: Wednesday, March 09, 2011 9:47 AM
To: solr-user@lucene.apache.org
Subject: Re: True master-master fail-over without data gaps (choosing CA in CAP)

Hi,

 
----- Original Message ----
> From: Walter Underwood <wun...@wunderwood.org>
>
> On Mar 9, 2011, at 9:02 AM, Otis Gospodnetic wrote:
>
> > You mean it's not possible to have 2 masters that are in nearly
> > real-time sync?  How about with DRBD?  I know people use DRBD to
> > keep 2 Hadoop NNs (their edit logs) in sync to avoid the current
> > NN SPOF, for example, so I'm thinking this could be doable with
> > Solr masters, too, no?
>
> If you add fault-tolerance, you run into the CAP Theorem. Consistency,
> availability, partition: choose two. You cannot have it all.

Right, so I'll take Consistency and Availability, and I'll put my 2
masters in the same rack (which has redundant switches, power supply,
etc.) and thus minimize/avoid partitioning.
Assuming the above actually works, I think my Q remains:

How do you set up 2 Solr masters so they are in near real-time sync?  DRBD?

But here is maybe a simpler scenario that more people may be
considering:

Imagine 2 masters on 2 different servers in 1 rack, pointing to the
same index on shared storage (a SAN) that also happens to live in the
same rack.  The 2 Solr masters are behind 1 LB VIP that the indexer
talks to.  The VIP is configured so that all requests always get
routed to the primary master (because only 1 master can be modifying
an index at a time), except when this primary is down, in which case
the requests are sent to the secondary master.

So in this case my Q is around automation of this, around Lucene index
locks, around the need for manual intervention, and such.
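
For the automation part, I'm picturing the LB (or a small watchdog)
health-checking each master, e.g. against the standard /admin/ping
handler, and failing over when the primary stops answering.  A rough
sketch; the URLs and the 2-second timeout are my assumptions, not
anything Solr prescribes:

import java.net.HttpURLConnection;
import java.net.URL;

/**
 * Sketch of the health check an LB/watchdog might run against each
 * master.  URLs and timeouts are hypothetical.
 */
public class MasterHealthCheck {

    // Returns true if the Solr instance answers /admin/ping in time.
    static boolean isAlive(String solrBaseUrl) {
        try {
            URL url = new URL(solrBaseUrl + "/admin/ping");
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setConnectTimeout(2000);  // 2s: assumed, tune to taste
            conn.setReadTimeout(2000);
            return conn.getResponseCode() == 200;
        } catch (Exception e) {
            return false;  // timeouts/refused connections count as dead
        }
    }

    public static void main(String[] args) {
        String primary   = "http://master01:8983/solr";  // hypothetical
        String secondary = "http://master02:8983/solr";  // hypothetical

        // The LB equivalent: route indexing to the primary while it
        // answers pings, otherwise to the secondary.  Note the caveat
        // below: a slow-but-alive primary may still hold the Lucene
        // write lock even though this check declares it dead.
        String target = isAlive(primary) ? primary : secondary;
        System.out.println("Indexing requests go to: " + target);
    }
}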
Concretely, if you have these 2 master instances, the primary master
holds the Lucene index lock in the index dir.  When the secondary
master needs to take over (i.e., when it starts receiving documents
via the LB), it needs to be able to write to that same index.  But
what if that lock is still around?  One could use the Native lock to
make the lock disappear if the primary master's JVM exited
unexpectedly, and in that case everything *should* work and be
completely transparent, right?  That is, the secondary will start
getting new docs, it will use its IndexWriter to write to that same
shared index, which won't be locked for writes because the lock is
gone, and everyone will be happy.  Did I miss something important
here?
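
That matches my understanding of NativeFSLockFactory: the OS releases
the native lock when the holding process dies, so no stale lock file
should block the new writer.  A rough sketch of the secondary opening
an IndexWriter on the shared dir (the SAN path is made up, the API
shown is roughly the Lucene 2.9/3.0-era one, and this assumes the
shared filesystem actually supports native locks, which e.g. NFS often
doesn't):

import java.io.File;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.NativeFSLockFactory;
import org.apache.lucene.util.Version;

/**
 * Sketch: secondary master taking over a shared index.  Path is
 * hypothetical.  With NativeFSLockFactory the OS drops the lock when
 * the primary's JVM dies, so this open succeeds without any manual
 * lock cleanup.
 */
public class SecondaryTakeover {
    public static void main(String[] args) throws Exception {
        FSDirectory dir = FSDirectory.open(
                new File("/mnt/san/solr/index"),  // shared SAN path, made up
                new NativeFSLockFactory());

        // Throws LockObtainFailedException if the primary is alive
        // (or wedged) and still holds the native write lock.
        IndexWriter writer = new IndexWriter(
                dir,
                new StandardAnalyzer(Version.LUCENE_30),
                IndexWriter.MaxFieldLength.UNLIMITED);

        // ... index the docs the LB now routes here ...
        writer.close();
    }
}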

Assuming the above is correct, what if the lock is *not* gone because
the primary master's JVM is actually not dead, although maybe
unresponsive, so the LB thinks the primary master is dead?  Then the
LB will route indexing requests to the secondary master, which will
attempt to write to the index, but be denied because of the lock.  So
a human needs to jump in, remove the lock, and manually reindex the
failed docs if the upstream component doesn't buffer docs that failed
to get indexed and doesn't retry indexing them automatically.  Is this
correct, or is there a way to avoid humans here?
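
The only human-free answer I can think of is to make the indexer
itself buffer and retry: Lucene signals the held lock with a
LockObtainFailedException, which the upstream component could catch
instead of dropping docs.  A rough sketch; the class name, the
abstract openWriter() hook, and the 5-second retry delay are all made
up:

import java.io.IOException;
import java.util.ArrayDeque;
import java.util.Queue;

import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.LockObtainFailedException;

/**
 * Sketch of buffer-and-retry around a held write lock.  Class and
 * method names are hypothetical; openWriter() would contain the
 * FSDirectory/NativeFSLockFactory setup from the previous sketch.
 */
public abstract class RetryingIndexer {

    private final Queue<Document> pending = new ArrayDeque<Document>();

    protected abstract IndexWriter openWriter() throws IOException;

    /** Buffer the doc, then try to flush everything buffered so far. */
    public void index(Document doc) throws IOException, InterruptedException {
        pending.add(doc);
        while (!pending.isEmpty()) {
            try {
                IndexWriter writer = openWriter();
                try {
                    while (!pending.isEmpty()) {
                        writer.addDocument(pending.peek());
                        pending.remove();
                    }
                } finally {
                    writer.close();
                }
            } catch (LockObtainFailedException e) {
                // The "primary not dead, just unresponsive" case: the
                // native lock is still held.  Keep docs buffered and
                // retry instead of paging a human.
                Thread.sleep(5000);  // retry delay: an assumption
            }
        }
    }
}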

Thanks,
Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/
