Hi,

---- Original Message ----
> From: Robert Petersen <rober...@buy.com>
>
> Can't you skip the SAN and keep the indexes locally? Then you would
> have two redundant copies of the index and no lock issues.

I could, but then I'd have the issue of keeping them in sync, which
seems more fragile. I think the SAN makes things simpler overall.

> Also, can't master02 just be a slave to master01 (in the master farm and
> separate from the slave farm) until such time as master01 fails? Then

No, because it wouldn't be in sync. It would always be N minutes behind,
so when the primary master fails, the secondary would not have all the
docs - data loss.

> master02 would start receiving the new documents with an index
> complete up to the last replication at least and the other slaves would
> be directed by LB to poll master02 also...

Yeah, "complete up to the last replication" is the problem. It's a data
gap that then needs to be filled somehow.
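Just to make the "N minutes behind" part concrete: with the stock
ReplicationHandler, master02 would only pull completed commits from
master01 on a poll interval, along the lines of the sketch below (a
rough solrconfig.xml fragment; the host, port, and 5-minute interval
are made up for illustration):

  <requestHandler name="/replication" class="solr.ReplicationHandler">
    <lst name="slave">
      <!-- master02 periodically copies master01's latest committed index -->
      <str name="masterUrl">http://master01:8983/solr/replication</str>
      <!-- anything indexed on master01 since the last poll is missing here -->
      <str name="pollInterval">00:05:00</str>
    </lst>
  </requestHandler>

Whatever master01 accepted after the last successful poll is exactly the
gap that would have to be re-fed after a failover.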
Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/


> -----Original Message-----
> From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com]
> Sent: Wednesday, March 09, 2011 9:47 AM
> To: solr-user@lucene.apache.org
> Subject: Re: True master-master fail-over without data gaps (choosing CA
> in CAP)
>
> Hi,
>
> ----- Original Message ----
> > From: Walter Underwood <wun...@wunderwood.org>
> >
> > On Mar 9, 2011, at 9:02 AM, Otis Gospodnetic wrote:
> >
> > > You mean it's not possible to have 2 masters that are in nearly
> > > real-time sync? How about with DRBD? I know people use DRBD to keep
> > > 2 Hadoop NNs (their edit logs) in sync to avoid the current NN SPOF,
> > > for example, so I'm thinking this could be doable with Solr masters,
> > > too, no?
> >
> > If you add fault-tolerance, you run into the CAP theorem. Consistency,
> > availability, partition tolerance: choose two. You cannot have it all.
>
> Right, so I'll take Consistency and Availability, and I'll put my 2
> masters in the same rack (which has redundant switches, power supplies,
> etc.) and thus minimize/avoid partitioning.
> Assuming the above actually works, I think my Q remains:
>
> How do you set up 2 Solr masters so they are in near real-time sync?
> DRBD?
>
> But here is maybe a simpler scenario that more people may be considering:
>
> Imagine 2 masters on 2 different servers in 1 rack, pointing to the same
> index on shared storage (SAN) that also happens to live in the same
> rack. The 2 Solr masters are behind 1 LB VIP that the indexer talks to.
> The VIP is configured so that all requests always get routed to the
> primary master (because only 1 master can be modifying an index at a
> time), except when this primary is down, in which case requests are
> sent to the secondary master.
>
> So in this case my Q is around automation of this, around Lucene index
> locks, around the need for manual intervention, and such.
> Concretely, if you have these 2 master instances, the primary master
> holds the Lucene index lock in the index dir. When the secondary master
> needs to take over (i.e., when it starts receiving documents via the
> LB), it needs to be able to write to that same index. But what if that
> lock is still around? One could use the native lock type to make the
> lock disappear if the primary master's JVM exited unexpectedly, and in
> that case everything *should* work and be completely transparent,
> right? That is, the secondary will start getting new docs, it will use
> its IndexWriter to write to that same shared index, which won't be
> locked for writes because the lock is gone, and everyone will be happy.
> Did I miss something important here?
>
> Assuming the above is correct, what if the lock is *not* gone because
> the primary master's JVM is actually not dead, although maybe
> unresponsive, so the LB thinks the primary master is dead? Then the LB
> will route indexing requests to the secondary master, which will
> attempt to write to the index but be denied because of the lock. So a
> human needs to jump in, remove the lock, and manually reindex the
> failed docs if the upstream component doesn't buffer docs that failed
> to get indexed and retry them automatically. Is this correct, or is
> there a way to avoid humans here?
>
> Thanks,
> Otis
> ----
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Lucene ecosystem search :: http://search-lucene.com/
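P.S. For anyone who wants to poke at the lock behaviour I'm asking about
above, these are the solrconfig.xml knobs involved, as far as I recall
(a sketch for 1.4/3.x-style configs; double-check the element names and
placement for your version):

  <indexDefaults>
    <!-- "native" uses an OS-level file lock, which the OS releases when
         the JVM dies; whether that works reliably depends on the shared
         filesystem sitting on the SAN -->
    <lockType>native</lockType>
  </indexDefaults>

  <mainIndex>
    <!-- if true, Solr clears a leftover write lock when it starts up;
         it does not help the failover case, where the secondary is
         already running when the primary goes away -->
    <unlockOnStartup>false</unlockOnStartup>
  </mainIndex>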