Hi Otis,

Have you considered using Solandra with quorum writes to achieve
master/master with CA semantics?
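The "CA" part is just the usual quorum-overlap argument: with N replicas,
QUORUM writes (W) and QUORUM reads (R) give W + R > N, so every read set
intersects the latest write set. A toy illustration of the arithmetic
(plain Java, not Solandra code):

public class QuorumMath {

    // QUORUM for N replicas: a strict majority.
    static int quorum(int n) { return n / 2 + 1; }

    public static void main(String[] args) {
        int n = 3;          // replication factor
        int w = quorum(n);  // QUORUM write: 2 of 3 replicas
        int r = quorum(n);  // QUORUM read: 2 of 3 replicas
        // W + R > N means any read quorum overlaps any write quorum,
        // so at least one replica in every read has the latest write.
        System.out.println("overlap guaranteed: " + (w + r > n)); // true
    }
}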
-Jake

On Wed, Mar 9, 2011 at 2:48 PM, Otis Gospodnetic <otis_gospodne...@yahoo.com> wrote:
> Hi,
>
> ----- Original Message ----
> > From: Robert Petersen <rober...@buy.com>
> >
> > Can't you skip the SAN and keep the indexes locally? Then you would
> > have two redundant copies of the index and no lock issues.
>
> I could, but then I'd have the issue of keeping them in sync, which seems
> more fragile. I think the SAN makes things simpler overall.
>
> > Also, can't master02 just be a slave to master01 (in the master farm
> > and separate from the slave farm) until such time as master01 fails?
> > Then
>
> No, because it wouldn't be in sync. It would always be N minutes behind,
> and when the primary master fails, the secondary would not have all the
> docs - data loss.
>
> > master02 would start receiving the new documents with an index
> > complete up to the last replication at least, and the other slaves
> > would be directed by the LB to poll master02 also...
>
> Yeah, "complete up to the last replication" is the problem. It's a data
> gap that now needs to be filled somehow.
>
> Otis
> ----
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Lucene ecosystem search :: http://search-lucene.com/
>
> > -----Original Message-----
> > From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com]
> > Sent: Wednesday, March 09, 2011 9:47 AM
> > To: solr-user@lucene.apache.org
> > Subject: Re: True master-master fail-over without data gaps (choosing CA in CAP)
> >
> > Hi,
> >
> > ----- Original Message ----
> > > From: Walter Underwood <wun...@wunderwood.org>
> > >
> > > On Mar 9, 2011, at 9:02 AM, Otis Gospodnetic wrote:
> > >
> > > > You mean it's not possible to have 2 masters that are in nearly
> > > > real-time sync? How about with DRBD? I know people use DRBD to
> > > > keep 2 Hadoop NNs (their edit logs) in sync to avoid the current
> > > > NN SPOF, for example, so I'm thinking this could be doable with
> > > > Solr masters, too, no?
> > >
> > > If you add fault tolerance, you run into the CAP theorem.
> > > Consistency, availability, partition tolerance: choose two. You
> > > cannot have it all.
> >
> > Right, so I'll take Consistency and Availability, and I'll put my 2
> > masters in the same rack (which has redundant switches, power
> > supplies, etc.) and thus minimize/avoid partitioning.
> > Assuming the above actually works, I think my Q remains:
> >
> > How do you set up 2 Solr masters so they are in near real-time sync?
> > DRBD?
> >
> > But here is maybe a simpler scenario that more people may be
> > considering:
> >
> > Imagine 2 masters on 2 different servers in 1 rack, pointing to the
> > same index on shared storage (SAN) that also happens to live in the
> > same rack. The 2 Solr masters are behind 1 LB VIP that the indexer
> > talks to. The VIP is configured so that all requests always get
> > routed to the primary master (because only 1 master can be modifying
> > an index at a time), except when this primary is down, in which case
> > the requests are sent to the secondary master.
> >
> > So in this case my Q is around the automation of this, around Lucene
> > index locks, around the need for manual intervention, and such.
> > Concretely, if you have these 2 master instances, the primary master
> > holds the Lucene index lock in the index dir. When the secondary
> > master needs to take over (i.e., when it starts receiving documents
> > via the LB), it needs to be able to write to that same index.
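This takeover step is the crux, and it anticipates the stale-lock question
just below. A minimal sketch of what the secondary could do, assuming
Lucene 3.x and NativeFSLockFactory (the class name and error handling here
are mine, purely illustrative, not from Otis's setup):

import java.io.File;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.NativeFSLockFactory;
import org.apache.lucene.util.Version;

public class TakeOver {

    public static IndexWriter openSharedIndex(File indexDir) throws Exception {
        // Native (OS-level) locks are released by the OS when the holding
        // JVM dies, so a crashed primary should leave no stale write.lock.
        Directory dir = FSDirectory.open(indexDir, new NativeFSLockFactory());

        if (IndexWriter.isLocked(dir)) {
            // Lock still held: either the primary's JVM is alive (maybe
            // just unresponsive), or native locks don't work on this shared
            // filesystem. IndexWriter.unlock(dir) would force-clear it, but
            // doing that while the primary may still be writing can corrupt
            // the index -- only safe once the primary is known to be fenced.
            throw new IllegalStateException(
                "write.lock still held -- is the primary really dead?");
        }
        return new IndexWriter(dir, new StandardAnalyzer(Version.LUCENE_30),
                               IndexWriter.MaxFieldLength.UNLIMITED);
    }
}

One caveat: whether native locks actually behave on a SAN-shared filesystem
(NFS in particular) is worth verifying before trusting this -- that is
exactly the case where a lock can linger, or not be seen at all.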
> > But what if that lock is still around? One could use the native lock
> > factory to make the lock disappear if the primary master's JVM exited
> > unexpectedly, and in that case everything *should* work and be
> > completely transparent, right? That is, the secondary will start
> > getting new docs, it will use its IndexWriter to write to that same
> > shared index, which won't be locked for writes because the lock is
> > gone, and everyone will be happy. Did I miss something important here?
> >
> > Assuming the above is correct, what if the lock is *not* gone because
> > the primary master's JVM is actually not dead, although maybe
> > unresponsive, so the LB thinks the primary master is dead? Then the
> > LB will route indexing requests to the secondary master, which will
> > attempt to write to the index, but be denied because of the lock. So
> > a human needs to jump in, remove the lock, and manually reindex the
> > failed docs if the upstream component doesn't buffer docs that failed
> > to get indexed and doesn't retry indexing them automatically. Is this
> > correct, or is there a way to avoid humans here?
> >
> > Thanks,
> > Otis
> > ----
> > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> > Lucene ecosystem search :: http://search-lucene.com/

--
http://twitter.com/tjake
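P.S. On the "way to avoid humans" part: the lock itself still needs
fencing, but the reindex-by-hand step goes away if the upstream indexer
buffers adds that fail and retries them later. A rough SolrJ sketch (the
class and method names are mine, just to show the shape of it):

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class RetryingIndexer {

    private final SolrServer solr;
    // Docs that failed to index are parked here instead of being dropped.
    private final BlockingQueue<SolrInputDocument> pending =
        new LinkedBlockingQueue<SolrInputDocument>();

    public RetryingIndexer(String masterVipUrl) throws Exception {
        this.solr = new CommonsHttpSolrServer(masterVipUrl); // the LB VIP
    }

    public void index(SolrInputDocument doc) {
        try {
            solr.add(doc);
        } catch (Exception e) {
            pending.offer(doc); // buffer instead of losing the doc
        }
    }

    // Drain the buffer periodically, e.g. from a background thread.
    public void retryPending() {
        SolrInputDocument doc;
        while ((doc = pending.poll()) != null) {
            try {
                solr.add(doc);
            } catch (Exception e) {
                pending.offer(doc); // still failing; keep it, try later
                break;
            }
        }
    }
}

With something like this in front of the VIP, the failover window turns
into delayed indexing rather than a data gap.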