Hi,

---- Original Message ----
> From: Robert Petersen <rober...@buy.com>
>
> Can't you skip the SAN and keep the indexes locally? Then you would
> have two redundant copies of the index and no lock issues.

I could, but then I'd have the issue of keeping them in sync, which
seems more fragile. I think the SAN makes things simpler overall.

> Also, can't master02 just be a slave to master01 (in the master farm and
> separate from the slave farm) until such time as master01 fails? Then

No, because it wouldn't be in sync. It would always be N minutes behind,
so when the primary master fails, the secondary would not have all the
docs - data loss.

> master02 would start receiving the new documents with an index
> complete up to the last replication at least and the other slaves would
> be directed by LB to poll master02 also...

Yeah, "complete up to the last replication" is the problem. It's a data
gap that then needs to be filled somehow.
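Just to make the "N minutes behind" part concrete: with the stock
ReplicationHandler, master02 would only pull completed commits from
master01 on a poll interval, along the lines of the sketch below (a
rough solrconfig.xml fragment; the host, port, and 5-minute interval
are made up for illustration):

  <requestHandler name="/replication" class="solr.ReplicationHandler">
    <lst name="slave">
      <!-- master02 periodically copies master01's latest committed index -->
      <str name="masterUrl">http://master01:8983/solr/replication</str>
      <!-- anything indexed on master01 since the last poll is missing here -->
      <str name="pollInterval">00:05:00</str>
    </lst>
  </requestHandler>

Whatever master01 accepted after the last successful poll is exactly the
gap that would have to be re-fed after a failover.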
Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/


> -----Original Message-----
> From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com]
> Sent: Wednesday, March 09, 2011 9:47 AM
> To: solr-user@lucene.apache.org
> Subject: Re: True master-master fail-over without data gaps (choosing CA
> in CAP)
>
> Hi,
>
> ----- Original Message ----
> > From: Walter Underwood <wun...@wunderwood.org>
> >
> > On Mar 9, 2011, at 9:02 AM, Otis Gospodnetic wrote:
> >
> > > You mean it's not possible to have 2 masters that are in nearly
> > > real-time sync? How about with DRBD? I know people use DRBD to keep
> > > 2 Hadoop NNs (their edit logs) in sync to avoid the current NN SPOF,
> > > for example, so I'm thinking this could be doable with Solr masters,
> > > too, no?
> >
> > If you add fault-tolerance, you run into the CAP theorem. Consistency,
> > availability, partition tolerance: choose two. You cannot have it all.
>
> Right, so I'll take Consistency and Availability, and I'll put my 2
> masters in the same rack (which has redundant switches, power supplies,
> etc.) and thus minimize/avoid partitioning.
> Assuming the above actually works, I think my Q remains:
>
> How do you set up 2 Solr masters so they are in near real-time sync?
> DRBD?
>
> But here is maybe a simpler scenario that more people may be considering:
>
> Imagine 2 masters on 2 different servers in 1 rack, pointing to the same
> index on shared storage (SAN) that also happens to live in the same
> rack. The 2 Solr masters are behind 1 LB VIP that the indexer talks to.
> The VIP is configured so that all requests always get routed to the
> primary master (because only 1 master can be modifying an index at a
> time), except when this primary is down, in which case requests are
> sent to the secondary master.
>
> So in this case my Q is around automation of this, around Lucene index
> locks, around the need for manual intervention, and such.
> Concretely, if you have these 2 master instances, the primary master
> holds the Lucene index lock in the index dir. When the secondary master
> needs to take over (i.e., when it starts receiving documents via the
> LB), it needs to be able to write to that same index. But what if that
> lock is still around? One could use the native lock type to make the
> lock disappear if the primary master's JVM exited unexpectedly, and in
> that case everything *should* work and be completely transparent,
> right? That is, the secondary will start getting new docs, it will use
> its IndexWriter to write to that same shared index, which won't be
> locked for writes because the lock is gone, and everyone will be happy.
> Did I miss something important here?
>
> Assuming the above is correct, what if the lock is *not* gone because
> the primary master's JVM is actually not dead, although maybe
> unresponsive, so the LB thinks the primary master is dead? Then the LB
> will route indexing requests to the secondary master, which will
> attempt to write to the index but be denied because of the lock. So a
> human needs to jump in, remove the lock, and manually reindex the
> failed docs if the upstream component doesn't buffer docs that failed
> to get indexed and retry them automatically. Is this correct, or is
> there a way to avoid humans here?
>
> Thanks,
> Otis
> ----
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Lucene ecosystem search :: http://search-lucene.com/
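P.S. For anyone who wants to poke at the lock behaviour I'm asking about
above, these are the solrconfig.xml knobs involved, as far as I recall
(a sketch for 1.4/3.x-style configs; double-check the element names and
placement for your version):

  <indexDefaults>
    <!-- "native" uses an OS-level file lock, which the OS releases when
         the JVM dies; whether that works reliably depends on the shared
         filesystem sitting on the SAN -->
    <lockType>native</lockType>
  </indexDefaults>

  <mainIndex>
    <!-- if true, Solr clears a leftover write lock when it starts up;
         it does not help the failover case, where the secondary is
         already running when the primary goes away -->
    <unlockOnStartup>false</unlockOnStartup>
  </mainIndex>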