SAN vendors make high-priced, super-fast shared file system hardware. They don't use NFS; usually they provide a kernel drop-in file system.
On 4/14/11, Parker Johnson <parker_john...@gap.com> wrote:
>
> Otis and Erick,
>
> Thanks for the responses and for thinking over my potential scenarios.
>
> The big draw for me on the 2-repeater idea is that I can:
>
> 1. Maximize my hardware. I don't need a standby master. Instead, I can
> use the "second" repeater to field customer requests.
> 2. After a primary repeater failure, I neither need to fumble with
> multiple solrconfig.xml edits (we're also using cores) nor worry about
> manually replicating or copying indexes around.
>
> In a sense, although perhaps not by design, a repeater solves those
> problems.
>
> We considered centralized storage and a standby master with access to a
> shared filesystem, but what are you using for a shared filesystem? (NFS?
> Egh...)
>
> -Parker
>
> On 4/12/11 6:19 PM, "Erick Erickson" <erickerick...@gmail.com> wrote:
>
>> I think the repeaters are misleading you a bit here. The purpose of a
>> repeater is usually to replicate across a slow network, say to a remote
>> data center, so that slaves at that center can get more timely updates.
>> I don't think they add anything to your disaster recovery scenario.
>>
>> So I'll ignore repeaters for a bit here. The only difference between a
>> master and a slave is a bit of configuration, and usually you'll
>> allocate, say, memory differently on the two machines when you start
>> the JVM. You might disable caches on the master (since they're used for
>> searching). You may...
>>
>> Let's say I have master M and slaves S1, S2, S3. The slaves have an
>> up-to-date index as of the last replication (just like your repeater
>> would have). If any slave goes down, you can simply bring up another
>> machine as a slave, point it at your master, wait for replication on
>> that slave, and then let your load balancer know it's there. This is
>> the HOST2-4 failure you outlined.
>>
>> Should the master fail, you have two choices, depending upon how long
>> you can wait for *new* content to be searchable. Let's say you can wait
>> half a day in this situation. Spin up a new machine and copy the index
>> over from one of the slaves (via a simple copy or by replicating).
>> Point your indexing process at the new master, point your slaves at it
>> for replication, and you're done.
>>
>> Let's say you can't wait very long at all (and remember, this had
>> better be quite a rare event). Then you could take a slave (let's say
>> S1) out of the loop that serves searches. Copy in the configuration
>> files you use for your masters, point the indexer and searchers at it,
>> and you're done. Now spin up a new slave as above and your old
>> configuration is back.
>>
>> Note that in two of these cases, you temporarily have 2 slaves doing
>> the work that 3 used to, so a bit of over-capacity may be in order.
>>
>> But a really good question here is how to be sure all your data is in
>> your index. After all, the slaves (and the repeater, for that matter)
>> are only current up to the last replication. The simplest thing to do
>> is to re-index everything from the last known commit point. Assuming
>> you have a <uniqueKey> defined, if you index documents that are already
>> in the index, they'll just be replaced, no harm done. So let's say your
>> replication interval is 10 minutes (picking a number from thin air).
>> When your system is back and you restart your indexer, restart indexing
>> from, say, the time you noticed your master went down minus 1 hour.
>> You can be more deterministic than this by examining the log on the
>> machine you're using to replace the master, noting the last replication
>> time, and subtracting your hour (or whatever) from that.
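The replication handler can also report this over HTTP, which may be easier
than digging through logs. A minimal sketch, assuming the stock /replication
handler is enabled; the host, port, and core name are placeholders, and the
exact field names in the response vary by Solr version:

    # Replication details for a slave or repeater: index version/generation
    # plus information about the most recent replication cycle.
    curl 'http://host2:8983/solr/core0/replication?command=details'

    # Quick catch-up check: compare a host's index version and generation
    # against the master's before putting it back behind the vip.
    curl 'http://host2:8983/solr/core0/replication?command=indexversion'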
>> Anyway, hope I haven't confused you unduly! The take-away is that a
>> slave can be made into a master as fast as a repeater can, the
>> replication process is the same, and I just don't see what a repeater
>> buys you in the scenario you described.
>>
>> Best
>> Erick
>>
>>
>> On Tue, Apr 12, 2011 at 6:33 PM, Parker Johnson
>> <parker_john...@gap.com> wrote:
>>
>>>
>>> I am hoping to get some feedback on the architecture I've been
>>> planning for a medium to high volume site. This is my first time
>>> working with Solr, so I want to be sure what I'm planning isn't
>>> totally weird, unsupported, etc.
>>>
>>> We've got a pair of F5 load balancers and 4 hosts. 2 of those hosts
>>> will be repeaters (master+slave), and 2 of those hosts will be pure
>>> slaves. One of the F5 vips, "Index-vip", will have members HOST1 and
>>> HOST2, but HOST2 will be "downed" and not taking traffic from that
>>> vip. The second vip, "Search-vip", will have 3 members: HOST2, HOST3,
>>> and HOST4. The "Index-vip" is intended to be used to post and commit
>>> index changes. The "Search-vip" is intended to be customer facing.
>>>
>>> Here is some ASCII art. The line with the "X"s through it denotes a
>>> "downed" member of a vip, one that isn't taking any traffic. The "M:"
>>> denotes the value in the solrconfig.xml that the host uses as the
>>> master.
>>>
>>>
>>>         Index-vip            Search-vip
>>>          /      \            /    |    \
>>>         /        X          /     |     \
>>>        /          \        /      |      \
>>>       /            X      /       |       \
>>>      /              \    /        |        \
>>>     /                X  /         |         \
>>>    /                  \/          |          \
>>>  HOST1              HOST2        HOST3       HOST4
>>>  REPEATER           REPEATER     SLAVE       SLAVE
>>>  M:Index-vip        M:Index-vip  M:Index-vip M:Index-vip
>>>
>>>
>>> I've been working through a couple of failure scenarios. Recovering
>>> from a failure of HOST2, HOST3, or HOST4 is pretty straightforward.
>>> Losing HOST1 is my major concern. My plan for recovering from a
>>> failure of HOST1 is as follows: enable HOST2 as a member of the
>>> Index-vip while disabling member HOST1. HOST2 effectively becomes the
>>> master. HOST2, 3, and 4 continue fielding customer requests and
>>> pulling indexes from "Index-vip". Since HOST2 is now in charge of
>>> crunching indexes and fielding customer requests, I assume load will
>>> increase on that box.
>>>
>>> When we recover HOST1, we will simply make sure it has replicated
>>> against "Index-vip" and then re-enable HOST1 as a member of the
>>> Index-vip and disable HOST2.
>>>
>>> Hopefully this makes sense. If all goes correctly, I've managed to
>>> keep all services up and running without losing any index data.
>>>
>>> So, I have a few questions:
>>>
>>> 1. Has anyone else tried this dual-repeater approach?
>>> 2. Am I going to have any semaphore/blocking issues if a repeater is
>>> pulling index data from itself?
>>> 3. Is there a better way to do this?
>>>
>>>
>>> Thanks,
>>> Parker
>>>
>

--
Lance Norskog
goks...@gmail.com
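As a footnote to Erick's point that the only real difference between master,
slave, and repeater is a bit of configuration, here is a minimal sketch of
what those solrconfig.xml variants typically look like with the stock
ReplicationHandler. The hostname (index-vip), port, core name, poll interval,
and confFiles list are placeholders for illustration, not values taken from
the thread:

    <!-- Master (or the master half of a repeater): publish the index
         after each commit and ship config files along with it. -->
    <requestHandler name="/replication" class="solr.ReplicationHandler">
      <lst name="master">
        <str name="replicateAfter">commit</str>
        <str name="confFiles">schema.xml,stopwords.txt</str>
      </lst>
    </requestHandler>

    <!-- Slave: poll the master (here, whatever the Index-vip resolves to)
         on a fixed interval. -->
    <requestHandler name="/replication" class="solr.ReplicationHandler">
      <lst name="slave">
        <str name="masterUrl">http://index-vip:8983/solr/core0/replication</str>
        <str name="pollInterval">00:10:00</str>
      </lst>
    </requestHandler>

    <!-- Repeater (HOST1/HOST2 in Parker's diagram): both sections in the
         same handler, so the host pulls from the master and re-publishes
         the index to its own slaves. -->
    <requestHandler name="/replication" class="solr.ReplicationHandler">
      <lst name="master">
        <str name="replicateAfter">commit</str>
      </lst>
      <lst name="slave">
        <str name="masterUrl">http://index-vip:8983/solr/core0/replication</str>
        <str name="pollInterval">00:10:00</str>
      </lst>
    </requestHandler>

Whether a repeater whose masterUrl resolves back to itself (Parker's
question 2) behaves sanely is something to test rather than assume; the
sketch only shows the shape of the configuration, not an answer to that
question.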