SAN vendors make high-priced, super-fast shared file system hardware. They don't use NFS; usually they provide a kernel drop-in file system.
On 4/14/11, Parker Johnson <parker_john...@gap.com> wrote:
>
> Otis and Erick,
>
> Thanks for the responses and for thinking over my potential scenarios.
>
> The big draw for me on the 2-repeater idea is that I can:
>
> 1. Maximize my hardware. I don't need a standby master. Instead, I can
> use the "second" repeater to field customer requests.
> 2. After a primary repeater failure, I neither need to fumble with
> multiple solrconfig.xml edits (we're also using cores) nor worry about
> manually replicating or copying indexes around.
>
> In a sense, although perhaps not by design, a repeater solves those
> problems.
>
> We considered centralized storage and a standby master with access to a
> shared filesystem, but what are you using for a shared filesystem? (NFS?
> Egh...)
>
> -Parker
>
> On 4/12/11 6:19 PM, "Erick Erickson" <erickerick...@gmail.com> wrote:
>
>> I think the repeaters are misleading you a bit here. The purpose of a
>> repeater is usually to replicate across a slow network, say to a remote
>> data center, so that slaves at that center can get more timely updates.
>> I don't think they add anything to your disaster recovery scenario.
>>
>> So I'll ignore repeaters for a bit here. The only difference between a
>> master and a slave is a bit of configuration, and usually you'll
>> allocate, say, memory differently on the two machines when you start
>> the JVM. You might disable caches on the master (since they're used for
>> searching). You may...
>>
>> Let's say I have master M and slaves S1, S2, S3. The slaves have an
>> up-to-date index as of the last replication (just like your repeater
>> would have). If any slave goes down, you can simply bring up another
>> machine as a slave, point it at your master, wait for replication on
>> that slave, and then let your load balancer know it's there. This is
>> the HOST2-4 failure you outlined.
>>
>> Should the master fail, you have two choices, depending upon how long
>> you can wait for *new* content to be searchable. Let's say you can wait
>> half a day in this situation. Spin up a new machine and copy the index
>> over from one of the slaves (via a simple copy or by replicating).
>> Point your indexing process at the new master, point your slaves at it
>> for replication, and you're done.
>>
>> Let's say you can't wait very long at all (and remember, this had
>> better be quite a rare event). Then you could take a slave (let's say
>> S1) out of the loop that serves searches. Copy in the configuration
>> files you use for your masters, point the indexer and searchers at it,
>> and you're done. Now spin up a new slave as above and your old
>> configuration is back.
>>
>> Note that in two of these cases, you temporarily have 2 slaves doing
>> the work that 3 used to, so a bit of over-capacity may be in order.
>>
>> But a really good question here is how to be sure all your data is in
>> your index. After all, the slaves (and the repeater, for that matter)
>> are only current up to the last replication. The simplest thing to do
>> is to re-index everything from the last known commit point. Assuming
>> you have a <uniqueKey> defined, if you index documents that are already
>> in the index, they'll just be replaced, no harm done. So let's say your
>> replication interval is 10 minutes (picking a number from thin air).
>> When your system is back and you restart your indexer, restart indexing
>> from, say, the time you noticed your master went down minus 1 hour.
>> You can be more deterministic than this by examining the log on the
>> machine you're using to replace the master, noting the last replication
>> time, and subtracting your hour (or whatever) from that.
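The replication handler can also report this over HTTP, which may be easier
than digging through logs. A minimal sketch, assuming the stock /replication
handler is enabled; the host, port, and core name are placeholders, and the
exact field names in the response vary by Solr version:

    # Replication details for a slave or repeater: index version/generation
    # plus information about the most recent replication cycle.
    curl 'http://host2:8983/solr/core0/replication?command=details'

    # Quick catch-up check: compare a host's index version and generation
    # against the master's before putting it back behind the vip.
    curl 'http://host2:8983/solr/core0/replication?command=indexversion'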
>> Anyway, hope I haven't confused you unduly! The take-away is that a
>> slave can be made into a master as fast as a repeater can, the
>> replication process is the same, and I just don't see what a repeater
>> buys you in the scenario you described.
>>
>> Best
>> Erick
>>
>>
>> On Tue, Apr 12, 2011 at 6:33 PM, Parker Johnson
>> <parker_john...@gap.com> wrote:
>>
>>>
>>> I am hoping to get some feedback on the architecture I've been
>>> planning for a medium to high volume site. This is my first time
>>> working with Solr, so I want to be sure what I'm planning isn't
>>> totally weird, unsupported, etc.
>>>
>>> We've got a pair of F5 load balancers and 4 hosts. 2 of those hosts
>>> will be repeaters (master+slave), and 2 of those hosts will be pure
>>> slaves. One of the F5 vips, "Index-vip", will have members HOST1 and
>>> HOST2, but HOST2 will be "downed" and not taking traffic from that
>>> vip. The second vip, "Search-vip", will have 3 members: HOST2, HOST3,
>>> and HOST4. The "Index-vip" is intended to be used to post and commit
>>> index changes. The "Search-vip" is intended to be customer facing.
>>>
>>> Here is some ASCII art. The line with the "X"s through it denotes a
>>> "downed" member of a vip, one that isn't taking any traffic. The "M:"
>>> denotes the value in the solrconfig.xml that the host uses as the
>>> master.
>>>
>>>
>>>         Index-vip            Search-vip
>>>          /      \            /    |    \
>>>         /        X          /     |     \
>>>        /          \        /      |      \
>>>       /            X      /       |       \
>>>      /              \    /        |        \
>>>     /                X  /         |         \
>>>    /                  \/          |          \
>>>  HOST1              HOST2        HOST3       HOST4
>>>  REPEATER           REPEATER     SLAVE       SLAVE
>>>  M:Index-vip        M:Index-vip  M:Index-vip M:Index-vip
>>>
>>>
>>> I've been working through a couple of failure scenarios. Recovering
>>> from a failure of HOST2, HOST3, or HOST4 is pretty straightforward.
>>> Losing HOST1 is my major concern. My plan for recovering from a
>>> failure of HOST1 is as follows: enable HOST2 as a member of the
>>> Index-vip while disabling member HOST1. HOST2 effectively becomes the
>>> master. HOST2, 3, and 4 continue fielding customer requests and
>>> pulling indexes from "Index-vip". Since HOST2 is now in charge of
>>> crunching indexes and fielding customer requests, I assume load will
>>> increase on that box.
>>>
>>> When we recover HOST1, we will simply make sure it has replicated
>>> against "Index-vip" and then re-enable HOST1 as a member of the
>>> Index-vip and disable HOST2.
>>>
>>> Hopefully this makes sense. If all goes correctly, I've managed to
>>> keep all services up and running without losing any index data.
>>>
>>> So, I have a few questions:
>>>
>>> 1. Has anyone else tried this dual-repeater approach?
>>> 2. Am I going to have any semaphore/blocking issues if a repeater is
>>> pulling index data from itself?
>>> 3. Is there a better way to do this?
>>>
>>>
>>> Thanks,
>>> Parker
>>>
>

--
Lance Norskog
goks...@gmail.com
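As a footnote to Erick's point that the only real difference between master,
slave, and repeater is a bit of configuration, here is a minimal sketch of
what those solrconfig.xml variants typically look like with the stock
ReplicationHandler. The hostname (index-vip), port, core name, poll interval,
and confFiles list are placeholders for illustration, not values taken from
the thread:

    <!-- Master (or the master half of a repeater): publish the index
         after each commit and ship config files along with it. -->
    <requestHandler name="/replication" class="solr.ReplicationHandler">
      <lst name="master">
        <str name="replicateAfter">commit</str>
        <str name="confFiles">schema.xml,stopwords.txt</str>
      </lst>
    </requestHandler>

    <!-- Slave: poll the master (here, whatever the Index-vip resolves to)
         on a fixed interval. -->
    <requestHandler name="/replication" class="solr.ReplicationHandler">
      <lst name="slave">
        <str name="masterUrl">http://index-vip:8983/solr/core0/replication</str>
        <str name="pollInterval">00:10:00</str>
      </lst>
    </requestHandler>

    <!-- Repeater (HOST1/HOST2 in Parker's diagram): both sections in the
         same handler, so the host pulls from the master and re-publishes
         the index to its own slaves. -->
    <requestHandler name="/replication" class="solr.ReplicationHandler">
      <lst name="master">
        <str name="replicateAfter">commit</str>
      </lst>
      <lst name="slave">
        <str name="masterUrl">http://index-vip:8983/solr/core0/replication</str>
        <str name="pollInterval">00:10:00</str>
      </lst>
    </requestHandler>

Whether a repeater whose masterUrl resolves back to itself (Parker's
question 2) behaves sanely is something to test rather than assume; the
sketch only shows the shape of the configuration, not an answer to that
question.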