----- Original Message -----
> From: "Giuseppe Ragusa" <giuseppe.rag...@hotmail.com>
> To: fsimo...@redhat.com
> Cc: users@ovirt.org
> Sent: Wednesday, May 21, 2014 5:15:30 PM
> Subject: sanlock + gluster recovery -- RFE
>
> Hi,
>
> > ----- Original Message -----
> > > From: "Ted Miller" <tmiller at hcjb.org>
> > > To: "users" <users at ovirt.org>
> > > Sent: Tuesday, May 20, 2014 11:31:42 PM
> > > Subject: [ovirt-users] sanlock + gluster recovery -- RFE
> > >
> > > As you are aware, there is an ongoing split-brain problem with running
> > > sanlock on replicated gluster storage. Personally, I believe that this
> > > is the 5th time that I have been bitten by this sanlock+gluster
> > > problem.
> > >
> > > I believe that the following are true (if not, my entire request is
> > > probably off base):
> > >
> > > * ovirt uses sanlock in such a way that when the sanlock storage is on
> > >   a replicated gluster file system, very small storage disruptions can
> > >   result in a gluster split-brain on the sanlock space
> >
> > Although this is possible (at the moment), we are working hard to avoid
> > it. The hardest part here is to ensure that the gluster volume is
> > properly configured.
> >
> > The suggested configuration for a volume to be used with ovirt is:
> >
> > Volume Name: (...)
> > Type: Replicate
> > Volume ID: (...)
> > Status: Started
> > Number of Bricks: 1 x 3 = 3
> > Transport-type: tcp
> > Bricks:
> > (...three bricks...)
> > Options Reconfigured:
> > network.ping-timeout: 10
> > cluster.quorum-type: auto
> >
> > The two options ping-timeout and quorum-type are really important.
> >
> > You would also need a build where this bug is fixed in order to avoid
> > any chance of a split-brain:
> >
> > https://bugzilla.redhat.com/show_bug.cgi?id=1066996
>
> It seems that the aforementioned bug is peculiar to 3-brick setups.
>
> I understand that a 3-brick setup can allow proper quorum formation
> without resorting to the "first-configured-brick-has-more-weight"
> convention used with only 2 bricks and quorum "auto" (which makes one
> node "special", so not properly any-single-fault tolerant).

Correct.
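For reference, the two important options from the suggested configuration
above can be applied to an existing volume with the gluster CLI; a minimal
sketch, where VOLNAME is a placeholder for the actual volume name:

  gluster volume set VOLNAME network.ping-timeout 10
  gluster volume set VOLNAME cluster.quorum-type auto

Afterwards, "gluster volume info VOLNAME" should list both options under
"Options Reconfigured", as in the output quoted above.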
> But, since we are on ovirt-users, is there a similar suggested
> configuration for a 2-host oVirt+GlusterFS setup with oVirt-side power
> management properly configured and tested working?
>
> I mean a configuration where "any" host can go south and oVirt (through
> the other one) fences it (forcibly powering it off, with confirmation
> from IPMI or similar), then restarts the HA-marked VMs that were running
> there, all the while keeping the underlying GlusterFS-based storage
> domains responsive and readable/writeable (maybe apart from a lapse
> between detecting the other node's unresponsiveness and confirming the
> fencing)?

We already had a discussion with gluster asking if it was possible to add
fencing to the replica 2 quorum/consistency mechanism.

The idea is that as soon as you can't replicate a write you have to freeze
all I/O until either the connection is re-established or you know that the
other host has been killed.

Adding Vijay.

--
Federico
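To make the proposed semantics concrete, here is a minimal sketch in
Python of the freeze-until-fenced behaviour; every helper in it is a
hypothetical stand-in (gluster exposes no such API), and the real
mechanism would live inside the replication layer itself:

  import time

  # Minimal sketch of the freeze-until-fenced idea discussed above; this
  # is not gluster code. The helpers are hypothetical placeholders: in a
  # real system they would query the replication layer, the peer link,
  # and the fencing agent (e.g. IPMI) respectively.

  def replicate_write(data) -> bool:
      return True   # placeholder: write data to both replicas

  def connection_restored() -> bool:
      return False  # placeholder: is the peer reachable again?

  def peer_confirmed_fenced() -> bool:
      return False  # placeholder: has fencing confirmed power-off?

  def local_write(data) -> bool:
      return True   # placeholder: write to the surviving replica only

  def guarded_write(data) -> bool:
      """On replication failure, freeze this I/O until the peer either
      reconnects or is confirmed fenced; never write one-sided otherwise."""
      if replicate_write(data):
          return True                       # normal case: both replicas updated
      while True:                           # freeze: neither fail nor diverge
          if connection_restored():
              return replicate_write(data)  # link is back, replicate normally
          if peer_confirmed_fenced():
              return local_write(data)      # peer is known dead, safe to proceed
          time.sleep(0.5)

The point of the loop is that a replica-2 cluster never acknowledges a
one-sided write while the fate of the other host is unknown, which is
exactly the window where split-brain would otherwise arise.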