Hi Denis,
I'm wondering if it would be possible to extend the LVM SR to integrate
DRBD primary/secondary redundancy for a two-node installation.
Let me explain the context and a possible solution.
We've been using DRBD for some years on iSCSI SANs with Xen/XCP, having
two redundant iSCSI SANs and two Xen/XCP nodes.
This kind of setup works great; however, for smaller setups it is
overkill.
We used to do exactly the same, using two switches and multi-path to
deal with switch redundancy. Feeling it was overkill, we switched to a
similar setup to yours.
So we started to integrate DRBD directly on the XCP nodes (there are
some docs at Linbit about it). However, being a little paranoid about
split-brain scenarios, we have always used primary/secondary setups
(SR1 primary on XCP1 and SR2 primary on XCP2).
We are using a similar setup, but with dual primary. The two servers
are connected directly via a cable, so there is little chance of the two
being disconnected. We have only run into a split-brain problem once,
when we were doing something rather silly: using vdi-copy to migrate
virtual disks to a local software RAID5 array while running VMs on the
same array. In short, the whole machine stopped responding to the
network for minutes at a time while the copy took place. Fortunately we
were able to recover pretty easily. Lesson learned: stay away from
software RAID5 on the hypervisors.
This kind of setup has a big drawback: VMs with VDIs on SR1 have to run
on XCP1 and VMs with VDIs on SR2 have to run on XCP2. There is a loss of
flexibility and a loss of transparency for XCP admins.
Generally, we have not had a problem with dual primary - we use
live-migrate and run VMs on either node. When the hypervisors boot, we
make sure everything is connected, switch both to primary and plug in
the PBDs.
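For reference, the boot-time sequence for one resource looks roughly
like the sketch below; the resource name "r0" and the PBD UUID are
placeholders, and dual primary also needs allow-two-primaries set in
the DRBD net configuration.

    import subprocess

    def promote_and_plug(resource, pbd_uuid):
        # Sketch of our manual boot procedure for one DRBD resource
        subprocess.check_call(["drbdadm", "up", resource])            # attach disk, connect to peer
        subprocess.check_call(["drbdadm", "wait-connect", resource])  # block until the peer is reachable
        subprocess.check_call(["drbdadm", "primary", resource])       # promote this side
        subprocess.check_call(["xe", "pbd-plug", "uuid=" + pbd_uuid]) # bring the SR online on this host

    promote_and_plug("r0", "<pbd-uuid>")  # placeholder values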
Our plan for recovering from split-brain is to pick the host with the
most changed VDIs (the most data) as the new primary, use DRBD with an
external meta-disk to replicate the changed VDIs to the new primary,
then invalidate the data on the junked host and run a full re-sync. Not
fun, but recoverable as long as you know which VMs have been running on
which host and you keep an eye out for split-brain (there are hooks you
can use to notify you when things happen to DRBD).
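As an illustration, such a hook could be as simple as the sketch below,
assuming it is wired up as the split-brain handler in drbd.conf; the
mail addresses and subject line are placeholders.

    #!/usr/bin/env python
    # Minimal split-brain notification hook (sketch). DRBD exports the
    # resource name to handler scripts in the DRBD_RESOURCE variable.
    import os
    import smtplib
    from email.mime.text import MIMEText

    resource = os.environ.get("DRBD_RESOURCE", "unknown")
    msg = MIMEText("DRBD split-brain detected on resource %s" % resource)
    msg["Subject"] = "DRBD split-brain: %s" % resource  # placeholder subject
    msg["From"] = "drbd@localhost"                       # placeholder address
    msg["To"] = "root@localhost"                         # placeholder address

    s = smtplib.SMTP("localhost")
    s.sendmail(msg["From"], [msg["To"]], msg.as_string())
    s.quit()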
So I want to extend the LVM SR to integrate DRBD primary/secondary, and
I'd like some input on this kind of scenario:
* On each LV, create a DRBD resource; when a VBD is brought up, the
  DRBD resource is switched to primary.
* When migrating a VM to the second node, switch DRBD on the first node
  to secondary, switch DRBD on the second node to primary, and get on
  with resuming the VM.
* When a VM is brought down, the PBD is brought down and the DRBD
  resource is switched to secondary (roughly sketched below).
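As a sketch only (the per-VDI resource names and the exact hook points
in the SR backend are assumptions at this stage), the host-side role
switching would look something like this:

    import subprocess

    def _drbdadm(action, resource):
        # Thin wrapper around the drbdadm CLI
        subprocess.check_call(["drbdadm", action, resource])

    def on_vbd_attach(resource):
        # VBD brought up on this host: promote the backing DRBD resource
        _drbdadm("primary", resource)

    def on_vbd_detach(resource):
        # VM brought down: demote so the peer can be promoted later
        _drbdadm("secondary", resource)

    def on_migrate_out(resource):
        # Source side of a migration: demote here; the destination host
        # runs on_vbd_attach() before resuming the VM
        _drbdadm("secondary", resource)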
My understanding is the following: an LVM SR is effectively an LVM
volume group attached via a PBD. If you unplug the PBD when shutting
down the VM, you are taking the whole SR offline. If you had one PBD per
VM, you would need one SR per VDI, and therefore one VG per VDI, which
wouldn't work: you wouldn't be able to do snapshots, resize VDIs, etc.
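For illustration, the SR/PBD/VDI relationship is visible over the
XenAPI; a sketch with the Python bindings, where the SR name-label is a
placeholder:

    import XenAPI

    session = XenAPI.xapi_local()
    session.xenapi.login_with_password("root", "")
    try:
        # "lvm-sr" is a placeholder name-label for this sketch
        sr = session.xenapi.SR.get_by_name_label("lvm-sr")[0]
        pbds = session.xenapi.SR.get_PBDs(sr)  # one PBD per host attaching the SR
        vdis = session.xenapi.SR.get_VDIs(sr)  # many VDIs (LVs) in the one VG
        print("SR has %d PBD(s) and %d VDI(s)" % (len(pbds), len(vdis)))
    finally:
        session.xenapi.session.logout()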
What you might do (which is what I think you're angling at) is the
following:
Both hosts handle their own LVM VG and corresponding LVs. Every time you
create a VDI, each host would create the LV locally, set up a DRBD
resource and start syncing. Both VDIs could sit in "Secondary" mode
while the VM was down, and would only switch to primary while the VM was
running on that particular host.
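A rough sketch of what VDI creation might then do on each host; the
names are placeholders and it assumes the matching DRBD resource stanza
already exists on both hosts:

    import subprocess

    def create_mirrored_vdi(vg, lv_name, size_mb, resource):
        # Create the local LV that will back this VDI
        subprocess.check_call(
            ["lvcreate", "-L", "%dM" % size_mb, "-n", lv_name, vg])
        # Initialise DRBD metadata and bring the resource up
        subprocess.check_call(["drbdadm", "create-md", resource])
        subprocess.check_call(["drbdadm", "up", resource])
        # The resource stays Secondary until a VM starts on this host;
        # the peer performs the same steps and the initial sync begins
        # once one side is promoted (e.g. forced primary on creation).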
Provisions would have to be made for keeping the DRBD/LVM config
consistent across the two hosts. Most of the DRBD config (sync rates,
protocol, passwords, data-integrity algorithm, etc.) could be stored in
the SR config, but one would need to be able to re-build a whole SR
should a disk on one of the hosts fail.
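For example, the shared settings could sit in the SR's sm-config and be
rendered into a per-resource stanza on both hosts. The key names below
are invented for the sketch; only the DRBD options themselves (protocol,
sync rate, shared-secret, data-integrity-alg) are real:

    # Hypothetical sm-config keys -- invented for this sketch
    sm_config = {
        "drbd-protocol": "C",
        "drbd-sync-rate": "40M",
        "drbd-shared-secret": "changeme",
        "drbd-integrity-alg": "crc32c",
    }

    def render_resource(name, minor, lv_path, local, peer, cfg):
        # Render a DRBD 8.3-style resource stanza from the SR-level
        # settings; local and peer are (hostname, "ip:port") tuples
        return """resource %s {
      protocol %s;
      syncer { rate %s; }
      net {
        cram-hmac-alg sha1;
        shared-secret "%s";
        data-integrity-alg %s;
      }
      on %s { device /dev/drbd%d; disk %s; address %s; meta-disk internal; }
      on %s { device /dev/drbd%d; disk %s; address %s; meta-disk internal; }
    }""" % (name, cfg["drbd-protocol"], cfg["drbd-sync-rate"],
            cfg["drbd-shared-secret"], cfg["drbd-integrity-alg"],
            local[0], minor, lv_path, local[1],
            peer[0], minor, lv_path, peer[1])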
How would you deal with split-brain? If XCP2 appeared down, would XCP1
put a VDI into primary without knowing the state of the VDI on XCP2?
Please feel free to correct me should my understanding be at fault in
any way.
That would make for a lot of DRBD resources when accounting for
snapshots and so on, but if it could be done, it would be a tremendous
addition for smaller SMB setups.
So far we have only thought about a two-node setup, but I suppose you
could go further. You would need a system for keeping track of which VDI
was mirrored on which host, automatic re-distribution of VDIs to another
host in the pool should one fail (or be removed from the pool), each VM
would be limited to two hosts in the pool (so you may run into problems
with multiple host failures), and each host would have to have a large
amount of local storage. There are some limitations, but it might be
workable.
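Something like the sketch below (the structure and names are invented)
might be a starting point for tracking which VDI is mirrored on which
pair of hosts and deciding what to rebuild after a host failure:

    # Sketch of a VDI placement map for a pool of more than two hosts.
    # In practice this would have to live somewhere shared (e.g. the
    # pool database or the SR's other-config); the data is made up.
    placement = {
        # vdi_uuid: (host_a, host_b) holding the two DRBD replicas
        "vdi-1234": ("xcp1", "xcp2"),
        "vdi-5678": ("xcp2", "xcp3"),
    }

    def vdis_needing_rebuild(failed_host, placement):
        # VDIs that lost a replica when failed_host died and must be
        # re-mirrored onto another host in the pool
        return [vdi for vdi, pair in placement.items() if failed_host in pair]

    print(vdis_needing_rebuild("xcp2", placement))  # -> ['vdi-1234', 'vdi-5678']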
My two cents!
I'd be glad to have some input from the devs if possible. By the way,
kudos to the devs for XCP 1.6, it really rocks.
Good to hear - thanks to the devs from this corner too. I'm looking
forward to playing with the new features soon.
Regards,
Tim
_______________________________________________
Xen-api mailing list
[email protected]
http://lists.xen.org/cgi-bin/mailman/listinfo/xen-api