Hey Matthew, I think it's VDSM that handles the pausing & resuming of the VMs.
An analogous, small-scale scenario: the Gluster layer for one of our smaller oVirt clusters temporarily lost quorum the other week, locking all I/O for about 30 minutes. The VMs all paused and then resumed automatically when quorum was restored. To my surprise (and relief), not a single one of the 10-odd VMs reported any errors.

YMMV

Doug

On 6 June 2017 at 13:45, Matthew Trent <matthew.tr...@lewiscountywa.gov> wrote:
> Thanks for the replies, all!
>
> Yep, Chris is right. TrueNAS HA is active/passive and there isn't a way
> around that when failing between heads.
>
> Sven: In my experience with iX support, they have directed me to reboot
> the active node to initiate failover. There are "hactl takeover" and "hactl
> giveback" commands, but reboot seems to be their preferred method.
>
> VMs going into a paused state and resuming when storage is back online
> sounds great. As long as oVirt's pause/resume isn't significantly slower
> than the 30-or-so seconds the TrueNAS takes to complete its failover,
> that's a pretty tolerable interruption for my needs. So my next questions
> are:
>
> 1) Assuming the SAN failover DOES work correctly, can anyone comment on
> their experience with oVirt pausing/thawing VMs in an NFS-based
> active/passive SAN failover scenario? Does it work reliably without
> intervention? Is it reasonably fast?
>
> 2) Is there anything else in the oVirt stack that might cause it to "freak
> out" rather than gracefully pause/unpause VMs?
>
> 2a) Particularly: I'm running hosted engine on the same TrueNAS storage.
> Does that change anything WRT timeouts and oVirt's HA and fencing and
> sanlock and such?
>
> 2b) Is there a limit to how long oVirt will wait for storage before doing
> something more drastic than just pausing VMs?
>
> --
> Matthew Trent
> Network Engineer
> Lewis County IT Services
> 360.740.1247 - Helpdesk
> 360.740.3343 - Direct line
>
> ________________________________________
> From: users-boun...@ovirt.org <users-boun...@ovirt.org> on behalf of
> Chris Adams <c...@cmadams.net>
> Sent: Tuesday, June 6, 2017 7:21 AM
> To: users@ovirt.org
> Subject: Re: [ovirt-users] Seamless SAN HA failovers with oVirt?
>
> Once upon a time, Juan Pablo <pablo.localh...@gmail.com> said:
> > Chris, if you have active-active with multipath: you upgrade one system,
> > reboot it, check it came active again, then upgrade the other.
>
> Yes, but that's still not how a TrueNAS (and most other low- to
> mid-range SANs) works, so it's not relevant. The TrueNAS only has a
> single active node talking to the hard drives at a time, because having
> two nodes talking to the same storage at the same time is a hard problem
> to solve (it typically requires custom hardware with active cache coherency
> and such).
>
> You can (and should) use multipath between servers and a TrueNAS. That
> protects against NIC, cable, and switch failures, but does not help
> with a controller failure/reboot/upgrade. Multipath is also used to
> provide better bandwidth sharing between links than Ethernet LAGs.
>
> --
> Chris Adams <c...@cmadams.net>
> _______________________________________________
> Users mailing list
> Users@ovirt.org
> http://lists.ovirt.org/mailman/listinfo/users

-- 
Doug