I asked on the Dell Storage Forum and they recommended the following:

*I recommend not using a numeric value for the "no_path_retry" variable within /etc/multipath.conf, as once that numeric value is reached, if no healthy LUNs were discovered during that defined time, multipath will disable the I/O queue altogether.*
*I do recommend, however, changing the variable value from "12" (or even "60") to "queue", which will then allow multipathd to continue queueing I/O until a healthy LUN is discovered (the fail-over time between controllers) and I/O is allowed to flow once again.*

Can you see any issues with this recommendation as far as oVirt is concerned? Thanks again

*Gary Lloyd*
________________________________________________
I.T. Systems: Keele University
Finance & IT Directorate
Keele:Staffs:IC1 Building:ST5 5NB:UK
+44 1782 733063
________________________________________________

On 4 October 2016 at 19:11, Nir Soffer <nsof...@redhat.com> wrote:

> On Tue, Oct 4, 2016 at 10:51 AM, Gary Lloyd <g.ll...@keele.ac.uk> wrote:
>
>> Hi
>>
>> We have oVirt 3.6.5 with a Dell EqualLogic SAN and we use Direct LUNs
>> for all our VMs.
>> At the weekend during the early hours an EqualLogic controller failed
>> over to its standby on one of our arrays, and this caused about 20 of
>> our VMs to be paused due to I/O problems.
>>
>> I have also noticed that this happens during EqualLogic firmware
>> upgrades since we moved onto oVirt 3.6.5.
>>
>> As recommended by Dell, disk timeouts within the VMs are set to 60
>> seconds when they are hosted on an EqualLogic SAN.
>>
>> Is there any other timeout value that we can configure in vdsm.conf to
>> stop VMs from getting paused when a controller fails over?
>
> You can set the timeout in multipath.conf.
>
> With the current multipath configuration (deployed by vdsm), when all
> paths to a device are lost (e.g. you take down all ports on the server
> during an upgrade), all io will fail immediately.
>
> If you want to allow a 60-second grace period in such a case, you can
> configure:
>
>     no_path_retry 12
>
> This will continue to monitor the paths 12 times, once every 5 seconds
> (assuming polling_interval=5). If some path recovers during this time,
> the io can complete and the vm will not be paused.
> If no path is available after these retries, io will fail and vms with
> pending io will pause.
>
> Note that this will also cause delays in vdsm in various flows,
> increasing the chance of timeouts on the engine side, or delays in
> storage domain monitoring.
>
> However, the 60-second delay is expected only the first time all paths
> become faulty. Once the timeout has expired, any access to the device
> will fail immediately.
>
> To configure this, you must add the # VDSM PRIVATE tag on the second
> line of multipath.conf, otherwise vdsm will override your configuration
> the next time you run vdsm-tool configure.
>
> multipath.conf should look like this:
>
>     # VDSM REVISION 1.3
>     # VDSM PRIVATE
>
>     defaults {
>         polling_interval            5
>         no_path_retry               12
>         user_friendly_names         no
>         flush_on_last_del           yes
>         fast_io_fail_tmo            5
>         dev_loss_tmo                30
>         max_fds                     4096
>     }
>
>     devices {
>         device {
>             all_devs                yes
>             no_path_retry           12
>         }
>     }
>
> This will use a 12-retry (60-second) timeout for any device. If you
> would like to configure only your specific device, you can add a device
> section for your specific server instead.
>
>> Also is there anything that we can tweak to automatically unpause the
>> VMs once connectivity with the arrays is re-established?
>
> Vdsm will resume the vms when the storage monitor detects that storage
> became available again.
> However, we cannot guarantee that storage monitoring will detect that
> storage was down. This should be improved in 4.0.
>
>> At the moment we are running a customized version of storageServer.py,
>> as oVirt has yet to include iSCSI multipath support for Direct LUNs
>> out of the box.
>
> Would you like to share this code?
>
> Nir
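For reference, applying Dell's suggestion within the # VDSM PRIVATE scheme Nir describes would look something like the sketch below. This is only a sketch, not a tested configuration: the vendor/product strings are my assumption of what an EqualLogic array reports, and should be verified against the output of `multipath -ll` or `multipathd show config` on an actual host.

```
# VDSM REVISION 1.3
# VDSM PRIVATE

defaults {
    polling_interval            5
    user_friendly_names         no
    flush_on_last_del           yes
    fast_io_fail_tmo            5
    dev_loss_tmo                30
    max_fds                     4096
}

devices {
    device {
        # Assumed identifiers for a Dell EqualLogic array -- verify
        # these on your own hosts before relying on this section.
        vendor          "EQLOGIC"
        product         "100E-00"
        # "queue" queues I/O indefinitely instead of failing after a
        # fixed number of retries.
        no_path_retry   queue
    }
}
```

One caution, following directly from Nir's note above: with "queue" the I/O never fails, so if the array never comes back, anything touching the device (including vdsm flows and storage domain monitoring) can block indefinitely rather than erroring out after 60 seconds.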
_______________________________________________
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users