We see exactly the same behaviour, and it does not seem to be vendor dependent:

- EqualLogic controller failover -> VMs get paused and maybe unpaused, but most don't (those we end up resuming by hand; a sketch follows below this list)
- Nexenta ZFS iSCSI with RSF-1 HA -> same
- FreeBSD ctld iSCSI target + Heartbeat -> same
- CentOS + iscsi-target + Heartbeat -> same
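By "resuming by hand" I mean unsticking the paused domains directly on the host. A minimal sketch of how that can be scripted, assuming the libvirt Python bindings are installed on the hypervisor; note that resuming domains behind the engine's back is a workaround, not an oVirt-sanctioned procedure:

import libvirt

# Connect to the local libvirtd and resume every paused qemu domain.
conn = libvirt.open('qemu:///system')
for dom in conn.listAllDomains():
    state, reason = dom.state()
    if state == libvirt.VIR_DOMAIN_PAUSED:
        print('resuming %s (pause reason code %d)' % (dom.name(), reason))
        dom.resume()  # same effect as "virsh resume <domain>"
conn.close()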
Multipath settings are, where available, modified to match the best practices supplied by the vendor. On the open-source solutions we started with known-working multipath/iSCSI settings, and by now nearly every possible combination has been tested, without much success. To me it looks like oVirt/RHEV is way too sensitive to iSCSI interruptions, and it feels like gambling what the engine might do to your VM (or not). If it helps for the 29th: the manual resuming can at least be scripted against the engine API; see the sketch at the end of this mail.

On 11/23/2015 at 8:37 PM, Duckworth, Douglas C wrote:
> Hello --
>
> Not sure if y'all can help with this issue we've been seeing with RHEV...
>
> On 11/13/2015, during a code upgrade of the Compellent SAN at our
> Disaster Recovery site, we failed over to the secondary SAN controller.
> Most virtual machines in our DR cluster resumed automatically after
> pausing, except VM "BADVM" on host "BADHOST."
>
> In engine.log you can see that BADVM was sent into the "VM_PAUSED_EIO"
> state at 10:47:57:
>
> "VM BADVM has paused due to storage I/O problem."
>
> On this Red Hat Enterprise Virtualization Hypervisor 6.6
> (20150512.0.el6ev) host, two other VMs paused but then resumed
> automatically without system administrator intervention...
>
> In our DR cluster, 22 VMs also resumed automatically...
>
> None of these guest VMs are engaged in high I/O, as they are DR-site
> VMs not currently doing anything.
>
> We sent this information to Dell. Their response:
>
> "The root cause may reside within your virtualization solution, not the
> parent OS (RHEV-Hypervisor disc) or Storage (Dell Compellent.)"
>
> We are doing this failover again on Sunday, November 29th, so we would
> like to know how to mitigate this issue, given that we have to manually
> resume paused VMs that don't resume automatically.
>
> Before we initiated the SAN controller failover, all iSCSI paths to
> targets were present on host tulhv2p03.
>
> The VM logs on the host, /var/log/libvirt/qemu/badhost.log, show that a
> storage error was reported:
>
> block I/O error in device 'drive-virtio-disk0': Input/output error (5)
> block I/O error in device 'drive-virtio-disk0': Input/output error (5)
> block I/O error in device 'drive-virtio-disk0': Input/output error (5)
> block I/O error in device 'drive-virtio-disk0': Input/output error (5)
>
> All disks used by this guest VM are provided by the single storage
> domain COM_3TB4_DR with serial "270." In syslog we do see that all
> paths for that storage domain failed:
>
> Nov 13 16:47:40 multipathd: 36000d310005caf000000000000000270: remaining
> active paths: 0
>
> though they recovered later:
>
> Nov 13 16:59:17 multipathd: 36000d310005caf000000000000000270: sdbg -
> tur checker reports path is up
> Nov 13 16:59:17 multipathd: 36000d310005caf000000000000000270: remaining
> active paths: 8
>
> Does anyone have an idea why the VM would fail to resume automatically
> if the iSCSI paths used by its storage domain recovered?
>
> Thanks
> Doug
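As for scripting the resume step mentioned above: a rough sketch against the RHEV 3.x REST API. The engine URL and credentials are placeholders, and it is an assumption on my part that the VM search accepts 'status = paused' and that the "start" action cleanly resumes an EIO-paused VM -- that last part is exactly the gamble described above:

import requests
import xml.etree.ElementTree as ET

API = 'https://engine.example.com/api'   # RHEV 3.x API root -- placeholder
AUTH = ('admin@internal', 'secret')      # placeholder credentials

# Ask the engine for all VMs it currently reports as paused.
resp = requests.get(API + '/vms', params={'search': 'status = paused'},
                    auth=AUTH, verify=False)
resp.raise_for_status()

for vm in ET.fromstring(resp.content).findall('vm'):
    print('starting paused VM %s' % vm.findtext('name'))
    # "start" is the REST action used to run a VM; the assumption here
    # is that it also resumes a paused one.
    requests.post('%s/vms/%s/start' % (API, vm.get('id')),
                  auth=AUTH, verify=False, data='<action/>',
                  headers={'Content-Type': 'application/xml'})

Running something like this in a loop during the failover window would at least replace the manual clicking.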