Can you reply to my question?

Yaniv Dary
Technical Product Manager
Red Hat Israel Ltd.
34 Jerusalem Road
Building A, 4th floor
Ra'anana, Israel 4350109
Tel : +972 (9) 7692306
      8272306
Email: yd...@redhat.com
IRC : ydary


On Thu, May 26, 2016 at 9:14 AM, Yaniv Dary <yd...@redhat.com> wrote:

> What DR solution are you using?
>
> Yaniv Dary
> Technical Product Manager
> Red Hat Israel Ltd.
> 34 Jerusalem Road
> Building A, 4th floor
> Ra'anana, Israel 4350109
>
> Tel : +972 (9) 7692306
>       8272306
> Email: yd...@redhat.com
> IRC : ydary
>
>
> On Wed, Nov 25, 2015 at 1:15 PM, Simone Tiraboschi <stira...@redhat.com> wrote:
>
>> Adding Nir who knows it far better than me.
>>
>>
>> On Mon, Nov 23, 2015 at 8:37 PM, Duckworth, Douglas C <du...@tulane.edu> wrote:
>>
>>> Hello --
>>>
>>> Not sure if y'all can help with this issue we've been seeing with RHEV...
>>>
>>> On 11/13/2015, during a code upgrade of the Compellent SAN at our
>>> disaster recovery site, we failed over to the secondary SAN controller.
>>> Most virtual machines in our DR cluster resumed automatically after
>>> pausing, except VM "BADVM" on host "BADHOST."
>>>
>>> In engine.log you can see that BADVM was put into the "VM_PAUSED_EIO"
>>> state at 10:47:57:
>>>
>>> "VM BADVM has paused due to storage I/O problem."
>>>
>>> On this Red Hat Enterprise Virtualization Hypervisor 6.6
>>> (20150512.0.el6ev) host, two other VMs paused but then resumed
>>> automatically without system administrator intervention...
>>>
>>> In our DR cluster, 22 VMs also resumed automatically...
>>>
>>> None of these guest VMs are doing heavy I/O, as they are DR-site VMs
>>> that aren't currently doing anything.
>>>
>>> We sent this information to Dell. Their response:
>>>
>>> "The root cause may reside within your virtualization solution, not the
>>> parent OS (RHEV-Hypervisor disc) or Storage (Dell Compellent.)"
>>>
>>> We are doing this failover again on Sunday, November 29th, so we would
>>> like to know how to mitigate this issue, given that we have to manually
>>> resume any paused VMs that don't resume on their own.
>>>
>>> Before we initiated the SAN controller failover, all iSCSI paths to the
>>> targets were present on host tulhv2p03.
>>>
>>> The VM log on the host, /var/log/libvirt/qemu/badhost.log, shows that a
>>> storage error was reported:
>>>
>>> block I/O error in device 'drive-virtio-disk0': Input/output error (5)
>>> block I/O error in device 'drive-virtio-disk0': Input/output error (5)
>>> block I/O error in device 'drive-virtio-disk0': Input/output error (5)
>>> block I/O error in device 'drive-virtio-disk0': Input/output error (5)
>>>
>>> All disks used by this guest VM are provided by a single storage domain,
>>> COM_3TB4_DR, with serial "270." In syslog we do see that all paths for
>>> that storage domain failed:
>>>
>>> Nov 13 16:47:40 multipathd: 36000d310005caf000000000000000270: remaining
>>> active paths: 0
>>>
>>> though the paths recovered later:
>>>
>>> Nov 13 16:59:17 multipathd: 36000d310005caf000000000000000270: sdbg -
>>> tur checker reports path is up
>>> Nov 13 16:59:17 multipathd: 36000d310005caf000000000000000270: remaining
>>> active paths: 8
>>>
>>> Does anyone have an idea of why the VM would fail to resume
>>> automatically once the iSCSI paths used by its storage domain recovered?
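>>>
>>> Right now we resume the stragglers by hand from the Admin Portal. As a
>>> stopgap for Sunday I've been toying with a small script to run on the
>>> host after the failover. This is only a rough, untested sketch -- it
>>> assumes the libvirt Python bindings are installed on the hypervisor and
>>> that we can open a read-write connection to libvirtd there (on RHEV
>>> hosts libvirtd is managed by vdsm and normally requires SASL
>>> credentials, so this may not even be appropriate):
>>>
>>> import libvirt
>>>
>>> # Connect to the local libvirt daemon (may require SASL credentials on
>>> # a RHEV-H host, since vdsm configures libvirtd with authentication).
>>> conn = libvirt.open('qemu:///system')
>>>
>>> # Walk the domains that are currently paused.
>>> for dom in conn.listAllDomains(libvirt.VIR_CONNECT_LIST_DOMAINS_PAUSED):
>>>     state, reason = dom.state()
>>>     # Only touch guests paused because of an I/O error, i.e. the ones
>>>     # left stuck after the SAN controller failover.
>>>     if reason == libvirt.VIR_DOMAIN_PAUSED_IOERROR:
>>>         print('resuming %s' % dom.name())
>>>         dom.resume()
>>>
>>> conn.close()
>>>
>>> If there is a supported way to do the same thing through the engine
>>> instead, we'd rather do that.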
>>>
>>> Thanks
>>> Doug
>>>
>>> --
>>> Thanks
>>>
>>> Douglas Charles Duckworth
>>> Unix Administrator
>>> Tulane University
>>> Technology Services
>>> 1555 Poydras Ave
>>> NOLA -- 70112
>>>
>>> E: du...@tulane.edu
>>> O: 504-988-9341
>>> F: 504-988-8505
_______________________________________________
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users