I have an HCI cluster running on Gluster storage. I exposed an NFS share into 
oVirt as a storage domain so that I could clone all of my VMs (I'm preparing to 
move physically to a new datacenter). I cloned 3-4 VMs without any problems 
yesterday. But this evening, I tried to clone a large VM, and it caused the 
disk to lock up. The VM became totally unresponsive, and I didn't see a way to 
cancel the clone. Nagios NRPE (on the client VM) was reporting a server load 
above 65, but I was never able to establish an SSH connection. 

Eventually, I tried restarting the ovirt-engine, per 
https://access.redhat.com/solutions/396753. When that didn't work, I powered 
down the VM completely. But the disks were still locked. So I then tried to put 
the storage domain into maintenance mode, but that wound up putting the entire 
domain into a "locked" state. Eventually, the disks unlocked, and I was able 
to power the VM back on.

From start to finish, my VM was down for about 45 minutes, including the time 
when NRPE was still sending data to Nagios.

What logs should I look at, and how can I troubleshoot what went wrong here 
and, hopefully, prevent it from happening again?

