Just like that cat in a box. The observer needs to open the box to learn if the cat is alive. :-)
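On that note, one way to "open the box" earlier is to sweep the primary storage mount with qemu-img check so corruption is noticed before a customer calls. A rough sketch, assuming the NFS primary storage is mounted under /mnt/primary (adjust the path for your setup), and keeping in mind that qemu-img check can report spurious errors, or on newer QEMU with image locking refuse to open the file, when the image is in active use by a running VM:

  for img in /mnt/primary/*; do
      # skip anything that is not a qcow2 volume
      qemu-img info "$img" 2>/dev/null | grep -q 'file format: qcow2' || continue
      echo "== $img"
      # read-only consistency check; non-zero exit when errors or open failures occur
      qemu-img check "$img" || echo "WARNING: $img failed the check"
  done

It won't catch everything, but a sweep like this would at least have flagged the "Could not read snapshots: File too large" failure quoted below long before the VM was rebooted.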
‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Friday, 1 February 2019 22:25, Ivan Kudryavtsev <kudryavtsev...@bw-sw.com> wrote:

> Yes, only after the VM shutdown, the image is corrupted.
>
> Fri, 1 Feb 2019, 15:01 Sean Lair sl...@ippathways.com:
>
>> Hello,
>>
>> We are using NFS storage. It is actually native NFS mounts on a NetApp
>> storage system. We haven't seen those log entries, but we also don't always
>> know when a VM gets corrupted... When we finally get a call that a VM is
>> having issues, we've found that it was corrupted a while ago.
>>
>> -----Original Message-----
>> From: cloudstack-fan [mailto:cloudstack-...@protonmail.com.INVALID]
>> Sent: Sunday, January 27, 2019 1:45 PM
>> To: users@cloudstack.apache.org
>> Cc: d...@cloudstack.apache.org
>> Subject: Re: Snapshots on KVM corrupting disk images
>>
>> Hello Sean,
>>
>> It seems that you've encountered the same issue that I've been facing for
>> the last 5-6 years of using ACS with KVM hosts (see this thread if you're
>> interested in additional details:
>> https://mail-archives.apache.org/mod_mbox/cloudstack-users/201807.mbox/browser).
>>
>> I'd like to state that creating snapshots of a running virtual machine is a
>> bit risky. I've implemented some workarounds in my environment, but I'm
>> still not sure that they are 100% effective.
>>
>> I have a couple of questions, if you don't mind. What kind of storage do you
>> use, if it's not a secret? Does your storage use XFS as a filesystem? Did you
>> see something like this in your log files?
>> [***.***] XFS: qemu-kvm(***) possible memory allocation deadlock size 65552 in kmem_realloc (mode:0x250)
>> [***.***] XFS: qemu-kvm(***) possible memory allocation deadlock size 65552 in kmem_realloc (mode:0x250)
>> [***.***] XFS: qemu-kvm(***) possible memory allocation deadlock size 65552 in kmem_realloc (mode:0x250)
>> Did you see any unusual messages in your log files when the disaster happened?
>>
>> I hope things will be well. Wish you good luck and all the best!
>>
>> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
>> On Tuesday, 22 January 2019 18:30, Sean Lair <sl...@ippathways.com> wrote:
>>
>>> Hi all,
>>>
>>> We have had some instances where VM disks became corrupted when using KVM
>>> snapshots. We are running CloudStack 4.9.3 with KVM on CentOS 7.
>>>
>>> The first time was when someone mass-enabled scheduled snapshots on a
>>> large number of VMs and secondary storage filled up. We had to restore all
>>> those VM disks, but believed it was just our fault for letting secondary
>>> storage fill up.
>>>
>>> Today we had an instance where a snapshot failed and now the disk image is
>>> corrupted and the VM can't boot.
>>> Here is the output of some commands:
>>>
>>> --------------------------------------------------------------------------
>>>
>>> [root@cloudkvm02 c3be0ae5-2248-3ed6-a0c7-acffe25cc8d3]# qemu-img check ./184aa458-9d4b-4c1b-a3c6-23d28ea28e80
>>> qemu-img: Could not open './184aa458-9d4b-4c1b-a3c6-23d28ea28e80':
>>> Could not read snapshots: File too large
>>>
>>> [root@cloudkvm02 c3be0ae5-2248-3ed6-a0c7-acffe25cc8d3]# qemu-img info ./184aa458-9d4b-4c1b-a3c6-23d28ea28e80
>>> qemu-img: Could not open './184aa458-9d4b-4c1b-a3c6-23d28ea28e80':
>>> Could not read snapshots: File too large
>>>
>>> [root@cloudkvm02 c3be0ae5-2248-3ed6-a0c7-acffe25cc8d3]# ls -lh ./184aa458-9d4b-4c1b-a3c6-23d28ea28e80
>>> -rw-r--r--. 1 root root 73G Jan 22 11:04 ./184aa458-9d4b-4c1b-a3c6-23d28ea28e80
>>>
>>> --------------------------------------------------------------------------
>>>
>>> We tried restoring to before the snapshot failure, but still have strange errors:
>>>
>>> --------------------------------------------------------------------------
>>>
>>> [root@cloudkvm02 c3be0ae5-2248-3ed6-a0c7-acffe25cc8d3]# ls -lh ./184aa458-9d4b-4c1b-a3c6-23d28ea28e80
>>> -rw-r--r--. 1 root root 73G Jan 22 11:04 ./184aa458-9d4b-4c1b-a3c6-23d28ea28e80
>>>
>>> [root@cloudkvm02 c3be0ae5-2248-3ed6-a0c7-acffe25cc8d3]# qemu-img info ./184aa458-9d4b-4c1b-a3c6-23d28ea28e80
>>> image: ./184aa458-9d4b-4c1b-a3c6-23d28ea28e80
>>> file format: qcow2
>>> virtual size: 50G (53687091200 bytes)
>>> disk size: 73G
>>> cluster_size: 65536
>>> Snapshot list:
>>> ID  TAG                                   VM SIZE  DATE                 VM CLOCK
>>> 1   a8fdf99f-8219-4032-a9c8-87a6e09e7f95     3.7G  2018-12-23 11:01:43  3099:35:55.242
>>> 2   b4d74338-b0e3-4eeb-8bf8-41f6f75d9abd     3.8G  2019-01-06 11:03:16  3431:52:23.942
>>> Format specific information:
>>>     compat: 1.1
>>>     lazy refcounts: false
>>>
>>> [root@cloudkvm02 c3be0ae5-2248-3ed6-a0c7-acffe25cc8d3]# qemu-img check ./184aa458-9d4b-4c1b-a3c6-23d28ea28e80
>>> tcmalloc: large alloc 1539750010880 bytes == (nil) @ 0x7fb9cbbf7bf3 0x7fb9cbc19488 0x7fb9cb71dc56 0x55d16ddf1c77 0x55d16ddf1edc 0x55d16ddf2541 0x55d16ddf465e 0x55d16ddf8ad1 0x55d16de336db 0x55d16de373e6 0x7fb9c63a3c05 0x55d16ddd9f7d
>>> No errors were found on the image.
>>>
>>> [root@cloudkvm02 c3be0ae5-2248-3ed6-a0c7-acffe25cc8d3]# qemu-img snapshot -l ./184aa458-9d4b-4c1b-a3c6-23d28ea28e80
>>> Snapshot list:
>>> ID  TAG                                   VM SIZE  DATE                 VM CLOCK
>>> 1   a8fdf99f-8219-4032-a9c8-87a6e09e7f95     3.7G  2018-12-23 11:01:43  3099:35:55.242
>>> 2   b4d74338-b0e3-4eeb-8bf8-41f6f75d9abd     3.8G  2019-01-06 11:03:16  3431:52:23.942
>>>
>>> --------------------------------------------------------------------------
>>>
>>> Everyone is now extremely hesitant to use snapshots in KVM... We tried
>>> deleting the snapshots in the restored disk image, but it errors out.
>>>
>>> Does anyone else have issues with KVM snapshots? We are considering just
>>> disabling this functionality now...
>>>
>>> Thanks
>>> Sean
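For anyone who lands on this thread with a similar "errors out on delete" situation: the internal qcow2 snapshots listed by qemu-img snapshot -l can, in principle, be dropped one by one with qemu-img snapshot -d, and if that still fails the live state can be pulled out with qemu-img convert, which rewrites only the active layer and leaves the internal snapshots behind. This is only a sketch, done on a copy of the image with the VM shut down, and the file names are just the ones from Sean's output:

  # work on a copy so the original image stays untouched
  cp 184aa458-9d4b-4c1b-a3c6-23d28ea28e80 recovery.qcow2

  # try dropping the internal snapshots one by one, by tag or ID
  qemu-img snapshot -d a8fdf99f-8219-4032-a9c8-87a6e09e7f95 recovery.qcow2
  qemu-img snapshot -d b4d74338-b0e3-4eeb-8bf8-41f6f75d9abd recovery.qcow2

  # if deletion still errors out, rewrite the image; convert copies only
  # the active state, so the internal snapshots are not carried over
  qemu-img convert -O qcow2 recovery.qcow2 recovered-flat.qcow2
  qemu-img check recovered-flat.qcow2

No guarantees this helps when the qcow2 header itself is already damaged, but it has a chance when the problem sits in the snapshot table rather than in the data clusters.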