Hi all,

We have had some instances where VM disks become corrupted when using KVM
snapshots. We are running CloudStack 4.9.3 with KVM on CentOS 7.

The first time was when someone mass-enabled scheduled snapshots on a large
number of VMs and secondary storage filled up. We had to restore all of those
VM disks, but we believed that was our own fault for letting secondary storage
fill up.

Today we had an instance where a snapshot failed, and now the disk image is
corrupted and the VM can't boot. Here is the output of some commands:

-----------------------
[root@cloudkvm02 c3be0ae5-2248-3ed6-a0c7-acffe25cc8d3]# qemu-img check ./184aa458-9d4b-4c1b-a3c6-23d28ea28e80
qemu-img: Could not open './184aa458-9d4b-4c1b-a3c6-23d28ea28e80': Could not read snapshots: File too large

[root@cloudkvm02 c3be0ae5-2248-3ed6-a0c7-acffe25cc8d3]# qemu-img info ./184aa458-9d4b-4c1b-a3c6-23d28ea28e80
qemu-img: Could not open './184aa458-9d4b-4c1b-a3c6-23d28ea28e80': Could not read snapshots: File too large

[root@cloudkvm02 c3be0ae5-2248-3ed6-a0c7-acffe25cc8d3]# ls -lh ./184aa458-9d4b-4c1b-a3c6-23d28ea28e80
-rw-r--r--. 1 root root 73G Jan 22 11:04 ./184aa458-9d4b-4c1b-a3c6-23d28ea28e80
-----------------------

We tried restoring the image to its state before the snapshot failure, but we
still see strange errors:

----------------------
[root@cloudkvm02 c3be0ae5-2248-3ed6-a0c7-acffe25cc8d3]# ls -lh ./184aa458-9d4b-4c1b-a3c6-23d28ea28e80
-rw-r--r--. 1 root root 73G Jan 22 11:04 ./184aa458-9d4b-4c1b-a3c6-23d28ea28e80

[root@cloudkvm02 c3be0ae5-2248-3ed6-a0c7-acffe25cc8d3]# qemu-img info ./184aa458-9d4b-4c1b-a3c6-23d28ea28e80
image: ./184aa458-9d4b-4c1b-a3c6-23d28ea28e80
file format: qcow2
virtual size: 50G (53687091200 bytes)
disk size: 73G
cluster_size: 65536
Snapshot list:
ID        TAG                 VM SIZE                DATE       VM CLOCK
1         a8fdf99f-8219-4032-a9c8-87a6e09e7f95   3.7G 2018-12-23 11:01:43   3099:35:55.242
2         b4d74338-b0e3-4eeb-8bf8-41f6f75d9abd   3.8G 2019-01-06 11:03:16   3431:52:23.942
Format specific information:
    compat: 1.1
    lazy refcounts: false

[root@cloudkvm02 c3be0ae5-2248-3ed6-a0c7-acffe25cc8d3]# qemu-img check ./184aa458-9d4b-4c1b-a3c6-23d28ea28e80
tcmalloc: large alloc 1539750010880 bytes == (nil) @  0x7fb9cbbf7bf3 0x7fb9cbc19488 0x7fb9cb71dc56 0x55d16ddf1c77 0x55d16ddf1edc 0x55d16ddf2541 0x55d16ddf465e 0x55d16ddf8ad1 0x55d16de336db 0x55d16de373e6 0x7fb9c63a3c05 0x55d16ddd9f7d
No errors were found on the image.

[root@cloudkvm02 c3be0ae5-2248-3ed6-a0c7-acffe25cc8d3]# qemu-img snapshot -l ./184aa458-9d4b-4c1b-a3c6-23d28ea28e80
Snapshot list:
ID        TAG                 VM SIZE                DATE       VM CLOCK
1         a8fdf99f-8219-4032-a9c8-87a6e09e7f95   3.7G 2018-12-23 11:01:43   3099:35:55.242
2         b4d74338-b0e3-4eeb-8bf8-41f6f75d9abd   3.8G 2019-01-06 11:03:16   3431:52:23.942
--------------------------
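
For a sense of scale on the tcmalloc line: qemu-img check asked for
1539750010880 bytes, and the "== (nil)" appears to mean that allocation failed,
yet check still reported no errors. This small bash sketch only does the
arithmetic with the numbers copied from the output above; the to_gib helper is
hypothetical and just for illustration:

```shell
#!/usr/bin/env bash
# Put the numbers from the qemu-img output above on a common scale.
# Values are copied from this post; shell integer arithmetic truncates.

to_gib() {
  # Convert a byte count to whole GiB (1 GiB = 1024^3 bytes).
  echo $(( $1 / (1024 * 1024 * 1024) ))
}

alloc_bytes=1539750010880     # failed tcmalloc allocation during qemu-img check
virtual_bytes=53687091200     # "virtual size: 50G (53687091200 bytes)"

echo "attempted allocation: $(to_gib "$alloc_bytes") GiB"
echo "virtual disk size:    $(to_gib "$virtual_bytes") GiB"
echo "ratio (truncated):    $(( alloc_bytes / virtual_bytes ))x"
```

That works out to roughly 1434 GiB, about 28 times the 50 GiB virtual size and
far more than the 73G file on disk, which is part of why the restored image
still looks suspicious to us even though check calls it clean.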

Everyone is now extremely hesitant to use snapshots in KVM. We tried deleting
the snapshots in the restored disk image, but it errors out.
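
For anyone comparing notes, offline removal of internal qcow2 snapshots is
normally done with "qemu-img snapshot -d". The sketch below is roughly that
procedure, assuming the VM is stopped and you are working on a copy of the
image; the function names are hypothetical, and it is not necessarily the exact
command we ran when we hit the error:

```shell
#!/usr/bin/env bash
# Hypothetical helpers for removing internal qcow2 snapshots offline.
# Assumes the VM is shut down and the image is a copy, not the original.

# Read `qemu-img snapshot -l` output on stdin and print one tag per line,
# skipping the "Snapshot list:" line and the column header.
snapshot_tags() {
  awk 'NR > 2 && NF > 0 { print $2 }'
}

# Delete each internal snapshot of the given image by tag; stop on the first failure.
delete_all_snapshots() {
  local image=$1 listing tag
  listing=$(qemu-img snapshot -l "$image") || return 1
  for tag in $(printf '%s\n' "$listing" | snapshot_tags); do
    echo "deleting snapshot $tag from $image"
    qemu-img snapshot -d "$tag" "$image" || return 1
  done
}
```

Run against a copy of the restored image, this would try to delete the two tags
shown in the snapshot list above, one at a time, and stop at the first error.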


Does anyone else have issues with KVM snapshots? We are considering disabling
this functionality altogether.

Thanks
Sean