So, I did more digging and now I know how to reproduce it. I created a VM, added a disk on the local SSD using the scratchpad hook, then formatted and mounted this scratch disk. Now, when I do heavy IO on this scratch disk, for example

dd if=/dev/zero of=/mnt/scratchdisk/test bs=1M count=10000

qemu pauses the VM. The libvirt debug logs show:
2021-09-23 11:04:32.765+0000: 463319: debug : virThreadJobSet:94 : Thread 463319 (rpc-worker) is now running job remoteDispatchNodeGetFreePages
2021-09-23 11:04:32.765+0000: 463319: debug : virNodeGetFreePages:1614 : conn=0x7f8620018ba0, npages=3, pages=0x7f8670009960, startCell=4294967295, cellCount=1, counts=0x7f8670007db0, flags=0x0
2021-09-23 11:04:32.765+0000: 463319: debug : virThreadJobClear:119 : Thread 463319 (rpc-worker) finished job remoteDispatchNodeGetFreePages with ret=0
2021-09-23 11:04:34.235+0000: 488774: debug : qemuMonitorJSONIOProcessLine:220 : Line [{"timestamp": {"seconds": 1632395074, "microseconds": 235454}, "event": "BLOCK_IO_ERROR", "data": {"device": "", "nospace": false, "node-name": "libvirt-3-format", "reason": "Input/output error", "operation": "write", "action": "stop"}}]
2021-09-23 11:04:34.235+0000: 488774: info : qemuMonitorJSONIOProcessLine:235 : QEMU_MONITOR_RECV_EVENT: mon=0x7f860c14b700 event={"timestamp": {"seconds": 1632395074, "microseconds": 235454}, "event": "BLOCK_IO_ERROR", "data": {"device": "", "nospace": false, "node-name": "libvirt-3-format", "reason": "Input/output error", "operation": "write", "action": "stop"}}
2021-09-23 11:04:34.235+0000: 488774: debug : qemuMonitorJSONIOProcessEvent:181 : mon=0x7f860c14b700 obj=0x7f860c0e7450
2021-09-23 11:04:34.235+0000: 488774: debug : qemuMonitorEmitEvent:1166 : mon=0x7f860c14b700 event=BLOCK_IO_ERROR
2021-09-23 11:04:34.235+0000: 488774: debug : qemuProcessHandleEvent:581 : vm=0x7f86201d6df0
2021-09-23 11:04:34.235+0000: 488774: debug : virObjectEventNew:624 : obj=0x7f860c0d82f0
2021-09-23 11:04:34.235+0000: 488774: debug : qemuMonitorJSONIOProcessEvent:206 : handle BLOCK_IO_ERROR handler=0x7f8639c77a90 data=0x7f860c0661c0

To confirm the local SSD itself is fine: there is enough free space where the scratch disk is located, and I can run the same dd on the host without any issues. This also happens on other storage backends, so this looks like an issue in qemu when heavy IO is happening on a disk.
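For reference, the interesting fields can be pulled out of that BLOCK_IO_ERROR event like this. A small sketch; the JSON payload is copied verbatim from the log above, and python3 is used here only as a convenient JSON parser:

```shell
# Extract the key fields from the BLOCK_IO_ERROR event shown in the log above.
event='{"timestamp": {"seconds": 1632395074, "microseconds": 235454}, "event": "BLOCK_IO_ERROR", "data": {"device": "", "nospace": false, "node-name": "libvirt-3-format", "reason": "Input/output error", "operation": "write", "action": "stop"}}'
echo "$event" | python3 -c '
import json, sys
d = json.load(sys.stdin)["data"]
# nospace=false means this is a genuine I/O error, not the usual
# out-of-space pause that oVirt resolves by extending the volume.
print(d["node-name"], d["operation"], d["action"], d["nospace"])
'
# prints: libvirt-3-format write stop False
```

Note the "action": "stop" field: qemu is configured to stop (pause) the guest on a write error, which matches the observed VM pause.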
On Thu, Sep 23, 2021 at 7:19 AM Tommy Sway <sz_cui...@163.com> wrote:
>
> Another option (still tech preview) is Managed Block Storage (Cinder
> based storage).
>
> Is it still tech preview in 4.4?
>
>
> -----Original Message-----
> From: users-boun...@ovirt.org <users-boun...@ovirt.org> On Behalf Of Nir Soffer
> Sent: Wednesday, August 11, 2021 4:26 AM
> To: Shantur Rathore <shantur.rath...@gmail.com>
> Cc: users <users@ovirt.org>; Roman Bednar <rbed...@redhat.com>
> Subject: [ovirt-users] Re: Sparse VMs from Templates - Storage issues
>
> On Tue, Aug 10, 2021 at 4:24 PM Shantur Rathore <shantur.rath...@gmail.com> wrote:
> >
> > Hi all,
> >
> > I have a setup as detailed below
> >
> > - iSCSI Storage Domain
> > - Template with Thin QCOW2 disk
> > - Multiple VMs from Template with Thin disk
>
> Note that a single template disk used by many vms can become a performance
> bottleneck, and is a single point of failure. Cloning the template when
> creating vms avoids such issues.
>
> > oVirt Node 4.4.4
>
> 4.4.4 is old, you should upgrade to 4.4.7.
>
> > When the VMs boot up they download some data, and that leads to an
> > increase in volume size.
> > I see that every few seconds the VM gets paused with
> >
> > "VM X has been paused due to no Storage space error."
> >
> > and then after a few seconds
> >
> > "VM X has recovered from paused back to up"
>
> This is normal operation when a vm writes too quickly and oVirt cannot
> extend the disk quickly enough. To mitigate this, you can increase the
> volume chunk size.
>
> Create this configuration drop-in file:
>
> # cat /etc/vdsm/vdsm.conf.d/99-local.conf
> [irs]
> volume_utilization_percent = 25
> volume_utilization_chunk_mb = 2048
>
> And restart vdsm.
>
> With this setting, when free space in a disk is 1.5g, the disk will be
> extended by 2g. With the default setting, when free space is 0.5g the disk
> was extended by 1g.
>
> If this does not eliminate the pauses, try a larger chunk size like 4096.
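The extension arithmetic in the quoted advice can be sketched like this. This is only an illustration of the numbers given in the mail, not vdsm's actual code; the threshold formula is inferred from the two examples quoted above:

```shell
# Models the numbers quoted above: with volume_utilization_percent=25 and
# volume_utilization_chunk_mb=2048, the disk is extended by 2g when free
# space drops to 1.5g; with the defaults (50 / 1024) it is extended by 1g
# at 0.5g free. The formula is inferred from those two data points, not
# taken from vdsm's source.
threshold_mb() {
    chunk_mb=$1
    utilization_pct=$2
    echo $(( chunk_mb * (100 - utilization_pct) / 100 ))
}

threshold_mb 2048 25   # prints 1536 (1.5g free triggers a 2g extension)
threshold_mb 1024 50   # prints 512  (0.5g free triggers a 1g extension)
```

The practical point of the larger chunk is simply more headroom: a bursty guest has more free space to write into before vdsm must react, so pauses become less likely.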
> > Sometimes after many pauses and recoveries the VM dies with
> >
> > "VM X is down with error. Exit message: Lost connection with qemu process."
>
> This means qemu has crashed. You can find more info in the vm log at:
> /var/log/libvirt/qemu/vm-name.log
>
> We know about bugs in qemu that cause such crashes when a vm disk is
> extended. I think the latest bug was fixed in 4.4.6, so upgrading to 4.4.7
> will fix this issue.
>
> Even with these settings, if you have very bursty io in the vm, it may
> become paused. The only way to completely avoid these pauses is to use a
> preallocated disk, or to use file storage (e.g. NFS). A preallocated disk
> can be thin provisioned on the server side, so it does not mean you need
> more storage, but you will not be able to use shared templates the way you
> use them now. You can create a vm from a template, but the template is
> cloned to the new vm.
>
> Another option (still tech preview) is Managed Block Storage (Cinder based
> storage). If your storage server is supported by Cinder, we can manage it
> using cinderlib. In this setup every disk is a LUN, which may be thin
> provisioned on the storage server. This can also offload storage operations
> to the server, like cloning disks, which may be much faster and more
> efficient.
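Pulling the failure modes from this thread together, a small triage sketch. The state labels here are illustrative names for this sketch only, not literal virsh output, and the vm name placeholder is hypothetical:

```shell
# Map the failure modes discussed in the thread to a next step.
# State labels are illustrative, not literal `virsh domstate` output.
triage() {
    case "$1" in
        paused-enospc)  echo "wait: oVirt should extend the disk and resume the vm" ;;
        paused-ioerror) echo "check the underlying storage, then resume the vm" ;;
        qemu-crashed)   echo "inspect /var/log/libvirt/qemu/<vm-name>.log" ;;
        *)              echo "unknown state" ;;
    esac
}

triage paused-enospc
triage qemu-crashed
```

The key distinction is the nospace flag in the BLOCK_IO_ERROR event: a no-space pause is expected behavior that oVirt recovers from by extending the volume, while a genuine I/O error or a lost qemu process needs manual investigation.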
>
> Nir
>
> _______________________________________________
> Users mailing list -- users@ovirt.org
> To unsubscribe send an email to users-le...@ovirt.org
> Privacy Statement: https://www.ovirt.org/privacy-policy.html
> oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
> List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/W653KLDZMLUNMKLE242UFH5LY4KQ6LD5/

_______________________________________________
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/A3F7KD6CYKB6ZXIGQQPYNDZOBPTQKPLO/