On Mon, Jul 5, 2021 at 12:50 PM Gianluca Cecchi <gianluca.cec...@gmail.com> wrote: > > On Sun, Jul 4, 2021 at 1:01 PM Nir Soffer <nsof...@redhat.com> wrote: >> >> On Sun, Jul 4, 2021 at 11:30 AM Strahil Nikolov <hunter86...@yahoo.com> >> wrote: >> > >> > Isn't it better to strace it before killing qemu-img . >> >> It may be too late, but it may help to understand why this qemu-img >> run got stuck. >> > > Hi, thanks for your answers and suggestions. > That env was a production one and so I was forced to power off the hypervisor > and power on it again (it was a maintenance window with all the VMs powered > down anyway). I was also unable to put the host into maintenance because it > replied that there were some tasks running, even after the kill, because the > 2 processes (the VM had 2 disks to export and so two qemu-img processes) > remained in defunct and after several minutes no change in web admin feedback > about the process.... > > My first suspicion was something related to fw congestion because the > hypervisor network and the nas appliance were in different networks and I > wasn't sure if a fw was in place for it.... > But on a test oVirt environment with same oVirt version and with the same > network for hypervisors I was able to put a Linux server with the same > network as the nas and configure it as nfs server. > And the export went with a throughput of about 50MB/s, so no fw problem. > A VM with 55Gb disk exported in 19 minutes. > > So I got the rights to mount the nas on the test env and mounted it as export > domain and now I have the same problems I can debug. > The same VM with only one disk (55Gb). The process: > > vdsm 14342 3270 0 11:17 ? 00:00:03 /usr/bin/qemu-img convert -p > -t none -T none -f raw > /rhev/data-center/mnt/blockSD/679c0725-75fb-4af7-bff1-7c447c5d789c/images/530b3e7f-4ce4-4051-9cac-1112f5f9e8b5/d2a89b5e-7d62-4695-96d8-b762ce52b379 > -O raw -o preallocation=falloc > /rhev/data-center/mnt/172.16.1.137:_nas_EXPORT-DOMAIN/20433d5d-9d82-4079-9252-0e746ce54106/images/530b3e7f-4ce4-4051-9cac-1112f5f9e8b5/d2a89b5e-7d62-4695-96d8-b762ce52b379
-o preallocation + NFS 4.0 + very slow NFS is your problem. qemu-img is using posix-fallocate() to preallocate the entire image at the start of the copy. With NFS 4.2 this uses fallocate() linux specific syscall that allocates the space very efficiently in no time. With older NFS versions, this becomes a very slow loop, writing one byte for every 4k block. If you see -o preallocation, it means you are using an old vdsm version, we stopped using -o preallocation in 4.4.2, see https://bugzilla.redhat.com/1850267. > On the hypervisor the ls commands quite hang, so from another hypervisor I > see that the disk size seems to remain at 4Gb even if timestamp updates... > > # ll > /rhev/data-center/mnt/172.16.1.137\:_nas_EXPORT-DOMAIN/20433d5d-9d82-4079-9252-0e746ce54106/images/530b3e7f-4ce4-4051-9cac-1112f5f9e8b5/ > total 4260941 > -rw-rw----. 1 nobody nobody 4363202560 Jul 5 11:23 > d2a89b5e-7d62-4695-96d8-b762ce52b379 > -rw-r--r--. 1 nobody nobody 261 Jul 5 11:17 > d2a89b5e-7d62-4695-96d8-b762ce52b379.meta > > On host console I see a throughput of 4mbit/s... > > # strace -p 14342 This shows only the main thread use -f use -f to show all threads. > strace: Process 14342 attached > ppoll([{fd=9, events=POLLIN|POLLERR|POLLHUP}], 1, NULL, NULL, 8 > > # ll /proc/14342/fd > hangs... > > # nfsstat -v > Client packet stats: > packets udp tcp tcpconn > 0 0 0 0 > > Client rpc stats: > calls retrans authrefrsh > 31171856 0 31186615 > > Client nfs v4: > null read write commit open open_conf > 0 0% 2339179 7% 14872911 47% 7233 0% 74956 0% 2 0% > open_noat open_dgrd close setattr fsinfo renew > 2312347 7% 0 0% 2387293 7% 24 0% 23 0% 5 0% > setclntid confirm lock lockt locku access > 3 0% 3 0% 8 0% 8 0% 5 0% 1342746 4% > getattr lookup lookup_root remove rename link > 3031001 9% 71551 0% 7 0% 74590 0% 6 0% 0 0% > symlink create pathconf statfs readlink readdir > 0 0% 9 0% 16 0% 4548231 14% 0 0% 98506 0% > server_caps delegreturn getacl setacl fs_locations rel_lkowner > 39 0% 14 0% 0 0% 0 0% 0 0% 0 0% > secinfo exchange_id create_ses destroy_ses sequence get_lease_t > 0 0% 0 0% 4 0% 2 0% 1 0% 0 0% > reclaim_comp layoutget getdevinfo layoutcommit layoutreturn getdevlist > 0 0% 2 0% 0 0% 0 0% 0 0% 0 0% > (null) > 5 0% > > > # vmstat 3 > procs -----------memory---------- ---swap-- -----io---- -system-- > ------cpu----- > r b swpd free buff cache si so bi bo in cs us sy id wa > st > 3 1 0 82867112 437548 7066580 0 0 54 1 0 0 0 0 > 100 0 0 > 0 1 0 82867024 437548 7066620 0 0 1708 0 3720 8638 0 0 95 > 4 0 > 4 1 0 82868728 437552 7066616 0 0 875 9 3004 8457 0 0 95 > 4 0 > 0 1 0 82869600 437552 7066636 0 0 1785 6 2982 8359 0 0 95 > 4 0 > > I see the blocked process that is my qemu-img one... > > In messages of hypervisor > > Jul 5 11:33:06 node4 kernel: INFO: task qemu-img:14343 blocked for more than > 120 seconds. > Jul 5 11:33:06 node4 kernel: "echo 0 > > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > Jul 5 11:33:06 node4 kernel: qemu-img D ffff9d960e7e1080 0 14343 > 3328 0x00000080 > Jul 5 11:33:06 node4 kernel: Call Trace: > Jul 5 11:33:06 node4 kernel: [<ffffffffa72de185>] ? sched_clock_cpu+0x85/0xc0 > Jul 5 11:33:06 node4 kernel: [<ffffffffa72da830>] ? > try_to_wake_up+0x190/0x390 > Jul 5 11:33:06 node4 kernel: [<ffffffffa7988089>] > schedule_preempt_disabled+0x29/0x70 > Jul 5 11:33:06 node4 kernel: [<ffffffffa7985ff7>] > __mutex_lock_slowpath+0xc7/0x1d0 > Jul 5 11:33:06 node4 kernel: [<ffffffffa79853cf>] mutex_lock+0x1f/0x2f > Jul 5 11:33:06 node4 kernel: [<ffffffffc0db5489>] > nfs_start_io_write+0x19/0x40 [nfs] > Jul 5 11:33:06 node4 kernel: [<ffffffffc0dad0d1>] nfs_file_write+0x81/0x1e0 > [nfs] > Jul 5 11:33:06 node4 kernel: [<ffffffffa744d063>] do_sync_write+0x93/0xe0 > Jul 5 11:33:06 node4 kernel: [<ffffffffa744db50>] vfs_write+0xc0/0x1f0 > Jul 5 11:33:06 node4 kernel: [<ffffffffa744eaf2>] SyS_pwrite64+0x92/0xc0 > Jul 5 11:33:06 node4 kernel: [<ffffffffa7993ec9>] ? > system_call_after_swapgs+0x96/0x13a > Jul 5 11:33:06 node4 kernel: [<ffffffffa7993f92>] > system_call_fastpath+0x25/0x2a > Jul 5 11:33:06 node4 kernel: [<ffffffffa7993ed5>] ? > system_call_after_swapgs+0xa2/0x13a Looks like qemu-img is stuck writing to the NFS server. > Possibly problems with NFSv4? I see that it mounts as nfsv4: > > # mount > . . . > 172.16.1.137:/nas/EXPORT-DOMAIN on > /rhev/data-center/mnt/172.16.1.137:_nas_EXPORT-DOMAIN type nfs4 > (rw,relatime,vers=4.0,rsize=1048576,wsize=1048576,namlen=255,soft,nosharecache,proto=tcp,timeo=600,retrans=6,sec=sys,clientaddr=192.168.50.52,local_lock=none,addr=172.16.1.137) > > This is a test oVirt env so I can wait and eventually test something... > Let me know your suggestions I would start by changing the NFS storage domain to version 4.2. 1. kill the hang qemu-img (it will probably cannot be killed, but worth trying) 2. deactivate the storage domain 3. fix the ownership on the storage domain (should be vdsm:kvm, not nobody:nobody)3. 4. in ovirt engine: manage storage domain -> advanced options -> nfs version: 4.2 5. activate the storage domain 6. try again to export the disk Finally I think we have a management issue here. It does not make sense to use a preallocated disk on export domain. Using a preallocated disk makes sense on a data domain when you want to prevent the case of failing with ENSPC when VM writes to the disk. Disks on the export domain are never used by a running VM so there is no reason to preallocate them. The system should always use sparse disks when copying to export domain. When importing disks from export domain, the system should reconstruct the original disk configuration (e.g. raw-preallocated). Nir _______________________________________________ Users mailing list -- users@ovirt.org To unsubscribe send an email to users-le...@ovirt.org Privacy Statement: https://www.ovirt.org/privacy-policy.html oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/ List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/5TBHHV4MKUR53HHHXGNKGI6OLSP5NKID/