Public bug reported: We have found in our test for SLES15 with cloud-init installed, if we attach a ISO file with the VM before VM is boot, it often takes more than 10 minutes to start the SLES OS. Sometimes it failed to start the SLES OS at all.
We've root caused it is due to the "is_cdrom_ovf()" func of "tools/ds- identify". In this function, there is the following logic to detect if an ISO contains certain string: > local idstr="http://schemas.dmtf.org/ovf/environment/1" > grep --quiet --ignore-case "$idstr" "${PATH_ROOT}$dev" ref: https://git.launchpad.net/cloud-init/tree/tools/ds-identify It is trying to grep the who ISO file for a certain string, which causes intense IO pressure for the system. What is worse is that sometimes the ISO file is large (e.g. >5GB for installer DVD) and it is mounted over NFS. The "grep" process often consume 99% CPU and seems hang. Then the systemd starts more and more "grep" process which smoke the CPU and consumes all the IO bandwidth for the ISO file. Then the system may hang for a long time and sometimes failed to start. To fix this issue, I suggest that we should not grep for the entire ISO file. Rather then we should just check if the file/dir exists with os.path.exists(). -------------------------debug log snip------------------------ pek2-gosv-16-dhcp180:~ # ps -ef UID PID PPID C STIME TTY TIME CMD root 1 0 0 13:32 ? 00:00:04 /usr/lib/systemd/systemd --switched-root --system --deserialize 24 … root 474 1 0 13:34 ? 00:00:00 /bin/sh /usr/lib/cloud-init/ds-identify root 482 474 2 13:34 ? 00:00:15 grep --quiet --ignore-case http://schemas.dmtf.org/ovf/environment/1 /dev/sr1 root 1020 1 0 13:35 ? 00:00:00 /bin/sh /usr/lib/cloud-init/ds-identify root 1039 1020 1 13:35 ? 00:00:07 grep --quiet --ignore-case http://schemas.dmtf.org/ovf/environment/1 /dev/sr1 polkitd 1049 1 0 13:37 ? 00:00:00 /usr/lib/polkit-1/polkitd --no-debug root 1051 1 0 13:37 ? 00:00:00 /usr/sbin/wickedd --systemd --foreground root 1052 1 0 13:37 ? 00:00:00 /usr/lib/systemd/systemd-logind root 1054 1 0 13:37 ? 00:00:00 /usr/sbin/wickedd-nanny --systemd --foreground root 1073 1 0 13:37 ? 00:00:00 /usr/bin/vmtoolsd root 1097 1 0 13:37 ? 00:00:00 /bin/sh /usr/lib/cloud-init/ds-identify root 1110 1097 1 13:37 ? 00:00:04 grep --quiet --ignore-case http://schemas.dmtf.org/ovf/environment/1 /dev/sr1 root 1304 1 0 13:38 ? 00:00:00 /bin/sh /usr/lib/cloud-init/ds-identify root 1312 1304 1 13:38 ? 00:00:03 grep --quiet --ignore-case http://schemas.dmtf.org/ovf/environment/1 /dev/sr1 root 1537 1 0 13:40 ? 00:00:00 /usr/bin/plymouth --wait root 1613 1 0 13:40 ? 00:00:00 /bin/sh /usr/lib/cloud-init/ds-identify root 1645 1613 0 13:40 ? 00:00:02 grep --quiet --ignore-case http://schemas.dmtf.org/ovf/environment/1 /dev/sr1 … Grep use nearly 100% cpu, system very slow. top - 13:46:37 up 26 min, 2 users, load average: 14.14, 15.03, 10.57 Tasks: 225 total, 6 running, 219 sleeping, 0 stopped, 0 zombie %Cpu(s): 40.1 us, 49.3 sy, 0.0 ni, 0.0 id, 1.4 wa, 0.0 hi, 9.1 si, 0.0 st KiB Mem : 1000916 total, 64600 free, 355880 used, 580436 buff/cache KiB Swap: 1288168 total, 1285600 free, 2568 used. 492688 avail Mem PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 4427 root 20 0 40100 3940 3084 R 99.90 0.394 0:27.41 top 1016 root 20 0 197796 4852 3400 R 99.90 0.485 1:26.44 vmtoolsd 1723 root 20 0 7256 1860 1556 D 99.90 0.186 0:28.44 grep 484 root 20 0 7256 1684 1396 D 99.90 0.168 1:51.22 grep 1278 root 20 0 7256 1856 1556 D 99.90 0.185 0:38.44 grep 1398 root 20 0 7256 1860 1556 R 99.90 0.186 0:28.53 grep 1061 root 20 0 7256 1856 1556 D 99.90 0.185 0:56.62 grep -------------------------debug log snip------------------------ ** Affects: cloud-init Importance: Undecided Status: New -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to cloud-init. https://bugs.launchpad.net/bugs/1806701 Title: cloud-init may hang OS boot process due to grep for the entire ISO file when it is attached Status in cloud-init: New Bug description: We have found in our test for SLES15 with cloud-init installed, if we attach a ISO file with the VM before VM is boot, it often takes more than 10 minutes to start the SLES OS. Sometimes it failed to start the SLES OS at all. We've root caused it is due to the "is_cdrom_ovf()" func of "tools /ds-identify". In this function, there is the following logic to detect if an ISO contains certain string: > local idstr="http://schemas.dmtf.org/ovf/environment/1" > grep --quiet --ignore-case "$idstr" "${PATH_ROOT}$dev" ref: https://git.launchpad.net/cloud-init/tree/tools/ds-identify It is trying to grep the who ISO file for a certain string, which causes intense IO pressure for the system. What is worse is that sometimes the ISO file is large (e.g. >5GB for installer DVD) and it is mounted over NFS. The "grep" process often consume 99% CPU and seems hang. Then the systemd starts more and more "grep" process which smoke the CPU and consumes all the IO bandwidth for the ISO file. Then the system may hang for a long time and sometimes failed to start. To fix this issue, I suggest that we should not grep for the entire ISO file. Rather then we should just check if the file/dir exists with os.path.exists(). -------------------------debug log snip------------------------ pek2-gosv-16-dhcp180:~ # ps -ef UID PID PPID C STIME TTY TIME CMD root 1 0 0 13:32 ? 00:00:04 /usr/lib/systemd/systemd --switched-root --system --deserialize 24 … root 474 1 0 13:34 ? 00:00:00 /bin/sh /usr/lib/cloud-init/ds-identify root 482 474 2 13:34 ? 00:00:15 grep --quiet --ignore-case http://schemas.dmtf.org/ovf/environment/1 /dev/sr1 root 1020 1 0 13:35 ? 00:00:00 /bin/sh /usr/lib/cloud-init/ds-identify root 1039 1020 1 13:35 ? 00:00:07 grep --quiet --ignore-case http://schemas.dmtf.org/ovf/environment/1 /dev/sr1 polkitd 1049 1 0 13:37 ? 00:00:00 /usr/lib/polkit-1/polkitd --no-debug root 1051 1 0 13:37 ? 00:00:00 /usr/sbin/wickedd --systemd --foreground root 1052 1 0 13:37 ? 00:00:00 /usr/lib/systemd/systemd-logind root 1054 1 0 13:37 ? 00:00:00 /usr/sbin/wickedd-nanny --systemd --foreground root 1073 1 0 13:37 ? 00:00:00 /usr/bin/vmtoolsd root 1097 1 0 13:37 ? 00:00:00 /bin/sh /usr/lib/cloud-init/ds-identify root 1110 1097 1 13:37 ? 00:00:04 grep --quiet --ignore-case http://schemas.dmtf.org/ovf/environment/1 /dev/sr1 root 1304 1 0 13:38 ? 00:00:00 /bin/sh /usr/lib/cloud-init/ds-identify root 1312 1304 1 13:38 ? 00:00:03 grep --quiet --ignore-case http://schemas.dmtf.org/ovf/environment/1 /dev/sr1 root 1537 1 0 13:40 ? 00:00:00 /usr/bin/plymouth --wait root 1613 1 0 13:40 ? 00:00:00 /bin/sh /usr/lib/cloud-init/ds-identify root 1645 1613 0 13:40 ? 00:00:02 grep --quiet --ignore-case http://schemas.dmtf.org/ovf/environment/1 /dev/sr1 … Grep use nearly 100% cpu, system very slow. top - 13:46:37 up 26 min, 2 users, load average: 14.14, 15.03, 10.57 Tasks: 225 total, 6 running, 219 sleeping, 0 stopped, 0 zombie %Cpu(s): 40.1 us, 49.3 sy, 0.0 ni, 0.0 id, 1.4 wa, 0.0 hi, 9.1 si, 0.0 st KiB Mem : 1000916 total, 64600 free, 355880 used, 580436 buff/cache KiB Swap: 1288168 total, 1285600 free, 2568 used. 492688 avail Mem PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 4427 root 20 0 40100 3940 3084 R 99.90 0.394 0:27.41 top 1016 root 20 0 197796 4852 3400 R 99.90 0.485 1:26.44 vmtoolsd 1723 root 20 0 7256 1860 1556 D 99.90 0.186 0:28.44 grep 484 root 20 0 7256 1684 1396 D 99.90 0.168 1:51.22 grep 1278 root 20 0 7256 1856 1556 D 99.90 0.185 0:38.44 grep 1398 root 20 0 7256 1860 1556 R 99.90 0.186 0:28.53 grep 1061 root 20 0 7256 1856 1556 D 99.90 0.185 0:56.62 grep -------------------------debug log snip------------------------ To manage notifications about this bug go to: https://bugs.launchpad.net/cloud-init/+bug/1806701/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp