Also, I've rebuilt the most recent master, c1e90def01, which is about 55 commits newer than 6.0-rc2.
As in Tommy's experiments, I was unable to reproduce the issue there.
But given the data from the earlier tests, this is more likely an accident of slightly different timing than a fix (to be clear, I'd appreciate it if there were a fix; I just can't conclude from this build being good that I could, e.g., bisect).

export CFLAGS="-O0 -g -fPIC"
../configure --enable-system --disable-xen --disable-werror --disable-docs --disable-libudev --disable-guest-agent --disable-sdl --disable-gtk --disable-vnc --disable-xen --disable-brlapi --disable-hax --disable-vde --disable-netmap --disable-rbd --disable-libiscsi --disable-libnfs --disable-smartcard --disable-libusb --disable-usb-redir --disable-seccomp --disable-glusterfs --disable-tpm --disable-numa --disable-opengl --disable-virglrenderer --disable-xfsctl --disable-slirp --disable-blobs --disable-rdma --disable-pvrdma --disable-attr --disable-vhost-net --disable-vhost-vsock --disable-vhost-scsi --disable-vhost-crypto --disable-vhost-user --disable-spice --disable-qom-cast-debug --disable-bochs --disable-cloop --disable-dmg --disable-qcow1 --disable-vdi --disable-vvfat --disable-qed --disable-parallels --disable-sheepdog --disable-avx2 --disable-nettle --disable-gnutls --disable-capstone --enable-tools --disable-libssh --disable-libpmem --disable-cap-ng --disable-vte --disable-iconv --disable-curses --disable-linux-aio --disable-linux-io-uring --disable-kvm --disable-replication --audio-drv-list="" --disable-vhost-kernel --disable-vhost-vdpa --disable-live-block-migration --disable-keyring --disable-auth-pam --disable-curl --disable-strip --enable-fdt --target-list="riscv64-softmmu"
make -j10

Just like the package build, this configures as:
  coroutine backend: ucontext
  coroutine pool: YES

5/5 runs with that build were OK.
But since we know it is racy, I'm unsure whether that implies much :-/

P.S. I have not yet gone into a build-option bisect, but chances are the issue could be related to build options. That is too much stabbing in the dark, though; maybe someone experienced in the coroutine code can already make sense of all the info we have gathered so far.

I'll update the bug description and add an upstream task so that all the info we have gets mirrored to the qemu mailing lists.

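Since a handful of good runs says little about a racy failure (a build that aborts in half of all runs would still pass five times in a row about 3% of the time), counting aborts over a larger number of runs gives a better picture. The following is only a minimal sketch, not something that was used for the results above: `count_aborts` is a hypothetical helper name, and it assumes the command under test ends with exit status 134 (128 + SIGABRT) when the assertion fires.

```shell
#!/bin/sh
# Hypothetical helper (not part of QEMU or the packaging):
# run a command RUNS times and print how many runs ended with
# exit status 134, i.e. 128 + SIGABRT as reported by the shell.
# Usage: count_aborts RUNS COMMAND [ARGS...]
count_aborts() {
    runs=$1
    shift
    aborts=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        # Output is discarded; only the exit status matters here.
        "$@" >/dev/null 2>&1
        status=$?
        if [ "$status" -eq 134 ]; then
            aborts=$((aborts + 1))
        fi
        i=$((i + 1))
    done
    echo "$aborts"
}
```

A possible invocation would be `count_aborts 20 timeout 600 ./run_riscvVM.sh`, assuming GNU `timeout` is available and that the run script's exit status reflects the qemu abort; runs that survive until the timeout end with status 124 and are not counted.
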
** Summary changed:

- Recent update broke qemu-system-riscv64
+ Coroutines are racy for risc64 emu on arm64 - crash on Assertion

** Description changed:

+ Note: this could as well be "riscv64 on arm64" for being slow@slow and affect
+ other architectures as well.
+
+ The following case triggers on a Raspberry Pi4 running with arm64 on
+ Ubuntu 21.04 [1][2]. It might trigger on other environments as well,
+ but that is where we have seen it so far.
+
+    $ wget https://github.com/carlosedp/riscv-bringup/releases/download/v1.0/UbuntuFocal-riscv64-QemuVM.tar.gz
+    $ tar xzf UbuntuFocal-riscv64-QemuVM.tar.gz
+    $ ./run_riscvVM.sh
+ (wait ~2 minutes)
+    [  OK  ] Reached target Local File Systems (Pre).
+    [  OK  ] Reached target Local File Systems.
+             Starting udev Kernel Device Manager...
+ qemu-system-riscv64: ../../util/qemu-coroutine-lock.c:57: qemu_co_queue_wait_impl: Assertion `qemu_in_coroutine()' failed.
+
+ This is often, but not 100% reproducible and the cases differ slightly; we
+ see either of:
+ - qemu-system-riscv64: ../../util/qemu-coroutine-lock.c:57: qemu_co_queue_wait_impl: Assertion `qemu_in_coroutine()' failed.
+ - qemu-system-riscv64: ../../block/aio_task.c:64: aio_task_pool_wait_one: Assertion `qemu_coroutine_self() == pool->main_co' failed.
+
+ Rebuilding working cases has been shown to make them fail, and rebuilding
+ (or even reinstalling) bad cases has made them work. Also the same builds on
+ different arm64 CPUs behave differently. TL;DR: The full list of conditions
+ influencing the good/bad cases is not yet known.
+
+ [1]: https://ubuntu.com/tutorials/how-to-install-ubuntu-on-your-raspberry-pi#1-overview
+ [2]: http://cdimage.ubuntu.com/daily-preinstalled/pending/hirsute-preinstalled-desktop-arm64+raspi.img.xz
+
+
+ --- --- original report --- ---
+
  I regularly run a RISC-V (RV64GC) QEMU VM, but an update a few days ago broke it.
  Now when I launch it, it hits an assertion:
-
- OpenSBI v0.6
-    ____                    _____ ____ _____
-   / __ \                  / ____|  _ \_   _|
-  | |  | |_ __   ___ _ __ | (___ | |_) || |
-  | |  | | '_ \ / _ \ '_ \ \___ \|  _ < | |
-  | |__| | |_) |  __/ | | |____) | |_) || |_
-   \____/| .__/ \___|_| |_|_____/|____/_____|
-         | |
-         |_|
-
+ OpenSBI v0.6
+    ____                    _____ ____ _____
+   / __ \                  / ____|  _ \_   _|
+  | |  | |_ __   ___ _ __ | (___ | |_) || |
+  | |  | | '_ \ / _ \ '_ \ \___ \|  _ < | |
+  | |__| | |_) |  __/ | | |____) | |_) || |_
+   \____/| .__/ \___|_| |_|_____/|____/_____|
+         | |
+         |_|
+
  ...
- Found /boot/extlinux/extlinux.conf
- Retrieving file: /boot/extlinux/extlinux.conf
- 618 bytes read in 2 ms (301.8 KiB/s)
- RISC-V Qemu Boot Options
- 1: Linux kernel-5.5.0-dirty
- 2: Linux kernel-5.5.0-dirty (recovery mode)
- Enter choice: 1: Linux kernel-5.5.0-dirty
- Retrieving file: /boot/initrd.img-5.5.0-dirty
- qemu-system-riscv64: ../../block/aio_task.c:64: aio_task_pool_wait_one: Assertion `qemu_coroutine_self() == pool->main_co' failed.
+ Found /boot/extlinux/extlinux.conf
+ Retrieving file: /boot/extlinux/extlinux.conf
+ 618 bytes read in 2 ms (301.8 KiB/s)
+ RISC-V Qemu Boot Options
+ 1: Linux kernel-5.5.0-dirty
+ 2: Linux kernel-5.5.0-dirty (recovery mode)
+ Enter choice: 1: Linux kernel-5.5.0-dirty
+ Retrieving file: /boot/initrd.img-5.5.0-dirty
+ qemu-system-riscv64: ../../block/aio_task.c:64: aio_task_pool_wait_one: Assertion `qemu_coroutine_self() == pool->main_co' failed.
  ./run.sh: line 31: 1604 Aborted (core dumped) qemu-system-riscv64 -machine virt -nographic -smp 8 -m 8G -bios fw_payload.bin -device virtio-blk-devi
  ce,drive=hd0 -object rng-random,filename=/dev/urandom,id=rng0 -device virtio-rng-device,rng=rng0 -drive file=riscv64-UbuntuFocal-qemu.qcow2,format=qcow2,id=hd0 -devi
- ce virtio-net-device,netdev=usernet -netdev user,id=usernet,$ports
+ ce virtio-net-device,netdev=usernet -netdev user,id=usernet,$ports

  Interestingly this doesn't happen on the AMD64 version of Ubuntu 21.04 (fully updated).
-
  Think you have everything already, but just in case:

  $ lsb_release -rd
  Description: Ubuntu Hirsute Hippo (development branch)
  Release: 21.04

  $ uname -a
  Linux minimacvm 5.11.0-11-generic #12-Ubuntu SMP Mon Mar 1 19:27:36 UTC 2021 aarch64 aarch64 aarch64 GNU/Linux
  (note this is a VM running on macOS/M1)

  $ apt-cache policy qemu
  qemu:
-   Installed: 1:5.2+dfsg-9ubuntu1
-   Candidate: 1:5.2+dfsg-9ubuntu1
-   Version table:
-  *** 1:5.2+dfsg-9ubuntu1 500
-         500 http://ports.ubuntu.com/ubuntu-ports hirsute/universe arm64 Packages
-         100 /var/lib/dpkg/status
+   Installed: 1:5.2+dfsg-9ubuntu1
+   Candidate: 1:5.2+dfsg-9ubuntu1
+   Version table:
+  *** 1:5.2+dfsg-9ubuntu1 500
+         500 http://ports.ubuntu.com/ubuntu-ports hirsute/universe arm64 Packages
+         100 /var/lib/dpkg/status

  ProblemType: Bug
  DistroRelease: Ubuntu 21.04
  Package: qemu 1:5.2+dfsg-9ubuntu1
  ProcVersionSignature: Ubuntu 5.11.0-11.12-generic 5.11.0
  Uname: Linux 5.11.0-11-generic aarch64
  ApportVersion: 2.20.11-0ubuntu61
  Architecture: arm64
  CasperMD5CheckResult: unknown
  CurrentDmesg:
-  Error: command ['pkexec', 'dmesg'] failed with exit code 127: polkit-agent-helper-1: error response to PolicyKit daemon: GDBus.Error:org.freedesktop.PolicyKit1.Error.Failed: No session for cookie
-  Error executing command as another user: Not authorized
-
-  This incident has been reported.
+  Error: command ['pkexec', 'dmesg'] failed with exit code 127: polkit-agent-helper-1: error response to PolicyKit daemon: GDBus.Error:org.freedesktop.PolicyKit1.Error.Failed: No session for cookie
+  Error executing command as another user: Not authorized
+
+  This incident has been reported.
  Date: Mon Mar 29 02:33:25 2021
  Dependencies:
-
+
  KvmCmdLine: COMMAND STAT EUID RUID PID PPID %CPU COMMAND
  Lspci-vt:
-  -[0000:00]-+-00.0 Apple Inc. Device f020 +-01.0 Red Hat, Inc. Virtio network device +-05.0 Red Hat, Inc. Virtio console +-06.0 Red Hat, Inc. Virtio block device \-07.0 Red Hat, Inc. Virtio RNG
+  -[0000:00]-+-00.0  Apple Inc. Device f020
+             +-01.0  Red Hat, Inc. Virtio network device
+             +-05.0  Red Hat, Inc. Virtio console
+             +-06.0  Red Hat, Inc. Virtio block device
+             \-07.0  Red Hat, Inc. Virtio RNG
  Lsusb: Error: command ['lsusb'] failed with exit code 1:
  Lsusb-t:
-
+
  Lsusb-v: Error: command ['lsusb', '-v'] failed with exit code 1:
  ProcEnviron:
-  TERM=screen
-  PATH=(custom, no user)
-  XDG_RUNTIME_DIR=<set>
-  LANG=C.UTF-8
-  SHELL=/bin/bash
+  TERM=screen
+  PATH=(custom, no user)
+  XDG_RUNTIME_DIR=<set>
+  LANG=C.UTF-8
+  SHELL=/bin/bash
  ProcKernelCmdLine: console=hvc0 root=/dev/vda
  SourcePackage: qemu
  UpgradeStatus: Upgraded to hirsute on 2020-12-30 (88 days ago)
  acpidump:
-  Error: command ['pkexec', '/usr/share/apport/dump_acpi_tables.py'] failed with exit code 127: polkit-agent-helper-1: error response to PolicyKit daemon: GDBus.Error:org.freedesktop.PolicyKit1.Error.Failed: No session for cookie
-  Error executing command as another user: Not authorized
-
-  This incident has been reported.
+  Error: command ['pkexec', '/usr/share/apport/dump_acpi_tables.py'] failed with exit code 127: polkit-agent-helper-1: error response to PolicyKit daemon: GDBus.Error:org.freedesktop.PolicyKit1.Error.Failed: No session for cookie
+  Error executing command as another user: Not authorized
+
+  This incident has been reported.

** Also affects: qemu
   Importance: Undecided
       Status: New

** Changed in: qemu (Ubuntu)
   Importance: Undecided => Low

-- 
You received this bug notification because you are a member of
qemu-devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1921664

Title:
  Coroutines are racy for risc64 emu on arm64 - crash on Assertion

Status in QEMU:
  New
Status in qemu package in Ubuntu:
  Confirmed

Bug description:
  Note: this could as well be "riscv64 on arm64" for being slow@slow and affect
  other architectures as well.

  The following case triggers on a Raspberry Pi4 running with arm64 on
  Ubuntu 21.04 [1][2].
  It might trigger on other environments as well, but that is where we
  have seen it so far.

     $ wget https://github.com/carlosedp/riscv-bringup/releases/download/v1.0/UbuntuFocal-riscv64-QemuVM.tar.gz
     $ tar xzf UbuntuFocal-riscv64-QemuVM.tar.gz
     $ ./run_riscvVM.sh
  (wait ~2 minutes)
     [  OK  ] Reached target Local File Systems (Pre).
     [  OK  ] Reached target Local File Systems.
              Starting udev Kernel Device Manager...
  qemu-system-riscv64: ../../util/qemu-coroutine-lock.c:57: qemu_co_queue_wait_impl: Assertion `qemu_in_coroutine()' failed.

  This is often, but not 100% reproducible and the cases differ slightly; we
  see either of:
  - qemu-system-riscv64: ../../util/qemu-coroutine-lock.c:57: qemu_co_queue_wait_impl: Assertion `qemu_in_coroutine()' failed.
  - qemu-system-riscv64: ../../block/aio_task.c:64: aio_task_pool_wait_one: Assertion `qemu_coroutine_self() == pool->main_co' failed.

  Rebuilding working cases has been shown to make them fail, and rebuilding
  (or even reinstalling) bad cases has made them work. Also the same builds on
  different arm64 CPUs behave differently. TL;DR: The full list of conditions
  influencing the good/bad cases is not yet known.

  [1]: https://ubuntu.com/tutorials/how-to-install-ubuntu-on-your-raspberry-pi#1-overview
  [2]: http://cdimage.ubuntu.com/daily-preinstalled/pending/hirsute-preinstalled-desktop-arm64+raspi.img.xz

  --- --- original report --- ---

  I regularly run a RISC-V (RV64GC) QEMU VM, but an update a few days
  ago broke it. Now when I launch it, it hits an assertion:

  OpenSBI v0.6
     ____                    _____ ____ _____
    / __ \                  / ____|  _ \_   _|
   | |  | |_ __   ___ _ __ | (___ | |_) || |
   | |  | | '_ \ / _ \ '_ \ \___ \|  _ < | |
   | |__| | |_) |  __/ | | |____) | |_) || |_
    \____/| .__/ \___|_| |_|_____/|____/_____|
          | |
          |_|

  ...
  Found /boot/extlinux/extlinux.conf
  Retrieving file: /boot/extlinux/extlinux.conf
  618 bytes read in 2 ms (301.8 KiB/s)
  RISC-V Qemu Boot Options
  1: Linux kernel-5.5.0-dirty
  2: Linux kernel-5.5.0-dirty (recovery mode)
  Enter choice: 1: Linux kernel-5.5.0-dirty
  Retrieving file: /boot/initrd.img-5.5.0-dirty
  qemu-system-riscv64: ../../block/aio_task.c:64: aio_task_pool_wait_one: Assertion `qemu_coroutine_self() == pool->main_co' failed.
  ./run.sh: line 31: 1604 Aborted (core dumped) qemu-system-riscv64 -machine virt -nographic -smp 8 -m 8G -bios fw_payload.bin -device virtio-blk-device,drive=hd0 -object rng-random,filename=/dev/urandom,id=rng0 -device virtio-rng-device,rng=rng0 -drive file=riscv64-UbuntuFocal-qemu.qcow2,format=qcow2,id=hd0 -device virtio-net-device,netdev=usernet -netdev user,id=usernet,$ports

  Interestingly this doesn't happen on the AMD64 version of Ubuntu 21.04
  (fully updated).

  Think you have everything already, but just in case:

  $ lsb_release -rd
  Description: Ubuntu Hirsute Hippo (development branch)
  Release: 21.04

  $ uname -a
  Linux minimacvm 5.11.0-11-generic #12-Ubuntu SMP Mon Mar 1 19:27:36 UTC 2021 aarch64 aarch64 aarch64 GNU/Linux
  (note this is a VM running on macOS/M1)

  $ apt-cache policy qemu
  qemu:
    Installed: 1:5.2+dfsg-9ubuntu1
    Candidate: 1:5.2+dfsg-9ubuntu1
    Version table:
   *** 1:5.2+dfsg-9ubuntu1 500
          500 http://ports.ubuntu.com/ubuntu-ports hirsute/universe arm64 Packages
          100 /var/lib/dpkg/status

  ProblemType: Bug
  DistroRelease: Ubuntu 21.04
  Package: qemu 1:5.2+dfsg-9ubuntu1
  ProcVersionSignature: Ubuntu 5.11.0-11.12-generic 5.11.0
  Uname: Linux 5.11.0-11-generic aarch64
  ApportVersion: 2.20.11-0ubuntu61
  Architecture: arm64
  CasperMD5CheckResult: unknown
  CurrentDmesg:
   Error: command ['pkexec', 'dmesg'] failed with exit code 127: polkit-agent-helper-1: error response to PolicyKit daemon: GDBus.Error:org.freedesktop.PolicyKit1.Error.Failed: No session for cookie
   Error executing command as another user: Not authorized

   This incident has been reported.
  Date: Mon Mar 29 02:33:25 2021
  Dependencies:

  KvmCmdLine: COMMAND STAT EUID RUID PID PPID %CPU COMMAND
  Lspci-vt:
   -[0000:00]-+-00.0  Apple Inc. Device f020
              +-01.0  Red Hat, Inc. Virtio network device
              +-05.0  Red Hat, Inc. Virtio console
              +-06.0  Red Hat, Inc. Virtio block device
              \-07.0  Red Hat, Inc. Virtio RNG
  Lsusb: Error: command ['lsusb'] failed with exit code 1:
  Lsusb-t:

  Lsusb-v: Error: command ['lsusb', '-v'] failed with exit code 1:
  ProcEnviron:
   TERM=screen
   PATH=(custom, no user)
   XDG_RUNTIME_DIR=<set>
   LANG=C.UTF-8
   SHELL=/bin/bash
  ProcKernelCmdLine: console=hvc0 root=/dev/vda
  SourcePackage: qemu
  UpgradeStatus: Upgraded to hirsute on 2020-12-30 (88 days ago)
  acpidump:
   Error: command ['pkexec', '/usr/share/apport/dump_acpi_tables.py'] failed with exit code 127: polkit-agent-helper-1: error response to PolicyKit daemon: GDBus.Error:org.freedesktop.PolicyKit1.Error.Failed: No session for cookie
   Error executing command as another user: Not authorized

   This incident has been reported.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1921664/+subscriptions