bug#58320: Hurd VM fails to boot on AMD EPYC (kvm-amd)

2022-10-05 Thread Ludovic Courtès
Hello!

On AMD EPYC processors, as found on the build nodes of ci.guix.gnu.org, childhurd VMs fail to boot when running with ‘qemu-system-i386 -enable-kvm’ (the kvm-amd Linux kernel module is used), with the Hurd startup process hanging before /hurd/exec has been started:

--8<---cut he

bug#58320: Hurd VM fails to boot on AMD EPYC (kvm-amd)

2022-10-06 Thread Ludovic Courtès
Hi!

As suggested by Samuel on IRC, I did that early on in kdb:

  debug traps /on

such that it would stop on each trap, hopefully allowing me to see why exec is not starting.

--8<---cut here---start->8---
module 0: ext2fs --multiboot-command-line=${kernel-comm

bug#58320: Hurd VM fails to boot on AMD EPYC (kvm-amd)

2022-10-06 Thread Ludovic Courtès
Samuel Thibault wrote:

> Ludovic Courtès, on Thu, 06 Oct 2022 15:14:13 +0200, wrote:
>> such that it would stop on each trap, hopefully allowing me to see why
>> exec is not starting.
>
> Also, better use exec.static to have static addresses.

Thanks for the hint. Of course, the thing boots

bug#58320: Hurd VM fails to boot on AMD EPYC (kvm-amd)

2022-10-07 Thread Ludovic Courtès
Hi!

Samuel Thibault wrote:

> Ludovic Courtès, on Fri, 07 Oct 2022 00:10:15 +0200, wrote:

[...]

>> Of course, the thing boots just fine on that machine when using
>> ‘exec.static’.
>
> Uh. At least you have a workaround :)

Yup. :-)

>> So the issue might be somewhere in ld.so, or trigger

bug#58320: Hurd VM fails to boot on AMD EPYC (kvm-amd)

2022-10-08 Thread Samuel Thibault
Ludovic Courtès, on Thu, 06 Oct 2022 15:14:13 +0200, wrote:
> such that it would stop on each trap, hopefully allowing me to see why
> exec is not starting.

Also, better use exec.static to have static addresses. We use the dynamic version of exec "just because we can", but that makes debugging

bug#58320: Hurd VM fails to boot on AMD EPYC (kvm-amd)

2022-10-08 Thread Samuel Thibault
Ludovic Courtès, on Fri, 07 Oct 2022 00:10:15 +0200, wrote:
> Samuel Thibault wrote:
>
> > Ludovic Courtès, on Thu, 06 Oct 2022 15:14:13 +0200, wrote:
> >> such that it would stop on each trap, hopefully allowing me to see why
> >> exec is not starting.
> >
> > Also, better use exec.stati

bug#58320: Hurd VM fails to boot on AMD EPYC (kvm-amd)

2022-10-08 Thread Samuel Thibault
Ludovic Courtès, on Fri, 07 Oct 2022 10:24:22 +0200, wrote:
> trap, eip 0xc10305c1
>
> Breakpoint at task_terminate: pushl %ebp
>

bug#58320: Hurd VM fails to boot on AMD EPYC (kvm-amd)

2022-10-08 Thread Ludovic Courtès
Hi Samuel,

Samuel Thibault wrote:

> About the backtrace:
>
>> user space <
> 0x1000(bf24,0,0,1160b,0)
> 0x11627(bf9c,0,0,0,2)
> 0x11bb()
>
> That is quite surprising actually: in my ld.so there is nothing useful
> at 0x1000. Perhaps you can check what 0x11627 is all about?

Sur

bug#58320: Hurd VM fails to boot on AMD EPYC (kvm-amd)

2022-10-09 Thread Ludovic Courtès
Hi!

Ludovic Courtès wrote:

> $ addr2line -e
> /gnu/store/m8afvcgwmrfhvjpd7b0xllk8vv5isd6j-glibc-cross-i586-pc-gnu-2.33/lib/ld.so.1
> 0x1000 0x11627 0x11bb
> ??:0
> /tmp/guix-build-glibc-cross-i586-pc-gnu-2.33.drv-0/glibc-2.33/elf/dl-misc.c:333
> :?
>
>
> That’s ‘_dl_fatal_printf’ calling ‘_
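
A minimal sketch, in Python, of how the frame addresses in a kdb user-space backtrace like the one quoted earlier in this thread could be turned into an addr2line invocation of the kind shown here. The helper names `frame_addresses` and `addr2line_argv` and the shortened `ld.so.1` path in the demo are hypothetical, and the `ADDRESS(args...)` frame format is assumed from the single trace quoted in this thread, not from any kdb documentation. The sketch only builds the argument list and does not run addr2line.

```python
import re

# Assumption: each frame looks like "0x11627(bf9c,0,0,0,2)", i.e. a hex
# address immediately followed by a parenthesized argument list.
FRAME_RE = re.compile(r"\b(0x[0-9a-fA-F]+)\(")


def frame_addresses(trace):
    """Return the frame addresses in the order they appear in the trace."""
    return FRAME_RE.findall(trace)


def addr2line_argv(binary, trace):
    """Build the argv for `addr2line -e BINARY ADDR...` without running it."""
    return ["addr2line", "-e", binary, *frame_addresses(trace)]


if __name__ == "__main__":
    # Frames copied from the backtrace quoted in the thread.
    trace = "0x1000(bf24,0,0,1160b,0) 0x11627(bf9c,0,0,0,2) 0x11bb()"
    # Hypothetical short path; the thread uses the full /gnu/store path.
    print(" ".join(addr2line_argv("ld.so.1", trace)))
```

The resulting list can be passed to `subprocess.run` as is. Bare hexadecimal arguments inside the parentheses (such as `1160b`) are ignored because they lack the `0x` prefix.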

bug#58320: Hurd VM fails to boot on AMD EPYC (kvm-amd)

2022-10-09 Thread Samuel Thibault
Ludovic Courtès, on Sun, 09 Oct 2022 18:09:07 +0200, wrote:
> So it would seem that ‘_dl_start’ is called and somehow then a tail-call
> to ‘_dl_fatal_printf’ is made.

Perhaps you can build glibc without tail-call optimization? (-fno-optimize-sibling-calls)

Samuel

bug#58320: Hurd VM fails to boot on AMD EPYC (kvm-amd)

2022-10-10 Thread Ludovic Courtès
Ludovic Courtès wrote:

> Through a dichotomy I tried to see how far it goes. The info I have so
> far is that ld.so errors out from elf/rtld.c:563 (line 565 is not
> reached):
>
> 558: if (bootstrap_map.l_addr || !
> bootstrap_map.l_info[VALIDX(DT_GNU_PRELINKED)])
> 559:{
> 560: /*

bug#58320: Hurd VM fails to boot on AMD EPYC (kvm-amd)

2022-10-17 Thread Ludovic Courtès
Hi,

Ludovic Courtès wrote:

> … so ‘exec_load’ is doing its job, it seems.

Turns out that may not be the case. Here’s a *bad* mapping on the second ‘task_resume’ breakpoint (when ‘exec’ is about to start):

--8<---cut here---start->8---
db> show all threa

bug#58320: Hurd VM fails to boot on AMD EPYC (kvm-amd)

2022-10-23 Thread Ludovic Courtès
Hi,

Ludovic Courtès wrote:

> Of course, the thing boots just fine on that machine when using
> ‘exec.static’.

It’s frustrating I did not get to the bottom of it, but time passes, so I pushed this workaround in Guix commit 3fb3bd3da530a5f82a169b1fa451474f9d90c3b6.

Ludo’.