On Thu, Dec 11, 2025 at 1:15 PM Jan Beulich <[email protected]> wrote:
>
> On 11.12.2025 11:29, Mykola Kvach wrote:
> > While working on an arm64 s2ram series for Xen I have hit what looks
> > like very strange behaviour in symbols_lookup() as exercised by
> > test-symbols.
> >
> > The series is in the branch referenced at [1]. All patches there except
> > the last one build and pass CI; adding only the last patch makes the CI
> > job referenced at [2] start failing.
> >
> > Note that the tests in that job are built without CONFIG_SYSTEM_SUSPEND
> > enabled, so most of the code introduced by the s2ram branch is not
> > compiled at all for that configuration. That is why I initially did not
> > expect my series to affect this job.
> >
> > To investigate, I tried to reproduce the issue locally. I downloaded the
> > xen-config artifact from the failing job [3] and used it to build Xen
> > with my local aarch64 cross compiler. With this local toolchain
> > I could not reproduce the failure, and the resulting .config changed slightly
> > compared to the job's config. The relevant part of the diff looks like this:
> >
> > diff --git a/xen/.config b/xen-config
> > index 057553f510..44dcf6bacc 100644
> > --- a/xen/.config
> > +++ b/xen-config
> > @@ -3,11 +3,11 @@
> >  # Xen/arm 4.22-unstable Configuration
> >  #
> >  CONFIG_CC_IS_GCC=y
> > -CONFIG_GCC_VERSION=130300
> > +CONFIG_GCC_VERSION=120201
> >  CONFIG_CLANG_VERSION=0
> >  CONFIG_LD_IS_GNU=y
> >  CONFIG_CC_HAS_ASM_INLINE=y
> > -CONFIG_CC_HAS_ASM_GOTO_OUTPUT=y
> > +CONFIG_GCC_ASM_GOTO_OUTPUT_BROKEN=y
> >  CONFIG_FUNCTION_ALIGNMENT_4B=y
> >  CONFIG_FUNCTION_ALIGNMENT=4
> >  CONFIG_ARM_64=y
> >
> > So there is at least a difference in GCC version and asm-goto related
> > Kconfig options between the CI environment and my local one.
> >
> > After that I tried rebuilding inside the same Docker image that GitLab
> > CI uses:
> >
> > registry.gitlab.com/xen-project/xen/alpine:3.18-arm64v8
> >
> > When I build Xen in that container, using the same branch, the problem
> > reproduces in the same way as in the CI job.
> >
> > Even more confusingly, adding extra prints in test_symbols just before
> > the calls to test_lookup() makes the problem disappear. This made me
> > suspect some undefined behaviour or logic issue that is very sensitive
> > to optimisation or layout changes.
>
> All symptoms described make me suspect you're hitting a problem we're
> already in the process of hunting down. Can you please take [1], make
> the small adjustment necessary to Arm's linking rule, and see whether
> you get a build failure in the case where right now you get a boot time
> crash? Of course no other changes to code or data layout should be done,
> or else you may observe false negatives.

I tested with the provided patch applied, and the issue is still
reproducible: the build succeeds and the boot-time panic in test_symbols
still occurs.

My working branch is:

e8d5baab50 (HEAD -> reg) symbols: check table sizes don't change between linking passes 2 and 3
e53439fdfc (xen_gitlab/reg) xen/arm: Add support for system suspend triggered by hardware domain
eaa461f3b5 xen/arm: Implement PSCI SYSTEM_SUSPEND call (host interface)
4236fff9a4 xen/arm: Save/restore context on suspend/resume
a150f3d4bb xen/arm: Resume memory management on Xen resume

The attached Xen boot log shows that this is the changeset that was tested:

(XEN) [ 0.010785] Latest ChangeSet: Tue Dec 9 11:11:40 2025 +0100 git:e8d5baab50

>
> Jan
>
> [1] https://lists.xen.org/archives/html/xen-devel/2025-12/msg00390.html

Best regards,
Mykola
(XEN) Checking for initrd in /chosen
(XEN) Initrd 00000000e20f4000-00000000ececd9e2
(XEN) RAM: 0000000000200000 - 00000000efffffff
(XEN) RAM: 0000000100000000 - 00000003fbffffff
(XEN) RAM: 00000003fc500000 - 00000003ffefffff
(XEN)
(XEN) MODULE[0]: 0000000049000000 - 000000004916bfff Xen
(XEN) MODULE[1]: 00000000e20ea000 - 00000000e20f0fff Device Tree
(XEN) MODULE[2]: 00000000e20f4000 - 00000000ececd9e2 Ramdisk
(XEN) MODULE[3]: 0000000002000000 - 0000000005ffffff Kernel
(XEN) MODULE[4]: 0000000006000000 - 000000000600ffff XSM Policy
(XEN) RESVD[0]: 00000000e20f4000 - 00000000ececd9e2
(XEN)
(XEN)
(XEN) Command line: xen-llc-colors=0-,4,2 llc-coloring=1 dom0_mem=2048M console=dtuart dtuart=serial2 dom0_max_vcpus=2 bootscrub=0 loglvl=all maxcpus=2 hmp-unsafe=true xsm=dummy console_timestamps=boot sync_console=yes pci-passthrough=yes iommu=on
(XEN) parameter "xen-llc-colors" unknown!
(XEN) parameter "llc-coloring" unknown!
(XEN) [000000033f4e5d84] parameter "pci-passthrough" unknown!
(XEN) [000000033fac1d51] Domain heap initialised
(XEN) [000000033fac1e46] Booting using Device Tree
(XEN) [000000033fac825a] Platform: Generic System
(XEN) [ 0.000016] Looking for dtuart at "serial2", options ""
Xen 4.22-unstable
(XEN) [ 0.009682] Xen version 4.22-unstable (root@) (gcc (Alpine 12.2.1_git20220924-r10) 12.2.1 20220924) debug=y Thu Dec 11 11:38:43 UTC 2025
(XEN) [ 0.010785] Latest ChangeSet: Tue Dec 9 11:11:40 2025 +0100 git:e8d5baab50
(XEN) [ 0.011423] build-id: dce6d1882a57c340b9e4f3dd996b6df81b1d4459
(XEN) [ 0.011970] Console output is synchronous.
(XEN) [ 0.012367] Processor: 00000000412fd050: "ARM Limited", variant: 0x2, part 0xd05,rev 0x0
(XEN) [ 0.013110] 64-bit Execution:
(XEN) [ 0.013410] Processor Features: 0000000011112222 0000000000000010
(XEN) [ 0.013995] Exception Levels: EL3:64+32 EL2:64+32 EL1:64+32 EL0:64+32
(XEN) [ 0.014625] Extensions: FloatingPoint AdvancedSIMD GICv3-SysReg
(XEN) [ 0.015210] Debug Features: 0000000010305408 0000000000000000
(XEN) [ 0.015765] Auxiliary Features: 0000000000000000 0000000000000000
(XEN) [ 0.016350] Memory Model Features: 0000000000101122 0000000010212122
(XEN) [ 0.016957] ISA Features: 0000100010211120 0000000000100001
(XEN) [ 0.017505] 32-bit Execution:
(XEN) [ 0.017805] Processor Features: 0000000010000131:0000000010011011
(XEN) [ 0.018390] Instruction Sets: AArch32 A32 Thumb Thumb-2 Jazelle
(XEN) [ 0.018975] Extensions: GenericTimer Security
(XEN) [ 0.019425] Debug Features: 0000000004010088
(XEN) [ 0.019853] Auxiliary Features: 0000000000000000
(XEN) [ 0.020310] Memory Model Features: 0000000010201105 0000000040000000
(XEN) [ 0.020917] 0000000001260000 0000000002122211
(XEN) [ 0.021525] ISA Features: 0000000002101110 0000000013112111 0000000021232042
(XEN) [ 0.022192] 0000000001112131 0000000000011142 0000000001011121
(XEN) [ 0.022866] Using SMC Calling Convention v1.5
(XEN) [ 0.023286] Using PSCI v1.1
(XEN) [ 0.023571] SMP: Allowing 2 CPUs
(XEN) [ 0.023894] enabled workaround for: ARM erratum 1530923
(XEN) [ 0.024408] Generic Timer IRQ: phys=30 hyp=26 virt=27 Freq: 24000 KHz
(XEN) [ 0.025143] GICv3 initialization:
(XEN) [ 0.025473] gic_dist_addr=0x000000fe600000
(XEN) [ 0.025916] gic_maintenance_irq=25
(XEN) [ 0.026298] gic_rdist_stride=0
(XEN) [ 0.026650] gic_rdist_regions=1
(XEN) [ 0.027010] redistributor regions:
(XEN) [ 0.027393] - region 0: 0x000000fe680000 - 0x000000fe780000
(XEN) [ 0.027978] GICv3: 512 lines, (IID 0201743b).
(XEN) [ 0.028427] GICv3: CPU0: Found redistributor in region 0 @00000a004001c000
(XEN) [ 0.029066] XSM Framework v1.0.1 initialized
(XEN) [ 0.029477] xsm: Policy len = 0x0000000000010000 start at 0x0000000006000000
(XEN) [ 0.030796] Using scheduler: SMP Credit Scheduler rev2 (credit2)
(XEN) [ 0.031358] Initializing Credit2 scheduler
(XEN) [ 0.031756] load_precision_shift: 18
(XEN) [ 0.032115] load_window_shift: 30
(XEN) [ 0.032453] underload_balance_tolerance: 0
(XEN) [ 0.032858] overload_balance_tolerance: -3
(XEN) [ 0.033263] runqueues arrangement: socket
(XEN) [ 0.033660] cap enforcement granularity: 10ms
(XEN) [ 0.034088] load tracking window length 1073741824 ns
(XEN) [ 0.034698] Allocated console ring of 16 KiB.
(XEN) [ 0.035119] CPU0: Guest atomics will try 2 times before pausing the domain
(XEN) [ 0.035807] Bringing up CPU1
(XEN) [ 0.036195] GICv3: CPU1: Found redistributor in region 0 @00000a004003c000
(XEN) [ 0.036831] CPU1: Guest atomics will try 8 times before pausing the domain
(XEN) [ 0.037469] Brought up 2 CPUs
(XEN) [ 0.037769] CPU 1 booted.
(XEN) [ 0.038169] I/O virtualisation disabled
(XEN) [ 0.038543] P2M: 40-bit IPA with 40-bit PA and 16-bit VMID
(XEN) [ 0.039061] P2M: 3 levels with order-1 root, VTCR 0x00000000800a3558
(XEN) [ 0.039703] Scheduling granularity: cpu, 1 CPU per sched-resource
(XEN) [ 0.040275] Initializing Credit2 scheduler
(XEN) [ 0.040672] load_precision_shift: 18
(XEN) [ 0.041032] load_window_shift: 30
(XEN) [ 0.041370] underload_balance_tolerance: 0
(XEN) [ 0.041775] overload_balance_tolerance: -3
(XEN) [ 0.042179] runqueues arrangement: socket
(XEN) [ 0.042578] cap enforcement granularity: 10ms
(XEN) [ 0.043005] load tracking window length 1073741824 ns
(XEN) [ 0.043488] Adding cpu 0 to runqueue 0
(XEN) [ 0.043855] First cpu on runqueue, activating
(XEN) [ 0.044289] Adding cpu 1 to runqueue 0
(XEN) [ 0.044666] Using SCMI with SMC ID: 0x82000010
(XEN) [ 0.045481] alternatives: Patching with alt table 00000a00002eead0 -> 00000a00002f0000
(XEN) [ 0.046546] SCMI: d0 init
(XEN) [ 0.046938] *** LOADING DOMAIN 0 ***
(XEN) [ 0.047291] Loading d0 kernel from boot module @ 0000000002000000
(XEN) [ 0.047860] Loading ramdisk from boot module @ 00000000e20f4000
(XEN) [ 0.048422] Grant table range: 0x00000049000000-0x00000049040000
(XEN) [ 0.048985] Allocating 1:1 mappings totalling 2048MB for dom0:
(XEN) [ 0.523065] BANK[0] 0x00000060000000-0x000000e0000000 (2048MB)
(XEN) [ 0.536595] Allocating PPI 16 for event channel interrupt
(XEN) [ 0.537255] d0: extended region 0: 0x200000->0x49000000
(XEN) [ 0.537750] d0: extended region 1: 0x49200000->0x60000000
(XEN) [ 0.538259] d0: extended region 2: 0x100000000->0x3fc000000
(XEN) [ 0.539933] Loading zImage from 0000000002000000 to 0000000060000000-0000000064000000
(XEN) [ 1.185001] Loading d0 initrd from 00000000e20f4000 to 0x0000000068200000-0x0000000072fd99e3
(XEN) [ 2.913163] Loading d0 DTB to 0x0000000068000000-0x0000000068005ba6
(XEN) [ 2.914910] Initial low memory virq threshold set at 0x4000 pages.
(XEN) [ 2.915778]
(XEN) [ 2.915950] ****************************************
(XEN) [ 2.916423] Panic on CPU 0:
(XEN) [ 2.916708] test_symbols: non-zero offset (0x24) unexpected
(XEN) [ 2.917233] ****************************************
(XEN) [ 2.917705]
(XEN) [ 2.917878] Reboot in five seconds...
