On Thu, Dec 11, 2025 at 1:15 PM Jan Beulich <[email protected]> wrote:
>
> On 11.12.2025 11:29, Mykola Kvach wrote:
> > While working on an arm64 s2ram series for Xen I have hit what looks
> > like very strange behaviour in symbols_lookup() as exercised by
> > test-symbols.
> >
> > The series is in the branch referenced at [1]. All patches there except
> > the last one build and pass CI; adding only the last patch makes the CI
> > job referenced at [2] start failing.
> >
> > Note that the tests in that job are built without CONFIG_SYSTEM_SUSPEND
> > enabled, so most of the code introduced by the s2ram branch is not
> > compiled at all for that configuration. That is why I initially did not
> > expect my series to affect this job.
> >
> > To investigate, I tried to reproduce the issue locally. I downloaded the
> > xen-config artifact from the failing job [3] and used it to build Xen
> > with my local aarch64 cross compiler. With this local toolchain
> > I could not reproduce the failure, and the resulting .config changed
> > slightly compared to the job's config. The relevant part of the diff
> > looks like this:
> >
> >     diff --git a/xen/.config b/xen-config
> >     index 057553f510..44dcf6bacc 100644
> >     --- a/xen/.config
> >     +++ b/xen-config
> >     @@ -3,11 +3,11 @@
> >      # Xen/arm 4.22-unstable Configuration
> >      #
> >      CONFIG_CC_IS_GCC=y
> >     -CONFIG_GCC_VERSION=130300
> >     +CONFIG_GCC_VERSION=120201
> >      CONFIG_CLANG_VERSION=0
> >      CONFIG_LD_IS_GNU=y
> >      CONFIG_CC_HAS_ASM_INLINE=y
> >     -CONFIG_CC_HAS_ASM_GOTO_OUTPUT=y
> >     +CONFIG_GCC_ASM_GOTO_OUTPUT_BROKEN=y
> >      CONFIG_FUNCTION_ALIGNMENT_4B=y
> >      CONFIG_FUNCTION_ALIGNMENT=4
> >      CONFIG_ARM_64=y
> >
> > So there is at least a difference in GCC version and asm-goto related
> > Kconfig options between the CI environment and my local one.
> >
> > After that I tried rebuilding inside the same Docker image that GitLab
> > CI uses:
> >
> >     registry.gitlab.com/xen-project/xen/alpine:3.18-arm64v8
> >
> > When I build Xen in that container, using the same branch, the problem
> > reproduces in the same way as in the CI job.
> >
> > Even more confusingly, adding extra prints in test_symbols just before
> > the calls to test_lookup() makes the problem disappear. This made me
> > suspect some undefined behaviour or logic issue that is very sensitive
> > to optimisation or layout changes.
>
> All symptoms described make me suspect you're hitting a problem we're
> already in the process of hunting down. Can you please take [1], make
> the small adjustment necessary to Arm's linking rule, and see whether
> you get a build failure in the case where right now you get a boot time
> crash? Of course no other changes to code or data layout should be done,
> or else you may observe false negatives.

I tested with the provided patch applied, and the issue still reproduces:
the build completes and Xen still panics at boot in test_symbols.

This is my working branch:

e8d5baab50 (HEAD -> reg) symbols: check table sizes don't change between linking passes 2 and 3
e53439fdfc (xen_gitlab/reg) xen/arm: Add support for system suspend triggered by hardware domain
eaa461f3b5 xen/arm: Implement PSCI SYSTEM_SUSPEND call (host interface)
4236fff9a4 xen/arm: Save/restore context on suspend/resume
a150f3d4bb xen/arm: Resume memory management on Xen resume

You can find the following line in the attached Xen boot log:

(XEN) [ 0.010785] Latest ChangeSet: Tue Dec 9 11:11:40 2025 +0100 git:e8d5baab50
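
For context on the panic message in the log, here is a minimal sketch of
the kind of check involved. It is an illustration only: struct sym,
lookup() and the table layout are hypothetical and far simpler than Xen's
real symbol tables. It shows that looking up a function's own entry
address in an address-sorted table should yield offset 0, and that a table
whose recorded start address is off by 0x24 would report a non-zero offset
instead. Whether that is what actually happens in this build is not
established.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified symbol table entry; not Xen's actual layout. */
struct sym {
    uintptr_t addr;     /* start address of the symbol */
    const char *name;
};

/*
 * Find the last symbol whose start address is <= addr in a table sorted by
 * ascending address, and report how far addr lies past that symbol's start.
 * Returns NULL (leaving *offset untouched) if addr precedes every symbol.
 */
static const struct sym *lookup(const struct sym *tab, size_t n,
                                uintptr_t addr, uintptr_t *offset)
{
    size_t lo = 0, hi = n;

    /* Binary search for the first entry whose address is > addr. */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (tab[mid].addr <= addr)
            lo = mid + 1;
        else
            hi = mid;
    }

    if (lo == 0)
        return NULL;

    *offset = addr - tab[lo - 1].addr;
    return &tab[lo - 1];
}
```

With a correct table, lookup() on a function's entry address returns that
function with offset 0. If the table recorded the start 0x24 bytes too low,
the same lookup would return offset 0x24, which is the shape of the
message in the panic.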

>
> Jan
>
> [1] https://lists.xen.org/archives/html/xen-devel/2025-12/msg00390.html

Best regards,
Mykola
(XEN) Checking for initrd in /chosen
(XEN) Initrd 00000000e20f4000-00000000ececd9e2
(XEN) RAM: 0000000000200000 - 00000000efffffff
(XEN) RAM: 0000000100000000 - 00000003fbffffff
(XEN) RAM: 00000003fc500000 - 00000003ffefffff
(XEN) 
(XEN) MODULE[0]: 0000000049000000 - 000000004916bfff Xen         
(XEN) MODULE[1]: 00000000e20ea000 - 00000000e20f0fff Device Tree 
(XEN) MODULE[2]: 00000000e20f4000 - 00000000ececd9e2 Ramdisk     
(XEN) MODULE[3]: 0000000002000000 - 0000000005ffffff Kernel      
(XEN) MODULE[4]: 0000000006000000 - 000000000600ffff XSM Policy  
(XEN)  RESVD[0]: 00000000e20f4000 - 00000000ececd9e2
(XEN) 
(XEN) 
(XEN) Command line: xen-llc-colors=0-,4,2 llc-coloring=1 dom0_mem=2048M console=dtuart dtuart=serial2 dom0_max_vcpus=2 bootscrub=0 loglvl=all maxcpus=2 hmp-unsafe=true xsm=dummy console_timestamps=boot sync_console=yes pci-passthrough=yes iommu=on
(XEN) parameter "xen-llc-colors" unknown!
(XEN) parameter "llc-coloring" unknown!
(XEN) [000000033f4e5d84] parameter "pci-passthrough" unknown!
(XEN) [000000033fac1d51] Domain heap initialised
(XEN) [000000033fac1e46] Booting using Device Tree
(XEN) [000000033fac825a] Platform: Generic System
(XEN) [    0.000016] Looking for dtuart at "serial2", options ""
Xen 4.22-unstable
(XEN) [    0.009682] Xen version 4.22-unstable (root@) (gcc (Alpine 12.2.1_git20220924-r10) 12.2.1 20220924) debug=y Thu Dec 11 11:38:43 UTC 2025
(XEN) [    0.010785] Latest ChangeSet: Tue Dec 9 11:11:40 2025 +0100 git:e8d5baab50
(XEN) [    0.011423] build-id: dce6d1882a57c340b9e4f3dd996b6df81b1d4459
(XEN) [    0.011970] Console output is synchronous.
(XEN) [    0.012367] Processor: 00000000412fd050: "ARM Limited", variant: 0x2, part 0xd05,rev 0x0
(XEN) [    0.013110] 64-bit Execution:
(XEN) [    0.013410]   Processor Features: 0000000011112222 0000000000000010
(XEN) [    0.013995]     Exception Levels: EL3:64+32 EL2:64+32 EL1:64+32 EL0:64+32
(XEN) [    0.014625]     Extensions: FloatingPoint AdvancedSIMD GICv3-SysReg
(XEN) [    0.015210]   Debug Features: 0000000010305408 0000000000000000
(XEN) [    0.015765]   Auxiliary Features: 0000000000000000 0000000000000000
(XEN) [    0.016350]   Memory Model Features: 0000000000101122 0000000010212122
(XEN) [    0.016957]   ISA Features:  0000100010211120 0000000000100001
(XEN) [    0.017505] 32-bit Execution:
(XEN) [    0.017805]   Processor Features: 0000000010000131:0000000010011011
(XEN) [    0.018390]     Instruction Sets: AArch32 A32 Thumb Thumb-2 Jazelle
(XEN) [    0.018975]     Extensions: GenericTimer Security
(XEN) [    0.019425]   Debug Features: 0000000004010088
(XEN) [    0.019853]   Auxiliary Features: 0000000000000000
(XEN) [    0.020310]   Memory Model Features: 0000000010201105 0000000040000000
(XEN) [    0.020917]                          0000000001260000 0000000002122211
(XEN) [    0.021525]   ISA Features: 0000000002101110 0000000013112111 0000000021232042
(XEN) [    0.022192]                 0000000001112131 0000000000011142 0000000001011121
(XEN) [    0.022866] Using SMC Calling Convention v1.5
(XEN) [    0.023286] Using PSCI v1.1
(XEN) [    0.023571] SMP: Allowing 2 CPUs
(XEN) [    0.023894] enabled workaround for: ARM erratum 1530923
(XEN) [    0.024408] Generic Timer IRQ: phys=30 hyp=26 virt=27 Freq: 24000 KHz
(XEN) [    0.025143] GICv3 initialization:
(XEN) [    0.025473]       gic_dist_addr=0x000000fe600000
(XEN) [    0.025916]       gic_maintenance_irq=25
(XEN) [    0.026298]       gic_rdist_stride=0
(XEN) [    0.026650]       gic_rdist_regions=1
(XEN) [    0.027010]       redistributor regions:
(XEN) [    0.027393]         - region 0: 0x000000fe680000 - 0x000000fe780000
(XEN) [    0.027978] GICv3: 512 lines, (IID 0201743b).
(XEN) [    0.028427] GICv3: CPU0: Found redistributor in region 0 @00000a004001c000
(XEN) [    0.029066] XSM Framework v1.0.1 initialized
(XEN) [    0.029477] xsm: Policy len = 0x0000000000010000 start at 0x0000000006000000
(XEN) [    0.030796] Using scheduler: SMP Credit Scheduler rev2 (credit2)
(XEN) [    0.031358] Initializing Credit2 scheduler
(XEN) [    0.031756]  load_precision_shift: 18
(XEN) [    0.032115]  load_window_shift: 30
(XEN) [    0.032453]  underload_balance_tolerance: 0
(XEN) [    0.032858]  overload_balance_tolerance: -3
(XEN) [    0.033263]  runqueues arrangement: socket
(XEN) [    0.033660]  cap enforcement granularity: 10ms
(XEN) [    0.034088] load tracking window length 1073741824 ns
(XEN) [    0.034698] Allocated console ring of 16 KiB.
(XEN) [    0.035119] CPU0: Guest atomics will try 2 times before pausing the domain
(XEN) [    0.035807] Bringing up CPU1
(XEN) [    0.036195] GICv3: CPU1: Found redistributor in region 0 @00000a004003c000
(XEN) [    0.036831] CPU1: Guest atomics will try 8 times before pausing the domain
(XEN) [    0.037469] Brought up 2 CPUs
(XEN) [    0.037769] CPU 1 booted.
(XEN) [    0.038169] I/O virtualisation disabled
(XEN) [    0.038543] P2M: 40-bit IPA with 40-bit PA and 16-bit VMID
(XEN) [    0.039061] P2M: 3 levels with order-1 root, VTCR 0x00000000800a3558
(XEN) [    0.039703] Scheduling granularity: cpu, 1 CPU per sched-resource
(XEN) [    0.040275] Initializing Credit2 scheduler
(XEN) [    0.040672]  load_precision_shift: 18
(XEN) [    0.041032]  load_window_shift: 30
(XEN) [    0.041370]  underload_balance_tolerance: 0
(XEN) [    0.041775]  overload_balance_tolerance: -3
(XEN) [    0.042179]  runqueues arrangement: socket
(XEN) [    0.042578]  cap enforcement granularity: 10ms
(XEN) [    0.043005] load tracking window length 1073741824 ns
(XEN) [    0.043488] Adding cpu 0 to runqueue 0
(XEN) [    0.043855]  First cpu on runqueue, activating
(XEN) [    0.044289] Adding cpu 1 to runqueue 0
(XEN) [    0.044666] Using SCMI with SMC ID: 0x82000010
(XEN) [    0.045481] alternatives: Patching with alt table 00000a00002eead0 -> 00000a00002f0000
(XEN) [    0.046546] SCMI: d0 init
(XEN) [    0.046938] *** LOADING DOMAIN 0 ***
(XEN) [    0.047291] Loading d0 kernel from boot module @ 0000000002000000
(XEN) [    0.047860] Loading ramdisk from boot module @ 00000000e20f4000
(XEN) [    0.048422] Grant table range: 0x00000049000000-0x00000049040000
(XEN) [    0.048985] Allocating 1:1 mappings totalling 2048MB for dom0:
(XEN) [    0.523065] BANK[0] 0x00000060000000-0x000000e0000000 (2048MB)
(XEN) [    0.536595] Allocating PPI 16 for event channel interrupt
(XEN) [    0.537255] d0: extended region 0: 0x200000->0x49000000
(XEN) [    0.537750] d0: extended region 1: 0x49200000->0x60000000
(XEN) [    0.538259] d0: extended region 2: 0x100000000->0x3fc000000
(XEN) [    0.539933] Loading zImage from 0000000002000000 to 0000000060000000-0000000064000000
(XEN) [    1.185001] Loading d0 initrd from 00000000e20f4000 to 0x0000000068200000-0x0000000072fd99e3
(XEN) [    2.913163] Loading d0 DTB to 0x0000000068000000-0x0000000068005ba6
(XEN) [    2.914910] Initial low memory virq threshold set at 0x4000 pages.
(XEN) [    2.915778] 
(XEN) [    2.915950] ****************************************
(XEN) [    2.916423] Panic on CPU 0:
(XEN) [    2.916708] test_symbols: non-zero offset (0x24) unexpected
(XEN) [    2.917233] ****************************************
(XEN) [    2.917705] 
(XEN) [    2.917878] Reboot in five seconds...
