Hi all,
While working on an arm64 s2ram series for Xen I have hit what looks
like very strange behaviour in symbols_lookup() as exercised by test-symbols.
The series is in the branch referenced at [1]. All patches there except
the last one build and pass CI; adding only the last patch makes the CI
job referenced at [2] start failing.
Note that the tests in that job are built without CONFIG_SYSTEM_SUSPEND
enabled, so most of the code introduced by the s2ram branch is not
compiled at all for that configuration. That is why I initially did not
expect my series to affect this job.
To investigate, I tried to reproduce the issue locally. I downloaded the
xen-config artifact from the failing job [3] and used it to build Xen
with my local aarch64 cross compiler. With this local toolchain
I could not reproduce the failure, and the resulting .config differs slightly
from the job's config. The relevant part of the diff looks like this:
diff --git a/xen/.config b/xen-config
index 057553f510..44dcf6bacc 100644
--- a/xen/.config
+++ b/xen-config
@@ -3,11 +3,11 @@
 # Xen/arm 4.22-unstable Configuration
 #
 CONFIG_CC_IS_GCC=y
-CONFIG_GCC_VERSION=130300
+CONFIG_GCC_VERSION=120201
 CONFIG_CLANG_VERSION=0
 CONFIG_LD_IS_GNU=y
 CONFIG_CC_HAS_ASM_INLINE=y
-CONFIG_CC_HAS_ASM_GOTO_OUTPUT=y
+CONFIG_GCC_ASM_GOTO_OUTPUT_BROKEN=y
 CONFIG_FUNCTION_ALIGNMENT_4B=y
 CONFIG_FUNCTION_ALIGNMENT=4
 CONFIG_ARM_64=y
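So the two environments differ at least in GCC version (13.3.0 locally
versus 12.2.1 in CI) and in the asm-goto related Kconfig options: my local
build sets CONFIG_CC_HAS_ASM_GOTO_OUTPUT, while the CI config sets
CONFIG_GCC_ASM_GOTO_OUTPUT_BROKEN instead.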
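For reference, here is a minimal sketch of how I read the two
CONFIG_GCC_VERSION values. It assumes the usual encoding of
major * 10000 + minor * 100 + patchlevel; the helper name
decode_gcc_version is only for illustration and is not part of the Xen
tree. The "local" and "CI" labels assume the "-" side of the diff is my
local xen/.config and the "+" side is the downloaded xen-config artifact.

```python
def decode_gcc_version(value: int) -> tuple[int, int, int]:
    """Split a CONFIG_GCC_VERSION integer into (major, minor, patchlevel).

    Assumes the encoding major * 10000 + minor * 100 + patchlevel.
    """
    major, rest = divmod(value, 10000)
    minor, patch = divmod(rest, 100)
    return major, minor, patch


# Values taken from the diff above; the labels are my reading of which
# side of the diff belongs to which environment.
for label, value in (("local", 130300), ("CI", 120201)):
    print(label, ".".join(str(part) for part in decode_gcc_version(value)))
```

Under that assumption the local toolchain is GCC 13.3.0 and the CI
container uses GCC 12.2.1.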
After that I tried rebuilding inside the same Docker image that GitLab
CI uses:
registry.gitlab.com/xen-project/xen/alpine:3.18-arm64v8
When I build Xen in that container, using the same branch, the problem
reproduces in the same way as in the CI job.
Even more confusingly, adding extra prints in test_symbols just before
the calls to test_lookup() makes the problem disappear. This made me
suspect some undefined behaviour or logic issue that is very sensitive
to optimisation or layout changes.
At this point, to me it looks like something might be wrong in the
logic inside symbols_lookup() (or in how the test drives it), but I may
well be missing an important detail about the expected behaviour here or
about the toolchain assumptions.
Could someone familiar with symbols_lookup() and the test-symbols code
please take a look or suggest what else I should check? If there is a
maintainer who would be willing to own this issue, I would be happy to
provide more data or try additional experiments as needed.
Thanks in advance for any hints or guidance.
Best regards,
Mykola
[1] https://gitlab.com/xen-project/people/mykola_kvach/xen/-/commits/reg
[2] https://gitlab.com/xen-project/people/mykola_kvach/xen/-/jobs/12394355047
[3] https://gitlab.com/xen-project/people/mykola_kvach/xen/-/jobs/12394354611/artifacts/file/xen-config