On Tue, 2022-05-03 at 10:41 +0000, Bezdeka, Florian via Xenomai wrote:
> Hi all,
> 
> it seems that I'm able to reproduce a register (or stack) corruption on
> x86.
> 
> The problem does not appear when running the Xenomai testsuite
> (especially switchtest) without any additional load. Stressing Linux
> with stress-ng makes the test fail.
> 
> Kernel: 4.19.231-cip68
> Xenomai: 3.2.1
> Hardware:
>  - Intel(R) Xeon(R) CPU E5-2640 v3 @ 2.60GHz
>  - 32 cores
> stress-ng cmdline:
>  stress-ng --cpu 16 --io 8 --vm 4 --vm-bytes 128M --fork 8
> 
> Any ideas how to debug that? Any additional config options that
> could/should be enabled?
> 
> Any advice is welcome...
> 
> Adding Richard to CC, he mentioned some undiscovered possible stack
> corruption as well. As registers are stored on the stack, there might
> be a pattern.
> 
> dmesg (from one xeno-test run):
> [  184.461138] sched: RT throttling activated
> [  250.243970] arch/x86/xenomai/ipipe/include/asm/xenomai/fptest.h:43: Warning: Linux is compiled to use FPU in kernel-s.
> [  250.243970] For this reason, switchtest can not test using FPU in Linux kernel-space.
> [  250.244148] r1: 2147483648 != 5
> [  250.375609] r2: 2147483648 != 5
> [  250.394381] r3: 2147483648 != 5
> [  250.413155] r4: 2147483648 != 5
> [  250.431924] r5: 2147483648 != 5
> [  250.450694] r6: 2147483648 != 5
> [  250.469466] r7: 2147483648 != 5
> [  250.488240] r4: 2147483648 != 5
> [  250.507011] r5: 2147483648 != 5
> [  250.525784] r6: 2147483648 != 5
> [  250.544555] r7: 2147483648 != 5
> [  250.563325] r6: 2147483648 != 5
> [  250.582097] r7: 2147483648 != 5
> [  250.600869] r5: 2147483648 != 5
> [  250.619643] r6: 2147483648 != 5
> [  250.638412] r7: 2147483648 != 5
> [  250.657184] r2: 2147483648 != 5
> [  250.675957] r3: 2147483648 != 5
> [  250.694728] r4: 2147483648 != 5
> [  250.713500] r5: 2147483648 != 5
> [  250.732271] r6: 2147483648 != 5
> [  250.751043] r7: 2147483648 != 5
> [  250.769816] r7: 2147483648 != 5
> [  250.788587] r4: 2147483648 != 6
> [  250.807360] r5: 2147483648 != 6
> [  250.826130] r6: 2147483648 != 6
> [  250.844902] r7: 2147483648 != 6
> [  250.863675] r6: 2147483648 != 5
> [  250.882447] r7: 2147483648 != 5
> [  250.901219] r2: 2147483648 != 5
> [  250.919990] r3: 2147483648 != 5
> [  250.938762] r4: 2147483648 != 5
> [  250.957534] r5: 2147483648 != 5
> [  250.976305] r6: 2147483648 != 5
> [  250.995076] r7: 2147483648 != 5
> [  251.013853] r6: 2147483648 != 5
> [  251.032621] r7: 2147483648 != 5
> [  251.051393] r6: 2147483648 != 6
> [  251.070164] r7: 2147483648 != 6
> [  251.088935] r7: 2147483648 != 6
> [  251.107709] r5: 2147483648 != 6
> [  251.126480] r6: 2147483648 != 6
> [  251.145252] r7: 2147483648 != 6

Some more logs from switchtest itself, captured when running only
switchtest after the first failed xeno-test run above:

/usr/lib/xenomai/testsuite/switchtest -T 30
== Testing FPU check routines...
r0: 1 != 2
r1: 1 != 2
r2: 1 != 2
r3: 1 != 2
r4: 1 != 2
r5: 1 != 2
r6: 1 != 2
r7: 1 != 2
ymm0: 1/1 != 2/2
ymm1: 1/1 != 2/2
ymm2: 1/1 != 2/2
ymm3: 1/1 != 2/2
ymm4: 1/1 != 2/2
ymm5: 1/1 != 2/2
ymm6: 1/1 != 2/2
ymm7: 1/1 != 2/2
== FPU check routines: OK.
== Threads: [snip]
Error after context switch from task 4(rtk_fp4-4) to task 5(rtk_fp_ufpp4-5),
FPU registers were set to 2147483648 (unidentified task)
Error after context switch from task 4(rtk_fp12-4) to task 5(rtk_fp_ufpp12-5),
FPU registers were set to 2147483648 (unidentified task)
Error after context switch from task 4(rtk_fp24-4) to task 5(rtk_fp_ufpp24-5),
FPU registers were set to 2147483648 (unidentified task)
Error after context switch from task 4(rtk_fp17-4) to task 5(rtk_fp_ufpp17-5),
FPU registers were set to 2147483648 (unidentified task)
Error after context switch from task 4(rtk_fp18-4) to task 5(rtk_fp_ufpp18-5),
FPU registers were set to 2147483648 (unidentified task)
Error after context switch from task 4(rtk_fp21-4) to task 5(rtk_fp_ufpp21-5),
FPU registers were set to 2147483648 (unidentified task)
Error after context switch from task 5(rtk_fp_ufpp28-5) to task 6(rtk_fp_ufpp28-6),
FPU registers were set to 2147483648 (unidentified task)

while dmesg shows:
[ 5472.456523] r7: 2147483648 != 5
[ 5472.475313] r6: 2147483648 != 5
[ 5472.494083] r7: 2147483648 != 5
[ 5472.512854] r5: 2147483648 != 5
[ 5472.531625] r6: 2147483648 != 5
[ 5472.550398] r7: 2147483648 != 5
[ 5472.569168] r7: 2147483648 != 5
[ 5472.587941] r0: 2147483648 != 5
[ 5472.606713] r1: 2147483648 != 5
[ 5472.625485] r2: 2147483648 != 5
[ 5472.644257] r3: 2147483648 != 5
[ 5472.663028] r4: 2147483648 != 5
[ 5472.681800] r5: 2147483648 != 5
[ 5472.700571] r6: 2147483648 != 5
[ 5472.719342] r7: 2147483648 != 5
[ 5472.738114] r6: 2147483648 != 5
[ 5472.756887] r7: 2147483648 != 5
[ 5472.775657] r1: 2147483648 != 5
[ 5472.794430] r2: 2147483648 != 5
[ 5472.813200] r3: 2147483648 != 5
[ 5472.831974] r4: 2147483648 != 5
[ 5472.850744] r5: 2147483648 != 5
[ 5472.869517] r6: 2147483648 != 5
[ 5472.888289] r7: 2147483648 != 5
[ 5472.907061] r6: 2147483648 != 6
[ 5472.925833] r7: 2147483648 != 6

> 
> Best regards,
> Florian
