On Tue, 2022-05-03 at 10:41 +0000, Bezdeka, Florian via Xenomai wrote:
> Hi all,
>
> it seems that I'm able to reproduce a register (or stack) corruption on
> x86.
>
> The problem does not appear when running the Xenomai testsuite
> (especially switchtest) without any additional load. Stressing Linux
> with stress-ng makes the test fail.
>
> Kernel: 4.19.231-cip68
> Xenomai: 3.2.1
> Hardware:
> - Intel(R) Xeon(R) CPU E5-2640 v3 @ 2.60GHz
> - 32 cores
> stress-ng cmdline:
> stress-ng --cpu 16 --io 8 --vm 4 --vm-bytes 128M --fork 8
>
> Any ideas how to debug that? Any additional config options that
> could/should be enabled?
>
> Any advice is welcome...
>
> Adding Richard to CC, he mentioned some undiscovered possible stack
> corruption as well. As registers are stored on the stack, there might
> be a pattern.
>
> dmesg (from one xeno-test run):
> [ 184.461138] sched: RT throttling activated
> [ 250.243970] arch/x86/xenomai/ipipe/include/asm/xenomai/fptest.h:43:
> Warning: Linux is compiled to use FPU in kernel-s.
> [ 250.243970] For this reason, switchtest can not test using FPU in Linux
> kernel-space.
> [ 250.244148] r1: 2147483648 != 5
> [ 250.375609] r2: 2147483648 != 5
> [ 250.394381] r3: 2147483648 != 5
> [ 250.413155] r4: 2147483648 != 5
> [ 250.431924] r5: 2147483648 != 5
> [ 250.450694] r6: 2147483648 != 5
> [ 250.469466] r7: 2147483648 != 5
> [ 250.488240] r4: 2147483648 != 5
> [ 250.507011] r5: 2147483648 != 5
> [ 250.525784] r6: 2147483648 != 5
> [ 250.544555] r7: 2147483648 != 5
> [ 250.563325] r6: 2147483648 != 5
> [ 250.582097] r7: 2147483648 != 5
> [ 250.600869] r5: 2147483648 != 5
> [ 250.619643] r6: 2147483648 != 5
> [ 250.638412] r7: 2147483648 != 5
> [ 250.657184] r2: 2147483648 != 5
> [ 250.675957] r3: 2147483648 != 5
> [ 250.694728] r4: 2147483648 != 5
> [ 250.713500] r5: 2147483648 != 5
> [ 250.732271] r6: 2147483648 != 5
> [ 250.751043] r7: 2147483648 != 5
> [ 250.769816] r7: 2147483648 != 5
> [ 250.788587] r4: 2147483648 != 6
> [ 250.807360] r5: 2147483648 != 6
> [ 250.826130] r6: 2147483648 != 6
> [ 250.844902] r7: 2147483648 != 6
> [ 250.863675] r6: 2147483648 != 5
> [ 250.882447] r7: 2147483648 != 5
> [ 250.901219] r2: 2147483648 != 5
> [ 250.919990] r3: 2147483648 != 5
> [ 250.938762] r4: 2147483648 != 5
> [ 250.957534] r5: 2147483648 != 5
> [ 250.976305] r6: 2147483648 != 5
> [ 250.995076] r7: 2147483648 != 5
> [ 251.013853] r6: 2147483648 != 5
> [ 251.032621] r7: 2147483648 != 5
> [ 251.051393] r6: 2147483648 != 6
> [ 251.070164] r7: 2147483648 != 6
> [ 251.088935] r7: 2147483648 != 6
> [ 251.107709] r5: 2147483648 != 6
> [ 251.126480] r6: 2147483648 != 6
> [ 251.145252] r7: 2147483648 != 6

Some more logs from the switchtest itself, when running "switchtest only"
after the first failed xeno-test above:

/usr/lib/xenomai/testsuite/switchtest -T 30
== Testing FPU check routines...
r0: 1 != 2
r1: 1 != 2
r2: 1 != 2
r3: 1 != 2
r4: 1 != 2
r5: 1 != 2
r6: 1 != 2
r7: 1 != 2
ymm0: 1/1 != 2/2
ymm1: 1/1 != 2/2
ymm2: 1/1 != 2/2
ymm3: 1/1 != 2/2
ymm4: 1/1 != 2/2
ymm5: 1/1 != 2/2
ymm6: 1/1 != 2/2
ymm7: 1/1 != 2/2
== FPU check routines: OK.
== Threads: [snip]
Error after context switch from task 4(rtk_fp4-4) to task 5(rtk_fp_ufpp4-5), FPU registers were set to 2147483648 (unidentified task)
Error after context switch from task 4(rtk_fp12-4) to task 5(rtk_fp_ufpp12-5), FPU registers were set to 2147483648 (unidentified task)
Error after context switch from task 4(rtk_fp24-4) to task 5(rtk_fp_ufpp24-5), FPU registers were set to 2147483648 (unidentified task)
Error after context switch from task 4(rtk_fp17-4) to task 5(rtk_fp_ufpp17-5), FPU registers were set to 2147483648 (unidentified task)
Error after context switch from task 4(rtk_fp18-4) to task 5(rtk_fp_ufpp18-5), FPU registers were set to 2147483648 (unidentified task)
Error after context switch from task 4(rtk_fp21-4) to task 5(rtk_fp_ufpp21-5), FPU registers were set to 2147483648 (unidentified task)
Error after context switch from task 5(rtk_fp_ufpp28-5) to task 6(rtk_fp_ufpp28-6), FPU registers were set to 2147483648 (unidentified task)

while dmesg holds:

[ 5472.456523] r7: 2147483648 != 5
[ 5472.475313] r6: 2147483648 != 5
[ 5472.494083] r7: 2147483648 != 5
[ 5472.512854] r5: 2147483648 != 5
[ 5472.531625] r6: 2147483648 != 5
[ 5472.550398] r7: 2147483648 != 5
[ 5472.569168] r7: 2147483648 != 5
[ 5472.587941] r0: 2147483648 != 5
[ 5472.606713] r1: 2147483648 != 5
[ 5472.625485] r2: 2147483648 != 5
[ 5472.644257] r3: 2147483648 != 5
[ 5472.663028] r4: 2147483648 != 5
[ 5472.681800] r5: 2147483648 != 5
[ 5472.700571] r6: 2147483648 != 5
[ 5472.719342] r7: 2147483648 != 5
[ 5472.738114] r6: 2147483648 != 5
[ 5472.756887] r7: 2147483648 != 5
[ 5472.775657] r1: 2147483648 != 5
[ 5472.794430] r2: 2147483648 != 5
[ 5472.813200] r3: 2147483648 != 5
[ 5472.831974] r4: 2147483648 != 5
[ 5472.850744] r5: 2147483648 != 5
[ 5472.869517] r6: 2147483648 != 5
[ 5472.888289] r7: 2147483648 != 5
[ 5472.907061] r6: 2147483648 != 6
[ 5472.925833] r7: 2147483648 != 6

>
> Best regards,
> Florian
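
To look for a pattern in the dmesg output, the mismatch lines could be tallied per register. The following is only a minimal sketch, not part of the Xenomai testsuite: the function name `summarize_mismatches` and the regular expression are assumptions based on the `rN: <got> != <expected>` line format seen in the logs of this mail.

```python
import re
from collections import Counter

# Assumed line format, taken from the logs in this mail:
#   "[ 250.244148] r1: 2147483648 != 5"
MISMATCH_RE = re.compile(r"\[\s*(\d+\.\d+)\]\s+(r\d+):\s+(\d+)\s+!=\s+(\d+)")


def summarize_mismatches(log_text):
    """Count mismatch lines per register and collect the observed values as hex.

    Hypothetical helper for illustration only.
    """
    per_register = Counter()
    observed = set()
    for _timestamp, reg, got, _expected in MISMATCH_RE.findall(log_text):
        per_register[reg] += 1
        observed.add(hex(int(got)))
    return per_register, observed


if __name__ == "__main__":
    # Three lines copied from the dmesg output of the xeno-test run.
    sample = (
        "[ 250.244148] r1: 2147483648 != 5\n"
        "[ 250.375609] r2: 2147483648 != 5\n"
        "[ 250.788587] r4: 2147483648 != 6\n"
    )
    counts, values = summarize_mismatches(sample)
    print(dict(counts))
    print(sorted(values))
```

Fed with the saved output of `dmesg`, this shows which registers are hit most often and prints the unexpected value in hex, where 2147483648 reads as 0x80000000.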
