On 10/06/2014 12:27 AM, Gilles Chanteperdrix wrote:
> Hi Jan,
>
> I noticed two problems with the fp_regs_set routine. First, in
> kernel-space, not all the values of the "vec" array are set: since
> the asm statement only declares vec[0] as an input, gcc only
> guarantees that the first one is stored before the series of
> vmovupd instructions runs. I believe this issue can be avoided with
> the following patch:
>
> diff --git a/kernel/cobalt/arch/x86/include/asm/xenomai/uapi/fptest.h b/kernel/cobalt/arch/x86/include/asm/xenomai/uapi/fptest.h
> index d6a9a68..d406cc3 100644
> --- a/kernel/cobalt/arch/x86/include/asm/xenomai/uapi/fptest.h
> +++ b/kernel/cobalt/arch/x86/include/asm/xenomai/uapi/fptest.h
> @@ -39,7 +39,7 @@ static inline void fp_regs_set(int features, unsigned int val)
> "vmovupd %0,%%ymm5;"
> "vmovupd %0,%%ymm6;"
> "vmovupd %0,%%ymm7;"
> - : : "m" (vec[0]));
> + : : "m"(vec[0]), "m"(vec[1]), "m"(vec[2]), "m"(vec[3]));
> } else if (features & __COBALT_HAVE_SSE2) {
> __asm__ __volatile__(
> "movupd %0,%%xmm0;"
> @@ -50,7 +50,7 @@ static inline void fp_regs_set(int features, unsigned int val)
> "movupd %0,%%xmm5;"
> "movupd %0,%%xmm6;"
> "movupd %0,%%xmm7;"
> - : : "m" (vec[0]));
> + : : "m"(vec[0]), "m"(vec[1]), "m"(vec[2]), "m"(vec[3]));
> }
> }
>
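Listing each element as its own "m" operand, as the patch does, works. GCC's extended-asm documentation also describes casting the pointer to a pointer-to-array type so that a single "m" operand covers the whole buffer. A minimal sketch of that idiom, assuming x86_64 and gcc (or a compatible compiler); roundtrip_xmm0 is a hypothetical helper, not part of fptest.h. It loads two 64-bit words into %xmm0 with movupd and stores them back, so one can check that both elements really reached the register.

```c
#include <string.h>

/* Hypothetical helper for illustration only (not Xenomai's fp_regs_set).
 * Round-trips two 64-bit words through %xmm0. Casting the pointers to
 * pointer-to-array types makes each memory operand cover the whole
 * 16-byte array, so gcc knows the asm reads all of in[] and writes all
 * of out[]; the store to in[1] cannot be treated as dead. */
static void roundtrip_xmm0(const unsigned long long *in,
                           unsigned long long *out)
{
#if defined(__x86_64__) && defined(__GNUC__)
    __asm__ __volatile__(
        "movupd %1, %%xmm0\n\t"   /* load in[0..1] into %xmm0 */
        "movupd %%xmm0, %0"       /* store %xmm0 into out[0..1] */
        : "=m"(*(unsigned long long (*)[2])out)
        : "m"(*(const unsigned long long (*)[2])in)
        : "xmm0");
#else
    /* Portable fallback so the sketch still builds on other hosts. */
    memcpy(out, in, 2 * sizeof(*in));
#endif
}
```

For the 32-byte vmovupd loads, the equivalent operand would presumably be a cast to a four-element array type; the declaration of vec (and so its element type) is not shown in the quoted hunk.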
> The other issue is that in user-space, when compiling with -Os, the
> following code is generated for fp_regs_set:
>
> 0000000000000000 <fp_regs_set>:
> 0: 89 74 24 dc mov %esi,-0x24(%rsp)
> 4: 89 f6 mov %esi,%esi
> 6: 48 c7 44 24 e8 00 00 movq $0x0,-0x18(%rsp)
> d: 00 00
> f: 48 89 74 24 e0 mov %rsi,-0x20(%rsp)
> 14: 48 89 74 24 f0 mov %rsi,-0x10(%rsp)
> 19: 48 c7 44 24 f8 00 00 movq $0x0,-0x8(%rsp)
> 20: 00 00
> 22: db 44 24 dc fildl -0x24(%rsp)
> 26: db 44 24 dc fildl -0x24(%rsp)
> 2a: db 44 24 dc fildl -0x24(%rsp)
> 2e: db 44 24 dc fildl -0x24(%rsp)
> 32: db 44 24 dc fildl -0x24(%rsp)
> 36: db 44 24 dc fildl -0x24(%rsp)
> 3a: db 44 24 dc fildl -0x24(%rsp)
> 3e: db 44 24 dc fildl -0x24(%rsp)
> 42: 40 f6 c7 02 test $0x2,%dil
> 46: 74 31 je 79 <fp_regs_set+0x79>
> 48: c5 fd 10 44 24 e0 vmovupd -0x20(%rsp),%ymm0
> 4e: c5 fd 10 4c 24 e0 vmovupd -0x20(%rsp),%ymm1
> 54: c5 fd 10 54 24 e0 vmovupd -0x20(%rsp),%ymm2
> 5a: c5 fd 10 5c 24 e0 vmovupd -0x20(%rsp),%ymm3
> 60: c5 fd 10 64 24 e0 vmovupd -0x20(%rsp),%ymm4
> 66: c5 fd 10 6c 24 e0 vmovupd -0x20(%rsp),%ymm5
> 6c: c5 fd 10 74 24 e0 vmovupd -0x20(%rsp),%ymm6
> 72: c5 fd 10 7c 24 e0 vmovupd -0x20(%rsp),%ymm7
> 78: c3 retq
> 79: 40 80 e7 01 and $0x1,%dil
> 7d: 74 30 je af <fp_regs_set+0xaf>
> 7f: 66 0f 10 44 24 e0 movupd -0x20(%rsp),%xmm0
> 85: 66 0f 10 4c 24 e0 movupd -0x20(%rsp),%xmm1
> 8b: 66 0f 10 54 24 e0 movupd -0x20(%rsp),%xmm2
> 91: 66 0f 10 5c 24 e0 movupd -0x20(%rsp),%xmm3
> 97: 66 0f 10 64 24 e0 movupd -0x20(%rsp),%xmm4
> 9d: 66 0f 10 6c 24 e0 movupd -0x20(%rsp),%xmm5
> a3: 66 0f 10 74 24 e0 movupd -0x20(%rsp),%xmm6
> a9: 66 0f 10 7c 24 e0 movupd -0x20(%rsp),%xmm7
> af: c3 retq
>
> Unless I missed some detail, a negative offset from %rsp does not
> make sense. I have not found a workaround for this issue (other than
> not using the -Os option); however, I believe something is missing
> from the asm statement to tell gcc that it should allocate this
> space on the stack, because we are using it.
>
> Any ideas?
Never mind: on x86_64 there is a 128-byte red zone below the stack
pointer, so this is valid (we are using 36 bytes: 4 x 8 = 32 bytes for
vec plus 4 bytes for the integer loaded by fildl). I guess the bug I
observed in user-space was the same as in kernel-space (the vec array
is only partially filled if gcc is not told that all 4 values are used).
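The 36-byte figure can be checked against the -Os listing: the lowest slot written is -0x24(%rsp), holding the 4-byte integer that fildl loads, and the 32 bytes from -0x20(%rsp) up to %rsp are what vmovupd reads. A small sketch of that arithmetic; the slot table is transcribed by hand from the objdump output, the names are hypothetical, and labelling the 8-byte slots vec[0] to vec[3] assumes 64-bit entries, as the 32-byte load suggests.

```c
#include <stddef.h>

/* Red zone size in the x86-64 System V ABI: a leaf function may use
 * the 128 bytes below %rsp without adjusting %rsp. */
#define X86_64_RED_ZONE_BYTES 128

/* One stack slot written by the -Os build of fp_regs_set, read off the
 * objdump listing (offsets are relative to %rsp). */
struct stack_slot {
    int offset;   /* e.g. -0x24 for -0x24(%rsp) */
    int size;     /* bytes */
};

static const struct stack_slot fp_regs_set_slots[] = {
    { -0x24, 4 }, /* unsigned int val, read by fildl */
    { -0x20, 8 }, /* vec[0] (assumed 64-bit entries) */
    { -0x18, 8 }, /* vec[1] */
    { -0x10, 8 }, /* vec[2] */
    { -0x08, 8 }, /* vec[3] */
};

/* Bytes below %rsp touched: distance from the lowest slot up to %rsp. */
static int red_zone_extent(const struct stack_slot *s, size_t n)
{
    int lowest = 0;
    for (size_t i = 0; i < n; i++)
        if (s[i].offset < lowest)
            lowest = s[i].offset;
    return -lowest;
}

/* Sum of slot sizes; equals the extent when the slots leave no gaps. */
static int slot_bytes(const struct stack_slot *s, size_t n)
{
    int total = 0;
    for (size_t i = 0; i < n; i++)
        total += s[i].size;
    return total;
}
```

Both functions give 0x24 = 36 bytes for this table, well inside the 128-byte red zone.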
--
Gilles.
_______________________________________________
Xenomai mailing list
http://www.xenomai.org/mailman/listinfo/xenomai