Daniel, Thomas, first of all I'm sorry it took me so long to reply. ( Also, if there are some stupid mistakes wrt arm assembly, please forgive me. I've tried to make my analisys resilent to maybe-present off-by-ones, and reply sooner than later, not spending more 20min*several-days to be 100% correct. )
On Thu, Dec 06, 2012 at 02:23:25PM +0100, Thomas Preud'homme wrote: > Le jeudi 6 décembre 2012 04:50:55, vous avez écrit : > > > > > > Hi Kirill, > > > > > > did you make any progress on the issue since Daniel's comments? Could you > > > let me know when you push a patch so that I can test it and bump master > > > to equal the mob branch? > > > > Hi Thomas, > > > > Unfortunately not that much. I was trying to investigate > > bounds-checking failures for `tcc -b -run tcc ...` tests first and it looks > > I'm close to understand what is going on (old bcheck code is not adapted > > to /proc/sys/kernel/randomize_va_space = 2). The plan was to go to arm > > issues then, but I'm going very slow, because for tcc I have only 15-20 > > minutes per day and it is all in underground. Sorry. Could this issue > > please wait me for some time? (probably week or so...) > > That's a worthwhile bug to fix. The ARM issue can wait, sure. We can also > disable the test for ARM as this code never worked for ARM and it's thus not > a > regression. On Fri, Nov 30, 2012 at 12:43:07AM +0100, Daniel Glöckner wrote: > On Thu, Nov 29, 2012 at 09:43:33PM +0400, Kirill Smelkov wrote: > > tcc first saves r0 & r1, and only then fp: > > > > $ ./arm-eabi-tcc -c y.c > > $ arm-linux-gnueabi-objdump -d y.o > > > > 00000000 <f>: > > 0: e1a0c00d mov ip, sp > > 4: e92d0003 push {r0, r1} > > 8: e92d5800 push {fp, ip, lr} > > c: e28db00c add fp, sp, #12 > > 10: e1a00000 nop ; (mov r0, r0) > > 14: e91ba800 ldmdb fp, {fp, sp, pc} > > I chose this stack frame format because it simplifies access to the > arguments from within tinycc. At that time I didn't think of trying to > be compatible to anything just to allow stack traces. I see. Let's first see what is happenning with gcc and decide how it should be, then come back here to maybe fixing/adjusting tcc. > > gcc does not, but it will save r5,r6,etc. before fp(=r13 iirc) ... > > > > $ arm-linux-gnueabihf-gcc-4.7 -marm -c y.c > > $ arm-linux-gnueabihf-objdump -d y.o > > > > 00000000 <f>: > > 0: e52db004 push {fp} ; (str fp, [sp, #-4]!) > > 4: e28db000 add fp, sp, #0 > > 8: e24dd00c sub sp, sp, #12 > > c: e50b0008 str r0, [fp, #-8] > > 10: e50b100c str r1, [fp, #-12] > > 14: e28bd000 add sp, fp, #0 > > 18: e8bd0800 pop {fp} > > 1c: e12fff1e bx lr > > You should not look at a leaf function to derive the GCC stack frame. > It is probably different from the generic stack frame because GCC > knows this function will never be part of a stack trace done by > another function of the final program. Fair enough, thanks. > Take a look at arch/arm/kernel/stacktrace.c inside the Linux kernel. > They assume that every function pushes {fp, sp, lr, pc} and that > fp points to the address where pc is stored. I don't know why > GCC pushes pc. I can only imagine this being done to keep the > stack aligned to 8 bytes. Or maybe it is for exception handling > or association of stack frame debug info.. from arch/arm/kernel/stacktrace.c: * With framepointer enabled, a simple function prologue looks like this: * mov ip, sp * stmdb sp!, {fp, ip, lr, pc} * sub fp, ip, #4 * * A simple function epilogue looks like this: * ldm sp, {fp, sp, pc} so since fp is initialized from ip-4 (old_sp-4) it points to (near ?) fp, not pc, right? also linux kernel is compiled with -mapcs -mno-sched-prolog -mabi=apcs-gnu and this changes frame layout compared to what gcc produces by default. (sorry for that near - i don't remember stack and stmdb arm details good enough) > > > 8: e92d0870 push {r4, r5, r6, fp} ; NOTE r4... go > > before fp > > > 14: e28db00c add fp, sp, #12 > > > 5c: e59b0000 ldr r0, [fp] ; BUG here! > > Why is this a bug in GCC? It will load the old value of fp to r0. Hmm, after push here is how stack looks like: r4 <- old_sp r5 <- sp + 12 r6 fp <- sp (again, off-by-one error probably, but anywayy that sp+12 can't point to old fp). ~~~~ Anyway, I've prepared more simple complete example to demonstrate the bug: ---- 8< ---- fp.c #include <stdio.h> void *func(int x) { char *fp1 = __builtin_frame_address(1); printf("Hello World! %i\n", x); fp1 += x; return fp1; } int main() { void *fp0 = __builtin_frame_address(0); void *fp1 = func(0); printf("fp0: %p fp1: %p\n", fp0, fp1); return 0; } The code should output Hello World! 0 fp0: <addr> fp1: <the-same-addr!> right? on i386: $ gcc -o fp_i386 -O2 fp.c $ ./fp_i386 Hello World! 0 fp0: 0xafeda538 fp1: 0xafeda538 but it does not on arm, look: $ arm-linux-gnueabi-gcc-4.4 -marm -o fp_arm -O2 fp.c $ qemu-arm -L /usr/arm-linux-gnueabi ./fp_arm Hello World! 0 fp0: 0x4080010c fp1: 0x8414 and the problem here is that gcc, as I think, incorrectly setups fp: $ arm-linux-gnueabi-gcc-4.4 -marm -o fp.s -S -O2 fp.c $ cat fp.s .cpu arm9tdmi .fpu softvfp .eabi_attribute 20, 1 .eabi_attribute 21, 1 .eabi_attribute 23, 3 .eabi_attribute 24, 1 .eabi_attribute 25, 1 .eabi_attribute 26, 2 .eabi_attribute 30, 2 .eabi_attribute 18, 4 .file "fp.c" .text .align 2 .global func .type func, %function func: @ Function supports interworking. @ args = 0, pretend = 0, frame = 0 @ frame_needed = 1, uses_anonymous_args = 0 stmfd sp!, {r4, r5, fp, lr} add fp, sp, #12 mov r4, r0 ldr r5, [fp, #0] ; <- bug here, should be [fp, #-4] mov r1, r4 ldr r0, .L3 bl printf add r0, r5, r4 sub sp, fp, #12 ldmfd sp!, {r4, r5, fp, lr} bx lr .L4: .align 2 .L3: .word .LC0 .size func, .-func .align 2 .global main .type main, %function main: @ Function supports interworking. @ args = 0, pretend = 0, frame = 0 @ frame_needed = 1, uses_anonymous_args = 0 stmfd sp!, {fp, lr} mov r0, #0 add fp, sp, #4 bl func mov r1, fp mov r2, r0 ldr r0, .L7 bl printf mov r0, #0 sub sp, fp, #4 ldmfd sp!, {fp, lr} bx lr .L8: .align 2 .L7: .word .LC1 .size main, .-main .section .rodata.str1.4,"aMS",%progbits,1 .align 2 .LC0: .ascii "Hello World! %i\012\000" .space 3 .LC1: .ascii "fp0: %p fp1: %p\012\000" .ident "GCC: (Debian 4.4.6-11) 4.4.6" .section .note.GNU-stack,"",%progbits and if I apply the following patch diff --git a/t/fp.s b/t/fp1.s index 2047b05..85bb402 100644 --- a/t/fp.s +++ b/t/fp1.s @@ -20,7 +20,7 @@ func: stmfd sp!, {r4, r5, fp, lr} add fp, sp, #12 mov r4, r0 - ldr r5, [fp, #0] + ldr r5, [fp, #-4] mov r1, r4 ldr r0, .L3 bl printf it does work: $ arm-linux-gnueabi-gcc-4.4 -o fp fp.s $ arm-linux-gnueabi-gcc-4.4 -o fp1 fp1.s $ qemu-arm -L /usr/arm-linux-gnueabi ./fp Hello World! 0 fp0: 0x4080012c fp1: 0x8414 $ qemu-arm -L /usr/arm-linux-gnueabi ./fp1 Hello World! 0 fp0: 0x4080012c fp1: 0x4080012c that's why I think gcc generated wrong code here. And to the reference, here is how code would look like if being part of linux: $ arm-linux-gnueabi-gcc-4.4 -o fpapcs.s -marm -mapcs -mno-sched-prolog -S -O2 fp.c $ arm-linux-gnueabi-gcc-4.4 -o fpapcs_gnu.s -marm -mapcs -mno-sched-prolog -mabi=apcs-gnu -S -O2 fp.c $ diff -u fp.s fpapcs.s --- fp.s 2012-12-13 20:34:09.000000000 +0400 +++ fpapcs.s 2012-12-13 20:34:09.000000000 +0400 @@ -17,16 +17,17 @@ @ Function supports interworking. @ args = 0, pretend = 0, frame = 0 @ frame_needed = 1, uses_anonymous_args = 0 - stmfd sp!, {r4, r5, fp, lr} - add fp, sp, #12 + mov ip, sp + stmfd sp!, {r4, r5, fp, ip, lr, pc} + sub fp, ip, #4 mov r4, r0 ldr r5, [fp, #0] mov r1, r4 ldr r0, .L3 bl printf add r0, r5, r4 - sub sp, fp, #12 - ldmfd sp!, {r4, r5, fp, lr} + sub sp, fp, #20 + ldmfd sp, {r4, r5, fp, sp, lr} bx lr .L4: .align 2 @@ -40,17 +41,18 @@ @ Function supports interworking. @ args = 0, pretend = 0, frame = 0 @ frame_needed = 1, uses_anonymous_args = 0 - stmfd sp!, {fp, lr} + mov ip, sp + stmfd sp!, {fp, ip, lr, pc} + sub fp, ip, #4 mov r0, #0 - add fp, sp, #4 bl func mov r1, fp mov r2, r0 ldr r0, .L7 bl printf mov r0, #0 - sub sp, fp, #4 - ldmfd sp!, {fp, lr} + sub sp, fp, #12 + ldmfd sp, {fp, sp, lr} bx lr .L8: .align 2 $ diff -u fpapcs.s fpapcs_gnu.s --- fpapcs.s 2012-12-13 20:34:09.000000000 +0400 +++ fpapcs_gnu.s 2012-12-13 20:32:51.000000000 +0400 @@ -1,13 +1,3 @@ - .cpu arm9tdmi - .fpu softvfp - .eabi_attribute 20, 1 - .eabi_attribute 21, 1 - .eabi_attribute 23, 3 - .eabi_attribute 24, 1 - .eabi_attribute 25, 1 - .eabi_attribute 26, 2 - .eabi_attribute 30, 2 - .eabi_attribute 18, 4 .file "fp.c" .text .align 2 but it does not work either: $ arm-linux-gnueabi-gcc-4.4 -o fpapcs -mapcs -mno-sched-prolog fpapcs.s $ qemu-arm -L /usr/arm-linux-gnueabi ./fpapcs Hello World! 0 fp0: 0x4080011c fp1: 0x83e0 ~~~~ There are many arm abis and maybe I'm doing something stupid, but to me all this looks like gcc generates incorrect code. > > ... as probably required by arm calling convention (not looked at the spec > > yet, but it seems reasanoble, given how push is really a block str and > > that str stores register in ascending order). > > I have once seen a page in MSDN describing the ARM stack frame of > Windows CE. I don't know if ARM specified how stack frames should look > like. >From gcc manual: `-mapcs-frame' Generate a stack frame that is compliant with the ARM Procedure Call Standard for all functions, even if this is not strictly necessary for correct execution of the code. Specifying `-fomit-frame-pointer' with this option will cause the stack frames not to be generated for leaf functions. The default is `-mno-apcs-frame'. so at least there is "ARM Procedure Call Standard". Sorry, no time to search for it... Thanks, Undertimed Kirill _______________________________________________ Tinycc-devel mailing list Tinycc-devel@nongnu.org https://lists.nongnu.org/mailman/listinfo/tinycc-devel