[Bug c/40363] New: Nonoptimal save/restore registers

2009-06-06 Thread vvv at ru dot ru
-- Summary: Nonoptimal save/restore registers Product: gcc Version: 4.5.0 Status: UNCONFIRMED Severity: enhancement Priority: P3 Component: c AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: vvv at ru dot ru http

[Bug target/40171] GCC does not pass -mtune and -march options to assembler!

2009-05-25 Thread vvv at ru dot ru
--- Comment #4 from vvv at ru dot ru 2009-05-25 19:54 --- (In reply to comment #2) This is very odd? What is the assembler doing that the compiler isn't? There exist some optimizations that are impossible without exact knowledge of addresses and opcodes. One example: avoidance of branch

[Bug target/39942] Nonoptimal code - leaveq; xchg %ax,%ax; retq

2009-05-20 Thread vvv at ru dot ru
--- Comment #49 from vvv at ru dot ru 2009-05-20 21:38 --- (In reply to comment #48) How do these patches work? Do they require any special options? # /media/disk-1/B/bin/gcc --version gcc (GCC) 4.5.0 20090520 (experimental) # cat test.c void f(int i) { if (i == 1) F(1

[Bug c/40171] New: GCC does not pass -mtune and -march options to assembler!

2009-05-16 Thread vvv at ru dot ru
! Product: gcc Version: 4.4.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: vvv at ru dot ru http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40171

[Bug target/39942] Nonoptimal code - leaveq; xchg %ax,%ax; retq

2009-05-14 Thread vvv at ru dot ru
--- Comment #30 from vvv at ru dot ru 2009-05-14 09:01 --- Created an attachment (id=17863) -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=17863&action=view) Testing tool. Here are the results of my testing. Code: align 128 test_cikl: rept 14 ; 14 if SH=0, 15 if SH=1

[Bug target/39942] Nonoptimal code - leaveq; xchg %ax,%ax; retq

2009-05-14 Thread vvv at ru dot ru
--- Comment #34 from vvv at ru dot ru 2009-05-14 19:43 --- (In reply to comment #32) Please make sure that you only test nop paddings for branch insns, not nop paddings for branch targets, which prefer 16byte alignment. Additional tests (for Core2) results: 1. Execution time don't

[Bug target/39942] Nonoptimal code - leaveq; xchg %ax,%ax; retq

2009-05-13 Thread vvv at ru dot ru
--- Comment #19 from vvv at ru dot ru 2009-05-13 11:42 --- (In reply to comment #18) No, .p2align is the right thing to do, given that GCC doesn't have 100% accurate information about instruction sizes (for e.g. inline asms it can't have, for stuff where branch shortening can

[Bug target/39942] Nonoptimal code - leaveq; xchg %ax,%ax; retq

2009-05-13 Thread vvv at ru dot ru
--- Comment #21 from vvv at ru dot ru 2009-05-13 17:13 --- I guess! Your patch is absolutely correct for AMD Athlon™ 64 and AMD Opteron™ processors, but it is nonoptimal for Intel processors. Because: 1. AMD limitation for 16-byte page (memory range XXX0 - XXXF), but Intel

[Bug target/39942] Nonoptimal code - leaveq; xchg %ax,%ax; retq

2009-05-13 Thread vvv at ru dot ru
--- Comment #25 from vvv at ru dot ru 2009-05-13 18:56 --- (In reply to comment #22) CCing H.J for Intel optimization issues. VVV 1. AMD limitation for 16-bytes page (memory range XXX0 - XXXF), but VVV Intel limitation for 16-bytes chunk (memory range - +10h

[Bug target/39942] Nonoptimal code - leaveq; xchg %ax,%ax; retq

2009-05-13 Thread vvv at ru dot ru
--- Comment #26 from vvv at ru dot ru 2009-05-13 19:05 --- (In reply to comment #23) Note that we need something that works for the generic model as well, which in this case looks like it is the same as for AMD models. There is a processor property, TARGET_FOUR_JUMP_LIMIT; maybe create

[Bug target/39942] Nonoptimal code - leaveq; xchg %ax,%ax; retq

2009-05-13 Thread vvv at ru dot ru
--- Comment #28 from vvv at ru dot ru 2009-05-13 19:18 --- (In reply to comment #24) Using padding to avoid 4 branches in 16byte chunk may not be a good idea since it will increase code size. A single one-byte NOP per 16-byte chunk is enough for padding. But, IMHO, four branches in 16

[Bug target/39942] Nonoptimal code - leaveq; xchg %ax,%ax; retq

2009-05-12 Thread vvv at ru dot ru
--- Comment #17 from vvv at ru dot ru 2009-05-12 16:40 --- (In reply to comment #16) Created an attachment (id=17783) -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=17783&action=view) [edit] gcc45-pr39942.patch Patch that attempts to take into account .p2align directives

[Bug c/40093] New: Optimization by functios reordering.

2009-05-10 Thread vvv at ru dot ru
ReportedBy: vvv at ru dot ru http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40093

[Bug c/40093] Optimization by functios reordering.

2009-05-10 Thread vvv at ru dot ru
--- Comment #1 from vvv at ru dot ru 2009-05-10 16:43 --- Created an attachment (id=17847) -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=17847&action=view) Example direct/inverse calls Simple example. RDTSC ticks for direct and inverse sequence of calls. -- http://gcc.gnu.org

[Bug middle-end/40093] Optimization by functios reordering.

2009-05-10 Thread vvv at ru dot ru
--- Comment #3 from vvv at ru dot ru 2009-05-10 18:08 --- (In reply to comment #2) This should have been done already with cgraph order. Unfortunately, I can see the inverse order only in a separate source file. Inverse, but not optimized. Example: // file order1.c #include <stdio.h> main(int

[Bug middle-end/40093] Optimization by functios reordering.

2009-05-10 Thread vvv at ru dot ru
--- Comment #5 from vvv at ru dot ru 2009-05-10 18:20 --- (In reply to comment #4) Well you need whole program to get the behavior which you want. Yes. Of course, it's no problem for a small single-programmer project, but it's a problem for big projects like the Linux Kernel. -- http

[Bug c/40072] New: Nonoptimal code - CMOVxx %eax,%edi; mov %edi,%eax; retq

2009-05-08 Thread vvv at ru dot ru
Component: c AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: vvv at ru dot ru http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40072

[Bug target/39942] Nonoptimal code - leaveq; xchg %ax,%ax; retq

2009-04-29 Thread vvv at ru dot ru
--- Comment #11 from vvv at ru dot ru 2009-04-29 07:46 --- (In reply to comment #8) From config/i386/i386.c: /* AMD Athlon works faster when RET is not destination of conditional jump or directly preceded by other jump instruction. We avoid the penalty by inserting NOP just

[Bug target/39942] Nonoptimal code - leaveq; xchg %ax,%ax; retq

2009-04-29 Thread vvv at ru dot ru
--- Comment #12 from vvv at ru dot ru 2009-04-29 07:55 --- (In reply to comment #9) So that explains it, Use -Os or attribute cold if you want NOPs to be gone. But my measurements on Core 2 Duo P8600 show that push %ebp mov %esp,%ebp leave ret is _faster_ than push %ebp mov %esp

[Bug target/39942] Nonoptimal code - leaveq; xchg %ax,%ax; retq

2009-04-29 Thread vvv at ru dot ru
--- Comment #15 from vvv at ru dot ru 2009-04-29 19:16 --- One more example: a 5-byte nop between leaveq and retq. # cat test.c void wait_for_enter() { int u = getchar(); while (!u) u = getchar()-13; } main() { wait_for_enter(); } # gcc -o t.out test.c

[Bug c/39942] New: Nonoptimal code - leaveq; xchg %ax,%ax; retq

2009-04-28 Thread vvv at ru dot ru
Component: c AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: vvv at ru dot ru http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39942

[Bug target/39942] Nonoptimal code - leaveq; xchg %ax,%ax; retq

2009-04-28 Thread vvv at ru dot ru
--- Comment #2 from vvv at ru dot ru 2009-04-28 17:04 --- Created an attachment (id=17776) -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=17776&action=view) Source file from Linux Kernel 2.6.29.1 See static void set_blitting_type -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id

[Bug target/39942] Nonoptimal code - leaveq; xchg %ax,%ax; retq

2009-04-28 Thread vvv at ru dot ru
--- Comment #3 from vvv at ru dot ru 2009-04-28 17:10 --- Additional examples from Linux Kernel 2.6.29.1: (Note: conditional statement at the end of all functions!) = linux/drivers/video/console/bitblit.c void fbcon_set_bitops(struct fbcon_ops *ops) { ops->bmove

[Bug target/39942] Nonoptimal code - leaveq; xchg %ax,%ax; retq

2009-04-28 Thread vvv at ru dot ru
--- Comment #4 from vvv at ru dot ru 2009-04-28 17:15 --- Created an attachment (id=1) -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=1&action=view) Simple example from Linux See two functions: static void pre_schedule_rt static void switched_from_rt -- http

[Bug target/39942] Nonoptimal code - leaveq; xchg %ax,%ax; retq

2009-04-28 Thread vvv at ru dot ru
--- Comment #6 from vvv at ru dot ru 2009-04-28 21:18 --- Let's compile file test.c //#file test.c extern int F(int m); void func(int x) { int u = F(x); while (u) u = F(u)*3+1; } # gcc -o t.out test.c -c -O2 # objdump -d t.out t.out: file format

[Bug c/39549] New: Nonoptimal byte load. mov (%rdi),%al better then movzbl (%rdi),%eax

2009-03-24 Thread vvv at ru dot ru
AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: vvv at ru dot ru http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39549

[Bug c/39520] New: Empty function translated to repz retq.

2009-03-22 Thread vvv at ru dot ru
: vvv at ru dot ru http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39520