--- Comment #47 from Joey dot ye at intel dot com 2009-03-12 06:51 ---
(In reply to comment #46)
Created an attachment (id=17444)
-- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=17444&action=view)
gcc.target/i386/stackalign/longlong-2.c for -mnostackalign on darwin10
/sw/src
--- Comment #35 from Joey dot ye at intel dot com 2009-03-04 01:41 ---
(In reply to comment #32)
I don't see the reason for optimize_function_for_size_p (cfun), care to back
up with benchmarks that forcing dynamic realignment for long long variables
with -mpreferred-stack-boundary
--- Comment #3 from Joey dot ye at intel dot com 2009-02-27 02:53 ---
(In reply to comment #2)
Created an attachment (id=17368)
-- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=17368&action=view)
A patch
Does this patch make sense?
It works fine.
--
http://gcc.gnu.org
--- Comment #31 from Joey dot ye at intel dot com 2009-02-23 03:15 ---
How about this patch?
1. Only reduce DI mode when -Os
2. Ignore TYPE_USER_ALIGN, so that stack realign happens for case in
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39137#c28, which IMHO is
acceptable.
Index
--- Comment #20 from Joey dot ye at intel dot com 2009-02-17 09:18 ---
(In reply to comment #19)
Just for the record, here is an unsuccessful attempt to avoid stack realignment
just because of DImode for -m32 or because of DFmode at -m32 -Os. This patch
unfortunately caused
--- Comment #12 from Joey dot ye at intel dot com 2009-02-16 08:49 ---
Created an attachment (id=17305)
-- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=17305&action=view)
New patch attached
Testing finished: no regressions with emx_avx_sim. Waiting to check in to 4.5.
--
Joey dot ye
--- Comment #10 from Joey dot ye at intel dot com 2009-02-12 15:20 ---
(In reply to comment #8)
We still have push and mov. I guess it may be the best we can do.
But please run full 32 and 64bit testsuite with your patch as well
as under emx-avx-sim.
The full 32-bit and 64-bit testsuites pass.
--- Comment #5 from Joey dot ye at intel dot com 2009-02-12 01:45 ---
Whether the stack is realigned is finally decided by
stack_realign = (incoming_stack_boundary
< (current_function_is_leaf
? crtl->max_used_stack_slot_alignment
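The decision above reduces to one comparison: realign when the stack boundary guaranteed on entry is smaller than the alignment the function's stack slots need. A minimal sketch, assuming both values are in bits as in GCC's i386 back end; the non-leaf branch is cut off in the excerpt, so the hypothetical helper needs_stack_realign (not GCC code) simply takes the required alignment as a parameter.

```c
#include <stdbool.h>

/* Hypothetical helper, not GCC's implementation: realign when the stack
   boundary guaranteed on function entry (in bits) is smaller than the
   alignment (in bits) required by the function's stack slots. */
bool needs_stack_realign(unsigned incoming_boundary_bits,
                         unsigned required_alignment_bits)
{
    return incoming_boundary_bits < required_alignment_bits;
}
```

For example, with a 32-bit (4-byte) incoming boundary on ia32, a long long slot given 64-bit alignment makes needs_stack_realign(32, 64) true, which is the situation the DImode discussion in this thread is about; with a 128-bit incoming boundary no realignment is needed.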
--- Comment #7 from Joey dot ye at intel dot com 2009-02-12 02:26 ---
Created an attachment (id=17283)
-- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=17283&action=view)
A patch to fix this problem
Impact on other tests is unknown; testing is under way.
HJ, can you also help to verify
--- Comment #6 from Joey dot ye at intel dot com 2009-02-12 02:33 ---
(In reply to comment #5)
If ACCUMULATE_OUTGOING_ARGS is off, ECX will be used
for stack alignment and it may lead to code size
increase due to register spill since ia32 has very
few registers.
The code increase
--- Comment #9 from Joey dot ye at intel dot com 2009-02-12 02:40 ---
(In reply to comment #8)
We still have push and mov. I guess it may be the best we can do.
I believe so too.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39146
--- Comment #10 from Joey dot ye at intel dot com 2009-02-11 01:03 ---
(In reply to comment #9)
Created an attachment (id=17279)
-- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=17279&action=view)
A patch to add a new -malign-double= option
This patch looks OK to me
--- Comment #1 from Joey dot ye at intel dot com 2009-02-10 05:35 ---
The argument needs 32-byte alignment, and there is no way to guarantee that it won't be
spilled. That's why the stack adjustment is there.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39146
--- Comment #1 from Joey dot ye at intel dot com 2009-02-04 02:17 ---
GCC doesn't follow the x86-64 psABI in this case.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39082
--- Comment #20 from Joey dot ye at intel dot com 2009-01-26 11:49 ---
(In reply to comment #10)
This is caused by stack alignment change, revision 138335. Joey and
Xuepeng will look into it after holiday, Feb. 1.
This must be the stack alignment change. It looks like we didn't handle stack
--- Comment #2 from Joey dot ye at intel dot com 2009-01-21 02:40 ---
The following case isn't vectorized with -O3 on x86_64 either, although the arrays are
aligned:
#include <stdio.h>
float __attribute__((aligned(16))) in1[] = {
1.2, 3.5, 1.7, 2.8
};
float __attribute__((aligned
--- Comment #7 from Joey dot ye at intel dot com 2009-01-14 10:08 ---
(In reply to comment #5)
Joern, re. comment #4, Richi refers to my patch to enable PRE at -Os, see
[1].
An extension to this patch that we tested on x86 machines, is to disable PRE
for scalar integer registers
--- Comment #5 from Joey dot ye at intel dot com 2009-01-07 02:45 ---
More places with BIGGEST_ALIGN:
$ grep -r (aligned) .|grep attribute|grep -v testsuite|grep -v texi
./libstdc++-v3/libsupc++/eh_alloc.cc:typedef char
one_buffer[EMERGENCY_OBJ_SIZE] __attribute__((aligned));
./libjava
--- Comment #45 from Joey dot ye at intel dot com 2008-12-30 01:49 ---
(In reply to comment #44)
Does anyone have new numbers?
Fixed on both i386/x86_64:
x86_64:
4.4 (trunk 142847): 5.4s
4.3.2 release: 5.4s
4.2.4 release: 5.4s
i386:
4.4 (trunk 142847): 2.7s
4.3.2 release
--- Comment #6 from Joey dot ye at intel dot com 2008-12-30 02:50 ---
(In reply to comment #4)
Revision 141860 caused 30% slowdown on 454.calculix in SPEC CPU 2006
with -O2 -ffast-math on Linux/Intel64.
This regression has been fixed in some revision between 142187 and 142212
--- Comment #12 from Joey dot ye at intel dot com 2008-12-10 03:01 ---
Fixed in trunk revision 142631.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37948
--- Comment #8 from Joey dot ye at intel dot com 2008-12-01 02:18 ---
Yes. It fixes 416.gamess and 481.wrf on 32-bit, and 481.wrf on 64-bit.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38280
--- Comment #6 from Joey dot ye at intel dot com 2008-11-28 15:11 ---
Patch at http://gcc.gnu.org/ml/gcc-patches/2008-11/msg01428.html fixed this
regression.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38280
--- Comment #4 from Joey dot ye at intel dot com 2008-11-28 03:39 ---
142250 doesn't fix this regression. 416.gamess and 481.wrf still fail.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38280
--- Comment #8 from Joey dot ye at intel dot com 2008-11-21 12:00 ---
In short, let set A={-favx, -ffma} and set B={-f3dnow, -f3dnowa, -fsse4a, -fsse5}. Any
combination that mixes options from both sets should be prohibited.
Please add more options to these sets in case I missed any.
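The rule can be stated as a check over the option list: reject it when it contains at least one member of set A and at least one member of set B. A minimal sketch; the option spellings are copied from the comment, while the arrays and the function options_conflict are hypothetical and not part of GCC's option handling.

```c
#include <stdbool.h>
#include <string.h>

/* Option spellings copied from the comment; extend as needed. */
static const char *const set_a[] = { "-favx", "-ffma" };
static const char *const set_b[] = { "-f3dnow", "-f3dnowa", "-fsse4a", "-fsse5" };

/* True if OPT is one of the N entries in SET. */
static bool in_set(const char *opt, const char *const *set, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(opt, set[i]) == 0)
            return true;
    return false;
}

/* Hypothetical check: true when OPTS mixes options from set A and set B,
   i.e. a combination that should be rejected. */
bool options_conflict(const char *const *opts, size_t n_opts)
{
    bool has_a = false, has_b = false;
    for (size_t i = 0; i < n_opts; i++) {
        has_a |= in_set(opts[i], set_a, sizeof set_a / sizeof set_a[0]);
        has_b |= in_set(opts[i], set_b, sizeof set_b / sizeof set_b[0]);
    }
    return has_a && has_b;
}
```

Adding a missed option only means appending it to the appropriate array.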
--
http
--- Comment #23 from Joey dot ye at intel dot com 2008-10-28 01:19 ---
(In reply to comment #22)
Created an attachment (id=16571)
-- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=16571&action=view)
A patch to re-enable regmove
After applying this patch to re-enable regmove, I
--- Comment #18 from Joey dot ye at intel dot com 2008-10-24 08:36 ---
Created an attachment (id=16536)
-- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=16536&action=view)
Reduced performance case from cpu2006/454.calculix
50% regression with IRA core2 on trunk revision 140514
--- Comment #21 from Joey dot ye at intel dot com 2008-10-25 04:14 ---
To me the scheduler is irrelevant here. GCC has no core2 pipeline description, so
the instruction scheduling doesn't look optimized. But for an out-of-order (OOO) processor like
core2, IMHO scheduling shouldn't make that much difference
--- Comment #17 from Joey dot ye at intel dot com 2008-10-23 08:42 ---
CPU2006/454.calculix regresses by about 10% with IRA + core2 + fpmath=sse on
Core2 ix86:
                IRA     IRA_core2    NO_IRA_core2
454.calculix    1.00    0.90         1.01
Revision: trunk 140514
Options
Component: rtl-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: Joey dot ye at intel dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37571
--- Comment #1 from Joey dot ye at intel dot com 2008-09-18 16:01 ---
The root cause is that the instruction length of a fused jcc is set to 16, which prevents
the block from being merged and copied. For some reason Core2 runs poorly with an
unmerged branch block under certain circumstances.
Following
--- Comment #11 from Joey dot ye at intel dot com 2008-08-28 06:14 ---
(In reply to comment #4)
We got
Running 416.gamess ref base lnx32-gcc default
416.gamess: copy #0 non-zero return code (rc=0, signal=11)
416.gamess: copy #0 non-zero return code (rc=0, signal=11)
416.gamess
--- Comment #7 from Joey dot ye at intel dot com 2008-08-27 08:07 ---
Created an attachment (id=16155)
-- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=16155&action=view)
Test case from 2006.434.zeusmp
I failed to extract a smaller case, but hopefully this one is helpful.
Compile with gfortran
--- Comment #8 from Joey dot ye at intel dot com 2008-08-27 08:11 ---
GDB output:
(gdb) b tranx1_
Breakpoint 1 at 0x43a670
(gdb) r
Breakpoint 1, 0x0043a670 in tranx1_ ()
(gdb) b *0x43accd
Breakpoint 2 at 0x43accd
(gdb) b *0x43acf4
Breakpoint 3 at 0x43acf4
(gdb) b *0x43ad2f
--- Comment #1 from Joey dot ye at intel dot com 2008-08-19 08:19 ---
Check out this code in i386.c:
/* Figure out whether to use ordered or unordered fp comparisons.
Return the appropriate mode to use. */
enum machine_mode
ix86_fp_compare_mode (enum rtx_code code ATTRIBUTE_UNUSED
ReportedBy: Joey dot ye at intel dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37124
--- Comment #6 from Joey dot ye at intel dot com 2008-08-11 05:52 ---
(In reply to comment #4)
If you remove -ffast-math, does it miscompare?
Passes without -ffast-math.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36983
--- Comment #3 from Joey dot ye at intel dot com 2008-08-07 07:55 ---
Although revision 138318 fixes the compiler ICE, the benchmark is miscompiled with -O3 -ffast-math on
x86-64:
Running 172.mgrid ref base o3 default
*** Miscompare of mgrid.out, see
/home/jye2/cpu2000/benchspec/CFP2000/172.mgrid/run
--- Comment #9 from Joey dot ye at intel dot com 2008-08-06 08:05 ---
Fixed
--
Joey dot ye at intel dot com changed:
What|Removed |Added
Status|NEW
--- Comment #18 from Joey dot ye at intel dot com 2008-08-04 07:24 ---
(In reply to comment #9)
Joey, I think the problem is the usage of STACK_BOUNDARY / BITS_PER_UNIT
for stack alignment. On MacOS, STACK_BOUNDARY 128 on ia32. Shouldn't
we use UNITS_PER_WORD in some cases? Please
--- Comment #6 from Joey dot ye at intel dot com 2008-08-04 08:28 ---
(In reply to comment #3)
Joey, when we compute frame layout, we don't count the duplicated
return address pushed onto stack when DRAP is used. Also when we
push return address, shouldn't we use -UNITS_PER_WORD
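A worked example of the size mismatch raised in these comments, using the figures quoted there: STACK_BOUNDARY is 128 bits on Darwin ia32, so STACK_BOUNDARY / BITS_PER_UNIT is 16 bytes, while UNITS_PER_WORD on ia32 is 4 bytes. Pushing the return address moves the stack pointer by one word, so if such a slot were sized with the stack boundary instead, the frame would be overstated by 12 bytes. The constants below are local stand-ins with hypothetical names, not GCC's target macros.

```c
/* Local stand-ins for the Darwin ia32 values discussed above. */
enum {
    DARWIN_IA32_STACK_BOUNDARY = 128, /* bits */
    BITS_PER_BYTE_UNIT = 8,           /* bits per addressable unit */
    IA32_UNITS_PER_WORD = 4           /* bytes */
};

/* Stack boundary expressed in bytes: STACK_BOUNDARY / BITS_PER_UNIT. */
int stack_boundary_bytes(void)
{
    return DARWIN_IA32_STACK_BOUNDARY / BITS_PER_BYTE_UNIT;
}

/* Bytes the stack pointer actually moves when one return address
   (one word) is pushed. */
int return_address_push_bytes(void)
{
    return IA32_UNITS_PER_WORD;
}
```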
--- Comment #7 from Joey dot ye at intel dot com 2008-08-04 09:03 ---
This problem is associated with -mpreferred-stack-boundary=2 rather than with
stack alignment. The following case fails on trunk even before the merge with the stack
branch:
$ cat y1.c
/* PR middle-end/37010 */
/* { dg-do run
--- Comment #8 from Joey dot ye at intel dot com 2008-08-04 09:11 ---
The root cause is that the outgoing parameter frame is aligned relative to the stack pointer.
Namely, address_of_stack_param = SP + offset + fixed_padding.
With -mpreferred-stack-boundary=2, alignment of SP is only 4 bytes
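A small arithmetic sketch of this root cause, with hypothetical addresses: -mpreferred-stack-boundary=2 guarantees only 2^2 = 4-byte alignment of SP, so SP + offset + fixed_padding is 16-byte aligned for some SP values and not others. The helper names are illustrative, not GCC functions; align_down shows what an and-with-minus-16 instruction computes.

```c
#include <stdbool.h>
#include <stdint.h>

/* Address of an outgoing stack argument, computed relative to SP:
   SP + offset + fixed_padding. */
uintptr_t stack_param_address(uintptr_t sp, uintptr_t offset,
                              uintptr_t fixed_padding)
{
    return sp + offset + fixed_padding;
}

/* True if ADDR is a multiple of ALIGN (ALIGN must be a power of two). */
bool is_aligned(uintptr_t addr, uintptr_t align)
{
    return (addr & (align - 1)) == 0;
}

/* Round SP down to a multiple of ALIGN, as "and $-16" does for ALIGN 16. */
uintptr_t align_down(uintptr_t sp, uintptr_t align)
{
    return sp & ~(align - 1);
}
```

With SP = 0x1000, an argument at offset 16 lands on a 16-byte boundary; with SP = 0x1004, which is still valid under a 4-byte boundary, the same offset gives 0x1014, which is not 16-byte aligned.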
--- Comment #11 from Joey dot ye at intel dot com 2008-08-04 14:11 ---
(In reply to comment #10)
Did you mean we needed 2 additional 'and $-16, sp insns to align the
stack? I don't think so.
Definitely not.
Solution 1: Just ignore it. __m128 parameter shouldn't be passed
--- Comment #15 from Joey dot ye at intel dot com 2008-08-05 01:01 ---
(In reply to comment #12)
I think the problem is in
/* Set offset to aligned because the realigned frame starts from here. */
if (stack_realign_fp)
offset = (offset + stack_alignment_needed -1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: middle-end
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: Joey dot ye at intel dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36983
--- Comment #2 from Joey dot ye at intel dot com 2008-07-31 10:50 ---
Yes. I just noticed that the latest trunk passes.
--
Joey dot ye at intel dot com changed:
What|Removed |Added
miscompiles 447.dealII
Product: gcc
Version: 4.4.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: middle-end
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: Joey dot ye at intel dot com
http
--- Comment #1 from Joey dot ye at intel dot com 2008-07-31 11:33 ---
Created an attachment (id=15982)
-- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=15982&action=view)
Preprocessed test case
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36986
--- Comment #1 from Joey dot ye at intel dot com 2008-07-16 13:14 ---
Fixed by revision 137859
--
Joey dot ye at intel dot com changed:
What|Removed |Added
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: Joey dot ye at intel dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36835
--- Comment #1 from Joey dot ye at intel dot com 2008-07-11 05:46 ---
Created an attachment (id=15897)
-- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=15897&action=view)
Small test case reduced from cpu2006.464.h264ref
/home/jye2/work/bug-37665 gcc -v
Using built-in specs.
Target
--- Comment #2 from Joey dot ye at intel dot com 2008-07-11 05:49 ---
The effect of line 76,
buffer_frame[0] = InitFullness;
is eliminated by the optimizer due to a bug in GCC.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36765
--- Comment #13 from Joey dot ye at intel dot com 2008-05-05 07:22 ---
It is helpful. The root cause is that memory allocated by new is only aligned to 8
bytes on i386. In your case, the Environment object is allocated by new and its
constructor tries to use movdqa to initialize its members
--- Comment #14 from Joey dot ye at intel dot com 2008-05-05 07:29 ---
HJ,
AVX will have a similar problem on x86_64, where new only returns objects
aligned to 16 bytes. Dynamically allocated __m256 objects won't be guaranteed
to be on a 32-byte boundary.
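When the default allocator's alignment is smaller than what movdqa (16 bytes) or __m256 (32 bytes) needs, one common workaround is to request the alignment explicitly. A minimal sketch, assuming a C11 library that provides aligned_alloc; the helper names alloc_simd and is_aligned_to are hypothetical, and this illustrates the problem rather than the fix adopted in the bug.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Allocate SIZE bytes aligned to ALIGN with C11 aligned_alloc.
   C11 requires SIZE to be a multiple of ALIGN, so round it up first. */
void *alloc_simd(size_t size, size_t align)
{
    size_t rounded = (size + align - 1) / align * align;
    return aligned_alloc(align, rounded);
}

/* True if P sits on an ALIGN-byte boundary. */
bool is_aligned_to(const void *p, size_t align)
{
    return ((uintptr_t)p % align) == 0;
}
```

Storage for a 16-byte object would come from alloc_simd(size, 16) and for a 32-byte AVX object from alloc_simd(size, 32); both are released with free.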
--
http://gcc.gnu.org/bugzilla
--- Comment #8 from Joey dot ye at intel dot com 2008-04-30 10:53 ---
(In reply to comment #6)
(In reply to comment #4)
have you tried to compile with -march=core2 -mfpmath=sse -msse?
Yes, I've compiled it as following:
% g++ -g -O3 -march=core2 -mfpmath=sse -msse -ftemplate
--- Comment #9 from Joey dot ye at intel dot com 2008-04-30 10:56 ---
(In reply to comment #8)
-m32 doesn't work. You have to use 4.3.0 release branch. Recent mainline
change
Correction: -m32 is required, but it doesn't fix everything. Options I'm using:
g++ -g -O3 -march=core2 -mfpmath=sse
--- Comment #11 from Joey dot ye at intel dot com 2008-05-01 04:31 ---
Tim,
Since it doesn't link, I can only check the .s file. There are a couple of
constructors called Environment; which one is the problematic function?
grep Environment kernel_build.s|grep glob
...
.globl
-end
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: Joey dot ye at intel dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36078
--- Comment #5 from Joey dot ye at intel dot com 2008-04-29 10:41 ---
This may be related to http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36078, for which I do
have a small case.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36074
--- Comment #5 from Joey dot ye at intel dot com 2008-01-23 01:45 ---
(In reply to comment #2)
I bet if you put jj in struct and don't have a nested function, this will be
the same issue.
Not the same. In fact it passes if the variable isn't referenced by a nested function. The
root cause is in tree
--- Comment #1 from Joey dot ye at intel dot com 2008-01-22 06:38 ---
This patch should fix it:
Index: gcc/tree-nested.c
===
--- gcc/tree-nested.c (revision 131342)
+++ gcc/tree-nested.c (working copy)
@@ -183,6 +183,10
Product: gcc
Version: 4.3.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: Joey dot ye at intel dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi
--- Comment #28 from Joey dot ye at intel dot com 2007-10-23 02:23 ---
Got a similar result on x86_64: Core 2 improves by 24% from revision 129469 to 129504. That's
great.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32921
mf-runtime.h after make -j2 install
Product: gcc
Version: 4.3.0
Status: UNCONFIRMED
Severity: minor
Priority: P3
Component: libmudflap
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: Joey dot ye at intel dot com
--- Comment #2 from Joey dot ye at intel dot com 2007-08-20 08:53 ---
(In reply to comment #1)
Nobody does make install with -j.
I guess so; that's why I set it to minor. But does that mean an error is expected
with -j? My script had -j by accident, and it cost me hours to identify the
root
--- Comment #1 from Joey dot ye at intel dot com 2007-07-13 09:21 ---
Created an attachment (id=13909)
-- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=13909&action=view)
Reduced testcase
GCC crashes with gcc -O2 -fsee case-see.c -c
It fails on all recent 4.3 trunk revisions.
--
http
compile CPU2000 with -fsee
Product: gcc
Version: 4.3.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: rtl-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: Joey dot ye at intel dot com
--- Comment #2 from Joey dot ye at intel dot com 2007-07-13 09:27 ---
The root cause appears to be at see.c line 1643:
emit_insn_after (merged_ref, ref);
delete_insn (ref);
where merged_ref and ref have the same INSN_UID. delete_insn will clear the df
information of that UID
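A hypothetical toy model of why this sequence loses information: if per-instruction data lives in a table indexed by INSN_UID, emitting a new instruction that reuses the old UID and then deleting the old instruction clears the entry the new one relies on. The structures and functions below are invented for illustration and do not mirror GCC's df implementation.

```c
#include <stdbool.h>

#define MAX_UID 16

/* Toy side table standing in for per-UID dataflow information. */
struct toy_df {
    bool has_info[MAX_UID];
};

/* Toy instruction identified only by its UID (must be below MAX_UID). */
struct toy_insn {
    int uid;
};

/* "Emitting" an insn records information under its UID. */
void toy_emit(struct toy_df *df, const struct toy_insn *insn)
{
    df->has_info[insn->uid] = true;
}

/* "Deleting" an insn clears whatever is stored under its UID. */
void toy_delete(struct toy_df *df, const struct toy_insn *insn)
{
    df->has_info[insn->uid] = false;
}
```

In the model, giving merged_ref its own UID keeps its entry intact when ref is deleted.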
--- Comment #4 from Joey dot ye at intel dot com 2007-07-04 01:17 ---
Revision 126198 introduced the regression.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32598