Re: 100x perfomance regression between gcc 3.4.5 and gcc 4.X

2006-03-13 Thread Mike Stump
On Mar 13, 2006, at 12:16 AM, Paolo Bonzini wrote: PR/21195 is about inlining the SSE builtins. These are special because, for example, you probably would prefer GDB to not step into them, but just execute them. :-) We have an APPLE LOCAL patch to remove the debug information associated

Re: 100x perfomance regression between gcc 3.4.5 and gcc 4.X

2006-03-13 Thread Richard Guenther
On 3/13/06, Richard Guenther <[EMAIL PROTECTED]> wrote: > On 3/13/06, Dan Kegel <[EMAIL PROTECTED]> wrote: > > Is there a bugzilla entry describing the bug Richard is fixing? > > If not, it'd be nice to have, if for no other reason than > > it would show up naturally when people look for bugs fixed

Re: 100x perfomance regression between gcc 3.4.5 and gcc 4.X

2006-03-13 Thread Richard Guenther
On 3/13/06, Dan Kegel <[EMAIL PROTECTED]> wrote: > Is there a bugzilla entry describing the bug Richard is fixing? > If not, it'd be nice to have, if for no other reason than > it would show up naturally when people look for bugs fixed in gcc-4.1.1. > > I can create one, but it'd be better if someo

Re: 100x perfomance regression between gcc 3.4.5 and gcc 4.X

2006-03-13 Thread Dan Kegel
Is there a bugzilla entry describing the bug Richard is fixing? If not, it'd be nice to have, if for no other reason than it would show up naturally when people look for bugs fixed in gcc-4.1.1. I can create one, but it'd be better if someone actually involved in the action did. - Dan -- Wine for

Re: 100x perfomance regression between gcc 3.4.5 and gcc 4.X

2006-03-13 Thread tbp
On 3/13/06, Richard Guenther <[EMAIL PROTECTED]> wrote: > I don't think this is related, and a quick check with the patch shows > still unaligned moves to the stack. Patience is a virtue, I guess :) Are there good chances your inlining fix will hit mainline soon?

Re: 100x perfomance regression between gcc 3.4.5 and gcc 4.X

2006-03-13 Thread Richard Guenther
On 3/13/06, tbp <[EMAIL PROTECTED]> wrote: > On 3/13/06, tbp <[EMAIL PROTECTED]> wrote: > > On 3/13/06, Richard Guenther <[EMAIL PROTECTED]> wrote: > > > http://gcc.gnu.org/ml/gcc-patches/2006-03/msg00739.html > > /me ventilates. > > You're my hero. > A double+ hero on top of that. > http://gcc.gnu

Re: 100x perfomance regression between gcc 3.4.5 and gcc 4.X

2006-03-13 Thread tbp
On 3/13/06, tbp <[EMAIL PROTECTED]> wrote: > On 3/13/06, Richard Guenther <[EMAIL PROTECTED]> wrote: > > http://gcc.gnu.org/ml/gcc-patches/2006-03/msg00739.html > /me ventilates. > You're my hero. A double+ hero on top of that. http://gcc.gnu.org/ml/gcc-patches/2006-03/msg00737.html I think i've hi

Re: 100x perfomance regression between gcc 3.4.5 and gcc 4.X

2006-03-13 Thread tbp
On 3/13/06, Richard Guenther <[EMAIL PROTECTED]> wrote: > http://gcc.gnu.org/ml/gcc-patches/2006-03/msg00739.html /me ventilates. You're my hero.

Re: 100x perfomance regression between gcc 3.4.5 and gcc 4.X

2006-03-13 Thread Richard Guenther
On 3/13/06, tbp <[EMAIL PROTECTED]> wrote: > On 3/13/06, Richard Guenther <[EMAIL PROTECTED]> wrote: > > I see the bug and will have a fix in a moment. > You made my day. Or you're about to. Unless you're lying and i'll have > to curse you for 7 generations. http://gcc.gnu.org/ml/gcc-patches/2006-

Re: 100x perfomance regression between gcc 3.4.5 and gcc 4.X

2006-03-13 Thread tbp
On 3/13/06, Richard Guenther <[EMAIL PROTECTED]> wrote: > I see the bug and will have a fix in a moment. You made my day. Or you're about to. Unless you're lying and i'll have to curse you for 7 generations.

Re: 100x perfomance regression between gcc 3.4.5 and gcc 4.X

2006-03-13 Thread tbp
On 3/13/06, Richard Guenther <[EMAIL PROTECTED]> wrote: > Of course, from 4.1.0 on you can more easily stick an > __attribute__((flatten)) on the function you want everything inlined into > (finalblow) and get everything inlined into it. But that's not really what I'm after: I expect trivial functions to g
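
For readers unfamiliar with the attribute mentioned above, here is a minimal sketch of how `__attribute__((flatten))` might be applied. Only the attribute and the function name `finalblow` come from the thread; the helpers `scale` and `offset`, the signature, and the loop body are hypothetical stand-ins for the "trivial functions" being discussed. The attribute is a GCC extension and is a request rather than a guarantee, so whether every call is actually inlined still depends on the compiler version and flags.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical trivial helpers, standing in for thin accessor-style wrappers.
static inline float scale(float x, float k) { return x * k; }
static inline float offset(float x, float b) { return x + b; }

// flatten asks GCC to inline, where possible, every call made inside this
// function body, regardless of the usual inline heuristics.
__attribute__((flatten))
float finalblow(const float* v, std::size_t n) {
    assert(v != nullptr || n == 0);  // precondition: valid pointer unless empty
    float acc = 0.0f;
    for (std::size_t i = 0; i < n; ++i) {
        acc += offset(scale(v[i], 2.0f), 1.0f);
    }
    return acc;
}
```

Inspecting the generated assembly (for example with `-S`) is the usual way to check whether the calls to the helpers really disappeared.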

Re: 100x perfomance regression between gcc 3.4.5 and gcc 4.X

2006-03-13 Thread Richard Guenther
On 3/13/06, Richard Guenther <[EMAIL PROTECTED]> wrote: > On 3/13/06, tbp <[EMAIL PROTECTED]> wrote: > > On 3/13/06, Richard Guenther <[EMAIL PROTECTED]> wrote: > > > Starting with gcc 4.1.0 we have inline heuristics in place that will > > _always_ > > > inline such simple "wrappers". So, if thi

Re: 100x perfomance regression between gcc 3.4.5 and gcc 4.X

2006-03-13 Thread Richard Guenther
On 3/13/06, tbp <[EMAIL PROTECTED]> wrote: > On 3/13/06, Richard Guenther <[EMAIL PROTECTED]> wrote: > > Starting with gcc 4.1.0 we have inline heuristics in place that will > _always_ > > inline such simple "wrappers". So, if this still happens, there is a bug > > in the > > heuristics and tha

Re: 100x perfomance regression between gcc 3.4.5 and gcc 4.X

2006-03-13 Thread tbp
On 3/13/06, Richard Guenther <[EMAIL PROTECTED]> wrote: > Starting with gcc 4.1.0 we have inline heuristics in place that will _always_ > inline such simple "wrappers". So, if this still happens, there is a bug in > the > heuristics and that should be reported. Before 4.1.0 the heuristics were

Re: 100x perfomance regression between gcc 3.4.5 and gcc 4.X

2006-03-13 Thread Richard Guenther
On 3/13/06, tbp <[EMAIL PROTECTED]> wrote: > On 3/13/06, Paolo Bonzini <[EMAIL PROTECTED]> wrote: > >Wait wait. PR/21195 is about inlining > > the SSE builtins. > No. PR/21195 was really about inline heuristic going ballistic. > Those intrinsics are thin wrappers around builtins, and ultimately >

Re: 100x perfomance regression between gcc 3.4.5 and gcc 4.X

2006-03-13 Thread tbp
On 3/13/06, Paolo Bonzini <[EMAIL PROTECTED]> wrote: >Wait wait. PR/21195 is about inlining > the SSE builtins. No. PR/21195 was really about inline heuristic going ballistic. Those intrinsics are thin wrappers around builtins, and ultimately resolve to a couple of operations. Typical C++ (accesso

Re: 100x perfomance regression between gcc 3.4.5 and gcc 4.X

2006-03-13 Thread Paolo Bonzini
tbp wrote: On 3/13/06, Andrew Pinski <[EMAIL PROTECTED]> wrote: Actually the best way of improving the inline heuristics is to get a real testcase (and not some benchmark) where the inline heuristics is messed up. Ah, you mean a brand new testcase because PR-21195 wasn't good enough? show u

Re: 100x perfomance regression between gcc 3.4.5 and gcc 4.X

2006-03-12 Thread tbp
On 3/13/06, Andrew Pinski <[EMAIL PROTECTED]> wrote: > Actually the best way of improving the inline heuristics is to get > a real testcase (and not some benchmark) where the inline heuristics > is messed up. Ah, you mean a brand new testcase because PR-21195 wasn't good enough? $ /usr/local/gcc-

Re: 100x perfomance regression between gcc 3.4.5 and gcc 4.X

2006-03-12 Thread Gabriel Dos Reis
Andrew Pinski <[EMAIL PROTECTED]> writes: | > | > On 3/12/06, Steven Bosscher <[EMAIL PROTECTED]> wrote: | > > > Yes, why is the benchmark not valid? | > > | > > It is valid. We should understand why this behavior has changed so drastically. | > This benchmark may be useless, but it still exposes a

Re: 100x perfomance regression between gcc 3.4.5 and gcc 4.X

2006-03-12 Thread Gabriel Dos Reis
tbp <[EMAIL PROTECTED]> writes: | On 3/12/06, Steven Bosscher <[EMAIL PROTECTED]> wrote: | > > Yes, why is the benchmark not valid? | > | > It is valid. We should understand why this behavior has changed so drastically. | This benchmark may be useless, but it still exposes a weakness of gcc4. At | le

Re: 100x perfomance regression between gcc 3.4.5 and gcc 4.X

2006-03-12 Thread Andrew Pinski
> > On 3/12/06, Steven Bosscher <[EMAIL PROTECTED]> wrote: > > > Yes, why is the benchmark not valid? > > > > It is valid. We should understand why this behavior has changed so > > drastically. > This benchmark may be useless, but it still exposes a weakness of gcc4. At > least it's not news to me: >

Re: 100x perfomance regression between gcc 3.4.5 and gcc 4.X

2006-03-12 Thread tbp
On 3/12/06, Steven Bosscher <[EMAIL PROTECTED]> wrote: > > Yes, why is the benchmark not valid? > > It is valid. We should understand why this behavior has changed so > drastically. This benchmark may be useless, but it still exposes a weakness of gcc4. At least it's not news to me: http://gcc.gnu.org

Re: 100x perfomance regression between gcc 3.4.5 and gcc 4.X

2006-03-12 Thread Richard Guenther
On 3/12/06, Richard Guenther <[EMAIL PROTECTED]> wrote: > So, I tried to reproduce the slowdown and on i686 get all > memcpy/memset inlined on 3.3, 3.4, 4.0 and 4.1. On ppc I get calls to > memcpy/memset in all cases. This might be more a glibc issue I think. So my suggestion is to file a bugzil

Re: 100x perfomance regression between gcc 3.4.5 and gcc 4.X

2006-03-12 Thread Richard Guenther
On 12 Mar 2006 18:09:26 +0100, Gabriel Dos Reis <[EMAIL PROTECTED]> wrote: > "Richard Guenther" <[EMAIL PROTECTED]> writes: > > [...] > > | this one should be measured. But note that the benchmark is a > | no-op and can be validly optimized to int main() { return 0; } by the > | compiler. This is

Re: 100x perfomance regression between gcc 3.4.5 and gcc 4.X

2006-03-12 Thread Gabriel Dos Reis
"Richard Guenther" <[EMAIL PROTECTED]> writes: [...] | this one should be measured. But note that the benchmark is a | no-op and can be validly optimized to int main() { return 0; } by the | compiler. This is why I call it a stupid benchmark. Please let's refrain from getting into that back ho
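
To illustrate the "no-op" objection quoted above: if nothing the benchmark computes is ever observed, the compiler is free to discard the work. One common way to reduce that risk is to fold the copied data into a value the caller uses. The sketch below is an assumption-laden illustration, not the original bashmark code; the name `copy_and_checksum` and its parameters are hypothetical, and the use of `memcpy` is assumed from the memcpy/memset discussion in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical benchmark kernel: copy a buffer repeatedly and fold one byte
// of the destination into a checksum on every round, so the copies feed a
// value that the caller can observe.
std::uint32_t copy_and_checksum(std::uint8_t* dst, const std::uint8_t* src,
                                std::size_t n, unsigned rounds) {
    assert(dst != nullptr && src != nullptr && n > 0);
    std::uint32_t sum = 0;
    for (unsigned r = 0; r < rounds; ++r) {
        std::memcpy(dst, src, n);
        sum += dst[r % n];  // read back from the destination buffer
    }
    return sum;
}
```

This only helps if the caller actually uses the returned value (prints it, for instance). An optimizer may still simplify repeated identical copies, so timings from a kernel like this should be read alongside the generated assembly.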

Re: 100x perfomance regression between gcc 3.4.5 and gcc 4.X

2006-03-12 Thread Richard Guenther
On 3/12/06, Ernest L. Williams Jr. <[EMAIL PROTECTED]> wrote: > On Sun, 2006-03-12 at 15:17 +0100, Richard Guenther wrote: > > On 3/12/06, Ernest L. Williams Jr. <[EMAIL PROTECTED]> wrote: > > > > In any case: memcpy/memset inlining is broken in current GCC at least > > > > on athlon arch. > > > >

Re: 100x perfomance regression between gcc 3.4.5 and gcc 4.X

2006-03-12 Thread Ernest L. Williams Jr.
On Sun, 2006-03-12 at 15:17 +0100, Richard Guenther wrote: > On 3/12/06, Ernest L. Williams Jr. <[EMAIL PROTECTED]> wrote: > > > In any case: memcpy/memset inlining is broken in current GCC at least > > > on athlon arch. > > let's say it changed. Also memcpy/memset "inlining" is not regular inlin

Re: 100x perfomance regression between gcc 3.4.5 and gcc 4.X

2006-03-12 Thread Nickolay Kolchin
On 3/12/06, Steven Bosscher <[EMAIL PROTECTED]> wrote: > > It is valid. We should understand why this behavior has changed so > drastically. > I've attached assembler output from different compiler versions: 3.4.5-athlon-xp: gcc-3.4.5 -O3 -march=athlon-xp 3.4.5-pentium4: gcc-3.4.5 -O3 -march=pe

Re: 100x perfomance regression between gcc 3.4.5 and gcc 4.X

2006-03-12 Thread Richard Guenther
On 3/12/06, Ernest L. Williams Jr. <[EMAIL PROTECTED]> wrote: > > In any case: memcpy/memset inlining is broken in current GCC at least > > on athlon arch. let's say it changed. Also memcpy/memset "inlining" is not regular inlining but driven by completely different heuristics. > Yes, why is the

Re: 100x perfomance regression between gcc 3.4.5 and gcc 4.X

2006-03-12 Thread Steven Bosscher
> Yes, why is the benchmark not valid? It is valid. We should understand why this behavior has changed so drastically. Gr. Steven

Re: 100x perfomance regression between gcc 3.4.5 and gcc 4.X

2006-03-12 Thread Ernest L. Williams Jr.
On Sun, 2006-03-12 at 16:55 +0300, Nickolay Kolchin wrote: > On 3/12/06, Richard Guenther <[EMAIL PROTECTED]> wrote: > > On 3/12/06, Nickolay Kolchin <[EMAIL PROTECTED]> wrote: > > > During "bashmark" memory benchmark performance analysis, I found a 100x >

Re: 100x perfomance regression between gcc 3.4.5 and gcc 4.X

2006-03-12 Thread Nickolay Kolchin
On 3/12/06, Richard Guenther <[EMAIL PROTECTED]> wrote: > On 3/12/06, Nickolay Kolchin <[EMAIL PROTECTED]> wrote: > > During "bashmark" memory benchmark performance analysis, I found a 100x > > performance regression between gcc 3.4.5 and gcc 4.X. >

Re: 100x perfomance regression between gcc 3.4.5 and gcc 4.X

2006-03-12 Thread Richard Guenther
On 3/12/06, Nickolay Kolchin <[EMAIL PROTECTED]> wrote: > During "bashmark" memory benchmark performance analysis, I found a 100x performance > regression between gcc 3.4.5 and gcc 4.X. > > -- test_cmd.cpp (simplified bashmark memory RW test) --- > #include >

100x perfomance regression between gcc 3.4.5 and gcc 4.X

2006-03-11 Thread Nickolay Kolchin
During "bashmark" memory benchmark performance analysis, I found a 100x performance regression between gcc 3.4.5 and gcc 4.X. -- test_cmd.cpp (simplified bashmark memory RW test) --- #include #include template static void int_membench(uint8_t* mb1, uint8_t* mb2) { for(uint32_