http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46186
Thorsten Glaser tg at mirbsd dot org changed:
What|Removed |Added
CC||tg at mirbsd dot
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46186
--- Comment #1 from Julian Andres Klode j...@jak-linux.org 2010-10-26
14:30:24 UTC ---
Created attachment 22162
-- http://gcc.gnu.org/bugzilla/attachment.cgi?id=22162
Clang's assembler
Attaching the assembler output from clang, it should help
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46186
Paolo Carlini paolo.carlini at oracle dot com changed:
What|Removed |Added
CC|
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46186
--- Comment #3 from Julian Andres Klode j...@jak-linux.org 2010-10-26
14:32:27 UTC ---
System information:
Using built-in specs.
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Debian 4.4.5-5'
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46186
--- Comment #4 from Jonathan Wakely redi at gcc dot gnu.org 2010-10-26
14:47:12 UTC ---
GCC's output is significantly faster at -O3 or without the noinline attribute
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46186
--- Comment #5 from Julian Andres Klode j...@jak-linux.org 2010-10-26
14:53:24 UTC ---
(In reply to comment #4)
> GCC's output is significantly faster at -O3 or without the noinline attribute
I just tested and at -O3, gcc-4.4 creates slow code
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46186
--- Comment #6 from Dominique d'Humieres dominiq at lps dot ens.fr 2010-10-26
14:59:18 UTC ---
You get this kind of speedup if the compiler knows that the result of the loop is
sum=(b*(b-1)-a*(a-1))/2
In which case the timing is meaningless (it
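The closed form quoted in this comment can be checked against a plain loop. The sketch below is only an illustration: the loop shape `for (; a < b; a++) sum += a;` and the function names `sum_loop`, `sum_closed` and `check_small_ranges` are assumptions, not taken from the bug report's attachments.

```c
#include <assert.h>

/* Hypothetical loop: adds a, a+1, ..., b-1 (assumed loop shape). */
unsigned long sum_loop(unsigned long a, unsigned long b)
{
    unsigned long sum = 0;
    for (; a < b; a++)
        sum += a;
    return sum;
}

/* The formula from the comment, written naively.  As written it is
 * only reliable while b*(b-1) still fits in unsigned long, because
 * the division by 2 happens after the (possibly wrapped) products. */
unsigned long sum_closed(unsigned long a, unsigned long b)
{
    if (a >= b)
        return 0;               /* the loop body never runs */
    return (b * (b - 1) - a * (a - 1)) / 2;
}

/* Compare both versions on a grid of small inputs. */
void check_small_ranges(void)
{
    for (unsigned long a = 0; a < 50; a++)
        for (unsigned long b = 0; b < 50; b++)
            assert(sum_loop(a, b) == sum_closed(a, b));
}
```

With a closed form like this the run time no longer depends on `b - a`, which is why a benchmark of the loop stops measuring anything once a compiler finds it.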
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46186
--- Comment #7 from Julian Andres Klode j...@jak-linux.org 2010-10-26
15:00:37 UTC ---
(In reply to comment #5)
> (In reply to comment #4)
> > GCC's output is significantly faster at -O3 or without the noinline
> > attribute
> I just tested and at
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46186
--- Comment #8 from Julian Andres Klode j...@jak-linux.org 2010-10-26
15:25:56 UTC ---
(In reply to comment #6)
> You get this kind of speedup if the compiler knows that the result of the loop is
> sum=(b*(b-1)-a*(a-1))/2
> In which case the
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46186
--- Comment #9 from Jonathan Wakely redi at gcc dot gnu.org 2010-10-26
15:28:51 UTC ---
(In reply to comment #8)
> Since the optimization seems to be mostly there in -O3, it's just a matter of
> enabling it in -O2.
Or if you want all
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46186
Jakub Jelinek jakub at gcc dot gnu.org changed:
What|Removed |Added
CC||jakub at gcc dot
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46186
--- Comment #11 from Paolo Carlini paolo.carlini at oracle dot com 2010-10-26
15:42:58 UTC ---
Can we please stop talking about nano and giga numbers like kids? If an
optimization like complete loop unrolling is involved of course very small or
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46186
--- Comment #12 from pinskia at gmail dot com pinskia at gmail dot com
2010-10-26 15:56:20 UTC ---
On Oct 26, 2010, at 7:30 AM, j...@jak-linux.org gcc-bugzi...@gcc.gnu.org
wrote:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46186
---
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46186
--- Comment #13 from Dominique d'Humieres dominiq at lps dot ens.fr
2010-10-26 16:36:05 UTC ---
> This multiplication transformation is incorrect if the loop wraps
> (unsigned always wraps; never overflows).
I think this is wrong: wrapping is
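One way to see that wrapping by itself need not invalidate a closed form: unsigned arithmetic is arithmetic modulo 2^N, so a product-based formula stays exact as long as the division by 2 is applied to a factor that is known to be even before multiplying. The sketch below is an independent illustration with hypothetical function names; it is not what GCC or LLVM emit, and it assumes the loop shape `for (; a < b; a++) sum += a;`.

```c
#include <assert.h>
#include <stdint.h>

/* Reference: add a, a+1, ..., b-1 with ordinary wrapping addition. */
uint64_t sum_loop_wrapping(uint64_t a, uint64_t b)
{
    uint64_t sum = 0;
    for (; a < b; a++)
        sum += a;
    return sum;
}

/* Closed form n*(2a + n - 1)/2, exact modulo 2^64.  The halving is
 * done on an even quantity first, so no bit is lost to wrapping. */
uint64_t sum_closed_wrapping(uint64_t a, uint64_t b)
{
    if (a >= b)
        return 0;
    uint64_t n = b - a;                     /* trip count, exact */
    if (n % 2 == 0)
        return (n / 2) * (2 * a + n - 1);   /* n is even */
    return n * (a + (n - 1) / 2);           /* n - 1 is even */
}

/* Short loops near the top of the range, where the sum wraps. */
void check_wrapping_cases(void)
{
    assert(sum_loop_wrapping(UINT64_MAX - 3, UINT64_MAX) ==
           sum_closed_wrapping(UINT64_MAX - 3, UINT64_MAX));
    assert(sum_loop_wrapping(UINT64_MAX - 4, UINT64_MAX) ==
           sum_closed_wrapping(UINT64_MAX - 4, UINT64_MAX));
}
```

The inputs in `check_wrapping_cases` keep the reference loop to three or four iterations while still forcing the running sum past 2^64.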
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46186
Jakub Jelinek jakub at gcc dot gnu.org changed:
What|Removed |Added
Status|UNCONFIRMED |NEW
Last
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46186
--- Comment #15 from Dominique d'Humieres dominiq at lps dot ens.fr
2010-10-26 17:15:31 UTC ---
> For sum += 2 or sum += b sccp handles this, so I wonder whether it couldn't
> handle even the sum += a case.
2 and b are constants while a is not.
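The distinction drawn here can be illustrated with two loops whose increment does not change inside the loop. In that case the result is simply the trip count times the increment, whereas `sum += a` adds a value that itself advances every iteration. The function names and the loop shape `for (; a < b; a++)` are assumptions made for this sketch; it says nothing about how sccp is implemented.

```c
#include <assert.h>

/* Hypothetical loop with a literal constant increment. */
unsigned long add_two(unsigned long a, unsigned long b)
{
    unsigned long sum = 0;
    for (; a < b; a++)
        sum += 2;
    return sum;
}

/* Hypothetical loop with a loop-invariant increment. */
unsigned long add_b(unsigned long a, unsigned long b)
{
    unsigned long sum = 0;
    for (; a < b; a++)
        sum += b;
    return sum;
}

/* Closed forms: trip count (b - a) times the invariant increment. */
unsigned long add_two_closed(unsigned long a, unsigned long b)
{
    return a < b ? (b - a) * 2 : 0;
}

unsigned long add_b_closed(unsigned long a, unsigned long b)
{
    return a < b ? (b - a) * b : 0;
}

/* Compare loops and closed forms on a grid of small inputs. */
void check_invariant_increments(void)
{
    for (unsigned long a = 0; a < 20; a++)
        for (unsigned long b = 0; b < 20; b++) {
            assert(add_two(a, b) == add_two_closed(a, b));
            assert(add_b(a, b) == add_b_closed(a, b));
        }
}
```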
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46186
--- Comment #16 from Jakub Jelinek jakub at gcc dot gnu.org 2010-10-26
18:43:40 UTC ---
chrec_apply is called with
{a_4(D), +, {a_4(D) + 1, +, 1}_1}_1
chrec and ~a_4(D) + b_5(D) in x.
I wonder if this can be fixed just by recognizing such special
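For readers unfamiliar with the notation, a chain of recurrences such as {a, +, {a + 1, +, 1}} can be read as "start at a, add a step that itself starts at a + 1 and grows by 1 each iteration". Its value after x iterations has the standard binomial closed form a + (a + 1)*C(x,1) + 1*C(x,2). The sketch below evaluates that form for x = ~a + b (which equals b - a - 1 in unsigned arithmetic) and compares it with stepping the recurrence by hand. It is an illustration only, with hypothetical function names, and is not GCC's chrec_apply code.

```c
#include <assert.h>
#include <stdint.h>

/* Step the recurrence {a, +, {a + 1, +, 1}} x times. */
uint64_t chrec_iterate(uint64_t a, uint64_t x)
{
    uint64_t value = a;
    uint64_t step = a + 1;
    for (uint64_t i = 0; i < x; i++) {
        value += step;      /* outer recurrence advances by step */
        step += 1;          /* inner recurrence advances by 1 */
    }
    return value;
}

/* Closed form: a + (a + 1)*C(x,1) + C(x,2), with C(x,2) = x(x-1)/2
 * computed by halving whichever of x, x - 1 is even. */
uint64_t chrec_closed(uint64_t a, uint64_t x)
{
    uint64_t c2 = (x % 2 == 0) ? (x / 2) * (x - 1)
                               : x * ((x - 1) / 2);
    return a + (a + 1) * x + c2;
}

/* Compare both evaluations for small a and x. */
void check_chrec(void)
{
    for (uint64_t a = 0; a < 20; a++)
        for (uint64_t x = 0; x < 20; x++)
            assert(chrec_iterate(a, x) == chrec_closed(a, x));
}
```

For example, with a = 3 and b = 7 the argument is `~(uint64_t)3 + 7`, i.e. 3, and both functions return 18 = 3 + 4 + 5 + 6.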
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46186
--- Comment #17 from Dominique d'Humieres dominiq at lps dot ens.fr
2010-10-26 18:53:49 UTC ---
Note that clang seems to know the general result: \sum_{i=a}^b p(i)=P(b), where
p(i) is a given polynomial of degree n and P(x) a polynomial of degree
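The general fact behind this remark is the telescoping identity. The block below is a short worked version, assuming a half-open range i = a, ..., b-1 to match the formula sum=(b*(b-1)-a*(a-1))/2 quoted earlier in the thread, and writing P for any polynomial with P(x+1) - P(x) = p(x).

```latex
\[
\sum_{i=a}^{b-1} p(i)
  = \sum_{i=a}^{b-1} \bigl( P(i+1) - P(i) \bigr)
  = P(b) - P(a),
\qquad \deg P = \deg p + 1 .
\]
\[
p(i) = i, \quad P(x) = \frac{x(x-1)}{2}
\;\Longrightarrow\;
\sum_{i=a}^{b-1} i = \frac{b(b-1) - a(a-1)}{2} .
\]
```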
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46186
--- Comment #18 from Jakub Jelinek jakub at gcc dot gnu.org 2010-10-26
19:11:59 UTC ---
I guess you mean LLVM instead of clang, I'm pretty sure the FE doesn't perform
this optimization.
Anyway, given:
#define F(n, exp) \
unsigned long
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46186
--- Comment #19 from joseph at codesourcery dot com joseph at codesourcery dot
com 2010-10-26 20:29:56 UTC ---
On Tue, 26 Oct 2010, dominiq at lps dot ens.fr wrote:
> --- Comment #13 from Dominique d'Humieres dominiq at lps dot ens.fr
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46186
--- Comment #20 from Jakub Jelinek jakub at gcc dot gnu.org 2010-10-26
21:00:11 UTC ---
If I translate the assembly back to C, it seems it is performing part of the
arithmetic in TImode:
unsigned long f (unsigned long a, unsigned long b)
{
if
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46186
--- Comment #21 from Dominique d'Humieres dominiq at lps dot ens.fr
2010-10-26 21:06:48 UTC ---
> I guess you mean LLVM instead of clang,
Yes, if you prefer. I was referring to the command I used.
F (6, a * a * a * a * a + 2 * a * a * a + 5 *