--- Comment #11 from ubizjak at gmail dot com 2007-12-13 14:24 ---
C testcase:
--cut here--
extern void foo(void);
extern double *dpb;
double s05m_test(void)
{
double result = 0.0;
int n;
for (n = 0; n < 2000; ++n)
result += dpb[n];
#ifdef FOOBAR
foo();
#endif
--- Comment #9 from ubizjak at gmail dot com 2007-12-13 14:10 ---
Reduced C++ testcase that causes the runtime difference:
--cut here--
#include <iostream>
extern double *dpb;
void s05m_test(void)
{
double result = 0.0;
for (int n = 0; n < 2000; ++n)
result +=
--- Comment #10 from ubizjak at gmail dot com 2007-12-13 14:12 ---
BTW: the .p2align directives were removed manually from the first case for
clarity; I just forgot to remove them from the second case before posting.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23322
--- Comment #12 from rguenth at gcc dot gnu dot org 2007-12-13 14:36 ---
This is still a register allocation problem. We somehow prefer xmm0, which is
call-clobbered, and that causes reloads inside the loop.
Micha? :)
--
--- Comment #14 from rguenth at gcc dot gnu dot org 2007-12-13 14:54 ---
Does yara address this somehow?
--
--- Comment #15 from rguenth at gcc dot gnu dot org 2007-12-13 15:00 ---
Works with 2.95.4, fails at least starting with 3.3.6 (-m32). Also happens
on x86_64, but there it's not a regression. Happens on all targets that have
only call-clobbered registers that can hold 'result'.
--
--- Comment #13 from rguenth at gcc dot gnu dot org 2007-12-13 14:43 ---
I guess if we split the live range of (reg:DF 64 [result]) so that it does not
extend over the call, global wouldn't reload all of its uses.
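
To make the live-range idea concrete, here is a rough source-level sketch. It
is not from the bug report: the function names, the 'hook' parameter (standing
in for the external foo()), and the explicit array argument are all made up for
illustration. The first function keeps one variable alive through the loop and
across the call; the second accumulates into a variable that dies before the
call and carries only a copy across it.

```c
#include <stddef.h>

/* Hypothetical helper: 'result' is used in the loop and is still live
   across the call, so one live range covers both. */
double sum_one_range(const double *data, size_t count, void (*hook)(void))
{
    double result = 0.0;
    for (size_t n = 0; n < count; ++n)
        result += data[n];
    if (hook)
        hook();              /* 'result' is live across this call */
    return result;
}

/* Hypothetical helper: 'acc' is only live inside the loop; the copy
   'saved' is the only value that has to survive the call. */
double sum_split_range(const double *data, size_t count, void (*hook)(void))
{
    double acc = 0.0;
    for (size_t n = 0; n < count; ++n)
        acc += data[n];
    double saved = acc;      /* split point: a new live range starts here */
    if (hook)
        hook();
    return saved;
}
```

Both functions compute the same value. This only pictures what a split inside
the allocator would look like; a compiler is free to merge 'acc' and 'saved'
again, so it should not be read as a workaround for the PR.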
--
mmitchel at gcc dot gnu dot org changed:
What|Removed |Added
Target Milestone|4.1.2 |4.1.3