Re: [PATCH] Re: PowerPC : GCC2 optimises better than GCC4???

2010-01-07 Gabriel Paubert
On Wed, Jan 06, 2010 at 04:18:06PM +0100, Jakub Jelinek wrote: On Wed, Jan 06, 2010 at 10:15:58AM +, Andrew Haley wrote: On 01/06/2010 09:59 AM, Mark Colby wrote: Yabbut, how come RTL cse can handle it in x86_64, but PPC not? Probably because the RTL on x86_64 uses and's and ior's,

Re: [PATCH] Re: PowerPC : GCC2 optimises better than GCC4???

2010-01-07 Jakub Jelinek
On Thu, Jan 07, 2010 at 09:48:53AM +0100, Gabriel Paubert wrote: apparently rs6000_emit_set_long_const needs work. lis 3,0x8034 extsw 3,3 or li 3,0x401a sldi 3,3,17 etc. do IMHO the same. Huh? I don't think so: - first one loads 0x__8034_ in r3, and

RE: PowerPC : GCC2 optimises better than GCC4???

2010-01-06 Mark Colby
Yabbut, how come RTL cse can handle it in x86_64, but PPC not? Probably because the RTL on x86_64 uses and's and ior's, but PPC uses set's of zero_extract's (insvsi). Aha! Yes, that'll probably be it. It should be easy to fix cse to recognize those too. Andrew I'm not familiar with

Re: PowerPC : GCC2 optimises better than GCC4???

2010-01-06 Andrew Haley
On 01/06/2010 09:59 AM, Mark Colby wrote: Yabbut, how come RTL cse can handle it in x86_64, but PPC not? Probably because the RTL on x86_64 uses and's and ior's, but PPC uses set's of zero_extract's (insvsi). Aha! Yes, that'll probably be it. It should be easy to fix cse to recognize

RE: PowerPC : GCC2 optimises better than GCC4???

2010-01-06 Mark Colby
Aha! Yes, that'll probably be it. It should be easy to fix cse to recognize those too. I'm not familiar with the gcc source yet, but just in case I get the time to look at this, could anyone give me a file/line ref to dive into and examine? Would you believe cse.c? :-) Ha! I'll look

[PATCH] Re: PowerPC : GCC2 optimises better than GCC4???

2010-01-06 Jakub Jelinek
On Wed, Jan 06, 2010 at 10:15:58AM +, Andrew Haley wrote: On 01/06/2010 09:59 AM, Mark Colby wrote: Yabbut, how come RTL cse can handle it in x86_64, but PPC not? Probably because the RTL on x86_64 uses and's and ior's, but PPC uses set's of zero_extract's (insvsi). Aha! Yes,

PowerPC : GCC2 optimises better than GCC4???

2010-01-04 Mark Colby
This sounds like a dumb question I know. However the following code snippet results in many more machine instructions under 4.4.2 than under 2.9.5 (I am running a cygwin-PowerPC cross): typedef unsigned int U32; typedef union { U32 R; struct { U32 BF1:2; U32 :8;

Re: PowerPC : GCC2 optimises better than GCC4???

2010-01-04 Andrew Haley
On 01/04/2010 10:51 AM, Mark Colby wrote: This sounds like a dumb question I know. However the following code snippet results in many more machine instructions under 4.4.2 than under 2.9.5 (I am running a cygwin-PowerPC cross): typedef unsigned int U32; typedef union { U32 R;

Re: PowerPC : GCC2 optimises better than GCC4???

2010-01-04 Steven Bosscher
On Mon, Jan 4, 2010 at 12:02 PM, Andrew Haley a...@redhat.com wrote: On 01/04/2010 10:51 AM, Mark Colby wrote: This sounds like a dumb question I know. However the following code snippet results in many more machine instructions under 4.4.2 than under 2.9.5 (I am running a cygwin-PowerPC

RE: PowerPC : GCC2 optimises better than GCC4???

2010-01-04 Mark Colby
Is there any way to improve this behaviour? I have been using 2.9.5 very successfully for years and am now looking at 4.4.2, but have many such examples in my code (for clarity of commenting and maintainability). This is very strange. On x86_64, gcc 4.4.1 generates movl

Re: PowerPC : GCC2 optimises better than GCC4???

2010-01-04 Jakub Jelinek
On Mon, Jan 04, 2010 at 12:18:50PM +0100, Steven Bosscher wrote: This optimization is done by the first RTL cse pass.  I can't understand why it's not being done for your target.  I guess this will need a powerpc expert. Known bug, see http://gcc.gnu.org/PR22141 That's unrelated.

RE: PowerPC : GCC2 optimises better than GCC4???

2010-01-04 Mark Colby
On Mon, Jan 04, 2010 at 12:18:50PM +0100, Steven Bosscher wrote: This optimization is done by the first RTL cse pass.  I can't understand why it's not being done for your target.  I guess this will need a powerpc expert. Known bug, see http://gcc.gnu.org/PR22141 That's unrelated.

Re: PowerPC : GCC2 optimises better than GCC4???

2010-01-04 Andrew Haley
On 01/04/2010 12:07 PM, Jakub Jelinek wrote: On Mon, Jan 04, 2010 at 12:18:50PM +0100, Steven Bosscher wrote: On Mon, Jan 4, 2010 at 12:02 PM, Andrew Haley a...@redhat.com wrote: This optimization is done by the first RTL cse pass. I can't understand why it's not being done for your target. I

RE: PowerPC : GCC2 optimises better than GCC4???

2010-01-04 Bingfeng Mei
Sent: 04 January 2010 16:08 To: gcc@gcc.gnu.org Subject: Re: PowerPC : GCC2 optimises better than GCC4??? On 01/04/2010 12:07 PM, Jakub Jelinek wrote: On Mon, Jan 04, 2010 at 12:18:50PM +0100, Steven Bosscher wrote: On Mon, Jan 4, 2010 at 12:02 PM, Andrew Haley a...@redhat.com wrote

Re: PowerPC : GCC2 optimises better than GCC4???

2010-01-04 Nathan Froyd
On Mon, Jan 04, 2010 at 04:08:17PM +, Andrew Haley wrote: On 01/04/2010 12:07 PM, Jakub Jelinek wrote: IMHO we really should have some late tree pass that converts adjacent bitfield operations into integral operations on non-bitfields (likely with alias set of the whole containing

Re: PowerPC : GCC2 optimises better than GCC4???

2010-01-04 Andrew Haley
On 01/04/2010 04:17 PM, Nathan Froyd wrote: On Mon, Jan 04, 2010 at 04:08:17PM +, Andrew Haley wrote: On 01/04/2010 12:07 PM, Jakub Jelinek wrote: IMHO we really should have some late tree pass that converts adjacent bitfield operations into integral operations on non-bitfields (likely