Re: PATCH: PR rtl-optimization/50696: [x32] Unnecessary lea

2011-10-14 Thread Paolo Bonzini
On 10/13/2011 10:07 PM, H.J. Lu wrote: On Thu, Oct 13, 2011 at 11:15 AM, Richard Kenner ken...@vlsi1.ultra.nyu.edu wrote: The answer to H.J.'s Why do we do it for MEM then? is simply because no one ever thought about not doing it No, that's false. The same expand_compound_operation /

Re: PATCH: PR rtl-optimization/50696: [x32] Unnecessary lea

2011-10-14 Thread H.J. Lu
On Thu, Oct 13, 2011 at 11:51 PM, Paolo Bonzini bonz...@gnu.org wrote: On 10/13/2011 10:07 PM, H.J. Lu wrote: On Thu, Oct 13, 2011 at 11:15 AM, Richard Kenner ken...@vlsi1.ultra.nyu.edu  wrote: The answer to H.J.'s Why do we do it for MEM then? is simply because no one ever thought about

Re: PATCH: PR rtl-optimization/50696: [x32] Unnecessary lea

2011-10-14 Thread Paolo Bonzini
On 10/14/2011 05:36 PM, H.J. Lu wrote: There is a testcase at http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50696 It passes with my patch. Cool, so let's wait for the results of testing. Paolo

Re: PATCH: PR rtl-optimization/50696: [x32] Unnecessary lea

2011-10-14 Thread H.J. Lu
On Fri, Oct 14, 2011 at 9:23 AM, Paolo Bonzini bonz...@gnu.org wrote: On 10/14/2011 05:36 PM, H.J. Lu wrote: There is a testcase at http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50696 It passes with my patch. Cool, so let's wait for the results of testing. Paolo Here is the complete

Re: PATCH: PR rtl-optimization/50696: [x32] Unnecessary lea

2011-10-13 Thread Paolo Bonzini
On 10/13/2011 01:04 AM, Richard Kenner wrote: I still don't like the patch, but I'm no longer as familiar with the code as I used to be so can't suggest a replacement. Let's see what others think about it. Same here, I don't like it but I hardly see any alternative. The only possibility

Re: PATCH: PR rtl-optimization/50696: [x32] Unnecessary lea

2011-10-13 Thread Richard Kenner
Same here, I don't like it but I hardly see any alternative. The only possibility could be to prevent calling expand_compound_operation completely for addresses. Richard, what do you think? Don't worry, combine hasn't changed much since your days. :) The problem wasn't potential changes

Re: PATCH: PR rtl-optimization/50696: [x32] Unnecessary lea

2011-10-13 Thread Paolo Bonzini
On 10/13/2011 02:51 PM, Richard Kenner wrote: case MEM: /* Ensure that our address has any ASHIFTs converted to MULT in case address-recognizing predicates are called later. */ temp = make_compound_operation (XEXP (x, 0), MEM); SUBST (XEXP (x, 0), temp);

Re: PATCH: PR rtl-optimization/50696: [x32] Unnecessary lea

2011-10-13 Thread Richard Kenner
Or being fooled by the 0xfffc masking, perhaps. No, I'm pretty sure that's NOT the case. The *whole point* of the routine is to deal with that masking.

Re: PATCH: PR rtl-optimization/50696: [x32] Unnecessary lea

2011-10-13 Thread H.J. Lu
On Thu, Oct 13, 2011 at 7:14 AM, Richard Kenner ken...@vlsi1.ultra.nyu.edu wrote: Or being fooled by the 0xfffc masking, perhaps. No, I'm pretty sure that's NOT the case.  The *whole point* of the routine is to deal with that masking. I got (gdb) step make_compound_operation

Re: PATCH: PR rtl-optimization/50696: [x32] Unnecessary lea

2011-10-13 Thread Richard Kenner
at the end. make_compound_operation doesn't know how to restore ZERO_EXTEND. It does in general. See make_extraction, which it calls. The question is why it doesn't in this case. That's the bug.

Re: PATCH: PR rtl-optimization/50696: [x32] Unnecessary lea

2011-10-13 Thread H.J. Lu
On Thu, Oct 13, 2011 at 9:11 AM, Richard Kenner ken...@vlsi1.ultra.nyu.edu wrote: at the end.  make_compound_operation doesn't know how to restore ZERO_EXTEND. It does in general.  See make_extraction, which it calls.  The question is why it doesn't in this case.  That's the bug. It never

Re: PATCH: PR rtl-optimization/50696: [x32] Unnecessary lea

2011-10-13 Thread Paolo Bonzini
On 10/13/2011 06:35 PM, Richard Kenner wrote: It never calls make_extraction. There are several cases handled for AND operation. But (and:DI (plus:DI (subreg:DI (mult:SI (reg/v:SI 85 [ i ]) (const_int 4 [0x4])) 0) (subreg:DI (reg:SI 106) 0)) (const_int 4294967292

Re: PATCH: PR rtl-optimization/50696: [x32] Unnecessary lea

2011-10-13 Thread Richard Kenner
An and:DI is cheaper than a zero_extend:DI of an and:SI. That depends strongly on the constants and whether the machine is 32-bit or 64-bit. But that's irrelevant in this case since the and:SI will be removed (it reflects what has already been done).

Re: PATCH: PR rtl-optimization/50696: [x32] Unnecessary lea

2011-10-13 Thread H.J. Lu
On Thu, Oct 13, 2011 at 10:01 AM, Paolo Bonzini bonz...@gnu.org wrote: On 10/13/2011 06:35 PM, Richard Kenner wrote: It never calls make_extraction. There are several cases handled for AND operation. But (and:DI (plus:DI (subreg:DI (mult:SI (reg/v:SI 85 [ i ]) (const_int 4

Re: PATCH: PR rtl-optimization/50696: [x32] Unnecessary lea

2011-10-13 Thread Paolo Bonzini
On Thu, Oct 13, 2011 at 19:19, H.J. Lu hjl.to...@gmail.com wrote: On Thu, Oct 13, 2011 at 10:01 AM, Paolo Bonzini bonz...@gnu.org wrote: On 10/13/2011 06:35 PM, Richard Kenner wrote: It never calls make_extraction.  There are several cases handled for AND operation. But (and:DI (plus:DI

Re: PATCH: PR rtl-optimization/50696: [x32] Unnecessary lea

2011-10-13 Thread H.J. Lu
On Thu, Oct 13, 2011 at 10:21 AM, Paolo Bonzini bonz...@gnu.org wrote: On Thu, Oct 13, 2011 at 19:19, H.J. Lu hjl.to...@gmail.com wrote: On Thu, Oct 13, 2011 at 10:01 AM, Paolo Bonzini bonz...@gnu.org wrote: On 10/13/2011 06:35 PM, Richard Kenner wrote: It never calls make_extraction.  There

Re: PATCH: PR rtl-optimization/50696: [x32] Unnecessary lea

2011-10-13 Thread Paolo Bonzini
On Thu, Oct 13, 2011 at 19:06, Richard Kenner ken...@vlsi1.ultra.nyu.edu wrote: An and:DI is cheaper than a zero_extend:DI of an and:SI. That depends strongly on the constants and whether the machine is 32-bit or 64-bit. Yes, the rtx_costs take care of that. But that's irrelevant in this

Re: PATCH: PR rtl-optimization/50696: [x32] Unnecessary lea

2011-10-13 Thread H.J. Lu
On Thu, Oct 13, 2011 at 11:15 AM, Richard Kenner ken...@vlsi1.ultra.nyu.edu wrote: The answer to H.J.'s Why do we do it for MEM then? is simply because no one ever thought about not doing it No, that's false.  The same expand_compound_operation / make_compound_operation pair is present in

Re: PATCH: PR rtl-optimization/50696: [x32] Unnecessary lea

2011-10-13 Thread Richard Kenner
Does it look OK? No. If I understand your code correctly, there's essentially the same code as you have a bit above that: /* If the constant is one less than a power of two, this might be representable by an extraction even if no shift is present. If it doesn't end

Re: PATCH: PR rtl-optimization/50696: [x32] Unnecessary lea

2011-10-13 Thread H.J. Lu
On Thu, Oct 13, 2011 at 2:30 PM, Richard Kenner ken...@vlsi1.ultra.nyu.edu wrote: It is because mask 0x is optimized to 0xfffc by keeping track of non-zero bits in registers and the above code doesn't take that into account. Then I'd suggest modifying that code so that it does
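
The point about non-zero bits can be illustrated outside of GCC. The following is a minimal sketch, not the code from the patch: `widen_mask` and `is_low_bit_mask` are hypothetical helper names, and the operand is assumed to be a 64-bit value whose low two bits are known to be zero. It uses the constant 4294967292 (0xfffffffc) that appears in the RTL quoted in this thread. Bits known to be zero in the operand cannot be affected by the AND, so they can be added back to the mask before testing whether the mask selects a contiguous run of low bits.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bits that are known to be zero in the operand are "don't care" bits
   for an AND, so they can be set in the mask without changing the
   result.  `nonzero` has a 1 for every bit that might be nonzero.
   (Hypothetical helper, not a GCC function.)  */
static uint64_t widen_mask(uint64_t mask, uint64_t nonzero)
{
    return mask | ~nonzero;
}

/* True if m has the form 2^n - 1, i.e. a contiguous run of low bits.
   This also holds for 0 and for the all-ones value.  */
static bool is_low_bit_mask(uint64_t m)
{
    return (m & (m + 1)) == 0;
}
```

Under these assumptions, 0xfffffffc alone fails the low-bit-mask test, but widening it with a non-zero-bits value of `~3ULL` gives 0xffffffff, which passes and corresponds to keeping the low 32 bits.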

Re: PATCH: PR rtl-optimization/50696: [x32] Unnecessary lea

2011-10-13 Thread H.J. Lu
On Thu, Oct 13, 2011 at 2:45 PM, H.J. Lu hjl.to...@gmail.com wrote: On Thu, Oct 13, 2011 at 2:30 PM, Richard Kenner ken...@vlsi1.ultra.nyu.edu wrote: It is because mask 0x is optimized to 0xfffc by keeping track of non-zero bits in registers and the above code doesn't take that

Re: PATCH: PR rtl-optimization/50696: [x32] Unnecessary lea

2011-10-13 Thread Richard Kenner
But the current code converts (and X 3) into a bit extraction since ((i = exact_log2 (UINTVAL (XEXP (x, 1)) + 1)) >= 0) is true when UINTVAL (XEXP (x, 1)) == 3. Should we do it or not? By adding the test for nonzero bits, you'd potentially be doing the conversion more often (which is the point
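
The test discussed here asks whether the mask plus one is an exact power of two. A small stand-alone sketch of that check follows. `exact_log2_sketch` is a hypothetical stand-in written for this example, not GCC's own `exact_log2`, and `low_field_width` is likewise an illustrative name.

```c
#include <stdint.h>

/* Stand-in for an exact-log2 helper: returns log2(x) when x is a
   power of two, and -1 otherwise (including x == 0).  */
static int exact_log2_sketch(uint64_t x)
{
    if (x == 0 || (x & (x - 1)) != 0)
        return -1;

    int n = 0;
    while ((x >>= 1) != 0)
        n++;
    return n;
}

/* Width of the low-bit field selected by `mask`, or -1 when the mask
   is not one less than a power of two.  An all-ones 64-bit mask wraps
   to 0 when incremented and is therefore reported as -1.  */
static int low_field_width(uint64_t mask)
{
    return exact_log2_sketch(mask + 1);
}
```

With this sketch, a mask of 3 yields a width of 2, which is why (and X 3) qualifies as an extraction of the low two bits, while 0xfffffffc yields -1 because 0xfffffffd is not a power of two.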

Re: PATCH: PR rtl-optimization/50696: [x32] Unnecessary lea

2011-10-13 Thread Richard Kenner
I am testing this patch. The difference is it checks nonzero bits of the first operand. I would suggest moving (and expanding) the comments from the existing block into your new block.

Re: PATCH: PR rtl-optimization/50696: [x32] Unnecessary lea

2011-10-13 Thread H.J. Lu
On Thu, Oct 13, 2011 at 3:33 PM, Richard Kenner ken...@vlsi1.ultra.nyu.edu wrote: I am testing this patch. The difference is it checks nonzero bits of the first operand. I would suggest moving (and expanding) the comments from the existing block into your new block. Like this? -- H.J. ---

Re: PATCH: PR rtl-optimization/50696: [x32] Unnecessary lea

2011-10-13 Thread Richard Kenner
Like this? Yes, that's what I meant. Thanks. Again, I'd suggest doing some performance testing on this just to verify that it doesn't pessimize things.

Re: PATCH: PR rtl-optimization/50696: [x32] Unnecessary lea

2011-10-13 Thread H.J. Lu
On Thu, Oct 13, 2011 at 3:52 PM, Richard Kenner ken...@vlsi1.ultra.nyu.edu wrote: Like this? Yes, that's what I meant. Thanks. Again, I'd suggest doing some performance testing on this just to verify that it doesn't pessimize things. I will run SPEC CPU 2K/2006 on ia32, x86-64 and x32. --

PATCH: PR rtl-optimization/50696: [x32] Unnecessary lea

2011-10-12 Thread H.J. Lu
Hi, When combine tries to combine: (insn 37 35 39 3 (set (reg:SI 90) (plus:SI (mult:SI (reg/v:SI 84 [ i ]) (const_int 4 [0x4])) (reg:SI 106))) x.i:11 247 {*leasi_2} (nil)) (insn 39 37 41 3 (set (mem:SI (zero_extend:DI (reg:SI 90)) [3 MEM[symbol: x,

Re: PATCH: PR rtl-optimization/50696: [x32] Unnecessary lea

2011-10-12 Thread Richard Kenner
X86 backend doesn't accept the new expression as valid address while (zero_extend:DI) works just fine. This patch keeps ZERO_EXTEND when zero-extending address to Pmode. It reduces number of lea from 24173 to 21428 in x32 libgfortran.so. Does it make any sense? I'd be inclined to have

Re: PATCH: PR rtl-optimization/50696: [x32] Unnecessary lea

2011-10-12 Thread H.J. Lu
On Wed, Oct 12, 2011 at 3:40 PM, Richard Kenner ken...@vlsi1.ultra.nyu.edu wrote: X86 backend doesn't accept the new expression as valid address while (zero_extend:DI) works just fine. This patch keeps ZERO_EXTEND when zero-extending address to Pmode. It reduces number of lea from 24173 to

Re: PATCH: PR rtl-optimization/50696: [x32] Unnecessary lea

2011-10-12 Thread Richard Kenner
1. The placement of subreg in (plus:DI (subreg:DI (mult:SI (reg/v:SI 85 [ i ]) (const_int 4 [0x4])) 0) (subreg:DI (reg:SI 106) 0)) isn't supported by x86 backend. That's easy to fix. 2. The biggest problem is optimizing mask 0x to