[PATCH,rs6000 2/2] Fusion patterns for add-logical/logical-add

2021-04-26 Thread acsawdey--- via Gcc-patches
From: Aaron Sawdey This patch modifies the function in genfusion.pl for generating the logical-logical patterns so that it can also generate the add-logical and logical-add patterns which are very similar. gcc/ChangeLog: * config/rs6000/genfusion.pl (gen_logical_addsubf): Refactor to

[PATCH,rs6000 1/2] combine patterns for add-add fusion

2021-04-26 Thread acsawdey--- via Gcc-patches
From: Aaron Sawdey This patch adds a function to genfusion.pl to add a couple more patterns so combine can do fusion of pairs of add and vaddudm instructions. gcc/ChangeLog: * gcc/config/rs6000/genfusion.pl (gen_addadd): New function. * gcc/config/rs6000/fusion.md: Regenerate fi

[PATCH,rs6000 0/2] p10 add-add and add-logical fusion series

2021-04-26 Thread acsawdey--- via Gcc-patches
From: Aaron Sawdey Two more sets of combine patterns for p10 fusion. These require the "Add insn types for fusion pairs" patch I posted earlier today. If ok I would like to put these in gcc 12 trunk and backport for 11.2. Thanks, Aaron Aaron Sawdey (2): combine patterns for add-add fusio

[PATCH,rs6000] Test cases for p10 fusion patterns

2021-04-26 Thread acsawdey--- via Gcc-patches
From: Aaron Sawdey This adds some test cases to make sure that the combine patterns for p10 fusion are working. OK for trunk? gcc/testsuite/ChangeLog: * gcc.target/powerpc/fusion-p10-ldcmpi.c: New file. * gcc.target/powerpc/fusion-p10-2logical.c: New file. --- .../gcc.target/po

[PATCH,rs6000] Add insn types for fusion pairs

2021-04-26 Thread acsawdey--- via Gcc-patches
From: Aaron Sawdey This adds new values for insn attr type for p10 fusion. The genfusion.pl script is modified to use them, and fusion.md regenerated to capture the new patterns. There are also some formatting only changes to fusion.md that apparently weren't captured after a previous commit of g

[PATCH,rs6000] Tighten predicates for p10 ld/cmpi fusion

2021-03-08 Thread acsawdey--- via Gcc-patches
From: Aaron Sawdey PR99070 is caused by a fusion pattern matching something that the individual instructions do not match when the pattern is split later. In this case the ld+cmpi patterns were allowing a d-form load address, which the split condition would rightly split, however that left us with something that co

[PATCH,rs6000] [v2] Optimize pcrel access of globals

2021-02-22 Thread acsawdey--- via Gcc-patches
From: Aaron Sawdey This patch implements a RTL pass that looks for pc-relative loads of the address of an external variable using the PCREL_GOT relocation and a single load or store that uses that external address. Produced by a cast of thousands: * Michael Meissner * Peter Bergner * Bill Sch

[PATCH,rs6000] do not generate fusion.md, update contrib/gcc_update

2021-02-01 Thread acsawdey--- via Gcc-patches
From: Aaron Sawdey In a previous fusion-combine patch for rs6000, Segher had asked me to comment out the automatic regeneration of fusion.md. And more recently Edelsohn pointed out that gcc_update needed to fix the timestamp of fusion.md so it didn't get unnecessarily regenerated. OK for trunk i

[PATCH,rs6000] Test cases for p10 fusion patterns

2020-12-11 Thread acsawdey--- via Gcc-patches
From: Aaron Sawdey This adds some test cases to make sure that the combine patterns for p10 fusion are working. These test cases pass on power10. OK for trunk after the 2 previous patches for the fusion patterns go in? Thanks! Aaron gcc/testsuite/ChangeLog: * gcc.target/powerpc/fusi

[PATCH,rs6000] Fusion patterns for logical-logical

2020-12-10 Thread acsawdey--- via Gcc-patches
From: Aaron Sawdey This patch adds a new function to genfusion.pl to generate patterns for logical-logical fusion. They are enabled by default for power10 and can be disabled by -mno-power10-fusion-2logical or -mno-power10-fusion. This patch builds on top of the load-cmpi patch posted earlier th

[PATCH,rs6000] Optimize pcrel access of globals [ping]

2020-12-09 Thread acsawdey--- via Gcc-patches
From: Aaron Sawdey Ping. I've folded in the changes to comments suggested by Will Schmidt. This patch implements a RTL pass that looks for pc-relative loads of the address of an external variable using the PCREL_GOT relocation and a single load or store that uses that external address. Produced

[PATCH,rs6000] Combine patterns for p10 load-cmpi fusion

2020-12-04 Thread acsawdey--- via Gcc-patches
From: Aaron Sawdey This patch adds the first batch of patterns to support p10 fusion. These will allow combine to create a single insn for a pair of instructions that power10 can fuse and execute. These particular ones have the requirement that only cr0 can be used when fusing a load with a

[PATCH,rs6000] Make MMA builtins use opaque modes [v2]

2020-11-19 Thread acsawdey--- via Gcc-patches
From: Aaron Sawdey Segher & Bergner - Thanks for the reviews, here's the updated patch after fixing those things. We now have an UNSPEC for xxsetaccz, and an accompanying change to rs6000_rtx_costs to make it be cost 0 so that CSE doesn't try to replace it with a bunch of register moves. If bo

[PATCH] Additional small changes to support opaque modes

2020-11-19 Thread acsawdey--- via Gcc-patches
From: Aaron Sawdey After building some larger codes using opaque types and some c++ codes using opaque types it became clear I needed to go through and look for places where opaque types and modes needed to be handled. A whole pile of one-liners. If bootstrap/regtest passes for ppc64le and x86_6

[PATCH, rs6000] Re-enable vector pair memcpy/memmove expansion

2020-11-17 Thread acsawdey--- via Gcc-patches
From: Aaron Sawdey After the MMA opaque mode patch goes in, we can re-enable use of vector pair in the inline expansion of memcpy/memmove. After bootstrap/regtest, OK for trunk? Thanks, Aaron gcc/ * config/rs6000/rs6000.c (rs6000_option_override_internal): Enable vector pai

[PATCH,rs6000] Make MMA builtins use opaque modes

2020-11-17 Thread acsawdey--- via Gcc-patches
From: Aaron Sawdey This patch changes powerpc MMA builtins to use the new opaque mode class and use modes OO (32 bytes) and XO (64 bytes) instead of POI/PXI. Using the opaque modes prevents optimization from trying to do anything with vector pair/quad, which was the problem we were seeing with th

Re: [PATCH] Add MODE_OPAQUE

2020-11-16 Thread acsawdey--- via Gcc-patches
From: Aaron Sawdey Richard, Thanks for the review. I think I have resolved everything, as follows: * I was able to remove the const_tiny_rtx initialization for MODE_OPAQUE. If that becomes a problem it's a pretty simple matter to use an UNSPEC to assign a constant to an opaque mode if necessa

[PATCH] Add MODE_OPAQUE

2020-11-13 Thread acsawdey--- via Gcc-patches
From: Aaron Sawdey After discussion with Richard Sandiford on IRC, he suggested adding a new mode class MODE_OPAQUE to deal with the problems (PR 96791) we had been having with POImode/PXImode in powerpc target. This patch is the accumulation of changes I needed to make to add this and make it us

Re: [PATCH, rs6000] Optimize pcrel access of globals (updated, ping)

2020-11-04 Thread acsawdey--- via Gcc-patches
From: Aaron Sawdey Ping, as it has been a while. This also includes a slight fix to make sure that all references can get optimized. This patch implements a RTL pass that looks for pc-relative loads of the address of an external variable using the PCREL_GOT relocation and a single load or store

[PATCH,rs6000] Add patterns for combine to support p10 fusion

2020-10-26 Thread acsawdey--- via Gcc-patches
From: Aaron Sawdey This patch adds the first couple of patterns to support p10 fusion. These will allow combine to create a single insn for a pair of instructions that power10 can fuse and execute. These particular ones have the requirement that only cr0 can be used when fusing a load with a co

[PATCH, rs6000] Optimize pcrel access of globals

2020-10-20 Thread acsawdey--- via Gcc-patches
From: Aaron Sawdey This patch implements a RTL pass that looks for pc-relative loads of the address of an external variable using the PCREL_GOT relocation and a single load or store that uses that external address. It then uses the PCREL_OPT relocation to convert that first load into a single pc-

[PATCH] rs6000: add option -mblock-ops-unaligned-vsx

2020-07-24 Thread acsawdey--- via Gcc-patches
From: Aaron Sawdey This option is mostly being added to provide -mno-block-ops-unaligned-vsx. The default is set the same as -mefficient-unaligned-vsx. This option will control the use of unaligned VSX loads/stores in the inline expansion of memcpy() and memmove(). The use case for this would be

Re: [PATCH][PR target/94542]Don't allow PC-relative addressing for TLS data

2020-04-13 Thread acsawdey via Gcc-patches
On 2020-04-13 10:08, will schmidt wrote: On Fri, 2020-04-10 at 18:00 -0500, acsawdey via Gcc-patches wrote: diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c index 2b6613bcb7e..c77e60a718f 100644 --- a/gcc/config/rs6000/rs6000.c +++ b/gcc/config/rs6000/rs6000.c @@ -24824,15

[PATCH][PR target/94542]Don't allow PC-relative addressing for TLS data

2020-04-10 Thread acsawdey via Gcc-patches
One of the things that address_to_insn_form() is used for is determining whether a PC-relative addressing instruction could be used. In particular predicate pcrel_external_address and function prefixed_paddi_p() both use it for this purpose. So what emerged in PR/94542 is that it should be look