https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107717

            Bug ID: 107717
           Summary: [13 Regression] ICEs expanding permutes after
                    g:dc95e1e9702f2f6367bbc108c8d01169be1b66d2
           Product: gcc
           Version: 13.0
            Status: UNCONFIRMED
          Keywords: ice-on-valid-code
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: tnfchris at gcc dot gnu.org
  Target Milestone: ---
            Target: aarch64*

After

commit dc95e1e9702f2f6367bbc108c8d01169be1b66d2 (origin/trunk, origin/master, origin/HEAD)
Author: Hongyu Wang <hongyu.w...@intel.com>
Date:   Mon Jan 17 13:01:51 2022 +0800

    Optimize VEC_PERM_EXPR with same permutation index and operation

    The sequence

         c1 = VEC_PERM_EXPR (a, a, mask)
         c2 = VEC_PERM_EXPR (b, b, mask)
         c3 = c1 op c2

    can be optimized to

         c = a op b
         c3 = VEC_PERM_EXPR (c, c, mask)

    for all integer vector operation, and float operation with
    full permutation.

    gcc/ChangeLog:

            PR target/98167
            * match.pd: New perm + vector op patterns for int and fp vector.

    gcc/testsuite/ChangeLog:

            PR target/98167
            * gcc.target/i386/pr98167.c: New test.

We see various ICEs, an example is

void
foo(int n, char *restrict out, char *restrict in)
{
  for (int i=n; i-->0; )
    {
      out[i] += in[i];
    }
}

compiled with aarch64-none-linux-gnu -O3 -march=armv8-a+sve2

The problem is that the match.pd pattern as written causes the permute to
switch from a single register permute to a two register one.

The reason is that when the folded result is expanded in SSA form

  vec_perm (op @0 @1) (op @0 @1)

the result of applying op twice results in two distinct SSA names.

This fails because expand_vec_perm_const now tries to use a two operand
expansion because there's no easy way to tell that these two operands are the
same.

If it happens early enough we can CSE the operands, but when this happens after
vec_lower it generated something the target does not support.

I tried getting expand_vec_perm_const to recognize that they are the same, but
that's quite hard.
It's best to prevent the generation of the two SSA names to begin with, or to
add an additional rule to match.pd that's able to CSE this.

I'm filing this issue because I don't know which approach upstream would
prefer, so it's easier to ask first.
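
For readers less familiar with the fold, here is a minimal scalar sketch in plain C of the identity the commit relies on. It is only a toy model, not GCC code: the lane count N and the helpers vec_perm, vec_add, permute_then_add, add_then_permute and forms_agree are hypothetical names made up for illustration, and the selector is assumed to index the concatenation of both operands modulo 2*N. It shows that the two forms agree lane by lane for integer addition. It does not reproduce the ICE, which concerns how the folded form is expanded when the two permute operands are distinct SSA names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define N 4  /* lanes per vector in this toy model (assumption) */

/* Toy model of VEC_PERM_EXPR <v0, v1, sel>: each selector value indexes
   the 2*N-element concatenation of v0 and v1, taken modulo 2*N.
   'out' must not alias the inputs.  */
void vec_perm(int32_t out[N], const int32_t v0[N],
              const int32_t v1[N], const unsigned sel[N])
{
    for (size_t i = 0; i < N; i++) {
        unsigned idx = sel[i] % (2 * N);
        out[i] = idx < N ? v0[idx] : v1[idx - N];
    }
}

/* Lane-wise integer addition, standing in for "op".  */
void vec_add(int32_t out[N], const int32_t a[N], const int32_t b[N])
{
    for (size_t i = 0; i < N; i++)
        out[i] = a[i] + b[i];
}

/* Original sequence: c1 = perm (a, a, mask); c2 = perm (b, b, mask);
   c3 = c1 op c2.  */
void permute_then_add(int32_t c3[N], const int32_t a[N],
                      const int32_t b[N], const unsigned mask[N])
{
    int32_t c1[N], c2[N];
    vec_perm(c1, a, a, mask);
    vec_perm(c2, b, b, mask);
    vec_add(c3, c1, c2);
}

/* Folded sequence: c = a op b; c3 = perm (c, c, mask).  Here c is computed
   once, so both permute operands are visibly the same value.  */
void add_then_permute(int32_t c3[N], const int32_t a[N],
                      const int32_t b[N], const unsigned mask[N])
{
    int32_t c[N];
    vec_add(c, a, b);
    vec_perm(c3, c, c, mask);
}

/* Returns 1 if both sequences produce the same lanes, 0 otherwise.  */
int forms_agree(const int32_t a[N], const int32_t b[N],
                const unsigned mask[N])
{
    int32_t x[N], y[N];
    permute_then_add(x, a, b, mask);
    add_then_permute(y, a, b, mask);
    for (size_t i = 0; i < N; i++)
        if (x[i] != y[i])
            return 0;
    return 1;
}
```

In this model add_then_permute computes c once and passes it as both permute operands, which is the single-input shape. The situation described above corresponds to computing a op b twice and passing the two results as separate operands: the values are the same, but nothing marks them as identical unless they are CSEd.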