[Bug target/117353] [15 regression] RISC-V: ICE when building libcrypt since r15-3228-g771256bcb9ddc4

2024-10-31 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117353 --- Comment #5 from Robin Dapp --- The issue is that we expand a const-vector move (using a left shift, among others) during lra, where we must not create pseudos. Likely just a missing can_create_pseudo_p somewhere.

[Bug middle-end/117173] can_vec_perm_const_p does not consider costs

2024-10-16 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117173 --- Comment #2 from Robin Dapp --- In x264, before the optimization we have: _42 = VEC_PERM_EXPR ; ... _44 = VEC_PERM_EXPR ; _45 = VEC_PERM_EXPR ; The first one (_42) is "monotonic" and can be implemented by a vmerge. This implies a load and

[Bug middle-end/117173] New: can_vec_perm_const_p does not consider costs

2024-10-16 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117173 Bug ID: 117173 Summary: can_vec_perm_const_p does not consider costs Product: gcc Version: 15.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: mid

[Bug tree-optimization/116578] vectorizer SLP transition issues / dependences

2024-10-16 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116578 Bug 116578 depends on bug 116655, which changed state. Bug 116655 Summary: RISC-V: ICE with -mrvv-max-lmul=dynamic in compute_nregs_for_mode https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116655 What|Removed |Ad

[Bug target/116655] RISC-V: ICE with -mrvv-max-lmul=dynamic in compute_nregs_for_mode

2024-10-16 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116655 Robin Dapp changed: What|Removed |Added Status|NEW |RESOLVED Resolution|---

[Bug target/116611] Inefficient mix of contiguous and load-lane vectorization due to missing permutes

2024-09-30 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116611 --- Comment #8 from Robin Dapp --- (In reply to Richard Biener from comment #7) > (In reply to Robin Dapp from comment #6) > > Hmm, the RTL follows the gimple code pretty well and those > >vect_array.27[0] = vect__2.17_71; > > become subreg-

[Bug target/116611] Inefficient mix of contiguous and load-lane vectorization due to missing permutes

2024-09-30 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116611 --- Comment #6 from Robin Dapp --- Hmm, the RTL follows the gimple code pretty well and those vect_array.27[0] = vect__2.17_71; become subreg-subreg moves. vect_array.27 is only dead after the v10 use. How should it ideally work? Could we r

[Bug target/112109] Missing riscv vectorized strcmp (and other) expanders

2024-09-18 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112109 --- Comment #6 from Robin Dapp --- Should we close this? I think all of the routines are in or are we missing something still? What's IMHO still a TODO is to honor TARGET_MAX_LMUL for some of the builtins that came first. memcpy for example a

[Bug tree-optimization/116573] [15 Regression] Recent SLP work appears to generate significantly worse code on RISC-V

2024-09-17 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116573 --- Comment #7 from Robin Dapp --- I'm testing a patch that basically does what Richi proposes. I was also playing around with mixed lane configurations where we potentially reuse the pointer increment from another pointer update. To me the co

[Bug target/116611] Inefficient mix of contiguous and load-lane vectorization due to missing permutes

2024-09-06 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116611 --- Comment #4 from Robin Dapp --- I just sent a patch to get rid of this early exit in our backend. However, with the testsuite compile options -O3 -march=rv64gcv -fno-vect-cost-model I still see MASK_LEN_LOAD_LANES.

[Bug target/116611] Inefficient mix of contiguous and load-lane vectorization due to missing permutes

2024-09-06 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116611 --- Comment #3 from Robin Dapp --- Actually we're already supposed to be handling all constant permutes. Maybe what's in the way is /* FIXME: Explicitly disable VLA interleave SLP vectorization when we may encounter ICE for poly size (1

[Bug target/116611] Inefficient mix of contiguous and load-lane vectorization due to missing permutes

2024-09-05 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116611 --- Comment #1 from Robin Dapp --- For the record, with the default -march=rv64gcv I don't see any LOAD_LANES, with -march=rv64gcv -mrvv-vector-bits=zvl I do.

[Bug target/116242] [meta-bug] Tracker for zvl issues in RISC-V

2024-08-29 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116242 Bug 116242 depends on bug 116086, which changed state. Bug 116086 Summary: RISC-V: Hash mismatch with vectorized 557.xz_r at zvl128b and LMUL=m2 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116086 What|Removed

[Bug target/116086] RISC-V: Hash mismatch with vectorized 557.xz_r at zvl128b and LMUL=m2

2024-08-29 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116086 Robin Dapp changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|---

[Bug middle-end/115495] [15 Regression] ICE in smallest_mode_for_size, at stor-layout.cc:356 during combine on RISC-V rv64gcv_zvl256b at -O3 since r15-1042-g68b0742a49d

2024-08-23 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115495 Robin Dapp changed: What|Removed |Added Resolution|--- |FIXED Status|NEW

[Bug middle-end/115495] [15 Regression] ICE in smallest_mode_for_size, at stor-layout.cc:356 during combine on RISC-V rv64gcv_zvl256b at -O3 since r15-1042-g68b0742a49d

2024-08-20 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115495 --- Comment #7 from Robin Dapp --- Ah, hmm, this doesn't seem to occur on trunk anymore for me. It's still likely latent. Patrick, does it still happen for you?

[Bug middle-end/115495] [15 Regression] ICE in smallest_mode_for_size, at stor-layout.cc:356 during combine on RISC-V rv64gcv_zvl256b at -O3 since r15-1042-g68b0742a49d

2024-08-20 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115495 Robin Dapp changed: What|Removed |Added Component|rtl-optimization|middle-end --- Comment #6 from Robin Dapp

[Bug target/116202] RISC-V: Miscompile at -O3 with zvl256b

2024-08-03 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116202 Robin Dapp changed: What|Removed |Added CC||pan2.li at intel dot com --- Comment #1 fr

[Bug target/116149] RISC-V: Miscompile at -O3 with zvl256b -mrvv-vector-bits=zvl

2024-08-01 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116149 Robin Dapp changed: What|Removed |Added Resolution|--- |FIXED Status|UNCONFIRMED

[Bug target/116149] RISC-V: Miscompile at -O3 with zvl256b -mrvv-vector-bits=zvl

2024-07-31 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116149 --- Comment #3 from Robin Dapp --- It looks like the problem is a wrong mode_idx attribute for the wx variants of the adds. The mode of the widening adds is that of the non-widened input operand, but for the wx/scalar variants this is a scalar mod

[Bug target/116149] RISC-V: Miscompile at -O3 with zvl256b -mrvv-vector-bits=zvl

2024-07-31 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116149 --- Comment #2 from Robin Dapp --- Correction, it's actually just the wx adds with a length of 1 and those should be "tu". Quite likely this only got exposed recently with the late-combine change in place.

[Bug target/116149] RISC-V: Miscompile at -O3 with zvl256b -mrvv-vector-bits=zvl

2024-07-31 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116149 --- Comment #1 from Robin Dapp --- > Still present when rvv_ta_all_1s=true is omitted. My result is '0' when rvv_ta_all_1s=false, is that what you meant? I didn't have time to check this in detail but it's not the missing else for masked loads

[Bug target/111600] [14/15 Regression] RISC-V bootstrap time regression

2024-07-31 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111600 --- Comment #37 from Robin Dapp --- > The size of the partitions is a little uneven though. Using > --with-emitinsn-partitions=48 I get some empty partitions and some bigger > than 2MB: > Another problematic file is insn-recog.cc which is 19MB

[Bug bootstrap/116146] Split insn-recog.cc

2024-07-31 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116146 --- Comment #3 from Robin Dapp --- On riscv insn-output is the largest file right now as well. I have a local patch that splits it - it's a bit cumbersome because the static initializer needs to be made non-static i.e. the initialization must b

[Bug target/116125] RISC-V: Does not fully checking for overlapping memory regions

2024-07-29 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116125 Robin Dapp changed: What|Removed |Added Known to fail||14.1.0 Status|UNCONFIRMED

[Bug target/116086] RISC-V: Hash mismatch with vectorized 557.xz_r at zvl128b and LMUL=m2

2024-07-26 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116086 --- Comment #7 from Robin Dapp --- Ok, if done right, i.e. without introducing a new bug, both the reduced case as well as the original case show the same behavior with respect to the fix. Also, xz calculates the proper hash, phew. I sent a fir

[Bug target/116086] RISC-V: Hash mismatch with vectorized 557.xz_r at zvl128b and LMUL=m2

2024-07-26 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116086 --- Comment #6 from Robin Dapp --- Ah, thanks for reducing. I didn't get much further with cvise yesterday. What were your settings for it? The reduced test case is great because it is easy to analyze and uncovers a fairly significant problem

[Bug target/116086] RISC-V: Hash mismatch with vectorized 557.xz_r at zvl128b and LMUL=m2

2024-07-25 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116086 --- Comment #4 from Robin Dapp --- Probably because I left out a crucial detail ;) It only happens starting with vlen=256 in qemu.

[Bug tree-optimization/115819] RISC-V: Failed to hoist vrsub.vx to the header of the loop

2024-07-25 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115819 Robin Dapp changed: What|Removed |Added Ever confirmed|1 |0 Status|NEW

[Bug target/116036] [14/15 only] RISCV: internal compiler error: in riscv_expand_mult_with_const_int with -march=rv64idv

2024-07-25 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116036 Robin Dapp changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Known to fail|15.0

[Bug target/116086] RISC-V: Hash mismatch with vectorized 557.xz_r at zvl128b and LMUL=m2

2024-07-25 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116086 --- Comment #2 from Robin Dapp --- Reduced a bit: typedef unsigned int uint32_t; typedef unsigned long long uint64_t; typedef struct { uint64_t length; uint64_t state[8]; uint32_t curlen; unsigned char buf[128]; } sha512_state;

[Bug target/116086] RISC-V: Hash mismatch with vectorized 557.xz_r at zvl128b and LMUL=m2

2024-07-25 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116086 --- Comment #1 from Robin Dapp --- The following reproduces the problem for me, though not very minimal yet: typedef unsigned int uint32_t; typedef unsigned long long uint64_t; typedef struct { uint64_t length; uint64_t state[8]; u

[Bug target/116086] New: RISC-V: Hash mismatch with vectorized 557.xz_r at zvl128b and LMUL=m2

2024-07-25 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116086 Bug ID: 116086 Summary: RISC-V: Hash mismatch with vectorized 557.xz_r at zvl128b and LMUL=m2 Product: gcc Version: 14.0 Status: UNCONFIRMED Keywords: wrong-co

[Bug tree-optimization/115819] RISC-V: Failed to hoist vrsub.vx to the header of the loop

2024-07-24 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115819 --- Comment #7 from Robin Dapp --- No regressions, going to commit after a while, possibly adding the previously failing test case.

[Bug tree-optimization/115819] RISC-V: Failed to hoist vrsub.vx to the header of the loop

2024-07-24 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115819 --- Comment #6 from Robin Dapp --- (In reply to JuzheZhong from comment #4) > (In reply to Andrew Pinski from comment #1) > > This might be a cost issue. > > No. I don't it's cost issue. > It's because we suppress the hoist by incorrect POLY IN

[Bug target/114665] [14] RISC-V rv64gcv: miscompile at -O3

2024-07-24 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114665 Robin Dapp changed: What|Removed |Added Last reconfirmed|2024-07-24 00:00:00 | Known to fail|14.0

[Bug target/114665] [14] RISC-V rv64gcv: miscompile at -O3

2024-07-24 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114665 Robin Dapp changed: What|Removed |Added Status|UNCONFIRMED |NEW Last reconfirmed|

[Bug target/116059] [14/15 Regression] Miscompile at -O2 since r14-6420-g85c5efcffed

2024-07-24 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116059 Robin Dapp changed: What|Removed |Added Assignee|unassigned at gcc dot gnu.org |rdapp at gcc dot gnu.org Last rec

[Bug target/116059] [14/15 Regression] Miscompile at -O2 since r14-6420-g85c5efcffed

2024-07-24 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116059 --- Comment #3 from Robin Dapp --- Glad we went for rvv_ma_all_1s=true because otherwise this one would have gone unnoticed :) The -fsigned-char -fno-strict-aliasing -fwrapv look unnecessary. I see the problem without them as well, just the ou

[Bug target/116036] [14/15] RISCV: internal compiler error: in riscv_expand_mult_with_const_int with -march=rv64idv

2024-07-23 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116036 --- Comment #2 from Robin Dapp --- Begrudgingly confirming :) Still need to figure out where to best error out for that combination. If we do it at the assertion spot the message will be output as many times as we try vector modes (like 8 or s

[Bug target/115995] RISC-V: Can't generate portable RVV code for rv64gcv_zvl512b

2024-07-23 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115995 --- Comment #2 from Robin Dapp --- Hmm I can't reproduce either. riscv64-unknown-linux-gnu-gcc -march=rv64gcv_zvl512b1p0 -mabi=lp64d -O2 990128-1.c QEMU_CPU=rv64,v=true,xventanacondops=true,x-zvfh=true,zfh=true,zba=true,zbb=true,zbc=true,zicond

[Bug target/115725] RISC-V: Use wrong AVL for rv64gcv_zfh_zvl512b

2024-07-05 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115725 Robin Dapp changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|---

[Bug target/115336] [15] rv64gcv_zvl256b miscompile at -O3

2024-07-03 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115336 --- Comment #3 from Robin Dapp --- Follow-up on this one: My workaround of emitting a vmv.v.i v[0-9],0 before any (potentially) offending masked load is not going to work universally. That's because in several instances we make use of the fact

[Bug target/115725] RISC-V: Use wrong AVL for rv64gcv_zfh_zvl512b

2024-07-02 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115725 --- Comment #14 from Robin Dapp --- Thanks Kito. In addition, I asked Daniel to have a look into the vmv.s.x policy handling. From what I saw it is special in that it currently always uses undisturbed and doesn't observe the specified policy.

[Bug target/115725] RISC-V: Use wrong AVL for rv64gcv_zfh_zvl512b

2024-07-01 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115725 --- Comment #11 from Robin Dapp --- > I believe it is VSETVL PASS doing the fusion, fuse all "vsetvl" according > their > demand field into a single "vsetvli" and put them since beginning. Yes, and the vsetvl fusion is very useful here.

[Bug target/115725] RISC-V: Use wrong AVL for rv64gcv_zfh_zvl512b

2024-07-01 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115725 --- Comment #9 from Robin Dapp --- We already merge with operand[0], just the TU is missing as far as I can tell. I'm seeing the following output with my patch: vsetivli zero,8,e16,mf4,tu,ma vle16.v v1,0(a1) vmv.

[Bug target/115725] RISC-V: Use wrong AVL for rv64gcv_zfh_zvl512b

2024-07-01 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115725 --- Comment #7 from Robin Dapp --- I checked. It looks like qemu indeed always implicitly uses TU for vmv.s.x regardless of the actual setting. This behavior masks the bug here.

[Bug target/115725] RISC-V: Use wrong AVL for rv64gcv_zfh_zvl512b

2024-07-01 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115725 --- Comment #5 from Robin Dapp --- > zvl128b => GOOD. > vec_set_vnx8hi_0: > vl1re16.v v1,0(a1) > vsetivli zero,1,e16,m1,ta,ma > vmv.s.x v1,a2 > vs1r.v v1,0(a0) // Only store 1 element as source code

[Bug target/115725] RISC-V: Use wrong AVL for rv64gcv_zfh_zvl512b

2024-07-01 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115725 --- Comment #4 from Robin Dapp --- Sorry, just got back from the RISC-V summit. IMHO, yes, it should be TU. We have the same thing for the not-element-0 case. I wonder why it doesn't fail with spike or qemu. Probably qemu doesn't do anything

[Bug tree-optimization/100756] [12 Regression] vect: Superfluous epilog created on s390x

2024-06-20 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100756 --- Comment #11 from Robin Dapp --- Just noticed this is still open due to the retargeting message. IMHO this can be closed. I'm pretty sure I erroneously used the GCC 12 target when opening the bug when it should have been trunk/GCC 13. I sup

[Bug rtl-optimization/115495] [15 Regression] ICE in smallest_mode_for_size, at stor-layout.cc:356 during combine on RISC-V rv64gcv_zvl256b at -O3

2024-06-19 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115495 --- Comment #3 from Robin Dapp --- At first it looked very weird that we need 50 (or so) instructions to expand ;; MEM [(short int *)&a] = vect_cst__21; but then I realized that all the hoops we jump through are due to possible misalignment.

[Bug tree-optimization/115382] Wrong code with in-order conditional reduction and masked loops

2024-06-12 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115382 --- Comment #7 from Robin Dapp --- Ah yes, I'm going to push the patch to 14 still.

[Bug target/115439] [15 Regression] ICEs after r15-638 on master-thumb_m55_hard_eabi

2024-06-11 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115439 --- Comment #6 from Robin Dapp --- Looks reasonable. That's what we were previously doing in internal-fn.cc before expanding (except operands[2]). Are you going to post a patch?

[Bug target/115439] [15 Regression] ICEs after r15-638 on master-thumb_m55_hard_eabi

2024-06-11 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115439 Robin Dapp changed: What|Removed |Added CC||rdapp at gcc dot gnu.org --- Comment #2 fr

[Bug tree-optimization/115382] Wrong code with in-order conditional reduction and masked loops

2024-06-09 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115382 --- Comment #3 from Robin Dapp --- For the record - the hunk before bootstrapped and regtested on the cfarm machines and tested successfully on aarch64 qemu with sve. I still need to set up a regtest environment with SME.

[Bug tree-optimization/115382] Wrong code with in-order conditional reduction and masked loops

2024-06-07 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115382 --- Comment #1 from Robin Dapp --- Would something like this work? The testcase ran successfully with Intel's SME with that change (and aarch64 qemu with SVE). diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc index 028692614bb..f9bf6

[Bug target/115336] [15] rv64gcv_zvl256b miscompile at -O3

2024-06-04 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115336 --- Comment #2 from Robin Dapp --- It looks to me as if we're expecting the result of a gather_load to be zero when it's masked out (semantics of mask_gather_load) but for mask_len_gather_load we actually describe it as undefined. Here the mask

[Bug tree-optimization/115340] New: Loop/SLP vectorization possible inefficiency

2024-06-04 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115340 Bug ID: 115340 Summary: Loop/SLP vectorization possible inefficiency Product: gcc Version: 15.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tre

[Bug tree-optimization/113281] [11/12/13 Regression] Latent wrong code due to vectorization of shift reduction and missing promotions since r9-1590

2024-05-31 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113281 --- Comment #29 from Robin Dapp --- Just to document again: The test case should not be vectorized and at some point we will adjust the cost model so it is not going to be. I'd prefer to base that decision on real uarchs rather than adjust the

[Bug c/115104] RISC-V: GCC-14 can combine vsext+vadd -> vwadd but Trunk GCC (GCC 15) Failed

2024-05-15 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115104 --- Comment #2 from Robin Dapp --- Thanks, I was just about to open a PR.

[Bug tree-optimization/113583] Main loop in 519.lbm not vectorized.

2024-05-13 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113583 --- Comment #18 from Robin Dapp --- A bit of a follow-up: I'm working on a patch for reassociation that can handle the mentioned cases and some more but it will still require a bit of time to get everything regression free and correct. What it

[Bug middle-end/114196] [13 Regression] Fixed length vector ICE: in vect_peel_nonlinear_iv_init, at tree-vect-loop.cc:9454

2024-05-12 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114196 --- Comment #7 from Robin Dapp --- I can barely build a compiler on gcc185 due to disk space. I'm going to set up a cross toolchain (that I need for other purposes as well) in order to test.

[Bug target/114734] [14] RISC-V rv64gcv_zvl256b miscompile with -flto -O3 -mrvv-vector-bits=zvl

2024-04-25 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114734 --- Comment #10 from Robin Dapp --- Yes it helps. Great that get_gimple_for_ssa_name is right below get_rtx_for_ssa_name that I stepped through several times while debugging and I didn't realize the connection, g. But thanks! Good thing i

[Bug target/114734] [14] RISC-V rv64gcv_zvl256b miscompile with -flto -O3 -mrvv-vector-bits=zvl

2024-04-25 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114734 --- Comment #8 from Robin Dapp --- Created attachment 58037 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=58037&action=edit Expand dump Dump attached. Insn 209 is the problematic one. The change from _911 to 1078 happens in internal-f

[Bug target/114734] [14] RISC-V rv64gcv_zvl256b miscompile with -flto -O3 -mrvv-vector-bits=zvl

2024-04-24 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114734 Robin Dapp changed: What|Removed |Added CC||rguenth at gcc dot gnu.org,

[Bug target/114734] [14] RISC-V rv64gcv_zvl256b miscompile with -flto -O3 -mrvv-vector-bits=zvl

2024-04-22 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114734 --- Comment #5 from Robin Dapp --- What happens is that code sinking does: Sinking # VUSE <.MEM_1235> vect__173.251_1238 = .MASK_LEN_LOAD (_911, 32B, { -1, -1, -1, -1 }, loop_len_1064, 0); from bb 3 to bb 4 so we have vect__173.251_1238 = .M

[Bug target/114714] [RISC-V][RVV] ICE: insn does not satisfy its constraints (postreload)

2024-04-22 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114714 Robin Dapp changed: What|Removed |Added CC||rdapp at gcc dot gnu.org --- Comment #5 fr

[Bug target/114734] [14] RISC-V rv64gcv_zvl256b miscompile with -flto -O3 -mrvv-vector-bits=zvl

2024-04-16 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114734 --- Comment #4 from Robin Dapp --- Ok, it looks like we do 5 iterations with the last one being length-masked to length 2 and then in the "live extraction" phase use "iteration 6".

[Bug target/114734] [14] RISC-V rv64gcv_zvl256b miscompile with -flto -O3 -mrvv-vector-bits=zvl

2024-04-16 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114734 --- Comment #3 from Robin Dapp --- > probably -fwhole-program is enough, -flto not needed(?) Yes, -fwhole-program is sufficient. > > # vectp_g.248_1401 = PHI > ... > _1411 = .SELECT_VL (ivtmp_1409, POLY_INT_CST [2, 2]); > .. > vect__19

[Bug target/114734] [14] RISC-V rv64gcv_zvl256b miscompile with -flto -O3 -mrvv-vector-bits=zvl

2024-04-16 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114734 --- Comment #1 from Robin Dapp --- Confirmed.

[Bug middle-end/114733] [14] Miscompile with -march=rv64gcv -O3 on riscv

2024-04-16 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114733 --- Comment #1 from Robin Dapp --- Confirmed, also shows up here.

[Bug target/114665] [14] RISC-V rv64gcv: miscompile at -O3

2024-04-15 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114665 --- Comment #5 from Robin Dapp --- Weird, I tried your exact qemu version and still can't reproduce the problem. My results are always FFB5. Binutils difference? Very unlikely. Could you post your QEMU_CPU settings just to be sure?

[Bug target/114668] [14] RISC-V rv64gcv: miscompile at -O3

2024-04-15 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114668 Robin Dapp changed: What|Removed |Added Resolution|--- |FIXED Status|UNCONFIRMED

[Bug target/114686] Feature request: Dynamic LMUL should be the default for the RISC-V Vector extension

2024-04-15 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114686 --- Comment #3 from Robin Dapp --- I think we have always maintained that this can definitely be a per-uarch default but shouldn't be a generic default. > I don't see any reason why this wouldn't be the case for the vast majority of > implement

[Bug target/114668] [14] RISC-V rv64gcv: miscompile at -O3

2024-04-10 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114668 --- Comment #2 from Robin Dapp --- This, again, seems to be a problem with bit extraction from masks. For some reason I didn't add the VLS modes to the corresponding vec_extract patterns. With those in place the problem is gone because we go th

[Bug target/114665] [14] RISC-V rv64gcv: miscompile at -O3

2024-04-10 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114665 --- Comment #2 from Robin Dapp --- Checked with the latest commit on a different machine but still cannot reproduce the error. PR114668 I can reproduce. Maybe a copy and paste problem?

[Bug target/114665] [14] RISC-V rv64gcv: miscompile at -O3

2024-04-10 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114665 --- Comment #1 from Robin Dapp --- Hmm, my local version is a bit older and seems to give the same result for both -O2 and -O3. At least a good starting point for bisection then.

[Bug ipa/114247] RISC-V: miscompile at -O3 and IPA SRA

2024-04-04 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114247 --- Comment #6 from Robin Dapp --- Testsuite looks unchanged on rv64gcv.

[Bug ipa/114247] RISC-V: miscompile at -O3 and IPA SRA

2024-04-04 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114247 --- Comment #5 from Robin Dapp --- This fixes the test case for me locally, thanks. I can run the testsuite with it later if you'd like.

[Bug tree-optimization/114476] [13/14 Regression] wrong code with -fwrapv -O3 -fno-vect-cost-model (and -march=armv9-a+sve2 on aarch64 and -march=rv64gcv on riscv)

2024-04-03 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114476 --- Comment #8 from Robin Dapp --- I tried some things (for the related bug without -fwrapv) then got busy with some other things. I'm going to have another look later this week.

[Bug rtl-optimization/114515] [14 Regression] Failure to use aarch64 lane forms after PR101523

2024-04-02 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114515 Robin Dapp changed: What|Removed |Added CC||ewlu at rivosinc dot com,

[Bug tree-optimization/114485] [13/14 Regression] Wrong code with -O3 -march=rv64gcv on riscv or `-O3 -march=armv9-a` for aarch64

2024-03-27 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114485 --- Comment #4 from Robin Dapp --- Yes, the vectorization looks ok. The extracted live values are not used afterwards and therefore the whole vectorized loop is being thrown away. Then we do one iteration of the epilogue loop, inverting the ori

[Bug tree-optimization/114476] [13/14 Regression] wrong code with -fwrapv -O3 -fno-vect-cost-model (and -march=armv9-a+sve2 on aarch64 and -march=rv64gcv on riscv)

2024-03-26 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114476 --- Comment #5 from Robin Dapp --- So the result is -9 instead of 9 (or vice versa) and this happens (just) with vectorization. We only vectorize with -fwrapv. From a first quick look, the following is what we have before vect: (loop) [lo

[Bug tree-optimization/114396] [14 Regression] Vector: Runtime mismatch at -O2 with -fwrapv

2024-03-19 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114396 --- Comment #8 from Robin Dapp --- No fallout on x86 or aarch64. Of course using false instead of TYPE_SIGN (utype) is also possible and maybe clearer?

[Bug tree-optimization/114396] [14 Regression] Vector: Runtime mismatch at -O2 with -fwrapv

2024-03-19 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114396 --- Comment #7 from Robin Dapp --- diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc index 4375ebdcb49..f8f7ba0ccc1 100644 --- a/gcc/tree-vect-loop.cc +++ b/gcc/tree-vect-loop.cc @@ -9454,7 +9454,7 @@ vect_peel_nonlinear_iv_init (gimple

[Bug target/114396] [14] RISC-V rv64gcv vector: Runtime mismatch at -O3 with -fwrapv

2024-03-19 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114396 --- Comment #3 from Robin Dapp --- -O3 -mavx2 -fno-vect-cost-model -fwrapv seems to be sufficient.

[Bug target/114396] [14] RISC-V rv64gcv vector: Runtime mismatch at -O3 with -fwrapv

2024-03-19 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114396 Robin Dapp changed: What|Removed |Added Target|riscv*-*-* |x86_64-*-* riscv*-*-* --- Comment #2 from

[Bug target/112548] [14 regression] 5% exec time regression in 429.mcf on AMD zen4 CPU (since r14-5076-g01c18f58d37865)

2024-03-15 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112548 --- Comment #29 from Robin Dapp --- Yes, that also appears to work here. There was no lto involved this time? Now we need to figure out what's different with SPEC.

[Bug target/112548] [14 regression] 5% exec time regression in 429.mcf on AMD zen4 CPU (since r14-5076-g01c18f58d37865)

2024-03-15 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112548 --- Comment #27 from Robin Dapp --- Can you try it with a simpler (non SPEC) test? Maybe there is still something weird happening with SPEC's scripting.

[Bug target/112548] [14 regression] 5% exec time regression in 429.mcf on AMD zen4 CPU (since r14-5076-g01c18f58d37865)

2024-03-14 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112548 --- Comment #24 from Robin Dapp --- I rebuilt GCC from scratch with your options but still have the same problem. Could our sources differ? My SPEC version might not be the most recent but I'm not aware that mcf changed at some point. Just to

[Bug target/112548] [14 regression] 5% exec time regression in 429.mcf on AMD zen4 CPU (since r14-5076-g01c18f58d37865)

2024-03-14 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112548 --- Comment #22 from Robin Dapp --- Still the same problem unfortunately. I'm a bit out of ideas - maybe your compiler executables could help?

[Bug target/112548] [14 regression] 5% exec time regression in 429.mcf on AMD zen4 CPU (since r14-5076-g01c18f58d37865)

2024-03-14 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112548 --- Comment #20 from Robin Dapp --- No change with -std=gnu99 unfortunately.

[Bug target/112548] [14 regression] 5% exec time regression in 429.mcf on AMD zen4 CPU (since r14-5076-g01c18f58d37865)

2024-03-14 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112548 --- Comment #18 from Robin Dapp --- Hmm, doesn't help unfortunately. A full command line for me looks like: x86_64-pc-linux-gnu-gcc -c -o pbeampp.o -DSPEC_CPU -DNDEBUG -DWANT_STDC_PROTO -Ofast -march=znver4 -mtune=znver4 -flto=32 -g -fprofil

[Bug target/112548] [14 regression] 5% exec time regression in 429.mcf on AMD zen4 CPU (since r14-5076-g01c18f58d37865)

2024-03-14 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112548 --- Comment #16 from Robin Dapp --- Thank you! I'm having a problem with the data, though. Compiling with -Ofast -march=znver4 -mtune=znver4 -flto -fprofile-use=/tmp. Would you mind showing your exact final options for compilation of e.g. pbeam

[Bug target/112548] [14 regression] 5% exec time regression in 429.mcf on AMD zen4 CPU (since r14-5076-g01c18f58d37865)

2024-03-13 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112548 --- Comment #10 from Robin Dapp --- (In reply to Sam James from comment #9) > (In reply to Filip Kastl from comment #8) > > I'd like to help but I'm afraid I cannot send you the SPEC binaries with PGO > > applied since SPEC is licensed nor can I

[Bug target/112548] [14 regression] 5% exec time regression in 429.mcf on AMD zen4 CPU (since r14-5076-g01c18f58d37865)

2024-03-08 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112548 --- Comment #7 from Robin Dapp --- I built executables with and without the commit (-Ofast -march=znver4 -flto). There is no difference so it must really be something that happens with PGO. I'd really need access to a zen4 box or the pgo execut

[Bug target/114202] [14] RISC-V rv64gcv: miscompile at -O3

2024-03-06 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114202 Robin Dapp changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|---

[Bug target/114200] [14] RISC-V fixed-length vector miscompile at -O3

2024-03-06 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114200 --- Comment #3 from Robin Dapp --- *** Bug 114202 has been marked as a duplicate of this bug. ***

[Bug middle-end/114196] [13/14 Regression] Fixed length vector ICE: in vect_peel_nonlinear_iv_init, at tree-vect-loop.cc:9454

2024-03-06 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114196 Robin Dapp changed: What|Removed |Added See Also||https://gcc.gnu.org/bugzill

[Bug target/114200] [14] RISC-V fixed-length vector miscompile at -O3

2024-03-06 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114200 --- Comment #1 from Robin Dapp --- Took me a while to analyze this... needed more time than I'd like to admit to make sense of the somewhat weird code created by fully unrolling and peeling. I believe the problem is that we reload the output re
