https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117353
--- Comment #5 from Robin Dapp ---
The issue is that we expand a const-vector move (using a left shift, among
others) during lra. The expansion creates pseudos, which we must not do at
that point. Likely just a missing can_create_pseudo_p check somewhere.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117173
--- Comment #2 from Robin Dapp ---
In x264, before the optimization we have:
_42 = VEC_PERM_EXPR ;
...
_44 = VEC_PERM_EXPR ;
_45 = VEC_PERM_EXPR ;
The first one (_42) is "monotonic" and can be implemented by a vmerge. This
implies a load and
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117173
Bug ID: 117173
Summary: can_vec_perm_const_p does not consider costs
Product: gcc
Version: 15.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: mid
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116578
Bug 116578 depends on bug 116655, which changed state.
Bug 116655 Summary: RISC-V: ICE with -mrvv-max-lmul=dynamic in
compute_nregs_for_mode
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116655
What|Removed |Ad
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116655
Robin Dapp changed:
What|Removed |Added
Status|NEW |RESOLVED
Resolution|---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116611
--- Comment #8 from Robin Dapp ---
(In reply to Richard Biener from comment #7)
> (In reply to Robin Dapp from comment #6)
> > Hmm, the RTL follows the gimple code pretty well and those
> >vect_array.27[0] = vect__2.17_71;
> > become subreg-
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116611
--- Comment #6 from Robin Dapp ---
Hmm, the RTL follows the gimple code pretty well and those
vect_array.27[0] = vect__2.17_71;
become subreg-subreg moves.
vect_array.27 is only dead after the v10 use.
How should it ideally work? Could we r
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112109
--- Comment #6 from Robin Dapp ---
Should we close this? I think all of the routines are in or are we missing
something still?
What's IMHO still a TODO is to honor TARGET_MAX_LMUL for some of the builtins
that came first. memcpy for example a
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116573
--- Comment #7 from Robin Dapp ---
I'm testing a patch that basically does what Richi proposes.
I was also playing around with mixed lane configurations where we potentially
reuse the pointer increment from another pointer update. To me the co
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116611
--- Comment #4 from Robin Dapp ---
I just sent a patch to get rid of this early exit in our backend.
However, with the testsuite compile options
-O3 -march=rv64gcv -fno-vect-cost-model
I still see MASK_LEN_LOAD_LANES.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116611
--- Comment #3 from Robin Dapp ---
Actually we're already supposed to be handling all constant permutes.
Maybe what's in the way is
/* FIXME: Explicitly disable VLA interleave SLP vectorization when we
may encounter ICE for poly size (1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116611
--- Comment #1 from Robin Dapp ---
For the record, with the default -march=rv64gcv I don't see any LOAD_LANES,
with -march=rv64gcv -mrvv-vector-bits=zvl I do.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116242
Bug 116242 depends on bug 116086, which changed state.
Bug 116086 Summary: RISC-V: Hash mismatch with vectorized 557.xz_r at zvl128b
and LMUL=m2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116086
What|Removed
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116086
Robin Dapp changed:
What|Removed |Added
Status|UNCONFIRMED |RESOLVED
Resolution|---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115495
Robin Dapp changed:
What|Removed |Added
Resolution|--- |FIXED
Status|NEW
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115495
--- Comment #7 from Robin Dapp ---
Ah, hmm, this doesn't seem to occur on trunk anymore for me. It's still likely
latent. Patrick, does it still happen for you?
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115495
Robin Dapp changed:
What|Removed |Added
Component|rtl-optimization|middle-end
--- Comment #6 from Robin Dapp
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116202
Robin Dapp changed:
What|Removed |Added
CC||pan2.li at intel dot com
--- Comment #1 fr
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116149
Robin Dapp changed:
What|Removed |Added
Resolution|--- |FIXED
Status|UNCONFIRMED
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116149
--- Comment #3 from Robin Dapp ---
It looks like the problem is a wrong mode_idx attribute for the wx variants of
the adds. The widening adds' mode is the one of the non-widened input operand
but for the wx/scalar variants this is a scalar mod
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116149
--- Comment #2 from Robin Dapp ---
Correction, it's actually just the wx adds with a length of 1 and those should
be "tu". Quite likely this only got exposed recently with the late-combine
change in place.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116149
--- Comment #1 from Robin Dapp ---
> Still present when rvv_ta_all_1s=true is omitted.
My result is '0' when rvv_ta_all_1s=false, is that what you meant?
I didn't have time to check this in detail but it's not the missing else for
masked loads
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111600
--- Comment #37 from Robin Dapp ---
> The size of the partitions is a little uneven though. Using
> --with-emitinsn-partitions=48 I get some empty partitions and some bigger
> than 2MB:
> Another problematic file is insn-recog.cc which is 19MB
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116146
--- Comment #3 from Robin Dapp ---
On riscv insn-output is the largest file right now as well. I have a local
patch that splits it - it's a bit cumbersome because the static initializer
needs to be made non-static i.e. the initialization must b
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116125
Robin Dapp changed:
What|Removed |Added
Known to fail||14.1.0
Status|UNCONFIRMED
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116086
--- Comment #7 from Robin Dapp ---
Ok, if done right, i.e. without introducing a new bug, both the reduced case as
well as the original case show the same behavior with respect to the fix.
Also, xz calculates the proper hash, phew.
I sent a fir
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116086
--- Comment #6 from Robin Dapp ---
Ah, thanks for reducing. I didn't get much further with cvise yesterday. What
were your settings for it?
The reduced test case is great because it is easy to analyze and uncovers a
fairly significant problem
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116086
--- Comment #4 from Robin Dapp ---
Probably because I left out a crucial detail ;)
It only happens starting with vlen=256 in qemu.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115819
Robin Dapp changed:
What|Removed |Added
Ever confirmed|1 |0
Status|NEW
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116036
Robin Dapp changed:
What|Removed |Added
Status|UNCONFIRMED |RESOLVED
Known to fail|15.0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116086
--- Comment #2 from Robin Dapp ---
Reduced a bit:
typedef unsigned int uint32_t;
typedef unsigned long long uint64_t;
typedef struct
{
uint64_t length;
uint64_t state[8];
uint32_t curlen;
unsigned char buf[128];
} sha512_state;
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116086
--- Comment #1 from Robin Dapp ---
The following reproduces the problem for me, though not very minimal yet:
typedef unsigned int uint32_t;
typedef unsigned long long uint64_t;
typedef struct
{
uint64_t length;
uint64_t state[8];
u
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116086
Bug ID: 116086
Summary: RISC-V: Hash mismatch with vectorized 557.xz_r at
zvl128b and LMUL=m2
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Keywords: wrong-co
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115819
--- Comment #7 from Robin Dapp ---
No regressions, going to commit after a while, possibly adding the previously
failing test case.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115819
--- Comment #6 from Robin Dapp ---
(In reply to JuzheZhong from comment #4)
> (In reply to Andrew Pinski from comment #1)
> > This might be a cost issue.
>
> No. I don't it's cost issue.
> It's because we suppress the hoist by incorrect POLY IN
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114665
Robin Dapp changed:
What|Removed |Added
Last reconfirmed|2024-07-24 00:00:00 |
Known to fail|14.0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114665
Robin Dapp changed:
What|Removed |Added
Status|UNCONFIRMED |NEW
Last reconfirmed|
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116059
Robin Dapp changed:
What|Removed |Added
Assignee|unassigned at gcc dot gnu.org |rdapp at gcc dot gnu.org
Last rec
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116059
--- Comment #3 from Robin Dapp ---
Glad we went for rvv_ma_all_1s=true because otherwise this one would have gone
unnoticed :) The -fsigned-char -fno-strict-aliasing -fwrapv look unnecessary.
I see the problem without them as well, just the ou
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116036
--- Comment #2 from Robin Dapp ---
Begrudgingly confirming :)
Still need to figure out where to best error out for that combination. If we
do it at the assertion spot the message will be output as many times as we try
vector modes (like 8 or s
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115995
--- Comment #2 from Robin Dapp ---
Hmm I can't reproduce either.
riscv64-unknown-linux-gnu-gcc -march=rv64gcv_zvl512b1p0 -mabi=lp64d -O2
990128-1.c
QEMU_CPU=rv64,v=true,xventanacondops=true,x-zvfh=true,zfh=true,zba=true,zbb=true,zbc=true,zicond
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115725
Robin Dapp changed:
What|Removed |Added
Status|UNCONFIRMED |RESOLVED
Resolution|---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115336
--- Comment #3 from Robin Dapp ---
Follow-up on this one: My workaround of emitting a vmv.v.i v[0-9],0 before any
(potentially) offending masked load is not going to work universally.
That's because on several instances we make use of the fact
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115725
--- Comment #14 from Robin Dapp ---
Thanks Kito. In addition, I asked Daniel to have a look into the vmv.s.x
policy handling. From what I saw it is special in that it currently always
uses undisturbed and doesn't observe the specified policy.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115725
--- Comment #11 from Robin Dapp ---
> I believe it is VSETVL PASS doing the fusion, fuse all "vsetvl" according
> their
> demand field into a single "vsetvli" and put them since beginning.
Yes, and the vsetvl fusion is very useful here.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115725
--- Comment #9 from Robin Dapp ---
We already merge with operand[0], just the TU is missing as far as I can tell.
I'm seeing the following output with my patch:
vsetivli zero,8,e16,mf4,tu,ma
vle16.v v1,0(a1)
vmv.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115725
--- Comment #7 from Robin Dapp ---
I checked. It looks like qemu indeed always implicitly uses TU for vmv.s.x
regardless of the actual setting. This behavior masks the bug here.
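To illustrate why an emulator that always behaves as TU can hide a missing "tu" in the generated vsetvli, here is a minimal toy model in C. It is only a sketch: the names model_vmv_s_x and tail_policy are hypothetical, the register is modeled as a plain array, vl is assumed to be at least 1, and the tail-agnostic case is modeled as overwriting the tail with all ones, which is one of the behaviors the RVV specification permits (an implementation may equally well leave the tail unchanged).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Tail policy as selected by vsetvli: undisturbed (tu) or agnostic (ta).  */
enum tail_policy { TAIL_UNDISTURBED, TAIL_AGNOSTIC };

/* Toy model of vmv.s.x on a register of N 16-bit elements.  Element 0
   receives the scalar; all other elements are tail elements.  Under
   TAIL_AGNOSTIC this model overwrites the tail with all ones (a permitted
   but not mandatory behavior); under TAIL_UNDISTURBED the tail keeps its
   previous contents.  */
static void
model_vmv_s_x (uint16_t *vd, size_t n, uint16_t scalar,
               enum tail_policy policy)
{
  assert (n > 0);
  vd[0] = scalar;
  if (policy == TAIL_AGNOSTIC)
    for (size_t i = 1; i < n; i++)
      vd[i] = UINT16_MAX;
}
```

With this model, code that relies on elements 1..N-1 surviving the vmv.s.x only works under TAIL_UNDISTURBED. If the emulator treats every vmv.s.x as TU, a wrongly emitted "ta" produces the same result as "tu" and the bug stays invisible.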
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115725
--- Comment #5 from Robin Dapp ---
> zvl128b => GOOD.
> vec_set_vnx8hi_0:
> vl1re16.v v1,0(a1)
> vsetivli zero,1,e16,m1,ta,ma
> vmv.s.x v1,a2
> vs1r.v v1,0(a0) // Only store 1 element as source code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115725
--- Comment #4 from Robin Dapp ---
Sorry, just got back from the RISC-V summit.
IMHO, yes, it should be TU. We have the same thing for the not-element-0 case.
I wonder why it doesn't fail with spike or qemu. Probably qemu doesn't do
anything
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100756
--- Comment #11 from Robin Dapp ---
Just noticed this is still open due to the retargeting message.
IMHO this can be closed. I'm pretty sure I erroneously used the GCC 12 target
when opening the bug when it should have been trunk/GCC 13.
I sup
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115495
--- Comment #3 from Robin Dapp ---
At first it looked very weird that we need 50 (or so) instructions to expand
;; MEM [(short int *)&a] = vect_cst__21;
but then I realized that all the hoops we jump through are due to possible
misalignment.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115382
--- Comment #7 from Robin Dapp ---
Ah yes, I'm going to push the patch to 14 still.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115439
--- Comment #6 from Robin Dapp ---
Looks reasonable. That's what we were doing before in internal-fn.cc before
expanding (except operands[2]).
Are you going to post a patch?
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115439
Robin Dapp changed:
What|Removed |Added
CC||rdapp at gcc dot gnu.org
--- Comment #2 fr
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115382
--- Comment #3 from Robin Dapp ---
For the record - the hunk before bootstrapped and regtested on the cfarm
machines and tested successfully on aarch64 qemu with sve. I still need to set
up a regtest environment with SME.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115382
--- Comment #1 from Robin Dapp ---
Would something like this work? The testcase ran successfully with Intel's SME
with that change (and aarch64 qemu with SVE).
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 028692614bb..f9bf6
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115336
--- Comment #2 from Robin Dapp ---
It looks to me as if we're expecting the result of a gather_load to be zero
when it's masked out (semantics of mask_gather_load) but for
mask_len_gather_load we actually describe it as undefined. Here the mask
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115340
Bug ID: 115340
Summary: Loop/SLP vectorization possible inefficiency
Product: gcc
Version: 15.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tre
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113281
--- Comment #29 from Robin Dapp ---
Just to document again: The test case should not be vectorized and at some
point we will adjust the cost model so it is not going to be. I'd prefer to
base that decision on real uarchs rather than adjust the
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115104
--- Comment #2 from Robin Dapp ---
Thanks, I was just about to open a PR.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113583
--- Comment #18 from Robin Dapp ---
A bit of a follow-up: I'm working on a patch for reassociation that can handle
the mentioned cases and some more but it will still require a bit of time to
get everything regression free and correct. What it
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114196
--- Comment #7 from Robin Dapp ---
I can barely build a compiler on gcc185 due to disk space. I'm going to set up
a cross toolchain (that I need for other purposes as well) in order to test.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114734
--- Comment #10 from Robin Dapp ---
Yes it helps. Great that get_gimple_for_ssa_name is right below
get_rtx_for_ssa_name that I stepped through several times while debugging and I
didn't realize the connection, g.
But thanks! Good thing i
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114734
--- Comment #8 from Robin Dapp ---
Created attachment 58037
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=58037&action=edit
Expand dump
Dump attached. Insn 209 is the problematic one.
The changing from _911 to 1078 happens in internal-f
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114734
Robin Dapp changed:
What|Removed |Added
CC||rguenth at gcc dot gnu.org,
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114734
--- Comment #5 from Robin Dapp ---
What happens is that code sinking does:
Sinking # VUSE <.MEM_1235>
vect__173.251_1238 = .MASK_LEN_LOAD (_911, 32B, { -1, -1, -1, -1 },
loop_len_1064, 0);
from bb 3 to bb 4
so we have
vect__173.251_1238 = .M
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114714
Robin Dapp changed:
What|Removed |Added
CC||rdapp at gcc dot gnu.org
--- Comment #5 fr
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114734
--- Comment #4 from Robin Dapp ---
Ok, it looks like we do 5 iterations with the last one being length-masked to
length 2 and then in the "live extraction" phase use "iteration 6".
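The iteration counts above can be reproduced with a small C sketch. The numbers are assumptions chosen to match the description (18 scalar elements, 4 lanes per vector), and the helper names are hypothetical. The sketch also assumes that each iteration processes min(remaining, lanes) elements, which is only one possible length distribution for a length-controlled loop.

```c
#include <assert.h>
#include <stddef.h>

/* Number of vector iterations needed for N elements with VF lanes.  */
static size_t
num_iterations (size_t n, size_t vf)
{
  assert (vf > 0);
  return (n + vf - 1) / vf;
}

/* Active length of 0-based iteration ITER, assuming every iteration
   handles min (remaining, VF) elements.  Returns 0 for an iteration
   that lies past the end of the loop.  */
static size_t
iter_length (size_t n, size_t vf, size_t iter)
{
  assert (vf > 0);
  size_t done = iter * vf;
  if (done >= n)
    return 0;
  size_t remaining = n - done;
  return remaining < vf ? remaining : vf;
}
```

Under these assumptions num_iterations (18, 4) is 5 and the fifth iteration (index 4) has length 2, so a live value has to come from lane 1 of that iteration. Index 5, the "iteration 6" mentioned above, has length 0 and holds no valid data.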
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114734
--- Comment #3 from Robin Dapp ---
> probably -fwhole-program is enough, -flto not needed(?)
Yes, -fwhole-program is sufficient.
>
> # vectp_g.248_1401 = PHI
> ...
> _1411 = .SELECT_VL (ivtmp_1409, POLY_INT_CST [2, 2]);
> ..
> vect__19
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114734
--- Comment #1 from Robin Dapp ---
Confirmed.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114733
--- Comment #1 from Robin Dapp ---
Confirmed, also shows up here.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114665
--- Comment #5 from Robin Dapp ---
Weird, I tried your exact qemu version and still can't reproduce the problem.
My results are always FFB5.
Binutils difference? Very unlikely. Could you post your QEMU_CPU settings
just to be sure?
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114668
Robin Dapp changed:
What|Removed |Added
Resolution|--- |FIXED
Status|UNCONFIRMED
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114686
--- Comment #3 from Robin Dapp ---
I think we have always maintained that this can definitely be a per-uarch
default but shouldn't be a generic default.
> I don't see any reason why this wouldn't be the case for the vast majority of
> implement
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114668
--- Comment #2 from Robin Dapp ---
This, again, seems to be a problem with bit extraction from masks.
For some reason I didn't add the VLS modes to the corresponding vec_extract
patterns. With those in place the problem is gone because we go th
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114665
--- Comment #2 from Robin Dapp ---
Checked with the latest commit on a different machine but still cannot
reproduce the error. PR114668 I can reproduce. Maybe a copy and paste
problem?
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114665
--- Comment #1 from Robin Dapp ---
Hmm, my local version is a bit older and seems to give the same result for both
-O2 and -O3. At least a good starting point for bisection then.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114247
--- Comment #6 from Robin Dapp ---
Testsuite looks unchanged on rv64gcv.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114247
--- Comment #5 from Robin Dapp ---
This fixes the test case for me locally, thanks.
I can run the testsuite with it later if you'd like.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114476
--- Comment #8 from Robin Dapp ---
I tried some things (for the related bug without -fwrapv) then got busy with
some other things. I'm going to have another look later this week.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114515
Robin Dapp changed:
What|Removed |Added
CC||ewlu at rivosinc dot com,
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114485
--- Comment #4 from Robin Dapp ---
Yes, the vectorization looks ok. The extracted live values are not used
afterwards and therefore the whole vectorized loop is being thrown away.
Then we do one iteration of the epilogue loop, inverting the ori
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114476
--- Comment #5 from Robin Dapp ---
So the result is -9 instead of 9 (or vice versa) and this happens (just) with
vectorization. We only vectorize with -fwrapv.
From a first quick look, the following is what we have before vect:
(loop)
[lo
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114396
--- Comment #8 from Robin Dapp ---
No fallout on x86 or aarch64.
Of course using false instead of TYPE_SIGN (utype) is also possible and maybe
clearer?
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114396
--- Comment #7 from Robin Dapp ---
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 4375ebdcb49..f8f7ba0ccc1 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -9454,7 +9454,7 @@ vect_peel_nonlinear_iv_init (gimple
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114396
--- Comment #3 from Robin Dapp ---
-O3 -mavx2 -fno-vect-cost-model -fwrapv seems to be sufficient.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114396
Robin Dapp changed:
What|Removed |Added
Target|riscv*-*-* |x86_64-*-* riscv*-*-*
--- Comment #2 from
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112548
--- Comment #29 from Robin Dapp ---
Yes, that also appears to work here. There was no lto involved this time?
Now we need to figure out what's different with SPEC.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112548
--- Comment #27 from Robin Dapp ---
Can you try it with a simpler (non SPEC) test? Maybe there is still something
weird happening with SPEC's scripting.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112548
--- Comment #24 from Robin Dapp ---
I rebuilt GCC from scratch with your options but still have the same problem.
Could our sources differ? My SPEC version might not be the most recent but I'm
not aware that mcf changed at some point.
Just to
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112548
--- Comment #22 from Robin Dapp ---
Still the same problem unfortunately.
I'm a bit out of ideas - maybe your compiler executables could help?
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112548
--- Comment #20 from Robin Dapp ---
No change with -std=gnu99 unfortunately.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112548
--- Comment #18 from Robin Dapp ---
Hmm, doesn't help unfortunately. A full command line for me looks like:
x86_64-pc-linux-gnu-gcc -c -o pbeampp.o -DSPEC_CPU -DNDEBUG -DWANT_STDC_PROTO
-Ofast -march=znver4 -mtune=znver4 -flto=32 -g -fprofil
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112548
--- Comment #16 from Robin Dapp ---
Thank you!
I'm having a problem with the data, though.
Compiling with -Ofast -march=znver4 -mtune=znver4 -flto -fprofile-use=/tmp.
Would you mind showing your exact final options for compilation of e.g.
pbeam
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112548
--- Comment #10 from Robin Dapp ---
(In reply to Sam James from comment #9)
> (In reply to Filip Kastl from comment #8)
> > I'd like to help but I'm afraid I cannot send you the SPEC binaries with PGO
> > applied since SPEC is licensed nor can I
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112548
--- Comment #7 from Robin Dapp ---
I built executables with and without the commit (-Ofast -march=znver4 -flto).
There is no difference so it must really be something that happens with PGO.
I'd really need access to a zen4 box or the pgo execut
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114202
Robin Dapp changed:
What|Removed |Added
Status|UNCONFIRMED |RESOLVED
Resolution|---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114200
--- Comment #3 from Robin Dapp ---
*** Bug 114202 has been marked as a duplicate of this bug. ***
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114196
Robin Dapp changed:
What|Removed |Added
See Also||https://gcc.gnu.org/bugzill
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114200
--- Comment #1 from Robin Dapp ---
Took me a while to analyze this... needed more time than I'd like to admit to
make sense of the somewhat weird code created by fully unrolling and peeling.
I believe the problem is that we reload the output re