https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113014
--- Comment #4 from Robin Dapp ---
Richard has posted it and asked for reviews. I have tested it and we have
several testsuite regressions with it but no severe ones. Most or all of them
are dump fails because we combine into vx variants that
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113014
--- Comment #2 from Robin Dapp ---
Yes, that's right.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112999
--- Comment #1 from Robin Dapp ---
What actually gets in the way of vec_extract here is changing to a "better"
vector mode (which is RVVMF4QI here). If we tried to extract from the mask
directly, everything would work.
I have a patch
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112999
Bug ID: 112999
Summary: riscv: Infinite loop with mask extraction
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112971
--- Comment #8 from Robin Dapp ---
Yes, can confirm that this helps.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112971
--- Comment #5 from Robin Dapp ---
Yes that's what I just tried. No infinite loop anymore then. But that's not a
new simplification and looks reasonable so there must be something special for
our backend.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112971
--- Comment #3 from Robin Dapp ---
In match.pd we do something like this:
;; Function e (e, funcdef_no=0, decl_uid=2751, cgraph_uid=1, symbol_order=4)
Pass statistics of "forwprop":
Matching expression match.pd:2771,
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112971
--- Comment #2 from Robin Dapp ---
It doesn't look like the same issue to me. The other bug is related to TImode
handling in combination with mask registers. I will also have a look at this
one.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112929
--- Comment #15 from Robin Dapp ---
I think we need to make sure that we're not writing out of bounds. In that
case anything might happen and if we just don't happen to overwrite this
variable we might hit another one but the test can still
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112853
--- Comment #10 from Robin Dapp ---
I just realized that I forgot to post the comparison recently. With the patch
now upstream I don't see any differences for zvl128b and different vlens
anymore. What I haven't fully tested yet is zvl256b or
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112929
--- Comment #13 from Robin Dapp ---
I just built from the most recent commit and it still fails for me.
Could there be a difference in qemu? I'm on qemu-riscv64 version 8.1.91 but
yours is even newer so that might not explain it.
You could
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112929
--- Comment #9 from Robin Dapp ---
In the good version the length is 32 here because directly before the vsetvl we
have:
li a4,32
That seems to get lost somehow.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112929
--- Comment #7 from Robin Dapp ---
Here
0x105c6 vse8.v v8,(a5)
is where we overwrite m. The vl is 128 but the preceding vsetvl gets a4 =
46912504507016 as AVL which already seems broken.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112929
--- Comment #6 from Robin Dapp ---
This seems to be gone when simple vsetvl (instead of lazy) is used or with
-fno-schedule-insns which might indicate a vsetvl pass problem.
We might have a few more of those. Maybe it would make sense to run
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112853
--- Comment #8 from Robin Dapp ---
With Juzhe's latest fix that disables VLS modes >= 128 bit for zvl128b x264
runs without issues here and some of the additional execution failures are
gone.
Will post the current comparison later.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112872
--- Comment #2 from Robin Dapp ---
Thanks. Yes that's similar and also looks fixed by the introduction of the
vec_init expander. Added this test case to the patch and will push it soon.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112853
--- Comment #7 from Robin Dapp ---
Ah, forgot three tests:
FAIL: gcc.dg/vect/bb-slp-cond-1.c execution test
FAIL: gcc.dg/vect/bb-slp-pr101668.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/bb-slp-pr101668.c execution test
On
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112853
--- Comment #6 from Robin Dapp ---
I indeed see more failures with _zvl128b, vlen=256 (than with _zvl128b,
vlen=128):
FAIL: gcc.dg/vect/pr66251.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/pr66251.c execution test
FAIL:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112854
--- Comment #3 from Robin Dapp ---
The problem seems to be that we can overlay a 32-bit bitmask with an SImode
subreg and work with it. For zvl1024b on rv32 we don't allow this causing the
ICE.
We might be able to work around it by providing
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112853
--- Comment #5 from Robin Dapp ---
Can confirm. The scalable build works with qemu vlen=128 but fails with
vlen=256. That's a good data point as I'm not sure we're already covering this
with the current runs?
I'm going to start a testsuite
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112854
--- Comment #2 from Robin Dapp ---
Hehe I was hoping we wouldn't hit a vec_set on a mask but apparently this
happens as well. We don't have a pattern for that either, yet.
Thanks for the test. I would expect this to be fixed in a similar way
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112583
--- Comment #14 from Robin Dapp ---
Yes, that's the culprit. I already pushed a fix yesterday.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112583
--- Comment #12 from Robin Dapp ---
Ok, on my server the difference is that I didn't add vext_spec=v1.0 to the qemu
options. This caused the qemu diagnostic which would of course not match the
expected output.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112583
--- Comment #11 from Robin Dapp ---
Verified they work locally but also fail on a different server. Also fail
without vector and at -O0. Maybe it's different tcl versions or the shell
doing wonky stuff?
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112583
--- Comment #10 from Robin Dapp ---
I didn't yet look at all those closer because they are more dump failures than
real execution failures.
The ones I checked are
expected
"^foobar$" but got:
"foobar"
so I considered this rather an
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112773
--- Comment #13 from Robin Dapp ---
Mostly an issue because our expander is definitely not prepared to handle that
:)
It looks like aarch64's is, though, and ours can/should be changed then.
aarch64 doesn't need to implement a qi/bi extract
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112773
--- Comment #11 from Robin Dapp ---
When I define a vec_extract...bi pattern we don't enter the if (vec_extract) in
expmed because e.g.
bitsize = {1, 0}
bitnum = {3, 4}
and GET_MODE_BITSIZE (innermode) = {1, 0} with innermode = BImode.
This
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112773
--- Comment #9 from Robin Dapp ---
Ok, it's not the fold_extract_last expander. It just appeared that way here
because I disabled some other things.
What we want to do is extract the last element from a vector. This works as
long as we have
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112773
--- Comment #8 from Robin Dapp ---
Thanks for the testcase. It looks pretty similar to the situation why I
introduced the bitmask extract in the first place and I don't think that's the
root cause.
As last time the problem is that the generic
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112598
--- Comment #15 from Robin Dapp ---
Does the =m fix your issue? Or is the code gen different then and we're just
lucky? For my problem it doesn't help because we still don't recognize an
alias between load and store and the load is moved.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110237
Robin Dapp changed:
What|Removed |Added
CC||rdapp at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112598
--- Comment #13 from Robin Dapp ---
It looks like the takeaway from the other thread is that there are many
similar assumptions about masked stores in the middle end. It's probably
difficult to get them all right in a short time. Therefore I
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112598
--- Comment #11 from Robin Dapp ---
On Friday I looked into one of the Fortran fails, class_67.f90 and debugged it
independently without reading here further. It is also due to the same reason
- alias analysis finds that the predicated store
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112464
Robin Dapp changed:
What|Removed |Added
Resolution|--- |FIXED
Status|NEW
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112670
--- Comment #1 from Robin Dapp ---
The problem is exposed with the ipa copy propagation pass. I haven't narrowed
it down yet but will continue tomorrow.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112661
--- Comment #3 from Robin Dapp ---
Yes, as agreed. Though today I probably won't be able to do much due to private
matters.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112661
--- Comment #1 from Robin Dapp ---
Confirmed, smaller example:
program main
implicit none
integer, parameter :: n=5
character(len=6), dimension(n,n) :: a
character(len=6), dimension(n) :: r1
integer :: i
logical, dimension(n,n) ::
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111488
Robin Dapp changed:
What|Removed |Added
Resolution|--- |FIXED
Status|UNCONFIRMED
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112406
--- Comment #21 from Robin Dapp ---
Grml,
../../gcc/tree-vect-loop.cc:12248:1: fatal error: error writing to
/tmp/ccsMqSV2.s: No space left on device
on cfarm185, cannot even build anymore.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112406
--- Comment #20 from Robin Dapp ---
Not really depending on an order but rather expecting that the reduction
variable is in op[1] (as created by ifcvt).
That might already be the problem because here the reduction index is 2. It
just never
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112406
--- Comment #18 from Robin Dapp ---
Already in ifcvt we have:
_ifc__60 = .COND_ADD (_2, _6, MADPictureC1_lsm.10_25, MADPictureC1_lsm.10_25);
which we should not. This is similar on riscv.
But during value numbering it still is
Value
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112406
--- Comment #17 from Robin Dapp ---
Thanks, I reproduced it on the compile farm with this example. Going to have a
look. riscv doesn't fail in a similar way this time.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112406
--- Comment #15 from Robin Dapp ---
Hmm, that's definitely related to the original change but most likely not to
the fixes.
gcc_assert (code == IFN_COND_ADD || code == IFN_COND_SUB
|| code == IFN_COND_MUL || code ==
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112583
--- Comment #3 from Robin Dapp ---
I cannot reproduce this either. Just started with binop/* and don't see any
fails locally. Patrick, could you check what caused this?
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111970
--- Comment #18 from Robin Dapp ---
I did a quick testsuite run on rv32 and can confirm that this fixes the issue
for me.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112374
--- Comment #47 from Robin Dapp ---
And, just to confirm: Testsuite is unchanged on riscv with your patch.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112374
--- Comment #46 from Robin Dapp ---
(In reply to Jakub Jelinek from comment #43)
> Now, the patch changed it to allow one extra use in certain cases (but I
> think only on use_stmt, because there should be one use on use_stmt and if
> there is
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112374
--- Comment #40 from Robin Dapp ---
(In reply to Jakub Jelinek from comment #37)
[..]
> The above isn't complete, so one just has to guess what you mean outside of
> that, but the above doesn't seem to be correct. There are many internal
>
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112374
--- Comment #35 from Robin Dapp ---
What does get rid of the comparison failures in the three last posted reduced
examples is:
gcall *call = dyn_cast <gcall *> (op_use_stmt);
internal_fn ifn;
if (call &&
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112374
--- Comment #34 from Robin Dapp ---
(In reply to Jakub Jelinek from comment #29)
> --- gcc/tree-vect-loop.cc.jj 2023-11-14 10:35:52.0 +0100
> +++ gcc/tree-vect-loop.cc 2023-11-15 22:42:32.782007408 +0100
> @@ -4105,9 +4105,9 @@
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112552
--- Comment #7 from Robin Dapp ---
Ah, it's even easier to trigger then. I already have a somewhat working
solution by going with Richi's suggestion and adding the handling for COND_OPs
in vect patterns. Still needs a bit more polishing and
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112531
--- Comment #4 from Robin Dapp ---
Personally, I don't mind having some FAILs as long as we know them and
understand the reason for them. I wouldn't insist on "fixing" them but don't
mind if others prefer to have the results "clean".
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112531
--- Comment #2 from Robin Dapp ---
Yes, I'd also argue in favor of -fno-tree-vectorize here.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112527
Robin Dapp changed:
What|Removed |Added
Resolution|--- |INVALID
Status|UNCONFIRMED
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112527
--- Comment #2 from Robin Dapp ---
Ah, thanks, so it depends on zve32f which implies zve32x. Ok, then all good
and we can close this.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112527
Bug ID: 112527
Summary: RVV integer vector instructions generated with
rv64gc_zvfh
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Severity: normal
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112481
Robin Dapp changed:
What|Removed |Added
CC||palmer at dabbelt dot com
--- Comment #10
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112374
--- Comment #11 from Robin Dapp ---
Thanks for figuring that out. No idea if the pattern is the problem, most
likely not? I rather suppose there is still a missing fixup somewhere in the
vectorizer that I didn't encounter with my testing.
So
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112464
--- Comment #4 from Robin Dapp ---
Is there another way to make it more robust?
Or does the existing
void
vect_finish_replace_stmt (vec_info *vinfo,
stmt_vec_info stmt_info, gimple *vec_stmt)
{
gimple *scalar_stmt
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112464
--- Comment #2 from Robin Dapp ---
I tested
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index a544bc9b059..257fd40793e 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -7084,7 +7084,7 @@
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112464
--- Comment #1 from Robin Dapp ---
We fail at:
void
vect_finish_replace_stmt (vec_info *vinfo,
stmt_vec_info stmt_info, gimple *vec_stmt)
{
gimple *scalar_stmt = vect_orig_stmt (stmt_info)->stmt;
gcc_assert
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112406
--- Comment #11 from Robin Dapp ---
Thanks, this is helpful.
I have a patch that I just bootstrapped and ran the testsuite with on aarch64.
Going to post it soon, maybe Richi still has a better idea how to work around
this.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112406
--- Comment #9 from Robin Dapp ---
I believe the problem is that in
if (vectype)
vector_type = vectype;
else if (VECT_SCALAR_BOOLEAN_TYPE_P (TREE_TYPE (op))
&& VECTOR_BOOLEAN_TYPE_P (stmt_vectype))
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112406
--- Comment #8 from Robin Dapp ---
Ah of course it's not the first argument but the mask. During vectorization we
already create
fail1.c:15:10: note: add new stmt: vect__ifc__141.81_358 = .COND_ADD
(vect_cst__356,
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112406
--- Comment #7 from Robin Dapp ---
Ah, thanks, I can reproduce this on the cfarm/gcc185.
We don't expand:
vect__ifc__141.81_358 = .COND_ADD (vect_cst__356,
vect_GetImageChannelMoments_M00_0_lsm.74_338, { 1.0e+0, ... },
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112374
--- Comment #6 from Robin Dapp ---
How does the test suite look without bootstrapping? Are there still new FAILs?
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112359
--- Comment #2 from Robin Dapp ---
Would something like
+ bool allow_cond_op = flag_tree_loop_vectorize
+&& !gimple_bb (phi)->loop_father->dont_vectorize;
in convert_scalar_cond_reduction be sufficient or are there more conditions to
check
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112361
--- Comment #6 from Robin Dapp ---
So "before" we created
vect__3.12_55 = MEM [(float *)vectp_a.10_53];
vect__ifc__43.13_57 = VEC_COND_EXPR ;
// _ifc__43 = _24 ? _3 : 0.0;
stmp__44.14_58 = BIT_FIELD_REF ;
stmp__44.14_59 = r3_29 +
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112363
--- Comment #1 from Robin Dapp ---
This test was introduced in order to check that we correctly "reduce" with -0.0
as neutral element, i.e. a reduction preserves an initial -0.0 and doesn't turn
it into 0.0 by adding 0.0. Kernel aborted means
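To illustrate the -0.0 point from the comment above, here is a minimal scalar sketch. It is not the GCC test case or the vectorizer's code; the helper name cond_sum and its parameters are hypothetical. It models a conditional sum in which masked-out elements contribute a "neutral" value, under the default IEEE 754 round-to-nearest mode and without -ffast-math.

```c
#include <math.h>

/* Hypothetical helper: conditional sum reduction.  Elements whose mask
   entry is zero contribute `neutral` instead of their own value.  */
static double
cond_sum (const double *a, const int *mask, int n,
          double init, double neutral)
{
  double acc = init;
  for (int i = 0; i < n; i++)
    acc += mask[i] ? a[i] : neutral;
  return acc;
}
```

With init = -0.0 and every mask entry false, a neutral value of 0.0 yields +0.0 because (-0.0) + (+0.0) is +0.0 in round-to-nearest, so the sign of the initial value is lost. A neutral value of -0.0 keeps the result at -0.0, which can be checked with signbit.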
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112361
--- Comment #2 from Robin Dapp ---
I can have a look. Of course I tested it but neither the compile farm machine
(gcc188) I used nor my local device have AVX512 run capability. Anywhere else
I can test it?
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111311
--- Comment #10 from Robin Dapp ---
As a general remark: Some of those are present on other backends as well, some
have been introduced by recent common-code changes and some are bogus test
prerequisites or checks. I'm not saying we are in
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111600
--- Comment #30 from Robin Dapp ---
On my machine it is not nearly as bad as insn-emit.cc. What dominates for me
with a GCC 13 host compiler is the already fixed insn-opinit problem.
How long does it take for you (maybe in % of the total
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112109
Bug ID: 112109
Summary: Missing riscv vectorized strcmp (and other) expanders
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111791
--- Comment #4 from Robin Dapp ---
This is a scalar popcount and as Kito already noted we will just emit
cpop a0, a0
once the zbb extension is present.
As to the question what is actually being vectorized here, I'm not so sure :D
It looks
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111794
--- Comment #10 from Robin Dapp ---
From what I can tell with my barely working connection no regressions on x86,
aarch64 or power10 with the adjusted check.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111794
--- Comment #9 from Robin Dapp ---
Yes, that's from pattern recog:
slp.c:11:20: note: === vect_pattern_recog ===
slp.c:11:20: note: vect_recog_mask_conversion_pattern: detected: _5 = _2 &
_4;
slp.c:11:20: note: mask_conversion pattern
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111794
--- Comment #7 from Robin Dapp ---
vectp.4_188 = x_50(D);
vect__1.5_189 = MEM [(int *)vectp.4_188];
mask__2.6_190 = { 1, 1, 1, 1, 1, 1, 1, 1 } == vect__1.5_189;
mask_patt_156.7_191 = VIEW_CONVERT_EXPR>(mask__2.6_190);
_1 = *x_50(D);
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111794
--- Comment #5 from Robin Dapp ---
Disregarding the reasons for the precision adjustment, for this case here, we
seem to fail at:
/* We do not handle bit-precision changes. */
if ((CONVERT_EXPR_CODE_P (code)
|| code ==
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111794
--- Comment #4 from Robin Dapp ---
Just to mention here as well. As this seems ninstance++ where the
adjust_precision thing comes back to bite us, I'm going to go back and check if
the issue why it was introduced (DCE?) cannot be solved
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111600
--- Comment #26 from Robin Dapp ---
So insn-opinit.cc still takes 2-3 minutes to compile here, even though the file
is not gigantic.
With the same GCC 13.1 x86 host compiler I see:
phase opt and generate : 170.28 ( 99%) 0.75 (
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111600
--- Comment #25 from Robin Dapp ---
At least here locally the maximum I saw was 1.4 GB of RES for insn-emit-10.cc.
That's still not ideal (especially when 8 or 10 of those files compile in
parallel) but at least no 8 GB for a single file
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111600
--- Comment #23 from Robin Dapp ---
For the lack of a better idea (and time constraints as looking for compiler
bottlenecks is slow and tedious) I went with Kito's suggestion of splitting
insn-emit.cc
This reduces this part of the compilation
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111760
--- Comment #6 from Robin Dapp ---
Yes, thanks for filing this bug separately. The patch doesn't disable all of
those optimizations, of course I paid special attention not to mess them up.
The difference here is that we valueize, add
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111760
Robin Dapp changed:
What|Removed |Added
CC||rdapp at gcc dot gnu.org,
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111428
--- Comment #3 from Robin Dapp ---
Still difficult to track down. The following is a smaller reproducer:
program main
implicit none
integer, parameter :: n=5, m=3
integer, dimension(n,m) :: v
real, dimension(n,m) :: r
do
call
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111600
--- Comment #22 from Robin Dapp ---
Ah, then it's not that different, your machine is just faster ;)
callgraph ipa passes : 69.77 ( 11%) 5.97 ( 13%) 76.05 ( 12%)
2409M ( 10%)
integration: 91.95 (
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111600
--- Comment #20 from Robin Dapp ---
Mhm, why is your profile so different from mine? I'm also on an x86_64 host
with a 13.2.1 host compiler (Fedora).
Is it because of the preprocessed source? Or am I just reading the timing
report wrong?
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111600
--- Comment #18 from Robin Dapp ---
Just finished an initial timing run, sorted, first 10:
Time variable usr sys wall
GGC
phase opt and generate : 567.60 ( 97%) 38.23
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111600
--- Comment #16 from Robin Dapp ---
Confirming that it's the compilation of insn-emit.cc which takes > 10 minutes.
The rest (including auto generating of files) is reasonably fast. Going to do
some experiments with it and see which pass takes
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111506
--- Comment #5 from Robin Dapp ---
Ah, thanks Joseph, so this at least means that we do not need
!flag_trapping_math here.
However, the vectorizer emulates the 64-bit integer to _Float16 conversion via
an intermediate int32_t and now the riscv
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111600
Robin Dapp changed:
What|Removed |Added
CC||law at gcc dot gnu.org
--- Comment #12
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111506
Robin Dapp changed:
What|Removed |Added
CC||joseph at codesourcery dot com
---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111428
--- Comment #2 from Robin Dapp ---
Reproduced locally. The identical binary sometimes works and sometimes doesn't
so it must be a race...
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111488
Robin Dapp changed:
What|Removed |Added
CC||juzhe.zhong at rivai dot ai
--- Comment
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111488
Bug ID: 111488
Summary: ICE on riscv gcc.dg/vect/vect-126.c
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111401
--- Comment #6 from Robin Dapp ---
Created attachment 55902
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55902&action=edit
Tentative
You're referring to the case where we have init = -0.0, the condition is false
and we end up wrongly doing
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111401
--- Comment #3 from Robin Dapp ---
Several other things came up, so I'm just going to post the latest status here
without having revised or tested it. Going to try fixing it and testing
tomorrow.
--- a/gcc/tree-vect-loop.cc
+++
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111401
Robin Dapp changed:
What|Removed |Added
CC||rdapp at gcc dot gnu.org
--- Comment #2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53
--- Comment #4 from Robin Dapp ---
Yes, with VLS reduction this will improve.
On aarch64 + sve I see
loop inside costs: 2
This is similar to our VLS costs.
And their loop is indeed short:
ld1w z30.s, p7/z, [x0, x2, lsl 2]
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53
--- Comment #2 from Robin Dapp ---
With the current trunk we don't spill anymore:
(VLS)
.L4:
vle32.v v2,0(a5)
vadd.vv v1,v1,v2
addi a5,a5,16
bne a5,a4,.L4
Considering just that loop I'd say costing works