https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116140
--- Comment #4 from Tamar Christina ---
It looks like it's because the old unrolled code for the pointer version did a
subtract and used the difference to optimize the IV check away to every 4
elements. This explains the increase in
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116140
--- Comment #3 from Tamar Christina ---
(In reply to Jan Hubicka from comment #2)
> Looking at the change, I do not see how that could disable inlining. It
> should only reduce size of the function size estimates in the heuristics.
>
> I think
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90608
--- Comment #24 from Tamar Christina ---
(In reply to Mikael Morin from comment #23)
> (In reply to Mikael Morin from comment #21)
> >
> > (...) and should be able to submit the first
> > series (inline minloc without dim argument) this week.
>
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115974
Tamar Christina changed:
What|Removed |Added
CC||tnfchris at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116145
--- Comment #5 from Tamar Christina ---
(In reply to ktkachov from comment #4)
> Interesting, thanks for the background. The bigger issue I was seeing was
> with a string-matching loop like https://godbolt.org/z/E7b13915E where the
> constant
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116145
Tamar Christina changed:
What|Removed |Added
CC||tnfchris at gcc dot gnu.org
---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116140
--- Comment #1 from Tamar Christina ---
Yeah, we've noticed it as well.
The weird thing is that the dynamic instruction count went up by a lot.
So it looks like some inlining or something did not happen.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116074
Tamar Christina changed:
What|Removed |Added
Status|ASSIGNED|RESOLVED
Resolution|---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
Bug 53947 depends on bug 116074, which changed state.
Bug 116074 Summary: [15 regression] ICE when building harfbuzz-9.0.0 on arm64
(related_int_vector_mode, at stor-layout.cc:581)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116074
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116074
--- Comment #8 from Tamar Christina ---
Going with a backend fix instead.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116074
--- Comment #7 from Tamar Christina ---
The backend is returning TImode for get_vectype_for_scalar_type for historical
reasons where large integer modes were considered struct types and thus vector
modes.
However they're not modes the
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116074
Tamar Christina changed:
What|Removed |Added
Last reconfirmed||2024-07-25
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90608
--- Comment #22 from Tamar Christina ---
(In reply to Mikael Morin from comment #21)
> (In reply to Tamar Christina from comment #20)
> > Hi Mikael,
> >
> > I did regression testing on x86_64 and AArch64 and only found one test-ism.
> >
> > I
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106783
--- Comment #8 from Tamar Christina ---
(In reply to Jan Hubicka from comment #6)
> The problem is that n/=0 is undefined behavior (so we can optimize out call
> to function doing divide by zero), while __builtin_trap is observable and we
> do
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90608
--- Comment #20 from Tamar Christina ---
Hi Mikael,
I did regression testing on x86_64 and AArch64 and only found one test-ism.
I think I understand most of the patch to be able to deal with any fallout,
would it be ok if I fix the test-ism
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115531
Tamar Christina changed:
What|Removed |Added
Resolution|--- |FIXED
Status|ASSIGNED
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
Bug 53947 depends on bug 115531, which changed state.
Bug 115531 Summary: vectorizer generates inefficient code for masked
conditional update loops
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115531
What|Removed
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115936
Tamar Christina changed:
What|Removed |Added
Resolution|--- |FIXED
Status|ASSIGNED
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115936
--- Comment #6 from Tamar Christina ---
(In reply to Richard Biener from comment #3)
> iv->step should never be a pointer type
This is created by SCEV.
simple_iv_with_niters in the case where no CHREC is found creates an IV with
base == ev,
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115936
--- Comment #5 from Tamar Christina ---
(In reply to Richard Biener from comment #3)
> iv->step should never be a pointer type
This is created by SCEV.
simple_iv_with_niters in the case where no CHREC is found creates an IV with
base == ev,
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115934
--- Comment #7 from Tamar Christina ---
(In reply to Thomas Schwinge from comment #6)
> Tamar, Richard, thanks for having a look.
>
> (In reply to Tamar Christina from comment #4)
> > This one looks a bit like costing, [...]
>
> I see. So we
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115936
--- Comment #4 from Tamar Christina ---
(In reply to Richard Biener from comment #3)
> iv->step should never be a pointer type
That's what I initially thought too. My suspicion is that there is some code
that tries to create the 0 offset.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115934
--- Comment #4 from Tamar Christina ---
This one looks a bit like costing,
before the patch IVopts had:
:
inv_expr 1: -element_7(D)
inv_expr 2: (signed int) rite_5(D) - (signed int) element_7(D)
and after the patch it generates a few
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115936
Tamar Christina changed:
What|Removed |Added
Target Milestone|--- |15.0
--- Comment #2 from Tamar
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115936
Tamar Christina changed:
What|Removed |Added
Ever confirmed|0 |1
CC|
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115934
Tamar Christina changed:
What|Removed |Added
Assignee|unassigned at gcc dot gnu.org |tnfchris at gcc dot
gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115934
--- Comment #1 from Tamar Christina ---
Hi, thanks for the report, could you tell me a target triple I can use for
nvptx?
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115866
Tamar Christina changed:
What|Removed |Added
Assignee|unassigned at gcc dot gnu.org |tnfchris at gcc dot
gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115866
Bug ID: 115866
Summary: missed optimization vectorizing switch statements.
Product: gcc
Version: 15.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: normal
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115799
Tamar Christina changed:
What|Removed |Added
CC||tnfchris at gcc dot gnu.org
---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104265
--- Comment #5 from Tamar Christina ---
Also for fully masked architectures we can instead of recreating the vectors
just mask out the irrelevant values.
But we should still order the exits based on complexity.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104265
--- Comment #4 from Tamar Christina ---
(In reply to Richard Biener from comment #3)
> Note the SLP discovery opportunity is from the "reduction" PHI to the
> return which merges control flow to a zero/one flag.
Right, so I get what you mean
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90608
--- Comment #19 from Tamar Christina ---
Hi Mikael,
It looks like the last version of your patch already gets inlined in the call
sites we cared about.
Would it be possible for you to upstream it?
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115623
Tamar Christina changed:
What|Removed |Added
Resolution|--- |FIXED
Status|ASSIGNED
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115629
--- Comment #6 from Tamar Christina ---
(In reply to rguent...@suse.de from comment #5)
> > In this case, the second load is conditional on the first load mask, which
> > means it's already done an AND.
> > And crucially inverting it means you
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88545
--- Comment #12 from Tamar Christina ---
I had a bug in the benchmark: I forgot to set taskset.
These are the correct ones:
+--------+-----------+------+--------+
| NEEDLE | scalar 1x | vect | memchr |
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115629
--- Comment #4 from Tamar Christina ---
(In reply to Richard Biener from comment #3)
> So we now tail-merge the two b[i] loading blocks. Can you check SVE
> code-gen with this? If that fixes the PR consider adding a SVE testcase.
Thanks, the
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88545
--- Comment #11 from Tamar Christina ---
(In reply to Jonathan Wakely from comment #9)
> Patch posted: https://gcc.gnu.org/pipermail/gcc-patches/2024-June/653731.html
>
> Rerunning benchmarks with this patch would be very welcome.
OK, I have
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115120
--- Comment #5 from Tamar Christina ---
Considering ivopts bails out on doloop prediction for multiple exits anyway,
what do you think about:
diff --git a/gcc/tree-ssa-loop-ivcanon.cc b/gcc/tree-ssa-loop-ivcanon.cc
index
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115629
Bug ID: 115629
Summary: Inefficient if-convert of masked conditionals
Product: gcc
Version: 15.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: normal
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115531
Tamar Christina changed:
What|Removed |Added
Assignee|unassigned at gcc dot gnu.org |tnfchris at gcc dot
gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115623
--- Comment #4 from Tamar Christina ---
novect3.c: In function 'void f(char*, int)':
novect3.c:4:9: error: missing loop condition in loop with 'GCC novector' pragma
before ';' token
4 | for (;;i++)
|
should do it, will
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115623
Tamar Christina changed:
What|Removed |Added
Status|NEW |ASSIGNED
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115120
--- Comment #4 from Tamar Christina ---
Richi, you asked why this doesn't happen with a normal vector loop.
For a normal loop when IVcanon adds the downward counting loop there are two
main differences.
1. for a single exit loop, the downward
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115597
--- Comment #4 from Tamar Christina ---
(In reply to Richard Biener from comment #2)
> Ah, I feared this would happen - this case seems to be because of a lot of
> VEC_PERM nodes(?) which are not handled by the CSE process as well as the
>
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115597
--- Comment #3 from Tamar Christina ---
>
> Can you check whether that fixes the issue?
>
> diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
> index 9465d94de1a..212d5f97f7d 100644
> --- a/gcc/tree-vect-slp.cc
> +++
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115597
Bug ID: 115597
Summary: [15 Regression] vectorizer takes 20+ h compiling
510.parest in SPECCPU2017 since
g:46bb4ce4d30ab749d40f6f4cef6f1fb7c7813452
Product: gcc
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115534
--- Comment #5 from Tamar Christina ---
(In reply to Andrew Pinski from comment #4)
> This might be improved by
> https://gcc.gnu.org/pipermail/gcc-patches/2024-June/654819.html . Or it
> might be the case the vectorizer case needs to be
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115537
--- Comment #5 from Tamar Christina ---
Thanks for the fix!
I think the testcase needs SVE enabled to ICE, no?
Shouldn't that be -mcpu=neoverse-v1 and not -mcpu=neoverse-n1?
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115534
--- Comment #2 from Tamar Christina ---
(In reply to Andrew Pinski from comment #1)
> I suspect there is a dup of this already. See the bug which I made this one
> blocking for a list of related bugs.
Most of the other bugs relate to the
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115537
Bug ID: 115537
Summary: [15 Regression] vectorizable_reduction ICEs after
g:d66b820f392aa9a7c34d3cddaf3d7c73bf23f82d
Product: gcc
Version: 15.0
Status: UNCONFIRMED
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115534
Bug ID: 115534
Summary: intermediate stack use not eliminated
Product: gcc
Version: 15.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: normal
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115531
--- Comment #3 from Tamar Christina ---
(In reply to Andrew Pinski from comment #1)
> I suspect PR 20999 would fix this ...
> but we have to be careful since without masked stores, you could still
> vectorize this unlike the transformed
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115531
Bug ID: 115531
Summary: vectorizer generates inefficient code for masked
conditional update loops
Product: gcc
Version: 15.0
Status: UNCONFIRMED
Keywords:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115464
--- Comment #10 from Tamar Christina ---
Thanks for the fix, but I don't think it's sufficient.
What I meant with the earlier comment was that the subregs are broken in
general, so not just the one generated by the undef fast path.
i.e.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115464
--- Comment #7 from Tamar Christina ---
(In reply to Tamar Christina from comment #6)
> (In reply to Richard Sandiford from comment #5)
> > In this kind of situation, we should go through a fresh pseudo rather than
> > try to take the subreg
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115464
--- Comment #6 from Tamar Christina ---
(In reply to Richard Sandiford from comment #5)
> In this kind of situation, we should go through a fresh pseudo rather than
> try to take the subreg directly.
I did try that but fwprop pushed it back
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115464
Tamar Christina changed:
What|Removed |Added
CC||rsandifo at gcc dot gnu.org
---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115464
Tamar Christina changed:
What|Removed |Added
Last reconfirmed||2024-06-12
CC|
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114932
--- Comment #15 from Tamar Christina ---
(In reply to rguent...@suse.de from comment #14)
> On Thu, 6 Jun 2024, tnfchris at gcc dot gnu.org wrote:
>
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114932
> >
> > --- Comment #13 from Tamar
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114932
--- Comment #13 from Tamar Christina ---
(In reply to rguent...@suse.de from comment #12)
> > since we don't care about overflow here, it looks like the stripping should
> > be recursive as long as it's a NOP expression between two integral
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114932
--- Comment #11 from Tamar Christina ---
(In reply to Richard Biener from comment #10)
> I think the question is why IVOPTs ends up using both the signed and
> unsigned variant of the same IV instead of expressing all uses of both with
> one
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54013
Tamar Christina changed:
What|Removed |Added
Blocks||115130
--- Comment #4 from Tamar
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114932
--- Comment #9 from Tamar Christina ---
It's taken me a bit of time to track down all the reasons for the speedup with
the earlier patch.
This comes from two parts:
1. Signed IVs don't get simplified. Due to possible UB with signed overflows
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114860
--- Comment #9 from Tamar Christina ---
(In reply to prathamesh3492 from comment #8)
> Hi Tamar,
> Using -falign-loops=5 indeed brings back the performance.
> The adrp instruction has same address (0x4ae784) by setting -falign-loops=5
> (which
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115130
Tamar Christina changed:
What|Removed |Added
Ever confirmed|0 |1
Last reconfirmed|
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115130
Bug ID: 115130
Summary: (early-break) [meta-bug] early break vectorization
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Keywords: meta-bug, missed-optimization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115120
--- Comment #3 from Tamar Christina ---
That makes sense, though I also wonder how it works for scalar multi exit
loops, IVopts has various checks on single exits.
I guess one problem is that the code in IVopts that does this uses the exit to
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114860
--- Comment #7 from Tamar Christina ---
Yeah, it's most likely an alignment issue, especially as there are no code
changes.
We run our benchmarking with different flags so it may be why we don't see it.
The loop seems misaligned, you can try
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114412
--- Comment #5 from Tamar Christina ---
(In reply to Filip Kastl from comment #4)
> (In reply to Tamar Christina from comment #3)
> > Hi Filip,
> >
> > Do you generate these runs with counters based PGO or compiler
> > instrumentation?
> >
>
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115087
Bug ID: 115087
Summary: dead block not eliminated in SVE intrinsics code
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: normal
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114412
Tamar Christina changed:
What|Removed |Added
CC||tnfchris at gcc dot gnu.org
---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114932
Tamar Christina changed:
What|Removed |Added
Ever confirmed|0 |1
Status|UNCONFIRMED
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114932
--- Comment #6 from Tamar Christina ---
Created attachment 58096
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=58096&action=edit
exchange2.fppized-bad.f90.187t.ivopts
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114932
--- Comment #5 from Tamar Christina ---
Created attachment 58095
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=58095&action=edit
exchange2.fppized-good.f90.187t.ivopts
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114932
--- Comment #4 from Tamar Christina ---
reduced more:
---
module brute_force
integer, parameter :: r=9
integer block(r, r, 0)
contains
subroutine brute
do
do
do
do
do
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114932
--- Comment #3 from Tamar Christina ---
(In reply to Andrew Pinski from comment #2)
> > which is harder for prefetchers to follow.
>
> This seems like a limitation in the HW prefetcher rather than anything else.
> Maybe the cost model for
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114932
Bug ID: 114932
Summary: Improvement in CHREC can give large performance gains
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92538
Tamar Christina changed:
What|Removed |Added
CC||jamborm at gcc dot gnu.org
---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114860
--- Comment #3 from Tamar Christina ---
I cannot reproduce this even recompiling libc.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114860
Tamar Christina changed:
What|Removed |Added
CC||tnfchris at gcc dot gnu.org
---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114860
--- Comment #1 from Tamar Christina ---
Hmm
I am unable to reproduce this with -O3 -flto -mcpu=neoverse-v2 on a
neoverse-v2 machine.
Is any other option required?
Also that code was new in gcc 14 and was partially reverted due to register
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114766
--- Comment #2 from Tamar Christina ---
(In reply to Vladimir Makarov from comment #1)
> (In reply to Tamar Christina from comment #0)
> > The documentation for ^ states:
>
> If it works for you, we could try to use the patch (although it needs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114769
Tamar Christina changed:
What|Removed |Added
Resolution|--- |FIXED
Status|ASSIGNED
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114769
--- Comment #2 from Tamar Christina ---
I believe this is safe, but the interface is definitely not the cleanest.
vect_recog_absolute_difference has two callers:
1. vect_recog_sad_pattern where if you return true with unprom not set, then
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113625
Tamar Christina changed:
What|Removed |Added
CC||tnfchris at gcc dot gnu.org
---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114766
Bug ID: 114766
Summary: ^ constraint modifier unexpectedly affects register
class selection.
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Keywords:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114741
Tamar Christina changed:
What|Removed |Added
Resolution|--- |FIXED
Status|ASSIGNED
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114513
Bug 114513 depends on bug 114741, which changed state.
Bug 114741 Summary: [14 regression] aarch64 sve: unnecessary fmov for scalar
int bit operations
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114741
What|Removed
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114741
Tamar Christina changed:
What|Removed |Added
Assignee|unassigned at gcc dot gnu.org |tnfchris at gcc dot
gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114741
--- Comment #6 from Tamar Christina ---
And the exact armv9-a cost model you quoted also does the right codegen.
https://godbolt.org/z/obafoT6cj
There is just an inexplicable penalty being applied to the r->r alternative.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114741
Tamar Christina changed:
What|Removed |Added
CC||tnfchris at gcc dot gnu.org,
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113552
Tamar Christina changed:
What|Removed |Added
Status|ASSIGNED|RESOLVED
Resolution|---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114403
Tamar Christina changed:
What|Removed |Added
Status|ASSIGNED|RESOLVED
Resolution|---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114403
--- Comment #26 from Tamar Christina ---
(In reply to Richard Biener from comment #25)
> That means, when the loop takes the early exit we _must_ take that during
> the vector iterations. Peeling for gaps means if we would take the early
>
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114403
--- Comment #24 from Tamar Christina ---
(In reply to Richard Biener from comment #23)
> Maybe easier to understand testcase:
>
> with -O3 -msse4.1 -fno-vect-cost-model we return 20 instead of 8. Adding
> -fdisable-tree-cunroll avoids the
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114403
--- Comment #22 from Tamar Christina ---
note that due to the secondary exit the actual full vector iteration count is 8
scalar elements at VF=4 == 2.
And it's this boundary condition where we fail, since ceil (8/4) == 2. any
other value would
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114403
--- Comment #21 from Tamar Christina ---
Created attachment 57932
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57932&action=edit
loop.c
attached reduced testcase that reproduces the issue and also checks the buffer
position and copied values.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114635
--- Comment #6 from Tamar Christina ---
(In reply to Jakub Jelinek from comment #4)
> Now, with SVE/RISCV vectors the actual vectorization factor is a poly_int
> rather than constant. One possibility would be to use VLA arrays in those
>
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114635
Bug ID: 114635
Summary: OpenMP reductions fail dependency analysis
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: normal