https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102171
--- Comment #3 from Tamar Christina ---
(In reply to Andrew Pinski from comment #2)
> I think I am going to implement this (or assign it internally to someone else
> to implement).
If you do, please also remove them from arm_neon.h and use the n
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98877
--- Comment #9 from Tamar Christina ---
While RA should be able to deal with this,
shouldn't we also just lower TBLs in gimple?
There is no reason why this can't be a VEC_PERM_EXPR, which would also get the
copies removed at the gimple level and allo
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114151
Bug ID: 114151
Summary: [14 Regression] weird and inefficient codegen and
addressing modes since
g:a0b1798042d033fd2cc2c806afbb77875dd2909b
Product: gcc
Version:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114151
--- Comment #3 from Tamar Christina ---
>
> This was a correctness fix btw, so I'm not sure we can easily recover - we
> could try using niter information for CHREC_VARIABLE but then there's
> variable niter here so I don't see a chance.
>
It
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98877
--- Comment #11 from Tamar Christina ---
(In reply to Andrew Pinski from comment #10)
> (In reply to Tamar Christina from comment #9)
> > While RA should be able to deal with this,
> > shouldn't we also just lower TBLs in gimple?
> >
> > There is no
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98877
--- Comment #12 from Tamar Christina ---
and it's not the first time we have conditional lowering. We already do so for
e.g. shifts, where shifting by an amount >= bitsize of a vector element is
defined behavior on AArch64.
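As a hedged illustration of the conditional lowering mentioned above: in ISO C, a shift by an amount greater than or equal to the width of the promoted operand is undefined, so a generic shift can only stand in for a target shift when the amount is known to be in range. The sketch below models a single 32-bit lane. The function name `lane_shl_u32` and the choice that an out-of-range amount yields 0 are assumptions made for illustration; they are not taken from the bug report or from GCC's sources.

```c
#include <stdint.h>

/* Hypothetical model of a logical left shift on one 32-bit vector lane. */
uint32_t lane_shl_u32(uint32_t x, unsigned amount)
{
    if (amount < 32) {
        /* In range: the plain C shift is well defined, so a generic
           shift operation could be used here. */
        return x << amount;
    }
    /* Out of range: the plain C shift would be undefined behavior.
       ASSUMPTION for this sketch: the target shifts every bit out,
       so the lane becomes 0. */
    return 0;
}
```

The point of the split is that only the first branch is safe to express as an ordinary shift; the second branch needs target-specific semantics to be preserved.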
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114192
Tamar Christina changed:
What|Removed |Added
Ever confirmed|0 |1
Status|UNCONFIRMED
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114234
Tamar Christina changed:
What|Removed |Added
Last reconfirmed||2024-03-05
Status|UNCONFI
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114151
--- Comment #17 from Tamar Christina ---
> So doing in the vectorizer sth like the following should get us the best
> possible ranges? Ah, probably only global ranges since the SCEV query
> itself would still lack context sensitive info (but as
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114339
--- Comment #6 from Tamar Christina ---
vectorizer generates:
mask_patt_21.19_58 = vect_perm_even_49 >= vect_cst__57;
mask_patt_21.19_59 = vect_perm_even_55 >= vect_cst__57;
vexit_reduc_63 = mask_patt_21.19_58 | mask_patt_21.19_59;
if (
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114345
Bug ID: 114345
Summary: FRE missing knowledge of semantics of IFN loads
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: normal
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114346
Bug ID: 114346
Summary: vectorizer generates the same IV twice
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: normal
Pr
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114345
--- Comment #3 from Tamar Christina ---
(In reply to Andrew Pinski from comment #2)
> Oh VN does have some knowledge of MASK_STORE and LEN_STORE. Just not
> LOAD_LANES .
>
>
> See PR 106365 for MASK_STORE and LEN_STORE implementation. Shouldn'
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114350
Bug ID: 114350
Summary: missing support for SVE widening floating point
conversion
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Keywords: missed-optimization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114345
--- Comment #5 from Tamar Christina ---
(In reply to Richard Biener from comment #4)
> Well, the shuffling in .LOAD_LANES will be a bit awkward to do, but sure. We
> basically lack "constant folding" of .LOAD_LANES and similarly of course
> we
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114403
Tamar Christina changed:
What|Removed |Added
Status|UNCONFIRMED |ASSIGNED
Last reconfirmed|
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114403
--- Comment #20 from Tamar Christina ---
This is a bad interaction with early break and peeling for gaps.
When peeling for gaps we set bias_for_lowest to 0, which then negates the ceil
for the upper bound calculation when the div is exact.
We
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113682
Tamar Christina changed:
What|Removed |Added
Status|UNCONFIRMED |NEW
Ever confirmed|0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113682
--- Comment #9 from Tamar Christina ---
(In reply to Andrew Pinski from comment #8)
> This might be the path splitting running on the gimple level causing issues
> too; see PR 112402 .
Ah that's a good shout. It looks like Richi already agrees
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106346
Tamar Christina changed:
What|Removed |Added
Target Milestone|11.5|14.0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110625
Tamar Christina changed:
What|Removed |Added
CC||tnfchris at gcc dot gnu.org
--- Comme
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110625
--- Comment #14 from Tamar Christina ---
Or rather, info_for_reduction looks at the original statement if it's a
pattern, whereas vect_is_reduction only looks at the direct statement.
You'll probably want to check vect_orig_stmt if using info_f
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110625
--- Comment #16 from Tamar Christina ---
(In reply to Hao Liu from comment #15)
> Ah, I see.
>
> I've sent out a quick fix patch for code review. I'll investigate more
> about this and find out the root cause.
Thanks! I can reduce a testcase
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110625
--- Comment #18 from Tamar Christina ---
Hi, here's the reduced case:
> cat analyse.i
double x264_weights_analyse___trans_tmp_1;
float x264_weights_analyse_ref_mean;
x264_weights_analyse() {
x264_weights_analyse___trans_tmp_1 = floor(x2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
Bug 53947 depends on bug 106346, which changed state.
Bug 106346 Summary: [11/12/13/14 Regression] Potential regression on
vectorization of left shift with constants since r11-5160-g9fc9573f9a5e94
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=1063
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106346
Tamar Christina changed:
What|Removed |Added
Resolution|--- |FIXED
Status|ASSIGNED
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95958
Bug 95958 depends on bug 88212, which changed state.
Bug 88212 Summary: IRA Register Coalescing not working for the testcase
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88212
What|Removed |Added
---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88212
Tamar Christina changed:
What|Removed |Added
CC||tnfchris at gcc dot gnu.org
Re
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89967
Tamar Christina changed:
What|Removed |Added
CC||tnfchris at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111370
Tamar Christina changed:
What|Removed |Added
CC||tnfchris at gcc dot gnu.org
Last re
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90608
Tamar Christina changed:
What|Removed |Added
CC||tnfchris at gcc dot gnu.org,
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90608
--- Comment #9 from Tamar Christina ---
(In reply to Mikael Morin from comment #8)
> Created attachment 56091 [details]
> Rough patch
>
> Here is a rough patch to make the scalarizer support minloc calls.
> It regresses on minloc_1.f90 at least,
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111770
Bug ID: 111770
Summary: predicated loads inactive lane values not modelled
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: normal
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116145
Tamar Christina changed:
What|Removed |Added
CC||tnfchris at gcc dot gnu.org
--- Comme
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116145
--- Comment #5 from Tamar Christina ---
(In reply to ktkachov from comment #4)
> Interesting, thanks for the background. The bigger issue I was seeing was
> with a string-matching loop like https://godbolt.org/z/E7b13915E where the
> constant poo
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115974
Tamar Christina changed:
What|Removed |Added
CC||tnfchris at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90608
--- Comment #24 from Tamar Christina ---
(In reply to Mikael Morin from comment #23)
> (In reply to Mikael Morin from comment #21)
> >
> > (...) and should be able to submit the first
> > series (inline minloc without dim argument) this week.
>
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116140
--- Comment #3 from Tamar Christina ---
(In reply to Jan Hubicka from comment #2)
> Looking at the change, I do not see how that could disable inlining. It
> should only reduce size of the function size estimates in the heuristics.
>
> I think
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116140
--- Comment #4 from Tamar Christina ---
It looks like it's because the old unrolled code for the pointer version did a
subtract and used the difference to optimize the IV check away to every 4
elements. This explains the increase in instruction
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116229
Tamar Christina changed:
What|Removed |Added
Assignee|unassigned at gcc dot gnu.org |tnfchris at gcc dot
gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116229
Tamar Christina changed:
What|Removed |Added
Resolution|--- |FIXED
Status|ASSIGNED
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116409
Tamar Christina changed:
What|Removed |Added
CC||tnfchris at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116463
--- Comment #5 from Tamar Christina ---
Yeah, this is because they generate different gimple sequences and thus
different SLP trees.
The core of the problem is there's no canonical form here, and a missing gimple
simplification rule:
_33 = IM
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116463
--- Comment #11 from Tamar Christina ---
(In reply to Richard Biener from comment #6)
> I think
>
> a - ((b * -c) + (d * -e)) -> a + (b * c) + (d * e)
>
> is a good simplification to be made, but it's difficult to do this with
> canonicali
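A small sanity check of the algebraic identity quoted above, written as a sketch with hypothetical function names (`form_with_negations`, `form_simplified`). It uses integers, where the two forms agree exactly for values that do not overflow. For floating point the two forms associate the additions differently, so they need not be bit-identical; that caveat is a general property of floating-point arithmetic, not something stated in the bug.

```c
/* Left-hand side: a - ((b * -c) + (d * -e)) */
long form_with_negations(long a, long b, long c, long d, long e)
{
    return a - ((b * -c) + (d * -e));
}

/* Right-hand side: a + (b * c) + (d * e) */
long form_simplified(long a, long b, long c, long d, long e)
{
    return a + (b * c) + (d * e);
}
```

For example, with a=1, b=2, c=3, d=4, e=5 both forms evaluate to 27.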
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116520
--- Comment #3 from Tamar Christina ---
(In reply to Richard Biener from comment #2)
> The issue seems to be that if-conversion isn't done:
>
> Can not ifcvt due to multiple exits
>
> maybe my patched dev tree arrives with a different CFG here
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116520
--- Comment #4 from Tamar Christina ---
(In reply to Tamar Christina from comment #3)
> (In reply to Richard Biener from comment #2)
> > The issue seems to be that if-conversion isn't done:
>
> I wonder if this transformation is really beneficia
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116541
Tamar Christina changed:
What|Removed |Added
CC||wilco at gcc dot gnu.org
Ever con
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=36010
Tamar Christina changed:
What|Removed |Added
CC||tnfchris at gcc dot gnu.org
--- Commen
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116575
Bug ID: 116575
Summary: [15 Regression] blender in SPEC2017 ICE in
vect_analyze_slp
Product: gcc
Version: 15.0
Status: UNCONFIRMED
Keywords: ice-on-valid-code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116577
Bug ID: 116577
Summary: [15 Regression] tonto in SPECCPU 2006 ICEs in
vect_lower_load_permutations
Product: gcc
Version: 15.0
Status: UNCONFIRMED
Keywords: ice
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116575
--- Comment #1 from Tamar Christina ---
---
int a;
float *b, *c;
void d() {
char *e;
for (; a; a++, b += 4, c += 4)
if (*e++) {
float *f = c;
f[0] = b[0];
f[1] = b[1];
f[2] = b[2];
f[3] = b[3];
}
}
comp
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116577
--- Comment #2 from Tamar Christina ---
---
module type
type a
complex(kind(1.0d0)) j
real(kind(1.0d0)) k
real(kind(1.0d0)) l
end type
contains
subroutine b(c,g)
type(a), dimension(:) :: c
target c
type(a), dim
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116577
--- Comment #3 from Tamar Christina ---
reproducer should be saved with extension .f90
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116628
--- Comment #3 from Tamar Christina ---
Still seems to ICE after that commit on last night's trunk
https://godbolt.org/z/GnYT7Kx46
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116628
--- Comment #5 from Tamar Christina ---
(In reply to Richard Biener from comment #4)
> Confirmed. The ICE means we've "fatally" failed to analyze an epilogue
> which we do not expect.
>
> t.c:4:21: note: worklist: examine stmt: .MASK_STORE (
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116628
Tamar Christina changed:
What|Removed |Added
Assignee|unassigned at gcc dot gnu.org |tnfchris at gcc dot
gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116628
Tamar Christina changed:
What|Removed |Added
Status|ASSIGNED|RESOLVED
Resolution|---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115866
Tamar Christina changed:
What|Removed |Added
Resolution|FIXED |---
Status|RESOLVED
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
Bug 53947 depends on bug 115866, which changed state.
Bug 115866 Summary: missed optimization vectorizing switch statements.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115866
What|Removed |Added
--
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115130
Bug 115130 depends on bug 115866, which changed state.
Bug 115866 Summary: missed optimization vectorizing switch statements.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115866
What|Removed |Added
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116667
Bug ID: 116667
Summary: missing superfluous zero-extends of SVE values
Product: gcc
Version: 15.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: normal
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116684
Tamar Christina changed:
What|Removed |Added
CC||victorldn at gcc dot gnu.org
--- Comm
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109153
Bug ID: 109153
Summary: missed vector constructor optimizations
Product: gcc
Version: 13.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: normal
P
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109153
--- Comment #3 from Tamar Christina ---
(In reply to Richard Biener from comment #2)
> On the GIMPLE side we should canonicalize here I think, at which point
> inserts into a splatted vector become more profitable depends?
>
> _4 = VEC_PERM_E
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109153
Tamar Christina changed:
What|Removed |Added
Assignee|unassigned at gcc dot gnu.org |tnfchris at gcc dot
gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109156
Bug ID: 109156
Summary: Support Absolute Difference detection in GCC
Product: gcc
Version: 13.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: normal
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109154
--- Comment #1 from Tamar Christina ---
Thanks for the report, taking a look!
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109156
--- Comment #2 from Tamar Christina ---
(In reply to Richard Biener from comment #1)
> (In reply to Tamar Christina from comment #0)
> > 2. It looks like all targets that implement SAD do so with an instruction
> > that does ABD and then perform
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109156
Tamar Christina changed:
What|Removed |Added
Assignee|unassigned at gcc dot gnu.org |tnfchris at gcc dot
gnu.org
-
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109154
--- Comment #2 from Tamar Christina ---
Confirmed. It looks like the extra range information from
g:4fbe3e6aa74dae5c75a73c46ae6683fdecd1a75d is leading jump threading down the
wrong path.
Reduced testcase:
---
int etot_0, fasten_main_natpro_ch
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109154
Tamar Christina changed:
What|Removed |Added
Summary|[13 regression] aarch64 |[13 regression] jump
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109230
--- Comment #1 from Tamar Christina ---
That patch only fixed the bootstrap. In any case I'm on holiday, so I have
asked someone else to look.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109230
--- Comment #11 from Tamar Christina ---
Neither of those vec_perms are valid targets for this optimization.
It looks like sel.series_p is not doing what I expected. It's matching even
elements and ignoring the odd ones.
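To illustrate how a strided series check can accept a selector whose odd elements are arbitrary, here is a simplified stand-alone model. It is not GCC's implementation: `series_matches` and its parameters are hypothetical, and it works on a plain array rather than on GCC's permutation encoding. It only checks the positions out_base, out_base + out_step, and so on, which is the behavior described in the comment above when the step is 2.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: return true if sel[out_base + i * out_step]
   equals in_base + i * in_step for every such index below n.
   Positions that the stride skips are never inspected. */
bool series_matches(const unsigned *sel, size_t n,
                    size_t out_base, size_t out_step,
                    unsigned in_base, unsigned in_step)
{
    assert(out_step > 0);  /* a zero stride would never terminate */
    unsigned expected = in_base;
    for (size_t i = out_base; i < n; i += out_step) {
        if (sel[i] != expected)
            return false;
        expected += in_step;
    }
    return true;
}
```

With the selector {0, 7, 2, 5, 4, 3, 6, 1}, a check with out_step 2 from position 0 succeeds because only the even positions (0, 2, 4, 6) are examined, while a check with out_step 1 fails at position 1.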
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109154
--- Comment #24 from Tamar Christina ---
(In reply to Jakub Jelinek from comment #12)
> (In reply to Richard Biener from comment #11)
> > _1 should be [-Inf, nextafter (0.0, -Inf)], not [-Inf, -0.0]
> The reduced testcase is invalid because it us
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109154
--- Comment #25 from Tamar Christina ---
Created attachment 54777
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=54777&action=edit
extracted codegen
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109391
Bug ID: 109391
Summary: Inefficient codegen on AArch64 when structure types
are returned
Product: gcc
Version: 13.0
Status: UNCONFIRMED
Keywords: missed-optimi
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109154
--- Comment #42 from Tamar Christina ---
Thanks for all the work so far folks!
Just to clarify the current state, it looks like the first reduced testcase is
now correct.
But the larger example as in c26 is still suboptimal, but slightly bette
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109587
Bug ID: 109587
Summary: Deeply nested loop unrolling overwhelms register
allocator
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Keywords: missed-optimization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109587
--- Comment #4 from Tamar Christina ---
(In reply to Richard Biener from comment #3)
> The issue isn't unrolling but invariant motion. We unroll the innermost
> loop, vectorize the middle loop and then unroll that as well. That leaves
> us wi
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109587
--- Comment #7 from Tamar Christina ---
(In reply to Richard Biener from comment #5)
> (In reply to Tamar Christina from comment #4)
> > (In reply to Richard Biener from comment #3)
> > > The issue isn't unrolling but invariant motion. We unrol
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109154
--- Comment #54 from Tamar Christina ---
@Jakub, just to check to avoid doing duplicate work, did you intend to do the
remaining ifcvt changes or should we?
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109154
Tamar Christina changed:
What|Removed |Added
Assignee|unassigned at gcc dot gnu.org |tnfchris at gcc dot
gnu.org
-
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109632
Bug ID: 109632
Summary: Inefficient codegen when complex numbers are emulated
with structs
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Keywords: missed-opti
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109632
--- Comment #2 from Tamar Christina ---
(In reply to Richard Biener from comment #1)
> Well, the usual unknown ABI boundary at function entry/exit.
Yes, but LLVM gets it right, so it should be a solvable computer science
problem. :)
Note that th
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109632
--- Comment #3 from Tamar Christina ---
Note that even if we can't stop SLP, we should be able to generate code that is
just as efficient by being creative about the instruction selection; that's why
I marked it as a target bug :)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109632
--- Comment #6 from Tamar Christina ---
That's an interesting approach, I think it would also fix
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109391 would it not? Since the
int16x8x3_t return would be "scalarized" avoiding the bad expansion?
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109632
--- Comment #9 from Tamar Christina ---
Thank you!
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109711
--- Comment #5 from Tamar Christina ---
(In reply to Martin Liška from comment #3)
> Hm, on x86_64-linux-gnu, it started with r13-6616-g2246d576f922ba.
$ cat prtest2.c
void lspf2lpc();
int interpolate_lpc_q_0;
void
interpolate_lpc(int subfram
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109711
--- Comment #6 from Tamar Christina ---
My own bisect does indeed end up at r14-377-gc92b8be9b52b7e, and I cannot
reproduce it on GCC 13.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114575
Bug ID: 114575
Summary: [14 Regression] SVE addressing modes broken since
g:839bc42772ba7af66af3bd16efed4a69511312ae
Product: gcc
Version: 14.0
Status: UNCONFIRMED
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114515
Tamar Christina changed:
What|Removed |Added
CC||tnfchris at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114510
Tamar Christina changed:
What|Removed |Added
CC||tnfchris at gcc dot gnu.org
--- Comme
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114577
Bug ID: 114577
Summary: Inefficient codegen for SVE/NEON bridge
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: normal
P
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114635
Bug ID: 114635
Summary: OpenMP reductions fail dependency analysis
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: normal
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114635
--- Comment #6 from Tamar Christina ---
(In reply to Jakub Jelinek from comment #4)
> Now, with SVE/RISCV vectors the actual vectorization factor is a poly_int
> rather than constant. One possibility would be to use VLA arrays in those
> cases,
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114403
--- Comment #21 from Tamar Christina ---
Created attachment 57932
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57932&action=edit
loop.c
attached reduced testcase that reproduces the issue and also checks the buffer
position and copied v
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114403
--- Comment #22 from Tamar Christina ---
Note that due to the secondary exit the actual full vector iteration count is 8
scalar elements at VF=4, i.e. 2.
And it's this boundary condition where we fail, since ceil (8/4) == 2. Any
other value would
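The boundary case can be made concrete with plain integer arithmetic. The sketch below is only an arithmetic illustration, with hypothetical helper names (`ceil_div`, `floor_div`); it does not reproduce the vectorizer's bound computation. It shows that ceiling and floor division agree exactly when the division is exact, as with 8 elements at VF=4, and differ by one otherwise.

```c
#include <assert.h>

/* Ceiling of n / d for unsigned operands, d must be non-zero. */
unsigned ceil_div(unsigned n, unsigned d)
{
    assert(d > 0);
    return (n + d - 1) / d;
}

/* Floor of n / d: ordinary unsigned integer division. */
unsigned floor_div(unsigned n, unsigned d)
{
    assert(d > 0);
    return n / d;
}
```

Here ceil_div(8, 4) and floor_div(8, 4) are both 2, whereas for 7 or 9 elements the ceiling is one more than the floor.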
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114403
--- Comment #24 from Tamar Christina ---
(In reply to Richard Biener from comment #23)
> Maybe easier to understand testcase:
>
> with -O3 -msse4.1 -fno-vect-cost-model we return 20 instead of 8. Adding
> -fdisable-tree-cunroll avoids the issu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114403
--- Comment #26 from Tamar Christina ---
(In reply to Richard Biener from comment #25)
> That means, when the loop takes the early exit we _must_ take that during
> the vector iterations. Peeling for gaps means if we would take the early
> exit
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114403
Tamar Christina changed:
What|Removed |Added
Status|ASSIGNED|RESOLVED
Resolution|---