https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102494
--- Comment #12 from Hongtao.liu ---
> That's pretty good, but VMOVD eax, xmm0 would be more efficient than
> VPEXTRW when we don't need to avoid high garbage (because it's a return
> value in this case).
And TARGET_AVX512FP16 has vmovw.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102494
--- Comment #11 from Peter Cordes ---
Also, horizontal byte sums are generally best done with VPSADBW against a zero
vector, even if that means some fiddling to flip to unsigned first and then
undo the bias.
simde_vaddlv_s8:
vpxor xmm0,
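
The bias trick described in this comment can be sketched in C with SSE2 intrinsics. This is only an illustration under stated assumptions, not the code from the bug report: the helper name `hsum_s8x8` is hypothetical, it sums eight signed bytes into an `int16_t` (the shape of `vaddlv_s8`), and a scalar fallback with the same arithmetic is used when SSE2 is not available.

```c
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper (not from the PR): widening horizontal sum of
 * eight int8_t values, using PSADBW against a zero vector. */
int16_t hsum_s8x8(const int8_t *p)
{
#if defined(__SSE2__)
    /* Load the 8 bytes into the low half; the high half is zeroed. */
    __m128i v = _mm_loadl_epi64((const __m128i *)p);
    /* Flip the sign bit: each signed x becomes the unsigned value x + 128. */
    v = _mm_xor_si128(v, _mm_set1_epi8((char)0x80));
    /* PSADBW vs. zero sums the unsigned bytes of each 64-bit half. */
    __m128i sad = _mm_sad_epu8(v, _mm_setzero_si128());
    /* Only the low half holds our data; read its sum. */
    int biased = _mm_cvtsi128_si32(sad);
#else
    /* Scalar model of the same arithmetic. */
    int biased = 0;
    for (int i = 0; i < 8; i++)
        biased += (uint8_t)(p[i] ^ 0x80);
#endif
    /* Undo the bias: 8 elements, each offset by 128. */
    return (int16_t)(biased - 8 * 128);
}
```

The biased sum is at most 8 * 255 = 2040, so the result after removing the bias always fits in 16 bits.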
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102494
Peter Cordes changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |peter at cordes dot ca
--- Comment #10
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102494
--- Comment #9 from CVS Commits ---
The master branch has been updated by hongtao Liu :
https://gcc.gnu.org/g:77ca2cfcdcccee3c8e8aeaf1d03e9920893d2486
commit r12-4241-g77ca2cfcdcccee3c8e8aeaf1d03e9920893d2486
Author: liuhongt
Date: Tue Sep
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102494
--- Comment #8 from rguenther at suse dot de ---
On Tue, 28 Sep 2021, crazylht at gmail dot com wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102494
>
> --- Comment #7 from Hongtao.liu ---
> After supporting v4hi reduce, gimple seems
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102494
--- Comment #7 from Hongtao.liu ---
After supporting v4hi reduce, the gimple for converting v8qi to v8hi does not seem optimal.
vector(4) short int vect__21.36;
vector(4) unsigned short vect__2.31;
int16_t stmp_r_17.17;
vector(8) short int
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102494
--- Comment #6 from Richard Biener ---
The vectorizer looks for a way to "shift" the whole vector by either vec_shr
or a corresponding vec_perm with constant shuffle operands. When the target
provides none of those you get element extracts and
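
As a rough illustration of the whole-vector "shift" reduction this comment refers to, the following plain C model halves the shift distance at each step and adds the shifted copy back, so the total ends up in lane 0. It is a sketch only: the function names and the fixed 8-lane `int16_t` layout are assumptions for the example, and it models the idea rather than reproducing what the vectorizer emits.

```c
#include <stdint.h>
#include <string.h>

enum { LANES = 8 };

/* Model of a whole-vector shift: move every lane down by n positions,
 * shifting zeros into the top lanes. */
static void shift_lanes_down(int16_t v[LANES], int n)
{
    for (int i = 0; i < LANES; i++)
        v[i] = (i + n < LANES) ? v[i + n] : 0;
}

/* Hypothetical helper: log2(LANES) shift+add steps instead of
 * LANES individual element extracts. */
int16_t reduce_add_by_shifts(const int16_t in[LANES])
{
    int16_t acc[LANES], shifted[LANES];
    memcpy(acc, in, sizeof acc);
    for (int n = LANES / 2; n >= 1; n /= 2) {
        memcpy(shifted, acc, sizeof acc);
        shift_lanes_down(shifted, n);
        for (int i = 0; i < LANES; i++)
            acc[i] = (int16_t)(acc[i] + shifted[i]);
    }
    return acc[0];  /* lane 0 now holds the sum of all lanes */
}
```

With 8 lanes this takes three shift-and-add steps (shifts of 4, 2 and 1 lanes), compared with eight extracts and seven scalar adds for the element-wise fallback.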
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102494
--- Comment #5 from Hongtao.liu ---
(In reply to Hongtao.liu from comment #4)
> >
> > But for the case in PR, it's v8qi -> 2 v4hi, and no vector reduction for
> > v4hi.
>
> We need to add (define_expand "reduc_plus_scal_v4hi" just like
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102494
--- Comment #4 from Hongtao.liu ---
>
> But for the case in PR, it's v8qi -> 2 v4hi, and no vector reduction for
> v4hi.
We need to add (define_expand "reduc_plus_scal_v4hi", just like (define_expand
"reduc_plus_scal_v8qi" in mmx.md.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102494
--- Comment #3 from Hongtao.liu ---
(In reply to Hongtao.liu from comment #2)
> It seems x86 doesn't support optab reduc_plus_scal_v8hi yet.
The vectorizer does the work for the backend.
typedef short v8hi __attribute__((vector_size(16)));
short