[Bug tree-optimization/102494] Failure to optimize vector reduction properly especially when using OpenMP

2021-10-26 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102494 --- Comment #12 from Hongtao.liu --- > That's pretty good, but VMOVD eax, xmm0 would be more efficient than > VPEXTRW when we don't need to avoid high garbage (because it's a return > value in this case). And TARGET_AVX512FP16 has vmovw.

[Bug tree-optimization/102494] Failure to optimize vector reduction properly especially when using OpenMP

2021-10-25 Thread peter at cordes dot ca via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102494 --- Comment #11 from Peter Cordes --- Also, horizontal byte sums are generally best done with VPSADBW against a zero vector, even if that means some fiddling to flip to unsigned first and then undo the bias. simde_vaddlv_s8: vpxor xmm0,
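
A minimal sketch of the PSADBW idea from comment #11, not code taken from the bug report. The helper name hsum_s8x8 is hypothetical. It assumes eight signed bytes are summed into an int16_t: each byte is flipped to unsigned by XOR with 0x80 (which adds 128), PSADBW against zero sums the unsigned bytes, and the bias of 8 * 128 is subtracted afterwards. A scalar fallback doing the same arithmetic is used when SSE2 is not available.

```c
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper: sum eight int8_t values into an int16_t. */
static int16_t hsum_s8x8(const int8_t *p)
{
#if defined(__SSE2__)
    /* Load the 8 bytes into the low half; the upper half is zeroed. */
    __m128i v = _mm_loadl_epi64((const __m128i *)p);
    /* Flip the sign bit: signed x becomes unsigned x + 128. */
    v = _mm_xor_si128(v, _mm_set1_epi8((char)0x80));
    /* PSADBW against zero: each 64-bit half gets the sum of its 8 bytes. */
    __m128i sad = _mm_sad_epu8(v, _mm_setzero_si128());
    /* Only the low half holds our data; read its sum. */
    int biased = _mm_cvtsi128_si32(sad);
#else
    /* Scalar model of the same arithmetic. */
    int biased = 0;
    for (int i = 0; i < 8; i++)
        biased += (uint8_t)p[i] ^ 0x80;
#endif
    /* Undo the bias: 8 elements, 128 each. */
    return (int16_t)(biased - 8 * 128);
}
```

The result always fits in int16_t, since eight signed bytes sum to a value between -1024 and 1016.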

[Bug tree-optimization/102494] Failure to optimize vector reduction properly especially when using OpenMP

2021-10-25 Thread peter at cordes dot ca via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102494 Peter Cordes changed:
What | Removed | Added
CC | | peter at cordes dot ca
--- Comment #10

[Bug tree-optimization/102494] Failure to optimize vector reduction properly especially when using OpenMP

2021-10-07 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102494 --- Comment #9 from CVS Commits --- The master branch has been updated by hongtao Liu: https://gcc.gnu.org/g:77ca2cfcdcccee3c8e8aeaf1d03e9920893d2486 commit r12-4241-g77ca2cfcdcccee3c8e8aeaf1d03e9920893d2486 Author: liuhongt Date: Tue Sep

[Bug tree-optimization/102494] Failure to optimize vector reduction properly especially when using OpenMP

2021-09-28 Thread rguenther at suse dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102494 --- Comment #8 from rguenther at suse dot de --- On Tue, 28 Sep 2021, crazylht at gmail dot com wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102494 > > --- Comment #7 from Hongtao.liu --- > After supporting v4hi reduce, gimple seems

[Bug tree-optimization/102494] Failure to optimize vector reduction properly especially when using OpenMP

2021-09-28 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102494 --- Comment #7 from Hongtao.liu --- After supporting v4hi reduce, the GIMPLE generated to convert v8qi to v8hi does not look optimal. vector(4) short int vect__21.36; vector(4) unsigned short vect__2.31; int16_t stmp_r_17.17; vector(8) short int

[Bug tree-optimization/102494] Failure to optimize vector reduction properly especially when using OpenMP

2021-09-27 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102494 --- Comment #6 from Richard Biener --- The vectorizer looks for a way to "shift" the whole vector by either vec_shr or a corresponding vec_perm with constant shuffle operands. When the target provides none of those you get element extracts and
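
The whole-vector-shift reduction described in comment #6 can be sketched roughly as follows. This is an illustration under assumptions, not the vectorizer's actual output: the helper name reduc_plus_v8hi is hypothetical, and SSE2's byte shift (_mm_srli_si128) stands in for vec_shr on a vector of eight 16-bit elements. Each step shifts the vector right by half the remaining width and adds, so element 0 ends up holding the full sum after three steps. A scalar fallback models the same steps when SSE2 is not available.

```c
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper: sum eight int16_t values with wrapping addition. */
static int16_t reduc_plus_v8hi(const int16_t *p)
{
#if defined(__SSE2__)
    __m128i v = _mm_loadu_si128((const __m128i *)p);
    v = _mm_add_epi16(v, _mm_srli_si128(v, 8)); /* shift by 4 elements */
    v = _mm_add_epi16(v, _mm_srli_si128(v, 4)); /* shift by 2 elements */
    v = _mm_add_epi16(v, _mm_srli_si128(v, 2)); /* shift by 1 element  */
    /* MOVD reads 32 bits; bits above the low 16 are garbage and are
       dropped by the conversion to int16_t. */
    return (int16_t)_mm_cvtsi128_si32(v);
#else
    /* Scalar model: lane[i] += lane[i + shift], zeros shifted in. */
    uint16_t lane[8];
    for (int i = 0; i < 8; i++)
        lane[i] = (uint16_t)p[i];
    for (int shift = 4; shift >= 1; shift /= 2)
        for (int i = 0; i < 8; i++)
            lane[i] = (uint16_t)(lane[i] + (i + shift < 8 ? lane[i + shift] : 0));
    return (int16_t)lane[0];
#endif
}
```

When the target offers neither a whole-vector shift nor an equivalent constant permute, the same sum has to be built from individual element extracts instead, which is the fallback comment #6 mentions.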

[Bug tree-optimization/102494] Failure to optimize vector reduction properly especially when using OpenMP

2021-09-26 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102494 --- Comment #5 from Hongtao.liu --- (In reply to Hongtao.liu from comment #4) > > > > But for the case in PR, it's v8qi -> 2 v4hi, and no vector reduction for > > v4hi. > > We need to add (define_expand "reduc_plus_scal_v4hi" just like

[Bug tree-optimization/102494] Failure to optimize vector reduction properly especially when using OpenMP

2021-09-26 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102494 --- Comment #4 from Hongtao.liu --- > > But for the case in PR, it's v8qi -> 2 v4hi, and no vector reduction for > v4hi. We need to add a (define_expand "reduc_plus_scal_v4hi") just like the (define_expand "reduc_plus_scal_v8qi") in mmx.md.
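
To make the "v8qi -> 2 v4hi" shape in comment #4 concrete, here is a scalar model in C. It is a sketch under assumptions: the function names are hypothetical, and the reference loop is only a guess at the kind of OpenMP reduction the bug title refers to, not the PR's actual test case. The second function widens eight bytes into two groups of four 16-bit values, adds the groups element-wise, and then performs the four-element plus-reduction that a reduc_plus_scal_v4hi expander would cover.

```c
#include <stdint.h>

/* Assumed shape of the source loop: widening sum of eight signed bytes. */
static int16_t sum_s8_loop(const int8_t a[8])
{
    int16_t r = 0;
#if defined(_OPENMP)
#pragma omp simd reduction(+:r)
#endif
    for (int i = 0; i < 8; i++)
        r += a[i];
    return r;
}

/* Scalar model of the vectorized form: v8qi widened to two v4hi halves. */
static int16_t sum_s8_split(const int8_t a[8])
{
    int16_t lo[4], hi[4], acc[4];
    for (int i = 0; i < 4; i++) {
        lo[i] = a[i];                      /* widen low half  */
        hi[i] = a[i + 4];                  /* widen high half */
        acc[i] = (int16_t)(lo[i] + hi[i]); /* v4hi + v4hi     */
    }
    /* Plus-reduction over the remaining four 16-bit elements. */
    acc[0] = (int16_t)(acc[0] + acc[2]);
    acc[1] = (int16_t)(acc[1] + acc[3]);
    return (int16_t)(acc[0] + acc[1]);
}
```

Both functions return the same value for any input, since eight signed bytes cannot overflow a 16-bit accumulator.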

[Bug tree-optimization/102494] Failure to optimize vector reduction properly especially when using OpenMP

2021-09-26 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102494 --- Comment #3 from Hongtao.liu --- (In reply to Hongtao.liu from comment #2) > It seems x86 doesn't support the optab reduc_plus_scal_v8hi yet. The vectorizer does the work for the backend. typedef short v8hi __attribute__((vector_size(16))); short