https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98167
--- Comment #21 from CVS Commits ---
The master branch has been updated by Hongyu Wang:
https://gcc.gnu.org/g:dc95e1e9702f2f6367bbc108c8d01169be1b66d2
commit r13-4044-gdc95e1e9702f2f6367bbc108c8d01169be1b66d2
Author: Hongyu Wang
Date: Mon
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98167
--- Comment #20 from Richard Biener ---
-fno-trapping-math tells us we are not concerned about FP exception flags (so,
say, a spurious FE_INEXACT is OK); -fno-signaling-nans is needed as well, I guess.
Oh, and in practice performing the
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98167
--- Comment #19 from Andrew Pinski ---
(In reply to Hongtao.liu from comment #18)
> For vector integers it should be ok?
> For vector floating point we can add condition
> flag_unsafe_math_optimizations || !flag_trapping_math for the
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98167
--- Comment #18 from Hongtao.liu ---
(In reply to Andrew Pinski from comment #17)
> (In reply to Hongtao.liu from comment #16)
> > typedef int v4si __attribute__ ((vector_size(16)));
> >
> > v4si f(v4si a, v4si b) {
> > v4si a1 =
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98167
--- Comment #17 from Andrew Pinski ---
(In reply to Hongtao.liu from comment #16)
> typedef int v4si __attribute__ ((vector_size(16)));
>
> v4si f(v4si a, v4si b) {
> v4si a1 = __builtin_shufflevector (a, a, 2, 3 ,1 ,0);
> v4si b1 =
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98167
--- Comment #16 from Hongtao.liu ---
typedef int v4si __attribute__ ((vector_size(16)));
v4si f(v4si a, v4si b) {
  v4si a1 = __builtin_shufflevector (a, a, 2, 3, 1, 0);
  v4si b1 = __builtin_shufflevector (b, a, 2, 3, 1, 0);
  return a1
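For the integer case, a rough sketch of the two forms under discussion. The snippet above is cut off before the operation, so the addition here is only an assumption for illustration, and the helper names are made up. The permutation is written with plain element subscripts (rather than __builtin_shufflevector) so it builds on older compilers too; since all four indices are below 4, only the first shuffle operand is ever selected.

```c
#include <assert.h>

/* Four-lane int vector, GCC/Clang vector extension. */
typedef int v4si __attribute__ ((vector_size (16)));

/* Hypothetical helper: the permutation 2, 3, 1, 0 from the comment,
   applied to a single input vector.  */
static v4si
permute_2310 (v4si v)
{
  v4si r = { v[2], v[3], v[1], v[0] };
  return r;
}

/* Original shape: permute both operands identically, then operate.
   The '+' is an assumed stand-in for the truncated operation.  */
static v4si
permute_then_add (v4si a, v4si b)
{
  return permute_2310 (a) + permute_2310 (b);
}

/* Candidate rewrite: operate once, then permute the result.  */
static v4si
add_then_permute (v4si a, v4si b)
{
  return permute_2310 (a + b);
}

/* Returns 1 when both forms agree in every lane.  */
static int
int_forms_agree (v4si a, v4si b)
{
  v4si x = permute_then_add (a, b);
  v4si y = add_then_permute (a, b);
  return x[0] == y[0] && x[1] == y[1] && x[2] == y[2] && x[3] == y[3];
}

/* Small self-check with values that cannot overflow.  */
static void
int_self_check (void)
{
  v4si a = { 1, 2, 3, 4 };
  v4si b = { 10, 20, 30, 40 };
  assert (int_forms_agree (a, b));
}
```

Because the permutation only moves lanes around and every lane is computed in both forms, the integer rewrite saves one shuffle without changing which lane values are produced.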
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98167
--- Comment #15 from Hongtao.liu ---
(In reply to Andrew Pinski from comment #14)
> (In reply to Hongtao.liu from comment #13)
> > shufps is folded to VEC_PERM_EXPR, but two shufps are still generated.
> >
> > __m128 f (__m128 a, __m128 b)
> > {
> >
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98167
--- Comment #14 from Andrew Pinski ---
(In reply to Hongtao.liu from comment #13)
> shufps is folded to VEC_PERM_EXPR, but two shufps are still generated.
>
> __m128 f (__m128 a, __m128 b)
> {
> vector(4) float _3;
> vector(4) float _5;
>
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98167
--- Comment #13 from Hongtao.liu ---
shufps is folded to VEC_PERM_EXPR, but two shufps are still generated.
__m128 f (__m128 a, __m128 b)
{
vector(4) float _3;
vector(4) float _5;
vector(4) float _6;
;; basic block 2, loop depth 0
;;
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98167
--- Comment #12 from CVS Commits ---
The master branch has been updated by Hongtao Liu:
https://gcc.gnu.org/g:0fa4787bf34b173ce6f198e99b6f6dd8a3f98014
commit r12-3177-g0fa4787bf34b173ce6f198e99b6f6dd8a3f98014
Author: liuhongt
Date: Fri Dec
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98167
Jakub Jelinek changed:
What|Removed |Added
CC||jakub at gcc dot gnu.org
--- Comment
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98167
--- Comment #10 from Hongtao.liu ---
A patch is posted at
https://gcc.gnu.org/pipermail/gcc-patches/2020-December/561909.html
Recording Jakub's comments from another thread here as well:
On Tue, Jan 12, 2021 at 11:47:48AM +0100, Jakub Jelinek via Gcc-patches
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98167
--- Comment #9 from Hongtao.liu ---
(In reply to Marc Glisse from comment #8)
> (In reply to Richard Biener from comment #4)
> > We already handle IX86_BUILTIN_SHUFPD there but not IX86_BUILTIN_SHUFPS for
> > some reason.
>
>
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98167
--- Comment #8 from Marc Glisse ---
(In reply to Richard Biener from comment #4)
> We already handle IX86_BUILTIN_SHUFPD there but not IX86_BUILTIN_SHUFPS for
> some reason.
https://gcc.gnu.org/pipermail/gcc-patches/2019-May/521983.html
I was
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98167
--- Comment #7 from Richard Biener ---
The transform with doubles on the [1] element would produce
unpckhpd %xmm1, %xmm1
unpckhpd %xmm0, %xmm0
mulsd    %xmm1, %xmm0
unpcklpd %xmm0, %xmm0
so
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98167
--- Comment #6 from Richard Biener ---
Created attachment 49695
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49695&action=edit
vector lowering ssa_uniform_vector_p hack
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98167
--- Comment #5 from Richard Biener ---
So
__m128d f(__m128d a, __m128d b) {
return _mm_mul_pd(_mm_shuffle_pd(a, a, 0), _mm_shuffle_pd(b, b, 0));
}
is expanded as
_3 = VEC_PERM_EXPR ;
_5 = VEC_PERM_EXPR ;
_6 = _3 * _5;
return _6;
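A small sketch of what the desired rewrite would look like at the source level, assuming the goal is to multiply once and shuffle the product instead of shuffling both inputs. The helper names are hypothetical, and the lane-0 broadcast is modelled with element subscripts on a GCC/Clang vector type rather than with _mm_shuffle_pd, so it does not depend on x86 headers.

```c
#include <assert.h>

/* Two-lane double vector, GCC/Clang vector extension. */
typedef double v2df __attribute__ ((vector_size (16)));

/* Hypothetical helper: broadcast lane 0 into both lanes, i.e. the
   lanes that _mm_shuffle_pd (v, v, 0) selects.  */
static v2df
splat_lane0 (v2df v)
{
  v2df r = { v[0], v[0] };
  return r;
}

/* Shape from the comment: two shuffles feeding one multiply.  */
static v2df
shuffle_then_mul (v2df a, v2df b)
{
  return splat_lane0 (a) * splat_lane0 (b);
}

/* Candidate rewrite: one multiply feeding one shuffle.  Note that
   lane 1 of a * b is now computed even though it is discarded.  */
static v2df
mul_then_shuffle (v2df a, v2df b)
{
  return splat_lane0 (a * b);
}

/* Returns 1 when both forms produce the same value in each lane.  */
static int
fp_forms_agree (v2df a, v2df b)
{
  v2df x = shuffle_then_mul (a, b);
  v2df y = mul_then_shuffle (a, b);
  return x[0] == y[0] && x[1] == y[1];
}

/* Small self-check with exactly representable values.  */
static void
fp_self_check (void)
{
  v2df a = { 1.5, -2.0 };
  v2df b = { 4.0, 8.0 };
  assert (fp_forms_agree (a, b));
}
```

The result values match, but the rewritten form also multiplies the unused lane 1, so it can raise FP exception flags that the original never would. That is the reason a floating-point version of the transform needs some guard on trapping math, whereas the value computed in lane 0 is the same either way.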
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98167
--- Comment #4 from Richard Biener ---
That works only for a single operation and doesn't really scale. I think we want
to expose the permutes at the GIMPLE level via ix86_gimple_fold_builtin. We
already handle IX86_BUILTIN_SHUFPD there but not
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98167
--- Comment #3 from Hongtao.liu ---
;; _3 = __builtin_ia32_shufps (b_2(D), b_2(D), 0);
(insn 7 6 8 (set (reg:V4SF 88)
(reg/v:V4SF 86 [ b ])) "./gcc/include/xmmintrin.h":746:19 -1
(nil))
(insn 8 7 9 (set (reg:V4SF 89)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98167
Richard Biener changed:
What|Removed |Added
Target|x86_64 i?86 |x86_64-*-* i?86-*-*
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98167
Gabriel Ravier changed:
What|Removed |Added
Summary|[x86] Failure to optimize |[x86] Failure to optimize