https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100257
--- Comment #6 from Hongtao.liu ---
> const __m128i in = _mm_setr_epi16(val_0, val_1, val_2, 0, 0, 0, 0, 0);
In ix86_expand_vector_init, we can generate asm like
vmovd val_0, %xmm0
pinsrw $1, val_1, %xmm0
pinsrw $2, val_2, %xmm0
and l
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100257
--- Comment #5 from Hongtao.liu ---
(In reply to Andrew Pinski from comment #4)
> (In reply to Hongtao.liu from comment #2)
> > for vec_init, if higher part is zero, we can use vmovd/vmovq instead of
> > vector concat.
>
> That is related to PR
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100257
--- Comment #4 from Andrew Pinski ---
(In reply to Hongtao.liu from comment #2)
> for vec_init, if higher part is zero, we can use vmovd/vmovq instead of
> vector concat.
That is related to PR 94680 if not the same.
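The idea quoted above (use vmovd/vmovq when the upper part of a vec_init is zero) can be illustrated at the source level. This is only a sketch and not what the GCC backend does internally: init_by_lanes, init_by_movq and same_bits are hypothetical helper names, and an x86 target with SSE2 is assumed.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: builds the vector lane by lane, like the
   _mm_setr_epi16 call in the report (upper five lanes are zero). */
static __m128i init_by_lanes(int16_t v0, int16_t v1, int16_t v2)
{
    return _mm_setr_epi16(v0, v1, v2, 0, 0, 0, 0, 0);
}

/* Hypothetical helper: packs the three lanes into one 64-bit scalar
   and moves it into the low half of the register. _mm_loadl_epi64
   zeroes the upper 64 bits, so no concat with a zero vector is needed. */
static __m128i init_by_movq(int16_t v0, int16_t v1, int16_t v2)
{
    uint64_t packed = ((uint64_t)(uint16_t)v0)
                    | ((uint64_t)(uint16_t)v1 << 16)
                    | ((uint64_t)(uint16_t)v2 << 32);
    return _mm_loadl_epi64((const __m128i *)&packed);
}

/* Returns 1 when both vectors hold identical bytes in all 16 positions. */
static int same_bits(__m128i a, __m128i b)
{
    return _mm_movemask_epi8(_mm_cmpeq_epi8(a, b)) == 0xFFFF;
}
```

Both helpers should produce the same 128-bit value for any inputs, which is why a single scalar-to-vector move is a valid replacement whenever the high part is known to be zero.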
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100257
--- Comment #3 from Richard Biener ---
Confirmed. We fail to elide the 'pixel' temporary, that is, express
memcpy (&pixel, src_33, 6);
_1 = pixel.b;
_2 = pixel.g;
_3 = pixel.r;
in terms of loads from src. Then the backend intrinsic e
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100257
Hongtao.liu changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |crazylht at gmail dot com
--- Comment #2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100257
Andrew Pinski changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |enhancement
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100257
--- Comment #1 from Andrew Pinski ---
Looks like a few missed optimizations at the tree level (and a target issue of
the store):
memcpy (&pixel, src_33, 6);
_1 = pixel.b;
_2 = pixel.g;
_3 = pixel.r;
val_2.0_21 = (short int) _1;
val_1