[Bug target/82897] Unnecessary zero-extension when loading mask register from memory
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82897

Hongtao Liu changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #13 from Hongtao Liu ---
Fixed in GCC 15.1.
[Bug target/82897] Unnecessary zero-extension when loading mask register from memory
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82897
--- Comment #12 from Hongtao Liu ---
(In reply to Andrew Pinski from comment #10)
> Looks like this was fixed in GCC 15:
> ```
> foo:
> .LFB7284:
> .cfi_startproc
> vmovd %edi, %xmm2
> vmovdqa32 %zmm1, %zmm4
> kmovw m(%rip), %k1
> vpsrad %xmm2, %zmm0, %zmm4{%k1}
> vmovdqa32 %zmm4, %zmm0
> ret
>
>
> ```
>
> Though for comment #5 we get:
> ```
> foo:
> .LFB7470:
> .cfi_startproc
> vmovdqa64 %zmm0, %zmm3
> vmovd %edi, %xmm2
> vmovdqa32 %zmm1, %zmm0
> kmovw m(%rip), %k1
> vmovdqa32 %zmm1, %zmm4
> vpslld %xmm2, %zmm3, %zmm0{%k1}
> kmovw m(%rip), %k2
> vpsrad %xmm2, %zmm3, %zmm4{%k2}
> vmovdqa32 %zmm0, zzz(%rip)
> vmovdqa32 %zmm4, %zmm0
> ret
> ```
>
>
> Note the extra kmovw.
The extra kmovw is gone if you add -mavx512bw.
[Bug target/82897] Unnecessary zero-extension when loading mask register from memory
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82897

Hongtao Liu changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |liuhongt at gcc dot gnu.org

--- Comment #11 from Hongtao Liu ---
Should be fixed by r15-22-gc19a674d03847b
[Bug target/82897] Unnecessary zero-extension when loading mask register from memory
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82897
--- Comment #10 from Andrew Pinski ---
Looks like this was fixed in GCC 15:
```
foo:
.LFB7284:
.cfi_startproc
vmovd %edi, %xmm2
vmovdqa32 %zmm1, %zmm4
kmovw m(%rip), %k1
vpsrad %xmm2, %zmm0, %zmm4{%k1}
vmovdqa32 %zmm4, %zmm0
ret
```
Though for comment #5 we get:
```
foo:
.LFB7470:
.cfi_startproc
vmovdqa64 %zmm0, %zmm3
vmovd %edi, %xmm2
vmovdqa32 %zmm1, %zmm0
kmovw m(%rip), %k1
vmovdqa32 %zmm1, %zmm4
vpslld %xmm2, %zmm3, %zmm0{%k1}
kmovw m(%rip), %k2
vpsrad %xmm2, %zmm3, %zmm4{%k2}
vmovdqa32 %zmm0, zzz(%rip)
vmovdqa32 %zmm4, %zmm0
ret
```
Note the extra kmovw.
But we get for the trunk:
```
foo:
.LFB7470:
.cfi_startproc
vmovdqa64 %zmm0, %zmm3
vmovd %edi, %xmm2
vmovdqa32 %zmm1, %zmm0
kmovw m(%rip), %k1
vmovdqa32 %zmm1, %zmm4
vpslld %xmm2, %zmm3, %zmm0{%k1}
vpsrad %xmm2, %zmm3, %zmm4{%k1}
vmovdqa32 %zmm0, zzz(%rip)
vmovdqa32 %zmm4, %zmm0
ret
```
Which looks fixed.
[Bug target/82897] Unnecessary zero-extension when loading mask register from memory
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82897

Uroš Bizjak changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |NEW
           Assignee|ubizjak at gmail dot com    |unassigned at gcc dot gnu.org

--- Comment #9 from Uroš Bizjak ---
Oh well ... it looks like the implementation wandered into areas of the
compiler I'm not familiar with ...

Unassigning myself, considering that at the end of the day, the prototype
patch looks more like a band-aid for some different problem.
[Bug target/82897] Unnecessary zero-extension when loading mask register from memory
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82897
--- Comment #8 from Jakub Jelinek ---
That does something different though.
But there is in C:
          if (targetm.calls.promote_prototypes (fundecl ? TREE_TYPE (fundecl) : 0)
              && INTEGRAL_TYPE_P (type)
              && (TYPE_PRECISION (type) < TYPE_PRECISION (integer_type_node)))
            parmval = default_conversion (parmval);
and in C++:
      else if (targetm.calls.promote_prototypes (type)
               && INTEGRAL_TYPE_P (type)
               && COMPLETE_TYPE_P (type)
               && tree_int_cst_lt (TYPE_SIZE (type), TYPE_SIZE (integer_type_node)))
        type = integer_type_node;
and
  else if (targetm.calls.promote_prototypes (type)
           && INTEGRAL_TYPE_P (type)
           && COMPLETE_TYPE_P (type)
           && tree_int_cst_lt (TYPE_SIZE (type), TYPE_SIZE (integer_type_node)))
    val = cp_perform_integral_promotions (val, complain);
This shows a clear inconsistency between C and C++: C passes the
FUNCTION_TYPE, while C++ passes the argument type.
If all the FEs passed the FUNCTION_TYPE/METHOD_TYPE at least, then the i386
target hook could decide, say, based on some custom attribute you'd use on
those builtins. Or we could change the target hook further and pass a fndecl
(if known) and type to the target hook.
[Bug target/82897] Unnecessary zero-extension when loading mask register from memory
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82897
--- Comment #7 from Uroš Bizjak ---
Another idea is to add a "nopromote" attribute to the builtin decl (in
ix86_add_new_builtins), detect this attribute in TARGET_PROMOTE_PROTOTYPES
and disable promotion in this case. If this approach works, we can perhaps
selectively add the "nopromote" attribute to masked builtins.
[Bug target/82897] Unnecessary zero-extension when loading mask register from memory
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82897
--- Comment #6 from Marek Polacek ---
There is "type generic" attribute which disables certain promotions I think
(or at least float -> double).
[Bug target/82897] Unnecessary zero-extension when loading mask register from memory
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82897
--- Comment #5 from Uroš Bizjak ---
Also,
#include <immintrin.h>
__mmask16 m;
__m512i zzz;
__m512i
foo (__m512i x, __m512i y, int a)
{
zzz = _mm512_mask_slli_epi32 (y, m, x, a);
return _mm512_mask_srai_epi32 (y, m, x, a);
}
defeats the proposed prototype patch, resulting in:
vmovd %edi, %xmm2
vmovdqa64 %zmm1, %zmm3
movzwl m(%rip), %eax
vmovdqa64 %zmm1, %zmm4
kmovw %eax, %k1
vpslld %xmm2, %zmm0, %zmm3{%k1}
vpsrad %xmm2, %zmm0, %zmm4{%k1}
vmovdqa64 %zmm3, zzz(%rip)
vmovdqa64 %zmm4, %zmm0
ret
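
For readers without AVX-512 hardware, the following is a scalar sketch of what a merge-masked arithmetic right shift does per lane. It is only a model under stated assumptions, not the intrinsic's definition: the names `LANES`, `sra32` and `mask_srai_model` are made up for illustration, and the shift count is assumed to be in the range 0..31. The mask parameter is deliberately 32 bits wide to show that only the low 16 bits are ever consulted, which is why zero-extending the mask in a general register before the `kmovw` adds nothing.

```c
#include <stdint.h>

enum { LANES = 16 };  /* 16 x 32-bit lanes, as in a 512-bit vector */

/* Portable arithmetic right shift (hypothetical helper).
   Assumes 0 <= count <= 31; avoids relying on implementation-defined
   behaviour of >> on negative values. */
static int32_t sra32(int32_t x, unsigned count)
{
    return x < 0 ? ~(~x >> count) : x >> count;
}

/* Scalar model of a merge-masked shift: lanes whose mask bit is set
   receive the shifted value from a[], all other lanes keep src[].
   Bits 16..31 of mask are never inspected. */
static void mask_srai_model(int32_t dst[LANES], const int32_t src[LANES],
                            uint32_t mask, const int32_t a[LANES],
                            unsigned count)
{
    for (int i = 0; i < LANES; i++)
        dst[i] = ((mask >> i) & 1u) ? sra32(a[i], count) : src[i];
}
```

Calling `mask_srai_model` with masks `0x0005` and `0xFFFF0005` gives identical results, since the upper half of the mask has no lane to select.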
[Bug target/82897] Unnecessary zero-extension when loading mask register from memory
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82897
--- Comment #4 from Uroš Bizjak ---
(In reply to Jakub Jelinek from comment #3)
> Because builtins are treated like any other function calls, and if
> short/char args are promoted for normal calls, they are promoted for
> builtins too.

Indeed. Disabling TARGET_PROMOTE_PROTOTYPES gets us the direct move from
memory.

So... do we really need to promote all these builtins that usually result in
a well known instruction? Is there a way to mark (some of ?) them with a
"nopromote" flag?
[Bug target/82897] Unnecessary zero-extension when loading mask register from memory
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82897

Jakub Jelinek changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org

--- Comment #3 from Jakub Jelinek ---
Because builtins are treated like any other function calls, and if short/char
args are promoted for normal calls, they are promoted for builtins too.
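
As a rough illustration of the promotion described in comment #3, here is a plain C sketch of the widening step applied to a 16-bit mask argument. The helper names `promoted_value` and `low16` are hypothetical and exist only for this example; the sketch assumes `int` is wider than `unsigned short`, as it is on x86.

```c
/* Widen a 16-bit mask to int, as argument promotion does.
   For an unsigned short this is a zero-extension and mirrors the
   `_2 = (int) m.1_1` statement in the .optimized dump. */
static int promoted_value(unsigned short m)
{
    return (int) m;
}

/* What a consumer that only needs 16 mask bits effectively reads back. */
static unsigned short low16(int widened)
{
    return (unsigned short) widened;
}
```

Since `low16 (promoted_value (m)) == m` for every 16-bit `m`, the widening carries no information for such a consumer; it only costs the extra zero-extending load seen in the generated code.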
[Bug target/82897] Unnecessary zero-extension when loading mask register from memory
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82897
--- Comment #2 from Uroš Bizjak ---
Hm, why does middle-end convert to integer in the first place? .optimized tree
dump reads:
foo (__m512i x, __m512i y, int a)
{
short unsigned int m.1_1;
int _2;
vector(16) int _7;
vector(16) int _8;
vector(16) int _9;
vector(8) long long int _10;
[local count: 1]:
m.1_1 = m;
_2 = (int) m.1_1;
_7 = VIEW_CONVERT_EXPR<vector(16) int>(y_5(D));
_8 = VIEW_CONVERT_EXPR<vector(16) int>(x_6(D));
_9 = __builtin_ia32_psradi512_mask (_8, a_3(D), _7, _2);
_10 = VIEW_CONVERT_EXPR<__m512i>(_9);
return _10;
}
[Bug target/82897] Unnecessary zero-extension when loading mask register from memory
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82897

Uroš Bizjak changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |ASSIGNED
   Last reconfirmed|                            |2017-11-08
           Assignee|unassigned at gcc dot gnu.org|ubizjak at gmail dot com
     Ever confirmed|0                           |1

--- Comment #1 from Uroš Bizjak ---
Created attachment 42561
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=42561&action=edit
Prototype patch

This patch changes predicate for mask register to nonimmediate operand. This
way, combine is able to simplify sequences of zero-extend and subreg
operators to a simple move.
