[Bug target/100165] fmov could be used to zero out the upper bits instead of movi/zip or movi/ins with __builtin_shuffle and zero vector

2025-06-03 Thread pzheng at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100165

Pengxuan Zheng  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|ASSIGNED|RESOLVED

--- Comment #10 from Pengxuan Zheng  ---
Fixed.

2025-05-16 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs

--- Comment #9 from GCC Commits  ---
The master branch has been updated by Pengxuan Zheng :

https://gcc.gnu.org/g:265fdb3fa91346f1be40111a9f3e8a0838f7d7fd

commit r16-704-g265fdb3fa91346f1be40111a9f3e8a0838f7d7fd
Author: Pengxuan Zheng 
Date:   Mon May 12 10:21:49 2025 -0700

aarch64: Add more vector permute tests for the FMOV optimization [PR100165]

This patch adds more tests for vector permutes which can now be optimized as
FMOV with the generic PERM change and the aarch64 AND patch.

Changes since v1:
* v2: Add -mlittle-endian to the little endian tests explicitly and rename the
  tests accordingly.

PR target/100165

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/fmov-3-be.c: New test.
* gcc.target/aarch64/fmov-3-le.c: New test.
* gcc.target/aarch64/fmov-4-be.c: New test.
* gcc.target/aarch64/fmov-4-le.c: New test.
* gcc.target/aarch64/fmov-5-be.c: New test.
* gcc.target/aarch64/fmov-5-le.c: New test.

Signed-off-by: Pengxuan Zheng 

2025-05-16 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs

--- Comment #8 from GCC Commits  ---
The master branch has been updated by Pengxuan Zheng :

https://gcc.gnu.org/g:0417a630811404c2362060b7e15f99e5a4a0d76a

commit r16-703-g0417a630811404c2362060b7e15f99e5a4a0d76a
Author: Pengxuan Zheng 
Date:   Mon May 12 10:12:11 2025 -0700

aarch64: Optimize AND with certain vector of immediates as FMOV [PR100165]

We can optimize AND with certain vector of immediates as FMOV if the result of
the AND is as if the upper lane of the input vector is set to zero and the lower
lane remains unchanged.

For example, at present:

v4hi
f_v4hi (v4hi x)
{
  return x & (v4hi){ 0xffff, 0xffff, 0, 0 };
}

generates:

f_v4hi:
        movi    d31, 0xffffffff
        and     v0.8b, v0.8b, v31.8b
        ret

With this patch, it generates:

f_v4hi:
        fmov    s0, s0
        ret

Changes since v1:
* v2: Simplify the mask checking logic by using native_decode_int and address a
  few other review comments.

PR target/100165

gcc/ChangeLog:

* config/aarch64/aarch64-protos.h (aarch64_output_fmov): New prototype.
(aarch64_simd_valid_and_imm_fmov): Likewise.
* config/aarch64/aarch64-simd.md (and3): Allow FMOV codegen.
* config/aarch64/aarch64.cc (aarch64_simd_valid_and_imm_fmov): New.
(aarch64_output_fmov): Likewise.
* config/aarch64/constraints.md (Df): New constraint.
* config/aarch64/predicates.md (aarch64_reg_or_and_imm): Update
predicate to support FMOV codegen.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/fmov-1-be.c: New test.
* gcc.target/aarch64/fmov-1-le.c: New test.
* gcc.target/aarch64/fmov-2-be.c: New test.
* gcc.target/aarch64/fmov-2-le.c: New test.

Signed-off-by: Pengxuan Zheng 
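
The condition described in the commit above (an AND mask that keeps the low part of the vector and zeroes everything above it) can be sketched in plain C. This is only an illustration under stated assumptions: `fmov_prefix_bytes` is a hypothetical helper, not GCC's `aarch64_simd_valid_and_imm_fmov`; the mask is given as bytes in little-endian memory order; and the accepted widths (2, 4 or 8 bytes, matching `fmov h`, `fmov s` and `fmov d`) are an assumption about which register moves are available.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, for illustration only (not part of GCC).
   MASK holds the AND immediate as NBYTES bytes in little-endian order.
   Returns the number of low bytes kept (2, 4 or 8) when the mask is
   all-ones in that low prefix and all-zero above it, i.e. when
   "x & mask" behaves like a scalar FMOV that zeroes the upper bits.
   Returns 0 when the mask does not have that shape.  */
static size_t
fmov_prefix_bytes (const uint8_t *mask, size_t nbytes)
{
  size_t keep = 0;

  /* Count the leading all-ones bytes.  */
  while (keep < nbytes && mask[keep] == 0xff)
    keep++;

  /* Everything above the kept prefix must be zero.  */
  for (size_t i = keep; i < nbytes; i++)
    if (mask[i] != 0)
      return 0;

  /* The mask must zero something, and the kept prefix must match a
     scalar register width (the 2-byte case is assumed to need
     half-precision support on the target).  */
  if (keep == nbytes)
    return 0;
  return (keep == 2 || keep == 4 || keep == 8) ? keep : 0;
}
```

For the v4hi mask `{ 0xffff, 0xffff, 0, 0 }` from the commit message, the byte image is four `0xff` bytes followed by four zero bytes, so the helper reports a 4-byte prefix, which corresponds to `fmov s0, s0`.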

2025-05-16 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs

--- Comment #7 from GCC Commits  ---
The master branch has been updated by Pengxuan Zheng :

https://gcc.gnu.org/g:dc501cb0dc857663f7fa762f3dbf0ae60973d2c3

commit r16-702-gdc501cb0dc857663f7fa762f3dbf0ae60973d2c3
Author: Pengxuan Zheng 
Date:   Wed May 7 10:47:37 2025 -0700

aarch64: Recognize vector permute patterns which can be interpreted as AND [PR100165]

Certain permutes that blend a vector with zero can be interpreted as an AND
with a mask. This idea was suggested by Richard Sandiford when he was reviewing
my patch which tries to optimize certain vector permutes with the FMOV
instruction for the aarch64 target.

For example, for the aarch64 target, at present:

v4hi
f_v4hi (v4hi x)
{
  return __builtin_shuffle (x, (v4hi){ 0, 0, 0, 0 }, (v4hi){ 4, 1, 6, 3 });
}

generates:

f_v4hi:
        uzp1    v0.2d, v0.2d, v0.2d
        adrp    x0, .LC0
        ldr     d31, [x0, #:lo12:.LC0]
        tbl     v0.8b, {v0.16b}, v31.8b
        ret
.LC0:
        .byte   -1
        .byte   -1
        .byte   2
        .byte   3
        .byte   -1
        .byte   -1
        .byte   6
        .byte   7

With this patch, it generates:

f_v4hi:
        mvni    v31.2s, 0xff, msl 8
        and     v0.8b, v0.8b, v31.8b
        ret

This patch also provides a target-independent routine for detecting vector
permute patterns which can be interpreted as AND.

Changes since v1:
* v2: Rework the patch to only perform the optimization for aarch64 by calling
  the target independent routine vec_perm_and_mask.

PR target/100165

gcc/ChangeLog:

* config/aarch64/aarch64.cc (aarch64_evpc_and): New.
(aarch64_expand_vec_perm_const_1): Call aarch64_evpc_and.
* optabs.cc (vec_perm_and_mask): New.
* optabs.h (vec_perm_and_mask): New prototype.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/and-be.c: New test.
* gcc.target/aarch64/and-le.c: New test.

Signed-off-by: Pengxuan Zheng 
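
The idea in the commit above, that a permute blending a vector with zero is equivalent to an AND with a mask, can be modelled in a few lines of C. This is a sketch under assumptions, not the code added to optabs.cc: `perm_as_and_mask` is a hypothetical name, lanes are fixed at 16 bits, and the zero vector is assumed to be the second shuffle operand, so selector values `0..n-1` pick lanes of `x` and `n..2n-1` pick lanes of the zero vector.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model, for illustration only (not GCC's
   vec_perm_and_mask).  SEL is the selector of a two-operand shuffle
   of N lanes whose second operand is the zero vector.  If every
   result lane is either lane I of x left in place or a zero lane,
   fill MASK with the equivalent AND mask and return true.  Otherwise
   return false, because some lane of x would have to move.  */
static bool
perm_as_and_mask (const unsigned *sel, size_t n, uint16_t *mask)
{
  for (size_t i = 0; i < n; i++)
    {
      if (sel[i] == i)
        mask[i] = 0xffff;   /* Lane i of x stays where it is.  */
      else if (sel[i] >= n && sel[i] < 2 * n)
        mask[i] = 0;        /* Lane is taken from the zero vector.  */
      else
        return false;       /* Not a pure blend with zero.  */
    }
  return true;
}
```

With the selector `{ 4, 1, 6, 3 }` from the commit message, lanes 0 and 2 come from the zero vector and lanes 1 and 3 stay in place, giving the mask `{ 0, 0xffff, 0, 0xffff }`. On little-endian that is `0xffff0000` in each 32-bit element, which matches the `mvni v31.2s, 0xff, msl 8` constant in the generated code.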

2024-10-31 Thread pinskia at gcc dot gnu.org via Gcc-bugs

Andrew Pinski  changed:

   What|Removed |Added

   URL||https://gcc.gnu.org/pipermail/gcc-patches/2024-October/667088.html
   Assignee|pinskia at gcc dot gnu.org |pzheng at gcc dot gnu.org
   Keywords||patch

--- Comment #6 from Andrew Pinski  ---
Patch was posted:
https://gcc.gnu.org/pipermail/gcc-patches/2024-October/667088.html

2024-02-27 Thread pinskia at gcc dot gnu.org via Gcc-bugs

--- Comment #5 from Andrew Pinski  ---
For the ones which produce ins, it should be easy to modify the pattern to emit
fmov for those cases, that is `elt == 0`:

(define_insn "aarch64_simd_vec_set_zero<mode>"
  [(set (match_operand:VALLS_F16 0 "register_operand" "=w")
        (vec_merge:VALLS_F16
            (match_operand:VALLS_F16 1 "aarch64_simd_imm_zero" "")
            (match_operand:VALLS_F16 3 "register_operand" "0")
            (match_operand:SI 2 "immediate_operand" "i")))]
  "TARGET_SIMD && exact_log2 (INTVAL (operands[2])) >= 0"
  {
    int elt = ENDIAN_LANE_N (<nunits>, exact_log2 (INTVAL (operands[2])));
    operands[2] = GEN_INT ((HOST_WIDE_INT) 1 << elt);
    return "ins\\t%0.<Vetype>[%p2], <vwcore>zr";
  }
)

2023-11-12 Thread pinskia at gcc dot gnu.org via Gcc-bugs

Andrew Pinski  changed:

   What|Removed |Added

 Status|NEW |ASSIGNED

2023-11-12 Thread pinskia at gcc dot gnu.org via Gcc-bugs

Andrew Pinski  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |pinskia at gcc dot gnu.org

--- Comment #4 from Andrew Pinski  ---
Mine, I will handle this. Most likely for GCC 15 though.

2023-11-12 Thread pinskia at gcc dot gnu.org via Gcc-bugs

Andrew Pinski  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2023-11-12
 Ever confirmed|0   |1

--- Comment #3 from Andrew Pinski  ---
Currently the trunk produces:
```
foo:
        ins     v0.d[1], xzr
        ret
foo1:
        movi    v31.4s, 0
        zip1    v0.2d, v0.2d, v31.2d
        ret
foo2:
        ins     v0.d[1], xzr
        ret
```

Which is better than 10.x but still not using fmov.

2023-11-12 Thread pinskia at gcc dot gnu.org via Gcc-bugs

--- Comment #2 from Andrew Pinski  ---
Created attachment 56564
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56564&action=edit
Full testcase

2021-08-25 Thread pinskia at gcc dot gnu.org via Gcc-bugs

--- Comment #1 from Andrew Pinski  ---
This:

V
foo (V x)
{
  return __builtin_shuffle (x, (V) { 0, 0, 0, 0 }, (VI) { 0, 1, 6, 7 });
}

Produces:
        movi    v1.4s, 0
        ins     v0.d[1], v1.d[1]

Which is better but fmov is still better :).