[Bug middle-end/120378] Support narrowing clip idiom

2025-10-02 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120378

Jeffrey A. Law  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|UNCONFIRMED |RESOLVED

--- Comment #9 from Jeffrey A. Law  ---
Fixed by Edwin's patch on the trunk.

[Bug middle-end/120378] Support narrowing clip idiom

2025-09-12 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120378

--- Comment #8 from GCC Commits  ---
The master branch has been updated by Edwin Lu :

https://gcc.gnu.org/g:d0c08415310d1daecb28f8fbfeef3d7f18b21d56

commit r16-3831-gd0c08415310d1daecb28f8fbfeef3d7f18b21d56
Author: Edwin Lu 
Date:   Mon Sep 8 10:48:45 2025 -0700

RISC-V: Support vnclip idiom testcase [PR120378]

This patch contains testcases for PR120378 after the change made to
support the vnclipu variant of the SAT_TRUNC pattern.

PR target/120378

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/pr120378-1.c: New test.
* gcc.target/riscv/rvv/autovec/pr120378-2.c: New test.
* gcc.target/riscv/rvv/autovec/pr120378-3.c: New test.
* gcc.target/riscv/rvv/autovec/pr120378-4.c: New test.

Signed-off-by: Edwin Lu 

[Bug middle-end/120378] Support narrowing clip idiom

2025-07-15 Thread ewlu at rivosinc dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120378

--- Comment #7 from Edwin Lu  ---
Progress update:
After discussing with Vineet and Robin, the optab for generating vnclipu does
exist through IFN_SAT_TRUNC, which can be seen in a couple of the saturating
arithmetic testcases Pan has been working on. The current patterns do support
matching on gimple min/max truncates

  _3 = in_12(D) + _2;
  _4 = *_3;
  _5 = MIN_EXPR <_4, 255>;
  iftmp.0_8 = (uint8_t) _5;
  _6 = out_13(D) + _1;
  *_6 = iftmp.0_8;

https://godbolt.org/z/q7j1T9bxf

I'm currently working through adding a pattern in match.pd to recognize the
sequence x264 generates and am encountering a problem in the generated
gimple-match-4.cc. 

>   _4 = *_3;
>   x.0_12 = (unsigned int) _4;
>   _38 = -x.0_12;
>   _15 = (int) _38;
>   _16 = _15 >> 31;
>   _29 = x.0_12 > 255;
>   _17 = _29 ? _16 : _4;
>   _18 = (unsigned char) _17;

Right now, I'm doing
 (match (unsigned_integer_sat_trunc @0)
  /* SAT_U_TRUNC = X & (NT)(-1) ? (-X) >> 31 : X

 The gimple representation uses X > (NT)(-1) instead of
 using & so match on gt instead of bit_and.  */
  (cond^ (gt @0 INTEGER_CST@1)
 (rshift:s (convert? (negate @0)) INTEGER_CST@2)
 @0)

which tries to check if the statement `_4 = *_3;` and `x.0_12 = (unsigned int)
_4;` are equivalent when trying to evaluate the else statement.

[Bug middle-end/120378] Support narrowing clip idiom

2025-06-24 Thread Dusan.Stojkovic--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120378

Dusan Stojkovic  changed:

   What|Removed |Added

 CC||[email protected]

--- Comment #6 from Dusan Stojkovic  ---
Here are some examples from inter_pred_filter functions in x265:
https://godbolt.org/z/7asjsqcj8

They are generalized to show different variations to consider. The examples
show how introducing a temporary 
variable before storing the result of the clipping produces:
```
vmsge.viv0,v1,0
vsetvli zero,zero,e8,mf2,ta,mu
vnsrl.wiv6,v1,0,v0.t
vsetvli zero,zero,e16,m1,ta,ma
vmsle.vvv0,v1,v5
vsetvli zero,zero,e8,mf2,ta,ma
vmerge.vvm  v1,v3,v6,v0
```

Shouldn't both cases generate the vmax/vmin/truncate pattern at least?

Curiously, when doing a signed clip, there are two different approaches taken
by GCC; this time the choice doesn't involve
the type of store, but rather the size difference between the type being
clipped and the resulting type.

There is a case where at the end with two functions which could be optimized
with:
```
...
csrwi   vxrm,0
...
vnclipu.wi  v1,v1,6
...
```
Here GCC chooses vmax/vmin/truncate regardless of introducing a temporary
variable or not.

[Bug middle-end/120378] Support narrowing clip idiom

2025-05-23 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120378

--- Comment #5 from Richard Biener  ---
(In reply to Robin Dapp from comment #4)
> Does it make sense to have the vmax/vmin/truncate pattern as a fallback for
> other targets?  On riscv it would save one predicated instruction.

I think if targets decide that they could implement the optab in this way,
so I'd not do this initially.

[Bug middle-end/120378] Support narrowing clip idiom

2025-05-23 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120378

--- Comment #3 from Robin Dapp  ---
vnclipu is basically a scaling (narrowing), rounding shift with subsequent
"clip" i.e. saturation.  Its input and output is unsigned, though, so for the
function above we first need to "clip" the negative values to 0 and then shift
twice from unsigned int to unsigned char.

So far I'm only aware of the vector insn but I think there are discussions
about a scalar one for a future extension.

There's also vnclip (signed -> signed).

An alternative to vnclipu would be vmax (vmin (...)) but then we'd still need
to truncate the result 2x.  Truncations are narrowing shifts as well, meaning
we'd need 4 instructions instead of 3.

[Bug middle-end/120378] Support narrowing clip idiom

2025-05-23 Thread rdapp at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120378

--- Comment #4 from Robin Dapp  ---
Does it make sense to have the vmax/vmin/truncate pattern as a fallback for
other targets?  On riscv it would save one predicated instruction.

[Bug middle-end/120378] Support narrowing clip idiom

2025-05-23 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120378

--- Comment #2 from Richard Biener  ---
So what does vnclipu do?  But yes, the way to fix is to add an optab for this,
a vectorizer pattern and/or a match rule (in case that insn is a thing for
non-vector as well).

[Bug middle-end/120378] Support narrowing clip idiom

2025-05-21 Thread liuhongt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120378

--- Comment #1 from Hongtao Liu  ---

> The ifcvt'ed code before vect is:
> 
>   _4 = *_3;
>   x.0_12 = (unsigned int) _4;
>   _38 = -x.0_12;
>   _15 = (int) _38;
>   _16 = _15 >> 31;
>   _29 = x.0_12 > 255;
>   _17 = _29 ? _16 : _4;
>   _18 = (unsigned char) _17;
> 

For the testcase in PR, I think x.0_12 > 255 must be false since it's
zero_extend from unsigned char. So the comparison can be optimized off?