[Bug middle-end/120378] Support narrowing clip idiom
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120378 Jeffrey A. Law changed: What|Removed |Added Resolution|--- |FIXED Status|UNCONFIRMED |RESOLVED --- Comment #9 from Jeffrey A. Law --- Fixed by Edwin's patch on the trunk.
[Bug middle-end/120378] Support narrowing clip idiom
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120378 --- Comment #8 from GCC Commits --- The master branch has been updated by Edwin Lu : https://gcc.gnu.org/g:d0c08415310d1daecb28f8fbfeef3d7f18b21d56 commit r16-3831-gd0c08415310d1daecb28f8fbfeef3d7f18b21d56 Author: Edwin Lu Date: Mon Sep 8 10:48:45 2025 -0700 RISC-V: Support vnclip idiom testcase [PR120378] This patch contains testcases for PR120378 after the change made to support the vnclipu variant of the SAT_TRUNC pattern. PR target/120378 gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/pr120378-1.c: New test. * gcc.target/riscv/rvv/autovec/pr120378-2.c: New test. * gcc.target/riscv/rvv/autovec/pr120378-3.c: New test. * gcc.target/riscv/rvv/autovec/pr120378-4.c: New test. Signed-off-by: Edwin Lu
[Bug middle-end/120378] Support narrowing clip idiom
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120378 --- Comment #7 from Edwin Lu --- Progress update: After discussing with Vineet and Robin, the optab for generating vnclipu does exist through IFN_SAT_TRUNC, which can be seen in a couple of the saturating arithmetic testcases Pan has been working on. The current patterns do support matching on gimple min/max truncates _3 = in_12(D) + _2; _4 = *_3; _5 = MIN_EXPR <_4, 255>; iftmp.0_8 = (uint8_t) _5; _6 = out_13(D) + _1; *_6 = iftmp.0_8; https://godbolt.org/z/q7j1T9bxf I'm currently working through adding a pattern in match.pd to recognize the sequence x264 generates and am encountering a problem in the generated gimple-match-4.cc. > _4 = *_3; > x.0_12 = (unsigned int) _4; > _38 = -x.0_12; > _15 = (int) _38; > _16 = _15 >> 31; > _29 = x.0_12 > 255; > _17 = _29 ? _16 : _4; > _18 = (unsigned char) _17; Right now, I'm doing (match (unsigned_integer_sat_trunc @0) /* SAT_U_TRUNC = X & (NT)(-1) ? (-X) >> 31 : X The gimple representation uses X > (NT)(-1) instead of using & so match on gt instead of bit_and. */ (cond^ (gt @0 INTEGER_CST@1) (rshift:s (convert? (negate @0)) INTEGER_CST@2) @0) which tries to check if the statement `_4 = *_3;` and `x.0_12 = (unsigned int) _4;` are equivalent when trying to evaluate the else statement.
[Bug middle-end/120378] Support narrowing clip idiom
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120378 Dusan Stojkovic changed: What|Removed |Added CC||[email protected] --- Comment #6 from Dusan Stojkovic --- Here are some examples from inter_pred_filter functions in x265: https://godbolt.org/z/7asjsqcj8 They are generalized to show different variations to consider. The examples show how introducing a temporary variable before storing the result of the clipping produces: ``` vmsge.viv0,v1,0 vsetvli zero,zero,e8,mf2,ta,mu vnsrl.wiv6,v1,0,v0.t vsetvli zero,zero,e16,m1,ta,ma vmsle.vvv0,v1,v5 vsetvli zero,zero,e8,mf2,ta,ma vmerge.vvm v1,v3,v6,v0 ``` Shouldn't both cases generate the vmax/vmin/truncate pattern at least? Curiously, when doing a signed clip, there are two different approaches taken by GCC; this time the choice doesn't involve the type of store, but rather the size difference between the type being clipped and the resulting type. There is a case where at the end with two functions which could be optimized with: ``` ... csrwi vxrm,0 ... vnclipu.wi v1,v1,6 ... ``` Here GCC chooses vmax/vmin/truncate regardless of introducing a temporary variable or not.
[Bug middle-end/120378] Support narrowing clip idiom
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120378 --- Comment #5 from Richard Biener --- (In reply to Robin Dapp from comment #4) > Does it make sense to have the vmax/vmin/truncate pattern as a fallback for > other targets? On riscv it would save one predicated instruction. I think if targets decide that they could implement the optab in this way, so I'd not do this initially.
[Bug middle-end/120378] Support narrowing clip idiom
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120378 --- Comment #3 from Robin Dapp --- vnclipu is basically a scaling (narrowing), rounding shift with subsequent "clip" i.e. saturation. Its input and output is unsigned, though, so for the function above we first need to "clip" the negative values to 0 and then shift twice from unsigned int to unsigned char. So far I'm only aware of the vector insn but I think there are discussions about a scalar one for a future extension. There's also vnclip (signed -> signed). An alternative to vnclipu would be vmax (vmin (...)) but then we'd still need to truncate the result 2x. Truncations are narrowing shifts as well, meaning we'd need 4 instructions instead of 3.
[Bug middle-end/120378] Support narrowing clip idiom
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120378 --- Comment #4 from Robin Dapp --- Does it make sense to have the vmax/vmin/truncate pattern as a fallback for other targets? On riscv it would save one predicated instruction.
[Bug middle-end/120378] Support narrowing clip idiom
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120378 --- Comment #2 from Richard Biener --- So what does vnclipu do? But yes, the way to fix is to add an optab for this, a vectorizer pattern and/or a match rule (in case that insn is a thing for non-vector as well).
[Bug middle-end/120378] Support narrowing clip idiom
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120378 --- Comment #1 from Hongtao Liu --- > The ifcvt'ed code before vect is: > > _4 = *_3; > x.0_12 = (unsigned int) _4; > _38 = -x.0_12; > _15 = (int) _38; > _16 = _15 >> 31; > _29 = x.0_12 > 255; > _17 = _29 ? _16 : _4; > _18 = (unsigned char) _17; > For the testcase in PR, I think x.0_12 > 255 must be false since it's zero_extend from unsigned char. So the comparison can be optimized off?
