[PATCH-3] Value Range: Add range op for builtin isnormal

2024-05-20 Thread HAO CHEN GUI
Hi, This patch adds the range op for builtin isnormal. It also adds two helper functions in frange to detect the range of normal floating-point values and the range of subnormal or zero values. Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no regressions. Is it OK for the trunk? Thanks Gui
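As a rough illustration of the value classes such a range op has to tell apart, the sketch below calls GCC's __builtin_isnormal on its argument. It is not taken from the patch; the helper name is_normal_value is hypothetical and exists only for this example.

```c
#include <float.h>
#include <math.h>

/* Hypothetical helper, not part of the patch: returns 1 for normal
   values and 0 for zero, subnormal, infinite, or NaN inputs.  */
int
is_normal_value (double x)
{
  return __builtin_isnormal (x) ? 1 : 0;
}
```

Calling it with 1.0 or DBL_MIN gives 1, while 0.0, DBL_MIN / 2 (a subnormal), INFINITY and NAN all give 0. A range known to hold only normal values, or only subnormal or zero values, therefore decides the result without looking at the exact value.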

[PATCH-2v3] Value Range: Add range op for builtin isfinite

2024-05-20 Thread HAO CHEN GUI
Hi, This patch adds the range op for builtin isfinite. Compared to the previous version, the main change is to set varying if nothing is known about the range. https://gcc.gnu.org/pipermail/gcc-patches/2024-May/650857.html Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
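The idea of answering isfinite from a range, and falling back to "varying" when the range decides nothing, can be sketched with a toy model. This is only an assumption-laden illustration: struct toy_range and toy_fold_isfinite are hypothetical names and do not reflect GCC's frange API.

```c
#include <math.h>
#include <stdbool.h>

/* Toy stand-in for a floating-point value range; not GCC's frange.  */
struct toy_range
{
  double lower;      /* Smallest possible non-NaN value.  */
  double upper;      /* Largest possible non-NaN value.  */
  bool maybe_nan;    /* True if the value might be a NaN.  */
};

/* Returns 1 if isfinite is known to be true, 0 if it is known to be
   false, and -1 ("varying") if the range does not decide the answer.  */
int
toy_fold_isfinite (struct toy_range r)
{
  /* Both bounds finite and no NaN possible: always finite.  */
  if (!r.maybe_nan && isfinite (r.lower) && isfinite (r.upper))
    return 1;
  /* The only non-NaN value is a single infinity: never finite,
     whether or not a NaN is also possible.  */
  if (isinf (r.lower) && isinf (r.upper) && r.lower == r.upper)
    return 0;
  return -1;
}
```

In this model a range of [0.0, 1.0] without NaN folds to 1, a range holding only +Inf folds to 0, and the full range [-Inf, +Inf] stays varying.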

[PATCH-1v2] Value Range: Add range op for builtin isinf

2024-05-20 Thread HAO CHEN GUI
Hi, The builtin isinf is not folded at the front end if the corresponding optab exists. This causes range evaluation to fail on targets that have optab_isinf. For instance, range-sincos.c will fail on targets that have optab_isinf, as it calls builtin_isinf. This patch fixed the problem

[PATCHv2] Optab: add isnormal_optab for __builtin_isnormal

2024-05-20 Thread HAO CHEN GUI
Hi, This patch adds an optab for __builtin_isnormal. The normal check can be implemented on rs6000 by a single instruction. It needs an optab so that it can be expanded to that specific sequence of instructions. The subsequent patches will implement the expand on rs6000. Compared to the previous version,

[PATCHv2] Optab: add isfinite_optab for __builtin_isfinite

2024-05-20 Thread HAO CHEN GUI
Hi, This patch adds an optab for __builtin_isfinite. The finite check can be implemented on rs6000 by a single instruction. It needs an optab so that it can be expanded to that specific sequence of instructions. The subsequent patches will implement the expand on rs6000. Compared to the previous version,

Re: [PATCH] Optab: add isfinite_optab for __builtin_isfinite

2024-05-19 Thread HAO CHEN GUI
Hi Andrew, On 2024/5/19 3:42, Andrew Pinski wrote: > This is missing adding documentation for the new optab. > It should be documented in md.texi under `Standard Pattern Names For > Generation` section. Thanks for your reminder. I will add documentation for all the patches. Thanks Gui Haochen

[PATCH-3v2, rs6000] Implement optab_isnormal for SFDF and IEEE128

2024-05-19 Thread HAO CHEN GUI
Hi, This patch implemented optab_isnormal for SFDF and IEEE128 using test data class instructions. Compared with the previous version, the main changes are to no longer test whether a pseudo can be created in the expand and to modify the dg-options and dg-finals of the test cases according to the reviewer's advice.

[PATCH-2v2, rs6000] Implement optab_isfinite for SFDF and IEEE128

2024-05-19 Thread HAO CHEN GUI
Hi, This patch implemented optab_isfinite for SFDF and IEEE128 using test data class instructions. Compared with the previous version, the main changes are to no longer test whether a pseudo can be created in the expand and to modify the dg-options and dg-finals of the test cases according to the reviewer's advice.

[PATCH-1v2, rs6000] Implement optab_isinf for SFDF and IEEE128

2024-05-19 Thread HAO CHEN GUI
Hi, This patch implemented optab_isinf for SFDF and IEEE128 using test data class instructions. Compared with the previous version, the main change is to modify the dg-options and dg-finals of the test cases according to the reviewer's advice. https://gcc.gnu.org/pipermail/gcc-patches/2024-March/648304.html

Re: [PATCH-4, rs6000] Implement optab_isnormal for SFmode, DFmode and TFmode [PR97786]

2024-05-16 Thread HAO CHEN GUI
Hi Segher, Thanks for your review comments. I will modify it and resend. Just one question on the insn condition. On 2024/5/17 1:25, Segher Boessenkool wrote: >> +(define_expand "isnormal2" >> + [(use (match_operand:SI 0 "gpc_reg_operand")) >> +(use (match_operand:SFDF 1 "gpc_reg_operand"))]

Re: [PATCHv2] Value range: Add range op for __builtin_isfinite

2024-05-15 Thread HAO CHEN GUI
Hi Andrew, Thanks so much for your explanation. I got it. I will address the issue. Thanks Gui Haochen On 2024/5/15 2:45, Andrew MacLeod wrote: > > On 5/9/24 04:47, HAO CHEN GUI wrote: >> Hi Mikael, >> >> Thanks for your comments. >> >> On 2024/5/9

Re: [PATCHv2] Value range: Add range op for __builtin_isfinite

2024-05-14 Thread HAO CHEN GUI
Hi Jakub, Thanks for your review comments. On 2024/5/14 23:57, Jakub Jelinek wrote: > BUILT_IN_ISFINITE is just one of many BUILT_IN_IS... builtins, > would be nice to handle the others as well. > > E.g. isnormal/isnan/isinf, fpclassify etc. > Yes, I already sent the patches which add range op

Re: [PATCHv2] Value range: Add range op for __builtin_isfinite

2024-05-14 Thread HAO CHEN GUI
Hi Mikael, Thanks for your comments. On 2024/5/9 16:03, Mikael Morin wrote: > I think the canonical API behaviour sets R to varying and returns true > instead of just returning false if nothing is known about the range. > > I'm not sure whether it makes any difference; Aldy can probably tell.

Re: [PATCH] rtlanal: Correct cost regularization in pattern_cost

2024-05-14 Thread HAO CHEN GUI
Hi, On 2024/5/10 20:50, Richard Biener wrote: > IMO given we're dispatching to the rtx_cost hook eventually it needs > documenting there or alternatively catching zero and adjusting its > result there. Of course cost == 0 ? 1 : cost is wrong as it makes > zero vs. one the same cost - using cost + 1

Re: [PATCHv2] Value range: Add range op for __builtin_isfinite

2024-05-13 Thread HAO CHEN GUI
Hi Aldy, Thanks for your review comments. On 2024/5/13 19:18, Aldy Hernandez wrote: > On Thu, May 9, 2024 at 10:05 AM Mikael Morin wrote: >> >> Hello, >> >> On 07/05/2024 at 04:37, HAO CHEN GUI wrote: >>> Hi, >>>The former patch adds isf

Ping [PATCH-1, rs6000] Add a new type of CC mode - CCBCD for bcd insns [PR100736, PR114732]

2024-05-12 Thread HAO CHEN GUI
] Replace explicit CC bit reverse with common format https://gcc.gnu.org/pipermail/gcc-patches/2024-May/650766.html [PATCH-6, rs6000] Split setcc to two insns after reload https://gcc.gnu.org/pipermail/gcc-patches/2024-May/650856.html Thanks Gui Haochen On 2024/4/30 15:18, HAO CHEN GUI wrote: > Hi, >

[PATCH] rtlanal: Correct cost regularization in pattern_cost

2024-05-12 Thread HAO CHEN GUI
Hi, The cost returned from set_src_cost might be zero. Zero for pattern_cost means unknown cost. So the regularization converts the zero to COSTS_N_INSNS (1). // pattern_cost cost = set_src_cost (SET_SRC (set), GET_MODE (SET_DEST (set)), speed); return cost > 0 ? cost : COSTS_N_INSNS
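The existing regularization quoted in this message can be sketched in plain C. This is a minimal stand-in, not GCC source: regularize_cost is a hypothetical name, and the COSTS_N_INSNS definition is reproduced here on the assumption that it matches the one in GCC's rtl.h.

```c
/* Assumed to match GCC's rtl.h: costs are scaled so that one simple
   instruction costs 4 units.  */
#define COSTS_N_INSNS(N) ((N) * 4)

/* Hypothetical stand-in for the tail of pattern_cost.  A raw cost of
   zero would read as "unknown", so it is replaced with the cost of
   one instruction; any positive cost is passed through unchanged.  */
int
regularize_cost (int raw_cost)
{
  return raw_cost > 0 ? raw_cost : COSTS_N_INSNS (1);
}
```

With this form a raw cost of 0 becomes 4, while raw costs of 1, 2 or 3 are kept, so a zero-cost source ends up more expensive than a source with a small positive cost.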

Re: [PATCH] rtlanal: Correct cost regularization in pattern_cost

2024-05-10 Thread HAO CHEN GUI
Hi Richard, Thanks for your comments. On 2024/5/10 15:16, Richard Biener wrote: > But if targets return sth < COSTS_N_INSNS (1) but > 0 this is now no > longer meaningful. So shouldn't it instead be > > return cost > 0 ? cost : 1; Yes, it's better. > > ? Alternatively returning fractions of

[PATCHv2] rs6000: Enable overlapped by-pieces operations

2024-05-10 Thread HAO CHEN GUI
Hi, This patch enables overlapped by-pieces operations. On rs6000, the default move/set/clear ratio is 2. So the overlap is only enabled with compare by-pieces. Compared to the previous version, the change is to remove the power8 requirement from the test case.
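To illustrate what an overlapped by-pieces compare means, here is a hand-written C sketch for a fixed 7-byte equality compare. It is not compiler output and not part of the patch; the function name equal7_overlapped is hypothetical. Without overlap, 7 bytes would need 4-byte, 2-byte and 1-byte pieces; with overlap, two 4-byte pieces cover the block.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical example: compare exactly 7 bytes for equality using
   two 4-byte loads per buffer.  The second load starts at offset 3,
   so it overlaps the first by one byte.  */
int
equal7_overlapped (const void *a, const void *b)
{
  uint32_t a_lo, a_hi, b_lo, b_hi;

  memcpy (&a_lo, a, 4);
  memcpy (&b_lo, b, 4);
  memcpy (&a_hi, (const char *) a + 3, 4);
  memcpy (&b_hi, (const char *) b + 3, 4);

  return a_lo == b_lo && a_hi == b_hi;
}
```

Byte 3 is checked twice, which is harmless for an equality test and saves one load and compare per buffer.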

[PATCH-1v2] fwprop: Replace rtx_cost with insn_cost in try_fwprop_subst_pattern [PR113325]

2024-05-09 Thread HAO CHEN GUI
Hi, This patch replaces rtx_cost with insn_cost in forward propagation. In the PR, one constant vector should be propagated and replace a pseudo in a store insn if we know it's a duplicated constant vector. It reduces the insn cost but not rtx cost. In this case, the cost is determined by

Re: [PATCH] rs6000: Enable overlapped by-pieces operations

2024-05-09 Thread HAO CHEN GUI
Hi Kewen, On 2024/5/9 13:44, Kewen.Lin wrote: > Why does it need power8 forced here? I thought it over. There is no need. For the sub-targets where the library is called, l[hb]z won't be generated either. Thanks Gui Haochen

Re: [PATCH] rs6000: Enable overlapped by-pieces operations

2024-05-09 Thread HAO CHEN GUI
Hi Kewen, Thanks for your comments. On 2024/5/9 13:44, Kewen.Lin wrote: > Hi, > > on 2024/5/8 14:47, HAO CHEN GUI wrote: >> Hi, >> This patch enables overlapped by-piece operations. On rs6000, default >> move/set/clear ratio is 2. So the overlap is only enable

[PATCH] rs6000: Enable overlapped by-pieces operations

2024-05-08 Thread HAO CHEN GUI
Hi, This patch enables overlapped by-pieces operations. On rs6000, the default move/set/clear ratio is 2. So the overlap is only enabled with compare by-pieces. Bootstrapped and tested on powerpc64-linux BE and LE with no regressions. Is it OK for the trunk? Thanks Gui Haochen ChangeLog rs6000:

Ping^3 [PATCH, rs6000] Split TImode for logical operations in expand pass [PR100694]

2024-05-07 Thread HAO CHEN GUI
Hi, As it's now stage 1, gently ping this: https://gcc.gnu.org/pipermail/gcc-patches/2023-February/611550.html Gui Haochen Thanks On 2023/4/24 13:35, HAO CHEN GUI wrote: > Hi, > Gently ping this: > https://gcc.gnu.org/pipermail/gcc-patches/2023-February/611550.html > > Thank

Ping [PATCH, RFC] combine: Don't truncate const operand of AND if it's no benefits

2024-05-07 Thread HAO CHEN GUI
Hi, Gently ping this: https://gcc.gnu.org/pipermail/gcc-patches/2024-March/647533.html Thanks Gui Haochen On 2024/3/18 17:10, HAO CHEN GUI wrote: > Hi, > Gently ping this: > https://gcc.gnu.org/pipermail/gcc-patches/2024-March/647533.html > > Thanks > Gui Haochen > >

Re: [Patch, rs6000] Enable overlap memory store for block memory clear

2024-05-07 Thread HAO CHEN GUI
Hi, As it's now stage 1, gently ping this: https://gcc.gnu.org/pipermail/gcc-patches/2024-February/646478.html Thanks Gui Haochen On 2024/2/26 10:25, HAO CHEN GUI wrote: > Hi, > This patch enables overlap memory store for block memory clear which > saves the number of store ins

[PATCHv2] Value range: Add range op for __builtin_isfinite

2024-05-06 Thread HAO CHEN GUI
Hi, The former patch adds isfinite optab for __builtin_isfinite. https://gcc.gnu.org/pipermail/gcc-patches/2024-April/649339.html Thus the builtin might not be folded at the front end. The range op for isfinite is needed for value range analysis. This patch adds it. Compared to the last version,

[PATCH-6, rs6000] Split setcc to two insns after reload

2024-05-06 Thread HAO CHEN GUI
Hi, It's the sixth patch of a series of patches optimizing CC modes on rs6000. This patch splits setcc to two separate insns after reload so that other insns can be inserted between them. It should increase the parallelism. The rotate_cr pattern still needs the info of the number of cr

[PATCH-5, rs6000] Replace explicit CC bit reverse with common format

2024-05-06 Thread HAO CHEN GUI
Hi, It's the fifth patch of a series of patches optimizing CC modes on rs6000. There are some explicit CR6 bit reverse (mfcr/xor) expands in vector.md. As the fourth patch optimized the CC bit reverse implementation, the patch changes the explicit format to the common format (testing if the bit is not

[PATCH-4, rs6000] Optimize single cc bit reverse implementation

2024-04-30 Thread HAO CHEN GUI
Hi, It's the fourth patch of a series of patches optimizing CC modes on rs6000. The single CC bit reverse can be implemented by setbcr on Power10 or isel on Power9 or mfcr on Power8 and below. Originally CCFP is not supported for isel and setbcr as bcd insns use CCFP and its bit reverse is not

[PATCH-3, rs6000] Set CC mode of vector string isolate insns to CCEQ

2024-04-30 Thread HAO CHEN GUI
Hi, It's the third patch of a series of patches optimizing CC modes on rs6000. This patch sets CC mode of vector string isolate insns to CCEQ instead of CCFP as these insns only set/check CR bit 2. Bootstrapped and tested on powerpc64-linux BE and LE with no regressions. Is it OK for the

[PATCH-2, rs6000] Add a new type of CC mode - CCLTEQ

2024-04-30 Thread HAO CHEN GUI
Hi, It's the second patch of a series of patches optimizing CC modes on rs6000. This patch adds a new type of CC mode - CCLTEQ, used for the cases which only set CR bits 0 and 2. Bits 1 and 3 are not used. The vector compare and test data class instructions are such cases. Bootstrapped and

[PATCH-1, rs6000] Add a new type of CC mode - CCBCD for bcd insns [PR100736, PR114732]

2024-04-30 Thread HAO CHEN GUI
Hi, It's the first patch of a series of patches optimizing CC modes on rs6000. bcd insns set all four bits of a CR field. But they have different single bit reverse behavior than CCFP's. The fourth bit of bcd cr fields is used to indicate overflow or invalid number. It's not a bit for unordered

Re: [PATCH] Value range: Add range op for __builtin_isfinite

2024-04-23 Thread HAO CHEN GUI
Yes, it's my typo. Thanks. Gui Haochen On 2024/4/23 17:10, rep.dot@gmail.com wrote: > On 12 April 2024 07:30:10 CEST, HAO CHEN GUI wrote: > > >> >> >> patch.diff >> diff --git a/gcc/gimple-range-op.cc b/gcc/gimple-range-op.cc >> index 9de130b4022..99c51

[PATCH, rs6000] Use bcdsub. instead of bcdadd. for bcd invalid number checking

2024-04-17 Thread HAO CHEN GUI
Hi, This patch replaces bcdadd. with bcdsub. for bcd invalid number checking. bcdadd on two identical numbers might cause overflow, which also sets the overflow/invalid bit, so we can't distinguish whether it's invalid or overflow. bcdsub doesn't have the problem, as subtracting two identical numbers never

[PATCH, rs6000] Fix test case bcd4.c

2024-04-16 Thread HAO CHEN GUI
Hi, This patch fixes the missing return statement in maxbcd of bcd-4.c. Without the return statement, it returns an invalid bcd number and makes the test ineffective. The patch also enables the test to run on Power9 and Big Endian, as all bcd instructions are supported from Power9. Bootstrapped and

[PATCH-4, rs6000] Implement optab_isnormal for SFmode, DFmode and TFmode [PR97786]

2024-04-12 Thread HAO CHEN GUI
Hi, This patch implemented optab_isnormal for SF/DF/TFmode using rs6000 test data class instructions. This patch relies on the former patch which adds optab_isnormal. https://gcc.gnu.org/pipermail/gcc-patches/2024-April/649366.html Bootstrapped and tested on powerpc64-linux BE and LE with no

[PATCH] Optab: add isnormal_optab for __builtin_isnormal

2024-04-12 Thread HAO CHEN GUI
Hi, This patch adds an optab for __builtin_isnormal. The normal check can be implemented on rs6000 by a single instruction. It needs an optab so that it can be expanded to that specific sequence of instructions. The subsequent patches will implement the expand on rs6000. Bootstrapped and tested on x86

[PATCH-3] Builtin: Fold builtin_isfinite on IBM long double to builtin_isfinite on double [PR97786]

2024-04-12 Thread HAO CHEN GUI
Hi, This patch folds builtin_isfinite on IBM long double to builtin_isfinite on double type. The former patch https://gcc.gnu.org/pipermail/gcc-patches/2024-April/649346.html implemented the DFmode isfinite_optab. Bootstrapped and tested on powerpc64-linux BE and LE with no regressions. Is it

[PATCH-2, rs6000] Implement optab_isfinite for SFmode, DFmode and TFmode [PR97786]

2024-04-12 Thread HAO CHEN GUI
Hi, This patch implemented optab_finite for SF/DF/TFmode using rs6000 test data class instructions. This patch relies on the former patch which adds optab_finite. https://gcc.gnu.org/pipermail/gcc-patches/2024-April/649339.html Bootstrapped and tested on powerpc64-linux BE and LE with no

[PATCH] Value range: Add range op for __builtin_isfinite

2024-04-11 Thread HAO CHEN GUI
Hi, The former patch adds isfinite optab for __builtin_isfinite. https://gcc.gnu.org/pipermail/gcc-patches/2024-April/649339.html Thus the builtin might not be folded at the front end. The range op for isfinite is needed for value range analysis. This patch adds it. Bootstrapped and tested

[PATCH] Optab: add isfinite_optab for __builtin_isfinite

2024-04-11 Thread HAO CHEN GUI
Hi, This patch adds an optab for __builtin_isfinite. The finite check can be implemented on rs6000 by a single instruction. It needs an optab so that it can be expanded to that specific sequence of instructions. The subsequent patches will implement the expand on rs6000. Bootstrapped and tested on x86

[Patch] Builtin: Fold builtin_isinf on IBM long double to builtin_isinf on double [PR97786]

2024-03-27 Thread HAO CHEN GUI
Hi, This patch folds builtin_isinf on IBM long double to builtin_isinf on double type. The former patch https://gcc.gnu.org/pipermail/gcc-patches/2024-March/648304.html implemented the DFmode isinf_optab. Bootstrapped and tested on powerpc64-linux BE and LE with no regressions. Is it OK for

[patch, rs6000] Implement optab_isinf for SFmode, DFmode and TFmode [PR97786]

2024-03-24 Thread HAO CHEN GUI
Hi, This patch implemented optab_isinf for SF/DF/TFmode by rs6000 test data class instructions. Bootstrapped and tested on powerpc64-linux BE and LE with no regressions. Is it OK for next stage 1? Thanks Gui Haochen ChangeLog rs6000: Implement optab_isinf for SFmode, DFmode and TFmode gcc/

[PATCH] Value Range: Add range op for builtin isinf

2024-03-24 Thread HAO CHEN GUI
Hi, The builtin isinf is not folded at the front end if the corresponding optab exists. This causes range evaluation to fail on targets that have optab_isinf. For instance, range-sincos.c will fail on targets that have optab_isinf, as it calls builtin_isinf. This patch fixed the problem

Re: [PATCH, RFC] combine: Don't truncate const operand of AND if it's no benefits

2024-03-18 Thread HAO CHEN GUI
Hi, Gently ping this: https://gcc.gnu.org/pipermail/gcc-patches/2024-March/647533.html Thanks Gui Haochen On 2024/3/11 13:41, HAO CHEN GUI wrote: > Hi, > This patch tries to fix the problem when a canonical form doesn't benefit > on a specific target. The cons

[PATCH, RFC] combine: Don't truncate const operand of AND if it's no benefits

2024-03-10 Thread HAO CHEN GUI
Hi, This patch tries to fix the problem when a canonical form doesn't benefit a specific target. The const operand of AND is ANDed with the nonzero bits of the other operand in the combine pass. It's a canonical form, but it has no benefit for targets which have rotate and mask insns. As the mask is

[PATCHv2, rs6000] Add subreg patterns for SImode rotate and mask insert

2024-03-08 Thread HAO CHEN GUI
Hi, This patch fixes regression cases in gcc.target/powerpc/rlwimi-2.c. In the combine pass, SImode (subreg from DImode) lshiftrt is converted to DImode lshiftrt with an outer AND. It matches a DImode rotate and mask insert on rs6000. Trying 2 -> 7: 2: r122:DI=r129:DI REG_DEAD r129:DI

[PATCHv2] fwprop: Avoid volatile defines to be propagated

2024-03-04 Thread HAO CHEN GUI
Hi, This patch tries to fix a potential problem which is raised by the patch for PR111267. With the patch for PR111267, a volatile asm operand may be propagated to a single set insn. The volatile asm operand might be executed multiple times if the define insn isn't eliminated after

Re: [PATCH] fwprop: Avoid volatile defines to be propagated

2024-03-04 Thread HAO CHEN GUI
Hi Jeff, On 2024/3/4 11:37, Jeff Law wrote: > Can the same thing happen with a volatile memory load? I don't think that > will be caught by the volatile_insn_p check. Yes, I think so. If the define rtx contains volatile memory references, it may hit the same problem. We may use volatile_refs_p

Re: [PATCH] fwprop: Avoid volatile defines to be propagated

2024-03-03 Thread HAO CHEN GUI
Hi Jeff, Thanks for your comments. On 2024/3/4 6:02, Jeff Law wrote: > Why specifically are you worried here? Propagation of a volatile shouldn't > in and of itself cause a problem. We're not changing the number of volatile > accesses or anything like that -- we're just moving them around a

[PATCH, rs6000] Add subreg patterns for SImode rotate and mask insert

2024-02-29 Thread HAO CHEN GUI
Hi, This patch fixes regression cases in gcc.target/powerpc/rlwimi-2.c. In the combine pass, SImode (subreg from DImode) lshiftrt is converted to DImode lshiftrt with an outer AND. It matches a DImode rotate and mask insert on rs6000. Trying 2 -> 7: 2: r122:DI=r129:DI REG_DEAD r129:DI

[PATCH] fwprop: Avoid volatile defines to be propagated

2024-02-25 Thread HAO CHEN GUI
Hi, This patch tries to fix a potential problem which is raised by the patch for PR111267. With the patch for PR111267, a volatile asm operand may be propagated to a single set insn. This is risky, as the resulting behavior is wrong. Currently set_src_cost comparison can reject such

[Patch, rs6000] Enable overlap memory store for block memory clear

2024-02-25 Thread HAO CHEN GUI
Hi, This patch enables overlap memory store for block memory clear which saves the number of store instructions. The expander calls widest_fixed_size_mode_for_block_clear to get the mode for looped block clear and calls widest_fixed_size_mode_for_block_clear to get the mode for last overlapped
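The store-count saving from an overlapped final store can be sketched by hand in C. This is an illustration only, not the rs6000 expander; clear_8_to_16 is a hypothetical name. For example, clearing 13 bytes without overlap takes 8-byte, 4-byte and 1-byte stores, while with overlap two 8-byte stores are enough.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical example: clear N bytes, 8 <= N <= 16, with two 8-byte
   stores.  The second store ends exactly at the end of the block and
   overlaps the first one whenever N < 16.  */
void
clear_8_to_16 (void *dst, size_t n)
{
  const uint64_t zero = 0;

  memcpy (dst, &zero, 8);
  memcpy ((char *) dst + n - 8, &zero, 8);
}
```

Because every stored byte is zero, writing the overlapping bytes twice does not change the result.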

[Patch-2, rs6000] Eliminate unnecessary byte swaps for duplicated constant vector store [PR113325]

2024-01-25 Thread HAO CHEN GUI
Hi, This patch creates an insn_and_split pattern which helps the duplicated constant vector replace the source pseudo of store insn in fwprop pass. Thus the store can be implemented by a single stxvd2x and it eliminates the unnecessary byte swap insn on P8 LE. The test case shows the

[PATCH-1] fwprop: Replace rtx_cost with insn_cost in try_fwprop_subst_pattern [PR113325]

2024-01-25 Thread HAO CHEN GUI
Hi, This patch replaces rtx_cost with insn_cost in forward propagation. In the PR, one constant vector should be propagated and replace a pseudo in a store insn if we know it's a duplicated constant vector. It reduces the insn cost but not rtx cost. In this case, the kind of destination operand

[PATCH, expand] Add const0 move checking for CLEAR_BY_PIECES optabs

2024-01-15 Thread HAO CHEN GUI
Hi, This patch adds const0 move checking for CLEAR_BY_PIECES. The original vec_duplicate handles duplicates of non-constant inputs. But 0 is a constant. So even if a platform doesn't support vec_duplicate, it could still do clear by pieces if it supports const0 move by that mode. The test cases

Re: [PATCH, rs6000] Refactor expand_compare_loop and split it to two functions

2024-01-15 Thread HAO CHEN GUI
Hi Kewen, On 2024/1/15 14:16, Kewen.Lin wrote: > Considering it's stage 4 now and the impact of this patch, let's defer > this to next stage 1, if possible could you organize the above changes > into patches: > > 1) Refactor expand_compare_loop by splitting into two functions without >any

[PATCH, rs6000] Enable block compare expand on P9 with m32 and mpowerpc64

2024-01-11 Thread HAO CHEN GUI
Hi, On P9, "setb" is used to set the result of block compare. So it works with m32 and mpowerpc64. On P8, the carry bit is used. So it can't work with m32 and mpowerpc64. This patch enables block compare expand for m32 and mpowerpc64 on P9. Bootstrapped and tested on x86 and powerpc64-linux BE and

Re: [Patch, rs6000] Eliminate unnecessary byte swaps for block clear on P8 LE [PR113325]

2024-01-11 Thread HAO CHEN GUI
Hi Richard, Thanks so much for your comments. >> patch.diff >> diff --git a/gcc/config/rs6000/rs6000-string.cc >> b/gcc/config/rs6000/rs6000-string.cc >> index 7f777666ba9..4c9b2cbeefc 100644 >> --- a/gcc/config/rs6000/rs6000-string.cc >> +++ b/gcc/config/rs6000/rs6000-string.cc >> @@ -140,7

[Patch, rs6000] Eliminate unnecessary byte swaps for block clear on P8 LE [PR113325]

2024-01-11 Thread HAO CHEN GUI
Hi, This patch eliminates unnecessary byte swaps for block clear on P8 LE. For block clear, all the bytes are set to zero, so the byte order doesn't matter. So the alignment of the destination could be set to the store mode size instead of 1 byte in order to eliminate unnecessary byte swap

[PATCH, rs6000] Refactor expand_compare_loop and split it to two functions

2024-01-09 Thread HAO CHEN GUI
Hi, This patch refactors function expand_compare_loop and splits it into two functions. One is for fixed length and the other is for variable length. These two functions share some low-level common helper functions. Besides the above changes, the patch also does: 1. Don't generate load and compare loop

[Patchv3, rs6000] Clean up pre-checkings of expand_block_compare

2023-12-20 Thread HAO CHEN GUI
Hi, This patch cleans up pre-checkings of expand_block_compare. It does 1. Assert that only P7 and above can enter this function, as it's already guarded by the expand. 2. Remove the P7 processor test, as only P7 and above can enter this function and P7 LE is excluded by targetm.slow_unaligned_access. On P7 BE, the

[Patch, rs6000] Call library for block memory compare when optimizing for size

2023-12-20 Thread HAO CHEN GUI
Hi, This patch calls the library function for block memory compare when optimizing for size. Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no regressions. Is this OK for trunk? Thanks Gui Haochen ChangeLog rs6000: Call library for block memory compare when optimizing for

[Patchv3, rs6000] Correct definition of macro of fixed point efficient unaligned

2023-12-20 Thread HAO CHEN GUI
Hi, The patch corrects the definition of TARGET_EFFICIENT_OVERLAPPING_UNALIGNED and replaces it with a call to slow_unaligned_access. Compared with the last version, https://gcc.gnu.org/pipermail/gcc-patches/2023-December/640832.html the main change is to pass alignment measured by bits to

[Patchv2, rs6000] Clean up pre-checkings of expand_block_compare

2023-12-17 Thread HAO CHEN GUI
Hi, This patch cleans up pre-checkings of expand_block_compare. It does 1. Assert that only P7 and above can enter this function, as it's already guarded by the expand. 2. Return false when optimizing for size. 3. Remove the P7 processor test, as only P7 and above can enter this function and P7 LE is excluded by

[Patchv2, rs6000] Correct definition of macro of fixed point efficient unaligned

2023-12-17 Thread HAO CHEN GUI
Hi, The patch corrects the definition of TARGET_EFFICIENT_OVERLAPPING_UNALIGNED and replaces it with a call to slow_unaligned_access. Compared with the last version, https://gcc.gnu.org/pipermail/gcc-patches/2023-December/640076.html the main change is to replace the macro with

[Patch, rs6000] Clean up pre-checking of expand_block_compare

2023-12-10 Thread HAO CHEN GUI
Hi, This patch cleans up pre-checking of expand_block_compare. It does 1. Assert that only P7 and above can enter this function, as it's already guarded by the expand. 2. Return false when optimizing for size. 3. Remove the P7 CPU test, as only P7 and above can enter this function and P7 LE is excluded by

[Patch, rs6000] Correct definition of macro of fixed point efficient unaligned

2023-12-10 Thread HAO CHEN GUI
Hi, The patch corrects the definition of TARGET_EFFICIENT_OVERLAPPING_UNALIGNED and changes its name to a more comprehensible one. Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no regressions. Is this OK for trunk? Thanks Gui Haochen ChangeLog rs6000: Correct definition of

Re: [patch-2v3, rs6000] Guard fctid on PowerPC64 and PowerPC476 [PR112707]

2023-12-07 Thread HAO CHEN GUI
Hi, The "fctid" is supported on 64-bit Power processors and PowerPC476. It needs a guard to check this. The patch fixes the issue. Compared with the last version, https://gcc.gnu.org/pipermail/gcc-patches/2023-December/639536.html the main change is to change the target requirement in pr88558*.c.

[patch-2v2, rs6000] guard fctid on PPC64 and powerpc 476 [PR112707]

2023-12-06 Thread HAO CHEN GUI
Hi, The "fctid" is supported on 64-bit Power processors and powerpc 476. It needs a guard to check this. The patch fixes the issue. Compared with the last version, https://gcc.gnu.org/pipermail/gcc-patches/2023-December/638859.html the main change is to define TARGET_FCTID to POWERPC64 or PPC476.

[patch-1v2, rs6000] enable fctiw on old archs [PR112707]

2023-12-06 Thread HAO CHEN GUI
Hi, SImode in a float register is supported on P7 and above. As a result, "fctiw" can't be generated on old 32-bit processors, as the output operand of the fctiw insn is an SImode in a float/double register. This patch fixes the problem by adding one expand and one insn pattern for fctiw. The output of new

[patch-1, rs6000] enable fctiw on old archs [PR112707]

2023-11-30 Thread HAO CHEN GUI
Hi, SImode in a float register is supported on P7 and above. As a result, "fctiw" can't be generated on old 32-bit processors, as the output operand of the fctiw insn is an SImode in a float/double register. This patch fixes the problem by adding an expand and an insn pattern for fctiw. The output of new pattern

[patch-2, rs6000] guard fctid on PPC64 and powerpc 476 [PR112707]

2023-11-30 Thread HAO CHEN GUI
Hi, The "fctid" is supported on 64-bit Power processors and powerpc 476. It needs a guard to check this. The patch fixes the issue. Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no regressions. Is this OK for trunk? Thanks Gui Haochen ChangeLog rs6000: guard fctid on PPC64

[PATCH] Expand: Pass down equality only flag to cmpmem expand

2023-11-27 Thread HAO CHEN GUI
Hi, This patch passes down the equality only flags from emit_block_cmp_hints to cmpmem optab so that the target specific expand can generate optimized insns for equality only compare. Targets (e.g. rs6000) can generate more efficient insn sequence if the block compare is equality only.

[PATCHv2] Clean up by_pieces_ninsns

2023-11-22 Thread HAO CHEN GUI
Hi, This patch cleans up by_pieces_ninsns and does the following things. 1. Do the length and alignment adjustment for by pieces compare when overlap operation is enabled. 2. Replace unnecessary mov_optab checks with gcc assertions. Compared to the last version, the main change is to replace

[PATCH] Clean up by_pieces_ninsns

2023-11-14 Thread HAO CHEN GUI
Hi, This patch cleans up by_pieces_ninsns and does the following things. 1. Do the length and alignment adjustment for by pieces compare when overlap operation is enabled. 2. Remove unnecessary mov_optab checks. Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no regressions. Is

[PATCH] Clean up

2023-11-14 Thread HAO CHEN GUI
Hi, This patch cleans up by_pieces_ninsns and does the following things. 1. Do the length and alignment adjustment for by pieces compare when overlap operation is enabled. 2. Remove unnecessary mov_optab checks. Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no regressions. Is

Re: Fwd: [PATCH, expand] Call misaligned memory reference in expand_builtin_return [PR112417]

2023-11-13 Thread HAO CHEN GUI
Sorry, forgot to cc gcc-patches. On 2023/11/13 16:05, HAO CHEN GUI wrote: > Andrew, > Could you kindly inform us what's the functionality of __objc_forward? > Does it change the memory content pointed by args? Thanks a lot. > > Thanks > Gui Haochen > > > libobjc/se

Re: [PATCH, expand] Call misaligned memory reference in expand_builtin_return [PR112417]

2023-11-10 Thread HAO CHEN GUI
Hi Richard, On 2023/11/10 17:06, Richard Biener wrote: > On Fri, Nov 10, 2023 at 8:52 AM HAO CHEN GUI wrote: >> >> Hi Richard, >> Thanks so much for your comments. >> >> On 2023/11/9 19:41, Richard Biener wrote: >>> I'm not sure if the testcase

[PATCH-3v4, rs6000] Fix regression cases caused 16-byte by pieces move [PR111449]

2023-11-10 Thread HAO CHEN GUI
Hi, Originally 16-byte memory to memory is expanded via pattern. expand_block_move does an optimization on P8 LE to leverage V2DI reversed load/store for memory to memory move. Now it's done by 16-byte by pieces move and the optimization is lost. This patch adds an insn_and_split pattern to

Re: [PATCH, expand] Call misaligned memory reference in expand_builtin_return [PR112417]

2023-11-09 Thread HAO CHEN GUI
Hi Richard, Thanks so much for your comments. On 2023/11/9 19:41, Richard Biener wrote: > I'm not sure if the testcase is valid though? > > @defbuiltin{{void} __builtin_return (void *@var{result})} > This built-in function returns the value described by @var{result} from > the containing function.

[PATCH, expand] Call misaligned memory reference in expand_builtin_return [PR112417]

2023-11-08 Thread HAO CHEN GUI
Hi, This patch modifies expand_builtin_return and makes it call expand_misaligned_mem_ref to load unaligned memory. The memory reference pointed to by a void* pointer might be unaligned, so expanding it with unaligned move optabs is safe. The new test case illustrates the problem. rs6000 doesn't

[PATCH-3v3, rs6000] Fix regression cases caused 16-byte by pieces move [PR111449]

2023-11-08 Thread HAO CHEN GUI
Hi, Originally 16-byte memory to memory is expanded via pattern. expand_block_move does an optimization on P8 LE to leverage V2DI reversed load/store for memory to memory move. Now it's done by 16-byte by pieces move and the optimization is lost. This patch adds an insn_and_split pattern to

[PATCH-2v2, rs6000] Enable vector mode for by pieces equality compare [PR111449]

2023-11-08 Thread HAO CHEN GUI
Hi, This patch enables vector mode for by pieces equality compare. It adds a new expand pattern - cbranchv16qi4 and sets MOVE_MAX_PIECES and COMPARE_MAX_PIECES to 16 bytes when P8 vector is enabled. The compare relies on both move and compare instructions, so both macros are changed. As the vector

Re: [PATCH-2, rs6000] Enable vector mode for by pieces equality compare [PR111449]

2023-11-06 Thread HAO CHEN GUI
Hi Kewen, Thanks for your review comments. Just one question on the following comment. On 2023/11/7 10:40, Kewen.Lin wrote: > Nit: has_arch_pwr8 would make it un-tested on Power7 default env, I'd prefer > to remove this "has_arch_pwr8" and append "-mdejagnu-cpu=power8" to > dg-options. My original

[PATCH-3v2, rs6000] Enable 16-byte by pieces move [PR111449]

2023-11-06 Thread HAO CHEN GUI
Hi, Patch 2 enables 16-byte by pieces move on rs6000. This patch fixes the regression cases caused by the previous patch. For sra-17/18, the long array with 4 elements can be loaded by one 16-byte by pieces move on a 32-bit platform. So the array is not constructed in LC0 and SRA optimization

[PATCH-3, rs6000] Enable 16-byte by pieces move [PR111449]

2023-11-05 Thread HAO CHEN GUI
Hi, Patch 2 enables 16-byte by pieces move on rs6000. This patch fixes the regression cases caused by the previous patch. For sra-17/18, the long array with 4 elements can be loaded by one 16-byte by pieces move on a 32-bit platform. So the array is not constructed in LC0 and SRA optimization

[PATCH-2, rs6000] Enable vector mode for by pieces equality compare [PR111449]

2023-11-05 Thread HAO CHEN GUI
Hi, This patch enables vector mode for by pieces equality compare. It adds a new expand pattern - cbranchv16qi4 and sets MOVE_MAX_PIECES and COMPARE_MAX_PIECES to 16 bytes when P8 vector is enabled. The compare relies on both move and compare instructions, so both macros are changed. The vector

Re: [PATCH, expand] Checking available optabs for scalar modes in by pieces operations

2023-10-29 Thread HAO CHEN GUI
Committed as r14-5001. Thanks Gui Haochen On 2023/10/27 17:29, Richard Sandiford wrote: > HAO CHEN GUI writes: >> Hi, >> This patch checks available optabs for scalar modes used in by >> pieces operations. It fixes the regression cases caused by previous >> patch. Now b

[PATCH, expand] Checking available optabs for scalar modes in by pieces operations

2023-10-27 Thread HAO CHEN GUI
Hi, This patch checks available optabs for scalar modes used in by pieces operations. It fixes the regression cases caused by the previous patch. Now both scalar and vector modes are examined by the same approach. Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no regressions.

Re: [PATCH-1v4, expand] Enable vector mode for compare_by_pieces [PR111449]

2023-10-25 Thread HAO CHEN GUI
Hi Haochen, The regression cases are caused by "targetm.scalar_mode_supported_p" I added for scalar mode checking. XImode, OImode and TImode (with -m32) are not enabled in ix86_scalar_mode_supported_p. So they're excluded from by pieces operations on i386. The original code doesn't do a check

Re: [PATCH-1v4, expand] Enable vector mode for compare_by_pieces [PR111449]

2023-10-24 Thread HAO CHEN GUI
; Sent: Tuesday, October 24, 2023 4:43 PM > To: HAO CHEN GUI ; Richard Sandiford > > Cc: gcc-patches > Subject: RE: [PATCH-1v4, expand] Enable vector mode for compare_by_pieces > [PR111449] > > Hi Haochen Gui, > > It seems that the commit caused lots of test case fai

Re: [PATCH-1v4, expand] Enable vector mode for compare_by_pieces [PR111449]

2023-10-22 Thread HAO CHEN GUI
Committed as r14-4835. https://gcc.gnu.org/g:f08ca5903c7a02b450b93143467f70b9fd8e0085 Thanks Gui Haochen On 2023/10/20 16:49, Richard Sandiford wrote: > HAO CHEN GUI writes: >> Hi, >> Vector mode instructions are efficient for compare on some targets. >> This patc

[PATCH-1v4, expand] Enable vector mode for compare_by_pieces [PR111449]

2023-10-20 Thread HAO CHEN GUI
Hi, Vector mode instructions are efficient for compare on some targets. This patch enables vector mode for compare_by_pieces. Two helper functions are added to check if vector mode is available for certain by pieces operations and if optabs exist for the mode and certain by pieces operations.

Re: [PATCH-2, rs6000] Enable vector mode for memory equality compare [PR111449]

2023-10-19 Thread HAO CHEN GUI
Kewen & David, Thanks for your comments. On 2023/10/17 10:19, Kewen.Lin wrote: > I think David raised a good question, it sounds to me that the current > handling simply consider that if MOVE_MAX_PIECES is set to 16, the > required operations for this optimization on TImode are always available, >

[PATCH-1v3, expand] Enable vector mode for compare_by_pieces [PR111449]

2023-10-13 Thread HAO CHEN GUI
Hi, Vector mode instructions are efficient for compare on some targets. This patch enables vector mode for compare_by_pieces. Currently, vector mode is enabled for compare, set and clear. The helper function "qi_vector_p" decides if vector mode is enabled for a certain by pieces operation.

Re: [PATCH-2, rs6000] Enable vector mode for memory equality compare [PR111449]

2023-10-11 Thread HAO CHEN GUI
Hi David, On 2023/10/10 20:44, David Edelsohn wrote: > Are you stating that although PPC32 supports V16QImode in VSX, the > move_by_pieces support also requires TImode, which is not available on PPC32? > Yes. By setting MOVE_MAX_PIECES to 16, TImode compare might be generated as it checks vector

[PATCH-2v2, rs6000] Enable vector mode for memory equality compare [PR111449]

2023-10-11 Thread HAO CHEN GUI
Hi, This patch enables vector mode for memory equality compare by adding a new expand cbranchv16qi4 and implementing it. Also the corresponding CC reg and compare code is set in rs6000_generate_compare. With the patch, 16-byte equality compare can be implemented by one vector compare
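
For readers skimming the thread, the kind of source code this series targets can be sketched as below. This is only an illustration, not code from the patch: the function names equal16 and check_equal16 are hypothetical, and whether a given compiler actually expands the memcmp call inline into a single vector compare depends on the target, the options and the GCC version.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical example: a fixed-size 16-byte equality test.
   Only equality with zero is tested, so the compiler is free to
   expand the call inline "by pieces" instead of calling memcmp.  */
static int equal16(const void *a, const void *b)
{
    return memcmp(a, b, 16) == 0;
}

/* Small self-check of the helper above.  */
static void check_equal16(void)
{
    unsigned char x[16] = {0};
    unsigned char y[16] = {0};

    assert(equal16(x, y));   /* identical blocks compare equal */
    y[15] = 1;
    assert(!equal16(x, y));  /* a difference in the last byte is seen */
}
```

Compiling such a function for the target of interest and inspecting the generated assembly is one way to see whether the by pieces expansion was used.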

[PATCH-1v2, expand] Enable vector mode for compare_by_pieces [PR111449]

2023-10-11 Thread HAO CHEN GUI
Hi, Vector mode instructions are efficient on some targets (e.g. ppc64). This patch enables vector mode for compare_by_pieces. The non-member function widest_fixed_size_mode_for_size takes by_pieces_operation as the second argument and decides whether vector mode is enabled or not by the type of
