Re: [PATCH] libbid: Set rounding mode to round-to-nearest for _Decimal128 arithmetic

2025-08-31 Thread Hongtao Liu
On Sun, Aug 31, 2025 at 11:15 AM H.J. Lu wrote: > > Since _Decimal128 arithmetic requires the round-to-nearest rounding > mode, define DFP_INIT_ROUNDMODE and DFP_RESTORE_ROUNDMODE, similar to > FP_INIT_ROUNDMODE in sfp-machine.h, to set the rounding mode to > round-to-nearest at _Decimal128 relate

Re: [PATCH v2] x86-64: Improve source operand check for TLS_CALL

2025-08-28 Thread Hongtao Liu
On Thu, Aug 28, 2025 at 10:15 AM H.J. Lu wrote: > > Source operands of 2 TLS_CALL patterns in > > (insn 10 9 11 3 (set (reg:DI 100) > (unspec:DI [ > (symbol_ref:DI ("caml_state") [flags 0x10] <var_decl 0x7fe10e1d9e40 caml_state>) > ] UNSPEC_TLSDESC)) "x.c":7:16 1674 {*t

Re: [PATCH] Restrict avx256_avoid_vec_perm only for loop vectorization.

2025-08-27 Thread Hongtao Liu
On Wed, Aug 27, 2025 at 4:53 PM Richard Biener wrote: > > On Wed, Aug 27, 2025 at 6:57 AM liuhongt wrote: > > > > Since kind == vec_perm may not be a real vec_perm, just a broadcast or > > simple load in BB vectorizer. > > Btw, you can now (in some cases) do better, namely you should > always hav

Re: [PATCH] x86-64: Better compare source operands of *tls_dynamic_gnu2_call_64_di

2025-08-26 Thread Hongtao Liu
On Wed, Aug 27, 2025 at 6:32 AM H.J. Lu wrote: > > Source operands of 2 *tls_dynamic_gnu2_call_64_di patterns in > > (insn 10 9 11 3 (set (reg:DI 100) > (unspec:DI [ > (symbol_ref:DI ("caml_state") [flags 0x10] <var_decl 0x7fe10e1d9e40 caml_state>) > ] UNSPEC_TLSDESC))

Re: [PATCH] x86-64: Emit the TLS call after debug marker

2025-08-26 Thread Hongtao Liu
On Wed, Aug 27, 2025 at 11:59 AM H.J. Lu wrote: > > On Tue, Aug 26, 2025 at 8:50 PM Hongtao Liu wrote: > > > > On Wed, Aug 27, 2025 at 6:30 AM H.J. Lu wrote: > > > > > > For a basic block with only a debug marker: > > > > > > (note 3 0

Re: [PATCH] x86-64: Emit the TLS call after debug marker

2025-08-26 Thread Hongtao Liu
On Wed, Aug 27, 2025 at 6:30 AM H.J. Lu wrote: > > For a basic block with only a debug marker: > > (note 3 0 2 2 [bb 2] NOTE_INSN_BASIC_BLOCK) > (note 2 3 5 2 NOTE_INSN_FUNCTION_BEG) > (debug_insn 5 2 16 2 (debug_marker) "x.c":6:3 -1 (nil)) > > emit the TLS call after debug marker. > > gcc/ > >

Re: [PATCH v2] [x86] Enable unroll in the vectorizer when there's reduction for FMA/DOT_PROD_EXPR/SAD_EXPR

2025-08-26 Thread Hongtao Liu
On Tue, Aug 26, 2025 at 5:31 PM Jan Hubicka wrote: > > > > > > > In general we should have a look at register pressure, I > > > > > > suppose issue_rate / m_num_reductions ensures we're never > > > > > > getting close to this in practice. > > > > > > > > > > Bootstrapped and regtested on x86_64-pc

Re: [PATCH v2] [x86] Enable unroll in the vectorizer when there's reduction for FMA/DOT_PROD_EXPR/SAD_EXPR

2025-08-25 Thread Hongtao Liu
On Mon, Aug 18, 2025 at 4:52 PM Hongtao Liu wrote: > > On Mon, Aug 11, 2025 at 8:57 PM Richard Biener wrote: > > > > On Sun, 10 Aug 2025, liuhongt wrote: > > > > > > > > > > The comment doesn't match the bool type. > > > > >

Re: [PATCH] Fix an ICE with recent GFNI changes

2025-08-25 Thread Hongtao Liu
On Tue, Aug 26, 2025 at 9:38 AM Andi Kleen wrote: > > From: Andi Kleen > > Make the expand pattern for operand 1 match the final instruction. > > PR 121658 > > gcc/ChangeLog: > > * config/i386/sse.md ("3"): Use > register_operand for rotate patterns. > > gcc/testsuite/Chan

Re: [PATCH] i386: Fix up recent changes to use GFNI for rotates/shifts [PR121658]

2025-08-25 Thread Hongtao Liu
On Tue, Aug 26, 2025 at 6:40 AM Jakub Jelinek wrote: > > Hi! > > The vgf2p8affineqb_ pattern uses "register_operand" > predicate for the first input operand, so using "general_operand" > for the rotate operand passed to it leads to ICEs, and so does > the "nonimmediate_operand" in the v16qi3 defin

Re: [PATCH v3] Use x86 GFNI for vectorized constant byte shifts/rotates

2025-08-23 Thread Hongtao Liu
On Sat, Aug 23, 2025 at 1:34 AM Andi Kleen wrote: > > From: Andi Kleen > > [v3 version: Remove unnecessary _mask pattern. > Add extra FAIL case. Remove unnecessary AVX512F check. > Fix changelog.] > > [v2 version: Split rotate patterns in V16QI and V32/64QI. > Add various AVX512F checks. Remove s

Re: [PATCH v2] Use x86 GFNI for vectorized constant byte shifts/rotates

2025-08-23 Thread Hongtao Liu
On Fri, Aug 22, 2025 at 11:26 PM Andi Kleen wrote: > > > > + else if (TARGET_GFNI && TARGET_AVX512F && CONST_INT_P (operands[2])) > > I don't think we need AVX512F here, and let's exclude >>7 cases here, > > so better be. > > else if (TARGET_GFNI > > && CONST_INT_P (operands[2]) > >

Re: [PATCH v2] Use x86 GFNI for vectorized constant byte shifts/rotates

2025-08-21 Thread Hongtao Liu
On Wed, Aug 20, 2025 at 11:08 PM Andi Kleen wrote: > > From: Andi Kleen > > [v2 version: Split rotate patterns in V16QI and V32/64QI. > Add various AVX512F checks. Remove some unnecessary > masks. Add untested cond_ pattern (untested, couldn't trigger it) > Clean up some control flow. Use narrowe

Re: [PATCH] x86-64: Emit the TLS call after NOTE_INSN_BASIC_BLOCK

2025-08-20 Thread Hongtao Liu
On Thu, Aug 21, 2025 at 3:46 AM H.J. Lu wrote: > > For a basic block with only a label: > > (code_label 78 11 77 3 14 (nil) [1 uses]) > (note 77 78 54 3 [bb 3] NOTE_INSN_BASIC_BLOCK) > > emit the TLS call after NOTE_INSN_BASIC_BLOCK, instead of before > NOTE_INSN_BASIC_BLOCK, to avoid > > x.c: In

Re: [PATCH v4] x86: Place the TLS call before all register setting BBs

2025-08-19 Thread Hongtao Liu
On Wed, Aug 20, 2025 at 2:49 AM H.J. Lu wrote: > > We can't place a TLS call before a conditional jump in a basic block like > > (code_label 13 11 14 4 2 (nil) [1 uses]) > (note 14 13 16 4 [bb 4] NOTE_INSN_BASIC_BLOCK) > (jump_insn 16 14 17 4 (set (pc) > (if_then_else (le (reg:CCNO 17 flag

Re: [PATCH v3] x86: Place the TLS call before all register setting BBs

2025-08-18 Thread Hongtao Liu
On Tue, Aug 19, 2025 at 10:51 AM H.J. Lu wrote: > > We can't place a TLS call before a conditional jump in a basic block like > > (code_label 13 11 14 4 2 (nil) [1 uses]) > (note 14 13 16 4 [bb 4] NOTE_INSN_BASIC_BLOCK) > (jump_insn 16 14 17 4 (set (pc) > (if_then_else (le (reg:CCNO 17 fla

Re: [PATCH v2] x86: Place the TLS call before all register setting BBs

2025-08-18 Thread Hongtao Liu
On Tue, Aug 19, 2025 at 10:55 AM H.J. Lu wrote: > > On Mon, Aug 18, 2025 at 6:56 PM Hongtao Liu wrote: > > > > On Tue, Aug 19, 2025 at 4:40 AM H.J. Lu wrote: > > > > > > On Mon, Aug 18, 2025 at 12:59 AM Hongtao Liu wrote: > > > > > >

Re: [PATCH v2] x86: Place the TLS call before all register setting BBs

2025-08-18 Thread Hongtao Liu
On Tue, Aug 19, 2025 at 4:40 AM H.J. Lu wrote: > > On Mon, Aug 18, 2025 at 12:59 AM Hongtao Liu wrote: > > > > On Mon, Aug 18, 2025 at 4:12 PM Hongtao Liu wrote: > > > > > > On Mon, Aug 18, 2025 at 4:50 AM H.J. Lu wrote: > > > > > > >

Re: [PATCH v2] [x86] Enable unroll in the vectorizer when there's reduction for FMA/DOT_PROD_EXPR/SAD_EXPR

2025-08-18 Thread Hongtao Liu
On Mon, Aug 11, 2025 at 8:57 PM Richard Biener wrote: > > On Sun, 10 Aug 2025, liuhongt wrote: > > > > > > > The comment doesn't match the bool type. > > > > > Fixed. > > > > > > > > is_gimple_assign (stmt_info->stmt) > > > > > Changed. > > > > > There's also SAD_EXPR? The vectorizer has lane_red

Re: [PATCH v2] x86: Place the TLS call before all register setting BBs

2025-08-18 Thread Hongtao Liu
On Mon, Aug 18, 2025 at 4:12 PM Hongtao Liu wrote: > > On Mon, Aug 18, 2025 at 4:50 AM H.J. Lu wrote: > > > > We can't place a TLS call before a conditional jump in a basic block like > > > > (code_label 13 11 14 4 2 (nil) [1 uses]) > > (not

Re: [PATCH v2] x86: Place the TLS call before all register setting BBs

2025-08-18 Thread Hongtao Liu
On Mon, Aug 18, 2025 at 4:50 AM H.J. Lu wrote: > > We can't place a TLS call before a conditional jump in a basic block like > > (code_label 13 11 14 4 2 (nil) [1 uses]) > (note 14 13 16 4 [bb 4] NOTE_INSN_BASIC_BLOCK) > (jump_insn 16 14 17 4 (set (pc) > (if_then_else (le (reg:CCNO 17 flag

Re: [PATCH v2] x86: Add target("80387") function attribute

2025-08-16 Thread Hongtao Liu
On Sat, Aug 16, 2025 at 10:24 PM H.J. Lu wrote: > > On Sat, Aug 16, 2025 at 6:43 AM Hongtao Liu wrote: > > > > On Sat, Aug 16, 2025 at 8:45 PM Hongtao Liu wrote: > > > > > > On Fri, Aug 15, 2025 at 8:48 PM H.J. Lu wrote: > > > > > >

Re: [PATCH v2] x86: Add target("80387") function attribute

2025-08-16 Thread Hongtao Liu
On Sat, Aug 16, 2025 at 8:45 PM Hongtao Liu wrote: > > On Fri, Aug 15, 2025 at 8:48 PM H.J. Lu wrote: > > > > On Fri, Aug 15, 2025 at 12:44 AM Hongtao Liu wrote: > > > > > > On Fri, Aug 15, 2025 at 10:07 AM H.J. Lu wrote: > > > > > > >

Re: [PATCH v2] x86: Add target("80387") function attribute

2025-08-16 Thread Hongtao Liu
On Fri, Aug 15, 2025 at 8:48 PM H.J. Lu wrote: > > On Fri, Aug 15, 2025 at 12:44 AM Hongtao Liu wrote: > > > > On Fri, Aug 15, 2025 at 10:07 AM H.J. Lu wrote: > > > > > > Add target("80387") attribute to enable and disable x87 ins

Re: [PATCH v2] x86: Add target("80387") function attribute

2025-08-15 Thread Hongtao Liu
On Fri, Aug 15, 2025 at 10:07 AM H.J. Lu wrote: > > Add target("80387") attribute to enable and disable x87 instructions in a > function. > > gcc/ > > PR target/121541 > * config/i386/i386-options.cc > (ix86_valid_target_attribute_inner_p): Add target("80387") > att

Re: [PATCH] x86: Disallow MMX and 80387 in no_caller_saved_registers function

2025-08-13 Thread Hongtao Liu
On Thu, Aug 14, 2025 at 9:22 AM H.J. Lu wrote: > > commit 9804b23198b39f85a7258be556c5e8aed44b9efc > Author: H.J. Lu > Date: Sun Apr 13 11:38:24 2025 -0700 > > x86: Add preserve_none and update no_caller_saved_registers attributes > > allowed MMX/80387 instructions in functions with no_call

Re: [PATCH] Use x86 GFNI for vectorized constant byte shifts/rotates

2025-08-13 Thread Hongtao Liu
On Wed, Aug 13, 2025 at 2:30 PM Hongtao Liu wrote: > > On Tue, Aug 5, 2025 at 8:49 AM Andi Kleen wrote: > > > > From: Andi Kleen > > > > The GFNI AVX gf2p8affineqb instruction can be used to implement > > vectorized byte shifts or rotates. This patch uses th

Re: [PATCH v3] x86-64: Remove redundant TLS calls

2025-08-13 Thread Hongtao Liu
On Wed, Aug 13, 2025 at 2:35 PM Hongtao Liu wrote: > > On Tue, Aug 12, 2025 at 10:02 PM H.J. Lu wrote: > > > > On Tue, Aug 12, 2025 at 06:47:54AM -0700, H.J. Lu wrote: > > > On Mon, Aug 11, 2025 at 11:13 PM Hongtao Liu wrote: > > > > > > >

Re: [PATCH v3] x86-64: Remove redundant TLS calls

2025-08-12 Thread Hongtao Liu
On Tue, Aug 12, 2025 at 10:02 PM H.J. Lu wrote: > > On Tue, Aug 12, 2025 at 06:47:54AM -0700, H.J. Lu wrote: > > On Mon, Aug 11, 2025 at 11:13 PM Hongtao Liu wrote: > > > > > > On Mon, Aug 4, 2025 at 11:33 PM H.J. Lu wrote: > > > > > > > >

Re: [PATCH] Use x86 GFNI for vectorized constant byte shifts/rotates

2025-08-12 Thread Hongtao Liu
On Tue, Aug 5, 2025 at 8:49 AM Andi Kleen wrote: > > From: Andi Kleen > > The GFNI AVX gf2p8affineqb instruction can be used to implement > vectorized byte shifts or rotates. This patch uses them to implement > shift and rotate patterns to allow the vectorizer to use them. > Previously AVX couldn

Re: [PATCH] Use x86 GFNI for vectorized constant byte shifts/rotates

2025-08-12 Thread Hongtao Liu
On Wed, Aug 13, 2025 at 1:40 AM Andi Kleen wrote: > > > > > The latter takes 5 cycles, the former takes 3 cycles. > > It's pipelined however. > > > > > Do you have any microbenchmark or real workloads to show your > > optimization is better? > > Keep in mind it only uses one port vs two. > > Yes I

Re: [PATCH] Use x86 GFNI for vectorized constant byte shifts/rotates

2025-08-12 Thread Hongtao Liu
On Tue, Aug 5, 2025 at 8:49 AM Andi Kleen wrote: > > From: Andi Kleen > > The GFNI AVX gf2p8affineqb instruction can be used to implement > vectorized byte shifts or rotates. This patch uses them to implement > shift and rotate patterns to allow the vectorizer to use them. > Previously AVX couldn

Re: [PATCH v3] x86-64: Remove redundant TLS calls

2025-08-11 Thread Hongtao Liu
On Mon, Aug 4, 2025 at 11:33 PM H.J. Lu wrote: > > On Mon, Aug 04, 2025 at 02:57:39PM +0800, Hongtao Liu wrote: > > > > > > + rtx_insn *before = nullptr; > > > > > > + rtx_insn *after = nullptr; > > > > > > + if (insn == BB_HE

Re: [PATCH] x86: Change unsigned 32-bit immediate to signed if needed

2025-08-11 Thread Hongtao Liu
On Tue, Aug 12, 2025 at 5:14 AM H.J. Lu wrote: > > Since SImode MOV only supports signed 32-bit immediate, change unsigned > 32-bit immediate to signed if needed. > > gcc/ > > PR target/121497 > * config/i386/i386-features.cc (ix86_place_single_vector_set): > Change unsigne
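As a rough illustration of the immediate conversion this patch describes, the sketch below maps an unsigned 32-bit value to the signed 32-bit value with the same bit pattern. The helper name simm32_from_u32 is hypothetical and is not taken from the patch; GCC itself operates on rtx constants rather than plain integers, so this only shows the arithmetic.

```c
#include <stdint.h>

/* Hypothetical helper, not from the patch: return the signed 32-bit
   value that has the same bit pattern as U, widened to 64 bits.
   XOR-ing with the sign bit and then subtracting it performs the sign
   extension without relying on implementation-defined casts.  */
static int64_t
simm32_from_u32 (uint32_t u)
{
  return (int64_t) (u ^ UINT32_C (0x80000000)) - INT64_C (0x80000000);
}
```

Values below 0x80000000 come back unchanged, while larger values become the negative number a signed 32-bit immediate would encode.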

Re: [PATCH v3] x86-64: Remove redundant TLS calls

2025-08-03 Thread Hongtao Liu
On Tue, Jul 29, 2025 at 9:28 AM H.J. Lu wrote: > > On Mon, Jul 28, 2025 at 01:53:08PM -0700, H.J. Lu wrote: > > On Mon, Jul 28, 2025 at 04:51:24PM +0800, Hongtao Liu wrote: > > > On Wed, Jul 23, 2025 at 8:07 AM H.J. Lu wrote: > > > > > > > > Fo

Re: [PATCH] x86: Don't hoist non all 0s/1s vector set outside of loop

2025-08-03 Thread Hongtao Liu
On Fri, Aug 1, 2025 at 8:10 PM H.J. Lu wrote: > > Don't hoist non all 0s/1s vector set outside of the loop to avoid extra > spills. The patch LGTM. > > gcc/ > > PR target/120941 > * config/i386/i386-features.cc (x86_cse_kind): Moved before > ix86_place_single_vector_set. >

Re: [PATCH] x86: Transform to "pushq $-1; popq reg" for -Oz

2025-07-29 Thread Hongtao Liu
On Wed, Jul 30, 2025 at 11:45 AM H.J. Lu wrote: > > commit 4c80062d7b8c272e2e193b8074a8440dbb4fe588 > Author: H.J. Lu > Date: Sun May 25 07:40:29 2025 +0800 > > x86: Enable *mov_(and|or) only for -Oz > > disabled transformation from "movq $-1,reg" to "pushq $-1; popq reg" for > -Oz. But fo

Re: [PATCH v4] x86: Enable *mov_(and|or) only for -Oz

2025-07-28 Thread Hongtao Liu
On Mon, Jul 28, 2025 at 11:10 PM H.J. Lu wrote: > > On Thu, Jun 19, 2025 at 2:55 AM Roger Sayle > wrote: > > > > > > Looks good to me. Sorry for any inconvenience. > > Cheers, > > Roger > > > > > -Original Message- > > >

Re: [PATCH v3] x86-64: Remove redundant TLS calls

2025-07-28 Thread Hongtao Liu
On Wed, Jul 23, 2025 at 8:07 AM H.J. Lu wrote: > > For TLS calls: > > 1. UNSPEC_TLS_GD: > > (parallel [ > (set (reg:DI 0 ax) > (call:DI (mem:QI (symbol_ref:DI ("__tls_get_addr"))) > (const_int 0 [0]))) > (unspec:DI [(symbol_ref:DI ("e") [flags 0x50]) >

Re: [PATCH v2] x86-64: Remove redundant TLS calls

2025-07-21 Thread Hongtao Liu
On Tue, Jul 22, 2025 at 4:47 AM H.J. Lu wrote: > > For TLS calls: > > 1. UNSPEC_TLS_GD: > > (parallel [ > (set (reg:DI 0 ax) > (call:DI (mem:QI (symbol_ref:DI ("__tls_get_addr"))) > (const_int 0 [0]))) > (unspec:DI [(symbol_ref:DI ("e") [flags 0x50]) >

Re: [PATCH] x86-64: Remove redundant TLS calls

2025-07-20 Thread Hongtao Liu
On Thu, Jul 17, 2025 at 11:22 PM H.J. Lu wrote: > > For TLS calls: > > 1. UNSPEC_TLS_GD: > > (parallel [ > (set (reg:DI 0 ax) > (call:DI (mem:QI (symbol_ref:DI ("__tls_get_addr"))) > (const_int 0 [0]))) > (unspec:DI [(symbol_ref:DI ("e") [flags 0x50]) >

Re: [PATCH] x86: Don't change mode for XOR in ix86_expand_ternlog

2025-07-16 Thread Hongtao Liu
On Thu, Jul 17, 2025 at 9:43 AM H.J. Lu wrote: > > There is no need to change mode for XOR in ix86_expand_ternlog now. > Whatever reasons for it in the first place no longer exist. Tested > on x86-64 with -m32. There are no regressions. Ok. > > * config/i386/i386.cc (ix86_expand_ternlog)

Re: [PATCH] i386: Decouple AMX-AVX512 from AVX10.2 and imply AVX512F

2025-07-15 Thread Hongtao Liu
On Tue, Jul 15, 2025 at 2:36 PM Haochen Jiang wrote: > > Hi all, > > In ISE058, the AVX10.2 imply is removed from AMX-AVX512. This > leads to re-consideration on the imply for AMX-AVX512. > > Since it is using zmm register and using zmm register only, we > need to at least imply AVX512F. AVX512VL

Re: [PATCH v3] x86: Improve vector_loop/unrolled_loop for memset/memcpy

2025-07-07 Thread Hongtao Liu
On Mon, Jul 7, 2025 at 3:27 PM Hongtao Liu wrote: > > On Tue, Jun 24, 2025 at 2:11 PM H.J. Lu wrote: > > > > On Mon, Jun 23, 2025 at 2:24 PM H.J. Lu wrote: > > > > > > On Wed, Jun 18, 2025 at 3:17 PM H.J. Lu wrote: > > > > > > > >

Re: [PATCH v3] x86: Improve vector_loop/unrolled_loop for memset/memcpy

2025-07-07 Thread Hongtao Liu
On Tue, Jun 24, 2025 at 2:11 PM H.J. Lu wrote: > > On Mon, Jun 23, 2025 at 2:24 PM H.J. Lu wrote: > > > > On Wed, Jun 18, 2025 at 3:17 PM H.J. Lu wrote: > > > > > > 1. Don't generate the loop if the loop count is 1. > > > 2. For memset with vector on small size, use vector if small size supports

Re: [PATCH 2/2] add masked-epilogue tuning

2025-07-07 Thread Hongtao Liu
On Mon, Jul 7, 2025 at 3:18 PM Hongtao Liu wrote: > > On Fri, Jul 4, 2025 at 5:45 PM Richard Biener wrote: > > > > The following adds an x86 tuning to enable the use of AVX512 masked > > epilogues in cases we heuristically determine it to be not detrimental >

Re: [PATCH 2/2] add masked-epilogue tuning

2025-07-07 Thread Hongtao Liu
On Fri, Jul 4, 2025 at 5:45 PM Richard Biener wrote: > > The following adds an x86 tuning to enable the use of AVX512 masked > epilogues in cases we heuristically determine it to be not detrimental > by high chance. Basically problematic cases are when there are > data streams that are both stored

Re: [PATCH v2] x86: Preserve frame pointer for no_callee_saved_registers attribute

2025-06-29 Thread Hongtao Liu
On Mon, Jun 30, 2025 at 11:46 AM H.J. Lu wrote: > > On Mon, Jun 30, 2025 at 11:17 AM H.J. Lu wrote: > > > > On Mon, Jun 30, 2025 at 10:41 AM Hongtao Liu wrote: > > > > > > On Mon, Jun 30, 2025 at 10:37 AM Hongtao Liu wrote: > > > > >

Re: [PATCH] x86: Preserve frame pointer for no_callee_saved_registers attribute

2025-06-29 Thread Hongtao Liu
On Mon, Jun 30, 2025 at 11:16 AM H.J. Lu wrote: > > On Mon, Jun 30, 2025 at 10:37 AM Hongtao Liu wrote: > > > > On Sat, Jun 28, 2025 at 8:30 PM H.J. Lu wrote: > > > > > > Update functions with no_callee_saved_registers/preserve_none attribute > > > t

Re: [PATCH] x86: Preserve frame pointer for no_callee_saved_registers attribute

2025-06-29 Thread Hongtao Liu
On Sat, Jun 28, 2025 at 8:30 PM H.J. Lu wrote: > > Update functions with no_callee_saved_registers/preserve_none attribute > to preserve frame pointer since caller may use it to save the current > stack: > > pushq %rbp > movq %rsp, %rbp > ... > call function > ... > leave > ret > > If callee chang

Re: [PATCH] x86: Preserve frame pointer for no_callee_saved_registers attribute

2025-06-29 Thread Hongtao Liu
On Mon, Jun 30, 2025 at 10:37 AM Hongtao Liu wrote: > > On Sat, Jun 28, 2025 at 8:30 PM H.J. Lu wrote: > > > > Update functions with no_callee_saved_registers/preserve_none attribute > > to preserve frame pointer since caller may use it to save the current > > stac

Re: [PATCH] x86: Handle vector broadcast source

2025-06-26 Thread Hongtao Liu
On Thu, Jun 26, 2025 at 2:17 PM H.J. Lu wrote: > > On Thu, Jun 26, 2025 at 2:11 PM Hongtao Liu wrote: > > > > On Thu, Jun 26, 2025 at 1:59 PM H.J. Lu wrote: > > > > > > Use the inner scalar mode of vector broadcast source in: > > > > > >

Re: [PATCH] x86: Also handle all 1s float vector constant

2025-06-25 Thread Hongtao Liu
On Thu, Jun 26, 2025 at 2:02 PM H.J. Lu wrote: > > Since float vector constant > > (const_vector:V4SF [(const_double:SF -QNaN [-QNaN]) repeated x4]) > > is an all 1s float vector constant, update the remove_redundant_vector > pass to replace > > (insn 20 18 21 2 (set (reg:V4SF 124) > (cons

Re: [PATCH] x86: Handle vector broadcast source

2025-06-25 Thread Hongtao Liu
On Thu, Jun 26, 2025 at 1:59 PM H.J. Lu wrote: > > Use the inner scalar mode of vector broadcast source in: > > (set (reg:V8DF 394) >(vec_duplicate:V8DF (reg:V2DF 190 [ alpha ]))) > > to compute the vector mode for broadcast from vector source. ix86_get_vector_cse_mode (unsigned int si

Re: [PATCH] x86: Handle REG_EH_REGION note in DEF_INSN

2025-06-25 Thread Hongtao Liu
On Thu, Jun 26, 2025 at 1:56 PM H.J. Lu wrote: > > On Thu, Jun 26, 2025 at 1:24 PM Hongtao Liu wrote: > > > > On Thu, Jun 26, 2025 at 6:20 AM H.J. Lu wrote: > > > > > > For tcpsock_test.go in libgo tests, > > > > > > commit aba3b9d3

Re: [PATCH] x86: Handle REG_EH_REGION note in DEF_INSN

2025-06-25 Thread Hongtao Liu
On Thu, Jun 26, 2025 at 6:20 AM H.J. Lu wrote: > > For tcpsock_test.go in libgo tests, > > commit aba3b9d3a48a0703fd565f7c5f0caf604f59970b > Author: H.J. Lu > Date: Fri May 9 07:17:07 2025 +0800 > > x86: Extend the remove_redundant_vector pass > > added an instruction: > > (insn 501 101 102

Re: [PATCH v3] x86: Add preserve_none and update no_caller_saved_registers attributes

2025-06-25 Thread Hongtao Liu
On Wed, Jun 25, 2025 at 3:35 PM H.J. Lu wrote: > > Add preserve_none attribute which is similar to no_callee_saved_registers > attribute, except on x86-64, r12, r13, r14, r15, rdi and rsi registers are > used for integer parameter passing. This can be used in an interpreter > to avoid saving/rest

Re: [PATCH] x86: Add debug dump for the remove_redundant_vector pass

2025-06-25 Thread Hongtao Liu
On Thu, Jun 26, 2025 at 6:21 AM H.J. Lu wrote: > > On Tue, Jun 24, 2025 at 2:21 PM H.J. Lu wrote: > > > > Add debug dump for the remove_redundant_vector pass with the following > > output: > > > > Replace: > > > > (insn 7 4 8 2 (set (reg:V2DI 103) > > (const_vector:V2DI [ > >

Re: [PATCH v3] x86: Update memcpy/memset inline strategies for -mtune=generic

2025-06-25 Thread Hongtao Liu
On Tue, Jun 17, 2025 at 8:54 PM Cui, Lili wrote: > > > > > -Original Message- > > From: H.J. Lu > > Sent: Monday, June 16, 2025 10:08 PM > > To: Jan Hubicka > > Cc: Uros Bizjak ; Cui, Lili ; gcc- > > patc...@gcc.gnu.org; Liu, Hongtao ; > > mjgu...@gmail.com > > Subject: [PATCH v3] x86: U

Re: [PATCH v2] x86: Add preserve_none and update no_caller_saved_registers attributes

2025-06-24 Thread Hongtao Liu
On Fri, May 23, 2025 at 1:56 PM H.J. Lu wrote: > > Add preserve_none attribute which is similar to no_callee_saved_registers > attribute, except on x86-64, r12, r13, r14, r15, rdi and rsi registers are > used for integer parameter passing. This can be used in an interpreter > to avoid saving/rest

Re: [PATCH] x86: Update -mtune=intel for Diamond Rapids/Clearwater Forest

2025-06-24 Thread Hongtao Liu
On Wed, Jun 25, 2025 at 1:06 PM H.J. Lu wrote: > > -mtune=intel is used to generate a single binary to run well on both big > core and small core, similar to hybrid CPUs. Update -mtune=intel to tune > for Diamond Rapids and Clearwater Forest, instead of Silvermont. > > PR target/120815 > * common

Re: [PATCH v4] x86: Extend the remove_redundant_vector pass

2025-06-24 Thread Hongtao Liu
On Tue, Jun 24, 2025 at 1:26 PM H.J. Lu wrote: > > On Mon, Jun 23, 2025 at 4:53 PM Hongtao Liu wrote: > > > > On Mon, Jun 23, 2025 at 4:45 PM H.J. Lu wrote: > > > > > > On Mon, Jun 23, 2025 at 4:10 PM H.J. Lu wrote: > > > > > >

Re: [PATCH v3] x86: Extend the remove_redundant_vector pass

2025-06-23 Thread Hongtao Liu
On Thu, Jun 19, 2025 at 10:25 AM H.J. Lu wrote: > > Extend the remove_redundant_vector pass to handle vector broadcasts from > constant and variable scalars. When broadcasting from constants and > function arguments, we can place a single widest vector broadcast at > entry of the nearest common d

Re: [PATCH v4] x86: Extend the remove_redundant_vector pass

2025-06-23 Thread Hongtao Liu
On Mon, Jun 23, 2025 at 4:45 PM H.J. Lu wrote: > > On Mon, Jun 23, 2025 at 4:10 PM H.J. Lu wrote: > > > > On Mon, Jun 23, 2025 at 3:11 PM Hongtao Liu wrote: > > > > > > On Thu, Jun 19, 2025 at 10:25 AM H.J. Lu wrote: > > > > > > >

Re: [PATCH v3] x86: Extend the remove_redundant_vector pass

2025-06-23 Thread Hongtao Liu
On Mon, Jun 23, 2025 at 4:10 PM H.J. Lu wrote: > > On Mon, Jun 23, 2025 at 3:11 PM Hongtao Liu wrote: > > > > On Thu, Jun 19, 2025 at 10:25 AM H.J. Lu wrote: > > > > > > Extend the remove_redundant_vector pass to handle vector broadcasts from >

Re: [PATCH v3] x86: Extend the remove_redundant_vector pass

2025-06-23 Thread Hongtao Liu
On Thu, Jun 19, 2025 at 10:25 AM H.J. Lu wrote: > > Extend the remove_redundant_vector pass to handle vector broadcasts from > constant and variable scalars. When broadcasting from constants and > function arguments, we can place a single widest vector broadcast at > entry of the nearest common d

Re: [PATCH v2] x86: Don't use vmovdqu16/vmovdqu8 with non-EVEX registers

2025-06-22 Thread Hongtao Liu
On Sat, Jun 21, 2025 at 11:09 PM H.J. Lu wrote: > > On Fri, Jun 20, 2025 at 4:12 PM H.J. Lu wrote: > > > > Don't use vmovdqu16/vmovdqu8 with non-EVEX registers even if AVX512BW is > > available. > > > > gcc/ > > > > PR target/120728 > > * config/i386/i386.cc (ix86_get_ssemov): Use vmovdqu16/vmovd

Re: [PATCH] x86: Add PROCESSOR_XXX comments to processor_cost_table

2025-06-22 Thread Hongtao Liu
On Mon, Jun 23, 2025 at 11:03 AM H.J. Lu wrote: > > Add a PROCESSOR_XXX comment to each entry in processor_cost_table to > describe which processor the cost entry is applied to. Ok as obvious. > > * config/i386/i386-options.cc (processor_cost_table): Add a > PROCESSOR_XXX comment to each entry. > >

Re: [PATCH] i386: Remove CLDEMOTE for clients

2025-06-22 Thread Hongtao Liu
On Fri, Jun 20, 2025 at 10:04 AM Haochen Jiang wrote: > > Hi all, > > CLDEMOTE is not enabled on clients according to SDM. SDM only mentioned > it will be enabled on Xeon and Atom servers, not clients. Remove them > since Alder Lake (where it is introduced). > > Also will backport this patch to GC

Re: [PATCH v4] x86: Enable *mov_(and|or) only for -Oz

2025-06-19 Thread Hongtao Liu
On Wed, Jun 18, 2025 at 6:38 PM H.J. Lu wrote: > > commit ef26c151c14a87177d46fd3d725e7f82e040e89f > Author: Roger Sayle > Date: Thu Dec 23 12:33:07 2021 + > > x86: PR target/103773: Fix wrong-code with -Oz from pop to memory. > > added "*mov_and" and extended "*mov_or" to transform > "

Re: [PATCH v2] x86: Extend the remove_redundant_vector pass

2025-06-17 Thread Hongtao Liu
On Wed, Jun 18, 2025 at 2:39 PM H.J. Lu wrote: > > On Mon, Jun 16, 2025 at 4:14 PM Hongtao Liu wrote: > > > > >+enum redundant_load_kind > > >+{ > > >+ LOAD_CONST0_VECTOR, > > >+ LOAD_CONSTM1_VECTOR, > > >+ LOAD_VECTOR > >

Re: [PATCH v3] x86: Enable *mov_(and|or) only for -Oz

2025-06-17 Thread Hongtao Liu
On Mon, May 26, 2025 at 2:30 PM H.J. Lu wrote: > > On Sun, May 25, 2025 at 7:02 PM H.J. Lu wrote: > > > > On Sun, May 25, 2025 at 8:12 AM H.J. Lu wrote: > > > > > > On Sun, May 25, 2025 at 7:47 AM H.J. Lu wrote: > > > > > > > > commit ef26c151c14a87177d46fd3d725e7f82e040e89f > > > > Author: Rog

Re: [PATCH] [AUTOFDO] Don't scale bb_count with ipa_count when ipa_count is zero but count_max is not

2025-06-16 Thread Hongtao Liu
Drop this patch since https://gcc.gnu.org/pipermail/gcc-patches/2025-June/686830.html could be a better alternative. On Tue, Jun 10, 2025 at 9:50 AM Hongtao Liu wrote: > > Ping > > On Mon, May 19, 2025 at 10:06 AM liuhongt wrote: > > > > From: "hongtao.liu"

Re: [PATCH v2] x86: Extend the remove_redundant_vector pass

2025-06-16 Thread Hongtao Liu
On Mon, Jun 16, 2025 at 4:30 PM Hongtao Liu wrote: > > >+enum redundant_load_kind > >+{ > >+ LOAD_CONST0_VECTOR, > >+ LOAD_CONSTM1_VECTOR, > >+ LOAD_VECTOR > >+}; > Perhaps rename to x86_cse_kind, X86_CSE_CONST0_VECTOR, > X86_CSE_CONSTM1_VECTOR, X

Re: [PATCH v2] x86: Extend the remove_redundant_vector pass

2025-06-16 Thread Hongtao Liu
>+enum redundant_load_kind >+{ >+ LOAD_CONST0_VECTOR, >+ LOAD_CONSTM1_VECTOR, >+ LOAD_VECTOR >+}; Perhaps rename to x86_cse_kind, X86_CSE_CONST0_VECTOR, X86_CSE_CONSTM1_VECTOR, X86_CSE_VEC_DUP? LOAD sounds a bit ambiguous. Similar to ix86_get_vector_load_mode -> ix86_get_vector_cse_mode? >+
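For reference, the renaming suggested in this message would look roughly like the sketch below. The enumerator names are the ones proposed in the message itself; the per-enumerator comments are assumptions about their meaning and are not quoted from the patch.

```c
/* Sketch of the suggested renaming only; the enumerator names come
   from the review comment, the comments are assumptions.  */
enum x86_cse_kind
{
  X86_CSE_CONST0_VECTOR,   /* all-zeros vector constant */
  X86_CSE_CONSTM1_VECTOR,  /* all-ones vector constant */
  X86_CSE_VEC_DUP          /* vector broadcast (vec_duplicate) */
};
```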

Re: [PATCH] i386: Set SRF, GRR, CWF, GNR, DMR, ARL and PTL issue rate

2025-06-12 Thread Hongtao Liu
On Thu, Jun 12, 2025 at 10:51 AM Hu, Lin1 wrote: > > Hi, > > This patch aims to set SRF issue rate to 4, GNR issue rate to 6. According to > tests about spec2017, the patch has little effect on performance. > > For GRR, CWF, DMR, ARL and PTL, the patch set their issue rate to 6. Waiting > for > m

Re: [PATCH] [AUTOFDO] Don't scale bb_count with ipa_count when ipa_count is zero but count_max is not

2025-06-09 Thread Hongtao Liu
Ping On Mon, May 19, 2025 at 10:06 AM liuhongt wrote: > > From: "hongtao.liu" > > AutoFDO profile is a scaled profile, as a result, 0 sample does not > mean never executed. especially there's profile from function > body. Prevent combine_with_ipa_count (ipa_count) from zeroing all > bb->count. >

Re: [PATCH] x86: Extend the remove_redundant_vector pass

2025-06-09 Thread Hongtao Liu
On Tue, Jun 3, 2025 at 2:59 PM H.J. Lu wrote: > > Extend the remove_redundant_vector pass to handle vector broadcasts from > constant and variable scalars. When broadcasting from constants and > function arguments, we can place a single widest vector broadcast at > entry of the nearest common dom

Re: [PATCH] i386: Add more peephole2 for APX NDD

2025-06-03 Thread Hongtao Liu
On Thu, May 29, 2025 at 4:56 PM Hu, Lin1 wrote: > > Hi, > > The patch aims to optimize > movb(%rdi), %al > movq%rdi, %rbx > xorl%esi, %eax, %edx > movb%dl, (%rdi) > cmpb%sil, %al > jne > to > xorb%sil, (%rdi) >

Re: [PATCH] i386: Add more forms peephole2 for adc/sbb

2025-06-03 Thread Hongtao Liu
On Mon, May 26, 2025 at 4:55 PM Hu, Lin1 wrote: > > Hi, all > > Enable -mapxf will change some patterns about adc/sbb. > > Hence gcc will raise an extra mov like > movq8(%rdi), %rax > adcq%rax, 8(%rsi), %rax > movq%rax, 8(%rdi) > rather than > movq

Re: [PATCH v2 0/7] Remove -mavx10.1-256/512 and -mno-evex512

2025-05-18 Thread Hongtao Liu
On Wed, May 14, 2025 at 3:29 PM Haochen Jiang wrote: > > Hi all, > > This is the v2 patch to remove -mavx10.1/256-512 and -mno-evex512. I suppose > this time all the patches will not be held due to size. > > As mentioned in GCC 15, we will remove -mavx10.1-256/512 and -mno-evex512 > options in GCC

Re: [PATCH] For datarefs with big gap, split them into different groups.

2025-05-15 Thread Hongtao Liu
It's https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119181 On Fri, May 16, 2025 at 10:02 AM liuhongt wrote: > > The patch tries to fix the missed vectorization in the case below. > > void > foo (int* a, int* restrict b) > { > b[0] = a[0] * a[64]; > b[1] = a[65] * a[1]; > b[2] = a[2] * a[66]; >

Re: [PATCH] x86: Add preserve_none and update no_caller_saved_registers attributes

2025-05-13 Thread Hongtao Liu
On Fri, Apr 18, 2025 at 7:10 PM H.J. Lu wrote: > > Add preserve_none attribute which is similar to no_callee_saved_registers > attribute, except on x86-64, r12, r13, r14, r15, rdi and rsi registers are Could you split preserve_none into a separate patch? It looks like it's different from clang's p

Re: [PATCH] Update libbid according to the latest Intel Decimal Floating-Point Math Library.

2025-05-13 Thread Hongtao Liu
On Wed, May 14, 2025 at 9:22 AM liuhongt wrote: > > The Intel Decimal Floating-Point Math Library is available as open-source on > Netlib[1]. > > [1] https://www.netlib.org/misc/intel/ > > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. > Ready push to trunk. > > libgcc/config/libbid/Ch

Re: [PATCH v3] Consider frequency in cost estimation when converting scalar to vector.

2025-05-11 Thread Hongtao Liu
On Thu, May 8, 2025 at 2:40 PM liuhongt wrote: > > The only part I changed is related to size_cost of sse_to_integer, as below > > 114+ /* Under TARGET_SSE4_1, it's vmovd + vpextrd/vpinsrd. > 115+ W/o it, it's movd + psrlq/unpckldq + movd. */ > 116+ else if (!TARGET_64BIT && smode != SImod

Re: [PATCH v2] x86: Insert extra move for mode size smaller than natural size

2025-05-06 Thread Hongtao Liu
On Wed, May 7, 2025 at 9:06 AM H.J. Lu wrote: > > On Tue, May 6, 2025 at 3:35 PM Hongtao Liu wrote: > > > > On Tue, May 6, 2025 at 3:06 PM H.J. Lu wrote: > > > > > > On Tue, May 6, 2025 at 2:30 PM Liu, Hongtao wrote: > > > > > > > >

Re: [PATCH] x86: Skip if the mode size is smaller than its natural size

2025-05-06 Thread Hongtao Liu
On Tue, May 6, 2025 at 3:06 PM H.J. Lu wrote: > > On Tue, May 6, 2025 at 2:30 PM Liu, Hongtao wrote: > > > > > > > > > -Original Message- > > > From: H.J. Lu > > > Sent: Tuesday, May 6, 2025 2:16 PM > > > To: Liu, Hongtao > > > Cc: GCC Patches ; Uros Bizjak > > > > > > Subject: Re: [PA

Re: [PATCH] i386: Add ix86_expand_unsigned_small_int_cst_argument

2025-04-28 Thread Hongtao Liu
On Sun, Apr 27, 2025 at 10:58 AM H.J. Lu wrote: > > When passing 0xff as an unsigned char function argument with the C frontend > promotion, expand_normal used to get > > constant > 255> > > and returned the rtx value using the sign-extended representation: > > (const_int 255 [0xff]) > > But aft

Re: [PATCH v2] x86: Add a pass to remove redundant all 0s/1s vector load

2025-04-28 Thread Hongtao Liu
On Mon, Apr 28, 2025 at 5:07 PM H.J. Lu wrote: > > On Mon, Apr 28, 2025 at 4:26 PM H.J. Lu wrote: > > > > > > > This is what my patch does: > > > But it iterates through vector_insns, using a def-ref chain to find > > > those insns. I think we can just record those single_set with src as > > > co

Re: [PATCH] Accept allones or 0 operand for vcond_mask op1.

2025-04-25 Thread Hongtao Liu
> > I am not so sure about this when it comes to relatively common > instructions. Hiding things in unspec prevents combine and other RTL > passes from doing their job. I would say that it only makes sense for > situations where RTL equivalent is very inconvenient. > In the direction of using gener

Re: [PATCH] Accept allones or 0 operand for vcond_mask op1.

2025-04-24 Thread Hongtao Liu
On Fri, Apr 25, 2025 at 1:26 PM Jan Hubicka wrote: > > > On Thu, Apr 24, 2025 at 6:27 PM Jan Hubicka wrote: > > > > > > > Since ix86_expand_sse_movcc will simplify them into a simple vmov, vpand > > > > or vpandn. > > > > Current register_operand/vector_operand could lose some optimization > > >

Re: [PATCH] [x86] Generate 2 FMA instructions in ix86_expand_swdivsf.

2025-04-23 Thread Hongtao Liu
On Thu, Apr 24, 2025 at 12:54 AM Jan Hubicka wrote: > > > From: "hongtao.liu" > > > > When FMA is available, N-R step can be rewritten with > > > > a / b = (a - (rcp(b) * a * b)) * rcp(b) + rcp(b) * a > > > > which have 2 fma generated.[1] > > > > [1] https://bugs.llvm.org/show_bug.cgi?id=21385 >

Re: [PATCH] Consider frequency in cost estimation when converting scalar to vector.

2025-04-23 Thread Hongtao Liu
On Thu, Apr 24, 2025 at 12:50 AM Jan Hubicka wrote: > > > In some benchmark, I notice stv failed due to cost unprofitable, but the > > igain > > is inside the loop, but sse<->integer conversion is outside the loop, > > current cost > > model doesn't consider the frequency of those gain/cost. > >

Re: [PATCH] Accept allones or 0 operand for vcond_mask op1.

2025-04-22 Thread Hongtao Liu
On Mon, Apr 21, 2025 at 2:52 PM liuhongt wrote: > > Since ix86_expand_sse_movcc will simplify them into a simple vmov, vpand > or vpandn. > Current register_operand/vector_operand could lose some optimization > opportunity. > > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. > Ok for tru

Re: Improve vectorizer costs of min, max, abs, absu and const_expr on x86

2025-04-21 Thread Hongtao Liu
On Tue, Apr 22, 2025 at 10:30 AM Hongtao Liu wrote: > > On Tue, Apr 22, 2025 at 12:46 AM Jan Hubicka wrote: > > > > Hi, > > this patch adds special cases for vectorizer costs in COND_EXPR, MIN_EXPR, > > MAX_EXPR, ABS_EXPR and ABSU_EXPR. We previously costed ABS_E

Re: Improve vectorizer costs of min, max, abs, absu and const_expr on x86

2025-04-21 Thread Hongtao Liu
On Tue, Apr 22, 2025 at 12:46 AM Jan Hubicka wrote: > > Hi, > this patch adds special cases for vectorizer costs in COND_EXPR, MIN_EXPR, > MAX_EXPR, ABS_EXPR and ABSU_EXPR. We previously costed ABS_EXPR and > ABSU_EXPR > but it was only correct for FP variant (where it corresponds to andss clea

Re: PING: [PATCH] x86: Add a pass to remove redundant all 0s/1s vector load

2025-04-21 Thread Hongtao Liu
On Mon, Apr 21, 2025 at 4:30 PM H.J. Lu wrote: > > On Mon, Apr 21, 2025 at 11:29 AM Hongtao Liu wrote: > > > > On Sat, Apr 19, 2025 at 1:25 PM H.J. Lu wrote: > > > > > > On Sun, Dec 1, 2024 at 7:50 AM H.J. Lu wrote: > > > > > > > >

Re: PING: [PATCH] x86: Add a pass to remove redundant all 0s/1s vector load

2025-04-20 Thread Hongtao Liu
On Sat, Apr 19, 2025 at 1:25 PM H.J. Lu wrote: > > On Sun, Dec 1, 2024 at 7:50 AM H.J. Lu wrote: > > > > For all different modes of all 0s/1s vectors, we can use the single widest > > all 0s/1s vector register for all 0s/1s vector uses in the whole function. > > Add a pass to generate a single wi

Re: [PATCH v2] x86: Update memcpy/memset inline strategies for -mtune=generic

2025-04-17 Thread Hongtao Liu
On Tue, Apr 8, 2025 at 3:52 AM H.J. Lu wrote: > > Simplify memcpy and memset inline strategies to avoid branches for > -mtune=generic: > > 1. With MOVE_RATIO and CLEAR_RATIO == 17, GCC will use integer/vector >load and store for up to 16 * 16 (256) bytes when the data size is >fixed and kn

Re: [PATCH] APX: Don't use red-zone with APX and no caller-saved registers

2025-04-14 Thread Hongtao Liu
On Mon, Apr 14, 2025 at 8:56 PM H.J. Lu wrote: > > On Mon, Apr 14, 2025 at 2:39 AM Uros Bizjak wrote: > > > > On Mon, Apr 14, 2025 at 8:54 AM Hongtao Liu wrote: > > > > > > On Mon, Apr 14, 2025 at 7:36 AM H.J. Lu wrote: > > > > > >
