On Sun, Aug 31, 2025 at 11:15 AM H.J. Lu wrote:
>
> Since _Decimal128 arithmetic requires the round-to-nearest rounding
> mode, define DFP_INIT_ROUNDMODE and DFP_RESTORE_ROUNDMODE, similar to
> FP_INIT_ROUNDMODE in sfp-machine.h, to set the rounding mode to
> round-to-nearest at _Decimal128 relate
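A minimal sketch of the save/force/restore pattern described above. It assumes x86-64 with SSE and models only the MXCSR rounding mode through the xmmintrin.h helpers. The *_SKETCH macros and decimal_op_rounding_mode are hypothetical names, not the definitions from the patch, which may also need to handle the x87 control word.

```c
#include <assert.h>
#include <xmmintrin.h>

/* Hypothetical sketch (x86 SSE only): save the caller's MXCSR rounding
   mode, force round-to-nearest for the _Decimal128 work, restore after.
   Not the actual libgcc DFP_INIT_ROUNDMODE / DFP_RESTORE_ROUNDMODE.  */
#define DFP_INIT_ROUNDMODE_SKETCH(saved)                \
  do                                                    \
    {                                                   \
      (saved) = _MM_GET_ROUNDING_MODE ();               \
      if ((saved) != _MM_ROUND_NEAREST)                 \
        _MM_SET_ROUNDING_MODE (_MM_ROUND_NEAREST);      \
    }                                                   \
  while (0)

#define DFP_RESTORE_ROUNDMODE_SKETCH(saved)             \
  do                                                    \
    {                                                   \
      if ((saved) != _MM_ROUND_NEAREST)                 \
        _MM_SET_ROUNDING_MODE (saved);                  \
    }                                                   \
  while (0)

/* Returns the rounding mode observed while the "decimal" work runs.  */
static unsigned int
decimal_op_rounding_mode (void)
{
  unsigned int saved;
  DFP_INIT_ROUNDMODE_SKETCH (saved);
  unsigned int inside = _MM_GET_ROUNDING_MODE ();
  assert (inside == _MM_ROUND_NEAREST);
  /* ... _Decimal128 arithmetic would run here ... */
  DFP_RESTORE_ROUNDMODE_SKETCH (saved);
  return inside;
}
```

Skipping the MXCSR write when the caller already uses round-to-nearest keeps the common case to a single read.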
On Thu, Aug 28, 2025 at 10:15 AM H.J. Lu wrote:
>
> Source operands of 2 TLS_CALL patterns in
>
> (insn 10 9 11 3 (set (reg:DI 100)
> (unspec:DI [
> (symbol_ref:DI ("caml_state") [flags 0x10] <var_decl 0x7fe10e1d9e40 caml_state>)
> ] UNSPEC_TLSDESC)) "x.c":7:16 1674 {*t
On Wed, Aug 27, 2025 at 4:53 PM Richard Biener
wrote:
>
> On Wed, Aug 27, 2025 at 6:57 AM liuhongt wrote:
> >
> > Since kind == vec_perm may not be a real vec_perm, just a broadcast or
> > simple load in BB vectorizer.
>
> Btw, you can now (in some cases) do better, namely you should
> always hav
On Wed, Aug 27, 2025 at 6:32 AM H.J. Lu wrote:
>
> Source operands of 2 *tls_dynamic_gnu2_call_64_di patterns in
>
> (insn 10 9 11 3 (set (reg:DI 100)
> (unspec:DI [
> (symbol_ref:DI ("caml_state") [flags 0x10] <var_decl 0x7fe10e1d9e40 caml_state>)
> ] UNSPEC_TLSDESC))
On Wed, Aug 27, 2025 at 11:59 AM H.J. Lu wrote:
>
> On Tue, Aug 26, 2025 at 8:50 PM Hongtao Liu wrote:
> >
> > On Wed, Aug 27, 2025 at 6:30 AM H.J. Lu wrote:
> > >
> > > For a basic block with only a debug marker:
> > >
> > > (note 3 0
On Wed, Aug 27, 2025 at 6:30 AM H.J. Lu wrote:
>
> For a basic block with only a debug marker:
>
> (note 3 0 2 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
> (note 2 3 5 2 NOTE_INSN_FUNCTION_BEG)
> (debug_insn 5 2 16 2 (debug_marker) "x.c":6:3 -1 (nil))
>
> emit the TLS call after debug marker.
>
> gcc/
>
>
On Tue, Aug 26, 2025 at 5:31 PM Jan Hubicka wrote:
>
> > > > > > In general we should have a look at register pressure, I
> > > > > > suppose issue_rate / m_num_reductions ensures we're never
> > > > > > getting close to this in practice.
> > > > >
> > > > > Bootstrapped and regtested on x86_64-pc
On Mon, Aug 18, 2025 at 4:52 PM Hongtao Liu wrote:
>
> On Mon, Aug 11, 2025 at 8:57 PM Richard Biener wrote:
> >
> > On Sun, 10 Aug 2025, liuhongt wrote:
> >
> > > >
> > > > The comment doesn't match the bool type.
> > > >
>
On Tue, Aug 26, 2025 at 9:38 AM Andi Kleen wrote:
>
> From: Andi Kleen
>
> Make the expand pattern for operand 1 match the final instruction.
>
> PR 121658
>
> gcc/ChangeLog:
>
> * config/i386/sse.md ("3"): Use
> register_operand for rotate patterns.
>
> gcc/testsuite/Chan
On Tue, Aug 26, 2025 at 6:40 AM Jakub Jelinek wrote:
>
> Hi!
>
> The vgf2p8affineqb_ pattern uses "register_operand"
> predicate for the first input operand, so using "general_operand"
> for the rotate operand passed to it leads to ICEs, and so does
> the "nonimmediate_operand" in the v16qi3 defin
On Sat, Aug 23, 2025 at 1:34 AM Andi Kleen wrote:
>
> From: Andi Kleen
>
> [v3 version: Remove unnecessary _mask pattern.
> Add extra FAIL case. Remove unnecessary AVX512F check.
> Fix changelog.]
>
> [v2 version: Split rotate patterns in V16QI and V32/64QI.
> Add various AVX512F checks. Remove s
On Fri, Aug 22, 2025 at 11:26 PM Andi Kleen wrote:
>
> > > + else if (TARGET_GFNI && TARGET_AVX512F && CONST_INT_P (operands[2]))
> > I don't think we need AVX512F here, and let's exclude >>7 cases here,
> > so better be.
> > else if (TARGET_GFNI
> > && CONST_INT_P (operands[2])
> >
On Wed, Aug 20, 2025 at 11:08 PM Andi Kleen wrote:
>
> From: Andi Kleen
>
> [v2 version: Split rotate patterns in V16QI and V32/64QI.
> Add various AVX512F checks. Remove some unnecessary
> masks. Add untested cond_ pattern (untested, couldn't trigger it)
> Clean up some control flow. Use narrowe
On Thu, Aug 21, 2025 at 3:46 AM H.J. Lu wrote:
>
> For a basic block with only a label:
>
> (code_label 78 11 77 3 14 (nil) [1 uses])
> (note 77 78 54 3 [bb 3] NOTE_INSN_BASIC_BLOCK)
>
> emit the TLS call after NOTE_INSN_BASIC_BLOCK, instead of before
> NOTE_INSN_BASIC_BLOCK, to avoid
>
> x.c: In
On Wed, Aug 20, 2025 at 2:49 AM H.J. Lu wrote:
>
> We can't place a TLS call before a conditional jump in a basic block like
>
> (code_label 13 11 14 4 2 (nil) [1 uses])
> (note 14 13 16 4 [bb 4] NOTE_INSN_BASIC_BLOCK)
> (jump_insn 16 14 17 4 (set (pc)
> (if_then_else (le (reg:CCNO 17 flag
On Tue, Aug 19, 2025 at 10:51 AM H.J. Lu wrote:
>
> We can't place a TLS call before a conditional jump in a basic block like
>
> (code_label 13 11 14 4 2 (nil) [1 uses])
> (note 14 13 16 4 [bb 4] NOTE_INSN_BASIC_BLOCK)
> (jump_insn 16 14 17 4 (set (pc)
> (if_then_else (le (reg:CCNO 17 fla
On Tue, Aug 19, 2025 at 10:55 AM H.J. Lu wrote:
>
> On Mon, Aug 18, 2025 at 6:56 PM Hongtao Liu wrote:
> >
> > On Tue, Aug 19, 2025 at 4:40 AM H.J. Lu wrote:
> > >
> > > On Mon, Aug 18, 2025 at 12:59 AM Hongtao Liu wrote:
> > > >
> >
On Tue, Aug 19, 2025 at 4:40 AM H.J. Lu wrote:
>
> On Mon, Aug 18, 2025 at 12:59 AM Hongtao Liu wrote:
> >
> > On Mon, Aug 18, 2025 at 4:12 PM Hongtao Liu wrote:
> > >
> > > On Mon, Aug 18, 2025 at 4:50 AM H.J. Lu wrote:
> > > >
> > > >
On Mon, Aug 11, 2025 at 8:57 PM Richard Biener wrote:
>
> On Sun, 10 Aug 2025, liuhongt wrote:
>
> > >
> > > The comment doesn't match the bool type.
> > >
> > Fixed.
> >
> > >
> > > is_gimple_assign (stmt_info->stmt)
> > >
> > Changed.
> >
> > > There's also SAD_EXPR? The vectorizer has lane_red
On Mon, Aug 18, 2025 at 4:12 PM Hongtao Liu wrote:
>
> On Mon, Aug 18, 2025 at 4:50 AM H.J. Lu wrote:
> >
> > We can't place a TLS call before a conditional jump in a basic block like
> >
> > (code_label 13 11 14 4 2 (nil) [1 uses])
> > (not
On Mon, Aug 18, 2025 at 4:50 AM H.J. Lu wrote:
>
> We can't place a TLS call before a conditional jump in a basic block like
>
> (code_label 13 11 14 4 2 (nil) [1 uses])
> (note 14 13 16 4 [bb 4] NOTE_INSN_BASIC_BLOCK)
> (jump_insn 16 14 17 4 (set (pc)
> (if_then_else (le (reg:CCNO 17 flag
On Sat, Aug 16, 2025 at 10:24 PM H.J. Lu wrote:
>
> On Sat, Aug 16, 2025 at 6:43 AM Hongtao Liu wrote:
> >
> > On Sat, Aug 16, 2025 at 8:45 PM Hongtao Liu wrote:
> > >
> > > On Fri, Aug 15, 2025 at 8:48 PM H.J. Lu wrote:
> > > >
> > >
On Sat, Aug 16, 2025 at 8:45 PM Hongtao Liu wrote:
>
> On Fri, Aug 15, 2025 at 8:48 PM H.J. Lu wrote:
> >
> > On Fri, Aug 15, 2025 at 12:44 AM Hongtao Liu wrote:
> > >
> > > On Fri, Aug 15, 2025 at 10:07 AM H.J. Lu wrote:
> > > >
> > >
On Fri, Aug 15, 2025 at 8:48 PM H.J. Lu wrote:
>
> On Fri, Aug 15, 2025 at 12:44 AM Hongtao Liu wrote:
> >
> > On Fri, Aug 15, 2025 at 10:07 AM H.J. Lu wrote:
> > >
> > > Add target("80387") attribute to enable and disable x87 ins
On Fri, Aug 15, 2025 at 10:07 AM H.J. Lu wrote:
>
> Add target("80387") attribute to enable and disable x87 instructions in a
> function.
>
> gcc/
>
> PR target/121541
> * config/i386/i386-options.cc
> (ix86_valid_target_attribute_inner_p): Add target("80387")
> att
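A usage sketch for the new attribute. The disabling spelling target("no-80387") is an assumption based on the usual "no-" prefix accepted by target attributes, and the GCC 16 version check is likewise assumed; NO_X87 and checksum are hypothetical names. On any other compiler the macro expands to nothing, so the sketch still builds.

```c
#include <assert.h>

/* Assumed spelling and version: disable x87 instructions for one
   function on GCC 16+ x86; otherwise expand to nothing.  */
#if defined (__i386__) || defined (__x86_64__)
# if defined (__GNUC__) && !defined (__clang__) && __GNUC__ >= 16
#  define NO_X87 __attribute__ ((target ("no-80387")))
# endif
#endif
#ifndef NO_X87
# define NO_X87
#endif

/* Integer-only code does not depend on x87 instructions at all.  */
static NO_X87 int
checksum (const unsigned char *p, int n)
{
  int sum = 0;
  for (int i = 0; i < n; i++)
    sum = sum * 31 + p[i];
  return sum;
}
```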
On Thu, Aug 14, 2025 at 9:22 AM H.J. Lu wrote:
>
> commit 9804b23198b39f85a7258be556c5e8aed44b9efc
> Author: H.J. Lu
> Date: Sun Apr 13 11:38:24 2025 -0700
>
> x86: Add preserve_none and update no_caller_saved_registers attributes
>
> allowed MMX/80387 instructions in functions with no_call
On Wed, Aug 13, 2025 at 2:30 PM Hongtao Liu wrote:
>
> On Tue, Aug 5, 2025 at 8:49 AM Andi Kleen wrote:
> >
> > From: Andi Kleen
> >
> > The GFNI AVX gf2p8affineqb instruction can be used to implement
> > vectorized byte shifts or rotates. This patch uses th
On Wed, Aug 13, 2025 at 2:35 PM Hongtao Liu wrote:
>
> On Tue, Aug 12, 2025 at 10:02 PM H.J. Lu wrote:
> >
> > On Tue, Aug 12, 2025 at 06:47:54AM -0700, H.J. Lu wrote:
> > > On Mon, Aug 11, 2025 at 11:13 PM Hongtao Liu wrote:
> > > >
> > >
On Tue, Aug 12, 2025 at 10:02 PM H.J. Lu wrote:
>
> On Tue, Aug 12, 2025 at 06:47:54AM -0700, H.J. Lu wrote:
> > On Mon, Aug 11, 2025 at 11:13 PM Hongtao Liu wrote:
> > >
> > > On Mon, Aug 4, 2025 at 11:33 PM H.J. Lu wrote:
> > > >
> > > >
On Tue, Aug 5, 2025 at 8:49 AM Andi Kleen wrote:
>
> From: Andi Kleen
>
> The GFNI AVX gf2p8affineqb instruction can be used to implement
> vectorized byte shifts or rotates. This patch uses them to implement
> shift and rotate patterns to allow the vectorizer to use them.
> Previously AVX couldn
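To see why one affine instruction can shift or rotate bytes, here is a scalar C model of a single GF2P8AFFINEQB byte lane, following the SDM pseudocode (result bit i is the parity of matrix byte 7 - i ANDed with the source byte, XORed with imm8 bit i). The helpers rotl_matrix and shl_matrix are hypothetical and only illustrate the idea; they are not the expander code from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* 1 if an odd number of bits are set in V.  */
static unsigned
parity8 (uint8_t v)
{
  v ^= v >> 4;
  v ^= v >> 2;
  v ^= v >> 1;
  return v & 1;
}

/* Scalar model of one GF2P8AFFINEQB byte lane: result bit I is
   parity (MATRIX.byte[7 - I] & X) XOR bit I of IMM8.  */
static uint8_t
gf2p8affine_byte (uint64_t matrix, uint8_t x, uint8_t imm8)
{
  uint8_t r = 0;
  for (int i = 0; i < 8; i++)
    {
      uint8_t row = (uint8_t) (matrix >> (8 * (7 - i)));
      r |= (uint8_t) ((parity8 (row & x) ^ ((imm8 >> i) & 1u)) << i);
    }
  return r;
}

/* Hypothetical: matrix for rotate-left by N.  The row for result bit I
   selects source bit (I - N) mod 8.  */
static uint64_t
rotl_matrix (unsigned n)
{
  assert (n < 8);
  uint64_t m = 0;
  for (int i = 0; i < 8; i++)
    m |= (uint64_t) (1u << ((i + 8 - (int) n) & 7)) << (8 * (7 - i));
  return m;
}

/* Hypothetical: matrix for logical shift-left by N.  Rows for result
   bits below N stay zero, so zeros are shifted in.  */
static uint64_t
shl_matrix (unsigned n)
{
  assert (n < 8);
  uint64_t m = 0;
  for (int i = (int) n; i < 8; i++)
    m |= (uint64_t) (1u << (i - (int) n)) << (8 * (7 - i));
  return m;
}
```

For n = 0 both helpers return 0x0102040810204080, the identity matrix. Right shifts follow the same scheme with each row selecting source bit i + n instead.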
On Wed, Aug 13, 2025 at 1:40 AM Andi Kleen wrote:
>
> >
> > The latter takes 5 cycles, the former takes 3 cycles.
>
> It's pipelined however.
>
> >
> > Do you have any microbenchmark or real workloads to show your
> > optimization is better?
>
> Keep in mind it only uses one port vs two.
>
> Yes I
On Mon, Aug 4, 2025 at 11:33 PM H.J. Lu wrote:
>
> On Mon, Aug 04, 2025 at 02:57:39PM +0800, Hongtao Liu wrote:
> > > > > > + rtx_insn *before = nullptr;
> > > > > > + rtx_insn *after = nullptr;
> > > > > > + if (insn == BB_HE
On Tue, Aug 12, 2025 at 5:14 AM H.J. Lu wrote:
>
> Since SImode MOV only supports signed 32-bit immediate, change unsigned
> 32-bit immediate to signed if needed.
>
> gcc/
>
> PR target/121497
> * config/i386/i386-features.cc (ix86_place_single_vector_set):
> Change unsigne
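Put differently, the bit pattern stays the same and only its representation changes: an SImode immediate is held sign-extended, so 0xffffffff has to be written as -1. A minimal sketch of that canonicalization with a hypothetical helper simode_imm (GCC's trunc_int_for_mode performs this kind of sign-extension for a given mode; whether the patch calls it is not shown here):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: return the low 32 bits of VALUE as a
   sign-extended immediate, the form a signed 32-bit SImode MOV
   immediate needs.  Values in [2^31, 2^32) become negative.  */
static int64_t
simode_imm (uint64_t value)
{
  uint64_t low = value & 0xffffffffu;
  return low >= 0x80000000u ? (int64_t) low - (INT64_C (1) << 32)
                            : (int64_t) low;
}
```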
On Tue, Jul 29, 2025 at 9:28 AM H.J. Lu wrote:
>
> On Mon, Jul 28, 2025 at 01:53:08PM -0700, H.J. Lu wrote:
> > On Mon, Jul 28, 2025 at 04:51:24PM +0800, Hongtao Liu wrote:
> > > On Wed, Jul 23, 2025 at 8:07 AM H.J. Lu wrote:
> > > >
> > > > Fo
On Fri, Aug 1, 2025 at 8:10 PM H.J. Lu wrote:
>
> Don't hoist non all 0s/1s vector set outside of the loop to avoid extra
> spills.
The patch LGTM.
>
> gcc/
>
> PR target/120941
> * config/i386/i386-features.cc (x86_cse_kind): Moved before
> ix86_place_single_vector_set.
>
On Wed, Jul 30, 2025 at 11:45 AM H.J. Lu wrote:
>
> commit 4c80062d7b8c272e2e193b8074a8440dbb4fe588
> Author: H.J. Lu
> Date: Sun May 25 07:40:29 2025 +0800
>
> x86: Enable *mov_(and|or) only for -Oz
>
> disabled transformation from "movq $-1,reg" to "pushq $-1; popq reg" for
> -Oz. But fo
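For a sense of the size trade-off at -Oz, here are hand-assembled x86-64 encodings for loading -1 into %rax (registers r8 to r15 need an extra REX byte). The table and total_len are illustrative only; note that the or form also clobbers the flags.

```c
#include <assert.h>
#include <stddef.h>

/* Hand-assembled x86-64 encodings for "load -1 into %rax".  */
struct encoding
{
  const char *text;
  unsigned char bytes[8];
  size_t len;
};

static const struct encoding minus_one_rax[] = {
  { "movq $-1, %rax", { 0x48, 0xc7, 0xc0, 0xff, 0xff, 0xff, 0xff }, 7 },
  { "orq $-1, %rax",  { 0x48, 0x83, 0xc8, 0xff }, 4 },
  { "pushq $-1",      { 0x6a, 0xff }, 2 },
  { "popq %rax",      { 0x58 }, 1 },
};

/* Total size in bytes of entries [FIRST, FIRST + COUNT).  */
static size_t
total_len (size_t first, size_t count)
{
  size_t n = 0;
  for (size_t i = first; i < first + count; i++)
    n += minus_one_rax[i].len;
  return n;
}
```

total_len (0, 1) is 7 bytes for movq, total_len (1, 1) is 4 for orq, and total_len (2, 2) is 3 for the pushq/popq pair.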
On Mon, Jul 28, 2025 at 11:10 PM H.J. Lu wrote:
>
> On Thu, Jun 19, 2025 at 2:55 AM Roger Sayle
> wrote:
> >
> >
> > Looks good to me. Sorry for any inconvenience.
> > Cheers,
> > Roger
> >
> > > -----Original Message-----
> > >
On Wed, Jul 23, 2025 at 8:07 AM H.J. Lu wrote:
>
> For TLS calls:
>
> 1. UNSPEC_TLS_GD:
>
> (parallel [
> (set (reg:DI 0 ax)
> (call:DI (mem:QI (symbol_ref:DI ("__tls_get_addr")))
> (const_int 0 [0])))
> (unspec:DI [(symbol_ref:DI ("e") [flags 0x50])
>
On Tue, Jul 22, 2025 at 4:47 AM H.J. Lu wrote:
>
> For TLS calls:
>
> 1. UNSPEC_TLS_GD:
>
> (parallel [
> (set (reg:DI 0 ax)
> (call:DI (mem:QI (symbol_ref:DI ("__tls_get_addr")))
> (const_int 0 [0])))
> (unspec:DI [(symbol_ref:DI ("e") [flags 0x50])
>
On Thu, Jul 17, 2025 at 11:22 PM H.J. Lu wrote:
>
> For TLS calls:
>
> 1. UNSPEC_TLS_GD:
>
> (parallel [
> (set (reg:DI 0 ax)
> (call:DI (mem:QI (symbol_ref:DI ("__tls_get_addr")))
> (const_int 0 [0])))
> (unspec:DI [(symbol_ref:DI ("e") [flags 0x50])
>
On Thu, Jul 17, 2025 at 9:43 AM H.J. Lu wrote:
>
> There is no need to change mode for XOR in ix86_expand_ternlog now.
> Whatever reasons for it in the first place no longer exist. Tested
> on x86-64 with -m32. There are no regressions.
Ok.
>
> * config/i386/i386.cc (ix86_expand_ternlog)
On Tue, Jul 15, 2025 at 2:36 PM Haochen Jiang wrote:
>
> Hi all,
>
> In ISE058, the AVX10.2 imply is removed from AMX-AVX512. This
> leads to re-consideration on the imply for AMX-AVX512.
>
> Since it is using zmm register and using zmm register only, we
> need to at least imply AVX512F. AVX512VL
On Mon, Jul 7, 2025 at 3:27 PM Hongtao Liu wrote:
>
> On Tue, Jun 24, 2025 at 2:11 PM H.J. Lu wrote:
> >
> > On Mon, Jun 23, 2025 at 2:24 PM H.J. Lu wrote:
> > >
> > > On Wed, Jun 18, 2025 at 3:17 PM H.J. Lu wrote:
> > > >
> > > >
On Tue, Jun 24, 2025 at 2:11 PM H.J. Lu wrote:
>
> On Mon, Jun 23, 2025 at 2:24 PM H.J. Lu wrote:
> >
> > On Wed, Jun 18, 2025 at 3:17 PM H.J. Lu wrote:
> > >
> > > 1. Don't generate the loop if the loop count is 1.
> > > 2. For memset with vector on small size, use vector if small size supports
On Mon, Jul 7, 2025 at 3:18 PM Hongtao Liu wrote:
>
> On Fri, Jul 4, 2025 at 5:45 PM Richard Biener wrote:
> >
> > The following adds a x86 tuning to enable the use of AVX512 masked
> > epilogues in cases we heuristically determine it to be not detrimental
> >
On Fri, Jul 4, 2025 at 5:45 PM Richard Biener wrote:
>
> The following adds a x86 tuning to enable the use of AVX512 masked
> epilogues in cases we heuristically determine it to be not detrimental
> by high chance. Basically problematic cases are when there are
> data streams that are both stored
On Mon, Jun 30, 2025 at 11:46 AM H.J. Lu wrote:
>
> On Mon, Jun 30, 2025 at 11:17 AM H.J. Lu wrote:
> >
> > On Mon, Jun 30, 2025 at 10:41 AM Hongtao Liu wrote:
> > >
> > > On Mon, Jun 30, 2025 at 10:37 AM Hongtao Liu wrote:
> > > >
> >
On Mon, Jun 30, 2025 at 11:16 AM H.J. Lu wrote:
>
> On Mon, Jun 30, 2025 at 10:37 AM Hongtao Liu wrote:
> >
> > On Sat, Jun 28, 2025 at 8:30 PM H.J. Lu wrote:
> > >
> > > Update functions with no_callee_saved_registers/preserve_none attribute
> > > t
On Sat, Jun 28, 2025 at 8:30 PM H.J. Lu wrote:
>
> Update functions with no_callee_saved_registers/preserve_none attribute
> to preserve frame pointer since caller may use it to save the current
> stack:
>
> pushq %rbp
> movq %rsp, %rbp
> ...
> call function
> ...
> leave
> ret
>
> If callee chang
On Mon, Jun 30, 2025 at 10:37 AM Hongtao Liu wrote:
>
> On Sat, Jun 28, 2025 at 8:30 PM H.J. Lu wrote:
> >
> > Update functions with no_callee_saved_registers/preserve_none attribute
> > to preserve frame pointer since caller may use it to save the current
> > stac
On Thu, Jun 26, 2025 at 2:17 PM H.J. Lu wrote:
>
> On Thu, Jun 26, 2025 at 2:11 PM Hongtao Liu wrote:
> >
> > On Thu, Jun 26, 2025 at 1:59 PM H.J. Lu wrote:
> > >
> > > Use the inner scalar mode of vector broadcast source in:
> > >
> > >
On Thu, Jun 26, 2025 at 2:02 PM H.J. Lu wrote:
>
> Since float vector constant
>
> (const_vector:V4SF [(const_double:SF -QNaN [-QNaN]) repeated x4])
>
> is an all 1s float vector constant, update the remove_redundant_vector
> pass to replace
>
> (insn 20 18 21 2 (set (reg:V4SF 124)
> (cons
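Why -QNaN counts as all 1s: assuming IEEE 754 binary32, the pattern 0xffffffff has an all-ones exponent, the quiet bit set and the sign bit set, i.e. a negative quiet NaN. A small check with hypothetical helper names:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret a 32-bit pattern as an IEEE 754 binary32 float.  */
static float
sf_from_bits (uint32_t bits)
{
  float f;
  memcpy (&f, &bits, sizeof f);
  return f;
}

/* 1 if BITS encodes a NaN with both the quiet bit and the sign bit set,
   which the RTL dump above shows as -QNaN.  */
static int
is_negative_qnan (uint32_t bits)
{
  uint32_t exponent = (bits >> 23) & 0xffu;
  uint32_t fraction = bits & 0x7fffffu;
  return exponent == 0xffu           /* all-ones exponent ...        */
         && (fraction & 0x400000u)   /* ... with the quiet bit set   */
         && (bits >> 31) == 1;       /* ... and the sign bit set     */
}
```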
On Thu, Jun 26, 2025 at 1:59 PM H.J. Lu wrote:
>
> Use the inner scalar mode of vector broadcast source in:
>
> (set (reg:V8DF 394)
> (vec_duplicate:V8DF (reg:V2DF 190 [ alpha ])))
>
> to compute the vector mode for broadcast from vector source.
ix86_get_vector_cse_mode (unsigned int si
On Thu, Jun 26, 2025 at 1:56 PM H.J. Lu wrote:
>
> On Thu, Jun 26, 2025 at 1:24 PM Hongtao Liu wrote:
> >
> > On Thu, Jun 26, 2025 at 6:20 AM H.J. Lu wrote:
> > >
> > > For tcpsock_test.go in libgo tests,
> > >
> > > commit aba3b9d3
On Thu, Jun 26, 2025 at 6:20 AM H.J. Lu wrote:
>
> For tcpsock_test.go in libgo tests,
>
> commit aba3b9d3a48a0703fd565f7c5f0caf604f59970b
> Author: H.J. Lu
> Date: Fri May 9 07:17:07 2025 +0800
>
> x86: Extend the remove_redundant_vector pass
>
> added an instruction:
>
> (insn 501 101 102
On Wed, Jun 25, 2025 at 3:35 PM H.J. Lu wrote:
>
> Add preserve_none attribute which is similar to no_callee_saved_registers
> attribute, except on x86-64, r12, r13, r14, r15, rdi and rsi registers are
> used for integer parameter passing. This can be used in an interpreter
> to avoid saving/rest
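A sketch of where the attribute would go in an interpreter, assuming a toy accumulator machine (the opcodes, handlers and run loop are all hypothetical). The macro only applies preserve_none when __has_attribute reports it on x86-64; the register-level effect shows up in the generated code, not in the C source.

```c
#include <assert.h>
#include <stdint.h>

/* Use preserve_none where the compiler knows it on x86-64; otherwise
   expand to nothing so the sketch still builds.  */
#if defined (__x86_64__) && defined (__has_attribute)
# if __has_attribute (preserve_none)
#  define PRESERVE_NONE __attribute__ ((preserve_none))
# endif
#endif
#ifndef PRESERVE_NONE
# define PRESERVE_NONE
#endif

enum op { OP_INC, OP_DOUBLE, OP_HALT };

/* Hypothetical opcode handlers; with preserve_none they need not save
   or restore callee-saved registers.  */
static PRESERVE_NONE int64_t op_inc (int64_t acc) { return acc + 1; }
static PRESERVE_NONE int64_t op_double (int64_t acc) { return acc * 2; }

/* Hypothetical dispatch loop: run bytecode until OP_HALT.  */
static int64_t
run (const unsigned char *code)
{
  int64_t acc = 0;
  for (;;)
    switch (*code++)
      {
      case OP_INC: acc = op_inc (acc); break;
      case OP_DOUBLE: acc = op_double (acc); break;
      default: return acc;
      }
}
```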
On Thu, Jun 26, 2025 at 6:21 AM H.J. Lu wrote:
>
> On Tue, Jun 24, 2025 at 2:21 PM H.J. Lu wrote:
> >
> > Add debug dump for the remove_redundant_vector pass with the following
> > output:
> >
> > Replace:
> >
> > (insn 7 4 8 2 (set (reg:V2DI 103)
> > (const_vector:V2DI [
> >
On Tue, Jun 17, 2025 at 8:54 PM Cui, Lili wrote:
>
>
>
> > -----Original Message-----
> > From: H.J. Lu
> > Sent: Monday, June 16, 2025 10:08 PM
> > To: Jan Hubicka
> > Cc: Uros Bizjak ; Cui, Lili ; gcc-
> > patc...@gcc.gnu.org; Liu, Hongtao ;
> > mjgu...@gmail.com
> > Subject: [PATCH v3] x86: U
On Fri, May 23, 2025 at 1:56 PM H.J. Lu wrote:
>
> Add preserve_none attribute which is similar to no_callee_saved_registers
> attribute, except on x86-64, r12, r13, r14, r15, rdi and rsi registers are
> used for integer parameter passing. This can be used in an interpreter
> to avoid saving/rest
On Wed, Jun 25, 2025 at 1:06 PM H.J. Lu wrote:
>
> -mtune=intel is used to generate a single binary to run well on both big
> core and small core, similar to hybrid CPUs. Update -mtune=intel to tune
> for Diamond Rapids and Clearwater Forest, instead of Silvermont.
>
> PR target/120815
> * common
On Tue, Jun 24, 2025 at 1:26 PM H.J. Lu wrote:
>
> On Mon, Jun 23, 2025 at 4:53 PM Hongtao Liu wrote:
> >
> > On Mon, Jun 23, 2025 at 4:45 PM H.J. Lu wrote:
> > >
> > > On Mon, Jun 23, 2025 at 4:10 PM H.J. Lu wrote:
> > > >
> >
On Thu, Jun 19, 2025 at 10:25 AM H.J. Lu wrote:
>
> Extend the remove_redundant_vector pass to handle vector broadcasts from
> constant and variable scalars. When broadcasting from constants and
> function arguments, we can place a single widest vector broadcast at
> entry of the nearest common d
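The placement step relies on finding the nearest common dominator of all blocks that use the broadcast. A generic sketch over an explicit immediate-dominator array follows; GCC itself provides nearest_common_dominator in dominance.cc, and the arrays and helper names here are hypothetical.

```c
#include <assert.h>

/* Hypothetical dominator tree: IDOM[b] is the immediate dominator of
   block b and DEPTH[b] its depth (entry block 0 has depth 0).  */
static int
nearest_common_dom (const int *idom, const int *depth, int a, int b)
{
  /* Lift the deeper block to the same depth, then walk both up
     together until they meet.  */
  while (depth[a] > depth[b])
    a = idom[a];
  while (depth[b] > depth[a])
    b = idom[b];
  while (a != b)
    {
      a = idom[a];
      b = idom[b];
    }
  return a;
}

/* Fold over every block that uses the same broadcast value.  */
static int
placement_block (const int *idom, const int *depth,
                 const int *uses, int nuses)
{
  int dom = uses[0];
  for (int i = 1; i < nuses; i++)
    dom = nearest_common_dom (idom, depth, dom, uses[i]);
  return dom;
}
```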
On Mon, Jun 23, 2025 at 4:45 PM H.J. Lu wrote:
>
> On Mon, Jun 23, 2025 at 4:10 PM H.J. Lu wrote:
> >
> > On Mon, Jun 23, 2025 at 3:11 PM Hongtao Liu wrote:
> > >
> > > On Thu, Jun 19, 2025 at 10:25 AM H.J. Lu wrote:
> > > >
> > > >
On Mon, Jun 23, 2025 at 4:10 PM H.J. Lu wrote:
>
> On Mon, Jun 23, 2025 at 3:11 PM Hongtao Liu wrote:
> >
> > On Thu, Jun 19, 2025 at 10:25 AM H.J. Lu wrote:
> > >
> > > Extend the remove_redundant_vector pass to handle vector broadcasts from
> >
On Sat, Jun 21, 2025 at 11:09 PM H.J. Lu wrote:
>
> On Fri, Jun 20, 2025 at 4:12 PM H.J. Lu wrote:
> >
> > Don't use vmovdqu16/vmovdqu8 with non-EVEX registers even if AVX512BW is
> > available.
> >
> > gcc/
> >
> > PR target/120728
> > * config/i386/i386.cc (ix86_get_ssemov): Use vmovdqu16/vmovd
On Mon, Jun 23, 2025 at 11:03 AM H.J. Lu wrote:
>
> Add a PROCESSOR_XXX comment to each entry in processor_cost_table to
> describe which processor the cost entry is applied to.
Ok as obvious.
>
> * config/i386/i386-options.cc (processor_cost_table): Add a
> PROCESSOR_XXX comment to each entry.
>
>
On Fri, Jun 20, 2025 at 10:04 AM Haochen Jiang wrote:
>
> Hi all,
>
> CLDEMOTE is not enabled on clients according to SDM. SDM only mentioned
> it will be enabled on Xeon and Atom servers, not clients. Remove them
> since Alder Lake (where it is introduced).
>
> Also will backport this patch to GC
On Wed, Jun 18, 2025 at 6:38 PM H.J. Lu wrote:
>
> commit ef26c151c14a87177d46fd3d725e7f82e040e89f
> Author: Roger Sayle
> Date: Thu Dec 23 12:33:07 2021 +
>
> x86: PR target/103773: Fix wrong-code with -Oz from pop to memory.
>
> added "*mov_and" and extended "*mov_or" to transform
> "
On Wed, Jun 18, 2025 at 2:39 PM H.J. Lu wrote:
>
> On Mon, Jun 16, 2025 at 4:14 PM Hongtao Liu wrote:
> >
> > >+enum redundant_load_kind
> > >+{
> > >+ LOAD_CONST0_VECTOR,
> > >+ LOAD_CONSTM1_VECTOR,
> > >+ LOAD_VECTOR
> >
On Mon, May 26, 2025 at 2:30 PM H.J. Lu wrote:
>
> On Sun, May 25, 2025 at 7:02 PM H.J. Lu wrote:
> >
> > On Sun, May 25, 2025 at 8:12 AM H.J. Lu wrote:
> > >
> > > On Sun, May 25, 2025 at 7:47 AM H.J. Lu wrote:
> > > >
> > > > commit ef26c151c14a87177d46fd3d725e7f82e040e89f
> > > > Author: Rog
Drop this patch since
https://gcc.gnu.org/pipermail/gcc-patches/2025-June/686830.html could
be a better alternative.
On Tue, Jun 10, 2025 at 9:50 AM Hongtao Liu wrote:
>
> Ping
>
> On Mon, May 19, 2025 at 10:06 AM liuhongt wrote:
> >
> > From: "hongtao.liu"
>
On Mon, Jun 16, 2025 at 4:30 PM Hongtao Liu wrote:
>
> >+enum redundant_load_kind
> >+{
> >+ LOAD_CONST0_VECTOR,
> >+ LOAD_CONSTM1_VECTOR,
> >+ LOAD_VECTOR
> >+};
> Perhaps rename to x86_cse_kind, X86_CSE_CONST0_VECTOR,
> X86_CSE_CONSTM1_VECTOR, X
>+enum redundant_load_kind
>+{
>+ LOAD_CONST0_VECTOR,
>+ LOAD_CONSTM1_VECTOR,
>+ LOAD_VECTOR
>+};
Perhaps rename to x86_cse_kind, X86_CSE_CONST0_VECTOR,
X86_CSE_CONSTM1_VECTOR, X86_CSE_VEC_DUP?
LOAD sounds a bit ambiguous.
Similar to ix86_get_vector_load_mode -> ix86_get_vector_cse_mode?
>+
On Thu, Jun 12, 2025 at 10:51 AM Hu, Lin1 wrote:
>
> Hi,
>
> This patch aims to set SRF issue rate to 4, GNR issue rate to 6. According to
> tests about spec2017, the patch has little effect on performance.
>
> For GRR, CWF, DMR, ARL and PTL, the patch set their issue rate to 6. Waiting
> for
> m
Ping
On Mon, May 19, 2025 at 10:06 AM liuhongt wrote:
>
> From: "hongtao.liu"
>
> AutoFDO profile is a scaled profile, as a result, 0 sample does not
> mean never executed. especially there's profile from function
> body. Prevent combine_with_ipa_count (ipa_count) from zeroing all
> bb->count.
>
On Tue, Jun 3, 2025 at 2:59 PM H.J. Lu wrote:
>
> Extend the remove_redundant_vector pass to handle vector broadcasts from
> constant and variable scalars. When broadcasting from constants and
> function arguments, we can place a single widest vector broadcast at
> entry of the nearest common dom
On Thu, May 29, 2025 at 4:56 PM Hu, Lin1 wrote:
>
> Hi,
>
> The patch aims to optimize
> movb (%rdi), %al
> movq %rdi, %rbx
> xorl %esi, %eax, %edx
> movb %dl, (%rdi)
> cmpb %sil, %al
> jne
> to
> xorb %sil, (%rdi)
>
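The rewrite is valid because old ^ x is zero exactly when old == x, so the zero flag set by xorb %sil, (%rdi) can drive jne directly. A C model of both forms, with hypothetical function names (the source that produced the original sequence is not shown):

```c
#include <assert.h>
#include <stdint.h>

/* Original shape: load the byte, store old ^ x, then compare the old
   value against x.  */
static int
xor_then_cmp (uint8_t *p, uint8_t x)
{
  uint8_t old = *p;
  *p = (uint8_t) (old ^ x);
  return old != x;
}

/* Fused shape: one read-modify-write xor; the result is zero exactly
   when old == x, so testing it answers the same question.  */
static int
xor_in_place (uint8_t *p, uint8_t x)
{
  *p ^= x;
  return *p != 0;
}
```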
On Mon, May 26, 2025 at 4:55 PM Hu, Lin1 wrote:
>
> Hi, all
>
> Enable -mapxf will change some patterns about adc/sbb.
>
> Hence gcc will raise an extra mov like
> movq 8(%rdi), %rax
> adcq %rax, 8(%rsi), %rax
> movq %rax, 8(%rdi)
> rather than
> movq
On Wed, May 14, 2025 at 3:29 PM Haochen Jiang wrote:
>
> Hi all,
>
> This is the v2 patch to remove -mavx10.1/256-512 and -mno-evex512. I suppose
> this time all the patches will not be held due to size.
>
> As mentioned in GCC 15, we will remove -mavx10.1-256/512 and -mno-evex512
> options in GCC
It's https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119181
On Fri, May 16, 2025 at 10:02 AM liuhongt wrote:
>
> The patch tries to solve miss vectorization for below case.
>
> void
> foo (int* a, int* restrict b)
> {
> b[0] = a[0] * a[64];
> b[1] = a[65] * a[1];
> b[2] = a[2] * a[66];
>
On Fri, Apr 18, 2025 at 7:10 PM H.J. Lu wrote:
>
> Add preserve_none attribute which is similar to no_callee_saved_registers
> attribute, except on x86-64, r12, r13, r14, r15, rdi and rsi registers are
Could you split preserve_none into a separate patch?
It looks like it's different from clang's p
On Wed, May 14, 2025 at 9:22 AM liuhongt wrote:
>
> The Intel Decimal Floating-Point Math Library is available as open-source on
> Netlib[1].
>
> [1] https://www.netlib.org/misc/intel/
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> Ready push to trunk.
>
> libgcc/config/libbid/Ch
On Thu, May 8, 2025 at 2:40 PM liuhongt wrote:
>
> The only part I changed is related to size_cost of sse_to_integer, as below
>
> 114+ /* Under TARGET_SSE4_1, it's vmovd + vpextrd/vpinsrd.
> 115+ W/o it, it's movd + psrlq/unpckldq + movd. */
> 116+ else if (!TARGET_64BIT && smode != SImod
On Wed, May 7, 2025 at 9:06 AM H.J. Lu wrote:
>
> On Tue, May 6, 2025 at 3:35 PM Hongtao Liu wrote:
> >
> > On Tue, May 6, 2025 at 3:06 PM H.J. Lu wrote:
> > >
> > > On Tue, May 6, 2025 at 2:30 PM Liu, Hongtao wrote:
> > > >
> > > >
On Tue, May 6, 2025 at 3:06 PM H.J. Lu wrote:
>
> On Tue, May 6, 2025 at 2:30 PM Liu, Hongtao wrote:
> >
> >
> >
> > > -----Original Message-----
> > > From: H.J. Lu
> > > Sent: Tuesday, May 6, 2025 2:16 PM
> > > To: Liu, Hongtao
> > > Cc: GCC Patches ; Uros Bizjak
> > >
> > > Subject: Re: [PA
On Sun, Apr 27, 2025 at 10:58 AM H.J. Lu wrote:
>
> When passing 0xff as an unsigned char function argument with the C frontend
> promotion, expand_normal used to get
>
> constant
> 255>
>
> and returned the rtx value using the sign-extended representation:
>
> (const_int 255 [0xff])
>
> But aft
On Mon, Apr 28, 2025 at 5:07 PM H.J. Lu wrote:
>
> On Mon, Apr 28, 2025 at 4:26 PM H.J. Lu wrote:
> >
>
> > > > This is what my patch does:
> > > But it iterates through vector_insns, using a def-ref chain to find
> > > those insns. I think we can just record those single_set with src as
> > > co
>
> I am not so sure about this when it comes to relatively common
> instructions. Hiding things in unspec prevents combine and other RTL
> passes from doing their job. I would say that it only makes sense for
> situations where the RTL equivalent is very inconvenient.
>
In the direction of using gener
On Fri, Apr 25, 2025 at 1:26 PM Jan Hubicka wrote:
>
> > On Thu, Apr 24, 2025 at 6:27 PM Jan Hubicka wrote:
> > >
> > > > Since ix86_expand_sse_movcc will simplify them into a simple vmov, vpand
> > > > or vpandn.
> > > > Current register_operand/vector_operand could lose some optimization
> > >
On Thu, Apr 24, 2025 at 12:54 AM Jan Hubicka wrote:
>
> > From: "hongtao.liu"
> >
> > When FMA is available, N-R step can be rewritten with
> >
> > a / b = (a - (rcp(b) * a * b)) * rcp(b) + rcp(b) * a
> >
> > which have 2 fma generated.[1]
> >
> > [1] https://bugs.llvm.org/show_bug.cgi?id=21385
>
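A sketch of the rewritten step in C. approx_rcp stands in for rcpss by truncating 1/b to about 12 bits, and fma_sketch approximates a fused multiply-add in double so that no libm is needed; both are assumptions for illustration. After one step the relative error is roughly the square of the reciprocal's error, plus the final rounding.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for rcpss: 1/b with the low 11 mantissa bits cleared,
   leaving roughly 12 bits of precision.  */
static float
approx_rcp (float b)
{
  float r = 1.0f / b;
  uint32_t bits;
  memcpy (&bits, &r, sizeof bits);
  bits &= 0xfffff800u;
  memcpy (&r, &bits, sizeof r);
  return r;
}

/* Close to a fused x * y + z: the float product is exact in double,
   though the addition may round twice (double, then float).  */
static float
fma_sketch (float x, float y, float z)
{
  return (float) ((double) x * (double) y + (double) z);
}

/* One Newton-Raphson step for a / b in the form
   (a - rcp(b) * a * b) * rcp(b) + rcp(b) * a.  */
static float
nr_div (float a, float b)
{
  float r = approx_rcp (b);          /* rcp(b)                        */
  float q = r * a;                   /* rcp(b) * a                    */
  float e = fma_sketch (-q, b, a);   /* a - rcp(b) * a * b  (FMA 1)   */
  return fma_sketch (e, r, q);       /* e * rcp(b) + q      (FMA 2)   */
}
```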
On Thu, Apr 24, 2025 at 12:50 AM Jan Hubicka wrote:
>
> > In some benchmark, I notice stv failed due to cost unprofitable, but the
> > igain
> > is inside the loop, but sse<->integer conversion is outside the loop,
> > current cost
> > model doesn't consider the frequency of those gain/cost.
> >
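The idea of weighting each gain or cost by the frequency of its basic block can be sketched as follows. The struct and helpers are hypothetical and are not STV's actual cost model.

```c
#include <assert.h>

/* Hypothetical cost entry: a gain (negative for a cost) and the
   execution frequency of the block it belongs to.  */
struct cost_item
{
  int gain;
  int freq;
};

/* Plain sum, ignoring where each item sits.  */
static int
unweighted_gain (const struct cost_item *items, int n)
{
  int sum = 0;
  for (int i = 0; i < n; i++)
    sum += items[i].gain;
  return sum;
}

/* Sum with each item scaled by its block frequency.  */
static int
weighted_gain (const struct cost_item *items, int n)
{
  int sum = 0;
  for (int i = 0; i < n; i++)
    sum += items[i].gain * items[i].freq;
  return sum;
}
```

With, say, a gain of 2 inside a loop that runs 100 times and a one-time sse<->integer conversion cost of 10 outside it, the unweighted sum is -8 (rejected) while the weighted sum is 190.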
On Mon, Apr 21, 2025 at 2:52 PM liuhongt wrote:
>
> Since ix86_expand_sse_movcc will simplify them into a simple vmov, vpand
> or vpandn.
> Current register_operand/vector_operand could lose some optimization
> opportunity.
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> Ok for tru
On Tue, Apr 22, 2025 at 10:30 AM Hongtao Liu wrote:
>
> On Tue, Apr 22, 2025 at 12:46 AM Jan Hubicka wrote:
> >
> > Hi,
> > this patch adds special cases for vectorizer costs in COND_EXPR, MIN_EXPR,
> > MAX_EXPR, ABS_EXPR and ABSU_EXPR. We previously costed ABS_E
On Tue, Apr 22, 2025 at 12:46 AM Jan Hubicka wrote:
>
> Hi,
> this patch adds special cases for vectorizer costs in COND_EXPR, MIN_EXPR,
> MAX_EXPR, ABS_EXPR and ABSU_EXPR. We previously costed ABS_EXPR and
> ABSU_EXPR
> but it was only correct for FP variant (where it corresponds to andss clea
On Mon, Apr 21, 2025 at 4:30 PM H.J. Lu wrote:
>
> On Mon, Apr 21, 2025 at 11:29 AM Hongtao Liu wrote:
> >
> > On Sat, Apr 19, 2025 at 1:25 PM H.J. Lu wrote:
> > >
> > > On Sun, Dec 1, 2024 at 7:50 AM H.J. Lu wrote:
> > > >
> > > >
On Sat, Apr 19, 2025 at 1:25 PM H.J. Lu wrote:
>
> On Sun, Dec 1, 2024 at 7:50 AM H.J. Lu wrote:
> >
> > For all different modes of all 0s/1s vectors, we can use the single widest
> > all 0s/1s vector register for all 0s/1s vector uses in the whole function.
> > Add a pass to generate a single wi
On Tue, Apr 8, 2025 at 3:52 AM H.J. Lu wrote:
>
> Simplify memcpy and memset inline strategies to avoid branches for
> -mtune=generic:
>
> 1. With MOVE_RATIO and CLEAR_RATIO == 17, GCC will use integer/vector
>    load and store for up to 16 * 16 (256) bytes when the data size is
>    fixed and kn
On Mon, Apr 14, 2025 at 8:56 PM H.J. Lu wrote:
>
> On Mon, Apr 14, 2025 at 2:39 AM Uros Bizjak wrote:
> >
> > On Mon, Apr 14, 2025 at 8:54 AM Hongtao Liu wrote:
> > >
> > > On Mon, Apr 14, 2025 at 7:36 AM H.J. Lu wrote:
> > > >
> >