[PATCH] i386: Adjust rtx cost for imulq and imulw [PR115749]

2024-07-24 Thread Kong, Lingling
Tested SPEC2017 performance on Sierra Forest, Icelake, and CascadeLake; there is no obvious regression. Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. OK for trunk? gcc/ChangeLog: * config/i386/x86-tune-costs.h (struct processor_costs): Adjust rtx_cost of imulq

RE: [PATCH] i386: Change prefetchi output template

2024-07-22 Thread Kong, Lingling
> -Original Message- > From: Haochen Jiang > Sent: Monday, July 22, 2024 2:41 PM > To: gcc-patches@gcc.gnu.org > Cc: Liu, Hongtao ; ubiz...@gmail.com > Subject: [PATCH] i386: Change prefetchi output template > > Hi all, > > For prefetchi instructions, RIP-relative address is

[PATCH] x86: Don't enable APX_F in 32-bit mode.

2024-07-18 Thread Kong, Lingling
I adjusted my patch based on the comments by H.J. And I will add the testcase like gcc.target/i386/pr101395-1.c when the march for APX is determined. Ok for trunk? Thanks, Lingling gcc/ChangeLog: PR target/115978 * config/i386/driver-i386.cc (host_detect_local_cpu): Enable

RE: [PATCH] i386: Remove report error for -mapxf/-muintr with -m32

2024-07-17 Thread Kong, Lingling
On Thu, Jul 18, 2024, 10:00 AM kong lingling <lingling.ko...@gmail.com> wrote: Also add a comment listing the CPUID features that are not supported in 32-bit mode. Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. Ready to push to trunk. gcc/ChangeLog: * config/i386/i386-opti

[PATCH] i386: Remove report error for -mapxf/-muintr with -m32

2024-07-17 Thread kong lingling
Also add a comment listing the CPUID features that are not supported in 32-bit mode. Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. Ready to push to trunk. gcc/ChangeLog: * config/i386/i386-options.cc (ix86_option_override_internal): Remove compiler report error for -mapxf or -muintr with

[PATCH] i386: Support APX NF and NDD for imul/mul

2024-07-01 Thread kong lingling
Add some missing APX NF and NDD support for imul and mul. Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. Ok for trunk? gcc/ChangeLog: * config/i386/i386.md (*imulhizu): Added APX NF support. (*imulhizu): New define_insn. (*mulsi3_1_zext): Ditto.

RE: [PATCH v2 1/2] [APX CFCMOV] Support APX CFCMOV in if_convert pass

2024-06-25 Thread Kong, Lingling
Hi, Gently ping for this. This version has removed the target hook and added a new optab for cfcmov. Thanks, Lingling From: Kong, Lingling Sent: Tuesday, June 18, 2024 3:41 PM To: gcc-patches@gcc.gnu.org Cc: Alexander Monakov ; Uros Bizjak ; lingling.ko...@gmail.com; Hongtao Liu ; Jeff Law

[PATCH v2 2/2] [APX CFCMOV] Support APX CFCMOV in backend

2024-06-18 Thread Kong, Lingling
gcc/ChangeLog: * config/i386/i386-expand.cc (ix86_can_cfcmov_p): New function that tests whether the cfcmov can be generated. (ix86_expand_int_movcc): Expand to cfcmov pattern if ix86_can_cfcmov_p returns true. * config/i386/i386-opts.h (enum apx_features): Add

[PATCH v2 0/2] [APX CFCMOV] Support APX CFCMOV

2024-06-18 Thread Kong, Lingling
deletions(-) create mode 100644 gcc/testsuite/gcc.target/i386/apx-cfcmov-1.c create mode 100644 gcc/testsuite/gcc.target/i386/apx-cfcmov-2.c -- > -Original Message- > From: Hongtao Liu > Sent: Monday, June 17, 2024 11:05 AM > To: Jeff Law > Cc: Alexander Monakov ; Kong, L

[PATCH v2 1/2] [APX CFCMOV] Support APX CFCMOV in if_convert pass

2024-06-18 Thread Kong, Lingling
The APX CFCMOV feature implements conditional faulting, which means that all memory faults are suppressed when the condition code evaluates to false during a load or store of a memory operand. We can now load or store a memory operand that may trap or fault in a conditional move. In middle-end, now we

[PATCH Committed][APX ZU] Fix test for target-support check

2024-06-17 Thread Kong, Lingling
Fix test for APX ZU. Add attribute for no-inline and target APX, and target-support check. Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. Committed as an obvious patch. gcc/testsuite/ChangeLog: * gcc.target/i386/apx-zu-1.c: Add attribute for noinline,

[PATCH 3/3] [APX CFCMOV] Support APX CFCMOV in backend

2024-06-13 Thread Kong, Lingling
From: Lingling Kong Handle target hook TARGET_HAVE_CONDITIONAL_MOVE_MEM_NOTRAP and support CFCMOV in backend. gcc/ChangeLog: * config/i386/i386-expand.cc (ix86_can_cfcmov_p): New function that tests whether the cfcmov can be generated. (ix86_expand_int_movcc): Expand to

[PATCH 2/3] [APX CFCMOV] Support APX CFCMOV in if_convert pass

2024-06-13 Thread Kong, Lingling
From: Lingling Kong After adding the target hook TARGET_HAVE_CONDITIONAL_MOVE_MEM_NOTRAP, the if-convert pass can support a conditional move whose memory load or store may trap or fault. A conditional move that suppresses faults for a conditional memory store would not move any arithmetic calculations. For

[PATCH 1/3] [APX CFCMOV] Add a new target hook: TARGET_HAVE_CONDITIONAL_MOVE_MEM_NOTRAP

2024-06-13 Thread Kong, Lingling
From: konglin1 The APX CFCMOV feature implements conditional faulting, which means that all memory faults are suppressed when the condition code evaluates to false during a load or store of a memory operand. We can now load or store a memory operand that may trap or fault in a conditional move. In middle-end,

[PATCH 0/3] [APX CFCMOV] Support APX CFCMOV

2024-06-13 Thread Kong, Lingling
The APX CFCMOV[1] feature implements conditional faulting, which means that all memory faults are suppressed when the condition code evaluates to false during a load or store of a memory operand. We can now load or store a memory operand that may trap or fault in a conditional move. In middle-end, now we
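As a rough illustration of the source-level shape this series is concerned with, the sketch below shows a conditional load and a conditional store whose memory access is only valid when the condition holds. The function names `load_if` and `store_if` are hypothetical and are not taken from the patches; whether GCC if-converts either function into a CFCMOV depends on the patch under review and on the target flags in use.

```c
#include <stddef.h>

/* Hypothetical example: *p is read only when cond is nonzero, so p may be
   an invalid pointer (for instance NULL) on the other path.  A plain
   conditional move could not be used here, because it would load
   unconditionally and might fault. */
int load_if(int cond, const int *p, int fallback)
{
    return cond ? *p : fallback;
}

/* Hypothetical example: the store happens only when cond is nonzero. */
void store_if(int cond, int *p, int value)
{
    if (cond)
        *p = value;
}
```

Calling `load_if(0, NULL, 7)` is well defined in C because the dereference is never evaluated; an instruction that suppresses faults when the condition is false preserves that property without a branch.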

[PATCH 2/2] [APX CFCMOV] Support APX CFCMOV

2024-06-13 Thread Kong, Lingling
From: konglin1 <lingling.k...@intel.com> The APX CFCMOV feature implements conditional faulting, which means that all memory faults are suppressed when the condition code evaluates to false during a load or store of a memory operand. We can now load or store a memory operand that may trap or fault

[PATCH 1/2] Add a new target hook: TARGET_HAVE_CONDITIONAL_MOVE_MEM_NOTRAP

2024-06-13 Thread Kong, Lingling
From: konglin1 gcc/ChangeLog: * doc/tm.texi: Regenerated. * doc/tm.texi.in: Add TARGET_HAVE_CONDITIONAL_MOVE_MEM_NOTRAP * target.def (bool,): New hook. * targhooks.cc (default_have_conditional_move_mem_notrap): New function to hook

[PATCH] [APX ZU] Support APX zero-upper

2024-06-06 Thread Kong, Lingling
Enable ZU for IMUL (opcodes 0x69 and 0x6B) and SETcc. gcc/ChangeLog: * config/i386/i386-opts.h (enum apx_features): Add apx_zu. * config/i386/i386.h (TARGET_APX_ZU): Define. * config/i386/i386.md (*imulhizu): New define_insn. (*setcc__zu): Ditto. *

[PATCH v3 6/8] [APX NF] Support APX NF for shld/shrd

2024-05-28 Thread Kong, Lingling
gcc/ChangeLog: * config/i386/i386.md (x86_64_shld_nf): New define_insn. (x86_64_shld_ndd_nf): Ditto. (x86_64_shld_1_nf): Ditto. (x86_64_shld_ndd_1_nf): Ditto. (*x86_64_shld_shrd_1_nozext_nf): Ditto. (x86_shld_nf): Ditto. (x86_shld_ndd_nf):

[PATCH v3 7/8] [APX NF] Support APX NF for mul/div

2024-05-28 Thread Kong, Lingling
gcc/ChangeLog: * config/i386/i386.md (*mul3_1_nf): New define_insn. (*mulqi3_1_nf): Ditto. (*divmod4_noext_nf): Ditto. (divmodhiqi3_nf): Ditto. --- gcc/config/i386/i386.md | 47 ++--- 1 file changed, 30 insertions(+), 17

[PATCH v3 4/8] [APX NF] Support APX NF for right shift insns

2024-05-28 Thread Kong, Lingling
gcc/ChangeLog: * config/i386/i386.md (*ashr3_1_nf): New. (*lshr3_1_nf): Ditto. (*lshrqi3_1_nf): Ditto. (*lshrhi3_1_nf): Ditto. --- gcc/config/i386/i386.md | 82 +++-- 1 file changed, 46 insertions(+), 36 deletions(-) diff --git

[PATCH v3 8/8] [APX NF] Support APX NF for lzcnt/tzcnt/popcnt

2024-05-28 Thread Kong, Lingling
gcc/ChangeLog: * config/i386/i386.md (clz2_lzcnt_nf): New define_insn. (*clz2_lzcnt_falsedep_nf): Ditto. (__nf): Ditto. (*__falsedep_nf): Ditto. (_hi_nf): Ditto. (popcount2_nf): Ditto. (*popcount2_falsedep_nf): Ditto.

[PATCH v3 5/8] [APX NF] Support APX NF for rotate insns

2024-05-28 Thread Kong, Lingling
gcc/ChangeLog: * config/i386/i386.md (ashr3_cvt_nf): New define_insn. (*3_1_nf): Ditto. gcc/testsuite/ChangeLog: * gcc.target/i386/apx-nf.c: Add NF test for rotate insns. --- gcc/config/i386/i386.md| 59 +-

[PATCH v3 2/8] [APX NF] Support APX NF for {sub/and/or/xor/neg}

2024-05-28 Thread Kong, Lingling
gcc/ChangeLog: * config/i386/i386.md (nf_nonf_attr): New subst_attr. (nf_nonf_x64_attr): Ditto. (*sub_1_nf): New define_insn. (*anddi_1_nf): Ditto. (*and_1_nf): Ditto. (*qi_1_nf): Ditto. (*_1_nf): Ditto. (*neg_1_nf): Ditto. *

[PATCH v3 3/8] [APX NF] Support APX NF for left shift insns

2024-05-28 Thread Kong, Lingling
gcc/ChangeLog: * config/i386/i386.md (*ashl3_1_nf): New. (*ashlhi3_1_nf): Ditto. (*ashlqi3_1_nf): Ditto. * config/i386/sse.md: New define_split. --- gcc/config/i386/i386.md | 96 ++--- gcc/config/i386/sse.md | 13 ++ 2

[PATCH v3 1/8] [APX NF]: Support APX NF add

2024-05-28 Thread Kong, Lingling
Hi, compared with v2, these patches restored the original lea pattern position and addressed Hongtao's comment. The APX NF (no flags) feature suppresses the update of status flags for arithmetic operations. For NF add, it is not clear whether NF add can be faster than lea. If so, the

RE: [PATCH v2 2/8] [APX NF] Support APX NF for {sub/and/or/xor/neg}

2024-05-22 Thread Kong, Lingling
Cc Uros. From: Kong, Lingling Sent: Wednesday, May 22, 2024 4:35 PM To: gcc-patches@gcc.gnu.org Cc: Liu, Hongtao ; Kong, Lingling Subject: [PATCH v2 2/8] [APX NF] Support APX NF for {sub/and/or/xor/neg} gcc/ChangeLog: * config/i386/i386.md (nf_and_applied): New subst_attr

[PATCH v2 8/8] [APX NF] Support APX NF for lzcnt/tzcnt/popcnt

2024-05-22 Thread Kong, Lingling
gcc/ChangeLog: * config/i386/i386.md (clz2_lzcnt_nf): New define_insn. (*clz2_lzcnt_falsedep_nf): Ditto. (__nf): Ditto. (*__falsedep_nf): Ditto. (_hi_nf): Ditto. (popcount2_nf): Ditto. (*popcount2_falsedep_nf): Ditto.

[PATCH v2 7/8] [APX NF] Support APX NF for mul/div

2024-05-22 Thread Kong, Lingling
gcc/ChangeLog: * config/i386/i386.md (*mul3_1_nf): New define_insn. (*mulqi3_1_nf): Ditto. (*divmod4_noext_nf): Ditto. (divmodhiqi3_nf): Ditto. --- gcc/config/i386/i386.md | 47 ++--- 1 file changed, 30 insertions(+), 17

[PATCH v2 6/8] [APX NF] Support APX NF for shld/shrd

2024-05-22 Thread Kong, Lingling
gcc/ChangeLog: * config/i386/i386.md (x86_64_shld_nf): New define_insn. (x86_64_shld_ndd_nf): Ditto. (x86_64_shld_1_nf): Ditto. (x86_64_shld_ndd_1_nf): Ditto. (*x86_64_shld_shrd_1_nozext_nf): Ditto. (x86_shld_nf): Ditto. (x86_shld_ndd_nf):

[PATCH v2 5/8] [APX NF] Support APX NF for rotate insns

2024-05-22 Thread Kong, Lingling
gcc/ChangeLog: * config/i386/i386.md (ashr3_cvt_nf): New define_insn. (*3_1_nf): Ditto. gcc/testsuite/ChangeLog: * gcc.target/i386/apx-nf.c: Add NF test for rotate insns. --- gcc/config/i386/i386.md| 53 --

[PATCH v2 4/8] [APX NF] Support APX NF for right shift insns

2024-05-22 Thread Kong, Lingling
gcc/ChangeLog: * config/i386/i386.md (*ashr3_1_nf): New. (*lshr3_1_nf): Ditto. (*lshrqi3_1_nf): Ditto. (*lshrhi3_1_nf): Ditto. --- gcc/config/i386/i386.md | 82 +++-- 1 file changed, 46 insertions(+), 36 deletions(-) diff --git

[PATCH v2 3/8] [APX NF] Support APX NF for left shift insns

2024-05-22 Thread Kong, Lingling
gcc/ChangeLog: * config/i386/i386.md (*ashl3_1_nf): New. (*ashlhi3_1_nf): Ditto. (*ashlqi3_1_nf): Ditto. * config/i386/sse.md: New define_split. --- gcc/config/i386/i386.md | 80 +++-- gcc/config/i386/sse.md | 13 +++ 2

[PATCH v2 2/8] [APX NF] Support APX NF for {sub/and/or/xor/neg}

2024-05-22 Thread Kong, Lingling
gcc/ChangeLog: * config/i386/i386.md (nf_and_applied): New subst_attr. (nf_x64_and_applied): Ditto. (*sub_1_nf): New define_insn. (*anddi_1_nf): Ditto. (*and_1_nf): Ditto. (*qi_1_nf): Ditto.

[PATCH v2 1/8] [APX NF]: Support APX NF add

2024-05-22 Thread Kong, Lingling
> I wonder if we can use "define_subst" to conditionally add flags clobber > for !TARGET_APX_NF targets. Even the example for "Define Subst" uses the insn > w/ and w/o the clobber, so I think it is worth considering this approach. > > Uros. Good Suggestion, I defined new subst for no flags, and

RE: [PATCH 1/8] [APX NF]: Support APX NF add

2024-05-15 Thread Kong, Lingling
> -Original Message- > From: Uros Bizjak > Sent: Wednesday, May 15, 2024 4:15 PM > To: Kong, Lingling > Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao ; Wang, > Hongyu > Subject: Re: [PATCH 1/8] [APX NF]: Support APX NF add > > On Wed, May 15, 2024 at 9:43

[PATCH 8/8] [APX NF] Support APX NF for lzcnt/tzcnt/popcnt

2024-05-15 Thread Kong, Lingling
gcc/ChangeLog: * config/i386/i386.md (clz2_lzcnt_nf): New define_insn. (*clz2_lzcnt_falsedep_nf): Ditto. (__nf): Ditto. (*__falsedep_nf): Ditto. (_hi_nf): Ditto. (popcount2_nf): Ditto. (*popcount2_falsedep_nf): Ditto.

[PATCH 6/8] [APX NF] Support APX NF for shld/shrd

2024-05-15 Thread Kong, Lingling
gcc/ChangeLog: * config/i386/i386.md (x86_64_shld_nf): New define_insn. (x86_64_shld_ndd_nf): Ditto. (x86_64_shld_1_nf): Ditto. (x86_64_shld_ndd_1_nf): Ditto. (*x86_64_shld_shrd_1_nozext_nf): Ditto. (x86_shld_nf): Ditto. (x86_shld_ndd_nf):

[PATCH 7/8] [APX NF] Support APX NF for mul/div

2024-05-15 Thread Kong, Lingling
gcc/ChangeLog: * config/i386/i386.md (*mul3_1_nf): New define_insn. (*mulqi3_1_nf): Ditto. (*divmod4_noext_nf): Ditto. (divmodhiqi3_nf): Ditto. --- gcc/config/i386/i386.md | 86 + 1 file changed, 86 insertions(+) diff --git

[PATCH 5/8] [APX NF] Support APX NF for rotate insns

2024-05-15 Thread Kong, Lingling
gcc/ChangeLog: * config/i386/i386.md (ashr3_cvt_nf): New define_insn. (*3_1_nf): Ditto. gcc/testsuite/ChangeLog: * gcc.target/i386/apx-nf.c: Add NF test for rotate insns. --- gcc/config/i386/i386.md| 80 ++

[PATCH 4/8] [APX NF] Support APX NF for right shift insns

2024-05-15 Thread Kong, Lingling
gcc/ChangeLog: * config/i386/i386.md (*ashr3_1_nf): New. (*lshr3_1_nf): Ditto. (*lshrqi3_1_nf): Ditto. (*lshrhi3_1_nf): Ditto. --- gcc/config/i386/i386.md | 85 + 1 file changed, 85 insertions(+) diff --git

[PATCH 3/8] [APX NF] Support APX NF for left shift insns

2024-05-15 Thread Kong, Lingling
gcc/ChangeLog: * config/i386/i386.md (*ashl3_1_nf): New. (*ashlhi3_1_nf): Ditto. (*ashlqi3_1_nf): Ditto. * config/i386/sse.md: New define_split. --- gcc/config/i386/i386.md | 175 gcc/config/i386/sse.md | 13 +++ 2 files

[PATCH 2/8] [APX NF] Support APX NF for {sub/and/or/xor/neg}

2024-05-15 Thread Kong, Lingling
gcc/ChangeLog: * config/i386/i386.md (*sub_1_nf): New define_insn. (*anddi_1_nf): Ditto. (*and_1_nf): Ditto. (*qi_1_nf): Ditto. (*_1_nf): Ditto. (*neg_1_nf): Ditto. * config/i386/sse.md : New define_split. gcc/testsuite/ChangeLog:

[PATCH 1/8] [APX NF]: Support APX NF add

2024-05-15 Thread Kong, Lingling
From: Hongyu Wang The APX NF (no flags) feature suppresses the update of status flags for arithmetic operations. For NF add, it is not clear whether NF add can be faster than lea. If so, the pattern needs to be adjusted to prefer LEA generation. gcc/ChangeLog: *

[PATCH] i386: fix ix86_hardreg_mov_ok with lra_in_progress

2024-05-06 Thread Kong, Lingling
Hi, Originally eliminate_regs_in_insn will transform (parallel [ (set (reg:QI 130) (plus:QI (subreg:QI (reg:DI 19 frame) 0) (const_int 96))) (clobber (reg:CC 17 flag))]) {*addqi_1} to (set (reg:QI 130) (subreg:QI (reg:DI 19 frame) 0)) {*movqi_internal} when verify_changes. But

[PATCH] x86: Fix cmov cost model issue [PR109549]

2024-05-05 Thread Kong, Lingling
Hi, (if_then_else:SI (eq (reg:CCZ 17 flags) (const_int 0 [0])) (reg/v:SI 101 [ e ]) (reg:SI 102)) The cost is 8 for the rtx; the cost for (eq (reg:CCZ 17 flags) (const_int 0 [0])) is 4, but this is just an operator, and its cost does not need to be computed in cmov. Bootstrapped and

RE: [PATCH] i386: Prefer remote atomic insn for atomic_fetch{add, and, or, xor}

2022-11-07 Thread Kong, Lingling via Gcc-patches
> On Sun, Nov 6, 2022 at 2:00 PM Kong, Lingling via Gcc-patches patc...@gcc.gnu.org> wrote: > > > > Hi > > > > The patch is to add flag -mprefer-remote-atomic to control whether to > generate raoint insn for atomic operations. > > Ok for trunk? > >

[PATCH] [committed] i386: Fix typo in sse-22.c pragma

2022-11-07 Thread Kong, Lingling via Gcc-patches
gcc/testsuite/ChangeLog: * gcc.target/i386/sse-22.c: Fix typo in pragma GCC target. Pushing as obvious. Thanks, Lingling --- gcc/testsuite/gcc.target/i386/sse-22.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gcc/testsuite/gcc.target/i386/sse-22.c

[PATCH] i386: Prefer remote atomic insn for atomic_fetch{add, and, or, xor}

2022-11-06 Thread Kong, Lingling via Gcc-patches
Hi, This patch adds the flag -mprefer-remote-atomic to control whether to generate RAO-INT insns for atomic operations. Ok for trunk? BRs, Lingling gcc/ChangeLog: * config/i386/i386.opt: Add -mprefer-remote-atomic. * config/i386/sync.md (atomic_): New define_expand.

[PATCH] Support Intel RAO-INT

2022-11-06 Thread Kong, Lingling via Gcc-patches
Hi, These patches aim to add support for Intel RAO-INT. The information is based on the newly released Intel Architecture Instruction Set Extensions and Future Features. The document is as follows:

RE: [wwwdocs] [GCC13] Mention Intel __bf16 support in AVX512BF16 intrinsics.

2022-11-03 Thread Kong, Lingling via Gcc-patches
> > > diff --git a/htdocs/gcc-13/changes.html b/htdocs/gcc-13/changes.html > > > index 7c6bfa6e..cd0282f1 100644 > > > --- a/htdocs/gcc-13/changes.html > > > +++ b/htdocs/gcc-13/changes.html > > > @@ -230,6 +230,8 @@ a work-in-progress. > > >For both C and C++ the __bf16 type is supported on >

RE: [wwwdocs] [GCC13] Mention Intel __bf16 support in AVX512BF16 intrinsics.

2022-11-01 Thread Kong, Lingling via Gcc-patches
> > diff --git a/htdocs/gcc-13/changes.html b/htdocs/gcc-13/changes.html > > index 7c6bfa6e..cd0282f1 100644 > > --- a/htdocs/gcc-13/changes.html > > +++ b/htdocs/gcc-13/changes.html > > @@ -230,6 +230,8 @@ a work-in-progress. > >For both C and C++ the __bf16 type is supported on > >

[wwwdocs] [GCC13] Mention Intel __bf16 support in AVX512BF16 intrinsics.

2022-10-31 Thread Kong, Lingling via Gcc-patches
Hi, This patch mentions Intel __bf16 support in AVX512BF16 intrinsics. Ok for master? Thanks, Lingling --- htdocs/gcc-13/changes.html | 2 ++ 1 file changed, 2 insertions(+) diff --git a/htdocs/gcc-13/changes.html b/htdocs/gcc-13/changes.html index 7c6bfa6e..cd0282f1 100644 ---

RE: [PATCH 4/6] Support Intel AVX-NE-CONVERT

2022-10-28 Thread Kong, Lingling via Gcc-patches
ctober 25, 2022 1:23 PM > To: Kong, Lingling > Cc: Liu, Hongtao ; gcc-patches@gcc.gnu.org; Jiang, > Haochen > Subject: Re: [PATCH 4/6] Support Intel AVX-NE-CONVERT > > On Mon, Oct 24, 2022 at 2:20 PM Kong, Lingling > wrote: > > > > > From: Gcc-patches > >

[PATCH] i386: using __bf16 for AVX512BF16 intrinsics

2022-10-28 Thread Kong, Lingling via Gcc-patches
Hi, Previously we used unsigned short to represent bf16. That was not a good representation, but at the time the front end didn't support a bf16 type. Now that __bf16 has been introduced to the x86 psABI, we can switch the intrinsics to the new type. Ok for trunk? Thanks, Lingling gcc/ChangeLog: *

RE: [PATCH 4/6] Support Intel AVX-NE-CONVERT

2022-10-24 Thread Kong, Lingling via Gcc-patches
en Jiang via Gcc-patches > wrote: > > > > From: Kong Lingling > > +(define_insn "vbcstne2ps_" > > + [(set (match_operand:VF1_128_256 0 "register_operand" "=x") > > +(vec_duplicate:VF1_128_256 > > + (unspec:SF > > +

RE: [PATCH] Enhance final_value_replacement_loop to handle bitop with an invariant induction.[PR105735]

2022-09-20 Thread Kong, Lingling via Gcc-patches
.. > > else if (tree_fits_uhwi_p (niter) > > ... bitwise induction case...) > > ... > > > Yes, I fixed it in new patch. Thanks. > Ok for master ? > > Thanks, > Lingling > > > -Original Message- > > From: Richard Biener > >

RE: [PATCH] i386: Fixed vec_init_dup_v16bf [PR106887]

2022-09-16 Thread Kong, Lingling via Gcc-patches
Thanks again for taking a look. OK for master? Thanks, Lingling > -Original Message- > From: Hongtao Liu > Sent: Thursday, September 15, 2022 11:46 AM > To: Kong, Lingling > Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao > Subject: Re: [PATCH] i386: Fixed vec_init_dup_v16

RE: [PATCH] Enhance final_value_replacement_loop to handle bitop with an invariant induction.[PR105735]

2022-09-15 Thread Kong, Lingling via Gcc-patches
patch. Thanks. Ok for master ? Thanks, Lingling > -Original Message- > From: Richard Biener > Sent: Wednesday, September 14, 2022 4:16 PM > To: Kong, Lingling > Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao > Subject: Re: [PATCH] Enhance final_value_replacement_loop to

[PATCH] i386: Fixed vec_init_dup_v16bf [PR106887]

2022-09-14 Thread Kong, Lingling via Gcc-patches
Hi, This patch fixes vec_init_dup_v16bf by adding correct handling for v16bf mode in ix86_expand_vector_init_duplicate. It also adds a testcase with sse2 without avx2. OK for master? gcc/ChangeLog: PR target/106887 * config/i386/i386-expand.cc (ix86_expand_vector_init_duplicate):

RE: [PATCH] Enhance final_value_replacement_loop to handle bitop with an invariant induction.[PR105735]

2022-09-13 Thread Kong, Lingling via Gcc-patches
((bitinv_def > > please use else if here Sorry, if I use else if here, there is no corresponding if above it. I'm not sure whether you mean changing the if of the bitwise induction expression to else if. Do you agree with these changes? Thanks again for taking a look. Thanks, Lingling > -Original Mess

RE: [PATCH] x86: Handle V8BF in expand_vec_perm_broadcast_1

2022-09-02 Thread Kong, Lingling via Gcc-patches
Hi, I fixed it in a new patch. And added BF vector mode in SUBST_V and avx512fmaskhalfmode for @vec_interleave_high. Ok for trunk ? > > Hi, > > > > Handle E_V8BFmode in expand_vec_perm_broadcast_1 and > ix86_expand_vector_init_duplicate. > > Ok for trunk? > > > > gcc/ChangeLog: > > > >

RE: [PATCH] middle-end: Add MULT_EXPR recognition for cond scalar reduction

2022-08-31 Thread Kong, Lingling via Gcc-patches
Hi Richard, could you please take a look at the patch? Ok for master? > Hi, > > The conditional mult reduction cannot be recognized with current GCC. The > following loop cannot be vectorized. > Now add MULT_EXPR recognition for conditional scalar reduction. > > float summa(int n, float

[PATCH] x86: Handle V8BF in expand_vec_perm_broadcast_1

2022-08-31 Thread Kong, Lingling via Gcc-patches
Hi, Handle E_V8BFmode in expand_vec_perm_broadcast_1 and ix86_expand_vector_init_duplicate. Ok for trunk? gcc/ChangeLog: PR target/106742 * config/i386/i386-expand.cc (ix86_expand_vector_init_duplicate): Handle V8BF mode. (expand_vec_perm_broadcast_1): Ditto.

[PATCH] middle-end: Add MULT_EXPR recognition for cond scalar reduction

2022-08-25 Thread Kong, Lingling via Gcc-patches
Hi, The conditional mult reduction cannot be recognized with current GCC. The following loop cannot be vectorized. Now add MULT_EXPR recognition for conditional scalar reduction. float summa(int n, float *arg1, float *arg2) { int i;
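For illustration, a conditional multiplication reduction has roughly the shape below. This is a minimal sketch and not the (truncated) testcase from the post; the function name `cond_mult_reduction` and the guarding condition are hypothetical.

```c
#include <stddef.h>

/* Hypothetical example of a conditional multiplication reduction:
   the accumulator is multiplied only when the condition holds. */
float cond_mult_reduction(size_t n, const float *a, const float *b)
{
    float res = 1.0f;                 /* multiplicative identity */
    for (size_t i = 0; i < n; i++) {
        if (a[i] > b[i])              /* condition guards the update */
            res *= a[i];              /* MULT_EXPR reduction step */
    }
    return res;
}
```

The reduction variable `res` is updated on only one side of the branch, which is the pattern the post says was previously not recognized for multiplication.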

RE: [PATCH] Enhance final_value_replacement_loop to handle bitop with an invariant induction.[PR105735]

2022-08-22 Thread Kong, Lingling via Gcc-patches
Hi Richard, could you please take a look at the patch? > Hi, > > This patch is for pr105735/pr101991. It will enable below optimization: > { > - long unsigned int bit; > - > - [local count: 32534376]: > - > - [local count: 1041207449]: > - # tmp_10 = PHI > - # bit_12 = PHI > -

[wwwdocs] [GCC13] Mention Intel __bf16 support.

2022-08-18 Thread Kong, Lingling via Gcc-patches
Hi, This patch mentions Intel __bf16 support in GCC 13. Ok for master? Thanks, Lingling htdocs/gcc-13/changes.html | 7 ++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/htdocs/gcc-13/changes.html b/htdocs/gcc-13/changes.html index 57bd8724..7d98329c 100644 ---

[PATCH] Enhance final_value_replacement_loop to handle bitop with an invariant induction.[PR105735]

2022-08-18 Thread Kong, Lingling via Gcc-patches
Hi, This patch is for pr105735/pr101991. It will enable below optimization: { - long unsigned int bit; - - [local count: 32534376]: - - [local count: 1041207449]: - # tmp_10 = PHI - # bit_12 = PHI - tmp_7 = bit2_6(D) & tmp_10; - bit_8 = bit_12 + 1; - if (bit_8 != 32) -goto ;
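Reading the GIMPLE fragment above as a loop that ANDs `tmp` with the loop-invariant `bit2` on each of 32 iterations, the source-level idea can be sketched as follows. The function names are hypothetical, and the closed form shown is simply what follows from AND being idempotent; it is not claimed to be the exact expression the pass emits.

```c
#include <stdint.h>

/* Loop form: the same invariant mask is applied on every iteration. */
uint64_t and_loop(uint64_t tmp, uint64_t bit2)
{
    for (unsigned long bit = 0; bit != 32; bit++)
        tmp &= bit2;
    return tmp;
}

/* Closed form: because (x & m) & m == x & m, one AND gives the same
   final value as long as the loop body runs at least once. */
uint64_t and_closed_form(uint64_t tmp, uint64_t bit2)
{
    return tmp & bit2;
}
```

Replacing the final value with the closed form lets the loop itself be removed when nothing else depends on it.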

[PATCH] x86: Support vector __bf16 type.

2022-08-16 Thread Kong, Lingling via Gcc-patches
Hi, This patch supports vector init/broadcast/set/extract for the __bf16 type. The __bf16 type is a storage type. OK for master? gcc/ChangeLog: * config/i386/i386-expand.cc (ix86_expand_sse_movcc): Handle vector BFmode. (ix86_expand_vector_init_duplicate): Support vector

RE: [PATCH] x86: Enable __bf16 type for TARGET_SSE2 and above

2022-08-03 Thread Kong, Lingling via Gcc-patches
Hi, The old patch had a mistake in `*movbf_internal`; this version disables BFmode constant double moves in `*movbf_internal`. Thanks, Lingling > -Original Message- > From: Kong, Lingling > Sent: Tuesday, July 26, 2022 9:31 AM > To: Liu, Hongtao ; gcc-patches@gcc.gnu.org > Cc:

[PATCH] x86: Enable __bf16 type for TARGET_SSE2 and above

2022-07-25 Thread Kong, Lingling via Gcc-patches
Hi, This patch enables the __bf16 scalar type for targets with SSE2 and above, according to the psABI (https://gitlab.com/x86-psABIs/x86-64-ABI/-/merge_requests/35/diffs). The __bf16 type is a storage type, as on Arm. OK for master? gcc/ChangeLog: * config/i386/i386-builtin-types.def (BFLOAT16): New

[PATCH] i386: Fix _mm_[u]comixx_{ss,sd} codegen and add PF result. [PR106113]

2022-07-14 Thread Kong, Lingling via Gcc-patches
Hi, This patch fixes _mm_[u]comixx_{ss,sd} codegen and adds the PF result. These intrinsics have changed over time. For example, the old operation of `_mm_comieq_ss` is `RETURN ( a[31:0] == b[31:0] ) ? 1 : 0`, and the updated operation is `RETURN ( a[31:0] != NaN AND b[31:0] != NaN AND a[31:0] == b[31:0] ) ?
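To see why the parity flag matters for the updated operation, the following scalar model may help. It is a sketch only: the flag values follow the documented COMISS/UCOMISS behavior (an unordered compare sets ZF, PF, and CF), but the struct and function names are hypothetical, and `comieq_zf_only` is shown purely for contrast; the post does not say that any GCC version emitted exactly that check.

```c
#include <stdbool.h>

/* Hypothetical model of the EFLAGS bits written by COMISS/UCOMISS. */
struct comi_flags { bool zf, pf, cf; };

struct comi_flags comiss_model(float a, float b)
{
    struct comi_flags f = { false, false, false };
    if (a != a || b != b) {           /* either operand is NaN: unordered */
        f.zf = f.pf = f.cf = true;
    } else if (a == b) {
        f.zf = true;                  /* equal */
    } else if (a < b) {
        f.cf = true;                  /* less than */
    }                                 /* greater than: all flags clear */
    return f;
}

/* Contrast case: testing ZF alone reports "equal" for unordered inputs. */
int comieq_zf_only(float a, float b)
{
    return comiss_model(a, b).zf;
}

/* Updated operation: equal only if ZF is set and the compare was ordered. */
int comieq_with_pf(float a, float b)
{
    struct comi_flags f = comiss_model(a, b);
    return f.zf && !f.pf;
}
```

With a NaN operand, `comieq_zf_only` returns 1 while `comieq_with_pf` returns 0, which matches the `!= NaN` terms in the updated operation.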

RE: [PATCH] MAINTAINERS: Add myself for write after approval

2022-06-27 Thread Kong, Lingling via Gcc-patches
Matt Kraai -- 2.18.2 > -Original Message- > From: Hongyu Wang > Sent: Monday, June 27, 2022 4:32 PM > To: Kong, Lingling > Cc: Liu, Hongtao ; gcc-patches@gcc.gnu.org > Subject: Re: [PATCH] MAINT

[PATCH] MAINTAINERS: Add myself for write after approval

2022-06-27 Thread Kong, Lingling via Gcc-patches
Hi, I want to add myself to MAINTAINERS for write after approval. OK for master? ChangeLog: * MAINTAINERS (Write After Approval): Add myself. --- MAINTAINERS | 1 + 1 file changed, 1 insertion(+) diff --git a/MAINTAINERS b/MAINTAINERS index 54d8ad41a6f..49627e5d113 100644 ---

[PATCH] i386: Enable intrinsics that convert float and bf16 data to each other.

2021-12-21 Thread Kong, Lingling via Gcc-patches
Hi, This patch is to enable intrinsics that convert float and bf16 data to each other. Ok for master? gcc/ChangeLog: * config/i386/avx512bf16intrin.h (_mm_cvtsbh_ss): Add new intrinsic. (_mm512_cvtpbh_ps): Likewise. (_mm512_maskz_cvtpbh_ps): Likewise.

RE: [PATCH] i386: vcvtph2ps and vcvtps2ph should be used to convert _Float16 to SFmode with -mf16c [PR 102811]

2021-11-24 Thread Kong, Lingling via Gcc-patches
OK, This is the patch I prepare to check in. -Original Message- From: Uros Bizjak Sent: Wednesday, November 24, 2021 4:49 PM To: Kong, Lingling Cc: Liu, Hongtao ; gcc-patches@gcc.gnu.org Subject: Re: [PATCH] i386: vcvtph2ps and vcvtps2ph should be used to convert _Float16 to SFmode

[PATCH] i386: vcvtph2ps and vcvtps2ph should be used to convert _Float16 to SFmode with -mf16c [PR 102811]

2021-11-24 Thread Kong, Lingling via Gcc-patches
Hi, vcvtph2ps and vcvtps2ph should be used to convert _Float16 to SFmode with -mf16c. So added define_insn extendhfsf2 and truncsfhf2 for target_f16c. Cleared before conversion, updated movhi_internal and ix86_can_change_mode_class. And fixed some commit message. OK for master?

RE: [PATCH] i386: vcvtph2ps and vcvtps2ph should be used to convert _Float16 to SFmode with -mf16c [PR 102811]

2021-11-24 Thread Kong, Lingling via Gcc-patches
insn can optimize scalar load to a vector. Thanks, Lingling -Original Message- From: Uros Bizjak Sent: Wednesday, November 24, 2021 3:57 PM To: Kong, Lingling Cc: Liu, Hongtao ; gcc-patches@gcc.gnu.org Subject: Re: [PATCH] i386: vcvtph2ps and vcvtps2ph should be used to convert _Floa

RE: [PATCH] i386: vcvtph2ps and vcvtps2ph should be used to convert _Float16 to SFmode with -mf16c [PR 102811]

2021-11-23 Thread Kong, Lingling via Gcc-patches
Hi, vcvtph2ps and vcvtps2ph should be used to convert _Float16 to SFmode with -mf16c. So added define_insn extendhfsf2 and truncsfhf2 for target_f16c. And cleared before conversion, updated movhi_internal and ix86_can_change_mode_class. OK for master? gcc/ChangeLog: PR target/102811

[PATCH] i386: add alias for f*mul_*ch intrinsics

2021-11-16 Thread Kong, Lingling via Gcc-patches
Hi, This patch is to add alias for f*mul_*ch intrinsics. Ok for master? gcc/ChangeLog: * config/i386/avx512fp16intrin.h (_mm512_mul_pch): Add alias for _mm512_fmul_pch. (_mm512_mask_mul_pch): Likewise. (_mm512_maskz_mul_pch): Likewise. (_mm512_mul_round_pch):

[PATCH] i386: vcvtph2ps and vcvtps2ph should be used to convert _Float16 to SFmode with -mf16c [PR 102811]

2021-11-16 Thread Kong, Lingling via Gcc-patches
Hi, vcvtph2ps and vcvtps2ph should be used to convert _Float16 to SFmode with -mf16c. So added define_insn extendhfsf2 and truncsfhf2 for target_f16c. OK for master? gcc/ChangeLog: PR target/102811 * config/i386/i386.md (extendhfsf2): Add extendhfsf2 for f16c.

[PATCH] i386: Optimization for mm512_set1_pch.

2021-11-05 Thread Kong, Lingling via Gcc-patches
Hi, This patch supports folding _mm512_fmadd_pch (a, _mm512_set1_pch(*(b)), c) into a single instruction, vfmaddcph (%rsp){1to16}, %zmm1, %zmm2. OK for master? gcc/ChangeLog: * config/i386/sse.md (fma___pair): Add new define_insn. (fma__fmaddc_bcst): Add new

[PATCH] i386: Support complex fma/conj_fma for _Float16.

2021-11-05 Thread Kong, Lingling via Gcc-patches
Hi, This patch is to support cmla_optab, cmul_optab, cmla_conj_optab, cmul_conj_optab for vector _Float16. Ok for master? gcc/ChangeLog: * config/i386/sse.md (cmul3): add new define_expand. (cmla4): Likewise gcc/testsuite/ChangeLog: *

[PATCH] i386: Combine the FADD(A, FMA(B, C, 0)) to FMA(B, C, A) and combine FADD(A, FMUL(B, C)) to FMA(B, C, A).

2021-10-21 Thread Kong, Lingling via Gcc-patches
Hi, This patch supports transforming, under fast-math, something like _mm512_add_ph(x1, _mm512_fmadd_pch(a, b, _mm512_setzero_ph())) to _mm512_fmadd_pch(a, b, x1). It also supports transforming _mm512_add_ph(x1, _mm512_fmul_pch(a, b)) to _mm512_fmadd_pch(a, b, x1). Ok for master? gcc/ChangeLog:

[PATCH] i386: Fix wrong optimization for consecutive masked scatters [PR 101472]

2021-08-26 Thread Kong, Lingling via Gcc-patches
Hi, For avx512f_scattersi, the mask operand only affects the set source; we need to refine the pattern to let GCC know that the mask register also affects the destination. So we put the mask operand into UNSPEC_VSIBADDR. Bootstrapped and regression tested on x86_64-linux-gnu{-m32,-m64}. Ok for master? gcc/ChangeLog:

[PATCH] i386: Fix wrong optimization for consecutive masked scatters [PR 101472]

2021-08-25 Thread Kong, Lingling via Gcc-patches
Hi, For avx512f_scattersi, the mask operand only affects the set source; we need to refine the pattern to let GCC know that the mask register also affects the destination. So we put the mask operand into UNSPEC_VSIBADDR. Bootstrapped and regression tested on x86_64-linux-gnu{-m32,-m64}. Ok for master? gcc/ChangeLog:

[PATCH] i386: Fix _mm512_fpclass_ps_mask in O0 [PR 101471]

2021-08-25 Thread Kong, Lingling via Gcc-patches
Hi, For _mm512_fpclass_ps_mask at -O0, the mask should be (__mmask16)-1 instead of (__mmask8)-1. Bootstrapped and regtested on x86_64-linux-gnu{-m32,}. Ok for master? gcc/ChangeLog: * gcc/config/i386/avx512dqintrin.h: Fix _mm512_fpclass_ps_mask define in O0 gcc/testsuite/ChangeLog: *
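The effect of the wrong mask width can be shown without any AVX-512 hardware. In the sketch below, `mask8` and `mask16` are hypothetical stand-ins for `__mmask8` and `__mmask16`, and `enabled_lanes16` is an illustrative helper, not part of any intrinsic header. A 512-bit vector of floats has 16 lanes, so an all-ones 8-bit mask widened to 16 bits leaves the upper 8 lanes disabled.

```c
#include <stdint.h>

typedef uint8_t  mask8;    /* stands in for __mmask8  */
typedef uint16_t mask16;   /* stands in for __mmask16 */

/* Count how many of the 16 lanes a mask enables. */
int enabled_lanes16(mask16 mask)
{
    int count = 0;
    for (int lane = 0; lane < 16; lane++)
        count += (mask >> lane) & 1;
    return count;
}
```

Here `enabled_lanes16((mask8)-1)` yields 8, whereas `enabled_lanes16((mask16)-1)` yields 16, which is why the all-ones default for a 16-lane operation needs the 16-bit type.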