Re: [PATCH] x86: Extend AVX512 Vectorization for Popcount in Various Modes

2024-09-25 Thread Hongtao Liu
On Tue, Sep 24, 2024 at 10:16 AM Levy Hsu wrote: > > This patch enables vectorization of the popcount operation for V2QI, V4QI, > V8QI, V2HI, V4HI, and V2SI modes. Ok. > > gcc/ChangeLog: > > * config/i386/mmx.md: > (VQI_16_32_64): New mode iterator for 8-byte, 4-byte, and 2-byte >
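
For readers skimming this thread, here is a minimal sketch of the kind of loop the patch is about: a per-byte population count over an 8-byte group, the size of a V8QI vector. The function name `popcount_v8qi` is hypothetical and does not come from the patch. Whether a given GCC build actually vectorizes the loop depends on the compiler version and the enabled ISA options.

```c
#include <stdint.h>

/* Hypothetical kernel: count the set bits of each of 8 bytes.
   Eight uint8_t lanes correspond to the V8QI mode named in the patch. */
void popcount_v8qi(uint8_t *restrict dst, const uint8_t *restrict src)
{
    for (int i = 0; i < 8; ++i)
        dst[i] = (uint8_t)__builtin_popcount(src[i]);
}
```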

Re: [PATCH] i386, v2: Add GENERIC and GIMPLE folders of __builtin_ia32_{min,max}* [PR116738]

2024-09-25 Thread Hongtao Liu
On Wed, Sep 25, 2024 at 4:42 PM Jakub Jelinek wrote: > > On Wed, Sep 25, 2024 at 10:17:50AM +0800, Hongtao Liu wrote: > > > + for (int i = 0; i < 2; ++i) > > > + { > > > + unsigned count = vector_cst_encoded_nelts (args[i]),

Re: [PATCH] x86/{,V}AES: adjust when to force EVEX encoding

2024-09-25 Thread Hongtao Liu
On Wed, Sep 25, 2024 at 3:55 PM Jan Beulich wrote: > > On 25.09.2024 09:38, Hongtao Liu wrote: > > On Wed, Sep 25, 2024 at 2:56 PM Jan Beulich wrote: > >> > >> Commit a79d13a01f8c ("i386: Fix aes/vaes patterns [PR114576]") correctly > >> sa

Re: [PATCH] x86/{,V}AES: adjust when to force EVEX encoding

2024-09-25 Thread Hongtao Liu
On Wed, Sep 25, 2024 at 2:56 PM Jan Beulich wrote: > > Commit a79d13a01f8c ("i386: Fix aes/vaes patterns [PR114576]") correctly > said "..., but we need to emit {evex} prefix in the assembly if AES ISA > is not enabled". Yet it did so only for the TARGET_AES insns. Going from > the alternative cho

Re: [PATCH] i386: Add GENERIC and GIMPLE folders of __builtin_ia32_{min,max}* [PR116738]

2024-09-24 Thread Hongtao Liu
On Wed, Sep 25, 2024 at 1:07 AM Jakub Jelinek wrote: > > Hi! > > The following patch adds GENERIC and GIMPLE folders for various > x86 min/max builtins. > As discussed, these builtins have effectively x < y ? x : y > (or x > y ? x : y) behavior. > The GENERIC folding is done if all the (relevant)

Re: [PATCH] [x86] Define VECTOR_STORE_FLAG_VALUE

2024-09-24 Thread Hongtao Liu
On Tue, Sep 24, 2024 at 5:46 PM Uros Bizjak wrote: > > On Tue, Sep 24, 2024 at 11:23 AM liuhongt wrote: > > > > Return constm1_rtx when GET_MODE_CLASS (MODE) == MODE_VECTOR_INT. > > Otherwise NULL_RTX. > > > > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. > > Ready push to trunk. > >
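
As a source-level analogy for the all-ones value (constm1_rtx) chosen for integer vector modes, GCC's vector extensions document that a lane-wise comparison yields -1 in true lanes and 0 in false lanes. The sketch below shows that convention only; the type `v4si` and the function `greater_mask` are illustrative names, and the link to VECTOR_STORE_FLAG_VALUE is an editorial assumption rather than something stated in the thread.

```c
/* Four 32-bit integer lanes, using the GCC vector extension. */
typedef int v4si __attribute__((vector_size(16)));

/* Lane-wise a > b: each result lane is -1 (all bits set) if true, 0 if false. */
v4si greater_mask(v4si a, v4si b)
{
    return a > b;
}
```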

Re: [RFC PATCH] Enable vectorization for unknown tripcount in very cheap cost model but disable epilog vectorization.

2024-09-23 Thread Hongtao Liu
On Thu, Sep 19, 2024 at 2:08 PM Richard Biener wrote: > > On Wed, Sep 18, 2024 at 7:55 PM Richard Sandiford > wrote: > > > > Richard Biener writes: > > > On Thu, Sep 12, 2024 at 4:50 PM Hongtao Liu wrote: > > >> > > >> On Wed, Sep 11, 20

Re: [PATCH] doc: Add more alias option and reorder Intel CPU -march documentation

2024-09-18 Thread Hongtao Liu
On Wed, Sep 18, 2024 at 1:35 PM Haochen Jiang wrote: > > Hi all, > > Since r15-3539, there are requests coming in to add other alias option > documentation. This patch will add all of them, including corei7, corei7-avx, > core-avx-i, core-avx2, atom, slm, gracemont and emeraldrapids. > > Also in

Re: [PATCH] i386: Add missing avx512f-mask-type.h include

2024-09-18 Thread Hongtao Liu
On Wed, Sep 18, 2024 at 1:40 PM Haochen Jiang wrote: > > Hi all, > > Since commit r15-3594, we fixed the bugs in MASK_TYPE for AVX10.2 > testcases, but we missed the following four. > > The tests do not FAIL since the binutils part hasn't been merged > yet, which leads to UNSUPPORTED tests. But t

Re: [PATCH] i386: Enhance AVX10.2 convert tests

2024-09-18 Thread Hongtao Liu
On Wed, Sep 18, 2024 at 1:42 PM Haochen Jiang wrote: > > Hi all, > > For AVX10.2 convert tests, all of them are missing mask tests > previously, this patch will add them in the tests. > > Tested on sde with assembler with these insts. Ok for trunk? Ok. > > Thx, > Haochen > > gcc/testsuite/ChangeLo

Re: [PATCH] i386: Add ssemov2, sseicvt2 for some load instructions that use memory on operand2

2024-09-18 Thread Hongtao Liu
On Thu, Sep 19, 2024 at 9:34 AM Hu, Lin1 wrote: > > Hi, all > > The memory attr of some instructions should be 'load', but it is 'none' > currently. > > This patch adds two new types ssemov2, sseicvt2 for some load instructions that > use memory on operands. So their memory attr will be 'load'.

Re: [RFC PATCH] Enable vectorization for unknown tripcount in very cheap cost model but disable epilog vectorization.

2024-09-12 Thread Hongtao Liu
On Wed, Sep 11, 2024 at 4:21 PM Hongtao Liu wrote: > > On Wed, Sep 11, 2024 at 4:04 PM Richard Biener > wrote: > > > > On Wed, Sep 11, 2024 at 4:17 AM liuhongt wrote: > > > > > > GCC12 enables vectorization for O2 with very cheap cost model which is >

Re: [PATCH v2] Enable V2BF/V4BF vec_cmp with AVX10.2 vcmppbf16

2024-09-11 Thread Hongtao Liu
On Thu, Sep 12, 2024 at 9:55 AM Levy Hsu wrote: > > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. > Ok for trunk? Ok. > > gcc/ChangeLog: > > * config/i386/i386.cc (ix86_get_mask_mode): > Enable BFmode for targetm.vectorize.get_mask_mode with AVX10.2. > * config/

Re: [RFC PATCH] Enable vectorization for unknown tripcount in very cheap cost model but disable epilog vectorization.

2024-09-11 Thread Hongtao Liu
On Wed, Sep 11, 2024 at 4:04 PM Richard Biener wrote: > > On Wed, Sep 11, 2024 at 4:17 AM liuhongt wrote: > > > > GCC12 enables vectorization for O2 with very cheap cost model which is > > restricted > > to constant tripcount. The vectorization capacity is very limited w/ > > consideration > >
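
To make the constant versus unknown trip count distinction concrete, here is a minimal sketch of the two loop shapes. The function names are hypothetical. Per the message above, the very cheap cost model is restricted to the first shape, and the RFC proposes also handling the second. Whether either loop is vectorized in practice depends on the GCC version, target and flags.

```c
/* Trip count known at compile time (16 iterations). */
void add_fixed(int *restrict a, const int *restrict b)
{
    for (int i = 0; i < 16; ++i)
        a[i] += b[i];
}

/* Trip count only known at run time: n is an ordinary parameter. */
void add_runtime(int *restrict a, const int *restrict b, int n)
{
    for (int i = 0; i < n; ++i)
        a[i] += b[i];
}
```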

Re: [PATCH] i386: Fix incorrect avx512f-mask-type.h include

2024-09-10 Thread Hongtao Liu
On Thu, Sep 5, 2024 at 10:05 AM Haochen Jiang wrote: > > Hi all, > > In avx512f-mask-type.h, we need SIZE being defined to get > MASK_TYPE defined correctly. Fix those testcases where > SIZE is not defined before the include for avx512f-mask-type.h. > > Note that for convert intrins in AVX10.2, t

Re: [PATCH] x86: Refine V4BF/V2BF FMA Testcase

2024-09-10 Thread Hongtao Liu
On Tue, Sep 10, 2024 at 3:35 PM Levy Hsu wrote: > > Simple testcase fix, ok for trunk? Ok. > > gcc/testsuite/ChangeLog: > > * gcc.target/i386/avx10_2-partial-bf-vector-fma-1.c: Separated 32-bit > scan > and removed register checks in spill situations. > --- > .../i386/avx10_2-par

Re: [PATCH] x86: Refine V4BF/V2BF FMA testcase

2024-09-05 Thread Hongtao Liu
On Fri, Sep 6, 2024 at 10:34 AM Jiang, Haochen wrote: > > > From: Levy Hsu > > Sent: Thursday, September 5, 2024 4:55 PM > > To: gcc-patches@gcc.gnu.org > > > > Simple testcase fix, ok for trunk? > > > > This patch removes specific register checks to account for possible > > register spills and d

Re: [PATCH] i386: Integrate BFmode for Enhanced Vectorization in ix86_preferred_simd_mode

2024-09-04 Thread Hongtao Liu
On Wed, Sep 4, 2024 at 9:32 AM Levy Hsu wrote: > > Hi > > This change adds BFmode support to the ix86_preferred_simd_mode function > enhancing SIMD vectorization for BF16 operations. The update ensures > optimized usage of SIMD capabilities improving performance and aligning > vector sizes with pr

Re: [PATCH] i386: Support partial signbit/xorsign/copysign/abs/neg/and/xor/ior/andn for V2BF/V4BF

2024-09-04 Thread Hongtao Liu
On Wed, Sep 4, 2024 at 10:53 AM Levy Hsu wrote: > > Hi > > This patch adds support for bf16 operations in V2BF and V4BF modes on i386, > handling signbit, xorsign, copysign, abs, neg, and various logical operations. > > Bootstrapped and tested on x86-64-pc-linux-gnu. > Ok for trunk? Ok. > > gcc/Ch

Re: [PATCH] i386: Support partial vectorized FMA for V2BF/V4BF

2024-09-04 Thread Hongtao Liu
On Wed, Sep 4, 2024 at 11:31 AM Levy Hsu wrote: > > Hi > > Bootstrapped and tested on x86-64-pc-linux-gnu. > Ok for trunk? Ok. > > This patch introduces support for vectorized FMA operations for bf16 types in > V2BF and V4BF modes on the i386 architecture. New mode iterators and > define_expand en

Re: [PATCH] i386: Fix vfpclassph non-optimizied intrin

2024-09-03 Thread Hongtao Liu
On Tue, Sep 3, 2024 at 2:24 PM Haochen Jiang wrote: > > Hi all, > > The intrin for non-optimized got a typo in mask type, which will cause > the high bits of __mmask32 being unexpectedly zeroed. > > The test does not fail under O0 with current 1b since the testcase is > wrong. We need to include a

Re: [r15-3359 Regression] FAIL: gcc.target/i386/avx10_2-bf-vector-cmpp-1.c (test for excess errors) on Linux/x86_64

2024-09-02 Thread Hongtao Liu
On Tue, Sep 3, 2024 at 9:45 AM Jiang, Haochen via Gcc-regression wrote: > > As each AVX10.2 testcases previously, this is caused by option combination > warning, > which is expected. > Can we put the warning for mix usage of mavx10 and -mavx512f under -Wpsabi And add -Wno-psabi in addition to -ma

Re: [PATCH] i386: Support partial vectorized V2BF/V4BF smaxmin

2024-09-02 Thread Hongtao Liu
On Mon, Sep 2, 2024 at 4:42 PM Levy Hsu wrote: > > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. > Ok for trunk? Ok. > > This patch supports sminmax for partial vectorized V2BF/V4BF. > > gcc/ChangeLog: > > * config/i386/mmx.md (3): New define_expand for > V2BF/V4BFsmaxmin > >

Re: [PATCH] i386: Support partial vectorized V2BF/V4BF plus/minus/mult/div/sqrt

2024-09-02 Thread Hongtao Liu
On Mon, Sep 2, 2024 at 4:33 PM Levy Hsu wrote: > > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. > Ok for trunk? > > This patch introduces new mode iterators and expands for the i386 > architecture to support partial vectorization of bf16 operations using > AVX10.2 instructions. Thes

Re: [PATCH 0/8] i386: Opmitize code with AVX10.2 new instructions

2024-09-01 Thread Hongtao Liu
On Mon, Aug 26, 2024 at 2:43 PM Haochen Jiang wrote: > > Hi all, > > I have just committed AVX10.2 new instructions patches into trunk hours > ago. The next and final part for AVX10.2 upstream is to optimize code > with AVX10.2 new instructions. > > In this patch series, it will contain the followi

Re: [PATCHv4, expand] Add const0 move checking for CLEAR_BY_PIECES optabs

2024-08-25 Thread Hongtao Liu
On Fri, Aug 23, 2024 at 5:46 PM HAO CHEN GUI wrote: > > Hi Hongtao, > > On 2024/8/23 11:47, Hongtao Liu wrote: > > On Fri, Aug 23, 2024 at 11:03 AM HAO CHEN GUI wrote: > >> > >> Hi Hongtao, > >> > >> On 2024/8/23 9:47, Hongtao Liu wrote: > >

Re: [PATCH 00/12] AVX10.2: Support new instructions

2024-08-25 Thread Hongtao Liu
On Mon, Aug 19, 2024 at 4:57 PM Haochen Jiang wrote: > > Hi all, > > The AVX10.2 ymm rounding patches has been merged to trunk around > 6 hours ago. As mentioned before, next step will be AVX10.2 new > instruction support. > > This patch series could be divided into three part. > > The first patch

Re: [PATCHv4, expand] Add const0 move checking for CLEAR_BY_PIECES optabs

2024-08-22 Thread Hongtao Liu
On Fri, Aug 23, 2024 at 11:03 AM HAO CHEN GUI wrote: > > Hi Hongtao, > > On 2024/8/23 9:47, Hongtao Liu wrote: > > On Thu, Aug 22, 2024 at 4:06 PM HAO CHEN GUI wrote: > >> > >> Hi Hongtao, > >> > >> On 2024/8/21 11:21, Hongtao Liu wrote: > >>

Re: [PATCHv4, expand] Add const0 move checking for CLEAR_BY_PIECES optabs

2024-08-22 Thread Hongtao Liu
On Thu, Aug 22, 2024 at 4:06 PM HAO CHEN GUI wrote: > > Hi Hongtao, > > On 2024/8/21 11:21, Hongtao Liu wrote: > > r15-3058-gbb42c551905024 supports const0 operand for movv16qi, please > > rebase your patch and see if there are still regressions. > > There

Re: [PATCH] Align ix86_{move_max,store_max} with vectorizer.

2024-08-21 Thread Hongtao Liu
On Wed, Aug 21, 2024 at 4:49 PM Richard Biener wrote: > > On Wed, Aug 21, 2024 at 7:40 AM liuhongt wrote: > > > > When none of mprefer-vector-width, avx256_optimal/avx128_optimal, > > avx256_store_by_pieces/avx512_store_by_pieces is specified, GCC will > > set ix86_{move_max,store_max} as max ava

Re: [PATCHv4, expand] Add const0 move checking for CLEAR_BY_PIECES optabs

2024-08-20 Thread Hongtao Liu
On Tue, Aug 20, 2024 at 2:50 PM Hongtao Liu wrote: > > On Tue, Aug 20, 2024 at 2:12 PM HAO CHEN GUI wrote: > > > > Hi, > > Add Hongtao Liu as the patch affects x86. > > > > On 2024/8/20 6:32, Richard Sandiford wrote: > > > HAO CHEN GUI writes:

Re: [PATCH] Align predicates for operands[1] between mov and *mov_internal.

2024-08-20 Thread Hongtao Liu
On Tue, Aug 20, 2024 at 6:25 PM liuhongt wrote: > > From [1] [1] https://gcc.gnu.org/pipermail/gcc-patches/2024-August/660575.html > > > It's not obvious to me why movv16qi requires a nonimmediate_operand > > > source, especially since ix86_expand_vector_mode does have code to > > > cope with con

Re: [PATCHv4, expand] Add const0 move checking for CLEAR_BY_PIECES optabs

2024-08-19 Thread Hongtao Liu
On Tue, Aug 20, 2024 at 2:12 PM HAO CHEN GUI wrote: > > Hi, > Add Hongtao Liu as the patch affects x86. > > On 2024/8/20 6:32, Richard Sandiford wrote: > > HAO CHEN GUI writes: > >> Hi, > >> This patch adds const0 move checking for CLEAR_BY_PIECES.

Re: [PATCH 00/22] Support AVX10.2 ymm rounding

2024-08-18 Thread Hongtao Liu
On Wed, Aug 14, 2024 at 5:07 PM Haochen Jiang wrote: > > Hi all, > > The initial patch for AVX10.2 has been merged this week. > > For the upcoming patches, we will first upstream ymm rounding control part. > > In ymm rounding part, ALL the instructions in AVX512 with 512-bit rounding > control wil

Re: [PATCH v2] [x86] Movement between GENERAL_REGS and SSE_REGS for TImode doesn't need secondary reload.

2024-08-15 Thread Hongtao Liu
On Thu, Aug 15, 2024 at 3:27 PM liuhongt wrote: > > It results in 2 failures for x86_64-pc-linux-gnu{\ > -march=cascadelake}; > > gcc: gcc.target/i386/extendditi3-1.c scan-assembler cqt?o > gcc: gcc.target/i386/pr113560.c scan-assembler-times \tmulq 1 > > For pr113560.c, now GCC generates mulx ins

Re: [PATCH v2] i386: Fix some vex insns that prohibit egpr

2024-08-14 Thread Hongtao Liu
On Wed, Aug 14, 2024 at 4:23 PM Kong, Lingling wrote: > > > > -Original Message- > From: Kong, Lingling > Sent: Wednesday, August 14, 2024 4:20 PM > To: Kong, Lingling > Subject: [PATCH v2] i386: Fix some vex insns that prohibit egpr > > Although these vex insn have evex counterpart, but

Re: [PATCH 4/4] i386: Optimization for APX NDD is always zero-uppered for shift

2024-08-13 Thread Hongtao Liu
On Mon, Aug 12, 2024 at 3:12 PM kong lingling wrote: > > gcc/ChangeLog: > > > PR target/113729 > >* config/i386/i386.md (*ashlqi3_1_zext): > >New define_insn. > >(*ashlhi3_1_zext): Ditto. > >(*qi3_1_zext): Ditto. > >

Re: [PATCH 3/4] i386: Optimization for APX NDD is always zero-uppered for logic

2024-08-13 Thread Hongtao Liu
On Mon, Aug 12, 2024 at 3:12 PM kong lingling wrote: > > gcc/ChangeLog: > > >PR target/113729 > >* config/i386/i386.md (*andqi_1_zext): > >New define_insn. > >(*andhi_1_zext): Ditto. > >(*qi_1_zext): Ditto. > >

Re: [PATCH 2/4] i386: Optimization for APX NDD is always zero-uppered for sub/adc/sbb

2024-08-13 Thread Hongtao Liu
On Mon, Aug 12, 2024 at 3:12 PM kong lingling wrote: > > gcc/ChangeLog: > > > >PR target/113729 > >* config/i386/i386.md (*subqi_1_zext): New > >define_insn. > >(*subhi_1_zext): Ditto. > >(*addqi3_carry_zext): Ditto. >

Re: [PATCH 1/4] i386: Optimization for APX NDD is always zero-uppered for ADD

2024-08-13 Thread Hongtao Liu
On Mon, Aug 12, 2024 at 3:10 PM kong lingling wrote: > > For APX instruction with an NDD, the destination GPR will get the > instruction’s result in bits [OSIZE-1:0] and, if OSIZE < 64b, have its upper > bits [63:OSIZE] zeroed. Now supporting other NDD instructions. > > > Bootstrapped and regtes
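
The zero-upper rule quoted above (result in bits [OSIZE-1:0], bits [63:OSIZE] cleared when OSIZE < 64) can be modelled in plain C. This is only a sketch of the described register behavior for an NDD add; `ndd_add_model` is a hypothetical helper name, not an intrinsic or anything from the patch.

```c
#include <stdint.h>

/* Model of the 64-bit destination register after an NDD add with
   operand size `osize` (8, 16, 32 or 64 bits): the low `osize` bits hold
   the sum and all higher bits are zero. */
uint64_t ndd_add_model(uint64_t a, uint64_t b, unsigned osize)
{
    uint64_t mask = osize >= 64 ? UINT64_MAX
                                : ((UINT64_C(1) << osize) - 1);
    return (a + b) & mask;
}
```

Because the upper bits are already known to be zero, a following explicit zero-extension of the result is redundant, which is the property the `*_zext` patterns in this series rely on.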

Re: [PATCH] Move ix86_align_loops into a separate pass and insert the pass after pass_endbr_and_patchable_area.

2024-08-13 Thread Hongtao Liu
On Mon, Aug 12, 2024 at 10:10 PM liuhongt wrote: > > > Are there any assumptions that BB_HEAD must be a note or label? > > Maybe we should move ix86_align_loops into a separate pass and insert > > the pass just before pass_final. > The patch inserts .p2align after endbr pass, it can also fix the i

Re: [PATCH 0/1] Initial support for AVX10.2

2024-08-12 Thread Hongtao Liu
On Thu, Aug 1, 2024 at 3:50 PM Haochen Jiang wrote: > > Hi all, > > AVX10.2 tech details has been just published on July 31st in the > following link: > > https://cdrdv2.intel.com/v1/dl/getContent/828965 > > For new features and instructions, we could divide them into two parts. > One is ymm round

Re: PING: [PATCH] x86: Update BB_HEAD when aligning BB_HEAD

2024-08-11 Thread Hongtao Liu
On Mon, Aug 12, 2024 at 6:59 AM H.J. Lu wrote: > > On Thu, Aug 8, 2024 at 6:53 PM H.J. Lu wrote: > > > > When we emit .p2align to align BB_HEAD, we must update BB_HEAD. Otherwise > > ENDBR will be inserted at the wrong place. > > > > gcc/ > > > > PR target/116174 > > * config/i38

Re: [PATCH] Fix mismatch between constraint and predicate for ashl3_doubleword.

2024-07-31 Thread Hongtao Liu
On Tue, Jul 30, 2024 at 11:04 AM liuhongt wrote: > > (insn 98 94 387 2 (parallel [ > (set (reg:TI 337 [ _32 ]) > (ashift:TI (reg:TI 329) > (reg:QI 521))) > (clobber (reg:CC 17 flags)) > ]) "test.c":11:13 953 {ashlti3_doubleword} >

Re: [PATCH] i386: Fix memory constraint for APX NF

2024-07-31 Thread Hongtao Liu
On Thu, Aug 1, 2024 at 10:03 AM Kong, Lingling wrote: > > > > > -Original Message- > > From: Liu, Hongtao > > Sent: Thursday, August 1, 2024 9:35 AM > > To: Kong, Lingling ; gcc-patches@gcc.gnu.org > > Cc: Wang, Hongyu > > Subject: RE: [PATCH] i386: Fix memory constraint for APX NF > > >

Re: [PATCH] i386: Mark target option with optimization when enabled with opt level [PR116065]

2024-07-31 Thread Hongtao Liu
On Tue, Jul 30, 2024 at 1:05 PM Hongyu Wang wrote: > > Richard Biener wrote on Fri, Jul 26, 2024 at 19:45: > > > > On Fri, Jul 26, 2024 at 10:50 AM Hongyu Wang wrote: > > > > > > Hi, > > > > > > When introducing munroll-only-small-loops, the option was marked as > > > Target Save and added to -O2 default whic

Re: [PATCH 2/3][x86][v2] implement TARGET_MODE_CAN_TRANSFER_BITS

2024-07-31 Thread Hongtao Liu
On Wed, Jul 31, 2024 at 3:17 PM Uros Bizjak wrote: > > On Wed, Jul 31, 2024 at 9:11 AM Hongtao Liu wrote: > > > > On Wed, Jul 31, 2024 at 1:06 AM Uros Bizjak wrote: > > > > > > On Tue, Jul 30, 2024 at 3:00 PM Richard Biener wrote: > > > > >

Re: [PATCH 2/3][x86][v2] implement TARGET_MODE_CAN_TRANSFER_BITS

2024-07-31 Thread Hongtao Liu
On Wed, Jul 31, 2024 at 1:06 AM Uros Bizjak wrote: > > On Tue, Jul 30, 2024 at 3:00 PM Richard Biener wrote: > > > > On Tue, 30 Jul 2024, Alexander Monakov wrote: > > > > > > > > On Tue, 30 Jul 2024, Richard Biener wrote: > > > > > > > > Oh, and please add a small comment why we don't use XFmode

Re: [PATCH] i386: Remove ndd support for *add_4 [PR113744]

2024-07-30 Thread Hongtao Liu
On Wed, Jul 31, 2024 at 2:08 PM Kong, Lingling wrote: > > *add_4 and *adddi_4 are for shorter opcode from cmp to inc/dec or add > $128. > > But NDD code is longer than the cmp code, so there is no need to support NDD. > > > > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. > > Ok for tr

Re: [PATCH v2] i386: Add non-optimize prefetchi intrins

2024-07-29 Thread Hongtao Liu
On Tue, Jul 30, 2024 at 9:27 AM Hongtao Liu wrote: > > On Fri, Jul 26, 2024 at 4:55 PM Haochen Jiang wrote: > > > > Hi all, > > > > I added related O0 testcase in this patch. > > > > Ok for trunk and backport to GCC 14 and GCC 13? > Ok. I mean for tru

Re: [PATCH v2] i386: Add non-optimize prefetchi intrins

2024-07-29 Thread Hongtao Liu
On Fri, Jul 26, 2024 at 4:55 PM Haochen Jiang wrote: > > Hi all, > > I added related O0 testcase in this patch. > > Ok for trunk and backport to GCC 14 and GCC 13? Ok. > > Thx, > Haochen > > --- > > Changes in v2: Add testcases. > > --- > > Under -O0, with the "newly" introduced intrins, the varia

Re: [PATCH] [x86]Refine constraint "Bk" to define_special_memory_constraint.

2024-07-28 Thread Hongtao Liu
On Thu, Jul 25, 2024 at 3:23 PM Hongtao Liu wrote: > > On Wed, Jul 24, 2024 at 3:57 PM liuhongt wrote: > > > > For below pattern, RA may still allocate r162 as v/k register, try to > > reload for address with leaq __libc_tsd_CTYPE_B@gottpoff(%rip), %rsi >

Re: [PATCH] Fix mismatch between constraint and predicate for ashl3_doubleword.

2024-07-26 Thread Hongtao Liu
On Fri, Jul 26, 2024 at 2:59 PM liuhongt wrote: > > (insn 98 94 387 2 (parallel [ > (set (reg:TI 337 [ _32 ]) > (ashift:TI (reg:TI 329) > (reg:QI 521))) > (clobber (reg:CC 17 flags)) > ]) "test.c":11:13 953 {ashlti3_doubleword} >

Re: [PATCH Ping] i386: Use BLKmode for {ld,st}tilecfg

2024-07-25 Thread Hongtao Liu
On Fri, Jul 26, 2024 at 2:28 PM Jiang, Haochen wrote: > > Ping for this patch > > Thx, > Haochen > > > -Original Message- > > From: Haochen Jiang > > Sent: Thursday, July 18, 2024 9:45 AM > > To: gcc-patches@gcc.gnu.org > > Cc: Liu, Hongtao ; hjl.to...@gmail.com; > > ubiz...@gmail.com > >

Re: [PATCH] [x86]Refine constraint "Bk" to define_special_memory_constraint.

2024-07-25 Thread Hongtao Liu
On Wed, Jul 24, 2024 at 3:57 PM liuhongt wrote: > > For below pattern, RA may still allocate r162 as v/k register, try to > reload for address with leaq __libc_tsd_CTYPE_B@gottpoff(%rip), %rsi > which result a linker error. > > (set (reg:DI 162) > (mem/u/c:DI >(const:DI (unspec:DI >

Re: [PATCH] i386: Adjust rtx cost for imulq and imulw [PR115749]

2024-07-24 Thread Hongtao Liu
On Wed, Jul 24, 2024 at 3:11 PM Kong, Lingling wrote: > > Tested spec2017 performance in Sierra Forest, Icelake, CascadeLake, at least > there is no obvious regression. > > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. > > OK for trunk? Ok. > > gcc/ChangeLog: > > * config/i386

Re: [PATCH] x86: Don't enable APX_F in 32-bit mode.

2024-07-22 Thread Hongtao Liu
On Thu, Jul 18, 2024 at 5:29 PM Kong, Lingling wrote: > > I adjusted my patch based on the comments by H.J. > And I will add the testcase like gcc.target/i386/pr101395-1.c when the march > for APX is determined. > > Ok for trunk? Synced with LLVM folks, they agreed to this solution. Ok. > > Than

Re: [PATCH] i386, testsuite: Fix non-Unicode character

2024-07-16 Thread Hongtao Liu
On Mon, Jul 15, 2024 at 7:24 PM Paul-Antoine Arras wrote: > > This trivially fixes an incorrectly encoded character in the DejaGnu > scan pattern. > > OK for trunk? Ok. > -- > PA -- BR, Hongtao

Re: [PATCH] i386: extend trunc{128}2{16,32,64}'s scope.

2024-07-14 Thread Hongtao Liu
On Mon, Jul 15, 2024 at 1:39 PM Hu, Lin1 wrote: > > Hi, all > > Based on actual usage, trunc{128}2{16,32,64} use some instructions from > sse/sse3, so extend their scope to extend the scope of optimization. > > Bootstrapped and regtested on x86-64-linux-gnu, OK for trunk? Ok. > > BRs, > Lin > > gcc/C

Re: [i386] adjust flag_omit_frame_pointer in a single function [PR113719] (was: Re: [PATCH] [i386] restore recompute to override opts after change [PR113719])

2024-07-14 Thread Hongtao Liu
On Thu, Jul 11, 2024 at 9:07 PM Alexandre Oliva wrote: > > On Jul 4, 2024, Alexandre Oliva wrote: > > > On Jul 3, 2024, Rainer Orth wrote: > > > Hmm, I wonder if leaf frame pointer has to do with that. > > It did, in a way. > > > > The first two patches for PR113719 have each regressed >

Re: [PATCH] [APX NF] Add a pass to convert legacy insn to NF insns

2024-07-14 Thread Hongtao Liu
On Wed, Jul 10, 2024 at 2:46 PM Hongyu Wang wrote: > > Hi, > > For APX ccmp, current infrastructure will always generate cstore for > the ccmp flag user, like > > cmpe%rcx, %r8 > ccmpnel %rax, %rbx > seta%dil > add %rcx, %r9 > add %r9, %rdx >

Re: [PATCH] AVX512BF16: Do not allow permutation with vcvtne2ps2bf16 [PR115889]

2024-07-14 Thread Hongtao Liu
On Mon, Jul 15, 2024 at 10:21 AM Hongyu Wang wrote: > > > Could you just git revert 6d0b7b69d143025f271d0041cfa29cf26e6c343b? > > We can still deal with BFmode permutation the same way as HFmode, so > the change in ix86_vectorize_vec_perm_const can be preserved. > > Hongt

Re: [PATCH] AVX512BF16: Do not allow permutation with vcvtne2ps2bf16 [PR115889]

2024-07-14 Thread Hongtao Liu
On Sat, Jul 13, 2024 at 3:44 PM Hongyu Wang wrote: > > Hi, > > According to the instruction spec of AVX512BF16, the convert from float > to BF16 is not a simple truncation. It has special handling for > denormal/nan, even for normal float it will add an extra bias according > to the least signific
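
To illustrate why a float to BF16 conversion differs from plain truncation, the sketch below contrasts dropping the low 16 bits with the common software round-to-nearest-even formulation, which adds a bias that depends on the least significant kept bit. This is an assumption-laden illustration for normal inputs only: it does not reproduce the instruction's denormal or NaN handling, and the helper names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret a float as its 32-bit IEEE-754 pattern, and back. */
uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

float float_from_bits(uint32_t u)
{
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}

/* Simple truncation: keep the upper 16 bits, discard the rest. */
uint16_t bf16_truncate(float f)
{
    return (uint16_t)(float_bits(f) >> 16);
}

/* Round to nearest even for normal values: the bias is 0x7FFF plus the
   least significant bit that survives, so exact ties go to the even result. */
uint16_t bf16_round_nearest_even(float f)
{
    uint32_t u = float_bits(f);
    uint32_t bias = 0x7FFFu + ((u >> 16) & 1u);
    return (uint16_t)((u + bias) >> 16);
}
```

The two functions agree on values that are exactly representable in BF16 and diverge as soon as the discarded bits exceed half a unit in the last place, which is why a lane permutation cannot stand in for the conversion.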

Re: [x86 SSE PATCH] Some AVX512 ternlog expansion refinements (take #2)

2024-07-11 Thread Hongtao Liu
strap > and make -k check, both with and without --target_board=unix{-m32} > with no new failures. Ok for mainline? Ok. > > > 2024-07-11 Roger Sayle > Hongtao Liu > > gcc/ChangeLog > * config/i386/i386-expand.cc (ix86_broadcast_from_con

Re: [PATCH 05/10] i386: Fix dot_prod backend patterns for mmx and sse targets

2024-07-10 Thread Hongtao Liu
On Wed, Jul 10, 2024 at 10:10 PM Victor Do Nascimento wrote: > > Following the migration of the dot_prod optab from a direct to a > conversion-type optab, ensure all back-end patterns incorporate the > second machine mode into pattern names. The patch LGTM. BTW you can use existing instead of new

Re: Support bitwise and/andnot/abs/neg/copysign/xorsign op for V8BF/V16BF/V32BF

2024-07-07 Thread Hongtao Liu
On Thu, Jul 4, 2024 at 11:24 AM Levy Hsu wrote: > > This patch extends support for BF16 vector operations in GCC, including > bitwise AND, ANDNOT, ABS, NEG, COPYSIGN, and XORSIGN for V8BF, V16BF, and > V32BF modes. > Bootstrapped and tested on x86_64-linux-gnu. ok for trunk? > > gcc/ChangeLog: >

Re: [PATCH V2] x86: Update branch hint for Redwood Cove.

2024-07-07 Thread Hongtao Liu
On Thu, Jul 4, 2024 at 9:30 AM liuhongt wrote: > > From: "H.J. Lu" > > >The above reads like it would be worth splitting branch_prediction_hints > >into branch_prediction_hints_taken and branch_prediction_hints_not_taken > >given not-taken is the default and thus will just increase code size? > >Ac

Re: [x86 SSE PATCH] Some AVX512 ternlog expansion refinements.

2024-07-07 Thread Hongtao Liu
On Sun, Jul 7, 2024 at 5:00 PM Roger Sayle wrote: > > > Hi Hongtao, > This should address concerns about the remaining use of force_reg. @@ -25793,15 +25792,20 @@ ix86_expand_ternlog_binop (enum rtx_code code, machine_mode mode, if (GET_MODE (op1) != mode) op1 = gen_lowpart (mod

Re: [x86 SSE PATCH] PR target/115751: Avoid force_reg in ix86_expand_ternlog.

2024-07-04 Thread Hongtao Liu
On Fri, Jul 5, 2024 at 8:06 AM Hongtao Liu wrote: > > On Fri, Jul 5, 2024 at 2:54 AM Roger Sayle wrote: > > > > > > This patch fixes a problem with splitting of complex AVX512 ternlog > > instructions on x86_64. A recent change allows the ternlog pattern > >

Re: [x86 SSE PATCH] PR target/115751: Avoid force_reg in ix86_expand_ternlog.

2024-07-04 Thread Hongtao Liu
On Fri, Jul 5, 2024 at 2:54 AM Roger Sayle wrote: > > > This patch fixes a problem with splitting of complex AVX512 ternlog > instructions on x86_64. A recent change allows the ternlog pattern > to have multiple mem-like operands prior to reload, by emitting any > "reloads" as necessary during sp

Re: [PATCH] [APX PPX] Avoid generating unmatched pushp/popp in pro/epilogue

2024-07-03 Thread Hongtao Liu
On Tue, Jul 2, 2024 at 11:24 AM Hongyu Wang wrote: > > Hi, > > According to APX spec, the pushp/popp pairs should be matched, > otherwise the PPX hint cannot take effect and cause performance loss. > > In the ix86_expand_epilogue, there are several optimizations that may > cause the epilogue using

Re: [PATCH][committed] Move runtime check into a separate function and guard it with target ("no-avx")

2024-07-03 Thread Hongtao Liu
On Thu, Jul 4, 2024 at 9:41 AM H.J. Lu wrote: > > > On Thu, Jul 4, 2024, 9:12 AM Hongtao Liu wrote: >> >> On Thu, Jul 4, 2024 at 6:17 AM H.J. Lu wrote: >> > >> > >> > On Wed, Jul 3, 2024, 9:37 PM Richard Biener >> > wrote:

Re: [PATCH][committed] Move runtime check into a separate function and guard it with target ("no-avx")

2024-07-03 Thread Hongtao Liu
On Thu, Jul 4, 2024 at 6:17 AM H.J. Lu wrote: > > > On Wed, Jul 3, 2024, 9:37 PM Richard Biener > wrote: >> >> On Wed, Jul 3, 2024 at 9:25 AM liuhongt wrote: >> > >> > The patch can avoid SIGILL on non-AVX512 machine due to kmovd is >> > generated in dynamic check. >> > >> > Committed as an obv

Re: [PATCH] x86: Update branch hint for Redwood Cove.

2024-07-02 Thread Hongtao Liu
On Wed, Jul 3, 2024 at 2:10 AM Andi Kleen wrote: > > liuhongt writes: > > > From: "H.J. Lu" > > > > According to Intel® 64 and IA-32 Architectures Optimization Reference > > Manual[1], Branch Hint is updated for Redwood Cove. > > > > cut from [1]- > > Starting wit

Re: [PATCH] i386: Support APX NF and NDD for imul/mul

2024-07-01 Thread Hongtao Liu
On Mon, Jul 1, 2024 at 4:51 PM kong lingling wrote: > > Add some missing APX NF and NDD support for imul and mul. > > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. > > Ok for trunk? Ok. > > > gcc/ChangeLog: > > * config/i386/i386.md (*imulhizu): Added APX > NF support.

Re: [x86 SSE PATCH] Remove legacy ternlog patterns from sse.md

2024-06-30 Thread Hongtao Liu
> > > > gcc/testsuite/ChangeLog > > * gcc.target/i386/pr100711-6.c: Update to check for decimal > > immediate operand in ternlog, not hexadecimal. > I got an ICE when bootstrapped with --enable-checking=yes,rtl,extra > The ICE can be walked around with 2 separate define_predicates,

Re: [x86 SSE PATCH] Remove legacy ternlog patterns from sse.md

2024-06-30 Thread Hongtao Liu
On Mon, Jul 1, 2024 at 6:14 AM Roger Sayle wrote: > > > As promised here's the final ternlog clean-up, that deletes the now > obsolete legacy patterns and mode iterators from sse.md. It also updates > the surviving ternlog patterns to consistently use decimal immediate > operands (instead of hexa

Re: [testsuite PATCH] Fix -m32 gcc.target/i386/pr102464-vrndscaleph.c on RedHat.

2024-06-30 Thread Hongtao Liu
On Sun, Jun 30, 2024 at 7:29 PM Roger Sayle wrote: > > > This patch fixes the 4 FAILs of gcc.target/i386/pr102464-vrndscaleph.c > with --target_board='unix{-m32}' on RedHat 7.x. The issue is that this > AVX512 test includes the system math.h, and on older systems this provides > inline versions o

Re: [x86 SSE PATCH] Some additional ternlog refinements.

2024-06-27 Thread Hongtao Liu
On Thu, Jun 27, 2024 at 4:29 PM Roger Sayle wrote: > > > This patch is another round of refinements to fine tune the new ternlog > infrastructure in i386's sse.md. This patch tweaks ix86_ternlog_idx > to allow multiple MEM/CONST_VECTOR/VEC_DUPLICATE operands prior to > splitting (before reload),

Re: [PATCH] i386: Refactor vcvttps2qq/vcvtqq2ps patterns.

2024-06-27 Thread Hongtao Liu
On Thu, Jun 27, 2024 at 9:23 AM Hu, Lin1 wrote: > > Hi, all > > This patch aims to refactor vcvttps2qq/vcvtqq2ps patterns to remove the redundant > round_*_modev8sf_condition. > > Bootstrapped and regtested on x86-64-linux-gnu, OK for trunk? Ok. > > BRs, > Lin > > gcc/ChangeLog: > > * config/

Re: [PATCH] [i386] restore recompute to override opts after change [PR113719]

2024-06-26 Thread Hongtao Liu
On Thu, Jun 13, 2024 at 3:32 PM Alexandre Oliva wrote: > > > The first patch for PR113719 regressed gcc.dg/ipa/iinline-attr.c on > toolchains configured to --enable-frame-pointer, because the > optimization node created within handle_optimize_attribute had > flag_omit_frame_pointer incorrectly set

Re: [PATCH] Fix wrong cost of MEM when addr is a lea.

2024-06-26 Thread Hongtao Liu
On Wed, Jun 26, 2024 at 4:02 PM Richard Biener wrote: > > On Wed, Jun 26, 2024 at 9:14 AM Hongtao Liu wrote: > > > > On Wed, Jun 26, 2024 at 2:52 PM Richard Biener > > wrote: > > > > > > On Wed, Jun 26, 2024 at 8:09 AM liuhongt wrote: > > >

Re: [PATCH] Fix wrong cost of MEM when addr is a lea.

2024-06-26 Thread Hongtao Liu
On Wed, Jun 26, 2024 at 2:52 PM Richard Biener wrote: > > On Wed, Jun 26, 2024 at 8:09 AM liuhongt wrote: > > > > 416.gamess regressed 4-6% on x86_64 since my r15-882-g1d6199e5f8c1c0. > > The commit adjusts rtx_cost of mem to reduce cost of (add op0 disp). > > But cost of ADDR could be cheaper tha

Re: [PING] [PATCH] AVX-512: Pacify -Wshift-overflow=2. [PR115409]

2024-06-22 Thread Hongtao Liu
On Sat, Jun 22, 2024 at 5:49 AM Collin Funk wrote: > > Hi Hongtao, > > I submitted a patch silencing -Wshift-overflow on a signed int > constant here: > > https://gcc.gnu.org/pipermail/gcc-patches/2024-June/654016.html > > You OK'd it here: > > https://gcc.gnu.org/pipermail/gcc-patches/202

Re: [PATCH] Add a late-combine pass [PR106594]

2024-06-20 Thread Hongtao Liu
On Wed, Oct 25, 2023 at 2:49 AM Richard Sandiford wrote: > > This patch adds a combine pass that runs late in the pipeline. > There are two instances: one between combine and split1, and one > after postreload. > > The pass currently has a single objective: remove definitions by > substituting int

Re: [x86 PATCH] Allow all register_operand SUBREGs in x86_ternlog_idx.

2024-06-20 Thread Hongtao Liu
On Wed, Jun 19, 2024 at 5:04 AM Roger Sayle wrote: > > > This patch tweaks ix86_ternlog_idx to allow any SUBREG that matches > the register_operand predicate, and is split out as an independent > piece of a patch that I have to clean-up redundant ternlog patterns > in sse.md. It turns out that so

Re: [PATCH] x86: Emit cvtne2ps2bf16 for odd increasing perm in __builtin_shufflevector

2024-06-16 Thread Hongtao Liu
On Fri, Jun 14, 2024 at 9:35 AM Levy Hsu wrote: > > This patch updates the GCC x86 backend to efficiently handle > odd, incrementally increasing permutations of BF16 vectors > using the cvtne2ps2bf16 instruction. > It modifies ix86_vectorize_vec_perm_const to support these operations > and adds a

Re: [PATCH] i386: Refine all cvtt* instructions with UNSPEC instead of FIX/UNSIGNED_FIX.

2024-06-16 Thread Hongtao Liu
On Thu, Jun 13, 2024 at 3:13 PM Hu, Lin1 wrote: > > Hi, all > > This patch aims to refine all cvtt* instructions with UNSPEC instead of > FIX/UNSIGNED_FIX. Because the intrinsics should behave as documented. > > Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk? Ok. > > BRs, > Lin >

Re: [PATCH] middle-end/114189 - drop uses of vcond{,u,eq}_optab

2024-06-16 Thread Hongtao Liu
On Fri, Jun 14, 2024 at 10:53 PM Hongtao Liu wrote: > > On Fri, Jun 14, 2024 at 6:31 PM Richard Biener wrote: > > > > The following retires vcond{,u,eq} optabs by stopping to use them > > from the middle-end. Targets instead (should) implement vcond_mask > > and

Re: [PATCH 0/3] [APX CFCMOV] Support APX CFCMOV

2024-06-16 Thread Hongtao Liu
On Sat, Jun 15, 2024 at 1:22 AM Jeff Law wrote: > > > > On 6/14/24 11:10 AM, Alexander Monakov wrote: > > > > On Fri, 14 Jun 2024, Kong, Lingling wrote: > > > >> APX CFCMOV[1] feature implements conditionally faulting which means that > >> all memory faults are suppressed > >> when the condition

Re: [PATCH] middle-end/114189 - drop uses of vcond{,u,eq}_optab

2024-06-14 Thread Hongtao Liu
On Fri, Jun 14, 2024 at 6:31 PM Richard Biener wrote: > > The following retires vcond{,u,eq} optabs by stopping to use them > from the middle-end. Targets instead (should) implement vcond_mask > and vec_cmp{,u,eq} optabs. The PR this change refers to lists > possibly affected targets - those imp
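As a rough illustration of the split described above, a vector conditional can be modeled as two steps: a comparison that produces a per-lane mask, then a select driven by that mask. The scalar C sketch below is only a model of that idea; the function names `cmp_lt_mask` and `select_by_mask` are hypothetical, and nothing here is GCC's actual optab interface or expander code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a vec_cmp-style step: each lane of the mask
   becomes all-ones (-1) when a[i] < b[i], and 0 otherwise. */
void cmp_lt_mask(const int32_t *a, const int32_t *b, int32_t *mask, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        mask[i] = (a[i] < b[i]) ? -1 : 0;
}

/* Hypothetical model of a vcond_mask-style step: each lane picks
   if_true[i] where the mask lane is non-zero, else if_false[i]. */
void select_by_mask(const int32_t *mask, const int32_t *if_true,
                    const int32_t *if_false, int32_t *out, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        out[i] = mask[i] ? if_true[i] : if_false[i];
}
```

Calling `cmp_lt_mask` followed by `select_by_mask` with `a` and `b` as the two candidate inputs gives a lane-wise minimum, which is the kind of combined operation a single vcond-style pattern would otherwise express.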

Re: [PATCH] i386: Handle target of __builtin_ia32_cmp[p|s][s|d] from avx into sse/sse2/avx

2024-06-12 Thread Hongtao Liu
On Thu, May 30, 2024 at 1:52 PM Hu, Lin1 wrote: > > Hi, all > > This patch aims to extend __builtin_ia32_cmp[p|s][s|d] from avx to > sse/sse2/avx, where its immediate is in range of [0, 7]. > > Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk? Ok. > > BRs, > Lin > > gcc/ChangeLog: >

Re: [PATCH] [APX ZU] Support APX zero-upper

2024-06-12 Thread Hongtao Liu
On Thu, Jun 6, 2024 at 4:49 PM Kong, Lingling wrote: > > Enable ZU for IMUL (opcodes 0x69 and 0x6B) and SETcc. > > gcc/ChangeLog: > > * config/i386/i386-opts.h (enum apx_features):Add apx_zu. > * config/i386/i386.h (TARGET_APX_ZU): Define. > * config/i386/i386.md (*imulhizu

Re: [x86 PATCH] More use of m{32, 64}bcst addressing modes with ternlog.

2024-06-12 Thread Hongtao Liu
On Thu, Jun 13, 2024 at 4:20 AM Roger Sayle wrote: > > > This patch makes more use of m32bcst and m64bcst addressing modes in > ix86_expand_ternlog. Previously, the i386 backend would only consider > using a m32bcst if the inner mode of the vector was 32-bits, or using > m64bcst if the inner mode

Re: [PATCH] AVX-512: Pacify -Wshift-overflow=2. [PR115409]

2024-06-10 Thread Hongtao Liu
On Mon, Jun 10, 2024 at 2:37 PM Collin Funk wrote: > > A shift of 31 on a signed int is undefined behavior. Since unsigned > int is 32 bits wide, this change fixes it and silences the warning. Ok. > > gcc/ChangeLog: > > PR target/115409 > * config/i386/avx512fp16intrin.h (_mm512_co
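A minimal sketch of the kind of change described, assuming a 32-bit `int` as the message states: in C, `1 << 31` shifts a set bit into the sign position of a signed `int`, which is undefined, while the same shift on an unsigned operand is well defined. The helper names below are hypothetical and are not taken from avx512fp16intrin.h.

```c
#include <stdint.h>

/* Hypothetical helper: builds the bit-31 mask from an unsigned constant.
   Writing (1 << 31) instead would shift into the sign bit of a signed
   int, which is what -Wshift-overflow=2 warns about. */
uint32_t top_bit_mask(void)
{
    return UINT32_C(1) << 31;
}

/* Hypothetical helper: the same idea for any bit position n in 0..31. */
uint32_t bit_mask(unsigned n)
{
    return UINT32_C(1) << n;
}
```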

Re: [x86 PATCH] PR target/115397: AVX512 ternlog vs. -m32 -fPIC constant pool.

2024-06-10 Thread Hongtao Liu
On Mon, Jun 10, 2024 at 3:20 PM Roger Sayle wrote: > > > This patch fixes PR target/115397, a recent regression caused by my > ternlog patch that results in an ICE (building numpy) with -m32 -fPIC. > The problem is that ix86_broadcast_from_constant, which calls > get_pool_constant, doesn't handle

Re: [x86 SSE] Improve handling of ternlog instructions in i386/sse.md (v3)

2024-06-06 Thread Hongtao Liu
vpternlogd[ \\t] 694 > > > 2024-06-06 Roger Sayle > Hongtao Liu > > gcc/ChangeLog > * config/i386/i386-expand.cc (ix86_expand_args_builtin): Call > fixup_modeless_constant before testing predicates. Only call > copy_to_mode_reg on memory

Re: [PATCH] [APX] Adjust target-support check [PR 115341]

2024-06-05 Thread Hongtao Liu
On Thu, Jun 6, 2024 at 2:39 PM Hongyu Wang wrote: > > Current target apxf check does not specify sub-features that assembler > supports, so the check with older binutils will fail at assemble stage > for new apx features like NF,CCMP or CFCMOV. Adjust the assembler check > for latest apx subfeatur

Re: [V2 PATCH] Simplify (AND (ASHIFTRT A imm) mask) to (LSHIFTRT A imm) for vector mode.

2024-06-05 Thread Hongtao Liu
On Wed, Jun 5, 2024 at 10:44 PM Jeff Law wrote: > > > > On 6/4/24 10:22 PM, liuhongt wrote: > >> Can you add a testcase for this? I don't mind if it's x86 specific and > >> does a bit of asm scanning. > >> > >> Also note that the context for this patch has changed, so it won't > >> automatically
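The identity named in the subject line can be checked on a single 32-bit lane. The sketch below assumes the mask clears exactly the top `imm` bits, which are the bits an arithmetic shift fills with copies of the sign bit; the precise condition the patch tests is not visible in the excerpt above, and all function names here are hypothetical.

```c
#include <stdint.h>

/* Arithmetic right shift on a 32-bit lane, written portably: shift
   logically, then fill the vacated top bits with the sign bit.
   imm is assumed to be in the range 0..31. */
uint32_t ashiftrt32(uint32_t a, unsigned imm)
{
    uint32_t sign_fill = (a & 0x80000000u) ? ~(UINT32_MAX >> imm) : 0u;
    return (a >> imm) | sign_fill;
}

/* Logical right shift on a 32-bit lane. */
uint32_t lshiftrt32(uint32_t a, unsigned imm)
{
    return a >> imm;
}

/* Returns 1 when (AND (ASHIFTRT a imm) mask) equals (LSHIFTRT a imm)
   for the assumed mask that keeps only the low 32 - imm bits. */
int and_of_ashiftrt_equals_lshiftrt(uint32_t a, unsigned imm)
{
    uint32_t mask = UINT32_MAX >> imm;
    return (ashiftrt32(a, imm) & mask) == lshiftrt32(a, imm);
}
```

Because the mask discards exactly the bits where the two shifts can differ, the AND becomes redundant once the shift is logical, which is why a single shift suffices in that case.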

Re: [PATCH] Add AVX10.1 target_clones support

2024-06-02 Thread Hongtao Liu
On Wed, May 29, 2024 at 11:05 AM Haochen Jiang wrote: > > Hi all, > > Since AVX10 is the first major ISA introduced after AVX-512, we propose > to add target_clones support for it. > > Although AVX10.1-256 won't cover 512-bit part of AVX512F, but since > it is only for priority but not for implica
