On Tue, Sep 24, 2024 at 10:16 AM Levy Hsu wrote:
>
> This patch enables vectorization of the popcount operation for V2QI, V4QI,
> V8QI, V2HI, V4HI, and V2SI modes.
Ok.
>
> gcc/ChangeLog:
>
> * config/i386/mmx.md:
> (VQI_16_32_64): New mode iterator for 8-byte, 4-byte, and 2-byte
>
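A minimal C sketch of the kind of loop this patch targets: an element-wise popcount over 8-bit elements. With a short trip count (8, 4 or 2 elements) a 64-, 32- or 16-bit vector mode could be used, but whether GCC actually vectorizes it that way depends on the enabled ISA (e.g. AVX512BITALG/AVX512VPOPCNTDQ with AVX512VL) and the cost model, so treat that mapping as an assumption. `popcount_u8` is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/* Element-wise popcount over bytes.  A vectorizer may map short loops
   like this onto V8QI/V4QI/V2QI modes; that is target- and
   flag-dependent. */
void popcount_u8(uint8_t *restrict dst, const uint8_t *restrict src, size_t n)
{
  for (size_t i = 0; i < n; ++i)
    dst[i] = (uint8_t) __builtin_popcount (src[i]);
}
```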
On Wed, Sep 25, 2024 at 4:42 PM Jakub Jelinek wrote:
>
> On Wed, Sep 25, 2024 at 10:17:50AM +0800, Hongtao Liu wrote:
> > > + for (int i = 0; i < 2; ++i)
> > > + {
> > > + unsigned count = vector_cst_encoded_nelts (args[i]),
On Wed, Sep 25, 2024 at 3:55 PM Jan Beulich wrote:
>
> On 25.09.2024 09:38, Hongtao Liu wrote:
> > On Wed, Sep 25, 2024 at 2:56 PM Jan Beulich wrote:
> >>
> >> Commit a79d13a01f8c ("i386: Fix aes/vaes patterns [PR114576]") correctly
> >> sa
On Wed, Sep 25, 2024 at 2:56 PM Jan Beulich wrote:
>
> Commit a79d13a01f8c ("i386: Fix aes/vaes patterns [PR114576]") correctly
> said "..., but we need to emit {evex} prefix in the assembly if AES ISA
> is not enabled". Yet it did so only for the TARGET_AES insns. Going from
> the alternative cho
On Wed, Sep 25, 2024 at 1:07 AM Jakub Jelinek wrote:
>
> Hi!
>
> The following patch adds GENERIC and GIMPLE folders for various
> x86 min/max builtins.
> As discussed, these builtins have effectively x < y ? x : y
> (or x > y ? x : y) behavior.
> The GENERIC folding is done if all the (relevant)
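To make the "x < y ? x : y" semantics concrete, a scalar C model (hypothetical helper names) shows that the result depends on operand order when a NaN or signed zeros are involved: whenever the comparison is false, the second operand is returned.

```c
#include <math.h>

/* Scalar models of the min/max builtin semantics described above.
   If the comparison is false (either operand is NaN, or the operands
   are +0.0 and -0.0), the second operand y is returned. */
float x86_min_model(float x, float y) { return x < y ? x : y; }
float x86_max_model(float x, float y) { return x > y ? x : y; }
```

This order sensitivity is why such builtins cannot simply be treated as `fminf`/`fmaxf`, which return the non-NaN operand regardless of its position.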
On Tue, Sep 24, 2024 at 5:46 PM Uros Bizjak wrote:
>
> On Tue, Sep 24, 2024 at 11:23 AM liuhongt wrote:
> >
> > Return constm1_rtx when GET_MODE_CLASS (MODE) == MODE_VECTOR_INT.
> > Otherwise NULL_RTX.
> >
> > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> > Ready push to trunk.
> >
On Thu, Sep 19, 2024 at 2:08 PM Richard Biener
wrote:
>
> On Wed, Sep 18, 2024 at 7:55 PM Richard Sandiford
> wrote:
> >
> > Richard Biener writes:
> > > On Thu, Sep 12, 2024 at 4:50 PM Hongtao Liu wrote:
> > >>
> > >> On Wed, Sep 11, 20
On Wed, Sep 18, 2024 at 1:35 PM Haochen Jiang wrote:
>
> Hi all,
>
> Since r15-3539, there are requests coming in to add other alias option
> documentation. This patch will add all of them, including corei7, corei7-avx,
> core-avx-i, core-avx2, atom, slm, gracemont and emeraldrapids.
>
> Also in
On Wed, Sep 18, 2024 at 1:40 PM Haochen Jiang wrote:
>
> Hi all,
>
> Since commit r15-3594, we fixed the bugs in MASK_TYPE for AVX10.2
> testcases, but we missed the following four.
>
> The tests are not FAIL since the binutils part haven't been merged
> yet, which leads to UNSUPPORTED test. But t
On Wed, Sep 18, 2024 at 1:42 PM Haochen Jiang wrote:
>
> Hi all,
>
> For AVX10.2 convert tests, all of them are missing mask tests
> previously, this patch will add them in the tests.
>
> Tested on sde with assembler with these insts. Ok for trunk?
Ok.
>
> Thx,
> Haochen
>
> gcc/testsuite/ChangeLo
On Thu, Sep 19, 2024 at 9:34 AM Hu, Lin1 wrote:
>
> Hi, all
>
> The memory attr of some instructions should be 'load', but it is 'none'
> currently.
>
> This patch adds two new types ssemov2, sseicvt2 for some load instructions that
> use memory on operands. So their memory attr will be 'load'.
On Wed, Sep 11, 2024 at 4:21 PM Hongtao Liu wrote:
>
> On Wed, Sep 11, 2024 at 4:04 PM Richard Biener
> wrote:
> >
> > On Wed, Sep 11, 2024 at 4:17 AM liuhongt wrote:
> > >
> > > GCC12 enables vectorization for O2 with very cheap cost model which is
>
On Thu, Sep 12, 2024 at 9:55 AM Levy Hsu wrote:
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> Ok for trunk?
Ok.
>
> gcc/ChangeLog:
>
> * config/i386/i386.cc (ix86_get_mask_mode):
> Enable BFmode for targetm.vectorize.get_mask_mode with AVX10.2.
> * config/
On Wed, Sep 11, 2024 at 4:04 PM Richard Biener
wrote:
>
> On Wed, Sep 11, 2024 at 4:17 AM liuhongt wrote:
> >
> > GCC12 enables vectorization for O2 with very cheap cost model which is
> > restricted
> > to constant tripcount. The vectorization capacity is very limited w/
> > consideration
> >
On Thu, Sep 5, 2024 at 10:05 AM Haochen Jiang wrote:
>
> Hi all,
>
> In avx512f-mask-type.h, we need SIZE being defined to get
> MASK_TYPE defined correctly. Fix those testcases where
> SIZE is not defined before the include for avx512f-mask-type.h.
>
> Note that for convert intrins in AVX10.2, t
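The pitfall comes from how the C preprocessor evaluates `#if`: an identifier that is not defined as a macro evaluates to 0, so a missing `SIZE` silently selects the narrowest mask instead of producing an error. A self-contained sketch with hypothetical macro names (not the actual contents of avx512f-mask-type.h):

```c
#include <stdint.h>

/* A testcase must define SIZE before the mask-type logic runs. */
#define SIZE 32

/* Hypothetical stand-in for a mask-type header: pick the mask width
   from the number of vector elements. */
#if SIZE <= 8
typedef uint8_t MASK_TYPE;
#elif SIZE <= 16
typedef uint16_t MASK_TYPE;
#elif SIZE <= 32
typedef uint32_t MASK_TYPE;
#else
typedef uint64_t MASK_TYPE;
#endif

/* The same test on a macro nobody defined: it evaluates as 0, so the
   first branch is taken without a diagnostic (unless -Wundef is used). */
#if HYPOTHETICAL_UNDEFINED_SIZE <= 8
#define UNDEFINED_SIZE_PICKS_8BIT_MASK 1
#else
#define UNDEFINED_SIZE_PICKS_8BIT_MASK 0
#endif
```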
On Tue, Sep 10, 2024 at 3:35 PM Levy Hsu wrote:
>
> Simple testcase fix, ok for trunk?
Ok.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/avx10_2-partial-bf-vector-fma-1.c: Separated 32-bit
> scan
> and removed register checks in spill situations.
> ---
> .../i386/avx10_2-par
On Fri, Sep 6, 2024 at 10:34 AM Jiang, Haochen wrote:
>
> > From: Levy Hsu
> > Sent: Thursday, September 5, 2024 4:55 PM
> > To: gcc-patches@gcc.gnu.org
> >
> > Simple testcase fix, ok for trunk?
> >
> > This patch removes specific register checks to account for possible
> > register spills and d
On Wed, Sep 4, 2024 at 9:32 AM Levy Hsu wrote:
>
> Hi
>
> This change adds BFmode support to the ix86_preferred_simd_mode function
> enhancing SIMD vectorization for BF16 operations. The update ensures
> optimized usage of SIMD capabilities improving performance and aligning
> vector sizes with pr
On Wed, Sep 4, 2024 at 10:53 AM Levy Hsu wrote:
>
> Hi
>
> This patch adds support for bf16 operations in V2BF and V4BF modes on i386,
> handling signbit, xorsign, copysign, abs, neg, and various logical operations.
>
> Bootstrapped and tested on x86-64-pc-linux-gnu.
> Ok for trunk?
Ok.
>
> gcc/Ch
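These operations only touch the sign bit of the bf16 encoding (1 sign bit, 8 exponent bits, 7 mantissa bits), which is why they reduce to plain bitwise logic. A scalar sketch on raw bit patterns; the helper names are hypothetical, and the idea that the vector patterns use a broadcast sign-mask constant with AND/ANDN/XOR is an assumption about the expansion, not something the snippet states.

```c
#include <stdint.h>

/* Raw bf16 bit pattern: bit 15 sign, bits 14:7 exponent, bits 6:0 mantissa. */
typedef uint16_t bf16_bits;

#define BF16_SIGN_MASK 0x8000u
#define BF16_ABS_MASK  0x7FFFu

int bf16_signbit(bf16_bits x) { return (x & BF16_SIGN_MASK) != 0; }

/* Clear the sign bit (AND with the magnitude mask). */
bf16_bits bf16_abs(bf16_bits x) { return (bf16_bits) (x & BF16_ABS_MASK); }

/* Flip the sign bit (XOR with the sign mask). */
bf16_bits bf16_neg(bf16_bits x) { return (bf16_bits) (x ^ BF16_SIGN_MASK); }

/* Magnitude of x with the sign of y. */
bf16_bits bf16_copysign(bf16_bits x, bf16_bits y)
{
  return (bf16_bits) ((x & BF16_ABS_MASK) | (y & BF16_SIGN_MASK));
}

/* x multiplied by the sign of y, done as an XOR of y's sign bit. */
bf16_bits bf16_xorsign(bf16_bits x, bf16_bits y)
{
  return (bf16_bits) (x ^ (y & BF16_SIGN_MASK));
}
```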
On Wed, Sep 4, 2024 at 11:31 AM Levy Hsu wrote:
>
> Hi
>
> Bootstrapped and tested on x86-64-pc-linux-gnu.
> Ok for trunk?
Ok.
>
> This patch introduces support for vectorized FMA operations for bf16 types in
> V2BF and V4BF modes on the i386 architecture. New mode iterators and
> define_expand en
On Tue, Sep 3, 2024 at 2:24 PM Haochen Jiang wrote:
>
> Hi all,
>
> The intrin for non-optimized got a typo in mask type, which will cause
> the high bits of __mmask32 being unexpectedly zeroed.
>
> The test does not fail under O0 with current 1b since the testcase is
> wrong. We need to include a
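The snippet does not show the exact typo. Assuming it is a too-narrow cast in the macro form of the intrinsic (the form GCC's intrinsic headers typically use when `__OPTIMIZE__` is not defined, i.e. at -O0), the effect looks like this; the macro and helper names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical macro forms of an intrinsic taking a 32-bit mask. */
#define MASK_ARG_BUGGY(U) ((uint16_t) (U))  /* typo: 16-bit mask type */
#define MASK_ARG_FIXED(U) ((uint32_t) (U))  /* matches the __mmask32 width */

/* Number of lanes a 32-lane mask enables. */
int active_lanes(uint32_t mask) { return __builtin_popcount (mask); }
```

With the buggy cast, a mask such as 0xFFFF0001 enables 1 lane instead of 17, because bits 31:16 are dropped.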
On Tue, Sep 3, 2024 at 9:45 AM Jiang, Haochen via Gcc-regression
wrote:
>
> As each AVX10.2 testcases previously, this is caused by option combination
> warning,
> which is expected.
>
Can we put the warning for mixed usage of -mavx10 and -mavx512f under -Wpsabi,
and add -Wno-psabi in addition to -ma
On Mon, Sep 2, 2024 at 4:42 PM Levy Hsu wrote:
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> Ok for trunk?
Ok.
>
> This patch supports sminmax for partial vectorized V2BF/V4BF.
>
> gcc/ChangeLog:
>
> * config/i386/mmx.md (3): New define_expand for
> V2BF/V4BFsmaxmin
>
>
On Mon, Sep 2, 2024 at 4:33 PM Levy Hsu wrote:
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> Ok for trunk?
>
> This patch introduces new mode iterators and expands for the i386
> architecture to support partial vectorization of bf16 operations using
> AVX10.2 instructions. Thes
On Mon, Aug 26, 2024 at 2:43 PM Haochen Jiang wrote:
>
> Hi all,
>
> I have just committed AVX10.2 new instructions patches into trunk hours
> ago. The next and final part for AVX10.2 upstream is to optimize code
> with AVX10.2 new instructions.
>
> In this patch series, it will contain the followi
On Fri, Aug 23, 2024 at 5:46 PM HAO CHEN GUI wrote:
>
> Hi Hongtao,
>
> On 2024/8/23 11:47, Hongtao Liu wrote:
> > On Fri, Aug 23, 2024 at 11:03 AM HAO CHEN GUI wrote:
> >>
> >> Hi Hongtao,
> >>
> >> On 2024/8/23 9:47, Hongtao Liu wrote:
On Mon, Aug 19, 2024 at 4:57 PM Haochen Jiang wrote:
>
> Hi all,
>
> The AVX10.2 ymm rounding patches has been merged to trunk around
> 6 hours ago. As mentioned before, next step will be AVX10.2 new
> instruction support.
>
> This patch series could be divided into three part.
>
> The first patch
On Fri, Aug 23, 2024 at 11:03 AM HAO CHEN GUI wrote:
>
> Hi Hongtao,
>
> On 2024/8/23 9:47, Hongtao Liu wrote:
> > On Thu, Aug 22, 2024 at 4:06 PM HAO CHEN GUI wrote:
> >>
> >> Hi Hongtao,
> >>
> >> On 2024/8/21 11:21, Hongtao Liu wrote:
> >>
On Thu, Aug 22, 2024 at 4:06 PM HAO CHEN GUI wrote:
>
> Hi Hongtao,
>
> On 2024/8/21 11:21, Hongtao Liu wrote:
> > r15-3058-gbb42c551905024 support const0 operand for movv16qi, please
> > rebase your patch and see if there's still the regressions.
>
> There
On Wed, Aug 21, 2024 at 4:49 PM Richard Biener
wrote:
>
> On Wed, Aug 21, 2024 at 7:40 AM liuhongt wrote:
> >
> > When none of mprefer-vector-width, avx256_optimal/avx128_optimal,
> > avx256_store_by_pieces/avx512_store_by_pieces is specified, GCC will
> > set ix86_{move_max,store_max} as max ava
On Tue, Aug 20, 2024 at 2:50 PM Hongtao Liu wrote:
>
> On Tue, Aug 20, 2024 at 2:12 PM HAO CHEN GUI wrote:
> >
> > Hi,
> > Add Hongtao Liu as the patch affects x86.
> >
> > On 2024/8/20 6:32, Richard Sandiford wrote:
> > > HAO CHEN GUI writes:
On Tue, Aug 20, 2024 at 6:25 PM liuhongt wrote:
>
> From [1]
[1] https://gcc.gnu.org/pipermail/gcc-patches/2024-August/660575.html
> > > It's not obvious to me why movv16qi requires a nonimmediate_operand
> > > source, especially since ix86_expand_vector_mode does have code to
> > > cope with con
On Tue, Aug 20, 2024 at 2:12 PM HAO CHEN GUI wrote:
>
> Hi,
> Add Hongtao Liu as the patch affects x86.
>
> On 2024/8/20 6:32, Richard Sandiford wrote:
> > HAO CHEN GUI writes:
> >> Hi,
> >> This patch adds const0 move checking for CLEAR_BY_PIECES.
On Wed, Aug 14, 2024 at 5:07 PM Haochen Jiang wrote:
>
> Hi all,
>
> The initial patch for AVX10.2 has been merged this week.
>
> For the upcoming patches, we will first upstream ymm rounding control part.
>
> In ymm rounding part, ALL the instructions in AVX512 with 512-bit rounding
> control wil
On Thu, Aug 15, 2024 at 3:27 PM liuhongt wrote:
>
> It results in 2 failures for x86_64-pc-linux-gnu{\
> -march=cascadelake};
>
> gcc: gcc.target/i386/extendditi3-1.c scan-assembler cqt?o
> gcc: gcc.target/i386/pr113560.c scan-assembler-times \tmulq 1
>
> For pr113560.c, now GCC generates mulx ins
On Wed, Aug 14, 2024 at 4:23 PM Kong, Lingling wrote:
>
>
>
> -----Original Message-----
> From: Kong, Lingling
> Sent: Wednesday, August 14, 2024 4:20 PM
> To: Kong, Lingling
> Subject: [PATCH v2] i386: Fix some vex insns that prohibit egpr
>
> Although these vex insn have evex counterpart, but
On Mon, Aug 12, 2024 at 3:12 PM kong lingling wrote:
>
> gcc/ChangeLog:
>
> PR target/113729
> * config/i386/i386.md (*ashlqi3_1_zext): New define_insn.
> (*ashlhi3_1_zext): Ditto.
> (*qi3_1_zext): Ditto.
>
On Mon, Aug 12, 2024 at 3:12 PM kong lingling wrote:
>
> gcc/ChangeLog:
>
> PR target/113729
> * config/i386/i386.md (*andqi_1_zext): New define_insn.
> (*andhi_1_zext): Ditto.
> (*qi_1_zext): Ditto.
>
On Mon, Aug 12, 2024 at 3:12 PM kong lingling wrote:
>
> gcc/ChangeLog:
>
> PR target/113729
> * config/i386/i386.md (*subqi_1_zext): New define_insn.
> (*subhi_1_zext): Ditto.
> (*addqi3_carry_zext): Ditto.
>
On Mon, Aug 12, 2024 at 3:10 PM kong lingling wrote:
>
> For APX instruction with an NDD, the destination GPR will get the
> instruction’s result in bits [OSIZE-1:0] and, if OSIZE < 64b, have its upper
> bits [63:OSIZE] zeroed. Now supporting other NDD instructions.
>
>
> Bootstrapped and regtes
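A scalar C model of the register-write rule quoted above, contrasting a legacy 8-bit ADD (which leaves bits 63:8 of the destination untouched) with the NDD form, which zeroes them. Under this rule an explicit zero extension of an 8- or 16-bit NDD result is redundant. The helper names are hypothetical.

```c
#include <stdint.h>

/* Legacy 8-bit ADD into a 64-bit GPR: only bits 7:0 change. */
uint64_t legacy_add8(uint64_t dst, uint64_t src)
{
  uint8_t lo = (uint8_t) (dst + src);
  return (dst & ~(uint64_t) 0xFF) | lo;
}

/* APX NDD 8-bit ADD: result in bits 7:0 of the new destination,
   bits 63:8 zeroed (OSIZE = 8). */
uint64_t ndd_add8(uint64_t src1, uint64_t src2)
{
  return (uint8_t) (src1 + src2);
}

/* The same rule for OSIZE = 16: bits 63:16 zeroed. */
uint64_t ndd_add16(uint64_t src1, uint64_t src2)
{
  return (uint16_t) (src1 + src2);
}
```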
On Mon, Aug 12, 2024 at 10:10 PM liuhongt wrote:
>
> > Are there any assumptions that BB_HEAD must be a note or label?
> > Maybe we should move ix86_align_loops into a separate pass and insert
> > the pass just before pass_final.
> The patch inserts .p2align after endbr pass, it can also fix the i
On Thu, Aug 1, 2024 at 3:50 PM Haochen Jiang wrote:
>
> Hi all,
>
> AVX10.2 tech details has been just published on July 31st in the
> following link:
>
> https://cdrdv2.intel.com/v1/dl/getContent/828965
>
> For new features and instructions, we could divide them into two parts.
> One is ymm round
On Mon, Aug 12, 2024 at 6:59 AM H.J. Lu wrote:
>
> On Thu, Aug 8, 2024 at 6:53 PM H.J. Lu wrote:
> >
> > When we emit .p2align to align BB_HEAD, we must update BB_HEAD. Otherwise
> > ENDBR will be inserted as the wrong place.
> >
> > gcc/
> >
> > PR target/116174
> > * config/i38
On Tue, Jul 30, 2024 at 11:04 AM liuhongt wrote:
>
> (insn 98 94 387 2 (parallel [
>             (set (reg:TI 337 [ _32 ])
>                 (ashift:TI (reg:TI 329)
>                     (reg:QI 521)))
>             (clobber (reg:CC 17 flags))
>         ]) "test.c":11:13 953 {ashlti3_doubleword}
>
On Thu, Aug 1, 2024 at 10:03 AM Kong, Lingling wrote:
>
>
>
> > -----Original Message-----
> > From: Liu, Hongtao
> > Sent: Thursday, August 1, 2024 9:35 AM
> > To: Kong, Lingling ; gcc-patches@gcc.gnu.org
> > Cc: Wang, Hongyu
> > Subject: RE: [PATCH] i386: Fix memory constraint for APX NF
> >
>
On Tue, Jul 30, 2024 at 1:05 PM Hongyu Wang wrote:
>
> On Fri, Jul 26, 2024 at 19:45, Richard Biener wrote:
> >
> > On Fri, Jul 26, 2024 at 10:50 AM Hongyu Wang wrote:
> > >
> > > Hi,
> > >
> > > When introducing munroll-only-small-loops, the option was marked as
> > > Target Save and added to -O2 default whic
On Wed, Jul 31, 2024 at 3:17 PM Uros Bizjak wrote:
>
> On Wed, Jul 31, 2024 at 9:11 AM Hongtao Liu wrote:
> >
> > On Wed, Jul 31, 2024 at 1:06 AM Uros Bizjak wrote:
> > >
> > > On Tue, Jul 30, 2024 at 3:00 PM Richard Biener wrote:
> > > >
>
On Wed, Jul 31, 2024 at 1:06 AM Uros Bizjak wrote:
>
> On Tue, Jul 30, 2024 at 3:00 PM Richard Biener wrote:
> >
> > On Tue, 30 Jul 2024, Alexander Monakov wrote:
> >
> > >
> > > On Tue, 30 Jul 2024, Richard Biener wrote:
> > >
> > > > > Oh, and please add a small comment why we don't use XFmode
On Wed, Jul 31, 2024 at 2:08 PM Kong, Lingling wrote:
>
> *add_4 and *adddi_4 are for shorter opcode from cmp to inc/dec or add
> $128.
>
> But NDD code is longer than the cmp code, so there is no need to support NDD.
>
>
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
>
> Ok for tr
On Tue, Jul 30, 2024 at 9:27 AM Hongtao Liu wrote:
>
> On Fri, Jul 26, 2024 at 4:55 PM Haochen Jiang wrote:
> >
> > Hi all,
> >
> > I added related O0 testcase in this patch.
> >
> > Ok for trunk and backport to GCC 14 and GCC 13?
> Ok.
I mean for tru
On Fri, Jul 26, 2024 at 4:55 PM Haochen Jiang wrote:
>
> Hi all,
>
> I added related O0 testcase in this patch.
>
> Ok for trunk and backport to GCC 14 and GCC 13?
Ok.
>
> Thx,
> Haochen
>
> ---
>
> Changes in v2: Add testcases.
>
> ---
>
> Under -O0, with the "newly" introduced intrins, the varia
On Thu, Jul 25, 2024 at 3:23 PM Hongtao Liu wrote:
>
> On Wed, Jul 24, 2024 at 3:57 PM liuhongt wrote:
> >
> > For below pattern, RA may still allocate r162 as v/k register, try to
> > reload for address with leaq __libc_tsd_CTYPE_B@gottpoff(%rip), %rsi
On Fri, Jul 26, 2024 at 2:59 PM liuhongt wrote:
>
> (insn 98 94 387 2 (parallel [
>             (set (reg:TI 337 [ _32 ])
>                 (ashift:TI (reg:TI 329)
>                     (reg:QI 521)))
>             (clobber (reg:CC 17 flags))
>         ]) "test.c":11:13 953 {ashlti3_doubleword}
>
On Fri, Jul 26, 2024 at 2:28 PM Jiang, Haochen wrote:
>
> Ping for this patch
>
> Thx,
> Haochen
>
> > -----Original Message-----
> > From: Haochen Jiang
> > Sent: Thursday, July 18, 2024 9:45 AM
> > To: gcc-patches@gcc.gnu.org
> > Cc: Liu, Hongtao ; hjl.to...@gmail.com;
> > ubiz...@gmail.com
> >
On Wed, Jul 24, 2024 at 3:57 PM liuhongt wrote:
>
> For below pattern, RA may still allocate r162 as v/k register, try to
> reload for address with leaq __libc_tsd_CTYPE_B@gottpoff(%rip), %rsi
> which results in a linker error.
>
> (set (reg:DI 162)
>         (mem/u/c:DI
>             (const:DI (unspec:DI
>
On Wed, Jul 24, 2024 at 3:11 PM Kong, Lingling wrote:
>
> Tested spec2017 performance in Sierra Forest, Icelake, CascadeLake, at least
> there is no obvious regression.
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
>
> OK for trunk?
Ok.
>
> gcc/ChangeLog:
>
> * config/i386
On Thu, Jul 18, 2024 at 5:29 PM Kong, Lingling wrote:
>
> I adjusted my patch based on the comments by H.J.
> And I will add the testcase like gcc.target/i386/pr101395-1.c when the march
> for APX is determined.
>
> Ok for trunk?
Synced with LLVM folks, they agreed to this solution.
Ok.
>
> Than
On Mon, Jul 15, 2024 at 7:24 PM Paul-Antoine Arras wrote:
>
> This trivially fixes an incorrectly encoded character in the DejaGnu
> scan pattern.
>
> OK for trunk?
Ok.
> --
> PA
--
BR,
Hongtao
On Mon, Jul 15, 2024 at 1:39 PM Hu, Lin1 wrote:
>
> Hi, all
>
> Based on actual usage, trunc{128}2{16,32,64} use some instructions from
> sse/sse3, so extend their scope to extend the scope of optimization.
>
> Bootstrapped and regtested on x86-64-linux-gnu, OK for trunk?
Ok.
>
> BRs,
> Lin
>
> gcc/C
On Thu, Jul 11, 2024 at 9:07 PM Alexandre Oliva wrote:
>
> On Jul 4, 2024, Alexandre Oliva wrote:
>
> > On Jul 3, 2024, Rainer Orth wrote:
>
> > Hmm, I wonder if leaf frame pointer has to do with that.
>
> It did, in a way.
>
>
>
> The first two patches for PR113719 have each regressed
>
On Wed, Jul 10, 2024 at 2:46 PM Hongyu Wang wrote:
>
> Hi,
>
> For APX ccmp, current infrastructure will always generate cstore for
> the ccmp flag user, like
>
> cmpe%rcx, %r8
> ccmpnel %rax, %rbx
> seta%dil
> add %rcx, %r9
> add %r9, %rdx
>
On Mon, Jul 15, 2024 at 10:21 AM Hongyu Wang wrote:
>
> > Could you just git revert 6d0b7b69d143025f271d0041cfa29cf26e6c343b?
>
> We can still deal with BFmode permutation the same way as HFmode, so
> the change in ix86_vectorize_vec_perm_const can be preserved.
>
> Hongt
On Sat, Jul 13, 2024 at 3:44 PM Hongyu Wang wrote:
>
> Hi,
>
> According to the instruction spec of AVX512BF16, the convert from float
> to BF16 is not a simple truncation. It has special handling for
> denormal/nan, even for normal float it will add an extra bias according
> to the least signific
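A scalar C sketch of a float-to-bf16 conversion with the special cases described above, modeled on the pseudocode Intel publishes for VCVTNEPS2BF16: NaNs are quieted, zero and denormal inputs become a signed zero, and other values are rounded to nearest-even by adding a bias of 0x7FFF plus the lowest kept bit. Treat the exact special-case handling as an assumption to check against the ISA reference; `float_to_bf16_rne` is a hypothetical name.

```c
#include <stdint.h>
#include <string.h>

uint16_t float_to_bf16_rne(float f)
{
  uint32_t x;
  memcpy(&x, &f, sizeof x);             /* reinterpret the float's bits */

  uint32_t exp = (x >> 23) & 0xFF;
  uint32_t man = x & 0x7FFFFF;

  if (exp == 0)                         /* zero or denormal: signed zero */
    return (uint16_t) ((x >> 16) & 0x8000);
  if (exp == 0xFF && man != 0)          /* NaN: keep top bits, set quiet bit */
    return (uint16_t) ((x >> 16) | 0x0040);

  /* Round to nearest, ties to even: bias is 0x7FFF plus bit 16 (the
     lowest bit that survives).  Large finite values round to infinity. */
  uint32_t lsb = (x >> 16) & 1;
  return (uint16_t) ((x + 0x7FFF + lsb) >> 16);
}
```

By contrast, a plain truncation (`x >> 16`) would map 0x3F80FFFF to 0x3F80 instead of 0x3F81, and would turn the signaling NaN 0x7F800001 into 0x7F80, which is infinity in bf16.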
strap
> and make -k check, both with and without --target_board=unix{-m32}
> with no new failures. Ok for mainline?
Ok.
>
>
> 2024-07-11 Roger Sayle
> Hongtao Liu
>
> gcc/ChangeLog
> * config/i386/i386-expand.cc (ix86_broadcast_from_con
On Wed, Jul 10, 2024 at 10:10 PM Victor Do Nascimento
wrote:
>
> Following the migration of the dot_prod optab from a direct to a
> conversion-type optab, ensure all back-end patterns incorporate the
> second machine mode into pattern names.
The patch LGTM. BTW you can use existing instead of
new
On Thu, Jul 4, 2024 at 11:24 AM Levy Hsu wrote:
>
> This patch extends support for BF16 vector operations in GCC, including
> bitwise AND, ANDNOT, ABS, NEG, COPYSIGN, and XORSIGN for V8BF, V16BF, and
> V32BF modes.
> Bootstrapped and tested on x86_64-linux-gnu. ok for trunk?
>
> gcc/ChangeLog:
>
On Thu, Jul 4, 2024 at 9:30 AM liuhongt wrote:
>
> From: "H.J. Lu"
>
> >The above reads like it would be worth splitting branch_prediction_hints
> >into branch_prediction_hints_taken and branch_prediction_hints_not_taken
> >given not-taken is the default and thus will just increase code size?
> >Ac
On Sun, Jul 7, 2024 at 5:00 PM Roger Sayle wrote:
>
>
> Hi Hongtao,
> This should address concerns about the remaining use of force_reg.
>
@@ -25793,15 +25792,20 @@ ix86_expand_ternlog_binop (enum rtx_code code, machine_mode mode,
   if (GET_MODE (op1) != mode)
     op1 = gen_lowpart (mod
On Fri, Jul 5, 2024 at 8:06 AM Hongtao Liu wrote:
>
> On Fri, Jul 5, 2024 at 2:54 AM Roger Sayle wrote:
> >
> >
> > This patch fixes a problem with splitting of complex AVX512 ternlog
> > instructions on x86_64. A recent change allows the ternlog pattern
> >
On Fri, Jul 5, 2024 at 2:54 AM Roger Sayle wrote:
>
>
> This patch fixes a problem with splitting of complex AVX512 ternlog
> instructions on x86_64. A recent change allows the ternlog pattern
> to have multiple mem-like operands prior to reload, by emitting any
> "reloads" as necessary during sp
On Tue, Jul 2, 2024 at 11:24 AM Hongyu Wang wrote:
>
> Hi,
>
> According to APX spec, the pushp/popp pairs should be matched,
> otherwise the PPX hint cannot take effect and cause performance loss.
>
> In the ix86_expand_epilogue, there are several optimizations that may
> cause the epilogue using
On Thu, Jul 4, 2024 at 9:41 AM H.J. Lu wrote:
>
>
> On Thu, Jul 4, 2024, 9:12 AM Hongtao Liu wrote:
>>
>> On Thu, Jul 4, 2024 at 6:17 AM H.J. Lu wrote:
>> >
>> >
>> > On Wed, Jul 3, 2024, 9:37 PM Richard Biener
>> > wrote:
On Thu, Jul 4, 2024 at 6:17 AM H.J. Lu wrote:
>
>
> On Wed, Jul 3, 2024, 9:37 PM Richard Biener
> wrote:
>>
>> On Wed, Jul 3, 2024 at 9:25 AM liuhongt wrote:
>> >
>> > The patch can avoid SIGILL on non-AVX512 machine due to kmovd is
>> > generated in dynamic check.
>> >
>> > Committed as an obv
On Wed, Jul 3, 2024 at 2:10 AM Andi Kleen wrote:
>
> liuhongt writes:
>
> > From: "H.J. Lu"
> >
> > According to Intel® 64 and IA-32 Architectures Optimization Reference
> > Manual[1], Branch Hint is updated for Redwood Cove.
> >
> > cut from [1]-
> > Starting wit
On Mon, Jul 1, 2024 at 4:51 PM kong lingling wrote:
>
> Add some missing APX NF and NDD support for imul and mul.
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
>
> Ok for trunk?
Ok.
>
>
> gcc/ChangeLog:
>
> * config/i386/i386.md (*imulhizu): Added APX
> NF support.
> >
> > gcc/testsuite/ChangeLog
> > * gcc.target/i386/pr100711-6.c: Update to check for decimal
> > immediate operand in ternlog, not hexadecimal.
> I got an ICE when bootstrapped with --enable-checking=yes,rtl,extra
>
The ICE can be worked around with 2 separate define_predicates,
On Mon, Jul 1, 2024 at 6:14 AM Roger Sayle wrote:
>
>
> As promised here's the final ternlog clean-up, that deletes the now
> obsolete legacy patterns and mode iterators from sse.md. It also updates
> the surviving ternlog patterns to consistently use decimal immediate
> operands (instead of hexa
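For readers comparing decimal and hexadecimal ternlog immediates: the imm8 is an 8-entry truth table indexed by one bit from each of the three operands. Evaluating the desired Boolean function on the constants 0xF0, 0xCC and 0xAA yields the immediate; for example (A & B) | C gives 0xEA, printed in decimal as 234. A bit-by-bit C model of one 32-bit lane, with hypothetical helper names:

```c
#include <stdint.h>

/* Truth-table "columns" for operands A, B and C.  Applying a Boolean
   function to these constants yields the ternlog imm8 for it. */
enum { TERN_A = 0xF0, TERN_B = 0xCC, TERN_C = 0xAA };

/* Model of VPTERNLOGD on one 32-bit lane: for each bit position the
   index (a << 2) | (b << 1) | c selects one bit of imm. */
uint32_t ternlog32(uint32_t a, uint32_t b, uint32_t c, uint8_t imm)
{
  uint32_t r = 0;
  for (int i = 0; i < 32; ++i)
    {
      unsigned idx = (((a >> i) & 1u) << 2)
                     | (((b >> i) & 1u) << 1)
                     | ((c >> i) & 1u);
      r |= (uint32_t) ((imm >> idx) & 1u) << i;
    }
  return r;
}
```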
On Sun, Jun 30, 2024 at 7:29 PM Roger Sayle wrote:
>
>
> This patch fixes the 4 FAILs of gcc.target/i386/pr192464-vrndscaleph.c
> with --target_board='unix{-m32}' on RedHat 7.x. The issue is that this
> AVX512 test includes the system math.h, and on older systems this provides
> inline versions o
On Thu, Jun 27, 2024 at 4:29 PM Roger Sayle wrote:
>
>
> This patch is another round of refinements to fine tune the new ternlog
> infrastructure in i386's sse.md. This patch tweaks ix86_ternlog_idx
> to allow multiple MEM/CONST_VECTOR/VEC_DUPLICATE operands prior to
> splitting (before reload),
On Thu, Jun 27, 2024 at 9:23 AM Hu, Lin1 wrote:
>
> Hi, all
>
> This patch aims to refactor vcvttps2qq/vcvtqq2ps patterns to remove redundant
> round_*_modev8sf_condition.
>
> Bootstrapped and regtested on x86-64-linux-gnu, OK for trunk?
Ok.
>
> BRs,
> Lin
>
> gcc/ChangeLog:
>
> * config/
On Thu, Jun 13, 2024 at 3:32 PM Alexandre Oliva wrote:
>
>
> The first patch for PR113719 regressed gcc.dg/ipa/iinline-attr.c on
> toolchains configured to --enable-frame-pointer, because the
> optimization node created within handle_optimize_attribute had
> flag_omit_frame_pointer incorrectly set
On Wed, Jun 26, 2024 at 4:02 PM Richard Biener
wrote:
>
> On Wed, Jun 26, 2024 at 9:14 AM Hongtao Liu wrote:
> >
> > On Wed, Jun 26, 2024 at 2:52 PM Richard Biener
> > wrote:
> > >
> > > On Wed, Jun 26, 2024 at 8:09 AM liuhongt wrote:
> > >
On Wed, Jun 26, 2024 at 2:52 PM Richard Biener
wrote:
>
> On Wed, Jun 26, 2024 at 8:09 AM liuhongt wrote:
> >
> > 416.gamess regressed 4-6% on x86_64 since my r15-882-g1d6199e5f8c1c0.
> > The commit adjust rtx_cost of mem to reduce cost of (add op0 disp).
> > But Cost of ADDR could be cheaper tha
On Sat, Jun 22, 2024 at 5:49 AM Collin Funk wrote:
>
> Hi Hongtao,
>
> I submitted a patch silencing -Wshift-overflow on a signed int
> constant here:
>
> https://gcc.gnu.org/pipermail/gcc-patches/2024-June/654016.html
>
> You OK'd it here:
>
> https://gcc.gnu.org/pipermail/gcc-patches/202
On Wed, Oct 25, 2023 at 2:49 AM Richard Sandiford
wrote:
>
> This patch adds a combine pass that runs late in the pipeline.
> There are two instances: one between combine and split1, and one
> after postreload.
>
> The pass currently has a single objective: remove definitions by
> substituting int
On Wed, Jun 19, 2024 at 5:04 AM Roger Sayle wrote:
>
>
> This patch tweaks ix86_ternlog_idx to allow any SUBREG that matches
> the register_operand predicate, and is split out as an independent
> piece of a patch that I have to clean-up redundant ternlog patterns
> in sse.md. It turns out that so
On Fri, Jun 14, 2024 at 9:35 AM Levy Hsu wrote:
>
> This patch updates the GCC x86 backend to efficiently handle
> odd, incrementally increasing permutations of BF16 vectors
> using the cvtne2ps2bf16 instruction.
> It modifies ix86_vectorize_vec_perm_const to support these operations
> and adds a
On Thu, Jun 13, 2024 at 3:13 PM Hu, Lin1 wrote:
>
> Hi, all
>
> This patch aims to refine all cvtt* instructions with UNSPEC instead of
> FIX/UNSIGNED_FIX. Because the intrinsics should behave as documented.
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk?
Ok.
>
> BRs,
> Lin
>
On Fri, Jun 14, 2024 at 10:53 PM Hongtao Liu wrote:
>
> On Fri, Jun 14, 2024 at 6:31 PM Richard Biener wrote:
> >
> > The following retires vcond{,u,eq} optabs by stopping to use them
> > from the middle-end. Targets instead (should) implement vcond_mask
> > and
On Sat, Jun 15, 2024 at 1:22 AM Jeff Law wrote:
>
>
>
> On 6/14/24 11:10 AM, Alexander Monakov wrote:
> >
> > On Fri, 14 Jun 2024, Kong, Lingling wrote:
> >
> >> APX CFCMOV[1] feature implements conditionally faulting which means that
> >> all memory faults are suppressed
> >> when the condition
On Fri, Jun 14, 2024 at 6:31 PM Richard Biener wrote:
>
> The following retires vcond{,u,eq} optabs by stopping to use them
> from the middle-end. Targets instead (should) implement vcond_mask
> and vec_cmp{,u,eq} optabs. The PR this change refers to lists
> possibly affected targets - those imp
On Thu, May 30, 2024 at 1:52 PM Hu, Lin1 wrote:
>
> Hi, all
>
> This patch aims to extend __builtin_ia32_cmp[p|s][s|d] from avx to
> sse/sse2/avx, where its immediate is in range of [0, 7].
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk?
Ok.
>
> BRs,
> Lin
>
> gcc/ChangeLog:
>
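Immediates 0 to 7 are exactly the predicates the legacy SSE/SSE2 CMPPS/CMPPD/CMPSS/CMPSD encodings accept; the VEX-encoded AVX forms extend the range to 32. A scalar C model of those eight predicates (hypothetical helper, ignoring the signaling vs. quiet exception distinction):

```c
#include <math.h>
#include <stdbool.h>

/* Scalar model of the legacy SSE compare predicates 0..7.
   "Unordered" means at least one operand is NaN. */
bool sse_cmp_pred(float a, float b, int imm)
{
  switch (imm)
    {
    case 0: return a == b;                  /* EQ    */
    case 1: return a < b;                   /* LT    */
    case 2: return a <= b;                  /* LE    */
    case 3: return isnan(a) || isnan(b);    /* UNORD */
    case 4: return !(a == b);               /* NEQ: true if unordered */
    case 5: return !(a < b);                /* NLT: true if unordered */
    case 6: return !(a <= b);               /* NLE: true if unordered */
    case 7: return !isnan(a) && !isnan(b);  /* ORD   */
    default: return false;                  /* needs the AVX encoding */
    }
}
```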
On Thu, Jun 6, 2024 at 4:49 PM Kong, Lingling wrote:
>
> Enable ZU for IMUL (opcodes 0x69 and 0x6B) and SETcc.
>
> gcc/ChangeLog:
>
> * config/i386/i386-opts.h (enum apx_features):Add apx_zu.
> * config/i386/i386.h (TARGET_APX_ZU): Define.
> * config/i386/i386.md (*imulhizu
On Thu, Jun 13, 2024 at 4:20 AM Roger Sayle wrote:
>
>
> This patch makes more use of m32bcst and m64bcst addressing modes in
> ix86_expand_ternlog. Previously, the i386 backend would only consider
> using a m32bcst if the inner mode of the vector was 32-bits, or using
> m64bcst if the inner mode
On Mon, Jun 10, 2024 at 2:37 PM Collin Funk wrote:
>
> A shift of 31 on a signed int is undefined behavior. Since unsigned
> int is 32-bits wide this change fixes it and silences the warning.
Ok.
>
> gcc/ChangeLog:
>
> PR target/115409
> * config/i386/avx512fp16intrin.h (_mm512_co
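A minimal illustration of the issue with hypothetical names (the actual change in avx512fp16intrin.h is truncated above): with a 32-bit int, `1 << 31` does not fit in a signed int, which is undefined behavior in C and is what GCC's -Wshift-overflow=2 diagnoses, while `1U << 31` is well defined.

```c
#include <stdint.h>

/* (1 << 31) would overflow a 32-bit signed int: undefined behavior.
   An unsigned literal makes the shift well defined. */
#define SIGN_BIT_32 (1U << 31)

/* Example use: clear the sign bit of a float's raw bit pattern. */
uint32_t clear_sign_bit(uint32_t bits) { return bits & ~SIGN_BIT_32; }
```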
On Mon, Jun 10, 2024 at 3:20 PM Roger Sayle wrote:
>
>
> This patch fixes PR target/115397, a recent regression caused by my
> ternlog patch that results in an ICE (building numpy) with -m32 -fPIC.
> The problem is that ix86_broadcast_from_constant, which calls
> get_pool_constant, doesn't handle
vpternlogd[ \\t] 694
>
>
> 2024-06-06 Roger Sayle
> Hongtao Liu
>
> gcc/ChangeLog
> * config/i386/i386-expand.cc (ix86_expand_args_builtin): Call
> fixup_modeless_constant before testing predicates. Only call
> copy_to_mode_reg on memory
On Thu, Jun 6, 2024 at 2:39 PM Hongyu Wang wrote:
>
> Current target apxf check does not specify sub-features that assembler
> supports, so the check with older binutils will fail at assemble stage
> for new apx features like NF,CCMP or CFCMOV. Adjust the assembler check
> for latest apx subfeatur
On Wed, Jun 5, 2024 at 10:44 PM Jeff Law wrote:
>
>
>
> On 6/4/24 10:22 PM, liuhongt wrote:
> >> Can you add a testcase for this? I don't mind if it's x86 specific and
> >> does a bit of asm scanning.
> >>
> >> Also note that the context for this patch has changed, so it won't
> >> automatically
On Wed, May 29, 2024 at 11:05 AM Haochen Jiang wrote:
>
> Hi all,
>
> Since AVX10 is the first major ISA introduced after AVX-512, we propose
> to add target_clones support for it.
>
> Although AVX10.1-256 won't cover 512-bit part of AVX512F, but since
> it is only for priority but not for implica