[gcc r15-2430] i386: Mark target option with optimization when enabled with opt level [PR116065]

2024-07-31 Thread Hongyu Wang via Gcc-cvs
https://gcc.gnu.org/g:a59c4e496fa916cb9a484a649aa1b4cebd6550f2 commit r15-2430-ga59c4e496fa916cb9a484a649aa1b4cebd6550f2 Author: Hongyu Wang Date: Fri Jul 26 08:27:01 2024 +0800 i386: Mark target option with optimization when enabled with opt level [PR116065] When introducing

Re: [PATCH] i386: Mark target option with optimization when enabled with opt level [PR116065]

2024-07-29 Thread Hongyu Wang
Richard Biener wrote on Fri, Jul 26, 2024 at 19:45: > > On Fri, Jul 26, 2024 at 10:50 AM Hongyu Wang wrote: > > > > Hi, > > > > When introducing munroll-only-small-loops, the option was marked as > > Target Save and added to -O2 default which makes attribute(optimize)

[PATCH] i386: Mark target option with optimization when enabled with opt level [PR116065]

2024-07-26 Thread Hongyu Wang
Hi, When introducing munroll-only-small-loops, the option was marked as Target Save and added to the -O2 defaults, which makes attribute(optimize) reset the target option and causes an error when the command line has O1 and the function attribute has O2 and other target options. Mark this option as Optimization to fix this.

[gcc r15-2037] [APX NF] Add a pass to convert legacy insn to NF insns

2024-07-15 Thread Hongyu Wang via Gcc-cvs
https://gcc.gnu.org/g:681ff5ccca153864eb86099eed201838d8d98bc2 commit r15-2037-g681ff5ccca153864eb86099eed201838d8d98bc2 Author: Hongyu Wang Date: Thu Apr 18 16:53:26 2024 +0800 [APX NF] Add a pass to convert legacy insn to NF insns For APX ccmp, current infrastructure

[gcc r15-2030] AVX512BF16: Do not allow permutation with vcvtne2ps2bf16 [PR115889]

2024-07-14 Thread Hongyu Wang via Gcc-cvs
https://gcc.gnu.org/g:02a3bf5e2f0c18078bf67fc0002219edba1d76ff commit r15-2030-g02a3bf5e2f0c18078bf67fc0002219edba1d76ff Author: Hongyu Wang Date: Sat Jul 13 11:45:31 2024 +0800 AVX512BF16: Do not allow permutation with vcvtne2ps2bf16 [PR115889] According to the instruction spec

Re: [PATCH] AVX512BF16: Do not allow permutation with vcvtne2ps2bf16 [PR115889]

2024-07-14 Thread Hongyu Wang
> Could you just git revert 6d0b7b69d143025f271d0041cfa29cf26e6c343b? We can still deal with BFmode permutation the same way as HFmode, so the change in ix86_vectorize_vec_perm_const can be preserved. Hongtao Liu wrote on Mon, Jul 15, 2024 at 09:40: > > On Sat, Jul 13, 2024 at 3:44 PM Hongyu Wa

[PATCH] AVX512BF16: Do not allow permutation with vcvtne2ps2bf16 [PR115889]

2024-07-13 Thread Hongyu Wang
Hi, According to the instruction spec of AVX512BF16, the conversion from float to BF16 is not a simple truncation. It has special handling for denormal/NaN; even for a normal float it adds an extra bias according to the least significant bit of the bf number. This means we cannot use the
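To illustrate why a plain truncation and a rounding conversion can disagree, here is a minimal C sketch. It assumes the commonly used software round-to-nearest-even formula (add 0x7FFF plus the least significant kept bit, then drop the low 16 bits). The helper names are hypothetical, and the sketch does not model the denormal or NaN handling mentioned above, so it should not be read as a bit-exact model of vcvtne2ps2bf16.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret a float as its 32-bit IEEE-754 bit pattern. */
static uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Hypothetical helper: keep the upper 16 bits and discard the rest. */
static uint16_t bf16_truncate(float f)
{
    return (uint16_t)(float_bits(f) >> 16);
}

/* Hypothetical helper: round to nearest, ties to even.
   Assumption: the input is a normal, finite float; NaN and
   denormal inputs are not handled here. */
static uint16_t bf16_round_nearest_even(float f)
{
    uint32_t u = float_bits(f);
    uint32_t lsb = (u >> 16) & 1u;  /* lowest bit that survives */
    u += 0x7FFFu + lsb;             /* rounding bias */
    return (uint16_t)(u >> 16);
}
```

For an input such as 1.01171875f (bit pattern 0x3F818000) the two helpers return different results, which is the kind of mismatch that makes a truncating permutation an unsafe substitute for the conversion.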

[PATCH] [APX NF] Add a pass to convert legacy insn to NF insns

2024-07-10 Thread Hongyu Wang
Hi, For APX ccmp, current infrastructure will always generate cstore for the ccmp flag user, like cmpe%rcx, %r8 ccmpnel %rax, %rbx seta%dil add %rcx, %r9 add %r9, %rdx testb %dil, %dil je .L2 For such case, the

[gcc r15-1833] [APX PPX] Avoid generating unmatched pushp/popp in pro/epilogue

2024-07-03 Thread Hongyu Wang via Gcc-cvs
https://gcc.gnu.org/g:8e72b1bb3896f6e8d4f4679cbcfbc2a8212d04f9 commit r15-1833-g8e72b1bb3896f6e8d4f4679cbcfbc2a8212d04f9 Author: Hongyu Wang Date: Wed Feb 7 14:42:58 2024 +0800 [APX PPX] Avoid generating unmatched pushp/popp in pro/epilogue According to APX spec, the pushp/popp

Re: [PATCH] [APX PPX] Avoid generating unmatched pushp/popp in pro/epilogue

2024-07-02 Thread Hongyu Wang
apx spec, the mismatched pushp/popp pair does confuse the fast-forwarding logic and turns off the PPX optimization. We just need to make sure every pushp for a certain reg has a corresponding popp for that reg. Richard Biener wrote on Tue, Jul 2, 2024 at 16:18: > > On Tue, Jul 2, 2024 at 5:24 AM Hongyu Wang wr

[PATCH] [APX PPX] Avoid generating unmatched pushp/popp in pro/epilogue

2024-07-01 Thread Hongyu Wang
Hi, According to APX spec, the pushp/popp pairs should be matched, otherwise the PPX hint cannot take effect and cause performance loss. In the ix86_expand_epilogue, there are several optimizations that may cause the epilogue using mov to restore the regs. Check if PPX applied and prevent usage

[gcc r15-1469] i386: Fix some ISA bit test in option_override

2024-06-20 Thread Hongyu Wang via Gcc-cvs
https://gcc.gnu.org/g:4867cc815531ede8bc356a2507f1c35ee6e6399c commit r15-1469-g4867cc815531ede8bc356a2507f1c35ee6e6399c Author: Hongyu Wang Date: Mon Jun 17 10:34:01 2024 +0800 i386: Fix some ISA bit test in option_override Adjust several new feature check

[PATCH] i386: Fix some ISA bit test in option_override

2024-06-19 Thread Hongyu Wang
Hi, This patch adjusts several new feature checks in ix86_option_override_internal that directly use TARGET_* instead of TARGET_*_P (opts->ix86_isa_flags), which caused the command-line option to override the target_attribute ISA flag. Bootstrapped && regtested on x86_64-pc-linux-gnu. Ok for trunk?

[gcc r15-1293] [APX CCMP] Add targetm.have_ccmp hook [PR115370]

2024-06-13 Thread Hongyu Wang via Gcc-cvs
https://gcc.gnu.org/g:83a765768510d1f329887116757d6818d7846717 commit r15-1293-g83a765768510d1f329887116757d6818d7846717 Author: Hongyu Wang Date: Thu Jun 13 00:18:32 2024 +0800 [APX CCMP] Add targetm.have_ccmp hook [PR115370] In cfgexpand, there is an optimization for branch

Re: [PATCH] Add targetm.have_ccmp hook [PR115370]

2024-06-13 Thread Hongyu Wang
Thanks, this is the patch I'm going to check-in. Richard Sandiford wrote on Thu, Jun 13, 2024 at 17:04: > > Hongyu Wang writes: > > Hi, > > > > In cfgexpand, there is an optimization for branch which tests > > targetm.gen_ccmp_first == NULL. However for target like x86-64

Re: [PATCH] [i386] restore recompute to override opts after change [PR113719]

2024-06-13 Thread Hongyu Wang
Sorry for breaking the original logic, and I really appreciate your patch! It does make the logic clearer on top of opts and opts_set. I think the function name can be like ix86_unroll_flag_adjust instead of ix86_override_options_after_change_1, like the previous 2 functions which declares

[gcc r15-1242] [APX CCMP] Use ctestcc when comparing to const 0

2024-06-13 Thread Hongyu Wang via Gcc-cvs
https://gcc.gnu.org/g:99e6cf404e37655be303e71f20df03c284c7989e commit r15-1242-g99e6cf404e37655be303e71f20df03c284c7989e Author: Hongyu Wang Date: Thu May 9 10:12:16 2024 +0800 [APX CCMP] Use ctestcc when comparing to const 0 For CTEST, we don't have conditional AND so there's

Re: [PATCH] [APX CCMP] Use ctestcc when comparing to const 0

2024-06-13 Thread Hongyu Wang
> Perhaps the constraint can be slightly optimized to avoid repeating > (,) pairs. > > ",m," > "C ,," Yes, will check-in with this change. Thanks! Uros Bizjak wrote on Thu, Jun 13, 2024 at 14:06: > > On Thu, Jun 13, 2024 at 3:44 AM Hongyu Wang wrote: >

[PATCH] Add targetm.have_ccmp hook [PR115370]

2024-06-12 Thread Hongyu Wang
Hi, In cfgexpand, there is an optimization for branch which tests targetm.gen_ccmp_first == NULL. However for targets like x86-64, the hook was implemented but it does not indicate that ccmp was enabled. Add a new target hook TARGET_HAVE_CCMP and replace the middle-end check for the existence of

Re: [PATCH] [APX CCMP] Use ctestcc when comparing to const 0

2024-06-12 Thread Hongyu Wang
Thanks for the advice, updated patch in attachment. Bootstrapped/regtested on x86-64-pc-linux-gnu. Ok for trunk? Uros Bizjak wrote on Wed, Jun 12, 2024 at 18:12: > > On Wed, Jun 12, 2024 at 12:00 PM Uros Bizjak wrote: > > > > On Wed, Jun 12, 2024 at 5:12 AM Hongyu Wang wro

[PATCH] [APX CCMP] Use ctestcc when comparing to const 0

2024-06-11 Thread Hongyu Wang
Hi, For CTEST, we don't have conditional AND so there's no optimization opportunity to write a new ctest pattern. Emit ctest when ccmp did comparison to const 0 to save bytes. Bootstrapped & regtested under x86-64-pc-linux-gnu. Ok for trunk? gcc/ChangeLog: * config/i386/i386.md

[gcc r15-1060] [APX CCMP] Support ccmp for float compare

2024-06-06 Thread Hongyu Wang via Gcc-cvs
https://gcc.gnu.org/g:0b6cea8783b9e1b86c5c7c277c301cb5931bc5e0 commit r15-1060-g0b6cea8783b9e1b86c5c7c277c301cb5931bc5e0 Author: Hongyu Wang Date: Wed May 8 11:08:42 2024 +0800 [APX CCMP] Support ccmp for float compare The ccmp insn itself doesn't support fp compare, but x86 has

[gcc r15-1059] [APX CCMP] Adjust startegy for selecting ccmp candidates

2024-06-06 Thread Hongyu Wang via Gcc-cvs
https://gcc.gnu.org/g:23db87301b623ecf162c9df718ce82ed9aa354a8 commit r15-1059-g23db87301b623ecf162c9df718ce82ed9aa354a8 Author: Hongyu Wang Date: Tue Apr 9 16:05:26 2024 +0800 [APX CCMP] Adjust startegy for selecting ccmp candidates For general ccmp scenario, the tree sequence

[gcc r15-1058] [APX CCMP] Support APX CCMP

2024-06-06 Thread Hongyu Wang via Gcc-cvs
https://gcc.gnu.org/g:c989e59fc99d994159114304d4e715c72bedff0a commit r15-1058-gc989e59fc99d994159114304d4e715c72bedff0a Author: Hongyu Wang Date: Wed Mar 27 10:13:06 2024 +0800 [APX CCMP] Support APX CCMP APX CCMP feature implements conditional compare which executes compare

[gcc r15-1057] [APX] Adjust target-support check [PR 115341]

2024-06-06 Thread Hongyu Wang via Gcc-cvs
https://gcc.gnu.org/g:f46d54a2a76acb94356989fb187853e5b58c3098 commit r15-1057-gf46d54a2a76acb94356989fb187853e5b58c3098 Author: Hongyu Wang Date: Thu Jun 6 13:00:26 2024 +0800 [APX] Adjust target-support check [PR 115341] Current target apxf check does not specify sub-features

Re: [PATCH 2/3] [APX CCMP] Adjust startegy for selecting ccmp candidates

2024-06-06 Thread Hongyu Wang
st. The costs are not + meaningful for failed expansions. */ + + if (ret2 && (!ret || cost2 < cost1)) { *prep_seq = prep_seq_2; *gen_seq = gen_seq_2; -- 2.31.1 Richard Sandiford wrote on Wed, Jun 5, 2024 at 17:21: > > Hongyu Wang writes: > > CC'd Richard for ccmp

[PATCH] [APX] Adjust target-support check [PR 115341]

2024-06-06 Thread Hongyu Wang
Current target apxf check does not specify sub-features that assembler supports, so the check with older binutils will fail at assemble stage for new apx features like NF,CCMP or CFCMOV. Adjust the assembler check for latest apx subfeatures. Bootstrapped & regtested on x86-64-pc-linux-gnu with

[gcc r13-8811] i386: Fix ix86_option override after change [PR 113719]

2024-05-30 Thread Hongyu Wang via Gcc-cvs
https://gcc.gnu.org/g:173f8763a66622f2a70ad66f60573fcff7d6b49e commit r13-8811-g173f8763a66622f2a70ad66f60573fcff7d6b49e Author: Hongyu Wang Date: Wed May 15 11:24:34 2024 +0800 i386: Fix ix86_option override after change [PR 113719] In ix86_override_options_after_change, calls

[gcc r14-10262] i386: Fix ix86_option override after change [PR 113719]

2024-05-30 Thread Hongyu Wang via Gcc-cvs
https://gcc.gnu.org/g:cd161b335c2723d0dce1cab00ad216b423ec2767 commit r14-10262-gcd161b335c2723d0dce1cab00ad216b423ec2767 Author: Hongyu Wang Date: Wed May 15 11:24:34 2024 +0800 i386: Fix ix86_option override after change [PR 113719] In ix86_override_options_after_change, calls

Re: [PATCH 2/3] [APX CCMP] Adjust startegy for selecting ccmp candidates

2024-05-29 Thread Hongyu Wang
Gently ping :) Hi Richard, Is it OK to adopt the ccmp change? Or do you know who can help to review this part? Thanks. Hongyu Wang wrote on Thu, May 23, 2024 at 16:27: > > Gently ping for this :) > Hi Richard, Is it OK to adopt the ccmp change? Or do you know who can > help to review this pa

[gcc r15-893] i386: Fix ix86_option override after change [PR 113719]

2024-05-29 Thread Hongyu Wang via Gcc-cvs
https://gcc.gnu.org/g:499d00127d39ba894b0f7216d73660b380bdc325 commit r15-893-g499d00127d39ba894b0f7216d73660b380bdc325 Author: Hongyu Wang Date: Wed May 15 11:24:34 2024 +0800 i386: Fix ix86_option override after change [PR 113719] In ix86_override_options_after_change, calls

Re: [PATCH 2/3] [APX CCMP] Adjust startegy for selecting ccmp candidates

2024-05-23 Thread Hongyu Wang
Gently ping for this :) Hi Richard, Is it OK to adopt the ccmp change? Or do you know who can help to review this part? Thanks. Hongyu Wang wrote on Wed, May 15, 2024 at 16:25: > > CC'd Richard for ccmp part as previously it is added only for aarch64. > The original logic will not interrup

Re: [PATCH] i386: Fix ix86_option override after change [PR 113719]

2024-05-16 Thread Hongyu Wang
Richard Biener wrote on Thu, May 16, 2024 at 15:05: > > On Thu, May 16, 2024 at 8:25 AM Hongyu Wang wrote: > > > > Hi, > > > > In ix86_override_options_after_change, calls to ix86_default_align > > and ix86_recompute_optlev_based_flags will cause mism

[PATCH] i386: Fix ix86_option override after change [PR 113719]

2024-05-16 Thread Hongyu Wang
Hi, In ix86_override_options_after_change, calls to ix86_default_align and ix86_recompute_optlev_based_flags will cause mismatched target opt_set when doing cl_optimization_restore. Move them back to ix86_option_override_internal to solve the issue. Bootstrapped & regtested on

Re: [PATCH 2/3] [APX CCMP] Adjust startegy for selecting ccmp candidates

2024-05-15 Thread Hongyu Wang
supports but ccmp not, so ret/ret2 will all be valid when comparing cost. Thanks in advance. Hongyu Wang wrote on Wed, May 15, 2024 at 16:22: > > For general ccmp scenario, the tree sequence is like > > _1 = (a < b) > _2 = (c < d) > _3 = _1 & _2 > > current ccmp expanding wil

[PATCH 1/3] [APX CCMP] Support APX CCMP

2024-05-15 Thread Hongyu Wang
APX CCMP feature implements conditional compare which executes compare when EFLAGS matches certain condition. CCMP introduces default flags value (dfv), when conditional compare does not execute, it will directly set the flags according to dfv. The instruction goes like ccmpeq {dfv=sf,of,cf,zf}
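The semantics described above can be modelled in a few lines of C. This is only an illustrative sketch: the type and function names are hypothetical, the condition that the real instruction evaluates against the current EFLAGS is passed in as a plain boolean, and only the four flags named in the dfv example are modelled.

```c
#include <stdbool.h>
#include <stdint.h>

/* The four status flags that the dfv operand can specify. */
struct flags { bool cf, zf, sf, of; };

/* Flags produced by an ordinary 64-bit compare (a - b, result discarded). */
static struct flags cmp_flags(uint64_t a, uint64_t b)
{
    uint64_t r = a - b;
    struct flags f;
    f.cf = a < b;                             /* unsigned borrow */
    f.zf = r == 0;
    f.sf = (r >> 63) != 0;                    /* sign bit of the result */
    f.of = (((a ^ b) & (a ^ r)) >> 63) != 0;  /* signed overflow */
    return f;
}

/* Hypothetical model of a conditional compare: when the condition holds,
   perform the compare; otherwise load the flags from dfv. */
static struct flags ccmp_model(bool cond, uint64_t a, uint64_t b,
                               struct flags dfv)
{
    return cond ? cmp_flags(a, b) : dfv;
}
```

Because the "not executed" path still writes a well-defined flag state, a chain of such compares can feed one final branch without intermediate setcc or test instructions.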

[PATCH 2/3] [APX CCMP] Adjust startegy for selecting ccmp candidates

2024-05-15 Thread Hongyu Wang
For general ccmp scenario, the tree sequence is like _1 = (a < b) _2 = (c < d) _3 = _1 & _2 current ccmp expanding will try to swap compare order for _1 and _2, compare the cost/cost2 between compare _1 and _2 first, then return the sequence with lower cost. For x86 ccmp, we don't support FP
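As a hedged illustration, the following C function is the kind of source that would plausibly lower to the three-statement sequence shown above. The function name is hypothetical, and whether a ccmp instruction is actually emitted for it depends on the target, the enabled features, and the cost comparison the patch discusses.

```c
/* Hypothetical example: two independent comparisons combined with a
   bitwise AND, mirroring _1, _2 and _3 in the tree sequence. */
int both_less(int a, int b, int c, int d)
{
    int first = a < b;      /* _1 = (a < b) */
    int second = c < d;     /* _2 = (c < d) */
    return first & second;  /* _3 = _1 & _2 */
}
```

Either comparison can be evaluated first without changing the result, which is why the expander is free to try both orders and keep the cheaper sequence.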

[PATCH 3/3] [APX CCMP] Support ccmp for float compare

2024-05-15 Thread Hongyu Wang
The ccmp insn itself doesn't support fp compare, but x86 has fp comi insn that changes EFLAGS which can be the scc input to ccmp. Allow scalar fp compare in ix86_gen_ccmp_first except ORDERED/UNORDERED compare which cannot be identified in ccmp. gcc/ChangeLog: * config/i386/i386-expand.cc

[PATCH 0/3] Support Intel APX CCMP

2024-05-15 Thread Hongyu Wang
html Hongyu Wang (3): [APX CCMP] Support APX CCMP [APX CCMP] Adjust startegy for selecting ccmp candidates [APX CCMP] Support ccmp for float compare gcc/ccmp.cc| 12 +- gcc/config/i386/i386-expand.cc | 164 + gcc/config/i386/

[gcc r14-9882] [APX] Prohibit SHA/KEYLOCKER usage of EGPR when APX enabled

2024-04-09 Thread Hongyu Wang via Gcc-cvs
https://gcc.gnu.org/g:ea665f90260acb3ffd2e39fcd2e200e702ee0ead commit r14-9882-gea665f90260acb3ffd2e39fcd2e200e702ee0ead Author: Hongyu Wang Date: Tue Apr 9 09:50:11 2024 +0800 [APX] Prohibit SHA/KEYLOCKER usage of EGPR when APX enabled The latest APX spec announced removal

[PATCH] Prohibit SHA/KEYLOCKER usage of EGPR when APX enabled

2024-04-09 Thread Hongyu Wang
The latest APX spec announced removal of SHA/KEYLOCKER evex promotion [1], which means the SHA/KEYLOCKER insn does not support EGPR when APX enabled. Update the corresponding constraints to their EGPR-disabled counterparts. Bootstrapped and regtested on x86-64-pc-linux-gnu. Ok for trunk?

Re: [PATCH] x86: Properly implement AMX-TILE load/store intrinsics

2024-02-25 Thread Hongyu Wang
Thanks for fixing this! Didn't notice that the pointer conversion can cause this issue... Was it possible to use local array like char a[64] = (char *)p __asm__ volatile ("ldtilecfg\t%X0" :: "m" (a))); If not, for the two patterns we can use "m" instead of "jm" as APX supports EGPR extension

Re: [wwwdocs][PATCH] gcc-14/changes: Update APX inline asm behavior for x86_64

2024-01-15 Thread Hongyu Wang
I'm going to check this in if there are no objections. Hongyu Wang wrote on Tue, Jan 9, 2024 at 15:14: > > Hi, > > This patch adds missing description for inline asm behavior and related > compiler switch for APX. > > Ok for gcc-wwwdocs? > > --- > htdocs/gcc-14/changes.html | 6 ++ &

Re: [PATCH] i386: [APX] Document inline asm behavior and new switch for APX

2024-01-10 Thread Hongyu Wang
Thanks, this is the patch I'm going to check-in Hongtao Liu wrote on Wed, Jan 10, 2024 at 16:02: > > On Tue, Jan 9, 2024 at 3:09 PM Hongyu Wang wrote: > > > > Hi, > > > > For APX, the inline asm behavior was not mentioned in any document > > before. Add description for i

[wwwdocs][PATCH] gcc-14/changes: Update APX inline asm behavior for x86_64

2024-01-08 Thread Hongyu Wang
Hi, This patch adds missing description for inline asm behavior and related compiler switch for APX. Ok for gcc-wwwdocs? --- htdocs/gcc-14/changes.html | 6 ++ 1 file changed, 6 insertions(+) diff --git a/htdocs/gcc-14/changes.html b/htdocs/gcc-14/changes.html index e3a68998..73a90d30

[PATCH] i386: [APX] Document inline asm behavior and new switch for APX

2024-01-08 Thread Hongyu Wang
Hi, For APX, the inline asm behavior was not mentioned in any document before. Add description for it. Ok for trunk? gcc/ChangeLog: * config/i386/i386.opt: Adjust document. * doc/invoke.texi: Add description for -mapx-inline-asm-use-gpr32. --- gcc/config/i386/i386.opt

[PATCH] i386: [APX] Add missing document for APX

2024-01-07 Thread Hongyu Wang
Hi, The supported sub-features for APX was missing in option document and target attribute section. Add those missing ones. Ok for trunk? gcc/ChangeLog: * config/i386/i386.opt: Add supported sub-features. * doc/extend.texi: Add description for target attribute. ---

[PATCH] testsuite: Require dfp for pr112943.c

2023-12-14 Thread Hongyu Wang
Hi, As Coudert points out, this test fails on darwin as it does not support _Decimal64, so require dfp for it. Pushed as obvious fix. gcc/testsuite/ChangeLog: * gcc.target/i386/pr112943.c: Require dfp. --- gcc/testsuite/gcc.target/i386/pr112943.c | 2 +- 1 file changed, 1

[PATCH] i386: Sync move_max/store_max with prefer-vector-width [PR112824]

2023-12-13 Thread Hongyu Wang
Hi, Currently move_max follows the tuning feature first, but ideally it should sync with prefer-vector-width when it is explicitly set to keep vector move and operation with same vector size. Bootstrapped/regtested on x86-64-pc-linux-gnu{-m32,} OK for trunk? gcc/ChangeLog: PR

Re: [PATCH] i386: Fix missed APX_NDD check for shift/rotate expanders [PR 112943]

2023-12-11 Thread Hongyu Wang
> > +__int128 u128_2 = (9223372036854775808 << 4) * foo0_u8_0; /* { > > dg-warning "integer constant is so large that it is unsigned" "so large" } > > */ > > Just you can use (9223372036854775807LL + (__int128) 1) instead of > 9223372036854775808 > to avoid the warning. > The testcase will
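The suggested spelling can be checked with a small C sketch. It assumes a compiler and target that provide the __int128 extension (for example GCC on x86-64); the function name is hypothetical. Note that in this form the arithmetic is carried out in __int128, so it is a way to write the value 2^63 without the warning rather than a drop-in replacement with identical intermediate types.

```c
/* 2^63 written without a decimal literal that exceeds the range of
   long long: the LL operand is converted to __int128 before the add,
   so the sum does not overflow. */
static __int128 two_pow_63(void)
{
    return 9223372036854775807LL + (__int128) 1;
}
```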

[PATCH] i386: Fix missed APX_NDD check for shift/rotate expanders [PR 112943]

2023-12-11 Thread Hongyu Wang
Hi, The ashl/lshr/ashr expanders call ix86_expand_binary_operator, while they will be called for some post-reload split, and TARGET_APX_NDD is required for these calls to avoid force-load to memory at postreload stage. Bootstrapped/regtested on x86-64-pc-linux-gnu{-m32,} Ok for master?

[PATCH 13/16] [APX NDD] Support APX NDD for rotate insns

2023-12-06 Thread Hongyu Wang
gcc/ChangeLog: * config/i386/i386.md (*3_1): Extend with a new alternative to support NDD for SI/DI rotate, and adjust output template. (*si3_1_zext): Likewise. (*3_1): Likewise for QI/HI modes. (rcrsi2): Likewise, and use nonimmediate_operand for

[PATCH 16/16] [APX NDD] Support TImode shift for NDD

2023-12-06 Thread Hongyu Wang
For TImode shifts, they are split by splitter functions, which assume operands[0] and operands[1] to be the same. For the NDD alternative the assumption may not be true so add split functions for NDD to emit the NDD form instructions, and omit the handling of !64bit target split. Although the

[PATCH 05/16] [APX NDD] Support APX NDD for sub insns

2023-12-06 Thread Hongyu Wang
From: Kong Lingling gcc/ChangeLog: * config/i386/i386-expand.cc (ix86_fixup_binary_operands_no_copy): Add use_ndd parameter and parse it. * config/i386/i386-protos.h (ix86_fixup_binary_operands_no_copy): Change define. * config/i386/i386.md (sub3): Add

[PATCH 12/16] [APX NDD] Support APX NDD for right shift insns

2023-12-06 Thread Hongyu Wang
Similar to LSHIFT, rshift does not need to omit $1 for NDD form. gcc/ChangeLog: * config/i386/i386.md (ashr3_cvt): Extend with new alternatives to support NDD, and adjust output templates. (*ashr3_1): Likewise for SI/DI mode. (*lshr3_1): Likewise.

[PATCH 11/16] [APX NDD] Support APX NDD for left shift insns

2023-12-06 Thread Hongyu Wang
For left shift, there is an optimization TARGET_DOUBLE_WITH_ADD that shl 1 can be optimized to add. As NDD form of add requires src operand to be register since NDD cannot take 2 memory src, we currently just keep using NDD form shift instead of add. The optimization TARGET_SHIFT1 will try to

[PATCH 02/16] [APX NDD] Support APX NDD for optimization patterns of add

2023-12-06 Thread Hongyu Wang
From: Kong Lingling gcc/ChangeLog: * config/i386/i386.md: (addsi_1_zext): Add new alternatives for NDD and adjust output templates. (*add_2): Likewise. (*addsi_2_zext): Likewise. (*add_3): Likewise. (*addsi_3_zext): Likewise. (*adddi_4):

[PATCH 14/16] [APX NDD] Support APX NDD for shld/shrd insns

2023-12-06 Thread Hongyu Wang
For shld/shrd insns, the old pattern uses match_dup 0 as its shift src and uses +r*m as its constraint. To support NDD we added new define_insns to handle NDD form pattern with extra input and dest operand to be fixed in register. gcc/ChangeLog: * config/i386/i386.md (x86_64_shld_ndd): New

[PATCH 15/16] [APX NDD] Support APX NDD for cmove insns

2023-12-06 Thread Hongyu Wang
gcc/ChangeLog: * config/i386/i386.md (*movcc_noc): Extend with new constraints to support NDD. (*movsicc_noc_zext): Likewise. (*movsicc_noc_zext_1): Likewise. (*movqicc_noc): Likewise. gcc/testsuite/ChangeLog: * gcc.target/i386/apx-ndd-cmov.c: New

[PATCH 09/16] [APX NDD] Support APX NDD for and insn

2023-12-06 Thread Hongyu Wang
From: Kong Lingling For NDD form AND insn, there are three splitter fixes after extending legacy patterns. 1. APX NDD does not support high QImode registers like ah, bh, ch, dh, so some optimization splitters that generate highpart zero_extract for QImode need to be prohibited under NDD

[PATCH 07/16] [APX NDD] Support APX NDD for neg insn

2023-12-06 Thread Hongyu Wang
From: Kong Lingling gcc/ChangeLog: * config/i386/i386-expand.cc (ix86_expand_unary_operator): Add use_ndd parameter and adjust for NDD. * config/i386/i386-protos.h: Add use_ndd parameter for ix86_unary_operator_ok and ix86_expand_unary_operator. *

[PATCH 06/16] [APX NDD] Support APX NDD for sbb insn

2023-12-06 Thread Hongyu Wang
From: Kong Lingling Similar to *add3_doubleword, operands[1] may not equal to operands[0] so extra move and earlyclobber are required. gcc/ChangeLog: * config/i386/i386.md (*sub3_doubleword): Add new alternative for NDD, adopt '&' modifier to NDD dest and emit move when

[PATCH 04/16] [APX NDD] Support APX NDD for adc insns

2023-12-06 Thread Hongyu Wang
From: Kong Lingling Legacy adc patterns are commonly adopted to TImode add, when extending TImode add to NDD version, operands[0] and operands[1] can be different, so extra move should be emitted if those patterns have optimization when adding const0_rtx. For TImode insn, there could be

[PATCH 10/16] [APX NDD] Support APX NDD for or/xor insn

2023-12-06 Thread Hongyu Wang
From: Kong Lingling Similar to AND insn, two splitters need to be adjusted to prevent misoptimization for NDD OR/XOR. Also adjust *one_cmplsi2_2_zext and its corresponding splitter that will generate xor insn. gcc/ChangeLog: * config/i386/i386.md (3): Add new alternative for NDD

[PATCH 03/16] [APX NDD] Disable seg_prefixed memory usage for NDD add

2023-12-06 Thread Hongyu Wang
NDD uses evex prefix, so when segment prefix is also applied, the instruction could exceed its 15-byte limit, especially when adding immediates. This could happen when "e" constraint accepts any UNSPEC_TPOFF/UNSPEC_NTPOFF constant and it will add the offset to segment register, which will be encoded

[PATCH 01/16] [APX NDD] Support Intel APX NDD for legacy add insn

2023-12-06 Thread Hongyu Wang
From: Kong Lingling APX NDD provides an extra destination register operand for several gpr related legacy insns, so a new alternative can be adopted to operand1 with "r" constraint. This first patch supports NDD for add instruction, and keeps to use lea when all operands are registers since lea

[PATCH 08/16] [APX NDD] Support APX NDD for not insn

2023-12-06 Thread Hongyu Wang
From: Kong Lingling For *one_cmplsi2_2_zext, it will be split to xor, so its NDD form will be added together with xor NDD support. gcc/ChangeLog: * config/i386/i386.md (one_cmpl2): Add new constraints for NDD and adjust output template. (*one_cmpl2_1): Likewise.

[PATCH v3 00/16] Support Intel APX NDD

2023-12-06 Thread Hongyu Wang
) == ISA_APX_NDD instead of checking alternative at asm output stage. Bootstrapped & regtested on x86_64-pc-linux-gnu{-m32,} and sde. Ok for master? Hongyu Wang (7): [APX NDD] Disable seg_prefixed memory usage for NDD add [APX NDD] Support APX NDD for left shift insns [APX NDD] Support

Re: [PATCH 02/17] [APX NDD] Restrict TImode register usage when NDD enabled

2023-12-05 Thread Hongyu Wang
Uros Bizjak wrote on Tue, Dec 5, 2023 at 18:46: > > On Tue, Dec 5, 2023 at 3:29 AM Hongyu Wang wrote: > > > > Under APX NDD, previous TImode allocation will have issue that it was > > originally allocated using continuous pair, like rax:rdi, rdi:rdx. > > > > This will cau

[PATCH 10/17] [APX NDD] Support APX NDD for and insn

2023-12-04 Thread Hongyu Wang
From: Kong Lingling For NDD form AND insn, there are three splitter fixes after extending legacy patterns. 1. APX NDD does not support high QImode registers like ah, bh, ch, dh, so some optimization splitters that generate highpart zero_extract for QImode need to be prohibited under NDD

[PATCH 17/17] [APX NDD] Support TImode shift for NDD

2023-12-04 Thread Hongyu Wang
For TImode shifts, they are split by splitter functions, which assume operands[0] and operands[1] to be the same. For the NDD alternative the assumption may not be true so add split functions for NDD to emit the NDD form instructions, and omit the handling of !64bit target split. Although the

[PATCH 11/17] [APX NDD] Support APX NDD for or/xor insn

2023-12-04 Thread Hongyu Wang
From: Kong Lingling Similar to AND insn, two splitters need to be adjusted to prevent misoptimization for NDD OR/XOR. Also adjust *one_cmplsi2_2_zext and its corresponding splitter that will generate xor insn. gcc/ChangeLog: * config/i386/i386.md (3): Add new alternative for NDD

[PATCH 01/17] [APX NDD] Support Intel APX NDD for legacy add insn

2023-12-04 Thread Hongyu Wang
From: Kong Lingling APX NDD provides an extra destination register operand for several gpr related legacy insns, so a new alternative can be adopted to operand1 with "r" constraint. This first patch supports NDD for add instruction, and keeps to use lea when all operands are registers since lea

[PATCH 07/17] [APX NDD] Support APX NDD for sbb insn

2023-12-04 Thread Hongyu Wang
From: Kong Lingling Similar to *add3_doubleword, operands[1] may not equal to operands[0] so extra move is required. gcc/ChangeLog: * config/i386/i386.md (*sub3_doubleword): Add new alternative for NDD, and emit move when operands[0] not equal to operands[1].

[PATCH 05/17] [APX NDD] Support APX NDD for adc insns

2023-12-04 Thread Hongyu Wang
From: Kong Lingling Legacy adc patterns are commonly adopted to TImode add, when extending TImode add to NDD version, operands[0] and operands[1] can be different, so extra move should be emitted if those patterns have optimization when adding const0_rtx. NDD instructions will automatically

[PATCH 14/17] [APX NDD] Support APX NDD for rotate insns

2023-12-04 Thread Hongyu Wang
gcc/ChangeLog: * config/i386/i386.md (*3_1): Extend with a new alternative to support NDD for SI/DI rotate, and adjust output template. (*si3_1_zext): Likewise. (*3_1): Likewise for QI/HI modes. (rcrsi2): Likewise, and use nonimmediate_operand for

[PATCH 12/17] [APX NDD] Support APX NDD for left shift insns

2023-12-04 Thread Hongyu Wang
For left shift, there is an optimization TARGET_DOUBLE_WITH_ADD that shl 1 can be optimized to add. As NDD form of add requires src operand to be register since NDD cannot take 2 memory src, we currently just keep using NDD form shift instead of add. The optimization TARGET_SHIFT1 will try to

[PATCH 15/17] [APX NDD] Support APX NDD for shld/shrd insns

2023-12-04 Thread Hongyu Wang
For shld/shrd insns, the old pattern uses match_dup 0 as its shift src and uses +r*m as its constraint. To support NDD we added new define_insns to handle NDD form pattern with extra input and dest operand to be fixed in register. gcc/ChangeLog: * config/i386/i386.md (x86_64_shld_ndd): New

[PATCH 13/17] [APX NDD] Support APX NDD for right shift insns

2023-12-04 Thread Hongyu Wang
Similar to LSHIFT, rshift does not need to omit $1 for NDD form. gcc/ChangeLog: * config/i386/i386.md (ashr3_cvt): Extend with new alternatives to support NDD, and adjust output templates. (*ashr3_1): Likewise for SI/DI mode. (*lshr3_1): Likewise.

[PATCH 06/17] [APX NDD] Support APX NDD for sub insns

2023-12-04 Thread Hongyu Wang
From: Kong Lingling gcc/ChangeLog: * config/i386/i386-expand.cc (ix86_fixup_binary_operands_no_copy): Add use_ndd parameter and parse it. * config/i386/i386-protos.h (ix86_fixup_binary_operands_no_copy): Change define. * config/i386/i386.md (sub3): Add

[PATCH 16/17] [APX NDD] Support APX NDD for cmove insns

2023-12-04 Thread Hongyu Wang
gcc/ChangeLog: * config/i386/i386.md (*movcc_noc): Extend with new constraints to support NDD. (*movsicc_noc_zext): Likewise. (*movsicc_noc_zext_1): Likewise. (*movqicc_noc): Likewise. gcc/testsuite/ChangeLog: * gcc.target/i386/apx-ndd-cmov.c: New

[PATCH 04/17] [APX NDD] Disable seg_prefixed memory usage for NDD add

2023-12-04 Thread Hongyu Wang
NDD uses evex prefix, so when segment prefix is also applied, the instruction could exceed its 15-byte limit, especially when adding immediates. This could happen when "e" constraint accepts any UNSPEC_TPOFF/UNSPEC_NTPOFF constant and it will add the offset to segment register, which will be encoded

[PATCH 03/17] [APX NDD] Support APX NDD for optimization patterns of add

2023-12-04 Thread Hongyu Wang
From: Kong Lingling gcc/ChangeLog: * config/i386/i386.md: (addsi_1_zext): Add new alternatives for NDD and adjust output templates. (*add_2): Likewise. (*addsi_2_zext): Likewise. (*add_3): Likewise. (*addsi_3_zext): Likewise. (*adddi_4):

[PATCH 02/17] [APX NDD] Restrict TImode register usage when NDD enabled

2023-12-04 Thread Hongyu Wang
Under APX NDD, previous TImode allocation will have issue that it was originally allocated using continuous pair, like rax:rdi, rdi:rdx. This will cause issue for all TImode NDD patterns. For NDD we will not assume the arithmetic operations like add have dependency between dest and src1, then

[PATCH 08/17] [APX NDD] Support APX NDD for neg insn

2023-12-04 Thread Hongyu Wang
From: Kong Lingling gcc/ChangeLog: * config/i386/i386-expand.cc (ix86_expand_unary_operator): Add use_ndd parameter and adjust for NDD. * config/i386/i386-protos.h: Add use_ndd parameter for ix86_unary_operator_ok and ix86_expand_unary_operator. *

[PATCH v2 00/17] Support Intel APX NDD

2023-12-04 Thread Hongyu Wang
-gnu{-m32,} and sde. OK for trunk? Hongyu Wang (8): [APX NDD] Restrict TImode register usage when NDD enabled [APX NDD] Disable seg_prefixed memory usage for NDD add [APX NDD] Support APX NDD for left shift insns [APX NDD] Support APX NDD for right shift insns [APX NDD] Support APX NDD

[PATCH 09/17] [APX NDD] Support APX NDD for not insn

2023-12-04 Thread Hongyu Wang
From: Kong Lingling For *one_cmplsi2_2_zext, it will be split to xor, so its NDD form will be added together with xor NDD support. gcc/ChangeLog: * config/i386/i386.md (one_cmpl2): Add new constraints for NDD and adjust output template. (*one_cmpl2_1): Likewise.

[PATCH] [i386] Fix push2pop2 test fail on non-linux target [PR112729]

2023-11-28 Thread Hongyu Wang
Hi, On Linux x86-64, -fomit-frame-pointer is enabled by default, so the push2pop2 tests' CFI scans are based on it. On other targets with -fno-omit-frame-pointer the CFI scan will be wrong, as the frame pointer is pushed first. Add -fomit-frame-pointer to the tests that are related to the CFI scan. OK

[PATCH] [APX PUSH2POP2] Adjust operand order for PUSH2POP2

2023-11-21 Thread Hongyu Wang
Hi, The push2/pop2 operand order does not match the binutils implementation for AT&T syntax, which first pushes operands[2] and then operands[1]. Correct it by reversing the operand order for AT&T syntax. Bootstrapped/regtested on x86-64-linux-pc-gnu{-m32,}. Ok for master? gcc/ChangeLog: *

Re: [PATCH] [APX PPX] Support Intel APX PPX

2023-11-20 Thread Hongyu Wang
then, thanks for the suggestion. Updated patch with just 1 new UNSPEC and removed cfa handling. Hongtao Liu wrote on Mon, Nov 20, 2023 at 14:46: > > On Fri, Nov 17, 2023 at 3:26 PM Hongyu Wang wrote: > > > > Intel APX PPX feature has been released in [1]. > > > > PPX stands for Push-Pop Accelera

[PATCH] [APX PPX] Support Intel APX PPX

2023-11-16 Thread Hongyu Wang
Intel APX PPX feature has been released in [1]. PPX stands for Push-Pop Acceleration. PUSH/PUSH2 and its corresponding POP can be marked with a 1-bit hint to indicate that the POP reads the value written by the PUSH from the stack. The processor tracks these marked instructions internally and

[PATCH 13/16] [APX NDD] Support APX NDD for right shift insns

2023-11-15 Thread Hongyu Wang
Similar to LSHIFT, rshift should also emit $1 for NDD form with CX_REG as operands[1]. gcc/ChangeLog: * config/i386/i386-expand.cc (ix86_can_use_ndd_p): Add LSHIFTRT and RSHIFTRT. * config/i386/i386.md (ashr3_cvt): Extend with new alternatives to support NDD, and

[PATCH 15/16] [APX NDD] Support APX NDD for shld/shrd insns

2023-11-15 Thread Hongyu Wang
For shld/shrd insns, the old pattern uses match_dup 0 as its shift src and uses +r*m as its constraint. To support NDD we added new define_insns to handle the NDD form pattern, with an extra input and the dest operand fixed in a register. gcc/ChangeLog: * config/i386/i386.md (x86_64_shld_ndd): New

[PATCH 11/16] [APX NDD] Support APX NDD for or/xor insn

2023-11-15 Thread Hongyu Wang
From: Kong Lingling Similar to AND insn, two splitters need to be adjusted to prevent misoptimization for NDD OR/XOR. gcc/ChangeLog: * config/i386/i386-expand.cc (ix86_can_use_ndd_p): Add IOR/XOR support. * config/i386/i386.md (3): Add NDD alternative and adjust

[PATCH 10/16] [APX NDD] Support APX NDD for and insn

2023-11-15 Thread Hongyu Wang
From: Kong Lingling For NDD form AND insn, there are three splitter fixes after extending legacy patterns. 1. APX NDD does not support high QImode registers like ah, bh, ch, dh, so some optimization splitters that generate highpart zero_extract for QImode need to be prohibited under NDD

[PATCH 16/16] [APX NDD] Support APX NDD for cmove insns

2023-11-15 Thread Hongyu Wang
gcc/ChangeLog: * config/i386/i386.md (*movcc_noc): Extend with new constraints to support NDD. (*movsicc_noc_zext): Likewise. (*movsicc_noc_zext_1): Likewise. (*movqicc_noc): Likewise. gcc/testsuite/ChangeLog: * gcc.target/i386/apx-ndd-cmov.c: New

[PATCH 07/16] [APX NDD] Support APX NDD for sbb insn

2023-11-15 Thread Hongyu Wang
From: Kong Lingling Similar to *add3_doubleword, operands[1] may not be equal to operands[0], so an extra move is required. gcc/ChangeLog: * config/i386/i386.md (*sub3_doubleword): Add ndd constraints, and emit move when operands[0] is not equal to operands[1].

[PATCH 05/16] [APX NDD] Support APX NDD for adc insns

2023-11-15 Thread Hongyu Wang
From: Kong Lingling Legacy adc patterns are commonly adopted for TImode add. When extending TImode add to the NDD version, operands[0] and operands[1] can be different, so an extra move should be emitted if those patterns have an optimization when adding const0_rtx. gcc/ChangeLog: *

[PATCH 12/16] [APX NDD] Support APX NDD for left shift insns

2023-11-15 Thread Hongyu Wang
For left shift, there is an optimization TARGET_DOUBLE_WITH_ADD by which shl 1 can be optimized to add. Because the NDD form of add requires the src operand to be a register (NDD cannot take two memory sources), we currently just keep using the NDD-form shift instead of add. The optimization TARGET_SHIFT1 will try to

[PATCH 09/16] [APX NDD] Support APX NDD for not insn

2023-11-15 Thread Hongyu Wang
From: Kong Lingling gcc/ChangeLog: * config/i386/i386-expand.cc (ix86_can_use_ndd_p): Add NOT support. * config/i386/i386.md (one_cmpl2): Add NDD constraints, adjust output template. (*one_cmpl2_1): Likewise. (*one_cmplqi2_1): Likewise.
