[gcc-wwwdocs PATCH] gcc-14: Mention -march=gracemont support in x86_64

2024-09-18 Thread Haochen Jiang
Hi all, While backporting my doc patch from GCC trunk today, I found that the wwwdocs entry for -march=gracemont, which was added in GCC 14, is missing. This patch adds it. Ok for wwwdocs trunk? Thx, Haochen --- htdocs/gcc-14/changes.html | 4 1 file changed, 4 insertions(+) diff --git

[PATCH v2] i386: Enhance AVX10.2 convert tests

2024-09-18 Thread Haochen Jiang
Hi all, All of the AVX10.2 convert tests were previously missing mask tests; this patch adds them. Tested on sde with an assembler that supports the corresponding insts. Ok for trunk? Thx, Haochen gcc/testsuite/ChangeLog: * gcc.target/i386/avx10_2-512-vcvt2ps2phx-2.c: Enhance mas

[PATCH] i386: Enhance AVX10.2 convert tests

2024-09-17 Thread Haochen Jiang
Hi all, All of the AVX10.2 convert tests were previously missing mask tests; this patch adds them. Tested on sde with an assembler that supports these insts. Ok for trunk? Thx, Haochen gcc/testsuite/ChangeLog: * gcc.target/i386/avx10_2-512-vcvt2ps2phx-2.c: Enhance mask test.

[PATCH] i386: Add missing avx512f-mask-type.h include

2024-09-17 Thread Haochen Jiang
Hi all, In commit r15-3594, we fixed the bugs in MASK_TYPE for AVX10.2 testcases, but we missed the following four. The tests do not FAIL because the binutils part has not been merged yet, so they are reported as UNSUPPORTED. But avx512f-mask-type.h still needs to be included, otherwise, it will be c

[PATCH] doc: Add more alias option and reorder Intel CPU -march documentation

2024-09-17 Thread Haochen Jiang
Hi all, Since r15-3539, there are requests coming in to add other alias option documentation. This patch will add all of them, including corei7, corei7-avx, core-avx-i, core-avx2, atom, slm, gracemont and emeraldrapids. Also in the patch, I reordered that part of documentation, currently all the

[PATCH] doc: Enhance Intel CPU documentation

2024-09-05 Thread Haochen Jiang
Hi all, This patch adds the recently aliased CPU names to the documentation for clarity. Ready to push for trunk and backport to GCC14 and part of the patch to GCC13 as an obvious fix if no objection. Thx, Haochen gcc/ChangeLog: PR target/116617 * doc/invoke.texi: Add meteo

[PATCH] i386: Fix incorrect avx512f-mask-type.h include

2024-09-04 Thread Haochen Jiang
Hi all, In avx512f-mask-type.h, SIZE must be defined for MASK_TYPE to be defined correctly. Fix those testcases where SIZE is not defined before the include of avx512f-mask-type.h. Note that the convert intrins in AVX10.2 will need more modifications due to the current tests did not in
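
The ordering problem described above can be illustrated with a small sketch. The selection logic below is a hypothetical stand-in and not the real contents of avx512f-mask-type.h, which may differ; the names mask_type and mask_bits are invented for illustration. It relies only on the standard C rule that an undefined identifier evaluates to 0 inside #if, so a missing SIZE would silently select the narrowest type.

```c
#include <stdint.h>

/* SIZE must be defined BEFORE the selection logic is seen.
   If it were missing, the preprocessor would treat SIZE as 0 in the
   #if tests below and the 8-bit type would always be chosen. */
#define SIZE 32

/* Hypothetical stand-in for the header's selection logic
   (the real avx512f-mask-type.h may be written differently). */
#if SIZE > 32
typedef uint64_t mask_type;
#elif SIZE > 16
typedef uint32_t mask_type;
#elif SIZE > 8
typedef uint16_t mask_type;
#else
typedef uint8_t mask_type;
#endif

/* Number of bits in the selected mask type. */
static unsigned mask_bits(void)
{
    return 8u * (unsigned)sizeof(mask_type);
}
```

With SIZE defined as 32 before the selection, mask_bits() reports a 32-bit mask; moving the #define below the #if chain would yield 8 bits instead.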

[PATCH] i386: Fix vfpclassph non-optimized intrin

2024-09-02 Thread Haochen Jiang
Hi all, The non-optimized version of the intrin has a typo in its mask type, which causes the high bits of __mmask32 to be unexpectedly zeroed. The test does not fail under O0 with current 1b since the testcase is wrong. We need to include avx512f-mask-type.h after SIZE is defined, or it will always be __mma

[gcc-wwwdocs PATCH] gcc-15: Mention recent update for x86_64 backend

2024-08-27 Thread Haochen Jiang
Hi all, Sorry for the disturbance: I mistyped gcc-patches as gcc-patchs, so I am resending the patch. This patch adds documentation for recent updates in the x86-64 backend. Ok for wwwdocs trunk? Thx, Haochen --- Mention AVX10.2 support and Xeon Phi removal in GCC 15. --- htdocs/gcc-15/changes.html

[PATCH 4/8] i386: Support vectorized BF16 add/sub/mul/div with AVX10.2 instructions

2024-08-25 Thread Haochen Jiang
From: Levy Hsu AVX10.2 introduces several non-exception instructions for BF16 vector. Enable vectorized BF add/sub/mul/div operation by supporting standard optab for them. gcc/ChangeLog: * config/i386/sse.md (div3): New expander for BFmode div. (VF_BHSD): New mode iterator with

[PATCH 8/8] i386: Support vec_cmp for V8BF/V16BF/V32BF in AVX10.2

2024-08-25 Thread Haochen Jiang
From: Levy Hsu gcc/ChangeLog: * config/i386/i386-expand.cc (ix86_use_mask_cmp_p): Add BFmode for int mask cmp. * config/i386/sse.md (vec_cmp): New vec_cmp expand for VBF modes. gcc/testsuite/ChangeLog: * gcc.target/i386/avx10_2-512-bf-vector-cmpp-1.c:

[PATCH 6/8] i386: Support vectorized BF16 smaxmin with AVX10.2 instructions

2024-08-25 Thread Haochen Jiang
From: Levy Hsu gcc/ChangeLog: * config/i386/sse.md (3): New define expand pattern for BF smaxmin. gcc/testsuite/ChangeLog: * gcc.target/i386/avx10_2-512-bf-vector-smaxmin-1.c: New test. * gcc.target/i386/avx10_2-bf-vector-smaxmin-1.c: New test. --- gcc/config/i3

[PATCH 3/8] i386: Optimize generate insn for avx10.2 compare

2024-08-25 Thread Haochen Jiang
From: "Hu, Lin1" gcc/ChangeLog: * config/i386/i386-expand.cc (ix86_expand_fp_compare): Add UNSPEC to support the optimization. * config/i386/i386.cc (ix86_fp_compare_code_to_integer): Add NE/EQ. * config/i386/i386.md (*cmpx): New define_insn. (*cmpxhf): Di

[PATCH 7/8] i386: Support vectorized BF16 sqrt with AVX10.2 instruction

2024-08-25 Thread Haochen Jiang
From: Levy Hsu gcc/ChangeLog: * config/i386/sse.md: Expand VF2H to VF2HB with VBF modes. --- gcc/config/i386/sse.md | 13 - 1 file changed, 8 insertions(+), 5 deletions(-) diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index b374783429c..2de592a9c8f 100644 ---

[PATCH 5/8] i386: Support vectorized BF16 FMA with AVX10.2 instructions

2024-08-25 Thread Haochen Jiang
From: Levy Hsu gcc/ChangeLog: * config/i386/sse.md: Add V8BF/V16BF/V32BF to mode iterator FMAMODEM. gcc/testsuite/ChangeLog: * gcc.target/i386/avx10_2-512-bf-vector-fma-1.c: New test. * gcc.target/i386/avx10_2-bf-vector-fma-1.c: New test. --- gcc/config/i386/sse.md

[PATCH 0/8] i386: Optimize code with AVX10.2 new instructions

2024-08-25 Thread Haochen Jiang
Hi all, I committed the AVX10.2 new instruction patches to trunk a few hours ago. The next and final part of the AVX10.2 upstream work is to optimize code with the new AVX10.2 instructions. This patch series contains the following optimizations: - VNNI instruction auto vectorize (PATCH 1).

[PATCH 2/8] i386: Optimize ordered and nonequal

2024-08-25 Thread Haochen Jiang
From: "Hu, Lin1" Currently, when we input !__builtin_isunordered (a, b) && (a != b), gcc will emit
    ucomiss %xmm1, %xmm0
    movl $1, %ecx
    setp %dl
    setnp %al
    cmovne %ecx, %edx
    andl %edx, %eax
    movzbl %al, %eax
In fact,
    xorl %eax, %eax
    ucomiss %xmm1, %xmm0
    setne %al
is better. gcc/
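
For reference, a minimal C function containing the expression discussed above might look like the sketch below. The function name ordered_and_nonequal is invented for illustration; only __builtin_isunordered and the expression itself come from the message. Which instruction sequence a given compiler version emits for it is not asserted here.

```c
#include <stdbool.h>

/* True only when neither operand is NaN and the operands differ.
   This is the source-level pattern the patch aims to compile to a
   single compare followed by setne. */
bool ordered_and_nonequal(float a, float b)
{
    return !__builtin_isunordered(a, b) && (a != b);
}
```

The result is false both for equal operands and for any comparison involving a NaN, which is why a single flag test after ucomiss is sufficient.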

[PATCH 1/8] i386: Auto vectorize sdot_prod, usdot_prod, udot_prod with AVX10.2 instructions

2024-08-25 Thread Haochen Jiang
gcc/ChangeLog: * config/i386/sse.md (VI1_AVX512VNNIBW): New. (VI2_AVX10_2): Ditto. (sdot_prod): Add AVX10.2 to auto vectorize and combine 512 bit part. (udot_prod): Ditto. (sdot_prodv64qi): Removed. (udot_prodv64qi): Ditto. (usdot_pro
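
As an illustration of the kind of source loop that dot-product optabs such as sdot_prod and usdot_prod are meant to match, here is a sketch in plain C. The function names sdot and usdot are invented for this example, and whether a particular GCC version and option set actually vectorizes these loops with VNNI or AVX10.2 instructions is an assumption, not something the message states.

```c
#include <stddef.h>
#include <stdint.h>

/* Signed x signed 8-bit dot product accumulated into 32 bits. */
int32_t sdot(const int8_t *a, const int8_t *b, size_t n)
{
    int32_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += (int32_t)a[i] * (int32_t)b[i];
    return sum;
}

/* Unsigned x signed 8-bit dot product accumulated into 32 bits. */
int32_t usdot(const uint8_t *a, const int8_t *b, size_t n)
{
    int32_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += (int32_t)a[i] * (int32_t)b[i];
    return sum;
}
```

The widening multiply followed by accumulation into a wider type is the shape the vectorizer looks for when it recognizes a dot-product reduction.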

[PATCH 11/12] AVX10.2: Support compare instructions

2024-08-19 Thread Haochen Jiang
in): Ditto. (ix86_expand_builtin): Change function call. * config/i386/i386.md (UNSPEC_COMX): New unspec. * config/i386/sse.md (avx10_2_vcomx): New. (_comi): Add HFmode. gcc/testsuite/ChangeLog: * gcc.target/i386/avx10_2-compare-1.c: New test. Co-authored-by: Hao

[PATCH 12/12] i386: Add bf8 -> fp16 intrin

2024-08-19 Thread Haochen Jiang
Since BF8 and FP16 have the same number of exponent bits, the type conversion between them is just a cast of the fraction part. We will use a sequence of instructions instead of new instructions to do that. For convenience, intrins are also provided. gcc/ChangeLog: * config/i386/avx10_2-512convertintri
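
The idea above can be sketched on raw bit patterns in scalar C. This assumes BF8 is the 1-5-2 layout (sign, 5 exponent bits, 2 fraction bits) and FP16 is the IEEE 1-5-10 layout; the helper name bf8_to_fp16_bits is hypothetical and is not one of the intrinsics added by the patch.

```c
#include <stdint.h>

/* Widen a BF8 bit pattern to an FP16 bit pattern.
   Sign and exponent fields line up, so the 8 source bits simply become
   the high byte and the 8 extra fraction bits are zero-filled. */
uint16_t bf8_to_fp16_bits(uint8_t bf8)
{
    return (uint16_t)((uint16_t)bf8 << 8);
}
```

For example, the BF8 pattern 0x3C (1.0 under the assumed layout) maps to 0x3C00, which is 1.0 in FP16. A vector version would perform the same byte-to-high-half placement across all lanes.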

[PATCH 10/12] AVX10.2: Support vector copy instructions

2024-08-19 Thread Haochen Jiang
From: "Zhang, Jun" gcc/ChangeLog: * config/config.gcc: Add avx10_2copyintrin.h. * config/i386/i386.md (avx10_2): New isa attribute. * config/i386/immintrin.h: Include avx10_2copyintrin.h. * config/i386/sse.md (sse_movss_): Add new constraints to handle AVX

[PATCH 09/12] AVX10.2: Support minmax instructions

2024-08-19 Thread Haochen Jiang
gcc.target/i386/avx10_2-vminmaxpd-2.c: Ditto. * gcc.target/i386/avx10_2-vminmaxph-2.c: Ditto. * gcc.target/i386/avx10_2-vminmaxps-2.c: Ditto. Co-authored-by: Lin Hu Co-authored-by: Haochen Jiang --- gcc/config.gcc|3 +- gcc/config/i3

[PATCH 06/12] [PATCH 2/2] AVX10.2: Support BF16 instructions

2024-08-19 Thread Haochen Jiang
From: konglin1 gcc/ChangeLog: * config/i386/avx10_2-512bf16intrin.h: Add new intrinsics. * config/i386/avx10_2bf16intrin.h: Ditto. * config/i386/i386-builtin-types.def: Add new DEF_FUNCTION_TYPE for new type. * config/i386/i386-builtin.def (BDESC): Add ne

[PATCH 05/12] [PATCH 1/2] AVX10.2: Support BF16 instructions

2024-08-19 Thread Haochen Jiang
From: konglin1 gcc/ChangeLog: * config.gcc: Add avx10_2-512bf16intrin.h and avx10_2bf16intrin.h. * config/i386/i386-builtin-types.def : Add new DEF_FUNCTION_TYPE for V32BF_FTYPE_V32BF_V32BF, V16BF_FTYPE_V16BF_V16BF, V8BF_FTYPE_V8BF_V8BF, V8BF_FTYPE_V8BF_V8

[PATCH 07/12] [PATCH 1/2] AVX10.2: Support saturating convert instructions

2024-08-19 Thread Haochen Jiang
From: "Hu, Lin1" gcc/ChangeLog: * config.gcc: Add avx10_2satcvtintrin.h and avx10_2-512satcvtintrin.h. * config/i386/i386-builtin-types.def: Add DEF_FUNCTION_TYPE (V8HI, V8BF, V8HI, UQI), (V16HI, V16BF, V16HI, UHI), (V32HI, V32BF, V32HI, USI), (V16

[PATCH 08/12] [PATCH 2/2] AVX10.2: Support saturating convert instructions

2024-08-19 Thread Haochen Jiang
From: "Hu, Lin1" gcc/ChangeLog: * config/i386/i386-builtin.def (BDESC): Add new builtins. * config/i386/sse.md (avx10_2_vcvttpd2dqs): New. (avx10_2_vcvttpd2qqs): Ditto. (avx10_2_vcvttps2dqs): Ditto. (avx10_2_vcvttps2qqs):

[PATCH 02/12] [PATCH 1/2] AVX10.2: Support media instructions

2024-08-19 Thread Haochen Jiang
: Ditto. * gcc.target/i386/avx10_2-vpdpbuud-2.c: Ditto. * gcc.target/i386/avx10_2-vpdpbuuds-2.c: Ditto. Co-authored-by: Haochen Jiang --- gcc/config.gcc| 3 +- gcc/config/i386/avx10_2-512mediaintrin.h | 234 +++ gcc/config/i386

[PATCH 03/12] [PATCH 2/2] AVX10.2: Support media instructions

2024-08-19 Thread Haochen Jiang
gcc/ChangeLog: * config/i386/avx10_2-512mediaintrin.h: Add new intrins. * config/i386/avx10_2mediaintrin.h: Ditto. * config/i386/i386-builtin.def: Add new builtins. * config/i386/i386-builtins.cc (def_builtin): Handle shared builtins between AVXVNNIINT16 and

[PATCH 00/12] AVX10.2: Support new instructions

2024-08-19 Thread Haochen Jiang
Hi all, The AVX10.2 ymm rounding patches were merged to trunk around 6 hours ago. As mentioned before, the next step is AVX10.2 new instruction support. This patch series can be divided into three parts. The first patch refactors m512-check.h under testsuite to reuse AVX-512 helper fun

[PATCH 01/12] i386: Refactor m512-check.h

2024-08-19 Thread Haochen Jiang
After the introduction of AVX10, we still want to use the AVX512 helper functions to avoid duplicate code. In order to reuse them, we need to refactor so that each function definition happens under the correct ISA, which avoids ABI warnings. gcc/testsuite/ChangeLog: * gcc.target/i386/m512-check.h: Wr

[PATCH 22/22] AVX10.2 ymm rounding: Support vsqrtp{s, d, h} and vsubp{s, d, h} intrins

2024-08-14 Thread Haochen Jiang
From: "Hu, Lin1" gcc/ChangeLog: * config/i386/avx10_2roundingintrin.h: New intrins. * config/i386/i386-builtin.def (BDESC): Add new builtins. gcc/testsuite/ChangeLog: * gcc.target/i386/avx-1.c: Add new builtin test. * gcc.target/i386/sse-13.c: Ditto. * g

[PATCH 17/22] AVX10.2 ymm rounding: Support vgetexpp{s, d, h} and vgetmantp{s, d, h} intrins

2024-08-14 Thread Haochen Jiang
From: "Hu, Lin1" gcc/ChangeLog: * config/i386/avx10_2roundingintrin.h: New intrins. * config/i386/i386-builtin-types.def: Add new DEF_FUNCTION_TYPE. * config/i386/i386-builtin.def (BDESC): Add new builtins. * config/i386/i386-expand.cc (ix86_expand_round_builtin):

[PATCH 20/22] AVX10.2 ymm rounding: Support vreducep{s, d, h} and vrndscalep{s, d, h} intrins

2024-08-14 Thread Haochen Jiang
From: "Hu, Lin1" gcc/ChangeLog: * config/i386/avx10_2roundingintrin.h: New intrins. * config/i386/i386-builtin.def (BDESC): Add new builtins. * config/i386/sse.md: (reducep): Add condition check. (_rndscale): Ditto. gcc/testsuite/ChangeLog:

[PATCH 21/22] AVX10.2 ymm rounding: Support vscalefp{s,d,h} intrins

2024-08-14 Thread Haochen Jiang
From: "Hu, Lin1" gcc/ChangeLog: * config/i386/avx10_2roundingintrin.h: New intrins. * config/i386/i386-builtin-types.def: Add new DEF_FUNCTION_TYPE. * config/i386/sse.md: (_scalef): Add condition check. gcc/testsuite/ChangeLog: * gcc.target/i386/avx-1.c:

[PATCH 14/22] AVX10.2 ymm rounding: Support vfm{sub, subadd}{132, 231, 213}p{s, d, h} intrins

2024-08-14 Thread Haochen Jiang
From: "Hu, Lin1" gcc/ChangeLog: * config/i386/avx10_2roundingintrin.h: New intrins. * config/i386/i386-builtin-types.def: Add new DEF_FUNCTION_TYPE. * config/i386/i386-builtin.def (BDESC): Add new builtins. * config/i386/sse.md: (_fmsub__mask): Add conditi

[PATCH 01/22] AVX10.2 ymm rounding: Support vadd{s, d, h} and vcmp{s, d, h} intrins

2024-08-14 Thread Haochen Jiang
From: "Hu, Lin1" gcc/ChangeLog: * config.gcc: Add avx10_2roundingintrin.h. * config/i386/i386-builtin-types.def: Add new DEF_FUNCTION_TYPE. * config/i386/i386-builtin.def (BDESC): Add new builtins. * config/i386/i386-expand.cc (ix86_expand_round_builtin): Handle

[PATCH 18/22] AVX10.2 ymm rounding: Support v{max, min}p{s, d, h} intrins

2024-08-14 Thread Haochen Jiang
From: "Hu, Lin1" gcc/ChangeLog: * config/i386/avx10_2roundingintrin.h: New intrins. * config/i386/i386-builtin.def (BDESC): Add new builtins. gcc/testsuite/ChangeLog: * gcc.target/i386/avx-1.c: Add new builtin test. * gcc.target/i386/sse-13.c: Ditto. * g

[PATCH 04/22] AVX10.2 ymm rounding: Support vcvtph2p{s, d, sx} and vcvtph2{, u}{dq, qq} intrins

2024-08-14 Thread Haochen Jiang
From: "Hu, Lin1" gcc/ChangeLog: * config/i386/avx10_2roundingintrin.h: New intrins. * config/i386/i386-builtin-types.def: Add new DEF_FUNCTION_TYPE. * config/i386/i386-builtin.def (BDESC): Add new builtins. * config/i386/i386-expand.cc (ix86_expand_round_builtin):

[PATCH 19/22] AVX10.2 ymm rounding: Support vmulp{s, d, h} and vrangep{s, d} intrins

2024-08-14 Thread Haochen Jiang
From: "Hu, Lin1" gcc/ChangeLog: * config/i386/avx10_2roundingintrin.h: New intrins. * config/i386/i386-builtin-types.def: Add new DEF_FUNCTION_TYPE. * config/i386/i386-builtin.def (BDESC): Add new builtins. * config/i386/i386-expand.cc (ix86_expand_round_builtin):

[PATCH 15/22] AVX10.2 ymm rounding: Support vfmulcph and vfnmadd{132, 231, 213}p{s, d, h} intrins

2024-08-14 Thread Haochen Jiang
From: "Hu, Lin1" gcc/ChangeLog: * config/i386/avx10_2roundingintrin.h: New intrins. * config/i386/i386-builtin.def (BDESC): Add new builtins. gcc/testsuite/ChangeLog: * gcc.target/i386/avx-1.c: Add new builtin test. * gcc.target/i386/sse-13.c: Ditto. * g

[PATCH 11/22] AVX10.2 ymm rounding: Support vfc{madd, mul}cph, vfixupimmp{s, d} intrins

2024-08-14 Thread Haochen Jiang
From: "Hu, Lin1" gcc/ChangeLog: * config/i386/avx10_2roundingintrin.h: New intrins. * config/i386/i386-builtin-types.def: Add new DEF_FUNCTION_TYPE. * config/i386/i386-builtin.def (BDESC): Add new builtins. * config/i386/i386-expand.cc (ix86_expand_round_builtin):

[PATCH 16/22] AVX10.2 ymm rounding: Support vfnmsub{132, 231, 213}p{s, d, h} intrins

2024-08-14 Thread Haochen Jiang
From: "Hu, Lin1" gcc/ChangeLog: * config/i386/avx10_2roundingintrin.h: New intrins. * config/i386/i386-builtin.def (BDESC): Add new builtins. * config/i386/sse.md: (_fnmsub__mask3): Add condition check. gcc/testsuite/ChangeLog: * gcc.target/i386/avx-1.c:

[PATCH 09/22] AVX10.2 ymm rounding: Support vcvttps2{, u}{dq, qq} and vcvtu{dq, qq}2p{s, d, h} intrins

2024-08-14 Thread Haochen Jiang
From: "Hu, Lin1" gcc/ChangeLog: * config/i386/avx10_2roundingintrin.h: New intrins. * config/i386/i386-builtin-types.def: Add new DEF_FUNCTION_TYPE. * config/i386/i386-builtin.def (BDESC): Add new builtins. * config/i386/sse.md (unspec_fix_truncv8sfv8si2):

[PATCH 07/22] AVX10.2 ymm rounding: Support vcvtqq2p{s, d, h} and vcvttpd2{, u}{dq, qq} intrins

2024-08-14 Thread Haochen Jiang
From: "Hu, Lin1" gcc/ChangeLog: * config/i386/avx10_2roundingintrin.h: New intrins. * config/i386/i386-builtin-types.def: Add new DEF_FUNCTION_TYPE. * config/i386/i386-builtin.def (BDESC): Add new builtins. * config/i386/i386-expand.cc (ix86_expand_round_builtin):

[PATCH 03/22] AVX10.2 ymm rounding: Support vcvtpd2{, u}{dq, qq} intrins

2024-08-14 Thread Haochen Jiang
From: "Hu, Lin1" gcc/ChangeLog: * config/i386/avx10_2roundingintrin.h: Add new intrins. * config/i386/i386-builtin-types.def: Add new DEF_FUNCTION_TYPE. * config/i386/i386-builtin.def (BDESC): Add new builtins. * config/i386/i386-expand.cc (ix86_expand_round_built

[PATCH 00/22] Support AVX10.2 ymm rounding

2024-08-14 Thread Haochen Jiang
Hi all, The initial patch for AVX10.2 has been merged this week. For the upcoming patches, we will first upstream the ymm rounding control part. In the ymm rounding part, ALL the instructions in AVX512 with 512-bit rounding control will also have 256-bit rounding control in AVX10.2. For clarity, the

[PATCH 13/22] AVX10.2 ymm rounding: Support vfmaddcph and vfmaddsub{132, 231, 213}p{s, d, h} intrins

2024-08-14 Thread Haochen Jiang
From: "Hu, Lin1" gcc/ChangeLog: * config/i386/avx10_2roundingintrin.h: New intrins. * config/i386/i386-builtin-types.def: Add new DEF_FUNCTION_TYPE. * config/i386/i386-builtin.def (BDESC): Add new builtins. * config/i386/sse.md: (_fmaddsub__mask): Add cond

[PATCH 05/22] AVX10.2 ymm rounding: Support vcvtph2{, u}w and vcvtps2p{d, hx} intrins

2024-08-14 Thread Haochen Jiang
From: "Hu, Lin1" gcc/ChangeLog: * config/i386/avx10_2roundingintrin.h: New intrins. * config/i386/i386-builtin-types.def: Add new DEF_FUNCTION_TYPE. * config/i386/i386-builtin.def (BDESC): Add new builtins. * config/i386/i386-expand.cc (ix86_expand_round_builtin):

[PATCH 08/22] AVX10.2 ymm rounding: Support vcvttph2{, u}{dq, qq, w} intrins

2024-08-14 Thread Haochen Jiang
From: "Hu, Lin1" gcc/ChangeLog: * config/i386/avx10_2roundingintrin.h: New intrins. * config/i386/i386-builtin-types.def: Add new DEF_FUNCTION_TYPE. * config/i386/i386-builtin.def (BDESC): Add new builtins. * config/i386/sse.md (avx512fp16_fix_trunc2): Ex

[PATCH 10/22] AVX10.2 ymm rounding: Support vcvt{, u}w2ph and vdivp{s, d, h} intrins

2024-08-14 Thread Haochen Jiang
From: "Hu, Lin1" gcc/ChangeLog: * config/i386/avx10_2roundingintrin.h: New intrins. * config/i386/i386-builtin-types.def: Add new DEF_FUNCTION_TYPE. * config/i386/i386-builtin.def (BDESC): Add new builtins. * config/i386/i386-expand.cc (ix86_expand_round_builtin):

[PATCH 02/22] AVX10.2 ymm rounding: Support vcvtdq2p{s, h} and vcvtpd2p{s, h} intrins

2024-08-14 Thread Haochen Jiang
From: "Hu, Lin1" gcc/ChangeLog: * config/i386/avx10_2roundingintrin.h: Add new intrins. * config/i386/i386-builtin-types.def: Add new DEF_FUNCTION_TYPE. * config/i386/i386-builtin.def (BDESC): Add new builtins. * config/i386/i386-expand.cc (ix86_expand_round_built

[PATCH 12/22] AVX10.2 ymm rounding: Support vfmadd{132, 231, 213}p{s, d, h} intrins

2024-08-14 Thread Haochen Jiang
From: "Hu, Lin1" gcc/ChangeLog: * config/i386/avx10_2roundingintrin.h: New intrins. * config/i386/i386-builtin.def (BDESC): Add new builtins. * config/i386/sse.md: (_fmadd__mask3): Add condition check. gcc/testsuite/ChangeLog: * gcc.target/i386/avx-1.c:

[PATCH 06/22] AVX10.2 ymm rounding: Support vcvtps2{, u}{dq, qq} intrins

2024-08-14 Thread Haochen Jiang
From: "Hu, Lin1" gcc/ChangeLog: * config/i386/avx10_2roundingintrin.h: New intrins. * config/i386/i386-builtin-types.def: Add new DEF_FUNCTION_TYPE. * config/i386/i386-builtin.def (BDESC): Add new builtins. * config/i386/i386-expand.cc (ix86_expand_round_builtin):

[PATCH 1/1] Initial support for AVX10.2

2024-08-01 Thread Haochen Jiang
gcc/ChangeLog: * common/config/i386/cpuinfo.h (get_available_features): Handle avx10.2. * common/config/i386/i386-common.cc (OPTION_MASK_ISA2_AVX10_2_256_SET): New. (OPTION_MASK_ISA2_AVX10_2_512_SET): Ditto. (OPTION_MASK_ISA2_AVX10_1_256_UNSET):

[PATCH 0/1] Initial support for AVX10.2

2024-08-01 Thread Haochen Jiang
Hi all, The AVX10.2 technical details were published on July 31st at the following link: https://cdrdv2.intel.com/v1/dl/getContent/828965 The new features and instructions can be divided into two parts. One is ymm rounding control, the other is the new instructions. In the following week

[GCC12/13 PATCH] i386: Use _mm_setzero_ps/d instead of _mm_avx512_setzero_ps/d for GCC13/12

2024-07-28 Thread Haochen Jiang
Hi all, In GCC13/12, there is no _mm_avx512_setzero_ps/d since it is introduced in GCC14. Fix the backport issue as obvious in: https://gcc.gnu.org/pipermail/gcc-regression/2024-July/080385.html Thx, Haochen gcc/ChangeLog: * config/i386/avx512dqintrin.h (_mm_reduce_round_sd): Use

[PATCH v2] i386: Add non-optimize prefetchi intrins

2024-07-26 Thread Haochen Jiang
Hi all, I added a related O0 testcase in this patch. Ok for trunk and backport to GCC 14 and GCC 13? Thx, Haochen --- Changes in v2: Add testcases. --- Under -O0, with the "newly" introduced intrins, the variable will be transformed to mem instead of the original symbol_ref. The compiler will th

[PATCH v2] i386: Fix AVX512 intrin macro typo

2024-07-26 Thread Haochen Jiang
Hi all, I have added related testcases to the patch. Ok for trunk and backport to GCC 14, GCC 13 and GCC 12? Thx, Haochen --- Changes in v2: Add related testcases --- There are several typos in the AVX512 intrin macro definitions. Correct them to fix errors when compiling with -O0. gcc/ChangeLog

[PATCH] i386: Add non-optimize prefetchi intrins

2024-07-25 Thread Haochen Jiang
Hi all, Under -O0, with the "newly" introduced intrins, the variable will be transformed to mem instead of the original symbol_ref. The compiler will then treat the operand as invalid and turn the operation into a nop, which is not expected. Use a macro for the non-optimized case to keep the variable as symbol_re

[PATCH] i386: Fix AVX512 intrin macro typo

2024-07-25 Thread Haochen Jiang
Hi all, There are several typos in the AVX512 intrin macro definitions. They will eventually result in errors with -O0. This patch fixes them. Bootstrapped on x86-64-pc-linux-gnu. Ok for trunk and backport to GCC14, GCC 13 and GCC 12? Thx, Haochen gcc/ChangeLog: * config/i386/avx512dqintrin.

[PATCH v2] i386: Change prefetchi output template

2024-07-22 Thread Haochen Jiang
Hi all, I tested with %a and it works. Therefore I suppose it is a better solution. Bootstrapped and regtested on x86-64-pc-linux-gnu. Ok for trunk and backport to GCC 13 and 14? Thx, Haochen --- Changes in v2: Use %a in pattern --- For prefetchi instructions, RIP-relative address is explici

[PATCH] i386: Change prefetchi output template

2024-07-21 Thread Haochen Jiang
Hi all, For prefetchi instructions, a RIP-relative address is explicitly specified for the operand, and the assembler obeys that rule strictly. This makes an instruction like: prefetchit0 bar illegal for the assembler, although it should be a common usage of prefetchi. Explicitly add (%rip) after funct

[PATCH v2] i386: Fix testcases generating invalid asm

2024-07-17 Thread Haochen Jiang
Hi all, I revised the patch according to the comment. Ok for trunk? Thx, Haochen --- Changes in v2: Add suffix for mov to make the test more robust. --- For compile test, we should generate valid asm except for special purposes. Fix the compile test that generates invalid asm. gcc/testsuite

[PATCH] i386: Fix testcases generating invalid asm

2024-07-17 Thread Haochen Jiang
Hi all, For compile test, we should generate valid asm except for special purposes. Fix the compile test that generates invalid asm. Regtested on x86-64-pc-linux-gnu. Ok for trunk? Thx, Haochen gcc/testsuite/ChangeLog: * gcc.target/i386/apx-egprs-names.c: Use ax for short and a

[PATCH] i386: Use BLKmode for {ld,st}tilecfg

2024-07-17 Thread Haochen Jiang
Hi all, For AMX instructions related with memory, we will treat the memory size as not specified since there won't be different size causing confusion for memory. This will change the output under Intel mode, which is broken for now when using with assembler and aligns to current binutils behavio

[PATCH] i386: Correct AVX10 CPUID emulation

2024-07-09 Thread Haochen Jiang
Hi all, The AVX10 documentation specifies an ecx value of 0 for the AVX10 version and vector size under leaf 0x24. Although the bits for ecx=1 are all reserved for now, we still need to set ecx to 0 to avoid a dirty value in ecx. Bootstrapped on x86-64-pc-linux-gnu. Ok for trunk and backport to GCC
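
A minimal sketch of querying this leaf with an explicit subleaf is shown below. It uses __get_cpuid_count from GCC's cpuid.h, which sets ecx before executing CPUID. The field position (version in EBX bits 7:0) is my reading of the AVX10 specification and should be treated as an assumption, the function name avx10_version is invented, and a real detection routine would first check the AVX10 feature bit before trusting this leaf.

```c
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* Returns the AVX10 version field from CPUID leaf 0x24, subleaf 0,
   or 0 when the leaf is not available (or on non-x86 targets).
   Assumption: the version lives in EBX[7:0]. */
unsigned avx10_version(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned eax = 0, ebx = 0, ecx = 0, edx = 0;
    /* The second argument is the subleaf, i.e. the value placed in ecx.
       Passing 0 explicitly avoids relying on whatever was left there. */
    if (!__get_cpuid_count(0x24, 0, &eax, &ebx, &ecx, &edx))
        return 0;
    return ebx & 0xffu;
#else
    return 0;
#endif
}
```

Using the _count variant rather than a plain leaf query is the point of the sketch: the subleaf is always stated, so no stale ecx value can change the result.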

[PATCH] Add AVX10.1 target_clones support

2024-05-28 Thread Haochen Jiang
Hi all, Since AVX10 is the first major ISA introduced after AVX-512, we propose to add target_clones support for it. Although AVX10.1-256 won't cover the 512-bit part of AVX512F, this is only used for priority and not for implication, so it won't be an issue. Bootstrapped and regtested on x86_64-pc-

[PATCH v3] i386: Disable ix86_expand_vecop_qihi2 when !TARGET_AVX512BW

2024-05-21 Thread Haochen Jiang
Hi all, This is the v3 patch to fix PR115069. The new testcase has passed. Changes in v3: - Simplify the testcase. Changes in v2: - Add a testcase. - Change the comment for the early exit. Thx, Haochen Since vpermq is really slow, we should avoid using it for permutation when vpmovwb is

[PATCH v2] i386: Disable ix86_expand_vecop_qihi2 when !TARGET_AVX512BW

2024-05-21 Thread Haochen Jiang
Hi all, This is the v2 patch to fix PR115069. The new testcase has passed. Changes in v2: - Added a testcase. - Change the comment for the early exit. Thx, Haochen Since vpermq is really slow, we should avoid using it for permutation when vpmovwb is not available (needs AVX512BW) for ix86_e

[PATCH] i386: Disable ix86_expand_vecop_qihi2 when !TARGET_AVX512BW

2024-05-20 Thread Haochen Jiang
Hi all, Since vpermq is really slow, we should avoid using it when it is the only instruction that could be used for ix86_expand_vecop_qihi2. Bootstrapped and regtested on x86_64-pc-linux-gnu. Ok for trunk? Thx, Haochen gcc/ChangeLog: PR target/115069 * config/i386/i386-expand.cc (i

[PATCH 2/2] Align tight&hot loop without considering max skipping bytes.

2024-05-14 Thread Haochen Jiang
From: liuhongt When a hot loop is small enough to fit into one cacheline, we should align the loop with ceil_log2 (loop_size) without considering maximum skip bytes. This helps code prefetch. gcc/ChangeLog: * config/i386/i386.cc (ix86_avoid_jump_mispredicts): Change gen_pad to
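
The alignment rule described above can be sketched as a small calculation. This is only an illustration of "align to ceil_log2 (loop_size) when the loop fits in one cache line"; the helper names ceil_log2_u and tight_loop_align_log and the cache-line parameter are hypothetical, and the actual logic in i386.cc may apply further conditions.

```c
/* Smallest l such that (1 << l) >= x, for x >= 1 (capped at 31). */
unsigned ceil_log2_u(unsigned x)
{
    unsigned l = 0;
    while (l < 31 && (1u << l) < x)
        l++;
    return l;
}

/* Log2 of the alignment for a tight hot loop of loop_size bytes.
   Returns 0 (no special alignment) when the loop is empty or does not
   fit into a single cache line of cache_line bytes. */
unsigned tight_loop_align_log(unsigned loop_size, unsigned cache_line)
{
    if (loop_size == 0 || loop_size > cache_line)
        return 0;
    return ceil_log2_u(loop_size);
}
```

For a 24-byte loop and 64-byte cache lines this gives an alignment of 2^5 = 32 bytes, so the loop body cannot straddle a cache-line boundary.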

[PATCH 1/2] Adjust generic loop alignment from 16:11:8 to 16 for Intel processors

2024-05-14 Thread Haochen Jiang
Previously, we used 16:11:8 in generic tune for Intel processors, which led to cross cache line issues and resulted in random commit-to-commit performance penalties in benchmarks with small loops. Always aligning to 16 bytes solves the issue to some extent. gcc/ChangeLog:

[PATCH 0/2] Align tight loops to solve cross cacheline issue

2024-05-14 Thread Haochen Jiang
n. We planned to backport it to GCC14.2. Thx, Haochen Haochen Jiang (1): Adjust generic loop alignment from 16:11:8 to 16 for Intel processors liuhongt (1): Align tight&hot loop without considering max skipping bytes. gcc/config/i386/i386.cc | 148 ++- g

[PATCH] i386: Fix array index overflow in pr105354-2.c

2024-04-26 Thread Haochen Jiang
Hi all, The array index should not be over 8 for v8hi, or it will fail under -O0 or using -fstack-protector. This patch aims to fix that, which is mentioned in PR110621. Commit as obvious and backport to GCC13. Thx, Haochen gcc/testsuite/ChangeLog: PR target/110621 * gcc.targe

[PATCH] i386: Fix behavior for both using AVX10.1-256 in options and function attribute

2024-04-23 Thread Haochen Jiang
Hi all, When we are using -mavx10.1-256 in command line and avx10.1-256 in target attribute together, zmm should never be generated. But current GCC will generate zmm since it wrongly enables EVEX512 for non-explicitly set AVX512. This patch will fix that issue. Regtested on x86_64-pc-linux-gnu.

[PATCH] i386: Fix Sierra Forest auto dispatch

2024-04-22 Thread Haochen Jiang
Hi all, This patch fixes a bug in mapping which caused auto dispatch to fail. Sierra Forest is in processor_types enum, but not processor_subtypes. Committed as obvious and backport to GCC13. Thx, Haochen gcc/ChangeLog: * common/config/i386/i386-common.cc (processor_alias_table):

[gcc-wwwdocs PATCH] Uncomment MCore part title

2024-04-12 Thread Haochen Jiang
Hi all, When I was checking the GCC14 documentation, I found that MCore forgot to uncomment the title for their part, which caused its documentation to be mixed with x86. Uncomment that and commit as obvious. Thx, Haochen --- htdocs/gcc-14/changes.html | 2 +- 1 file changed, 1 insertion(+), 1 deletio

[PATCH] i386: Modify testcases failed under -DDEBUG

2024-01-21 Thread Haochen Jiang
Hi all, Recently, I happened to run i386.exp under -DDEBUG and found some failures. This patch aims to fix them. Ok for trunk? Thx, Haochen gcc/testsuite/ChangeLog: * gcc.target/i386/adx-check.h: Include stdio.h when DEBUG is defined. * gcc.target/i386/avx512fp16-vscalefph-

[PATCH] i386: Remove redundant move in vnni pattern

2024-01-11 Thread Haochen Jiang
Hi all, This patch removes all redundant set in vnni patterns. Ok for trunk? Thx, Haochen gcc/ChangeLog: * config/i386/sse.md (sdot_prod): Remove redundant SET. (usdot_prod): Ditto. (sdot_prod): Ditto. (udot_prod): Ditto. --- gcc/config/i386/sse.md | 4 1

[PATCH] i386: Add AVX10.1 related macros

2024-01-09 Thread Haochen Jiang
Hi all, This patch adds AVX10.1 related macros at libgomp's request. The request is here: https://gcc.gnu.org/pipermail/gcc-patches/2024-January/642025.html Ok for trunk? Thx, Haochen gcc/ChangeLog: PR target/113288 * config/i386/i386-c.cc (ix86_target_macros_i
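
A sketch of how user code might test such macros follows. The macro names __AVX10_1_256__ and __AVX10_1_512__ are an assumption based on the option names -mavx10.1-256 and -mavx10.1-512; the truncated ChangeLog above does not list them, so check the compiler's predefined macros before relying on them. The function name avx10_1_level is invented for illustration.

```c
/* Report which AVX10.1 level the translation unit was compiled for.
   Assumed macro names; on compilers that do not define them the
   function simply returns "none". */
const char *avx10_1_level(void)
{
#if defined(__AVX10_1_512__)
    return "avx10.1-512";
#elif defined(__AVX10_1_256__)
    return "avx10.1-256";
#else
    return "none";
#endif
}
```

Checking the 512-bit macro first matters if the 512-bit option also implies the 256-bit one, since otherwise the narrower level would always win.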

[PATCH] Add -mevex512 into invoke.texi

2024-01-09 Thread Haochen Jiang
Hi Richard, It seems that I sent out an outdated patch. This is the patch I meant to send. Thx, Haochen gcc/ChangeLog: * doc/invoke.texi: Add -mevex512. --- gcc/doc/invoke.texi | 7 ++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/gcc/doc/invoke.texi b/gcc/doc

[PATCH] Add -mevex512 into invoke.texi

2024-01-08 Thread Haochen Jiang
Hi all, In invoke.texi, -mevex512 is missing. This patch adds that. Ok for trunk? Thx, Haochen gcc/ChangeLog: * doc/invoke.texi: Add -mevex512. --- gcc/doc/invoke.texi | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index 6

[PATCH] i386: Fix recent testcase fail

2024-01-08 Thread Haochen Jiang
After commit 01f4251b8775c832a92d55e2df57c9ac72eaceef, early break vectorization is supported. The two testcases need to be fixed. gcc/testsuite/ChangeLog: * gcc.target/i386/avx512fp16-xorsign-1.c: Fix testcase. * gcc.target/i386/part-vect-absneghf.c: Ditto. --- gcc/testsuite/gcc

[gcc-wwwdocs PATCH v2] gcc-13/14: Mention recent update for x86_64 backend

2023-12-21 Thread Haochen Jiang
Hi all, This is the v2 patch for the wwwdocs change following review. If there is no objection, I will push this change next Tuesday. Changes in v2: - Remove RAO-INT from Grand Ridge - Remove the mask register restriction for -mno-evex512 - Arrange the options alphabetically - Other

[PATCH] i386: Allow 64 bit mask register for -mno-evex512

2023-12-14 Thread Haochen Jiang
Hi all, There is a recent change in the AVX10 documentation which allows 64 bit mask register instructions in AVX10-256. The documentation is as follows: Intel Advanced Vector Extensions 10 (Intel AVX10) Architecture Specification https://cdrdv2.intel.com/v1/dl/getContent/784267 The Converged Vecto

[PATCH] i386: Remove RAO-INT from Grand Ridge

2023-12-13 Thread Haochen Jiang
Hi all, According to ISE050 published at the end of September, RAO-INT will not be in Grand Ridge anymore. This patch aims to remove it. The documentation is here: https://cdrdv2.intel.com/v1/dl/getContent/671368 Regtested on x86_64-pc-linux-gnu. Ok for trunk and backport to GCC13? Thx

[PATCH] i386: Fix PR110790 testcase

2023-12-12 Thread Haochen Jiang
Hi all, This patch fixes the previously introduced testcase failure. Approved in another thread: https://gcc.gnu.org/pipermail/gcc-patches/2023-December/640288.html Pushed to trunk. Thx, Haochen gcc/testsuite/ChangeLog: * gcc.target/i386/pr110790-2.c: Change scan-assembler from shrq

[gcc-wwwdocs PATCH] gcc-13/14: Mention recent update for x86_64 backend

2023-12-07 Thread Haochen Jiang
Hi all, This patch mentions the following changes in wwwdocs for the x86_64 backend: - AVX10.1 support - APX EGPR, PUSH2POP2, PPX and NDD support - Xeon Phi ISAs deprecated I also adjusted the wording in the x86_64 part for GCC 13. Ok for gcc-wwwdocs? Thx, Haochen Mention AVX10.1 support, APX su

[PATCH] i386: Mark Xeon Phi ISAs as deprecated

2023-11-30 Thread Haochen Jiang
Since the Knights Landing and Knights Mill microarchitectures are EOL, we would like to remove their support in GCC 15. In GCC 14, we will first emit a warning for the usage. gcc/ChangeLog: * config/i386/driver-i386.cc (host_detect_local_cpu): Do not append "-mno-" for Xeon Phi ISAs.

[RFC] i386: Remove Xeon Phi ISA support

2023-11-30 Thread Haochen Jiang
Hi all, Since the Knights Landing and Knights Mill microarchitectures reached EOL in 2019, and ICC and ICX have already removed the support and emit errors, we would also like to remove the support in GCC to reduce maintenance effort. The deprecated Xeon Phi ISAs are AVX512PF, AVX512ER, AVX5124VNNIW,

[PATCH] i386: Fix AVX512 and AVX10 option issues

2023-11-22 Thread Haochen Jiang
Hi all, This patch should be able to fix the current issue mentioned in PR112643. Also, I fixed some legacy issues in code related to AVX512/AVX10. Ok for trunk? Thx, Haochen gcc/ChangeLog: PR target/112643 * config/i386/driver-i386.cc (check_avx10_avx512_features): Re

[PATCH] Initial support for AVX10.1

2023-11-09 Thread Haochen Jiang
gcc/ChangeLog: * common/config/i386/cpuinfo.h (get_available_features): Add avx10_set and version and detect avx10.1. (cpu_indicator_init): Handle avx10.1-512. * common/config/i386/i386-common.cc (OPTION_MASK_ISA2_AVX10_1_256_SET): New. (OPTION_MASK_

[RFC] Intel AVX10.1 Compiler Design and Support

2023-11-09 Thread Haochen Jiang
Hi all, This RFC patch aims to add AVX10.1 options. Now that -m[no-]evex512 support has been added, it is much easier to add them compared to the August version. Details for AVX10 are shown below: Intel Advanced Vector Extensions 10 (Intel AVX10) Architecture Specification It describes the Intel Advan

[PATCH] i386: Fix isa attribute for TI/TF andnot mode

2023-11-06 Thread Haochen Jiang
Hi all, This patch aims to fix the wrong isa attribute which caused the regression in PR111907. Regtested on x86_64-pc-linux-gnu. Ok for trunk? Thx, Haochen gcc/ChangeLog: PR target/111907 * config/i386/i386.md (avx_noavx512vl): Add missing definition. * config/i386/sse.md

[PATCH 3/4] [PATCH 3/3] Change internal intrin call for AVX512 intrins

2023-10-30 Thread Haochen Jiang
gcc/ChangeLog: * config/i386/avx512bf16vlintrin.h (_mm_avx512_castsi128_ps): New. (_mm256_avx512_castsi256_ps): Ditto. (_mm_avx512_slli_epi32): Ditto. (_mm256_avx512_slli_epi32): Ditto. (_mm_avx512_cvtepi16_epi32): Ditto. (_mm256_avx512_cvtep

[PATCH 2/4] [PATCH 2/3] Change internal intrin call for AVX512 intrins

2023-10-30 Thread Haochen Jiang
gcc/ChangeLog: * config/i386/avx512bf16vlintrin.h: Change intrin call. * config/i386/avx512fintrin.h (_mm_avx512_undefined_ps): New. (_mm_avx512_undefined_pd): Ditto. (__attribute__): Change intrin call. * config/i386/avx512vbmivlintrin.h: Ditto.

[PATCH 4/4] Push no-evex512 target for 128/256 bit intrins

2023-10-30 Thread Haochen Jiang
gcc/ChangeLog: PR target/111889 * config/i386/avx512bf16intrin.h: Push no-evex512 target. * config/i386/avx512bf16vlintrin.h: Ditto. * config/i386/avx512bitalgvlintrin.h: Ditto. * config/i386/avx512bwintrin.h: Ditto. * config/i386/avx512dqintrin.h: D

[PATCH 0/4] Fix no-evex512 function attribute

2023-10-30 Thread Haochen Jiang
Hi all, These four patches fix the no-evex512 function attribute. Details of the issue are at: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111889 My proposal for this problem is to also push "no-evex512" when defining 128/256 intrins in AVX512. Besides, I added some new in

[PATCH] Fix incorrect option mask and avx512cd target push

2023-10-30 Thread Haochen Jiang
Hi all, This patch fixes two obvious bugs in the current evex512 implementation. I also moved the AVX512CD+AVX512VL part out of AVX512VL to avoid accidentally missing avx512cd handling in the future. Ok for trunk? BRs, Haochen gcc/ChangeLog: * config/i386/avx512cdintrin.h (target): Push evex5

[gccwwwdocs PATCH] gcc-13/14: Mention Intel new ISA and march support

2023-10-22 Thread Haochen Jiang
Hi all, This patch mentions recent updates to the x86-64 backend, including updates to the ISAs enabled on previously introduced CPUs and newly introduced options/ISAs/CPUs. Ok for wwwdocs? Thx, Haochen --- htdocs/gcc-13/changes.html | 8 htdocs/gcc-14/changes.html | 19 +++ 2 files
