Re: [PATCH] [x86] Remove unused mmx_pinsrw.

2023-10-20 Thread Uros Bizjak
On Fri, Oct 20, 2023 at 8:54 AM liuhongt wrote: > > While working on enabling more 32/64-bit vectorization for _Float16, > I noticed there's 1 redundant define_expand; the patch removes the expander. > > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. > Ok for trunk? > > gcc/ChangeLog: >

Re: [x86 PATCH] PR target/110511: Fix reg allocation for widening multiplications.

2023-10-19 Thread Uros Bizjak
On Tue, Oct 17, 2023 at 9:05 PM Roger Sayle wrote: > > > This patch contains clean-ups of the widening multiplication patterns in > i386.md, and provides variants of the existing highpart multiplication > peephole2 transformations (that tidy up register allocation after > reload), and thereby fixe

Re: [x86 PATCH] PR 106245: Split (x<<31)>>31 as -(x&1) in i386.md

2023-10-18 Thread Uros Bizjak
On Tue, Oct 17, 2023 at 7:54 PM Roger Sayle wrote: > > > Hi Uros, > Thanks for the speedy review. > > > From: Uros Bizjak > > Sent: 17 October 2023 17:38 > > > > On Tue, Oct 17, 2023 at 3:08 PM Roger Sayle > > wrote: > > > > > > >

Re: [x86 PATCH] PR 106245: Split (x<<31)>>31 as -(x&1) in i386.md

2023-10-17 Thread Uros Bizjak
On Tue, Oct 17, 2023 at 3:08 PM Roger Sayle wrote: > > > This patch is the backend piece of a solution to PRs 101955 and 106245, > that adds a define_insn_and_split to the i386 backend, to perform sign > extension of a single (least significant) bit using AND $1 then NEG. > > Previously, (x<<31)>>
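The identity behind this split can be checked with a small C sketch. The function names are made up for illustration and do not come from the patch; the sketch also assumes GCC's documented behaviour for converting an out-of-range unsigned value to a signed type and for arithmetic right shifts of negative values.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend the least significant bit with the shift pair.
   The left shift is done on an unsigned value to avoid signed-overflow
   undefined behaviour; the cast back to int32_t and the arithmetic
   right shift rely on GCC's implementation-defined behaviour
   (an assumption of this sketch). */
static int32_t sext_bit_shifts(int32_t x)
{
    return (int32_t)((uint32_t)x << 31) >> 31;
}

/* The same result computed as AND with 1 followed by NEG. */
static int32_t sext_bit_and_neg(int32_t x)
{
    return -(x & 1);
}

/* Self-check over a few representative inputs (hypothetical helper). */
static void check_sext_bit(void)
{
    const int32_t samples[] = { 0, 1, 2, 3, -1, -2, INT32_MAX, INT32_MIN };
    for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++)
        assert(sext_bit_shifts(samples[i]) == sext_bit_and_neg(samples[i]));
}
```

Both forms yield 0 for even inputs and -1 for odd inputs.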

Re: [PATCH v5] i386: Allow -mlarge-data-threshold with -mcmodel=large

2023-10-16 Thread Uros Bizjak
On Mon, Oct 16, 2023 at 9:58 PM Fangrui Song wrote: > > On Mon, Oct 16, 2023 at 12:10 PM Uros Bizjak wrote: > > > > On Mon, Oct 16, 2023 at 8:24 PM Fangrui Song wrote: > > > > > > On 2023-10-16, Uros Bizjak wrote: > > > >On Tue, Aug 1, 2023 at 9:5

Re: [PATCH v5] i386: Allow -mlarge-data-threshold with -mcmodel=large

2023-10-16 Thread Uros Bizjak
On Mon, Oct 16, 2023 at 8:24 PM Fangrui Song wrote: > > On 2023-10-16, Uros Bizjak wrote: > >On Tue, Aug 1, 2023 at 9:51 PM Fangrui Song wrote: > >> > >> When using -mcmodel=medium, large data objects larger than the > >> -mlarge-data-threshold thresh

Re: [PATCH v4] i386: Allow -mlarge-data-threshold with -mcmodel=large

2023-10-16 Thread Uros Bizjak
On Tue, Aug 1, 2023 at 9:51 PM Fangrui Song wrote: > > When using -mcmodel=medium, large data objects larger than the > -mlarge-data-threshold threshold are placed into large data sections > (.lrodata, .ldata, .lbss and some variants). GNU ld and ld.lld 17 place > .l* sections into separate outpu

Re: [X86 PATCH] Implement doubleword right shifts by 1 bit using s[ha]r+rcr.

2023-10-09 Thread Uros Bizjak
On Fri, Oct 6, 2023 at 3:59 PM Roger Sayle wrote: > > > Grr! I've done it again. ENOPATCH. > > > -Original Message- > > From: Roger Sayle > > Sent: 06 October 2023 14:58 > > To: 'gcc-patches@gcc.gnu.org' > > Cc: 'Uros Bizja
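As a rough illustration of the s[ha]r+rcr idea in the subject line, the sketch below models a doubleword as two 64-bit halves and passes the bit shifted out of the high word into the top of the low word, which is the role the carry flag plays for rcr. The struct and function names are hypothetical and are not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical two-word value: hi:lo together form a 128-bit integer. */
struct dword { uint64_t lo, hi; };

/* Logical right shift by 1 (shr + rcr): shr on the high word leaves its
   old bit 0 in the carry; rcr rotates that carry into bit 63 of lo. */
static struct dword dword_lshr1(struct dword x)
{
    uint64_t carry = x.hi & 1;
    struct dword r;
    r.hi = x.hi >> 1;
    r.lo = (x.lo >> 1) | (carry << 63);
    return r;
}

/* Arithmetic right shift by 1 (sar + rcr): the high word keeps its sign bit. */
static struct dword dword_ashr1(struct dword x)
{
    uint64_t carry = x.hi & 1;
    struct dword r;
    r.hi = (x.hi >> 1) | (x.hi & (UINT64_C(1) << 63));
    r.lo = (x.lo >> 1) | (carry << 63);
    return r;
}

/* Self-check on one value per variant (hypothetical helper). */
static void check_dword_shr1(void)
{
    struct dword a = { 0x1, 0x3 };
    struct dword r = dword_lshr1(a);
    assert(r.hi == 0x1 && r.lo == UINT64_C(0x8000000000000000));

    struct dword b = { 0x2, UINT64_C(0x8000000000000001) };
    r = dword_ashr1(b);
    assert(r.hi == UINT64_C(0xC000000000000000));
    assert(r.lo == UINT64_C(0x8000000000000001));
}
```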

[COMMITTED] i386: Improve memory copy from named address space [PR111657]

2023-10-05 Thread Uros Bizjak
The stringop strategy selection algorithm falls back to a libcall strategy when it exhausts its pool of available strategies. The memory area copy function (memcpy) is not available from the system library for non-default address spaces, so the compiler emits the most trivial byte-at-a-time copy l

Re: [X86 PATCH] Implement doubleword shift left by 1 bit using add+adc.

2023-10-05 Thread Uros Bizjak
On Thu, Oct 5, 2023 at 1:45 PM Roger Sayle wrote: > > Doh! ENOPATCH. > > > -Original Message- > > From: Roger Sayle > > Sent: 05 October 2023 12:44 > > To: 'gcc-patches@gcc.gnu.org' > > Cc: 'Uros Bizjak' > > Subject: [X8
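The add+adc approach named in the subject can be sketched in C by modelling the doubleword as two 64-bit halves: adding the low word to itself doubles it and produces a carry, which is then folded into the doubled high word. The struct and function names are illustrative assumptions, not names from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical two-word value: hi:lo together form a 128-bit integer. */
struct dword { uint64_t lo, hi; };

/* Shift left by 1 as add (low word) followed by adc (high word). */
static struct dword dword_shl1(struct dword x)
{
    struct dword r;
    r.lo = x.lo + x.lo;            /* add: wraps modulo 2^64 */
    uint64_t carry = r.lo < x.lo;  /* carry out of the low-word add */
    r.hi = x.hi + x.hi + carry;    /* adc: doubles hi and adds the carry */
    return r;
}

/* Self-check: the top bit of lo must move into bit 0 of hi. */
static void check_dword_shl1(void)
{
    struct dword a = { UINT64_C(0x8000000000000001), 0x1 };
    struct dword r = dword_shl1(a);
    assert(r.lo == 0x2);
    assert(r.hi == 0x3);
}
```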

Re: [X86 PATCH] Split lea into shorter left shift by 2 or 3 bits with -Oz.

2023-10-05 Thread Uros Bizjak
On Thu, Oct 5, 2023 at 11:06 AM Roger Sayle wrote: > > > This patch avoids long lea instructions for performing x<<2 and x<<3 > by splitting them into shorter sal and move (or xchg) instructions. > Because this increases the number of instructions, but reduces the > total size, it's suitable for -O

[committed] i386: Handle CONST_WIDE_INT in output_pic_addr_const [PR111340]

2023-09-11 Thread Uros Bizjak via Gcc-patches
PR target/111340 gcc/ChangeLog: * config/i386/i386.cc (output_pic_addr_const): Handle CONST_WIDE_INT. Call output_addr_const for CASE_CONST_SCALAR_INT. gcc/testsuite/ChangeLog: * gcc.target/i386/pr111340.c: New test. Bootstrapped and regression tested on x86_64-linux-gnu {,-m32

Re: [PATCH 01/13] [APX EGPR] middle-end: Add insn argument to base_reg_class

2023-09-06 Thread Uros Bizjak via Gcc-patches
On Wed, Sep 6, 2023 at 9:43 PM Vladimir Makarov wrote: > > > On 9/1/23 05:07, Hongyu Wang wrote: > > Uros Bizjak via Gcc-patches 于2023年8月31日周四 18:16写道: > >> On Thu, Aug 31, 2023 at 10:20 AM Hongyu Wang wrote: > >>> From: Kong Lingling > >>> >

Re: [PATCH 06/13] [APX EGPR] Map reg/mem constraints in inline asm to non-EGPR constraint.

2023-09-04 Thread Uros Bizjak via Gcc-patches
On Mon, Sep 4, 2023 at 2:28 AM Hongtao Liu wrote: > > > > > > > I think there should be some constraint which explicitly has all > > > > > > > the 32 > > > > > > > GPRs, like there is one for just all 16 GPRs (h), so that > > > > > > > regardless of > > > > > > > -mapx-inline-asm-use-gpr32 one

Re: [PATCH 06/13] [APX EGPR] Map reg/mem constraints in inline asm to non-EGPR constraint.

2023-09-01 Thread Uros Bizjak via Gcc-patches
On Fri, Sep 1, 2023 at 12:36 PM Hongtao Liu wrote: > > On Fri, Sep 1, 2023 at 5:38 PM Uros Bizjak via Gcc-patches > wrote: > > > > On Fri, Sep 1, 2023 at 11:10 AM Hongyu Wang wrote: > > > > > > Uros Bizjak via Gcc-patches 于2023年8月31日周四 > > > 18:

Re: [PATCH 06/13] [APX EGPR] Map reg/mem constraints in inline asm to non-EGPR constraint.

2023-09-01 Thread Uros Bizjak via Gcc-patches
On Fri, Sep 1, 2023 at 11:10 AM Hongyu Wang wrote: > > Uros Bizjak via Gcc-patches 于2023年8月31日周四 18:01写道: > > > > On Thu, Aug 31, 2023 at 11:18 AM Jakub Jelinek via Gcc-patches > > wrote: > > > > > > On Thu, Aug 31, 2023 at 04:20:17PM +0800

Re: [PATCH 01/13] [APX EGPR] middle-end: Add insn argument to base_reg_class

2023-08-31 Thread Uros Bizjak via Gcc-patches
On Thu, Aug 31, 2023 at 10:20 AM Hongyu Wang wrote: > > From: Kong Lingling > > Current reload infrastructure does not support selective base_reg_class > for backend insn. Add insn argument to base_reg_class for > lra/reload usage. I don't think this is the correct approach. Ideally, a memory co

Re: [PATCH 09/13] [APX EGPR] Handle legacy insn that only support GPR16 (1/5)

2023-08-31 Thread Uros Bizjak via Gcc-patches
On Thu, Aug 31, 2023 at 10:20 AM Hongyu Wang wrote: > > From: Kong Lingling > > These legacy insn in opcode map0/1 only support GPR16, > and do not have vex/evex counterpart, directly adjust constraints and > add gpr32 attr to patterns. > > insn list: > 1. xsave/xsave64, xrstor/xrstor64 > 2. xsav

Re: [PATCH 06/13] [APX EGPR] Map reg/mem constraints in inline asm to non-EGPR constraint.

2023-08-31 Thread Uros Bizjak via Gcc-patches
On Thu, Aug 31, 2023 at 11:18 AM Jakub Jelinek via Gcc-patches wrote: > > On Thu, Aug 31, 2023 at 04:20:17PM +0800, Hongyu Wang via Gcc-patches wrote: > > From: Kong Lingling > > > > In inline asm, we do not know if the insn can use EGPR, so disable EGPR > > usage by default from mapping the comm

[PATCH] fortran: Rename TRUE/FALSE to true/false in *.cc files

2023-08-25 Thread Uros Bizjak via Gcc-patches
gcc/fortran/ChangeLog: * match.cc (gfc_match_equivalence): Rename TRUE/FALSE to true/false. * module.cc (check_access): Ditto. * primary.cc (match_real_constant): Ditto. * trans-array.cc (gfc_trans_allocate_array_storage): Ditto. (get_array_ctor_strlen): Ditto. * trans-comm

[committed] treewide: Rename TRUE/FALSE to true/false in *.cc files

2023-08-25 Thread Uros Bizjak via Gcc-patches
gcc/c-family/ChangeLog: * c-format.cc (read_any_format_width): Rename TRUE/FALSE to true/false. gcc/ChangeLog: * caller-save.cc (new_saved_hard_reg): Rename TRUE/FALSE to true/false. (setup_save_areas): Ditto. * gcc.cc (set_collect_gcc_options): Ditto. (driver::build_

[committed] i386: Optimize pinsrq of 0 with index 1 into movq [PR94866]

2023-08-24 Thread Uros Bizjak via Gcc-patches
Add new pattern involving vec_merge RTX that is produced by combine from the combination of sse4_1_pinsrq and *movdi_internal: 7: r86:DI=0 8: r85:V2DI=vec_merge(vec_duplicate(r86:DI),r87:V2DI,0x2) REG_DEAD r87:V2DI REG_DEAD r86:DI Successfully matched this instruction: (set (re

Re: [PATCH 6/12] i386: Enable _BitInt on x86-64 [PR102989]

2023-08-23 Thread Uros Bizjak via Gcc-patches
On Wed, Aug 9, 2023 at 8:19 PM Jakub Jelinek wrote: > > Hi! > > The following patch enables _BitInt support on x86-64, the only > target which has _BitInt specified in psABI. > > 2023-08-09 Jakub Jelinek > > PR c/102989 > * config/i386/i386.cc (classify_argument): Handle BITINT_

[committed] i386: Fix register spill failure with concat RTX [PR111010]

2023-08-23 Thread Uros Bizjak via Gcc-patches
Disable (=&r,m,m) alternative for 32-bit targets. The combination of two memory operands (possibly with complex addressing mode), early clobbered output, frame pointer and PIC registers uses too many registers on a register constrained 32-bit target. Also merge two similar patterns using DWIH mode

[committed] i386: Micro-optimize ix86_expand_sse_extend

2023-08-20 Thread Uros Bizjak via Gcc-patches
Partial vector src is forced to a register as ops[1], we can use it instead of SRC in the call to ix86_expand_sse_cmp. This change avoids forcing operand[1] to a register in sign/zero-extend expanders. gcc/ChangeLog: * config/i386/i386-expand.cc (ix86_expand_sse_extend): Use ops[1] inste

[committed]: i386: Use PUNPCKL?? to implement vector extend and zero_extend for TARGET_SSE2 [PR111023]

2023-08-18 Thread Uros Bizjak via Gcc-patches
Implement vector extend and zero_extend functionality for TARGET_SSE2 using PUNPCKL?? family of instructions. The code for e.g. zero-extend from V2SI to V2DImode improves from: movd %xmm0, %edx pshufd $85, %xmm0, %xmm0 movd %xmm0, %eax movq %rdx, (%rdi)

Re: [PATCH] Generate vmovapd instead of vmovsd for moving DFmode between SSE_REGS.

2023-08-14 Thread Uros Bizjak via Gcc-patches
On Mon, Aug 14, 2023 at 4:46 AM liuhongt via Gcc-patches wrote: > > vmovapd can enable register renaming and have same code size as > vmovsd. Similar for vmovsh vs vmovaps, vmovaps is 1 byte less than > vmovsh. > > When TARGET_AVX512VL is not available, still generate > vmovsd/vmovss/vmovsh to avo

Re: [PATCH] Support -m[no-]gather -m[no-]scatter to enable/disable vectorization for all gather/scatter instructions.

2023-08-10 Thread Uros Bizjak via Gcc-patches
On Thu, Aug 10, 2023 at 9:40 AM Richard Biener wrote: > > On Thu, Aug 10, 2023 at 3:13 AM liuhongt wrote: > > > > Currently we have 3 different independent tunes for gather > > "use_gather,use_gather_2parts,use_gather_4parts", > > similar for scatter, there're > > "use_scatter,use_scatter_2parts,

Re: [PATCH] Support -m[no-]gather -m[no-]scatter to enable/disable vectorization for all gather/scatter instructions.

2023-08-09 Thread Uros Bizjak via Gcc-patches
On Thu, Aug 10, 2023 at 3:13 AM liuhongt wrote: > > Currently we have 3 different independent tunes for gather > "use_gather,use_gather_2parts,use_gather_4parts", > similar for scatter, there're > "use_scatter,use_scatter_2parts,use_scatter_4parts" > > The patch support 2 standardizing options to

Re: [PATCH] i386: Do not sanitize upper part of V2HFmode and V4HFmode reg with -fno-trapping-math [PR110832]

2023-08-09 Thread Uros Bizjak via Gcc-patches
On Thu, Aug 10, 2023 at 2:49 AM liuhongt wrote: > > Also add ix86_partial_vec_fp_math to the condition of V2HF/V4HF named > patterns in order to avoid generation of partial vector V8HFmode > trapping instructions. > > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,} > Ok for trunk? > > gcc/

Re: [PATCH] i386: Clear upper bits of XMM register for V4HFmode/V2HFmode operations [PR110762]

2023-08-09 Thread Uros Bizjak via Gcc-patches
On Mon, Aug 7, 2023 at 1:20 PM Richard Biener wrote: > > Please also note the RFC patch [1] that relaxes clears for V2SFmode > > with -fno-trapping-math. The patched compiler will then emit the same > > code as clang does for -O2. Which raises another question - should gcc > > default to -fno-tra

Re: [PATCH V2] [X86] Workaround possible CPUID bug in Sandy Bridge.

2023-08-08 Thread Uros Bizjak via Gcc-patches
On Wed, Aug 9, 2023 at 8:38 AM Uros Bizjak wrote: > > On Wed, Aug 9, 2023 at 8:37 AM Liu, Hongtao wrote: > > > > > > > > > -Original Message- > > > From: Uros Bizjak > > > Sent: Wednesday, August 9, 2023 2:33 PM > > > To: Liu,

Re: [PATCH V2] [X86] Workaround possible CPUID bug in Sandy Bridge.

2023-08-08 Thread Uros Bizjak via Gcc-patches
On Wed, Aug 9, 2023 at 8:37 AM Liu, Hongtao wrote: > > > > > -Original Message- > > From: Uros Bizjak > > Sent: Wednesday, August 9, 2023 2:33 PM > > To: Liu, Hongtao > > Cc: gcc-patches@gcc.gnu.org > > Subject: Re: [PATCH V2] [X86] Work

Re: [PATCH V2] [X86] Workaround possible CPUID bug in Sandy Bridge.

2023-08-08 Thread Uros Bizjak via Gcc-patches
On Wed, Aug 9, 2023 at 3:48 AM liuhongt wrote: > > > Please rather do it in a more self-descriptive way, as proposed in the > > attached patch. You won't need a comment then. > > > > Adjusted in V2 patch. > > Don't access leaf 7 subleaf 1 unless subleaf 0 says it is > supported via EAX. > > Intel

Re: [PATCH V2] [X86] Workaround possible CPUID bug in Sandy Bridge.

2023-08-08 Thread Uros Bizjak via Gcc-patches
On Wed, Aug 9, 2023 at 3:48 AM liuhongt wrote: > > > Please rather do it in a more self-descriptive way, as proposed in the > > attached patch. You won't need a comment then. > > > > Adjusted in V2 patch. > > Don't access leaf 7 subleaf 1 unless subleaf 0 says it is > supported via EAX. > > Intel

[committed] i386: Do not sanitize upper part of V2SFmode reg with -fno-trapping-math [PR110832]

2023-08-08 Thread Uros Bizjak via Gcc-patches
Also introduce -m[no-]partial-vector-fp-math option to disable trapping V2SF named patterns in order to avoid generation of partial vector V4SFmode trapping instructions. The new option is enabled by default, because even with sanitization, a small but consistent speed up of 2 to 3% with Polyhedro

Re: [PATCH] [X86] Workaround possible CPUID bug in Sandy Bridge.

2023-08-08 Thread Uros Bizjak via Gcc-patches
On Tue, Aug 8, 2023 at 9:58 AM liuhongt wrote: > > Don't access leaf 7 subleaf 1 unless subleaf 0 says it is > supported via EAX. > > Intel documentation says invalid subleaves return 0. We had been > relying on that behavior instead of checking the max subleaf number. > > It appears that some Sand
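The guard described in this message can be sketched as follows. To keep the example portable and testable, the CPUID instruction is replaced by a hypothetical callback, and the two "fake" CPUs are simulations invented for this sketch; none of these names come from the patch or from GCC's cpuid.h. The sketch also assumes the caller has already verified that leaf 7 itself is available.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for CPUID: fills regs[0..3] with EAX, EBX, ECX,
   EDX for the given (leaf, subleaf). */
typedef void (*cpuid_fn)(uint32_t leaf, uint32_t subleaf, uint32_t regs[4]);

/* Query leaf 7 subleaf 1 only when subleaf 0 reports, in EAX, a maximum
   subleaf of at least 1; otherwise return all-zero registers.
   Returns 1 if subleaf 1 was actually queried, 0 otherwise. */
static int read_leaf7_subleaf1(cpuid_fn cpuid, uint32_t out[4])
{
    uint32_t regs[4];
    cpuid(7, 0, regs);
    if (regs[0] < 1) {
        memset(out, 0, 4 * sizeof out[0]);
        return 0;
    }
    cpuid(7, 1, out);
    return 1;
}

/* Simulated CPU: reports max subleaf 0 but returns non-zero junk when
   subleaf 1 is queried anyway. */
static void fake_quirky_cpu(uint32_t leaf, uint32_t subleaf, uint32_t regs[4])
{
    (void)leaf;
    regs[0] = regs[1] = regs[2] = regs[3] = subleaf ? 0xdeadbeefu : 0;
}

/* Simulated CPU: reports max subleaf 1 and a made-up feature bit in it. */
static void fake_modern_cpu(uint32_t leaf, uint32_t subleaf, uint32_t regs[4])
{
    (void)leaf;
    regs[0] = subleaf == 0 ? 1 : 0x10;
    regs[1] = regs[2] = regs[3] = 0;
}
```

With the guard in place, the junk returned by the simulated quirky CPU never reaches the caller, because subleaf 1 is not queried at all.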

Re: [RFC PATCH] i386: Do not sanitize upper part of V2SFmode reg with -fno-trapping-math [PR110832]

2023-08-08 Thread Uros Bizjak via Gcc-patches
On Tue, Aug 8, 2023 at 12:08 PM Richard Biener wrote: > > > > > > Also introduce -m[no-]mmxfp-with-sse option to disable trapping V2SF > > > > > > named patterns in order to avoid generation of partial vector > > > > > > V4SFmode > > > > > > trapping instructions. > > > > > > > > > > > > The new

Re: [RFC PATCH] i386: Do not sanitize upper part of V2SFmode reg with -fno-trapping-math [PR110832]

2023-08-08 Thread Uros Bizjak via Gcc-patches
On Tue, Aug 8, 2023 at 10:07 AM Richard Biener wrote: > > On Mon, 7 Aug 2023, Uros Bizjak wrote: > > > On Mon, Jul 31, 2023 at 11:40 AM Richard Biener wrote: > > > > > > On Sun, 30 Jul 2023, Uros Bizjak wrote: > > > > > > > Also introduce

Re: [RFC PATCH] i386: Do not sanitize upper part of V2SFmode reg with -fno-trapping-math [PR110832]

2023-08-07 Thread Uros Bizjak via Gcc-patches
On Mon, Jul 31, 2023 at 11:40 AM Richard Biener wrote: > > On Sun, 30 Jul 2023, Uros Bizjak wrote: > > > Also introduce -m[no-]mmxfp-with-sse option to disable trapping V2SF > > named patterns in order to avoid generation of partial vector V4SFmode > > trapping in

Re: PR target/107671: Make more use of btl/btq on x86_64.

2023-08-07 Thread Uros Bizjak via Gcc-patches
with and without --target_board=unix{-m32} > with no new failures. Ok for mainline? > > > 2023-08-07 Roger Sayle > Uros Bizjak > > gcc/ChangeLog > PR target/107671 > * config/i386/i386.md (*bt_setc_mask): Allow the > shift count to

Re: [PATCH] i386: Clear upper bits of XMM register for V4HFmode/V2HFmode operations [PR110762]

2023-08-07 Thread Uros Bizjak via Gcc-patches
On Mon, Aug 7, 2023 at 10:57 AM liuhongt wrote: > > Similar like r14-2786-gade30fad6669e5, the patch is for V4HF/V2HFmode. > > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. > Ok for trunk? > > gcc/ChangeLog: > > PR target/110762 > * config/i386/mmx.md (3): Changed from

Re: [x86 PATCH] Split SUBREGs of SSE vector registers into vec_select insns.

2023-08-03 Thread Uros Bizjak via Gcc-patches
On Thu, Aug 3, 2023 at 9:10 AM Roger Sayle wrote: > > > This patch is the final piece in the series to improve the ABI issues > affecting PR 88873. The previous patches tackled inserting DFmode > values into V2DFmode registers, by introducing insvti_{low,high}part > patterns. This patch improves

Re: [x86 PATCH] PR target/110792: Early clobber issues with rot32di2_doubleword.

2023-08-02 Thread Uros Bizjak via Gcc-patches
On Thu, Aug 3, 2023 at 12:18 AM Roger Sayle wrote: > > > This patch is a conservative fix for PR target/110792, a wrong-code > regression affecting doubleword rotations by BITS_PER_WORD, which > effectively swaps the highpart and lowpart words, when the source to be > rotated resides in memory. Th
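The observation that a doubleword rotation by BITS_PER_WORD just swaps the two words can be illustrated in C. The sketch assumes a 32-bit word, so the doubleword is a 64-bit value, and the function names are invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 64-bit doubleword by 32 bits using shifts. */
static uint64_t rot32_by_rotate(uint64_t x)
{
    return (x << 32) | (x >> 32);
}

/* The same operation expressed as swapping the high and low words. */
static uint64_t rot32_by_swap(uint64_t x)
{
    uint32_t lo = (uint32_t)x;
    uint32_t hi = (uint32_t)(x >> 32);
    return ((uint64_t)lo << 32) | hi;
}

/* Self-check on a value with distinct halves (hypothetical helper). */
static void check_rot32(void)
{
    uint64_t x = UINT64_C(0x1122334455667788);
    assert(rot32_by_rotate(x) == UINT64_C(0x5566778811223344));
    assert(rot32_by_swap(x) == rot32_by_rotate(x));
}
```

Because each output word comes from the opposite input word, an implementation that writes one output word before reading both inputs can overwrite a value it still needs, which is the kind of hazard an early-clobber constraint guards against.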

Re: [PATCH] Optimize vlddqu + inserti128 to vbroadcasti128

2023-08-01 Thread Uros Bizjak via Gcc-patches
On Wed, Aug 2, 2023 at 3:33 AM liuhongt wrote: > > In [1], I propose a patch to generate vmovdqu for all vlddqu intrinsics > after AVX2, it's rejected as > > The instruction is reachable only as __builtin_ia32_lddqu* (aka > > _mm_lddqu_si*), so it was chosen by the programmer for a reason. I > > t

Re: [RFC PATCH] i386: Do not sanitize upper part of V2SFmode reg with -fno-trapping-math [PR110832]

2023-07-31 Thread Uros Bizjak via Gcc-patches
On Mon, Jul 31, 2023 at 11:40 AM Richard Biener wrote: > > On Sun, 30 Jul 2023, Uros Bizjak wrote: > > > Also introduce -m[no-]mmxfp-with-sse option to disable trapping V2SF > > named patterns in order to avoid generation of partial vector V4SFmode > > trapping in

[RFC PATCH] i386: Do not sanitize upper part of V2SFmode reg with -fno-trapping-math [PR110832]

2023-07-30 Thread Uros Bizjak via Gcc-patches
Also introduce -m[no-]mmxfp-with-sse option to disable trapping V2SF named patterns in order to avoid generation of partial vector V4SFmode trapping instructions. The new option is enabled by default, because even with sanitization, a small but consistent speed up of 2 to 3% with Polyhedron capaci

[committed] testsuite: Fix gfortran.dg/ieee/comparisons_3.F90 testsuite failures

2023-07-26 Thread Uros Bizjak via Gcc-patches
The testcase should use dg-additional-options instead of dg-options to not overwrite default compile flags that include path for finding the IEEE modules. gcc/testsuite/ChangeLog: * gfortran.dg/ieee/comparisons_3.F90: Use dg-additional-options instead of dg-options. Tested on x86_64-linu

[committed] i386: Clear upper half of XMM register for V2SFmode operations [PR110762]

2023-07-26 Thread Uros Bizjak via Gcc-patches
Clear the upper half of a V4SFmode operand register in front of all potentially trapping instructions. The testcase: --cut here-- typedef float v2sf __attribute__((vector_size(8))); typedef float v4sf __attribute__((vector_size(16))); v2sf test(v4sf x, v4sf y) { v2sf x2, y2; x2 = __builtin_s

Re: [x86 PATCH] Don't use insvti_{high, low}part with -O0 (for compile-time).

2023-07-22 Thread Uros Bizjak via Gcc-patches
On Sat, Jul 22, 2023 at 4:17 PM Roger Sayle wrote: > > > This patch attempts to help with PR rtl-optimization/110587, a regression > of -O0 compile time for the pathological pr28071.c. My recent patch helps > a bit, but hasn't returned -O0 compile-time to where it was before my > ix86_expand_move

Re: [x86 PATCH] Use QImode for offsets in zero_extract/sign_extract in i386.md

2023-07-22 Thread Uros Bizjak via Gcc-patches
On Sat, Jul 22, 2023 at 5:37 PM Roger Sayle wrote: > > > As suggested by Uros, this patch changes the ZERO_EXTRACTs and SIGN_EXTRACTs > in i386.md to consistently use QImode for bit offsets (i.e. third and fourth > operands), matching the use of QImode for bit counts in shifts and rotates. > > The

[committed] i386: Double-word sign-extension missed-optimization [PR110717]

2023-07-20 Thread Uros Bizjak via Gcc-patches
When sign-extending the value in a double-word register pair using shift and ashiftrt sequence with the same count immediate value less than word width, there is no need to shift the lower word of the value. The sign-extension could be limited to the upper word, but we uselessly shift the lower wor
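A minimal C sketch of this observation, assuming a 32-bit word so that the double-word value is 64 bits wide: for a shift count smaller than the word width, shifting only the upper word gives the same result as shifting the whole pair. The function names are hypothetical, and the casts rely on GCC's implementation-defined behaviour for unsigned-to-signed conversion and arithmetic right shift.

```c
#include <assert.h>
#include <stdint.h>

/* Reference: (x << c) >> c on the full 64-bit value, for 0 <= c < 32. */
static int64_t sext_full(int64_t x, unsigned c)
{
    return (int64_t)((uint64_t)x << c) >> c;
}

/* Word-wise variant: only the upper 32-bit word is shifted left and then
   arithmetically right; the lower word is copied unchanged. */
static int64_t sext_upper_only(int64_t x, unsigned c)
{
    uint32_t lo = (uint32_t)x;
    uint32_t hi_in = (uint32_t)((uint64_t)x >> 32);
    int32_t hi = (int32_t)(hi_in << c) >> c;
    return (int64_t)(((uint64_t)(uint32_t)hi << 32) | lo);
}

/* Self-check across all counts below the word width (hypothetical helper). */
static void check_sext_dword(void)
{
    const int64_t samples[] = {
        0, 1, -1, INT64_C(0x00F0000012345678),
        INT64_C(0x7FFFFFFFFFFFFFFF), INT64_C(0x0000000180000000)
    };
    for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++)
        for (unsigned c = 0; c < 32; c++)
            assert(sext_full(samples[i], c) == sext_upper_only(samples[i], c));
}
```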

Re: [PATCH] Optimize vlddqu to vmovdqu for TARGET_AVX

2023-07-20 Thread Uros Bizjak via Gcc-patches
On Thu, Jul 20, 2023 at 9:35 AM liuhongt wrote: > > For Intel processors, after TARGET_AVX, vmovdqu is optimized as fast > as vlddqu, UNSPEC_LDDQU can be removed to enable more optimizations. > Can someone confirm this with AMD folks? > If AMD doesn't like such optimization, I'll put my optimizati

Re: [x86_64 PATCH] More TImode parameter passing improvements.

2023-07-20 Thread Uros Bizjak via Gcc-patches
On Thu, Jul 20, 2023 at 9:44 AM Roger Sayle wrote: > > > Hi Uros, > > > From: Uros Bizjak > > Sent: 20 July 2023 07:50 > > > > On Wed, Jul 19, 2023 at 10:07 PM Roger Sayle > > wrote: > > > > > > This patch is the next piece of a solut

Re: [x86_64 PATCH] More TImode parameter passing improvements.

2023-07-19 Thread Uros Bizjak via Gcc-patches
On Wed, Jul 19, 2023 at 10:07 PM Roger Sayle wrote: > > > This patch is the next piece of a solution to the x86_64 ABI issues in > PR 88873. This splits the *concat3_3 define_insn_and_split > into two patterns, a TARGET_64BIT *concatditi3_3 and a !TARGET_64BIT > *concatsidi3_3. This allows us to

Re: [GCC 13 PATCH] PR target/109973: CCZmode and CCCmode variants of [v]ptest.

2023-07-19 Thread Uros Bizjak via Gcc-patches
tested reverting > r13-2006-ga56c1641e9d25e successfully. Can we choose between the > options please? Sorry I'm only bringing this up now but 13.2 RC is due > tomorrow. > > Thank you, > Richard. > > > > > > > 2023-06-10 Roger Sayle >

[committed] dwarf2: Change return type of predicate functions from int to bool

2023-07-18 Thread Uros Bizjak via Gcc-patches
Also change some internal variables and function arguments from int to bool. gcc/ChangeLog: * dwarf2asm.cc: Change FALSE to false. * dwarf2cfi.cc (execute_dwarf2_frame): Change return type to void. * dwarf2out.cc (matches_main_base): Change return type from int to bool. Change "l

[committed] combine: Change return type of predicate functions from int to bool

2023-07-17 Thread Uros Bizjak via Gcc-patches
Also change some internal variables and function arguments from int to bool. gcc/ChangeLog: * combine.cc (struct reg_stat_type): Change last_set_invalid to bool. (cant_combine_insn_p): Change return type from int to bool and adjust function body accordingly. (can_combine_p): Ditto

Re: [PATCH 1/2] [i386] Support type _Float16/__bf16 independent of SSE2.

2023-07-17 Thread Uros Bizjak via Gcc-patches
On Mon, Jul 17, 2023 at 10:28 AM Hongtao Liu wrote: > > I'd like to ping for this patch (only patch 1/2, for patch 2/2, I > think that may not be necessary). > > On Mon, May 15, 2023 at 9:20 AM Hongtao Liu wrote: > > > > ping. > > > > On Fri, Apr 21, 2023 at 9:55 PM liuhongt wrote: > > > > > > >

Re: [PATCH] Add peephole to eliminate redundant comparison after cmpccxadd.

2023-07-17 Thread Uros Bizjak via Gcc-patches
On Mon, Jul 17, 2023 at 8:44 AM Hongtao Liu wrote: > > Ping. > > On Tue, Jul 11, 2023 at 5:16 PM liuhongt via Gcc-patches > wrote: > > > > Similar like we did for CMPXCHG, but extended to all > > ix86_comparison_int_operator since CMPCCXADD set EFLAGS exactly same > > as CMP. > > > > When operand

Re: [PATCH] x86: replace "extendhfdf2" expander

2023-07-14 Thread Uros Bizjak via Gcc-patches
On Fri, Jul 14, 2023 at 11:44 AM Jan Beulich wrote: > > The corresponding insn serves this purpose quite fine, and leads to > slightly less (generated) code. All we need is the insn to not have a > leading * in its name, while retaining that * for "extendhfsf2". > Introduce a mode attribute in exc

Re: [x86 PATCH] PR target/110588: Add *bt_setncqi_2 to generate btl

2023-07-14 Thread Uros Bizjak via Gcc-patches
On Fri, Jul 14, 2023 at 11:27 AM Roger Sayle wrote: > > > > From: Uros Bizjak > > Sent: 13 July 2023 19:21 > > > > On Thu, Jul 13, 2023 at 7:10 PM Roger Sayle > > wrote: > > > > > > This patch resolves PR target/110588 to catch another

Re: [PATCH] cprop: Do not set REG_EQUAL note when simplifying paradoxical subreg [PR110206]

2023-07-14 Thread Uros Bizjak via Gcc-patches
On Fri, Jul 14, 2023 at 10:53 AM Richard Biener wrote: > > On Fri, 14 Jul 2023, Uros Bizjak wrote: > > > On Fri, Jul 14, 2023 at 10:31 AM Richard Biener wrote: > > > > > > On Fri, 14 Jul 2023, Uros Bizjak wrote: > > > > > > > cprop1 pass

Re: [PATCH] cprop: Do not set REG_EQUAL note when simplifying paradoxical subreg [PR110206]

2023-07-14 Thread Uros Bizjak via Gcc-patches
On Fri, Jul 14, 2023 at 10:31 AM Richard Biener wrote: > > On Fri, 14 Jul 2023, Uros Bizjak wrote: > > > cprop1 pass does not consider paradoxical subreg and for (insn 22) claims > > that it equals 8 elements of HImode by setting REG_EQUAL note: > > > > (

Re: [x86_64 PATCH] Improved insv of DImode/DFmode {high,low}parts into TImode.

2023-07-14 Thread Uros Bizjak via Gcc-patches
On Thu, Jul 13, 2023 at 6:45 PM Roger Sayle wrote: > > > This is the next piece towards a fix for (the x86_64 ABI issues affecting) > PR 88873. This patch generalizes the recent tweak to ix86_expand_move > for setting the highpart of a TImode reg from a DImode source using > *insvti_highpart_1, t

Re: [PATCH] i386: Auto vectorize usdot_prod, udot_prod with AVXVNNIINT16 instruction.

2023-07-14 Thread Uros Bizjak via Gcc-patches
On Fri, Jul 14, 2023 at 8:24 AM Haochen Jiang wrote: > > Hi all, > > This patch aims to auto vectorize usdot_prod and udot_prod with newly > introduced AVX-VNNI-INT16. > > Also I refined the redundant mode iterator in the patch. > > Regtested on x86_64-pc-linux-gnu. Ok for trunk after AVX-VNNI-INT

[PATCH] cprop: Do not set REG_EQUAL note when simplifying paradoxical subreg [PR110206]

2023-07-13 Thread Uros Bizjak via Gcc-patches
cprop1 pass does not consider paradoxical subreg and for (insn 22) claims that it equals 8 elements of HImode by setting REG_EQUAL note: (insn 21 19 22 4 (set (reg:V4QI 98) (mem/u/c:V4QI (symbol_ref/u:DI ("*.LC1") [flags 0x2]) [0 S4 A32])) "pr110206.c":12:42 1530 {*movv4qi_internal} (

Re: [x86 PATCH] PR target/110588: Add *bt_setncqi_2 to generate btl

2023-07-13 Thread Uros Bizjak via Gcc-patches
On Thu, Jul 13, 2023 at 7:10 PM Roger Sayle wrote: > > > This patch resolves PR target/110588 to catch another case in combine > where the i386 backend should be generating a btl instruction. This adds > another define_insn_and_split to recognize the RTL representation for this > case. > > I also

[committed] alpha: Fix computation mode in alpha_emit_set_long_const [PR106966]

2023-07-13 Thread Uros Bizjak via Gcc-patches
PR target/106966 gcc/ChangeLog: * config/alpha/alpha.cc (alpha_emit_set_long_const): Always use DImode when constructing long const. gcc/testsuite/ChangeLog: * gcc.target/alpha/pr106966.c: New test. Bootstrapped and regression tested by Matthias on alpha-linux-gnu. Uros. diff

[committed] IRA+LRA: Change return type of predicate functions from int to bool

2023-07-12 Thread Uros Bizjak via Gcc-patches
gcc/ChangeLog: * ira.cc (equiv_init_varies_p): Change return type from int to bool and adjust function body accordingly. (equiv_init_movable_p): Ditto. (memref_used_between_p): Ditto. * lra-constraints.cc (valid_address_p): Ditto. Bootstrapped and regression tested on x86_64-l

[committed] ifcvt: Change return type of predicate functions from int to bool

2023-07-12 Thread Uros Bizjak via Gcc-patches
Also change some internal variables and function arguments from int to bool. gcc/ChangeLog: * ifcvt.cc (cond_exec_changed_p): Change variable to bool. (last_active_insn): Change "skip_use_p" function argument to bool. (noce_operand_ok): Change return type from int to bool. (find_c

Re: [PATCH] simplify-rtx: Fix invalid simplification with paradoxical subregs [PR110206]

2023-07-12 Thread Uros Bizjak via Gcc-patches
On Wed, Jul 12, 2023 at 12:58 PM Uros Bizjak wrote: > > On Wed, Jul 12, 2023 at 12:23 PM Richard Sandiford > wrote: > > > > Richard Biener via Gcc-patches writes: > > > On Mon, Jul 10, 2023 at 1:01 PM Uros Bizjak wrote: > > >> > > >

Re: [PATCH] simplify-rtx: Fix invalid simplification with paradoxical subregs [PR110206]

2023-07-12 Thread Uros Bizjak via Gcc-patches
On Wed, Jul 12, 2023 at 12:23 PM Richard Sandiford wrote: > > Richard Biener via Gcc-patches writes: > > On Mon, Jul 10, 2023 at 1:01 PM Uros Bizjak wrote: > >> > >> On Mon, Jul 10, 2023 at 11:47 AM Richard Biener > >> wrote: > >> > > >

Re: [x86 PATCH] Fix FAIL of gcc.target/i386/pr91681-1.c

2023-07-12 Thread Uros Bizjak via Gcc-patches
On Tue, Jul 11, 2023 at 10:07 PM Roger Sayle wrote: > > > The recent change in TImode parameter passing on x86_64 results in the > FAIL of pr91681-1.c. The issue is that with the extra flexibility, > the combine pass is now spoilt for choice between using either the > *add3_doubleword_concat or t

Re: [x86 PATCH] PR target/110598: Fix rega = 0; rega ^= rega regression.

2023-07-12 Thread Uros Bizjak via Gcc-patches
On Tue, Jul 11, 2023 at 9:07 PM Roger Sayle wrote: > > > This patch fixes the regression PR target/110598 caused by my recent > addition of a peephole2. The intention of that optimization was to > simplify zeroing a register, followed by an IOR, XOR or PLUS operation > on it into a move, or as de

[committed] cfg+gcse: Change return type of predicate functions from int to bool

2023-07-11 Thread Uros Bizjak via Gcc-patches
Also change some internal variables from int to bool. gcc/ChangeLog: * cfghooks.cc (verify_flow_info): Change "err" variable to bool. * cfghooks.h (struct cfg_hooks): Change return type of verify_flow_info from integer to bool. * cfgrtl.cc (can_delete_note_p): Change return type f

[committed] reorg: Change return type of predicate functions from int to bool

2023-07-10 Thread Uros Bizjak via Gcc-patches
Also change some internal variables and function arguments from int to bool. gcc/ChangeLog: * reorg.cc (stop_search_p): Change return type from int to bool and adjust function body accordingly. (resource_conflicts_p): Ditto. (insn_references_resource_p): Change return type from in

Re: [PATCH] simplify-rtx: Fix invalid simplification with paradoxical subregs [PR110206]

2023-07-10 Thread Uros Bizjak via Gcc-patches
On Mon, Jul 10, 2023 at 11:47 AM Richard Biener wrote: > > On Mon, Jul 10, 2023 at 11:26 AM Uros Bizjak wrote: > > > > On Mon, Jul 10, 2023 at 11:17 AM Richard Biener > > wrote: > > > > > > On Sun, Jul 9, 2023 at 10:53 AM Uros Bizjak via Gcc-patches >

Re: [PATCH] simplify-rtx: Fix invalid simplification with paradoxical subregs [PR110206]

2023-07-10 Thread Uros Bizjak via Gcc-patches
On Mon, Jul 10, 2023 at 11:17 AM Richard Biener wrote: > > On Sun, Jul 9, 2023 at 10:53 AM Uros Bizjak via Gcc-patches > wrote: > > > > As shown in the PR, simplify_gen_subreg call in simplify_replace_fn_rtx: > > > > (gdb) list > > 469 if (code =

Re: [X86 PATCH] Add new insvti_lowpart_1 and insvdi_lowpart_1 patterns.

2023-07-09 Thread Uros Bizjak via Gcc-patches
On Sun, Jul 9, 2023 at 11:30 PM Roger Sayle wrote: > > > This patch implements another of Uros' suggestions, to investigate a > insvti_lowpart_1 pattern to improve TImode parameter passing on x86_64. > In PR 88873, the RTL the middle-end expands for passing V2DF in TImode > is subtly different fro

Re: [x86 PATCH] Add AVX512 support for STV of SI/DImode rotation by constant.

2023-07-09 Thread Uros Bizjak via Gcc-patches
On Sun, Jul 9, 2023 at 10:35 PM Roger Sayle wrote: > > > Following Uros' suggestion, this patch adds support for AVX512VL's > vpro[lr][dq] instructions to the recently added scalar-to-vector (STV) > enhancements to handle DImode and SImode rotations by a constant. > > For the test cases: > > unsig

[PATCH] simplify-rtx: Fix invalid simplification with paradoxical subregs [PR110206]

2023-07-09 Thread Uros Bizjak via Gcc-patches
As shown in the PR, simplify_gen_subreg call in simplify_replace_fn_rtx: (gdb) list 469 if (code == SUBREG) 470 { 471 op0 = simplify_replace_fn_rtx (SUBREG_REG (x), old_rtx, fn, data); 472 if (op0 == SUBREG_REG (x)) 473 return x; 47

[committed] cprop: Change return type of predicate functions from int to bool

2023-07-08 Thread Uros Bizjak via Gcc-patches
Also change some internal variables from int to bool. gcc/ChangeLog: * cprop.cc (reg_available_p): Change return type from int to bool. (reg_not_set_p): Ditto. (try_replace_reg): Ditto. Change "success" variable to bool. (cprop_jump): Change return type from int to void and a

[committed] gcse: Change return type of predicate functions from int to bool

2023-07-08 Thread Uros Bizjak via Gcc-patches
Also change some internal variables and function arguments from int to bool. gcc/ChangeLog: * gcse.cc (expr_equiv_p): Change return type from int to bool. (oprs_unchanged_p): Change return type from int to void and adjust function body accordingly. (oprs_anticipatable_p): Ditto.

Re: [PATCH V2] [x86] Add pre_reload splitter to detect fp min/max pattern.

2023-07-06 Thread Uros Bizjak via Gcc-patches
On Fri, Jul 7, 2023 at 7:31 AM liuhongt wrote: > > > Please split the above pattern into two, one emitting UNSPEC_IEEE_MAX > > and the other emitting UNSPEC_IEEE_MIN. > Splitted. > > > The test involves blendv instruction, which is SSE4.1, so it is > > pointless to test it without -msse4.1. Please

Re: [x86_64 PATCH] Improve __int128 argument passing (in ix86_expand_move).

2023-07-06 Thread Uros Bizjak via Gcc-patches
On Thu, Jul 6, 2023 at 3:48 PM Roger Sayle wrote: > > > On Thu, Jul 6, 2023 at 2:04 PM Roger Sayle > > wrote: > > > > > > > > > Passing 128-bit integer (TImode) parameters on x86_64 can sometimes > > > result in surprising code. Consider the example below (from PR 43644): > > > > > > __uint128 f

Re: [x86_64 PATCH] Improve __int128 argument passing (in ix86_expand_move).

2023-07-06 Thread Uros Bizjak via Gcc-patches
On Thu, Jul 6, 2023 at 2:04 PM Roger Sayle wrote: > > > Passing 128-bit integer (TImode) parameters on x86_64 can sometimes > result in surprising code. Consider the example below (from PR 43644): > > __uint128 foo(__uint128 x, unsigned long long y) { > return x+y; > } > > which currently resul

Re: [PATCH] i386: Update document for inlining rules

2023-07-06 Thread Uros Bizjak via Gcc-patches
On Thu, Jul 6, 2023 at 8:39 AM Hongyu Wang wrote: > > Hi, > > This is a follow-up patch for > https://gcc.gnu.org/pipermail/gcc-patches/2023-July/623525.html > that updates document about x86 inlining rules. > > Ok for trunk? > > gcc/ChangeLog: > > * doc/extend.texi: Move x86 inlining rule

Re: [PATCH 1/2] [x86] Add pre_reload splitter to detect fp min/max pattern.

2023-07-05 Thread Uros Bizjak via Gcc-patches
On Thu, Jul 6, 2023 at 3:20 AM liuhongt wrote: > > We have ix86_expand_sse_fp_minmax to detect min/max semantics, but > it requires rtx_equal_p for cmp_op0/cmp_op1 and if_true/if_false, for > the testcase in the PR, there's an extra move from cmp_op0 to if_true, > and it failed ix86_expand_sse_fp_m

Re: [PATCH 2/2] Adjust rtx_cost for DF/SFmode AND/IOR/XOR/ANDN operations.

2023-07-05 Thread Uros Bizjak via Gcc-patches
On Thu, Jul 6, 2023 at 3:20 AM liuhongt wrote: > > They should have same cost as vector mode since both generate > pand/pandn/pxor/por instruction. > > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. > Ok for trunk? > > gcc/ChangeLog: > > * config/i386/i386.cc (ix86_rtx_costs): Ad

Re: [PATCH] Disparage slightly for the alternative which move DFmode between SSE_REGS and GENERAL_REGS.

2023-07-05 Thread Uros Bizjak via Gcc-patches
On Thu, Jul 6, 2023 at 3:14 AM liuhongt wrote: > > For testcase > > void __cond_swap(double* __x, double* __y) { > bool __r = (*__x < *__y); > auto __tmp = __r ? *__x : *__y; > *__y = __r ? *__y : *__x; > *__x = __tmp; > } > > GCC-14 with -O2 and -march=x86-64 options generates the followi

[committed] sched: Change return type of predicate functions from int to bool

2023-07-05 Thread Uros Bizjak via Gcc-patches
Also change some internal variables to bool. gcc/ChangeLog: * sched-int.h (struct haifa_sched_info): Change can_schedule_ready_p, schedule_more_p and contributes_to_priority indirect function type from int to bool. (no_real_insns_p): Change return type from int to bool. (cont

Re: [PATCH V2] i386: Inline function with default arch/tune to caller

2023-07-04 Thread Uros Bizjak via Gcc-patches
le description to the new subsubsection? > > > Looking at the above, perhaps inlining of different arches can also be > > forced with always_inline? This would allow developers some control of > > inlining, and would not be surprising. > > If so, I'd like to add the a

Re: [PATCH V2] i386: Inline function with default arch/tune to caller

2023-07-03 Thread Uros Bizjak via Gcc-patches
On Tue, Jul 4, 2023 at 5:12 AM Hongyu Wang wrote: > > Hi, > > For function with different target attributes, current logic rejects to > inline the callee when any arch or tune is mismatched. Relax the > condition to allow callee with default arch/tune to be inlined. > > Bootstrapped/regtested on x8

[committed] tree+ggc: Change return type of predicate functions from int to bool

2023-07-03 Thread Uros Bizjak via Gcc-patches
Also change internal variable from int to bool. gcc/ChangeLog: * tree.h (tree_int_cst_equal): Change return type from int to bool. (operand_equal_for_phi_arg_p): Ditto. (tree_map_base_marked_p): Ditto. * tree.cc (contains_placeholder_p): Update function body for bool return ty

[committed] fold-const+optabs: Change return type of predicate functions from int to bool

2023-06-30 Thread Uros Bizjak via Gcc-patches
Also change some internal variables and function argument from int to bool. gcc/ChangeLog: * fold-const.h (multiple_of_p): Change return type from int to bool. * fold-const.cc (split_tree): Change negl_p, neg_litp_p, neg_conp_p and neg_var_p variables to bool. (const_binop): Chang

Re: [x86 PATCH] Add STV support for DImode and SImode rotations by constant.

2023-06-30 Thread Uros Bizjak via Gcc-patches
On Fri, Jun 30, 2023 at 9:29 AM Roger Sayle wrote: > > > This patch implements scalar-to-vector (STV) support for DImode and SImode > rotations by constant bit counts. Scalar rotations are almost always > optimal on x86, requiring only one or two instructions, but it is also > possible to impleme

[committed] cselib+expr+bitmap: Change return type of predicate functions from int to bool

2023-06-29 Thread Uros Bizjak via Gcc-patches
gcc/ChangeLog: * cselib.h (rtx_equal_for_cselib_1): Change return type from int to bool. (references_value_p): Ditto. (rtx_equal_for_cselib_p): Ditto. * expr.h (can_store_by_pieces): Ditto. (try_casesi): Ditto. (try_tablejump): Ditto. (safe_from_p): Ditto. * sbi

[committed] final+varasm: Change return type of predicate functions from int to bool

2023-06-28 Thread Uros Bizjak via Gcc-patches
Also change some internal variables to bool and change return type of compute_alignments to void. gcc/ChangeLog: * output.h (leaf_function_p): Change return type from int to bool. (final_forward_branch_p): Ditto. (only_leaf_regs_used): Ditto. (maybe_assemble_visibility): Ditto.

Re: [PATCH] i386: Relax inline requirement for functions with different target attrs

2023-06-28 Thread Uros Bizjak via Gcc-patches
ne specified. We expect "default" callee to have properties that allow inlining it into all callers, independent of the caller's arch/tune target attribute. Uros. > > Uros Bizjak wrote on Wed, Jun 28, 2023 at 14:43: > > > > On Wed, Jun 28, 2023 at 3:56 AM Hongyu Wang wrote: > > >
