Re: [PATCH 3/3] AVX512 fully masked vectorization

2023-06-15 Thread Andrew Stubbs
On 15/06/2023 10:58, Richard Biener wrote: On Thu, 15 Jun 2023, Andrew Stubbs wrote: On 14/06/2023 15:29, Richard Biener wrote: On 14.06.2023 at 16:27, Andrew Stubbs wrote: On 14/06/2023 12:54, Richard Biener via Gcc-patches wrote: This implements fully masked vectorization or a masked

Re: [PATCH 3/3] AVX512 fully masked vectorization

2023-06-15 Thread Andrew Stubbs
On 14/06/2023 15:29, Richard Biener wrote: On 14.06.2023 at 16:27, Andrew Stubbs wrote: On 14/06/2023 12:54, Richard Biener via Gcc-patches wrote: This implements fully masked vectorization or a masked epilog for AVX512-style masks, which single themselves out by representing each lane

Re: [PATCH 3/3] AVX512 fully masked vectorization

2023-06-14 Thread Andrew Stubbs
On 14/06/2023 12:54, Richard Biener via Gcc-patches wrote: This implements fully masked vectorization or a masked epilog for AVX512-style masks, which single themselves out by representing each lane with a single bit and by using integer modes for the mask (both much like GCN). AVX512 is also s

[PATCH] vect: Vectorize via libfuncs

2023-06-13 Thread Andrew Stubbs
This patch allows vectorization when operators are available as libfuncs, rather than only as insns. This will be useful for amdgcn, where we plan to vectorize loops that contain integer division or modulus, but don't want to generate inline instructions for the division algorithm every time.
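Not from the patch itself, but a minimal sketch of the kind of loop this targets: the body contains integer division and modulus, for which amdgcn has no native instruction, so a vectorized libfunc call is preferable to an inlined division sequence in every iteration.

    /* Hypothetical example loop; with vector division/modulus available as
       libfuncs, the vectorizer can emit one library call per operation
       instead of expanding the division algorithm inline.  */
    void
    divmod_loop (int *restrict quot, int *restrict rem,
                 const int *restrict a, const int *restrict b, int n)
    {
      for (int i = 0; i < n; i++)
        {
          quot[i] = a[i] / b[i];
          rem[i] = a[i] % b[i];
        }
    }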

Re: [PATCH] Add COMPLEX_VECTOR_INT modes

2023-06-09 Thread Andrew Stubbs
On 09/06/2023 10:02, Richard Sandiford wrote: Andrew Stubbs writes: On 07/06/2023 20:42, Richard Sandiford wrote: I don't know if this helps (probably not), but we have a similar situation on AArch64: a 64-bit mode like V8QI can be doubled to a 128-bit vector or to a pair of 64-bit ve

Re: [PATCH] Add COMPLEX_VECTOR_INT modes

2023-06-09 Thread Andrew Stubbs
On 07/06/2023 20:42, Richard Sandiford wrote: I don't know if this helps (probably not), but we have a similar situation on AArch64: a 64-bit mode like V8QI can be doubled to a 128-bit vector or to a pair of 64-bit vectors. We used V16QI for the former and "V2x8QI" for the latter. V2x8QI is for

Re: [Patch] libgomp: plugin-gcn - support 'unified_address'

2023-06-06 Thread Andrew Stubbs
On 06/06/2023 16:33, Tobias Burnus wrote: Andrew: Does the GCN change look okay to you? This patch permits to use GCN devices with 'omp requires unified_address' which in principle works already, except that the requirement handling did disable it. (It also updates libgomp.texi for this chan

Re: [PATCH] Add COMPLEX_VECTOR_INT modes

2023-06-05 Thread Andrew Stubbs
On 30/05/2023 07:26, Richard Biener wrote: On Fri, May 26, 2023 at 4:35 PM Andrew Stubbs wrote: Hi all, I want to implement a vector DIVMOD libfunc for amdgcn, but I can't just do it because the GCC middle-end models DIVMOD's return value as "complex int" type, and

Re: [patch] amdgcn: Change -m(no-)xnack to -mxnack=(on,off,any)

2023-05-26 Thread Andrew Stubbs
OK. Andrew On 26/05/2023 15:58, Tobias Burnus wrote: (Update the syntax of the amdgcn command-line option in anticipation of later patches; while -m(no-)xnack is in mainline since r12-2396-gaad32a00b7d2b6 (for PR100208), -mxnack (contrary to -msram-ecc) is currently mostly a stub for later pat

[PATCH] Add COMPLEX_VECTOR_INT modes

2023-05-26 Thread Andrew Stubbs
Hi all, I want to implement a vector DIVMOD libfunc for amdgcn, but I can't just do it because the GCC middle-end models DIVMOD's return value as "complex int" type, and there are no vector equivalents of that type. Therefore, this patch adds minimal support for "complex vector int" modes.
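For readers outside the middle end: DIVMOD computes quotient and remainder together, and GCC models that pair as a "complex int" value. A scalar analogy (only an illustration of the pairing, not code from the patch) is the C library's div(), which also returns both results at once; the new modes give that pairing a vector form.

    #include <stdio.h>
    #include <stdlib.h>

    int
    main (void)
    {
      /* div() returns quotient and remainder as one value, much as GCC's
         internal DIVMOD operation returns a "complex int" pair.  */
      div_t d = div (23, 5);
      printf ("quot=%d rem=%d\n", d.quot, d.rem);
      return 0;
    }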

Re: [Patch] GCN: Silence unused-variable warning

2023-05-05 Thread Andrew Stubbs
On 05/05/2023 12:10, Tobias Burnus wrote: Probably added for symmetry with out_mode/out_n but at the end not used. That function was added in commit r13-6423-gce9cd7258d0 "amdgcn: Enable SIMD vectorization of math functions". Tested the removal by building with that patch applied. OK for mainl

[committed] amdgcn: Fix addsub bug

2023-04-27 Thread Andrew Stubbs
I've committed this patch to fix a couple of bugs introduced in the recent CMul patch. First, the fmsubadd insn was accidentally all adds and no subtracts. Second, there were input dependencies on the undefined output register which caused the compiler to reserve unnecessary slots in the stac

[committed][OG10] amdgcn, openmp: Fix concurrency in low-latency allocator

2023-04-20 Thread Andrew Stubbs
I've committed this to the devel/omp/gcc-12 branch. The patch fixes a concurrency issue where the spin-locks didn't work well if many GPU threads tried to free low-latency memory all at once. Adding a short sleep instruction is enough for the hardware thread to yield and allow another to proc
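A minimal sketch of the idea in portable C11 (the GPU sleep instruction is represented only by a placeholder comment; none of these names come from the patch):

    #include <stdatomic.h>

    atomic_flag lowlat_lock_flag = ATOMIC_FLAG_INIT;

    void
    lowlat_lock (void)
    {
      while (atomic_flag_test_and_set_explicit (&lowlat_lock_flag,
                                                memory_order_acquire))
        {
          /* Without a brief pause here, many hardware threads spinning at
             once can starve the one holding the lock; a short sleep lets
             another wavefront make progress.  */
          /* gpu_short_sleep ();   <-- placeholder for the target's yield  */
        }
    }

    void
    lowlat_unlock (void)
    {
      atomic_flag_clear_explicit (&lowlat_lock_flag, memory_order_release);
    }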

[committed] amdgcn: bug fix ldexp insn

2023-04-20 Thread Andrew Stubbs
The hardfp division patch exposed a flaw in the ldexp pattern at -O0; the compiler was trying to use out-of-range immediates on VOP3 instruction encodings. This patch changes the constraints appropriately, and also takes the opportunity to combine the two patterns into one using the newly ava

[committed] amdgcn: update target-supports.exp

2023-04-20 Thread Andrew Stubbs
Recent patches have enabled new capabilities on AMD GCN, but not all the testsuite features were enabled. The hardfp divide patch actually had a test regression because the expected results were too conservative. This patch corrects both issues. Andrew amdgcn: update target-supports.exp The b

[committed] amdgcn: HardFP divide

2023-04-18 Thread Andrew Stubbs
This patch switches amdgcn from using softfp for division to using the hardware support. There isn't a single division instruction, but there is an instruction sequence that gives the necessary accuracy. This implementation also allows fully vectorized division, so gives good performance impro

Re: [r13-7135 Regression] FAIL: gcc.dg/vect/vect-simd-clone-18f.c scan-tree-dump-times vect "[\\n\\r] [^\\n]* = foo\\.simdclone" 2 on Linux/x86_64

2023-04-13 Thread Andrew Stubbs
Hi Andre, I don't have a cascadelake device to test on, nor any knowledge about what makes it different from regular x86_64. If the cascadelake device is supposed to work the same as other x86_64 devices for these vectors then the test has found a bug in the compiler and you should be lookin

[committed] amdgcn: Add 64-bit vector not

2023-04-04 Thread Andrew Stubbs
I've committed this patch to add a missing vector operator on amdgcn. The architecture doesn't have a 64-bit not instruction so we didn't have an insn for it, but the vectorizer didn't like that and caused the v64df_pow function to use 2MB of stack frame. This is a problem when you typically h

Re: [og12] libgomp: Document OpenMP 'pinned' memory (was: [PATCH] libgomp, openmp: pinned memory)

2023-03-27 Thread Andrew Stubbs
On 27/03/2023 12:26, Thomas Schwinge wrote: Hi! On 2023-03-27T09:27:31+, "Stubbs, Andrew" wrote: -Original Message- From: Thomas Schwinge Sent: 24 March 2023 15:50 On 2022-01-04T15:32:17+0000, Andrew Stubbs wrote: This patch implements the OpenMP pinned memory trait

[committed] amdgcn: Fix register size bug

2023-03-23 Thread Andrew Stubbs
This patch fixes a bug in which the function prologue would save more registers to the stack than there was space allocated. This would cause data corruption when the epilogue restored the registers if a child function had overwritten that memory. The problem was caused by insn constraints tha

[committed] amdgcn: vec_extract no-op insns

2023-03-23 Thread Andrew Stubbs
This patch adds new pseudo-insns for no-op vector extractions. These were previously modelled as simple move instructions, but the register allocator has unhelpful special handling for these that triggered spills to memory. Modelling them as a vec_select does the right thing in the register al

Re: [PATCH] amdgcn: Add accumulator VGPR registers

2023-03-21 Thread Andrew Stubbs
On 21/03/2023 13:42, Andrew Jenner wrote: This patch gives GCC the ability to use the accumulator VGPR registers on CDNA1 and later architectures. The backend does not yet attempt to make use of the matrix acceleration instructions, but the new registers are still useful as fast space for register spills.

Re: [PATCH] amdgcn: Add instruction patterns for vector operations on complex numbers

2023-03-21 Thread Andrew Stubbs
On 21/03/2023 13:35, Andrew Jenner wrote: I have updated this patch to incorporate the feedback from Andrew Stubbs. Tested on CDNA2 GFX90a. gcc/ChangeLog: * config/gcn/gcn-protos.h (gcn_expand_dpp_swap_pairs_insn) (gcn_expand_dpp_distribute_even_insn

Re: [PATCH] testsuite: Fix up vect-simd-clone1[678]*.c tests [PR108898]

2023-03-21 Thread Andrew Stubbs
On 21/03/2023 12:14, Jakub Jelinek wrote: Hi! As mentioned in the PR, vect-simd-clone-1[678]{,f}.c tests FAIL on x86_64-linux with -m64/-march=cascadelake or -m32/-march=cascadelake, there are 3 matches for the calls rather than expected two. As suggested by Richi, this patch changes those tests

Re: [Patch] gcn/mkoffload.cc: Pass -save-temps on for the hsaco step

2023-03-13 Thread Andrew Stubbs
On 13/03/2023 12:25, Tobias Burnus wrote: Found when comparing '-v -Wl,-v' output, as despite -save-temps multiple runs yielded different results. Fixed as attached. OK for mainline? OK. Andrew

Re: [Patch] GCN update for wwwdocs / libgomp.texi

2023-03-08 Thread Andrew Stubbs
On 08/03/2023 11:06, Tobias Burnus wrote: Next try – this time with both patches. On 08.03.23 12:05, Tobias Burnus wrote: Hi Andrew, attached are two patches related to GCN, one for libgomp.texi documenting an env var and a release-notes update in www docs. OK? Comments? LGTM Andrew

Re: [PATCH] amdgcn: Add instruction patterns for conditional min/max operations

2023-03-06 Thread Andrew Stubbs
On 03/03/2023 17:05, Paul-Antoine Arras wrote: On 02/03/2023 at 18:18, Andrew Stubbs wrote: On 01/03/2023 16:56, Paul-Antoine Arras wrote: This patch introduces instruction patterns for conditional min and max operations (cond_{f|s|u}{max|min}) in the GCN machine description. It also allows

Re: [PATCH] amdgcn: Enable SIMD vectorization of math functions

2023-03-02 Thread Andrew Stubbs
On 02/03/2023 15:07, Kwok Cheung Yeung wrote: Hello I've made the suggested changes. Should I hold off on committing this until GCC 13 has been branched off? No need, amdgcn is not a primary target and this stuff won't affect anyone else. Please go ahead and commit. Andrew

Re: [PATCH] amdgcn: Add instruction patterns for conditional min/max operations

2023-03-02 Thread Andrew Stubbs
On 01/03/2023 16:56, Paul-Antoine Arras wrote: This patch introduces instruction patterns for conditional min and max operations (cond_{f|s|u}{max|min}) in the GCN machine description. It also allows the exec register to be saved in SGPRs to avoid spilling to memory. Tested on GCN3 Fiji gfx803

Re: [PATCH] amdgcn: Enable SIMD vectorization of math functions

2023-03-01 Thread Andrew Stubbs
On 01/03/2023 10:52, Andre Vieira (lists) wrote: On 01/03/2023 10:01, Andrew Stubbs wrote: > On 28/02/2023 23:01, Kwok Cheung Yeung wrote: >> Hello >> >> This patch implements the TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION >> target hook for the AMD GCN a

Re: [PATCH] amdgcn: Enable SIMD vectorization of math functions

2023-03-01 Thread Andrew Stubbs
On 28/02/2023 23:01, Kwok Cheung Yeung wrote: Hello This patch implements the TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION target hook for the AMD GCN architecture, such that when vectorized, calls to builtin standard math functions such as asinf, exp, pow etc. are converted to calls to the r
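For context, the code this affects is an ordinary loop over libm calls; once the hook reports a vector counterpart, the vectorizer can call it instead of the scalar routine. An illustrative loop (not a testcase from the patch):

    #include <math.h>

    void
    vec_exp (float *restrict out, const float *restrict in, int n)
    {
      /* Each scalar expf call can be replaced by one call to a vectorized
         math routine when the target hook says such a routine exists.  */
      for (int i = 0; i < n; i++)
        out[i] = expf (in[i]);
    }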

[committed][OG12] libgomp: no need to attach USM pointers

2023-02-23 Thread Andrew Stubbs
This patch fixes a bug in which libgomp doesn't know what to do with attached pointers in Fortran derived types when using Unified Shared Memory instead of explicit mappings. I've committed it to the devel/omp/gcc-12 branch (OG12) and will fold it into the next rebase/repost of the USM patches

Re: [PATCH 3/3] vect: inbranch SIMD clones

2023-02-23 Thread Andrew Stubbs
On 10/02/2023 09:11, Jakub Jelinek wrote: I've tried to fix the -flto thing and I can't figure out how. The problem seems to be that there are two dump files from the two compiler invocations and it scans the wrong one. AArch64 has the same problem. Two dumps are because it is in a dg-do run te

Re: [og12] Un-break nvptx libgomp build (was: [OG12][committed] amdgcn: OpenMP low-latency allocator)

2023-02-20 Thread Andrew Stubbs
On 16/02/2023 21:11, Thomas Schwinge wrote: --- /dev/null +++ b/libgomp/basic-allocator.c +#ifndef BASIC_ALLOC_YIELD +#deine BASIC_ALLOC_YIELD +#endif In file included from [...]/libgomp/config/nvptx/allocator.c:49: [...]/libgomp/config/nvptx/../../basic-allocator.c:52:2: error: in

Re: [og12] Attempt to register OpenMP pinned memory using a device instead of 'mlock' (was: [PATCH] libgomp, openmp: pinned memory)

2023-02-20 Thread Andrew Stubbs
On 17/02/2023 08:12, Thomas Schwinge wrote: Hi Andrew! On 2023-02-16T23:06:44+0100, I wrote: On 2023-02-16T16:17:32+, "Stubbs, Andrew via Gcc-patches" wrote: The mmap implementation was not optimized for a lot of small allocations, and I can't see that issue changing here. That's corre

[OG12][committed] amdgcn: OpenMP low-latency allocator

2023-02-16 Thread Andrew Stubbs
These patches implement an LDS memory allocator for OpenMP on AMD. 1. 230216-basic-allocator.patch Separate the allocator from NVPTX so the code can be shared. 2. 230216-amd-low-lat.patch Allocate the memory, adjust the default address space, and hook up the allocator. They will need to be

Re: [og12] In 'libgomp/allocator.c:omp_realloc', route 'free' through 'MEMSPACE_FREE' (was: [PATCH] libgomp, OpenMP, nvptx: Low-latency memory allocator)

2023-02-14 Thread Andrew Stubbs
On 14/02/2023 12:54, Thomas Schwinge wrote: Hi Andrew! On 2022-01-13T11:13:51+, Andrew Stubbs wrote: Updated patch: this version fixes some missed cases of malloc in the realloc implementation. Right, and as it seems I've run into another issue: a stray 'free'.

Re: [PATCH] amdgcn: Add instruction patterns for vector operations on complex numbers

2023-02-14 Thread Andrew Stubbs
On 09/02/2023 20:13, Andrew Jenner wrote: This patch introduces instruction patterns for complex number operations in the GCN machine description. These patterns are cmul, cmul_conj, vec_addsub, vec_fmaddsub, vec_fmsubadd, cadd90, cadd270, cmla and cmls (cmla_conj and cmls_conj were not found t

Re: -foffload-memory=pinned (was: [PATCH 1/5] openmp: Add -foffload-memory)

2023-02-13 Thread Andrew Stubbs
On 13/02/2023 14:38, Thomas Schwinge wrote: Hi! On 2022-03-08T11:30:55+, Hafiz Abid Qadeer wrote: From: Andrew Stubbs Add a new option. It will be used in follow-up patches. --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi +@option{-foffload-memory=pinned} forces all host

Re: [PATCH] RISC-V: Bugfix for mode tieable of the rvv bool types

2023-02-13 Thread Andrew Stubbs
I presume I've been CC'd on this conversation because weird vector architecture problems have happened to me before. :) However, I'm not sure I can help much because AMD GCN does not use BImode vectors at all. This is partly because loading boolean values into a GCN vector would have 31 paddin

Re: [PATCH] libgomp, openmp: pinned memory

2023-02-10 Thread Andrew Stubbs
On 10/02/2023 15:11, Thomas Schwinge wrote: Hi! Re OpenMP 'pinned' memory allocator trait semantics vs. 'omp_realloc': On 2022-01-13T13:53:03+, Andrew Stubbs wrote: On 05/01/2022 17:07, Andrew Stubbs wrote: [...], I'm working on an implementation using mmap ins

Re: [PATCH 3/5] openmp, nvptx: ompx_unified_shared_mem_alloc

2023-02-10 Thread Andrew Stubbs
On 10/02/2023 14:21, Thomas Schwinge wrote: Is the correct fix the following (conceptually like 'linux_memspace_alloc' cited above), or is there something that I fail to understand? static void * linux_memspace_calloc (omp_memspace_handle_t memspace, size_t size, int pin) {

[committed] amdgcn: Pass -mstack-size through to runtime

2023-02-06 Thread Andrew Stubbs
The -mstack-size option has been marked obsolete in favour of setting an environment variable at runtime ("GCN_STACK_SIZE"), but some testcases still need the option set or they have stack overflow. I could change them to use the envvar, but my testing setup uses remote execute which doesn't su

Re: [Patch] libgomp: enable reverse offload for AMDGCN

2023-02-02 Thread Andrew Stubbs
On 02/02/2023 14:59, Tobias Burnus wrote: Maybe it becomes better reviewable with an attached patch ... On 02.02.23 15:31, Tobias Burnus wrote: Now that the stack handling has been changed for AMDGCN, this patch enables reverse offload. (cf. today's "[committed] amdgcn, libgomp: Manually alloca

[committed] amdgcn, libgomp: Manually allocated stacks

2023-02-02 Thread Andrew Stubbs
I've committed this patch to change the way stacks are initialized on amdgcn. The patch only touches GCN files, or the GCN-only portions of libgomp files, so I'm allowing it despite stage 4 because I want the ABI change done for GCC 13, and because it enables Tobias's reverse-offload patch tha

Re: [PATCH] amdgcn: Add instruction pattern for conditional shift operations

2023-02-02 Thread Andrew Stubbs
On 01/02/2023 15:35, Paul-Antoine Arras wrote: This patch introduces an instruction pattern for conditional shift operations (cond_{ashl|ashr|lshr}) in the GCN machine description. Tested on GCN3 Fiji gfx803. OK to commit? The changelog will need to be wrapped to 80 columns. OK otherwise. A

Re: [OG12][committed] amdgcn, libgomp: custom USM allocator

2023-01-13 Thread Andrew Stubbs
I changed it to use 128-byte alignment to match the GPU cache-lines. Committed to OG12. Andrew On 11/01/2023 18:05, Andrew Stubbs wrote: This patch fixes a runtime issue I encountered with the AMD GCN Unified Shared Memory implementation. We were using regular malloc'd memory confi

[OG12][committed] amdgcn, libgomp: custom USM allocator

2023-01-11 Thread Andrew Stubbs
This patch fixes a runtime issue I encountered with the AMD GCN Unified Shared Memory implementation. We were using regular malloc'd memory configured into USM mode, but there were random intermittent crashes. I can't be completely sure, but my best guess is that the HSA driver is using malloc

Re: [PATCH 3/3] vect: inbranch SIMD clones

2023-01-06 Thread Andrew Stubbs
Here's a new version of the patch. On 01/12/2022 14:16, Jakub Jelinek wrote: +void __attribute__((noinline)) You should use noipa attribute instead of noinline on callers which aren't declare simd (on declare simd it would prevent cloning which is essential for the declare simd behavior), so t

[OG12][committed] libgomp: Fix USM bugs

2022-12-16 Thread Andrew Stubbs
I've committed this patch to the devel/omp/gcc-12 branch. It fixes some missed cases in the Unified Shared Memory implementation that were especially noticeable in Fortran because the sizes of arrays are known. This patch will have to be folded into the mainline USM patches that were submitted

Re: [PATCH 02/17] libgomp: pinned memory

2022-12-08 Thread Andrew Stubbs
On 08/12/2022 14:02, Tobias Burnus wrote: On 08.12.22 13:51, Andrew Stubbs wrote: On 08/12/2022 12:11, Jakub Jelinek wrote: On Thu, Jul 07, 2022 at 11:34:33AM +0100, Andrew Stubbs wrote: Implement the OpenMP pinned memory trait on Linux hosts using the mlock syscall.  Pinned allocations are

Re: [PATCH 02/17] libgomp: pinned memory

2022-12-08 Thread Andrew Stubbs
On 08/12/2022 12:11, Jakub Jelinek wrote: On Thu, Jul 07, 2022 at 11:34:33AM +0100, Andrew Stubbs wrote: Implement the OpenMP pinned memory trait on Linux hosts using the mlock syscall. Pinned allocations are performed using mmap, not malloc, to ensure that they can be unpinned safely when
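A bare-bones sketch of the allocate/free scheme described, using mmap so the region can be unpinned and released independently of the heap (error handling and size bookkeeping omitted; this is not the libgomp code itself):

    #include <stddef.h>
    #include <sys/mman.h>

    void *
    pinned_alloc (size_t size)
    {
      void *p = mmap (NULL, size, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
      if (p == MAP_FAILED)
        return NULL;
      if (mlock (p, size) != 0)
        {
          munmap (p, size);
          return NULL;
        }
      return p;
    }

    void
    pinned_free (void *p, size_t size)
    {
      munlock (p, size);
      munmap (p, size);
    }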

Re: [PATCH] amdgcn: Add preprocessor builtins for every processor type

2022-12-01 Thread Andrew Stubbs
On 01/12/2022 14:35, Paul-Antoine Arras wrote: I believe this patch addresses your comments regarding the GCN bits. The new builtins are consistent with the LLVM naming convention (lower case, canonical name). For gfx803, I also kept '__fiji__' to be consistent with -march=fiji. Is it OK for

Re: [PATCH 3/3] vect: inbranch SIMD clones

2022-12-01 Thread Andrew Stubbs
On 30/11/2022 15:37, Jakub Jelinek wrote: On Wed, Nov 30, 2022 at 03:17:30PM +, Andrew Stubbs wrote: --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16.c @@ -0,0 +1,89 @@ +/* { dg-require-effective-target vect_simd_clones } */ +/* { dg-additional-options "-fopenmp-simd -

Re: [PATCH][OG12] amdgcn: Support AMD-specific 'isa' and 'arch' traits in OpenMP context selectors

2022-12-01 Thread Andrew Stubbs
On 01/12/2022 11:10, Paul-Antoine Arras wrote: + if (TARGET_FIJI) \ + builtin_define ("__FIJI__"); \ + else if (TARGET_VEGA10) \ +

Re: [PATCH 3/3] vect: inbranch SIMD clones

2022-11-30 Thread Andrew Stubbs
On 09/09/2022 15:31, Jakub Jelinek wrote: --- a/gcc/tree-if-conv.cc +++ b/gcc/tree-if-conv.cc @@ -1074,13 +1076,19 @@ if_convertible_stmt_p (gimple *stmt, vec refs) tree fndecl = gimple_call_fndecl (stmt); if (fndecl) { + /* We can vectorize some builtins and

Re: [Patch] libgomp.texi: List GCN's 'gfx803' under OpenMP Context Selectors (was: amdgcn: Support AMD-specific 'isa' traits in OpenMP context selectors)

2022-11-30 Thread Andrew Stubbs
On 29/11/2022 18:26, Tobias Burnus wrote: Hi PA, hi Andrew, hi Jakub, hi all, On 29.11.22 16:56, Paul-Antoine Arras wrote: This patch adds support for 'gfx803' as an alias for 'fiji' in OpenMP context selectors, [...] I think this should be documented somewhere. We have https://gcc.gnu.org/on

Re: [PATCH] amdgcn: Support AMD-specific 'isa' traits in OpenMP context selectors

2022-11-29 Thread Andrew Stubbs
On 29/11/2022 15:56, Paul-Antoine Arras wrote: Hi all, This patch adds support for 'gfx803' as an alias for 'fiji' in OpenMP context selectors, so as to be consistent with LLVM. It also adds test cases checking all supported AMD ISAs are properly recognised when used in a 'declare variant' co

Re: [Patch] gcn: Fix __builtin_gcn_first_call_this_thread_p

2022-11-28 Thread Andrew Stubbs
On 28/11/2022 07:40, Tobias Burnus wrote: It turned out that cprop cleverly propagated the unspec_volatile to the preceding (pseudo)register, permitting it to remove the 'set (s0) (pseudoregister)' at -O2. Unfortunately, it does matter whether the assignment is done to 's2' (previously: pseudoregis

Re: [Patch] libgomp/gcn: fix/improve struct output (was: [Patch] libgomp/gcn: Prepare for reverse-offload callback handling)

2022-11-21 Thread Andrew Stubbs
On 21/11/2022 13:40, Tobias Burnus wrote: Working on the builtins, I realized that I mixed up (again) bits and bytes. While 'uint64_t var[2]' has a size of 128 bits, 'char var[128]' has a size of 128 bytes. Thus, there is sufficient space for 16 pointer-size/uint64_t values but I only need 6. T

Re: [Patch] gcn: Add __builtin_gcn_{get_stack_limit,first_call_this_thread_p}

2022-11-19 Thread Andrew Stubbs
On 19/11/2022 10:46, Tobias Burnus wrote: On 18.11.22 18:49, Andrew Stubbs wrote: On 18/11/2022 17:20, Tobias Burnus wrote: This looks wrong: +    /* stackbase = (stack_segment_decr & 0x) +    + stack_wave_offset); +   seg_size = dispatch_ptr->private_segme

Re: [Patch] libgomp/gcn: Prepare for reverse-offload callback handling

2022-11-18 Thread Andrew Stubbs
On 18/11/2022 17:41, Tobias Burnus wrote: Attached is the updated/rediffed version, which now uses the builtin instead of the 'asm("s8"). The code in principle works; that is: If no private stack variables are copied, it works. Or in other words: reverse-offload target regions that don't use

Re: [Patch] gcn: Add __builtin_gcn_{get_stack_limit,first_call_this_thread_p}

2022-11-18 Thread Andrew Stubbs
On 18/11/2022 17:20, Tobias Burnus wrote: This patch adds two builtins (getting the end-of-stack pointer and a Boolean answer whether it was the first call to the builtin on this thread). The idea is to replace some hard-coded values in newlib, permitting a later move to a manually allocated stac

Re: [patch] gcn: Add __builtin_gcn_kernarg_ptr

2022-11-16 Thread Andrew Stubbs
On 16/11/2022 11:42, Tobias Burnus wrote: This is a part of a patch by Andrew (hi!) - namely that part that only adds the __builtin_gcn_kernarg_ptr. More is planned, see below. The short term benefit of this patch is to permit replacing hardcoded numbers by a builtin – like in libgomp (see pa

Re: [PATCH] amdgcn: Add builtins for vectorized native versions of abs, floorf and floor

2022-11-08 Thread Andrew Stubbs
On 08/11/2022 14:35, Kwok Cheung Yeung wrote: Hello This patch adds three extra builtins for the vectorized forms of the abs, floorf and floor math functions, which are implemented by native GCN instructions. I have also added a test to check that they generate the expected assembler instruct

Re: [PATCH] amdgcn: Fix instruction generation for exp2 and log2 operations

2022-11-03 Thread Andrew Stubbs
On 03/11/2022 17:47, Kwok Cheung Yeung wrote: Hello This patch fixes a bug introduced in a previous patch adding support for generating native instructions for the exp2 and log2 patterns. The problem is that the name of the instruction implementing the exp2 operation is v_exp (and not v_exp2)

[committed] amdgcn: multi-size vector reductions

2022-10-31 Thread Andrew Stubbs
My recent patch to add additional vector lengths didn't address the vector reductions yet. This patch adds the missing support. Shorter vectors use fewer reduction steps, and the means to extract the final value has been adjusted. Lacking from this are any useful costs, so for loops the vect p

[committed] amdgcn: add fmin/fmax patterns

2022-10-31 Thread Andrew Stubbs
This patch adds patterns for the fmin and fmax operators, for scalars, vectors, and vector reductions. The compiler uses smin and smax for most floating-point optimizations, etc., but not where the user calls fmin/fmax explicitly. On amdgcn the hardware min/max instructions are already IEEE c
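The distinction drawn here is between comparisons the compiler already lowers to smin/smax and explicit library calls like the one below, which need the new patterns to vectorize (illustrative only):

    #include <math.h>

    void
    clamp_upper (float *restrict out, const float *restrict in,
                 float limit, int n)
    {
      /* An explicit fminf call, unlike 'a < b ? a : b', only vectorizes
         once fmin/fmax patterns exist in the backend.  */
      for (int i = 0; i < n; i++)
        out[i] = fminf (in[i], limit);
    }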

[committed] amdgcn: Silence unused parameter warning

2022-10-31 Thread Andrew Stubbs
A function parameter was left over from a previous draft of my multiple-vector-length patch. This patch silences the harmless warning. Andrew amdgcn: Silence unused parameter warning gcc/ChangeLog: * config/gcn/gcn.cc (gcn_simd_clone_compute_vecsize_and_simdlen): Set base_type a

Re: [OG12 commit] vect: WORKAROUND vectorizer bug

2022-10-27 Thread Andrew Stubbs
On 24/10/2022 19:06, Richard Biener wrote: On 24.10.2022 at 18:51, Andrew Stubbs wrote: I've committed this to the OG12 branch to remove some test failures. We probably ought to have something on mainline also, but a proper fix would be better. Without this, the libgomp.oac

[OG12 commit] vect: WORKAROUND vectorizer bug

2022-10-24 Thread Andrew Stubbs
I've committed this to the OG12 branch to remove some test failures. We probably ought to have something on mainline also, but a proper fix would be better. Without this, the libgomp.oacc-c-c++-common/private-variables.c testcase fails to compile due to an ICE. The OpenACC worker broadcasting

[OG12 commit] amdgcn: disallow USM on gfx908

2022-10-24 Thread Andrew Stubbs
I've committed this patch to the devel/omp/gcc-12 branch. I will have to fold it into my previous OpenMP memory management patch series when I repost it. The GFX908 (MI100) devices only partially support the Unified Shared Memory model that we have, and only then with additional kernel boot p

[OG12 commit] amdgcn, libgomp: USM allocation update

2022-10-24 Thread Andrew Stubbs
I've committed this patch to the devel/omp/gcc-12 branch. I will have to fold it into my previous OpenMP memory management patch series when I repost it. The patch changes the internal memory allocation method such that memory is allocated in the regular heap and then marked as "coarse-grained

[PATCH] libgomp: fix hang on fatal error

2022-10-14 Thread Andrew Stubbs
This patch fixes a problem in which fatal errors inside mutex-locked regions (i.e. basically anything in the plugin) will cause it to hang up trying to take the lock to clean everything up. Using abort() instead of exit(1) bypasses the atexit handlers and solves the problem. OK for mainline?
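The mechanism relied on is standard C behaviour: exit() runs atexit handlers, which here would try to retake a lock the failing thread already holds, while abort() terminates without running them. A stand-alone illustration, unrelated to the libgomp sources:

    #include <stdio.h>
    #include <stdlib.h>

    static void
    cleanup (void)
    {
      /* In the libgomp scenario, this handler would try to take a mutex
         that the thread reporting the fatal error already holds.  */
      puts ("atexit handler runs");
    }

    int
    main (void)
    {
      atexit (cleanup);
      /* exit (1) would invoke cleanup(); abort() terminates immediately,
         so a fatal error raised inside a locked region cannot deadlock.  */
      abort ();
    }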

Re: [PATCH][RFT] Vectorization of first-order recurrences

2022-10-14 Thread Andrew Stubbs
On 14/10/2022 08:07, Richard Biener wrote: On Tue, 11 Oct 2022, Richard Sandiford wrote: Richard Biener writes: On Mon, 10 Oct 2022, Andrew Stubbs wrote: On 10/10/2022 12:03, Richard Biener wrote: The following picks up the prototype by Ju-Zhe Zhong for vectorizing first order recurrences

Re: [Patch] libgomp/gcn: Prepare for reverse-offload callback handling

2022-10-12 Thread Andrew Stubbs
On 12/10/2022 15:29, Tobias Burnus wrote: On 29.09.22 18:24, Andrew Stubbs wrote: On 27/09/2022 14:16, Tobias Burnus wrote: Andrew did suggest a while back to piggyback on the console_output handling, avoiding another atomic access. - If this is still wanted, I'd like to have some guidance

Re: [committed 0/6] amdgcn: Add V32, V16, V8, V4, and V2 vectors

2022-10-11 Thread Andrew Stubbs
On 11/10/2022 12:29, Richard Biener wrote: On Tue, Oct 11, 2022 at 1:03 PM Andrew Stubbs wrote: This patch series adds additional vector sizes for the amdgcn backend. The hardware supports any arbitrary vector length up to 64 lanes via masking, but GCC cannot (yet) make full use of them due

[committed 6/6] amdgcn: vector testsuite tweaks

2022-10-11 Thread Andrew Stubbs
The testsuite needs a few tweaks following my patches to add multiple vector sizes for amdgcn. gcc/testsuite/ChangeLog: * gcc.dg/pr104464.c: Xfail on amdgcn. * gcc.dg/signbit-2.c: Likewise. * gcc.dg/signbit-5.c: Likewise. * gcc.dg/vect/bb-slp-68.c: Likewise.

[committed 5/6] amdgcn: Add vector integer negate insn

2022-10-11 Thread Andrew Stubbs
Another example of the vectorizer needing explicit insns where the scalar expander just works. gcc/ChangeLog: * config/gcn/gcn-valu.md (neg2): New define_expand. --- gcc/config/gcn/gcn-valu.md | 13 + 1 file changed, 13 insertions(+) diff --git a/gcc/config/gcn/gcn-valu.md

[committed 1/6] amdgcn: add multiple vector sizes

2022-10-11 Thread Andrew Stubbs
The vector sizes are simulated using implicit masking, but they make life easier for the autovectorizer and SLP passes. gcc/ChangeLog: * config/gcn/gcn-modes.def (VECTOR_MODE): Add new modes V32QI, V32HI, V32SI, V32DI, V32TI, V32HF, V32SF, V32DF, V16QI, V16HI, V16SI, V16

[committed 4/6] amdgcn: vec_init for multiple vector sizes

2022-10-11 Thread Andrew Stubbs
Implements vec_init when the input is a vector of smaller vectors, or of vector MEM types, or a smaller vector duplicated several times. gcc/ChangeLog: * config/gcn/gcn-valu.md (vec_init): New. * config/gcn/gcn.cc (GEN_VN): Add andvNsi3, subvNsi3. (GEN_VNM): Add gathervNm

[committed 3/6] amdgcn: Add vec_extract for partial vectors

2022-10-11 Thread Andrew Stubbs
Add vec_extract expanders for all valid pairs of vector types. gcc/ChangeLog: * config/gcn/gcn-protos.h (get_exec): Add prototypes for two variants. * config/gcn/gcn-valu.md (vec_extract): New define_expand. * config/gcn/gcn.cc (get_exec): Export the existing func

[committed 2/6] amdgcn: Resolve insn conditions at compile time

2022-10-11 Thread Andrew Stubbs
GET_MODE_NUNITS isn't a compile time constant, so we end up with many impossible insns in the machine description. Adding MODE_VF allows the insns to be eliminated completely. gcc/ChangeLog: * config/gcn/gcn-valu.md (2): Use MODE_VF. (2): Likewise. * config/gcn/g

[committed 0/6] amdgcn: Add V32, V16, V8, V4, and V2 vectors

2022-10-11 Thread Andrew Stubbs
ssues, but rather existing problems that did not show up because the code did not previously vectorize. Expanding the testcase to allow 64-lane vectors shows the same problems there. I shall backport these patches to the OG12 branch shortly. Andrew Andrew Stubbs (6): amdgcn: add multiple ve

Re: [PATCH][RFT] Vectorization of first-order recurrences

2022-10-10 Thread Andrew Stubbs
On 10/10/2022 12:03, Richard Biener wrote: The following picks up the prototype by Ju-Zhe Zhong for vectorizing first-order recurrences. That solves two TSVC missed-optimization PRs. There's a new scalar cycle def kind, vect_first_order_recurrence, and its handling of the backedge value vectori
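A first-order recurrence, for reference, is a loop in which each iteration uses a value carried over from the previous one without forming a reduction, roughly like this TSVC-style sketch (an illustration, not one of the PRs' testcases):

    void
    first_order_recurrence (float *restrict b, const float *restrict a, int n)
    {
      float prev = a[0];
      for (int i = 1; i < n; i++)
        {
          /* b[i] uses the value loaded in the previous iteration, so
             vectorizing needs the previous vector shifted by one lane.  */
          b[i] = a[i] + prev;
          prev = a[i];
        }
    }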

Re: [PATCH] vect: while_ult for integer mask

2022-10-03 Thread Andrew Stubbs
On 29/09/2022 14:46, Richard Biener wrote: It's not the nicest way of carrying the information but short of inventing new modes I can't see something better (well, another optab). I see the GCN backend expects a constant in operand 3 but the docs don't specify the operand has to be a CONST_INT,

[committed] amdgcn: remove unused variable

2022-09-29 Thread Andrew Stubbs
I've committed this small clean-up. It silences a warning. Andrew amdgcn: remove unused variable This was left over from a previous version of the SIMD clone patch. gcc/ChangeLog: * config/gcn/gcn.cc (gcn_simd_clone_compute_vecsize_and_simdlen): Remove unused elt_bits variable.

Re: [Patch] libgomp/gcn: Prepare for reverse-offload callback handling

2022-09-29 Thread Andrew Stubbs
On 27/09/2022 14:16, Tobias Burnus wrote: @@ -422,6 +428,12 @@ struct agent_info if it has been. */ bool initialized; + /* Flag whether the HSA program that consists of all the modules has been + finalized. */ + bool prog_finalized; + /* Flag whether the HSA OpenMP's requires

Re: [PATCH] vect: while_ult for integer mask

2022-09-29 Thread Andrew Stubbs
On 29/09/2022 10:24, Richard Sandiford wrote: Otherwise: operand0[0] = operand1 < operand2; for (i = 1; i < operand3; i++) operand0[i] = operand0[i - 1] && (operand1 + i < operand2); looks like a "length and mask" operation, which IIUC is also what RVV wanted? (Wasn't at the Cauldro
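Spelling the quoted semantics out as scalar C may help: lane i of the mask is set while operand1 + i is still below the bound, and once one lane is out of range every later lane stays clear. A sketch for an integer, bit-per-lane mask of the AVX512/GCN style under discussion (the names are made up for the example):

    typedef unsigned long long mask_t;   /* one bit per lane, up to 64 lanes */

    mask_t
    while_ult (unsigned long long start, unsigned long long end,
               unsigned int nlanes)
    {
      mask_t mask = 0;
      for (unsigned int i = 0; i < nlanes; i++)
        {
          if (start + i >= end)
            break;   /* once false, all following lanes remain false */
          mask |= (mask_t) 1 << i;
        }
      return mask;
    }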

Re: [PATCH] vect: while_ult for integer mask

2022-09-29 Thread Andrew Stubbs
On 29/09/2022 08:52, Richard Biener wrote: On Wed, Sep 28, 2022 at 5:06 PM Andrew Stubbs wrote: This patch is a prerequisite for some amdgcn patches I'm working on to support shorter vector lengths (having fixed 64 lanes tends to miss optimizations, and masking is not supported everywher

[PATCH] vect: while_ult for integer mask

2022-09-28 Thread Andrew Stubbs
This patch is a prerequisite for some amdgcn patches I'm working on to support shorter vector lengths (having fixed 64 lanes tends to miss optimizations, and masking is not supported everywhere yet). The problem is that, unlike AArch64, I'm not using different mask modes for different sized ve

Re: [OG12][PATCH] openmp: Fix handling of target constructs in static member

2022-09-13 Thread Andrew Stubbs
On 13/09/2022 12:03, Paul-Antoine Arras wrote: Hello, This patch intends to backport e90af965e5c by Jakub Jelinek to devel/omp/gcc-12. The original patch was described here: https://gcc.gnu.org/pipermail/gcc-patches/2022-September/601189.html I've merged and committed it for you. Andrew

Re: GCN: Add -mlow-precision-sqrt for double-precision sqrt [PR105246] (was: Re: [PATCH] amdgcn: Add support for additional natively supported floating-point operations)

2022-09-09 Thread Andrew Stubbs
On 09/09/2022 13:20, Tobias Burnus wrote: However, the pre-existing 'sqrt' problem is still real. It also applies to reverse sqrt ("v_rsq"), but that's for whatever reason not used for GCN. This patch now adds a command-line flag - off by default - to choose whether this behavior is wanted. I d

Re: [PATCH] amdgcn: Add support for additional natively supported floating-point operations

2022-09-09 Thread Andrew Stubbs
On 08/09/2022 21:38, Kwok Cheung Yeung wrote: Hello This patch adds support for some additional floating-point operations, in scalar and vector modes, which are natively supported by the AMD GCN instruction set, but haven't been implemented in GCC yet. With the exception of frexp, these imple

Re: [PATCH 1/3] omp-simd-clone: Allow fixed-lane vectors

2022-08-31 Thread Andrew Stubbs
On 31/08/2022 09:29, Jakub Jelinek wrote: On Tue, Aug 30, 2022 at 06:54:49PM +0200, Rainer Orth wrote: --- a/gcc/omp-simd-clone.cc +++ b/gcc/omp-simd-clone.cc @@ -504,7 +504,10 @@ simd_clone_adjust_return_type (struct cgraph_node *node) veclen = node->simdclone->vecsize_int; else

Re: [PATCH 2/3] amdgcn: OpenMP SIMD routine support

2022-08-30 Thread Andrew Stubbs
On 09/08/2022 14:23, Andrew Stubbs wrote: Enable and configure SIMD clones for amdgcn. This affects both the __simd__ function attribute, and the OpenMP "declare simd" directive. Note that the masked SIMD variants are generated, but the middle end doesn't actually support c

Re: [PATCH 1/3] omp-simd-clone: Allow fixed-lane vectors

2022-08-30 Thread Andrew Stubbs
On 26/08/2022 12:04, Jakub Jelinek wrote: gcc/ChangeLog: * doc/tm.texi: Regenerate. * omp-simd-clone.cc (simd_clone_adjust_return_type): Allow zero vecsize. (simd_clone_adjust_argument_types): Likewise. * target.def (compute_vecsize_and_simdlen): Document

[PATCH 3/3] vect: inbranch SIMD clones

2022-08-09 Thread Andrew Stubbs
There has been support for generating "inbranch" SIMD clones for a long time, but nothing actually uses them (as far as I can see). This patch adds support for a subset of possible cases (those using mask_mode == VOIDmode). The other cases fail to vectorize, just as before, so there should be n
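For reference, an "inbranch" clone is the variant used when the call sits under a condition inside a SIMD loop, so the clone takes an extra mask argument. A typical source pattern looks like this (illustrative, not a test from the patch):

    #pragma omp declare simd inbranch
    int
    f (int x)
    {
      return 2 * x + 1;
    }

    void
    caller (int *restrict out, const int *restrict in, int n)
    {
      #pragma omp simd
      for (int i = 0; i < n; i++)
        /* The call is conditional, so the vectorizer must use the masked
           ("inbranch") clone of f rather than the unmasked variant.  */
        if (in[i] > 0)
          out[i] = f (in[i]);
    }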

[PATCH 2/3] amdgcn: OpenMP SIMD routine support

2022-08-09 Thread Andrew Stubbs
Enable and configure SIMD clones for amdgcn. This affects both the __simd__ function attribute, and the OpenMP "declare simd" directive. Note that the masked SIMD variants are generated, but the middle end doesn't actually support calling them yet. gcc/ChangeLog: * config/gcn/gcn.cc (g

[PATCH 1/3] omp-simd-clone: Allow fixed-lane vectors

2022-08-09 Thread Andrew Stubbs
The vecsize_int/vecsize_float fields have an assumption that all arguments will use the same bitsize, and vary the number of lanes according to the element size, but this is inappropriate on targets where the number of lanes is fixed and the bitsize varies (i.e. amdgcn). With this change the vecsize can
