[PATCH 07/17] openmp: allow requires unified_shared_memory

2022-07-07 Thread Andrew Stubbs
This is the front-end portion of the Unified Shared Memory implementation. It removes the "sorry, unimplemented" message in C, C++, and Fortran, and sets flag_offload_memory, but is otherwise inactive, for now. It also checks that -foffload-memory isn't set to an incompatible mode.

[PATCH 12/17] Handle cleanup of omp allocated variables (OpenMP 5.0).

2022-07-07 Thread Andrew Stubbs
Currently we are only handling the omp allocate directive when it is associated with an allocate statement. This statement results in malloc and free calls. The malloc calls are easy to get to as they are in the same block as the allocate directive. But the free calls come in a separate cleanup block. To

[PATCH 05/17] openmp, nvptx: ompx_unified_shared_mem_alloc

2022-07-07 Thread Andrew Stubbs
This adds support for using Cuda Managed Memory with omp_alloc. It will be used as the underpinnings for "requires unified_shared_memory" in a later patch. There are two new predefined allocators, ompx_unified_shared_mem_alloc and ompx_host_mem_alloc, plus corresponding memory spaces, which can

[PATCH 06/17] openmp: Add -foffload-memory

2022-07-07 Thread Andrew Stubbs
Add a new option. It's inactive until I add some follow-up patches. gcc/ChangeLog: * common.opt: Add -foffload-memory and its enum values. * coretypes.h (enum offload_memory): New. * doc/invoke.texi: Document -foffload-memory. --- gcc/common.opt | 16

[PATCH 09/17] openmp: Use libgomp memory allocation functions with unified shared memory.

2022-07-07 Thread Andrew Stubbs
. co-authored-by: Andrew Stubbs --- gcc/omp-low.cc | 174 +++ gcc/passes.def | 1 + gcc/testsuite/c-c++-common/gomp/usm-2.c | 46 ++ gcc/testsuite/c-c++-common/gomp/usm-3.c | 44 ++ gcc/testsuite/g++.dg/gomp/usm-1

[PATCH 03/17] libgomp, openmp: Add ompx_pinned_mem_alloc

2022-07-07 Thread Andrew Stubbs
This creates a new predefined allocator as a shortcut for using pinned memory with OpenMP. The name uses the OpenMP extension space and is intended to be consistent with other OpenMP implementations currently in development. The allocator is equivalent to using a custom allocator with the

[PATCH 04/17] openmp, nvptx: low-lat memory access traits

2022-07-07 Thread Andrew Stubbs
The NVPTX low latency memory is not accessible outside the team that allocates it, and therefore should be unavailable for allocators with the access trait "all". This change means that the omp_low_lat_mem_alloc predefined allocator now implies the "pteam" trait. libgomp/ChangeLog:

[PATCH 01/17] libgomp, nvptx: low-latency memory allocator

2022-07-07 Thread Andrew Stubbs
This patch adds support for allocating low-latency ".shared" memory on NVPTX GPU device, via the omp_low_lat_mem_space and omp_alloc. The memory can be allocated, reallocated, and freed using a basic but fast algorithm, is thread safe and the size of the low-latency heap can be configured using

[PATCH 02/17] libgomp: pinned memory

2022-07-07 Thread Andrew Stubbs
Implement the OpenMP pinned memory trait on Linux hosts using the mlock syscall. Pinned allocations are performed using mmap, not malloc, to ensure that they can be unpinned safely when freed. libgomp/ChangeLog: * allocator.c (MEMSPACE_ALLOC): Add PIN. (MEMSPACE_CALLOC): Add
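The approach described in the snippet above (mmap for the allocation, mlock for the pinning) can be sketched roughly as follows. This is only an illustration, not the libgomp code: the helper names pinned_alloc and pinned_free are hypothetical, and the sketch assumes a Linux host with anonymous mappings.

```c
#define _DEFAULT_SOURCE		/* for MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper (not a libgomp function): map SIZE bytes of
   anonymous memory and lock them into RAM.  Returns NULL if either
   the mapping or the lock fails.  */
static void *
pinned_alloc (size_t size)
{
  assert (size > 0);
  void *p = mmap (NULL, size, PROT_READ | PROT_WRITE,
		  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (p == MAP_FAILED)
    return NULL;
  if (mlock (p, size) != 0)
    {
      /* Usually means the RLIMIT_MEMLOCK limit is too small.  */
      munmap (p, size);
      return NULL;
    }
  return p;
}

/* Hypothetical helper: release memory obtained from pinned_alloc.
   Unmapping the range also removes its memory lock, and no other
   allocation shares these pages.  */
static int
pinned_free (void *p, size_t size)
{
  return munmap (p, size);
}
```

Because each mapping owns whole pages, unpinning on free cannot affect a neighbouring allocation, which is the safety property the snippet alludes to. A caller would typically fall back to ordinary memory when pinned_alloc returns NULL.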

[PATCH 00/17] openmp, nvptx, amdgcn: 5.0 Memory Allocators

2022-07-07 Thread Andrew Stubbs
memory that's both high-bandwidth and pinned anyway). Patches 15 to 17 are new work. I can probably approve these myself, but they can't be committed until the rest of the series is approved. Andrew Andrew Stubbs (11): libgomp, nvptx: low-latency memory allocator libgomp: pinned memory libgomp

Re: [committed] openmp: Add support for HBW or large capacity or interleaved memory through the libmemkind.so library

2022-06-29 Thread Andrew Stubbs
On 29/06/2022 11:45, Jakub Jelinek wrote: And omp_init_allocator needs to decide what to do if one asks for features that need memkind as well as for features that need whatever you/Abid have been working on. A possible resolution is punt (return omp_null_allocator), or prefer one feature over

Re: [committed] openmp: Add support for HBW or large capacity or interleaved memory through the libmemkind.so library

2022-06-28 Thread Andrew Stubbs
On 09/06/2022 09:19, Jakub Jelinek via Gcc-patches wrote: + switch (memspace) +{ +case omp_high_bw_mem_space: +#ifdef LIBGOMP_USE_MEMKIND + struct gomp_memkind_data *memkind_data; + memkind_data = gomp_get_memkind (); + if (data.partition == omp_atv_interleaved +

[committed][OG11] amdgcn, openmp: Unified Shared Memory

2022-06-27 Thread Andrew Stubbs
I've pushed these three patches to the devel/omp/gcc-11 branch ("OG11"). I'll be submitting mainline versions soonish. The patches add a means to track "requires unified_shared_memory" from the frontend, through the backend compiler, and on to the runtime, plus all the bits needed to

[committed] amdgcn: test global constructors

2022-06-27 Thread Andrew Stubbs
This setting is way out of date; global constructors have worked on GCN for a while now. Andrew amdgcn: test global constructors The tests are disabled for historical reasons only. gcc/testsuite/ChangeLog: * lib/target-supports.exp (check_effective_target_global_constructor):

[committed] amdgcn: remove obsolete assembler workarounds

2022-06-27 Thread Andrew Stubbs
This patch removed some workarounds that were required for old versions of the LLVM assembler. The minimum supported version is now 13.0.1 so the workarounds are no longer needed. Andrew amdgcn: remove obsolete assembler workarounds This nonsense is no longer required, now that the minimum

Re: [PATCH] libgomp, openmp: pinned memory

2022-06-07 Thread Andrew Stubbs
On 07/06/2022 13:10, Jakub Jelinek wrote: On Tue, Jun 07, 2022 at 12:05:40PM +0100, Andrew Stubbs wrote: Following some feedback from users of the OG11 branch I think I need to withdraw this patch, for now. The memory pinned via the mlock call does not give the expected performance boost. I

Re: [PATCH] libgomp, openmp: pinned memory

2022-06-07 Thread Andrew Stubbs
how that'll handle heterogeneous systems, but those ought to be rare. I don't think libmemkind will resolve this performance issue, although certainly it can be used for host implementations of low-latency memories, etc. Andrew On 13/01/2022 13:53, Andrew Stubbs wrote: On 05/01/2022 17:07

Re: [committed] amdgcn: Remove LLVM 9 assembler/linker support

2022-06-06 Thread Andrew Stubbs
On 27/05/2022 20:16, Thomas Schwinge wrote: Hi Andrew! On 2022-05-24T16:27:52+0100, Andrew Stubbs wrote: I've committed this patch to set the minimum required LLVM version, for the assembler and linker, to 13.0.1. An upgrade from LLVM 9 is a prerequisite for the gfx90a support, and 13.0.1

Re: [patch] [wwwdocs]+[invoke.texi] Update GCN for gfx90a (was: Re: [committed] amdgcn: Add gfx90a support)

2022-05-25 Thread Andrew Stubbs
On 25/05/2022 12:16, Tobias Burnus wrote: On 25.05.22 11:18, Andrew Stubbs wrote: On 24/05/2022 17:44, Tobias Burnus wrote: On 24.05.22 17:31, Andrew Stubbs wrote: amdgcn: Add gfx90a support I've deliberately avoided the MI100 and MI200 names because they're really not that simple. MI100

Re: [patch] [wwwdocs]+[invoke.texi] Update GCN for gfx90a (was: Re: [committed] amdgcn: Add gfx90a support)

2022-05-25 Thread Andrew Stubbs
On 24/05/2022 17:44, Tobias Burnus wrote: On 24.05.22 17:31, Andrew Stubbs wrote: amdgcn: Add gfx90a support Attached is an attempt to update invoke.texi I've deliberately avoided the MI100 and MI200 names because they're really not that simple. MI100 is gfx908, but MI150 is gfx906

[committed] amdgcn: Add gfx90a support

2022-05-24 Thread Andrew Stubbs
I've committed this patch to add support for gfx90a AMD GPU devices. The patch updates all the places that have architecture/ISA specific code, tidies up the ISA naming and handling in the backend, and adds a new multilib. This is just lightly tested at this point, but there are no known

[committed] amdgcn: Remove LLVM 9 assembler/linker support

2022-05-24 Thread Andrew Stubbs
I've committed this patch to set the minimum required LLVM version, for the assembler and linker, to 13.0.1. An upgrade from LLVM 9 is a prerequisite for the gfx90a support, and 13.0.1 is now the oldest version not known to have compatibility issues. The patch removes all the obsolete feature

Re: [PATCH, OpenMP, v2] Implement uses_allocators clause for target regions

2022-05-19 Thread Andrew Stubbs
On 19/05/2022 17:00, Jakub Jelinek wrote: Without requires dynamic_allocators, there are various extra restrictions imposed: 1) omp_init_allocator/omp_destroy_allocator may not be called (except for implicit calls to it from uses_allocators) in a target region I interpreted that more like

Re: [Patch] gcn/t-omp-device: Add 'amdgcn' as 'arch' [PR105602]

2022-05-16 Thread Andrew Stubbs
On 16/05/2022 11:28, Tobias Burnus wrote: While 'vendor' and 'kind' are well defined, 'arch' and 'isa' aren't. When looking at a 'metadirective' testcase (which oddly uses 'arch(amd)'), I noticed that LLVM uses 'arch(amdgcn)' while we use 'gcn', cf. e.g.

Re: libgomp GCN plugin: Clean up unused references to system-provided HSA Runtime library (was: [PATCH 1/4] Remove build dependence on HSA run-time)

2022-04-28 Thread Andrew Stubbs
On 06/04/2022 11:02, Thomas Schwinge wrote: Hi! On 2021-01-14T15:50:23+0100, I wrote: I'm raising here an issue with HSA libgomp plugin code changes from a while ago. While HSA is now no longer relevant for GCC master branch, the same code has also been copied into the GCN libgomp plugin.

[PATCH] openmp: Handle unified address memory.

2022-04-20 Thread Andrew Stubbs
This patch adds enough support for "requires unified_address" to make the sollve_vv testcases pass. It implements unified_address as a synonym of unified_shared_memory, which is both valid and the only way I know of to unify addresses with Cuda (could be wrong). This patch should be applied

Re: [PATCH 0/5] openmp: Handle pinned and unified shared memory.

2022-04-13 Thread Andrew Stubbs
This patch adjusts the testcases, previously proposed, to allow for testing on machines with varying page sizes and default amounts of lockable memory. There turns out to be more variation than I had thought. This should go on mainline at the same time as the previous patches in this thread.
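A test that has to cope with varying page sizes and lockable-memory limits can query both at run time rather than hard-coding them. The following is a minimal sketch of that idea, not code from the patch; the helper name lockable_pages is hypothetical.

```c
#define _DEFAULT_SOURCE		/* for RLIMIT_MEMLOCK on glibc */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/resource.h>
#include <unistd.h>

/* Hypothetical helper: how many whole pages may this process lock?
   Returns 0 if the limits cannot be queried, and SIZE_MAX if the
   soft limit is unlimited.  */
static size_t
lockable_pages (void)
{
  long page = sysconf (_SC_PAGESIZE);
  struct rlimit lim;

  if (page <= 0 || getrlimit (RLIMIT_MEMLOCK, &lim) != 0)
    return 0;
  if (lim.rlim_cur == RLIM_INFINITY)
    return SIZE_MAX;
  return (size_t) (lim.rlim_cur / (rlim_t) page);
}
```

A testcase could then size its allocations as a multiple of the reported page size and skip, rather than fail, when lockable_pages returns too small a number.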

Re: [PATCH 4/5] openmp: Use libgomp memory allocation functions with unified shared memory.

2022-04-02 Thread Andrew Stubbs
On 02/04/2022 13:04, Andrew Stubbs wrote: This additional patch adds transformation for omp_target_alloc. The OpenMP 5.0 document says that addresses allocated this way need to work without is_device_ptr. The easiest way to make that work is to make them USM addresses. Actually, reading

Re: [PATCH 4/5] openmp: Use libgomp memory allocation functions with unified shared memory.

2022-04-02 Thread Andrew Stubbs
On 08/03/2022 11:30, Hafiz Abid Qadeer wrote: This patch changes calls to malloc/free/calloc/realloc and operator new to memory allocation functions in libgomp with allocator=ompx_unified_shared_mem_alloc. This additional patch adds transformation for omp_target_alloc. The OpenMP 5.0

Re: [PATCH 5/5] openmp: -foffload-memory=pinned

2022-03-30 Thread Andrew Stubbs
On 08/03/2022 11:30, Hafiz Abid Qadeer wrote: gcc/ChangeLog: * omp-low.cc (omp_enable_pinned_mode): New function. (execute_lower_omp): Call omp_enable_pinned_mode. This worked for x86_64, but I needed to make the attached adjustment to work on powerpc without a linker error.

Re: [PATCH, OpenMP 5.0] More implementation of the requires directive

2022-03-29 Thread Andrew Stubbs
On 13/01/2021 15:07, Chung-Lin Tang wrote: We currently emit errors, but do not fatally cause exit of the program if those are not met. We're still unsure if complete block-out of program execution is the right thing for the user. This can be discussed later. After the Unified Shared Memory

Re: [Patch] GCN: Implement __atomic_compare_exchange_{1, 2} in libgcc [PR102215]

2022-03-09 Thread Andrew Stubbs
On 09/03/2022 16:29, Tobias Burnus wrote: This shows up with OpenMP offloading as libgomp since a couple of months uses __atomic_compare_exchange (see PR for details), causing link errors when the gcn libgomp.a is linked. It also shows up with sollve_vv. The implementation does a bit

[OG11][committed] amdgcn: Allow vector reductions on constants

2022-02-14 Thread Andrew Stubbs
On 14/02/2022 14:13, Andrew Stubbs wrote: I've committed this fix for an ICE compiling sollve_vv testcase test_target_teams_distribute_defaultmap.c. Somehow the optimizers result in a vector reduction on a vector of duplicated constants. This was a case the backend didn't handle, so we ended

[committed] amdgcn: Allow vector reductions on constants

2022-02-14 Thread Andrew Stubbs
I've committed this fix for an ICE compiling sollve_vv testcase test_target_teams_distribute_defaultmap.c. Somehow the optimizers result in a vector reduction on a vector of duplicated constants. This was a case the backend didn't handle, so we ended up with an unrecognised instruction ICE.

Re: [wwwdocs] gcc-12/changes.html (GCN): >1 workers per gang

2022-02-02 Thread Andrew Stubbs
On 02/02/2022 15:39, Tobias Burnus wrote: On 09.08.21 15:55, Tobias Burnus wrote: Now that the GCN/OpenACC patches for this have been committed today, I think it makes sense to add it to the documentation. (I was told that some follow-up items are still pending, but as the feature does work

[PATCH] openmp, nvptx: low-lat memory access traits

2022-01-27 Thread Andrew Stubbs
This patch adjusts the NVPTX low-latency allocator that I have previously posted (awaiting re-review). The patch assumes that all my previously posted patches are applied already. Given that any memory allocated from the low-latency memory space cannot support the "access=all" allocator trait

[PATCH] libgomp, openmp: Add ompx_pinned_mem_alloc

2022-01-20 Thread Andrew Stubbs
This patch adds a new predefined allocator named ompx_pinned_mem_alloc as an extension to the OpenMP standard. It is intended as a convenient way to allocate pinned memory using the Linux support patch I posted recently. I anticipate it being used by compiler internals in future as part of a

Re: [PATCH] libgomp, OpenMP: Fix issue for omp_get_device_num on gfx targets.

2022-01-18 Thread Andrew Stubbs
On 18/01/2022 12:25, Thomas Schwinge wrote: Hi! Maybe I'm just totally confused -- as so often ;-) -- but things seem strange here: On 2022-01-12T10:43:05+0100, Marcel Vollweiler wrote: Currently omp_get_device_num does not work on gcn targets with more than one offload device. The reason is

Re: [PATCH] libgomp, OpenMP: Fix issue for omp_get_device_num on gfx targets.

2022-01-18 Thread Andrew Stubbs
Sorry, I had not seen that this was entirely within my amdgcn remit On 12/01/2022 09:43, Marcel Vollweiler wrote: Hi, Currently omp_get_device_num does not work on gcn targets with more than one offload device. The reason is that GOMP_DEVICE_NUM_VAR is static in icv-device.c and thus

Re: [PATCH] libgomp, openmp: pinned memory

2022-01-13 Thread Andrew Stubbs
On 05/01/2022 17:07, Andrew Stubbs wrote: I don't believe 64KB will be anything like enough for any real HPC application. Is it really worth optimizing for this case? Anyway, I'm working on an implementation using mmap instead of malloc for pinned allocations. I figure that will simplify

Re: [PATCH] libgomp, OpenMP, nvptx: Low-latency memory allocator

2022-01-13 Thread Andrew Stubbs
Updated patch: this version fixes some missed cases of malloc in the realloc implementation. It also reworks the unused variable workarounds so that they work better with my reworked pinned memory patches I've not posted yet. Andrew libgomp, nvptx: low-latency memory allocator This patch adds

Re: [PATCH] libgomp, OpenMP, nvptx: Low-latency memory allocator

2022-01-07 Thread Andrew Stubbs
On 06/01/2022 17:53, Tom de Vries wrote: My current understanding is that this is a backend problem, and needs to be fixed by defining atomic_store patterns which take care of this peculiarity. You mentioned on IRC that I ought to initialize the free chain using atomics also, and that you

Re: [PATCH] libgomp, openmp: pinned memory

2022-01-05 Thread Andrew Stubbs
On 04/01/2022 18:47, Jakub Jelinek wrote: On Tue, Jan 04, 2022 at 07:28:29PM +0100, Jakub Jelinek via Gcc-patches wrote: Other issues in the patch are that it doesn't munlock on deallocation and that because of that deallocation we need to figure out what to do on page boundaries. As

Re: [PATCH] libgomp, OpenMP, nvptx: Low-latency memory allocator

2022-01-05 Thread Andrew Stubbs
On 05/01/2022 13:04, Tom de Vries wrote: On 1/5/22 12:08, Tom de Vries wrote: The allocators-1.c test-case doesn't compile because: ... FAIL: libgomp.c/allocators-1.c (test for excess errors) Excess errors: /home/vries/oacc/trunk/source-gcc/libgomp/testsuite/libgomp.c/allocators-1.c:7:22:

Re: [PATCH] libgomp, OpenMP, nvptx: Low-latency memory allocator

2022-01-05 Thread Andrew Stubbs
On 05/01/2022 11:08, Tom de Vries wrote: The alloc-7.c execution test failure is a regression, AFAICT. It fails here: ... 38 if ((((uintptr_t) p) % __alignof (int)) != 0 || p[0] || p[1] || p[2]) 39 abort (); ... because: ... (gdb) p p[0] $2 = 772014104 (gdb) p p[1] $3 = 0

Re: [PATCH] nvptx: bump default to PTX 4.1

2022-01-05 Thread Andrew Stubbs
On 05/01/2022 10:24, Tom de Vries wrote: On 12/21/21 12:33, Andrew Stubbs wrote: On 20/12/2021 15:58, Andrew Stubbs wrote: In order to support the %dynamic_smem_size PTX feature it is necessary to bump the minimum supported PTX version from 3.1 (~2013) to 4.1 (~2014). Tobias has pointed out

Re: [PATCH] libgomp, openmp: pinned memory

2022-01-04 Thread Andrew Stubbs
On 04/01/2022 15:55, Jakub Jelinek wrote: The usual libgomp way of doing this wouldn't be to use #ifdef __linux__, but instead add libgomp/config/linux/allocator.c that includes some headers, defines some macros and then includes the generic allocator.c. OK, good point, I can do that. I
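The layering Jakub describes (a target file defines some macros, then includes the generic allocator.c) can be sketched in a single file as below. This is an illustration under assumptions: the macro signatures, the MEMSPACE_FREE name, and the linux_* helpers are hypothetical, and the generic part is inlined here where the real target file would use an #include.

```c
#include <assert.h>
#include <stdlib.h>

/* --- Target-specific part (what a config/linux/allocator.c might do):
   define the hooks first ...  */
static int linux_alloc_calls;	/* hypothetical counter, for illustration */

static void *
linux_memspace_alloc (size_t size)
{
  linux_alloc_calls++;
  return malloc (size);
}
#define MEMSPACE_ALLOC(size) linux_memspace_alloc (size)

/* ... and then include the generic file.  Its contents are inlined
   below so that this sketch stays self-contained.  */

/* --- Generic part: supply defaults only for hooks left undefined.  */
#ifndef MEMSPACE_ALLOC
#define MEMSPACE_ALLOC(size) malloc (size)
#endif
#ifndef MEMSPACE_FREE
#define MEMSPACE_FREE(ptr) free (ptr)
#endif

static void *
generic_alloc (size_t size)
{
  return MEMSPACE_ALLOC (size);
}

static void
generic_free (void *ptr)
{
  MEMSPACE_FREE (ptr);
}
```

The generic code never tests for a particular operating system; a target that defines nothing simply gets the malloc/free defaults, which is what makes this preferable to scattering #ifdef __linux__ through the shared file.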

[PATCH] libgomp, openmp: pinned memory

2022-01-04 Thread Andrew Stubbs
This patch implements the OpenMP pinned memory trait for Linux hosts. On other hosts and on devices the trait becomes a no-op (instead of being rejected). The memory is locked via the mlock syscall, which is both the "correct" way to do it on Linux, and a problem because the default ulimit

[OG11][PATCH] OpenMP: Ensure that offloaded variables are public

2021-12-22 Thread Andrew Stubbs
This is now backported to the devel/omp/gcc-11 branch (OG11). Andrew On 09/12/2021 11:41, Andrew Stubbs wrote: On 02/12/2021 16:43, Jakub Jelinek wrote: On Thu, Dec 02, 2021 at 04:31:36PM +, Andrew Stubbs wrote: On 02/12/2021 16:05, Andrew Stubbs wrote: On 02/12/2021 12:58, Jakub

[PATCH] nvptx: bump default to PTX 4.1

2021-12-21 Thread Andrew Stubbs
On 20/12/2021 15:58, Andrew Stubbs wrote: In order to support the %dynamic_smem_size PTX feature it is necessary to bump the minimum supported PTX version from 3.1 (~2013) to 4.1 (~2014). Tobias has pointed out, privately, that the default version is both documented and encoded in the -mptx

[PATCH] libgomp, OpenMP, nvptx: Low-latency memory allocator

2021-12-20 Thread Andrew Stubbs
This patch is submitted now for review and so I can commit a backport of it to the OG11 branch, but isn't suitable for mainline until stage 1. The patch implements support for omp_low_lat_mem_space and omp_low_lat_mem_alloc on NVPTX offload devices. The omp_pteam_mem_alloc, omp_cgroup_mem_alloc

[PATCH] OpenMP front-end: allow requires dynamic_allocators

2021-12-20 Thread Andrew Stubbs
Hi all, This patch removes the "sorry" message for the OpenMP "requires dynamic_allocators" feature in C, C++ and Fortran. The clause is supposed to state that the user code will not work without the omp_alloc/omp_free and omp_init_allocator/omp_destroy_allocator and these things *are*

Re: [PATCH] OpenMP: Ensure that offloaded variables are public

2021-12-09 Thread Andrew Stubbs
On 02/12/2021 16:43, Jakub Jelinek wrote: On Thu, Dec 02, 2021 at 04:31:36PM +, Andrew Stubbs wrote: On 02/12/2021 16:05, Andrew Stubbs wrote: On 02/12/2021 12:58, Jakub Jelinek wrote: I've tried modifying offload_handle_link_vars but that spot doesn't catch the omp_data_sizes variables

Re: [PATCH] OpenMP: Ensure that offloaded variables are public

2021-12-02 Thread Andrew Stubbs
On 02/12/2021 16:05, Andrew Stubbs wrote: On 02/12/2021 12:58, Jakub Jelinek wrote: I've tried modifying offload_handle_link_vars but that spot doesn't catch the omp_data_sizes variables emitted by libgomp.c-c++-common/target_42.c, which was one of the motivating examples. Why doesn't catch

Re: [PATCH] OpenMP: Ensure that offloaded variables are public

2021-12-02 Thread Andrew Stubbs
On 02/12/2021 12:58, Jakub Jelinek wrote: I've tried modifying offload_handle_link_vars but that spot doesn't catch the omp_data_sizes variables emitted by libgomp.c-c++-common/target_42.c, which was one of the motivating examples. Why doesn't catch it? Is the variable created only post-IPA?

Re: [PATCH] OpenMP: Ensure that offloaded variables are public

2021-12-02 Thread Andrew Stubbs
On 30/11/2021 16:54, Jakub Jelinek wrote: Why does the GCN plugin or runtime need to know those vars? It needs to know the single array that contains their addresses of course... With older LLVM there were issues with relocations that made it impossible to link the offload_var_table. This

[commit][master+OG11] amdgcn: Fix ICE generating CFI [PR103396]

2021-11-25 Thread Andrew Stubbs
I've committed this patch to fix the amdgcn ICE reported in PR103396. The problem was that it was mis-counting the number of registers to save when the link register was only clobbered implicitly by calls. The issue is easily fixed by adjusting the condition to match elsewhere in the same

[PATCH] OpenMP: Ensure that offloaded variables are public

2021-11-16 Thread Andrew Stubbs
Hi, This patch is needed for AMD GCN offloading when we use the assembler from LLVM 13+. The GCN runtime (libgomp+ROCm) requires that the location of all variables in the offloaded variables table are discoverable at runtime (using the "hsa_executable_symbol_get_info" API), and this only

Re: [Patch][GCN] [GCC 11] Backport GCN with LLVM-MC 13 linker fixes to GCC 11

2021-10-18 Thread Andrew Stubbs
This is fine by me. As I said in my email on the 15th, LLVM 13 is still not considered safe to use. The ICE you encountered is a real problem that will affect real users. I expect to work on a solution for that soon. Andrew On 16/10/2021 21:41, Tobias Burnus wrote: This patch is mostly

[committed] amdgcn: fix up offload debug linking with LLVM 13

2021-10-15 Thread Andrew Stubbs
This is a follow-up to my previous LLVM13 support patches (the amdgcn port uses the LLVM assembler) to fix up a corner case. With this patch one can now enable debug information in LLVM 13 offload binaries. This was trickier than you'd think because the different LLVM versions have different

[committed] amdgcn: Fix assembler version incompatibility

2021-10-07 Thread Andrew Stubbs
I've committed this patch to fix another case of LLVM assembler incompatibility. Marcel previously posted a patch to fix up the global_load and global_store instructions, following a non-backwards-compatible change in the assembler.

[committed] amdgcn: Implement -msram-ecc=any

2021-10-07 Thread Andrew Stubbs
I've committed this patch to implement the -msram-ecc=any feature that has been stubbed out awaiting LLVM support for a while now. When the LLVM assembler supports the "any" feature (v13+) GCC will now make use of it. Otherwise, GCC will continue to treat "any" the same as "on". Using the

[committed] amdgcn: Support LLVM 13 assembler syntax

2021-10-07 Thread Andrew Stubbs
I've committed this patch to allow GCC to adapt to the different variants of the LLVM amdgcn assembler. Unfortunately they keep making changes without maintaining backwards compatibility. GCC should now work with LLVM 9, LLVM 12, and LLVM 13 in terms of CLI usage, however only LLVM 9 is well

Re: Host and offload targets have no common meaning of address spaces (was: [ping] Re-unify 'omp_build_component_ref' and 'oacc_build_component_ref')

2021-09-03 Thread Andrew Stubbs
On 24/08/2021 12:43, Richard Biener via Gcc-patches wrote: On Tue, Aug 24, 2021 at 12:23 PM Thomas Schwinge wrote: Hi! On 2021-08-19T22:13:56+0200, I wrote: On 2021-08-16T10:21:04+0200, Jakub Jelinek wrote: On Mon, Aug 16, 2021 at 10:08:42AM +0200, Thomas Schwinge wrote: |> Concerning

[OG11, committed] libgomp amdgcn: Fix issues with dynamic OpenMP thread scaling

2021-08-04 Thread Andrew Stubbs
This patch fixes a bug in which testcases using thread_limit larger than the number of physical threads would crash with a memory fault. This was exacerbated in testcases with a lot of register pressure because the autoscaling reduces the number of physical threads to compensate for the

Re: [committed] amdgcn: Fix attributes for LLVM-12 [PR 100208]

2021-07-29 Thread Andrew Stubbs
On 29/07/2021 08:34, Richard Biener wrote: On Wed, Jul 28, 2021 at 3:04 PM Andrew Stubbs wrote: This patch follows up my previous patch and supports more variants of LLVM 12. There are still other incompatibilities with LLVM 12, but at least the ELF attributes should now automatically

[OG11, committed] amdgcn: Fix attributes for LLVM-12 [PR 100208]

2021-07-29 Thread Andrew Stubbs
Now backported to devel/omp/gcc-11. Andrew On 28/07/2021 14:03, Andrew Stubbs wrote: This patch follows up my previous patch and supports more variants of LLVM 12. There are still other incompatibilities with LLVM 12, but at least the ELF attributes should now automatically tune to any

[committed] amdgcn: Fix attributes for LLVM-12 [PR 100208]

2021-07-28 Thread Andrew Stubbs
This patch follows up my previous patch and supports more variants of LLVM 12. There are still other incompatibilities with LLVM 12, but at least the ELF attributes should now automatically tune to any LLVM 9, 10, or 12 assembler (It would be nice if one set of options would just work

[og11][committed] amdgcn: Add -mxnack and -msram-ecc [PR 100208]

2021-07-20 Thread Andrew Stubbs
This is now backported to devel/omp/gcc-11. Andrew On 19/07/2021 17:49, Andrew Stubbs wrote: This patch adds two new GCN-specific options: -mxnack and -msram-ecc={on,off,any}. The primary purpose is to ensure that we have an explicit default setting for these features

[committed] amdgcn: Add -mxnack and -msram-ecc [PR 100208]

2021-07-19 Thread Andrew Stubbs
This patch adds two new GCN-specific options: -mxnack and -msram-ecc={on,off,any}. The primary purpose is to ensure that we have an explicit default setting for these features and that this is passed to the assembler. This will ensure that if LLVM defaults change, again, GCC won't get caught

Re: [gcn] Work-around libgomp 'error: array subscript 0 is outside array bounds of ‘__lds struct gomp_thread * __lds[0]’ [-Werror=array-bounds]' (was: [PATCH libatomic/arm] avoid warning on constant a

2021-07-19 Thread Andrew Stubbs
On 19/07/2021 09:46, Thomas Schwinge wrote: GCN already uses address 4 for this value because address 0 caused problems with null-pointer checks. Ugh. How much wasted bytes per what is that? (I haven't looked yet; hopefully not per GPU thread?) Because: It's 4 bytes per gang. And that

Re: [PATCH libatomic/arm] avoid warning on constant addresses (PR 101379)

2021-07-17 Thread Andrew Stubbs
On 16/07/2021 18:42, Thomas Schwinge wrote: Of course, we may simply re-work the libgomp/GCN code -- but don't we first need to answer the question whether the current code is actually "bad"? Aren't we going to get a lot of similar reports from kernel/embedded/other low-level software

Re: [wwwdocs] gcc-12/changes.html: OpenMP + GCN update

2021-06-23 Thread Andrew Stubbs
On 23/06/2021 10:53, Tobias Burnus wrote: + additionally the following features which were available in C and C++ + before: depobj, mutexinoutset and I realise that you did not invent this awkward wording, but I'd prefer ... "the following features that were previously only

Re: [PATCH 3/3] [amdgcn] Add hook for DWARF address spaces.

2021-06-23 Thread Andrew Stubbs
On 22/06/2021 18:14, Hafiz Abid Qadeer wrote: Map GCN address spaces to the proposed DWARF address spaces defined by AMD at https://llvm.org/docs/AMDGPUUsage.html#amdgpu-dwarf-address-class-mapping-table gcc/ * config/gcn/gcn.c: Include dwarf2.h. (gcn_addr_space_debug): New

Re: [PATCH 2/3] [amdgcn] Use frame pointer for CFA expressions.

2021-06-23 Thread Andrew Stubbs
On 22/06/2021 18:14, Hafiz Abid Qadeer wrote: As size of address is bigger than registers in amdgcn, we are forced to use DW_CFA_def_cfa_expression to make an expression that concatenates multiple registers for the value of the CFA. This then prohibits us from using many of the dwarf ops which

Re: [PATCH 1/3] [amdgcn] Update CFI configuration

2021-06-23 Thread Andrew Stubbs
On 22/06/2021 18:14, Hafiz Abid Qadeer wrote: Currently we don't get any call frame information for the amdgcn target. This patch makes necessary adjustments to generate CFI that can work with ROCGDB (ROCm 3.8+). gcc/ * config/gcn/gcn.c (move_callee_saved_registers): Emit CFI notes for

Re: [PATCH 1/5] amdgcn: Use unsigned types for udivsi3/umodsi3 libgcc helper args/return

2021-06-18 Thread Andrew Stubbs
On 18/06/2021 15:19, Julian Brown wrote: This patch changes the argument and return types for the libgcc __udivsi3 and __umodsi3 helper functions for GCN to USItype instead of SItype. This is probably just cosmetic in practice. I can probably self-approve this, but I'll give Andrew Stubbs

Re: [PATCH 4/5] amdgcn: Enable support for TImode for AMD GCN

2021-06-18 Thread Andrew Stubbs
th alternatives for all operations that might be needed). Those gaps are filled in by this patch, or by the preceding patches in the series. I can probably self-approve this, but I'll give Andrew Stubbs a chance to comment. Thanks, Julian 2021-06-18 Julian Brown gcc/ * config/gcn/gcn.c (gcn_

Re: [PATCH 3/5] amdgcn: Add clrsbsi2/clrsbdi2 implementation

2021-06-18 Thread Andrew Stubbs
up the result afterwards. These patterns are lost from libgcc if we build it for DImode/TImode rather than SImode/DImode, a change we make in a later patch in this series. I can probably self-approve this, but I'll give Andrew Stubbs a chance to comment. Thanks, Julian 2021-06-18 Julian Brown

Re: [PATCH 2/5] amdgcn: Add [us]mulsi3_highpart SGPR alternatives & [us]mulsid3/muldi3 expanders

2021-06-18 Thread Andrew Stubbs
t from libgcc if we build it for DImode/TImode rather than SImode/DImode, a change we make in a later patch in this series. I can probably self-approve this, but I'll give Andrew Stubbs a chance to comment. Thanks, Julian 2021-06-18 Julian Brown gcc/ * config/gcn/gcn.md (mulsi3_highpart

Re: [PATCH] gcc/configure.ac: fix register issue for global_load assembler functions

2021-06-14 Thread Andrew Stubbs
On 14/06/2021 13:36, Julian Brown wrote: On Wed, 9 Jun 2021 16:47:21 +0200 Marcel Vollweiler wrote: This patch fixes an issue with global_load assembler functions leading to an "invalid operand for instruction" error since in different LLVM versions those functions use either one or two

Re: [Patch?][RFC][RTL] clobber handling & buildin expansion - missing insn_invalid_p call [PR100418]

2021-06-02 Thread Andrew Stubbs
On 30/05/2021 19:51, Jeff Law wrote: On 5/5/2021 7:50 AM, Tobias Burnus wrote: Hi Eric, hi all, currently, gcn (amdgcn-amdhsa) bootstrapping fails as Alexandre's patch to __builtin_memset (applied yesterday) now does more expansions. The problem is [→ PR100418]  

Re: [committed] amdgcn: disable TImode

2021-05-07 Thread Andrew Stubbs
will at least build. I suspect we'll see some real failures here soon though. Andrew On 07/05/2021 23:45, Andrew Stubbs wrote: On 07/05/2021 18:11, Tobias Burnus wrote: On 07.05.21 18:35, Andrew Stubbs wrote: TImode has always been a problem on amdgcn, and now it is causing many new test failures, so

Re: [committed] amdgcn: disable TImode

2021-05-07 Thread Andrew Stubbs
On 07/05/2021 18:11, Tobias Burnus wrote: On 07.05.21 18:35, Andrew Stubbs wrote: TImode has always been a problem on amdgcn, and now it is causing many new test failures, so I'm disabling it. Does it still work with libgomp? The patch sounds as if it might cause problems

[committed] amdgcn: disable TImode

2021-05-07 Thread Andrew Stubbs
TImode has always been a problem on amdgcn, and now it is causing many new test failures, so I'm disabling it. The mode only has move instructions defined, which was enough for SLP, but any other code trying to use it without checking the optabs is a problem. The mode remains available for

[PATCH] builtins.c: Ensure emit_move_insn operands are valid (PR100418)

2021-05-07 Thread Andrew Stubbs
A recent patch from Alexandre added new calls to emit_move_insn with PLUS expressions in the operands. Apparently this works fine on (at least) x86_64, but fails on (at least) amdgcn, where the adddi3 pattern has clobbers that the movdi3 does not. This results in ICEs in recog. This patch

Re: [PATCH 1/3] openacc: Add support for gang local storage allocation in shared memory

2021-04-18 Thread Andrew Stubbs
On 16/04/2021 18:30, Thomas Schwinge wrote: Hi! On 2021-04-16T17:05:24+0100, Andrew Stubbs wrote: On 15/04/2021 18:26, Thomas Schwinge wrote: and optimisation, since shared memory might be faster than the main memory on a GPU. Do we potentially have a problem that making more use

Re: [PATCH 1/3] openacc: Add support for gang local storage allocation in shared memory

2021-04-16 Thread Andrew Stubbs
On 15/04/2021 18:26, Thomas Schwinge wrote: and optimisation, since shared memory might be faster than the main memory on a GPU. Do we potentially have a problem that making more use of (scarce) gang-private memory may negatively affect performance, because potentially fewer OpenACC gangs may

Re: [committed] amdgcn: Silence warnings in gcn.c

2021-03-19 Thread Andrew Stubbs
This follow-up fixes a typo in the placement of the close quote. Thanks to Tobias for pointing it out. Andrew On 18/03/2021 17:41, Andrew Stubbs wrote: This patch has no functional changes; it merely cleans up some warning messages. Thanks to Jan-Benedict for pointing them out, off-list

[committed] amdgcn: Silence warnings in gcn.c

2021-03-18 Thread Andrew Stubbs
This patch has no functional changes; it merely cleans up some warning messages. Thanks to Jan-Benedict for pointing them out, off-list. Andrew amdgcn: Silence warnings in gcn.c This fixes a few cases of "unquoted identifier or keyword", one "spurious trailing punctuation sequence", and a

[committed][OG10] amdgcn: Fix early-debug relocations

2021-03-06 Thread Andrew Stubbs
This patch is now backported to devel/omp/gcc-10. Andrew On 26/11/2020 14:41, Andrew Stubbs wrote: This patch fixes an error in GCN mkoffload that corrupted relocations in the early-debug info. The code now updates the relocation code without zeroing the symbol index. Andrew

[committed][OG10] DWARF: late code range fixup

2021-03-06 Thread Andrew Stubbs
This patch fixes up the DWARF code ranges for offload debugging, again. This time it defers the changes until most other DWARF generation has occurred, because the previous method was causing ICEs on some testcases. This patch will be proposed for mainline in stage 1. Andrew DWARF: late

[commit][OG10] nvptx: remove erroneous stack deletion

2021-03-02 Thread Andrew Stubbs
This patch fixes an OpenMP performance issue on NVPTX. The problem is that it deallocates the stack memory when it shouldn't, forcing the GOMP_OFFLOAD_run function to allocate the stack space again, before every kernel launch. The memory is only meant to be deallocated when a data

Re: [RFC] DWARF address spaces for local variables

2021-02-04 Thread Andrew Stubbs
Ping. On 22/01/2021 11:42, Andrew Stubbs wrote: Hi all, Jakub, I need to implement DWARF for local variables that exist in an alternative address space. This happens for OpenACC gang-private variables (or will when the patches are committed) on AMD GCN, at least. This is distinct from

[committed] amdgcn: Add gfx908 support

2021-02-03 Thread Andrew Stubbs
This patch adds a new -march option and multilib configuration to the amdgcn GPU target. The patch does not attempt to support any of the new features of the gfx908 devices, but does set the correct ELF flags etc. that are expected by the ROCm runtime. The GFX908 devices are not generally

[OG10][committed] amdgcn: Allow V64DFmode min/max reductions

2021-01-26 Thread Andrew Stubbs
Now backported to devel/omp/gcc-10. On 26/01/2021 10:29, Andrew Stubbs wrote: This patch fixes an AMD GCN bug in which attempting to use DFmode vector reductions would cause an ICE. There's no reason not to allow the reductions, so we simply enable them thusly. Andrew

[committed] amdgcn: Allow V64DFmode min/max reductions

2021-01-26 Thread Andrew Stubbs
This patch fixes an AMD GCN bug in which attempting to use DFmode vector reductions would cause an ICE. There's no reason not to allow the reductions, so we simply enable them thusly. Andrew amdgcn: Allow V64DFmode min/max reductions I don't know why these were disabled. There're no direct

Re: [RFC] DWARF address spaces for local variables

2021-01-22 Thread Andrew Stubbs
On 22/01/2021 11:42, Andrew Stubbs wrote: @@ -20294,15 +20315,6 @@ add_location_or_const_value_attribute (dw_die_ref die, tree decl, bool cache_p) if (list) { add_AT_location_description (die, DW_AT_location, list); - - addr_space_t as = TYPE_ADDR_SPACE (TREE_TYPE (decl

[RFC] DWARF address spaces for local variables

2021-01-22 Thread Andrew Stubbs
Hi all, Jakub, I need to implement DWARF for local variables that exist in an alternative address space. This happens for OpenACC gang-private variables (or will when the patches are committed) on AMD GCN, at least. This is distinct from pointer variables that reference other address

Re: [committed][OG10] Fix offload dwarf info

2021-01-16 Thread Andrew Stubbs
On 15/01/2021 11:43, Andrew Stubbs wrote: This patch corrects a problem in which GDB ignores the debug info for offload kernel entry functions because they're represented as nested functions inside a function that does not exist on the accelerator device (only on the host). Apparently I had
