This is the front-end portion of the Unified Shared Memory implementation.
It removes the "sorry, unimplemented message" in C, C++, and Fortran, and sets
flag_offload_memory, but is otherwise inactive, for now.
It also checks that -foffload-memory isn't set to an incompatible mode.
Currently we are only handling the omp allocate directive that is associated
with an allocate statement. This statement results in malloc and free calls.
The malloc calls are easy to get to, as they are in the same block as the
allocate directive. But the free calls come in a separate cleanup block. To
This adds support for using Cuda Managed Memory with omp_alloc. It will be
used as the underpinnings for "requires unified_shared_memory" in a later
patch.
There are two new predefined allocators, ompx_unified_shared_mem_alloc and
ompx_host_mem_alloc, plus corresponding memory spaces, which can
Add a new option. It's inactive until I add some follow-up patches.
gcc/ChangeLog:
* common.opt: Add -foffload-memory and its enum values.
* coretypes.h (enum offload_memory): New.
* doc/invoke.texi: Document -foffload-memory.
---
gcc/common.opt | 16
.
Co-authored-by: Andrew Stubbs
---
gcc/omp-low.cc | 174 +++
gcc/passes.def | 1 +
gcc/testsuite/c-c++-common/gomp/usm-2.c | 46 ++
gcc/testsuite/c-c++-common/gomp/usm-3.c | 44 ++
gcc/testsuite/g++.dg/gomp/usm-1
This creates a new predefined allocator as a shortcut for using pinned
memory with OpenMP. The name uses the OpenMP extension space and is
intended to be consistent with other OpenMP implementations currently in
development.
The allocator is equivalent to using a custom allocator with the
The NVPTX low latency memory is not accessible outside the team that allocates
it, and therefore should be unavailable for allocators with the access trait
"all". This change means that the omp_low_lat_mem_alloc predefined
allocator now implicitly implies the "pteam" trait.
libgomp/ChangeLog:
This patch adds support for allocating low-latency ".shared" memory on
NVPTX GPU device, via the omp_low_lat_mem_space and omp_alloc. The memory
can be allocated, reallocated, and freed using a basic but fast algorithm,
is thread-safe, and the size of the low-latency heap can be configured using
Implement the OpenMP pinned memory trait on Linux hosts using the mlock
syscall. Pinned allocations are performed using mmap, not malloc, to ensure
that they can be unpinned safely when freed.
libgomp/ChangeLog:
* allocator.c (MEMSPACE_ALLOC): Add PIN.
(MEMSPACE_CALLOC): Add
memory that's both high-bandwidth and pinned anyway).
Patches 15 to 17 are new work. I can probably approve these myself, but
they can't be committed until the rest of the series is approved.
Andrew
Andrew Stubbs (11):
libgomp, nvptx: low-latency memory allocator
libgomp: pinned memory
libgomp
On 29/06/2022 11:45, Jakub Jelinek wrote:
And omp_init_allocator needs to decide what to do if one asks for features
that need memkind as well as for features that need whatever you/Abid have
been working on. A possible resolution is punt (return omp_null_allocator),
or prefer one feature over
On 09/06/2022 09:19, Jakub Jelinek via Gcc-patches wrote:
+  switch (memspace)
+    {
+    case omp_high_bw_mem_space:
+#ifdef LIBGOMP_USE_MEMKIND
+      struct gomp_memkind_data *memkind_data;
+      memkind_data = gomp_get_memkind ();
+      if (data.partition == omp_atv_interleaved
+
I've pushed these three patches to the devel/omp/gcc-11 branch ("OG11").
I'll be submitting mainline versions soonish.
The patches add a means to track "requires unified_shared_memory" from
the frontend, through the backend compiler, and on to the runtime, plus
all the bits needed to
This setting is way out of date; global constructors have worked on GCN
for a while now.
Andrew
amdgcn: test global constructors
The tests are disabled for historical reasons only.
gcc/testsuite/ChangeLog:
* lib/target-supports.exp (check_effective_target_global_constructor):
This patch removed some workarounds that were required for old versions
of the LLVM assembler. The minimum supported version is now 13.0.1 so
the workarounds are no longer needed.
Andrew
amdgcn: remove obsolete assembler workarounds
This nonsense is no longer required, now that the minimum
On 07/06/2022 13:10, Jakub Jelinek wrote:
On Tue, Jun 07, 2022 at 12:05:40PM +0100, Andrew Stubbs wrote:
Following some feedback from users of the OG11 branch I think I need to
withdraw this patch, for now.
The memory pinned via the mlock call does not give the expected performance
boost. I
how that'll handle heterogeneous systems, but those ought to be rare.
I don't think libmemkind will resolve this performance issue, although
certainly it can be used for host implementations of low-latency
memories, etc.
Andrew
On 13/01/2022 13:53, Andrew Stubbs wrote:
On 05/01/2022 17:07
On 27/05/2022 20:16, Thomas Schwinge wrote:
Hi Andrew!
On 2022-05-24T16:27:52+0100, Andrew Stubbs wrote:
I've committed this patch to set the minimum required LLVM version, for
the assembler and linker, to 13.0.1. An upgrade from LLVM 9 is a
prerequisite for the gfx90a support, and 13.0.1
On 25/05/2022 12:16, Tobias Burnus wrote:
On 25.05.22 11:18, Andrew Stubbs wrote:
On 24/05/2022 17:44, Tobias Burnus wrote:
On 24.05.22 17:31, Andrew Stubbs wrote:
amdgcn: Add gfx90a support
I've deliberately avoided the MI100 and MI200 names because they're
really not that simple. MI100
On 24/05/2022 17:44, Tobias Burnus wrote:
On 24.05.22 17:31, Andrew Stubbs wrote:
amdgcn: Add gfx90a support
Attached is an attempt to update invoke.texi
I've deliberately avoided the MI100 and MI200 names because they're
really not that simple. MI100 is gfx908, but MI150 is gfx906
I've committed this patch to add support for gfx90a AMD GPU devices.
The patch updates all the places that have architecture/ISA specific
code, tidies up the ISA naming and handling in the backend, and adds a
new multilib.
This is just lightly tested at this point, but there are no known
I've committed this patch to set the minimum required LLVM version, for
the assembler and linker, to 13.0.1. An upgrade from LLVM 9 is a
prerequisite for the gfx90a support, and 13.0.1 is now the oldest
version not known to have compatibility issues.
The patch removes all the obsolete feature
On 19/05/2022 17:00, Jakub Jelinek wrote:
Without requires dynamic_allocators, there are various extra restrictions
imposed:
1) omp_init_allocator/omp_destroy_allocator may not be called (except for
implicit calls to it from uses_allocators) in a target region
I interpreted that more like
On 16/05/2022 11:28, Tobias Burnus wrote:
While 'vendor' and 'kind' is well defined, 'arch' and 'isa' isn't.
When looking at a 'metadirective' testcase (which oddly uses 'arch(amd)'),
I noticed that LLVM uses 'arch(amdgcn)' while we use 'gcn', cf. e.g.
On 06/04/2022 11:02, Thomas Schwinge wrote:
Hi!
On 2021-01-14T15:50:23+0100, I wrote:
I'm raising here an issue with HSA libgomp plugin code changes from a
while ago. While HSA is now no longer relevant for GCC master branch,
the same code has also been copied into the GCN libgomp plugin.
This patch adds enough support for "requires unified_address" to make
the sollve_vv testcases pass. It implements unified_address as a synonym
of unified_shared_memory, which is both valid and the only way I know of
to unify addresses with Cuda (could be wrong).
This patch should be applied
This patch adjusts the testcases, previously proposed, to allow for
testing on machines with varying page sizes and default amounts of
lockable memory. There turns out to be more variation than I had thought.
This should go on mainline at the same time as the previous patches in
this thread.
On 02/04/2022 13:04, Andrew Stubbs wrote:
This additional patch adds transformation for omp_target_alloc. The
OpenMP 5.0 document says that addresses allocated this way need to work
without is_device_ptr. The easiest way to make that work is to make them
USM addresses.
Actually, reading
On 08/03/2022 11:30, Hafiz Abid Qadeer wrote:
This patches changes calls to malloc/free/calloc/realloc and operator new to
memory allocation functions in libgomp with
allocator=ompx_unified_shared_mem_alloc.
This additional patch adds transformation for omp_target_alloc. The
OpenMP 5.0
On 08/03/2022 11:30, Hafiz Abid Qadeer wrote:
gcc/ChangeLog:
* omp-low.cc (omp_enable_pinned_mode): New function.
(execute_lower_omp): Call omp_enable_pinned_mode.
This worked for x86_64, but I needed to make the attached adjustment to
work on powerpc without a linker error.
On 13/01/2021 15:07, Chung-Lin Tang wrote:
We currently emit errors, but do not fatally cause exit of the program
if those
are not met. We're still unsure if complete block-out of program
execution is the right
thing for the user. This can be discussed later.
After the Unified Shared Memory
On 09/03/2022 16:29, Tobias Burnus wrote:
This shows up with OpenMP offloading, as libgomp has for a couple of
months used __atomic_compare_exchange (see PR for details),
causing link errors when the gcn libgomp.a is linked.
It also shows up with sollve_vv.
The implementation does a bit
On 14/02/2022 14:13, Andrew Stubbs wrote:
I've committed this fix for an ICE compiling sollve_vv testcase
test_target_teams_distribute_defaultmap.c.
Somehow the optimizers result in a vector reduction on a vector of
duplicated constants. This was a case the backend didn't handle, so we
ended
I've committed this fix for an ICE compiling sollve_vv testcase
test_target_teams_distribute_defaultmap.c.
Somehow the optimizers result in a vector reduction on a vector of
duplicated constants. This was a case the backend didn't handle, so we
ended up with an unrecognised instruction ICE.
On 02/02/2022 15:39, Tobias Burnus wrote:
On 09.08.21 15:55, Tobias Burnus wrote:
Now that the GCN/OpenACC patches for this have been committed today,
I think it makes sense to add it to the documentation.
(I was told that some follow-up items are still pending, but as
the feature does work
This patch adjusts the NVPTX low-latency allocator that I have
previously posted (awaiting re-review). The patch assumes that all my
previously posted patches are applied already.
Given that any memory allocated from the low-latency memory space cannot
support the "access=all" allocator trait
This patch adds a new predefined allocator named ompx_pinned_mem_alloc
as an extension to the OpenMP standard. It is intended as a convenient
way to allocate pinned memory using the Linux support patch I posted
recently. I anticipate it being used by compiler internals in future as
part of a
On 18/01/2022 12:25, Thomas Schwinge wrote:
Hi!
Maybe I'm just totally confused -- as so often ;-) -- but things seem
strange here:
On 2022-01-12T10:43:05+0100, Marcel Vollweiler wrote:
Currently omp_get_device_num does not work on gcn targets with more than
one offload device. The reason is
Sorry, I had not seen that this was entirely within my amdgcn remit
On 12/01/2022 09:43, Marcel Vollweiler wrote:
Hi,
Currently omp_get_device_num does not work on gcn targets with more than
one offload device. The reason is that GOMP_DEVICE_NUM_VAR is static in
icv-device.c and thus
On 05/01/2022 17:07, Andrew Stubbs wrote:
I don't believe 64KB will be anything like enough for any real HPC
application. Is it really worth optimizing for this case?
Anyway, I'm working on an implementation using mmap instead of malloc
for pinned allocations. I figure that will simplify
Updated patch: this version fixes some missed cases of malloc in the
realloc implementation. It also reworks the unused variable workarounds
so that they work better with the reworked pinned memory patches I've not
posted yet.
Andrew
libgomp, nvptx: low-latency memory allocator
This patch adds
On 06/01/2022 17:53, Tom de Vries wrote:
My current understanding is that this is a backend problem, and needs to
be fixed by defining atomic_store patterns which take care of this
peculiarity.
You mentioned on IRC that I ought to initialize the free chain using
atomics also, and that you
On 04/01/2022 18:47, Jakub Jelinek wrote:
On Tue, Jan 04, 2022 at 07:28:29PM +0100, Jakub Jelinek via Gcc-patches wrote:
Other issues in the patch are that it doesn't munlock on deallocation and
that because of that deallocation we need to figure out what to do on page
boundaries. As
On 05/01/2022 13:04, Tom de Vries wrote:
On 1/5/22 12:08, Tom de Vries wrote:
The allocators-1.c test-case doesn't compile because:
...
FAIL: libgomp.c/allocators-1.c (test for excess errors)
Excess errors:
/home/vries/oacc/trunk/source-gcc/libgomp/testsuite/libgomp.c/allocators-1.c:7:22:
On 05/01/2022 11:08, Tom de Vries wrote:
The alloc-7.c execution test failure is a regression, AFAICT. It fails
here:
...
38   if ((((uintptr_t) p) % __alignof (int)) != 0 || p[0] || p[1] || p[2])
39     abort ();
...
because:
...
(gdb) p p[0]
$2 = 772014104
(gdb) p p[1]
$3 = 0
On 05/01/2022 10:24, Tom de Vries wrote:
On 12/21/21 12:33, Andrew Stubbs wrote:
On 20/12/2021 15:58, Andrew Stubbs wrote:
In order to support the %dynamic_smem_size PTX feature it is
necessary to bump the minimum supported PTX version from 3.1 (~2013)
to 4.1 (~2014).
Tobias has pointed out
On 04/01/2022 15:55, Jakub Jelinek wrote:
The usual libgomp way of doing this wouldn't be to use #ifdef __linux__, but
instead add libgomp/config/linux/allocator.c that includes some headers,
defines some macros and then includes the generic allocator.c.
OK, good point, I can do that.
I
This patch implements the OpenMP pinned memory trait for Linux hosts. On
other hosts and on devices the trait becomes a no-op (instead of being
rejected).
The memory is locked via the mlock syscall, which is both the "correct"
way to do it on Linux, and a problem because the default ulimit
This is now backported to the devel/omp/gcc-11 branch (OG11).
Andrew
On 09/12/2021 11:41, Andrew Stubbs wrote:
On 02/12/2021 16:43, Jakub Jelinek wrote:
On Thu, Dec 02, 2021 at 04:31:36PM +, Andrew Stubbs wrote:
On 02/12/2021 16:05, Andrew Stubbs wrote:
On 02/12/2021 12:58, Jakub
On 20/12/2021 15:58, Andrew Stubbs wrote:
In order to support the %dynamic_smem_size PTX feature it is necessary
to bump the minimum supported PTX version from 3.1 (~2013) to 4.1 (~2014).
Tobias has pointed out, privately, that the default version is both
documented and encoded in the -mptx
This patch is submitted now for review and so I can commit a backport it
to the OG11 branch, but isn't suitable for mainline until stage 1.
The patch implements support for omp_low_lat_mem_space and
omp_low_lat_mem_alloc on NVPTX offload devices. The omp_pteam_mem_alloc,
omp_cgroup_mem_alloc
Hi all,
This patch removes the "sorry" message for the OpenMP "requires
dynamic_allocators" feature in C, C++ and Fortran.
The clause is supposed to state that the user code will not work without
the omp_alloc/omp_free and omp_init_allocator/omp_destroy_allocator and
these things *are*
On 02/12/2021 16:43, Jakub Jelinek wrote:
On Thu, Dec 02, 2021 at 04:31:36PM +, Andrew Stubbs wrote:
On 02/12/2021 16:05, Andrew Stubbs wrote:
On 02/12/2021 12:58, Jakub Jelinek wrote:
I've tried modifying offload_handle_link_vars but that spot
doesn't catch
the omp_data_sizes variables
On 02/12/2021 16:05, Andrew Stubbs wrote:
On 02/12/2021 12:58, Jakub Jelinek wrote:
I've tried modifying offload_handle_link_vars but that spot doesn't
catch
the omp_data_sizes variables emitted by
libgomp.c-c++-common/target_42.c,
which was one of the motivating examples.
Why doesn't it catch
On 02/12/2021 12:58, Jakub Jelinek wrote:
I've tried modifying offload_handle_link_vars but that spot doesn't catch
the omp_data_sizes variables emitted by libgomp.c-c++-common/target_42.c,
which was one of the motivating examples.
Why doesn't it catch it? Is the variable created only post-IPA?
On 30/11/2021 16:54, Jakub Jelinek wrote:
Why does the GCN plugin or runtime need to know those vars?
It needs to know the single array that contains their addresses of course...
With older LLVM there were issues with relocations that made it
impossible to link the offload_var_table. This
I've committed this patch to fix the amdgcn ICE reported in PR103396.
The problem was that it was mis-counting the number of registers to save
when the link register was only clobbered implicitly by calls. The issue
is easily fixed by adjusting the condition to match elsewhere in the
same
Hi,
This patch is needed for AMD GCN offloading when we use the assembler
from LLVM 13+.
The GCN runtime (libgomp+ROCm) requires that the location of all
variables in the offloaded variables table are discoverable at runtime
(using the "hsa_executable_symbol_get_info" API), and this only
This is fine by me.
As I said in my email on the 15th, LLVM 13 is still not considered safe
to use. The ICE you encountered is a real problem that will affect real
users.
I expect to work on a solution for that soon.
Andrew
On 16/10/2021 21:41, Tobias Burnus wrote:
This patch is mostly
This is a follow-up to my previous LLVM13 support patches (the amdgcn
port uses the LLVM assembler) to fix up a corner case.
With this patch one can now enable debug information in LLVM 13 offload
binaries. This was trickier than you'd think because the different LLVM
versions have different
I've committed this patch to fix another case of LLVM assembler
incompatibility. Marcel previously posted a patch to fix up the
global_load and global_store instructions, following a
non-backwards-compatible change in the assembler.
I've committed this patch to implement the -msram-ecc=any feature that
has been stubbed out awaiting LLVM support for a while now.
When the LLVM assembler supports the "any" feature (v13+) GCC will now
make use of it. Otherwise, GCC will continue to treat "any" the same as
"on".
Using the
I've committed this patch to allow GCC to adapt to the different
variants of the LLVM amdgcn assembler. Unfortunately they keep making
changes without maintaining backwards compatibility.
GCC should now work with LLVM 9, LLVM 12, and LLVM 13 in terms of CLI
usage, however only LLVM 9 is well
On 24/08/2021 12:43, Richard Biener via Gcc-patches wrote:
On Tue, Aug 24, 2021 at 12:23 PM Thomas Schwinge
wrote:
Hi!
On 2021-08-19T22:13:56+0200, I wrote:
On 2021-08-16T10:21:04+0200, Jakub Jelinek wrote:
On Mon, Aug 16, 2021 at 10:08:42AM +0200, Thomas Schwinge wrote:
|> Concerning
This patch fixes a bug in which testcases using thread_limit larger than
the number of physical threads would crash with a memory fault. This was
exacerbated in testcases with a lot of register pressure because the
autoscaling reduces the number of physical threads to compensate for the
On 29/07/2021 08:34, Richard Biener wrote:
On Wed, Jul 28, 2021 at 3:04 PM Andrew Stubbs wrote:
This patch follows up my previous patch and supports more variants of
LLVM 12.
There are still other incompatibilities with LLVM 12, but with this at
least the ELF attributes should now automatically
Now backported to devel/omp/gcc-11.
Andrew
On 28/07/2021 14:03, Andrew Stubbs wrote:
This patch follows up my previous patch and supports more variants of
LLVM 12.
There are still other incompatibilities with LLVM 12, but with this at
least the ELF attributes should now automatically tune to any
This patch follows up my previous patch and supports more variants of
LLVM 12.
There are still other incompatibilities with LLVM 12, but with this at
least the ELF attributes should now automatically tune to any LLVM 9, 10,
or 12 assembler (It would be nice if one set of options would just work
This is now backported to devel/omp/gcc-11.
Andrew
On 19/07/2021 17:49, Andrew Stubbs wrote:
This patch adds two new GCN-specific options: -mxnack and
-msram-ecc={on,off,any}.
The primary purpose is to ensure that we have an explicit default
setting for these features
This patch adds two new GCN-specific options: -mxnack and
-msram-ecc={on,off,any}.
The primary purpose is to ensure that we have an explicit default
setting for these features and that this is passed to the assembler.
This will ensure that if LLVM defaults change, again, GCC won't get
caught
On 19/07/2021 09:46, Thomas Schwinge wrote:
GCN already uses address 4 for this value because address 0 caused
problems with null-pointer checks.
Ugh. How much wasted bytes per what is that? (I haven't looked yet;
hopefully not per GPU thread?) Because:
It's 4 bytes per gang. And that
On 16/07/2021 18:42, Thomas Schwinge wrote:
Of course, we may simply re-work the libgomp/GCN code -- but don't we
first need to answer the question whether the current code is actually
"bad"? Aren't we going to get a lot of similar reports from
kernel/embedded/other low-level software
On 23/06/2021 10:53, Tobias Burnus wrote:
+ additionally the following features which were available in C and C++
+ before: depobj, mutexinoutset and
I realise that you did not invent this awkward wording, but I'd prefer ...
"the following features that were previously only
On 22/06/2021 18:14, Hafiz Abid Qadeer wrote:
Map GCN address spaces to the proposed DWARF address spaces defined by AMD at
https://llvm.org/docs/AMDGPUUsage.html#amdgpu-dwarf-address-class-mapping-table
gcc/
* config/gcn/gcn.c: Include dwarf2.h.
(gcn_addr_space_debug): New
On 22/06/2021 18:14, Hafiz Abid Qadeer wrote:
As the size of an address is bigger than registers in amdgcn, we are forced to use
DW_CFA_def_cfa_expression to make an expression that concatenates multiple
registers for the value of the CFA. This then prohibits us from using many
of the dwarf ops which
On 22/06/2021 18:14, Hafiz Abid Qadeer wrote:
Currently we don't get any call frame information for the amdgcn target.
This patch makes necessary adjustments to generate CFI that can work with
ROCGDB (ROCm 3.8+).
gcc/
* config/gcn/gcn.c (move_callee_saved_registers): Emit CFI notes for
On 18/06/2021 15:19, Julian Brown wrote:
This patch changes the argument and return types for the libgcc __udivsi3
and __umodsi3 helper functions for GCN to USItype instead of SItype.
This is probably just cosmetic in practice.
I can probably self-approve this, but I'll give Andrew Stubbs
th alternatives for all operations that
might be needed). Those gaps are filled in by this patch, or by the
preceding patches in the series.
I can probably self-approve this, but I'll give Andrew Stubbs a chance
to comment.
Thanks,
Julian
2021-06-18 Julian Brown
gcc/
* config/gcn/gcn.c (gcn_
up the result afterwards.
These patterns are lost from libgcc if we build it for DImode/TImode
rather than SImode/DImode, a change we make in a later patch in this
series.
I can probably self-approve this, but I'll give Andrew Stubbs a chance
to comment.
Thanks,
Julian
2021-06-18 Julian Brown
t from libgcc if we build it
for DImode/TImode rather than SImode/DImode, a change we make in a later
patch in this series.
I can probably self-approve this, but I'll give Andrew Stubbs a chance
to comment.
Thanks,
Julian
2021-06-18 Julian Brown
gcc/
* config/gcn/gcn.md (mulsi3_highpart
On 14/06/2021 13:36, Julian Brown wrote:
On Wed, 9 Jun 2021 16:47:21 +0200
Marcel Vollweiler wrote:
This patch fixes an issue with global_load assembler functions leading
to a "invalid operand for instruction" error since in different LLVM
versions those functions use either one or two
On 30/05/2021 19:51, Jeff Law wrote:
On 5/5/2021 7:50 AM, Tobias Burnus wrote:
Hi Eric, hi all,
currently, gcn (amdgcn-amdhsa) bootstrapping fails as Alexandre's
patch to __builtin_memset (applied yesterday) now does more expansions.
The problem is [→ PR100418]
will at least build.
I suspect we'll see some real failures here soon though.
Andrew
On 07/05/2021 23:45, Andrew Stubbs wrote:
On 07/05/2021 18:11, Tobias Burnus wrote:
On 07.05.21 18:35, Andrew Stubbs wrote:
TImode has always been a problem on amdgcn, and now it is causing many
new test failures, so
On 07/05/2021 18:11, Tobias Burnus wrote:
On 07.05.21 18:35, Andrew Stubbs wrote:
TImode has always been a problem on amdgcn, and now it is causing many
new test failures, so I'm disabling it.
Does this still work with libgomp?
The patch sounds as if it might cause problems
TImode has always been a problem on amdgcn, and now it is causing many
new test failures, so I'm disabling it.
The mode only has move instructions defined, which was enough for SLP,
but any other code trying to use it without checking the optabs is a
problem.
The mode remains available for
A recent patch from Alexandre added new calls to emit_move_insn with
PLUS expressions in the operands. Apparently this works fine on (at
least) x86_64, but fails on (at least) amdgcn, where the adddi3 pattern
has clobbers that the movdi3 does not. This results in ICEs in recog.
This patch
On 16/04/2021 18:30, Thomas Schwinge wrote:
Hi!
On 2021-04-16T17:05:24+0100, Andrew Stubbs wrote:
On 15/04/2021 18:26, Thomas Schwinge wrote:
and optimisation, since shared memory might be faster than
the main memory on a GPU.
Do we potentially have a problem that making more use
On 15/04/2021 18:26, Thomas Schwinge wrote:
and optimisation, since shared memory might be faster than
the main memory on a GPU.
Do we potentially have a problem that making more use of (scarce)
gang-private memory may negatively affect performance, because potentially
fewer OpenACC gangs may
This follow-up fixes a typo in the placement of the close quote.
Thanks to Tobias for pointing it out.
Andrew
On 18/03/2021 17:41, Andrew Stubbs wrote:
This patch has no functional changes; it merely cleans up some warning
messages.
Thanks to Jan-Benedict for pointing them out, off-list
This patch has no functional changes; it merely cleans up some warning
messages.
Thanks to Jan-Benedict for pointing them out, off-list.
Andrew
amdgcn: Silence warnings in gcn.c
This fixes a few cases of "unquoted identifier or keyword", one "spurious
trailing punctuation sequence", and a
This patch is now backported to devel/omp/gcc-10.
Andrew
On 26/11/2020 14:41, Andrew Stubbs wrote:
This patch fixes an error in GCN mkoffload that corrupted relocations in
the early-debug info.
The code now updates the relocation code without zeroing the symbol index.
Andrew
This patch fixes up the DWARF code ranges for offload debugging, again.
This time it defers the changes until most other DWARF generation has
occurred, because the previous method was causing ICEs on some testcases.
This patch will be proposed for mainline in stage 1.
Andrew
DWARF: late
This patch fixes an OpenMP performance issue on NVPTX.
The problem is that it deallocates the stack memory when it shouldn't,
forcing the GOMP_OFFLOAD_run function to allocate the stack space again,
before every kernel launch.
The memory is only meant to be deallocated when a data
Ping.
On 22/01/2021 11:42, Andrew Stubbs wrote:
Hi all, Jakub,
I need to implement DWARF for local variables that exist in an
alternative address space. This happens for OpenACC gang-private
variables (or will when the patches are committed) on AMD GCN, at least.
This is distinct from
This patch adds a new -march option and multilib configuration to the
amdgcn GPU target. The patch does not attempt to support any of the new
features of the gfx908 devices, but does set the correct ELF flags etc.
that are expected by the ROCm runtime.
The GFX908 devices are not generally
Now backported to devel/omp/gcc-10.
On 26/01/2021 10:29, Andrew Stubbs wrote:
This patch fixes an AMD GCN bug in which attempting to use DFmode
vector reductions would cause an ICE.
There's no reason not to allow the reductions, so we simply enable them
thusly.
Andrew
This patch fixes an AMD GCN bug in which attempting to use DFmode
vector reductions would cause an ICE.
There's no reason not to allow the reductions, so we simply enable them
thusly.
Andrew
amdgcn: Allow V64DFmode min/max reductions
I don't know why these were disabled. There are no direct
On 22/01/2021 11:42, Andrew Stubbs wrote:
@@ -20294,15 +20315,6 @@ add_location_or_const_value_attribute (dw_die_ref die,
tree decl, bool cache_p)
if (list)
{
add_AT_location_description (die, DW_AT_location, list);
-
- addr_space_t as = TYPE_ADDR_SPACE (TREE_TYPE (decl
Hi all, Jakub,
I need to implement DWARF for local variables that exist in an
alternative address space. This happens for OpenACC gang-private
variables (or will when the patches are committed) on AMD GCN, at least.
This is distinct from pointer variables that reference other address
On 15/01/2021 11:43, Andrew Stubbs wrote:
This patch corrects a problem in which GDB ignores the debug info for
offload kernel entry functions because they're represented as nested
functions inside a function that does not exist on the accelerator
device (only on the host).
Apparently I had