This move makes the binaries compatible with the new rocgdb from ROCm 3.5.
2020-06-17 Andrew Stubbs
gcc/
* config/gcn/gcn-hsa.h (TEXT_SECTION_ASM_OP): Use ".text".
(BSS_SECTION_ASM_OP): Use ".bss".
(ASM_SPEC): Remove "-mattr=-code-object-v3".
(LINK_SPEC): Add
This patch is now backported to the devel/omp/gcc-10 branch.
Andrew
On 17/06/2020 10:13, Andrew Stubbs wrote:
This upgrades the compiler to emit HSA Code Object v3 binaries. This
means changing the assembler directives, and linker command line options.
The gcn-run and libgomp loaders need
On 19/06/2020 17:00, Tobias Burnus wrote:
OK for mainline?
OK, thank you.
Andrew
This patch ensures that programs using vector extensions to pass vectors
to functions pass the vectors in memory.
Even though we could technically do this in registers, the ABI would
have to be reworked to do so, and there's no call for that yet (maybe if
we want vector libgcc/libm).
This ch
On 23/06/2020 16:21, Tobias Burnus wrote:
If the offloading code is (only) in a library, one can come up
with the idea to build those parts as shared library – and link
it to the nonoffloading code.(*)
Currently, this fails as the mkoffload calls the nonoffloading
compiler without the -fpic/-fPI
On 23/06/2020 20:36, Thomas Schwinge wrote:
Eventually (not now...), instead of special-casing more and more options
(I somehow doubt that '-fpic', '-fPIC' are the only ones?), shouldn't we
solve this in some more generic way, like re-invoking the host compiler
exactly as invoked before (if that
This patch configures the DWARF debug output to match the proposed DWARF
specification from AMD. This is already implemented in LLVM and rocgdb
(out of tree).
This makes no attempt to support CFI, yet, and has some issues with
vector registers. GCC will need to support some DWARF extensions t
On 29/06/2020 21:16, Julian Brown wrote:
Data-share write (ds_write) instructions do not necessarily complete
the write to LDS immediately. When a write completes, LGKM_CNT is
decremented. For now, we wait until LGKM_CNT reaches zero after each
ds_write instruction.
This fixes a race condition i
Co-Authored-By: Andrew Stubbs
diff --git a/gcc/omp-expand.c b/gcc/omp-expand.c
index 0f07e51f7e8..6afe18d5ee0 100644
--- a/gcc/omp-expand.c
+++ b/gcc/omp-expand.c
@@ -8461,10 +8461,22 @@ push_target_argument_according_to_value (gimple_stmt_iterator *gsi, int device,
}
}
+static bool
On 02/07/2020 18:00, Jakub Jelinek wrote:
On Thu, Jul 02, 2020 at 05:15:20PM +0100, Andrew Stubbs wrote:
This patch, originally by Kwok, auto-adjusts the default OpenMP target
arguments to set num_threads(1) when there are no parallel regions. There
may still be multiple teams in this case
This patch implements a floating-point fold_left_plus vector pattern,
which gives a significant speed-up in the BabelStream "dot" benchmark.
The GCN architecture can't actually do an in-order vector reduction any
more efficiently than that equivalent scalar algorithm, so this is a bit
of a che
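To make the ordering point concrete, here is a minimal C sketch (not the GCN pattern itself) that compares a strict in-order sum, which is what a fold-left reduction has to compute, with a variant that keeps per-lane partial sums and combines them at the end. The function names and the LANES value are hypothetical, chosen only for illustration. Floating-point addition is not associative, so for general inputs the two results may differ in the low bits.

```c
#include <stddef.h>

/* Number of partial sums kept by the reassociated variant.
   Hypothetical value; a real vector unit would use its own width.  */
#define LANES 4

/* In-order (fold-left) dot product: ((0 + a0*b0) + a1*b1) + ...  */
float
dot_in_order (const float *a, const float *b, size_t n)
{
  float acc = 0.0f;
  for (size_t i = 0; i < n; i++)
    acc += a[i] * b[i];
  return acc;
}

/* Reassociated dot product: each "lane" accumulates every LANES-th
   product, and the partial sums are added together afterwards.  */
float
dot_reassociated (const float *a, const float *b, size_t n)
{
  float partial[LANES] = { 0.0f };
  for (size_t i = 0; i < n; i++)
    partial[i % LANES] += a[i] * b[i];

  float acc = 0.0f;
  for (size_t l = 0; l < LANES; l++)
    acc += partial[l];
  return acc;
}
```

For inputs that are small integers both functions return the same value; the difference only shows up once rounding is involved.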
Now backported to OG10.
Andrew
On 03/07/2020 11:11, Andrew Stubbs wrote:
This patch implements a floating-point fold_left_plus vector pattern,
which gives a significant speed-up in the BabelStream "dot" benchmark.
The GCN architecture can't actually do an in-order vector redu
This tiny patch just cleans up some defines in the libgomp GCN plugin.
The code wouldn't compile, as it was, if elf.h is updated to support GCN
relocations. This should fix that. The other user, mkoffload.c, was
already fixed.
Andrew
Tweak plugin-gcn.c defines
Ensure the code will continue t
On 25/11/2020 11:36, Richard Biener wrote:
This makes sure to lower VECTOR_BOOLEAN_TYPE_P typed VEC_COND_EXPRs
so we don't try to use vcond to expand those. That's especially
important for x86 integer mode boolean vectors but eventually
as well for aarch64 / gcn VnBImode ones.
GCN does not hav
This libgomp OpenACC testcase makes assumptions about the order in which
loop iterations will run that are invalid on amdgcn. Apparently nvptx
does work that way, but I find that surprising in itself.
For example, this patch ensures that where a test expects one bit left
set, or unset, then it
This patch fixes an error in GCN mkoffload that corrupted relocations in
the early-debug info.
The code now updates the relocation code without zeroing the symbol index.
Andrew
Fix early-debug relocations
The relocation symbols were inadvertently wiped when the type was set in
mkoffload.
gcc/
On 26/11/2020 14:48, Iain Sandoe wrote:
Rainer Orth wrote:
unfortunately, Solaris/SPARC results are miserable:
So without further investigation, we cannot use the leb128 directives
with Solaris/SPARC as.
I think Andrew was running GCN (not sure of the results there)
- but, I suppose tha
On 08/11/2022 14:35, Kwok Cheung Yeung wrote:
Hello
This patch adds three extra builtins for the vectorized forms of the
abs, floorf and floor math functions, which are implemented by native
GCN instructions. I have also added a test to check that they generate
the expected assembler instruct
On 16/11/2022 11:42, Tobias Burnus wrote:
This is a part of a patch by Andrew (hi!) - namely that part that only
adds the
__builtin_gcn_kernarg_ptr. More is planned, see below.
The short term benefit of this patch is to permit replacing hardcoded
numbers
by a builtin – like in libgomp (see pa
Recent patches have enabled new capabilities on AMD GCN, but not all the
testsuite features were enabled. The hardfp divide patch actually had a
test regression because the expected results were too conservative.
This patch corrects both issues.
Andrew
amdgcn: update target-supports.exp
The b
The hardfp division patch exposed a flaw in the ldexp pattern at -O0;
the compiler was trying to use out-of-range immediates on VOP3
instruction encodings.
This patch changes the constraints appropriately, and also takes the
opportunity to combine the two patterns into one using the newly
ava
I've committed this to the devel/omp/gcc-12 branch.
The patch fixes a concurrency issue where the spin-locks didn't work
well if many GPU threads tried to free low-latency memory all at once.
Adding a short sleep instruction is enough for the hardware thread to
yield and allow another to proc
I've committed this patch to fix a couple of bugs introduced in the
recent CMul patch.
First, the fmsubadd insn was accidentally all adds and no subtracts.
Second, there were input dependencies on the undefined output register
which caused the compiler to reserve unnecessary slots in the stac
On 05/05/2023 12:10, Tobias Burnus wrote:
Probably added for symmetry with out_mode/out_n but at the end not used.
That function was added in commit
r13-6423-gce9cd7258d0 amdgcn: Enable SIMD vectorization of math
functions
Tested the removal by building with that patch applied.
OK for mainl
On 01/02/2023 15:35, Paul-Antoine Arras wrote:
This patch introduces an instruction pattern for conditional shift
operations (cond_{ashl|ashr|lshr}) in the GCN machine description.
Tested on GCN3 Fiji gfx803.
OK to commit?
The changelog will need to be wrapped to 80 columns.
OK otherwise.
A
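For readers unfamiliar with conditional shift patterns, the following is a scalar C model of what a conditional left shift computes per lane. It is only a sketch: the function name, the byte-per-lane mask representation, and the masking of the shift count to 0..31 are assumptions made here for illustration and are not taken from the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar model of a conditional left shift over N lanes.
   Lanes whose mask entry is non-zero receive a[i] << shift[i];
   all other lanes take the value from fallback[i].
   The "& 31" keeps the shift count in range for uint32_t (an
   assumption of this sketch, to avoid undefined behaviour in C).  */
void
cond_shl (uint32_t *out, const unsigned char *mask, const uint32_t *a,
          const uint32_t *shift, const uint32_t *fallback, size_t n)
{
  for (size_t i = 0; i < n; i++)
    out[i] = mask[i] ? a[i] << (shift[i] & 31) : fallback[i];
}
```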
I've committed this patch to change the ways stacks are initialized on
amdgcn. The patch only touches GCN files, or the GCN-only portions of
libgomp files, so I'm allowing it despite stage 4 because I want the ABI
change done for GCC 13, and because it enables Tobias's reverse
offload-patch tha
On 02/02/2023 14:59, Tobias Burnus wrote:
Maybe it becomes better reviewable with an attached patch ...
On 02.02.23 15:31, Tobias Burnus wrote:
Now that the stack handling has been changed for AMDGCN, this patch
enables reverse offload.
(cf. today's "[committed] amdgcn, libgomp: Manually alloca
The -mstack-size option has been marked obsolete in favour of setting an
environment variable at runtime ("GCN_STACK_SIZE"), but some testcases
still need the option set or they have stack overflow. I could change
them to use the envvar, but my testing setup uses remote execute which
doesn't su
On 10/02/2023 14:21, Thomas Schwinge wrote:
Is the correct fix the following (conceptually like
'linux_memspace_alloc' cited above), or is there something that I fail to
understand?
static void *
linux_memspace_calloc (omp_memspace_handle_t memspace, size_t size, int
pin)
{
On 10/02/2023 15:11, Thomas Schwinge wrote:
Hi!
Re OpenMP 'pinned' memory allocator trait semantics vs. 'omp_realloc':
On 2022-01-13T13:53:03+0000, Andrew Stubbs wrote:
On 05/01/2022 17:07, Andrew Stubbs wrote:
[...], I'm working on an implementation using mmap ins
On 24/05/2022 17:44, Tobias Burnus wrote:
On 24.05.22 17:31, Andrew Stubbs wrote:
amdgcn: Add gfx90a support
Attached is an attempt to update invoke.texi
I've deliberately avoided the MI100 and MI200 names because they're
really not that simple. MI100 is gfx908, but MI150 is
On 25/05/2022 12:16, Tobias Burnus wrote:
On 25.05.22 11:18, Andrew Stubbs wrote:
On 24/05/2022 17:44, Tobias Burnus wrote:
On 24.05.22 17:31, Andrew Stubbs wrote:
amdgcn: Add gfx90a support
I've deliberately avoided the MI100 and MI200 names because they're
really not that simple
On 27/05/2022 20:16, Thomas Schwinge wrote:
Hi Andrew!
On 2022-05-24T16:27:52+0100, Andrew Stubbs wrote:
I've committed this patch to set the minimum required LLVM version, for
the assembler and linker, to 13.0.1. An upgrade from LLVM 9 is a
prerequisite for the gfx90a support, and 13.0
't know
how that'll handle heterogeneous systems, but those ought to be rare.
I don't think libmemkind will resolve this performance issue, although
certainly it can be used for host implementations of low-latency
memories, etc.
Andrew
On 13/01/2022 13:53, Andrew Stubbs wrote:
On 07/06/2022 13:10, Jakub Jelinek wrote:
On Tue, Jun 07, 2022 at 12:05:40PM +0100, Andrew Stubbs wrote:
Following some feedback from users of the OG11 branch I think I need to
withdraw this patch, for now.
The memory pinned via the mlock call does not give the expected performance
boost. I
This patch removed some workarounds that were required for old versions
of the LLVM assembler. The minimum supported version is now 13.0.1 so
the workarounds are no longer needed.
Andrew
amdgcn: remove obsolete assembler workarounds
This nonsense is no longer required, now that the minimum sup
This setting is way out of date; global constructors have worked on GCN
for a while now.
Andrew
amdgcn: test global constructors
The tests are disabled for historical reasons only.
gcc/testsuite/ChangeLog:
* lib/target-supports.exp (check_effective_target_global_constructor):
R
I've pushed these three patches to the devel/omp/gcc-11 branch ("OG11").
I'll be submitting mainline versions soonish.
The patches add a means to track "requires unified_shared_memory" from
the frontend, through the backend compiler, and on to the runtime, plus
all the bits needed to implement
On 09/06/2022 09:19, Jakub Jelinek via Gcc-patches wrote:
+ switch (memspace)
+{
+case omp_high_bw_mem_space:
+#ifdef LIBGOMP_USE_MEMKIND
+ struct gomp_memkind_data *memkind_data;
+ memkind_data = gomp_get_memkind ();
+ if (data.partition == omp_atv_interleaved
+ &
On 29/06/2022 11:45, Jakub Jelinek wrote:
And omp_init_allocator needs to decide what to do if one asks for features
that need memkind as well as for features that need whatever you/Abid have
been working on. A possible resolution is punt (return omp_null_allocator),
or prefer one feature over t
This patch has no functional changes; it merely cleans up some warning
messages.
Thanks to Jan-Benedict for pointing them out, off-list.
Andrew
amdgcn: Silence warnings in gcn.c
This fixes a few cases of "unquoted identifier or keyword", one "spurious
trailing punctuation sequence", and a "m
This follow-up fixes a typo in the placement of the close quote.
Thanks to Tobias for pointing it out.
Andrew
On 18/03/2021 17:41, Andrew Stubbs wrote:
This patch has no functional changes; it merely cleans up some warning
messages.
Thanks to Jan-Benedict for pointing them out, off-list
On 15/04/2021 18:26, Thomas Schwinge wrote:
and optimisation, since shared memory might be faster than
the main memory on a GPU.
Do we potentially have a problem that making more use of (scarce)
gang-private memory may negatively affect performance, because potentially
fewer OpenACC gangs may th
On 16/04/2021 18:30, Thomas Schwinge wrote:
Hi!
On 2021-04-16T17:05:24+0100, Andrew Stubbs wrote:
On 15/04/2021 18:26, Thomas Schwinge wrote:
and optimisation, since shared memory might be faster than
the main memory on a GPU.
Do we potentially have a problem that making more use of
On 14/06/2021 13:36, Julian Brown wrote:
On Wed, 9 Jun 2021 16:47:21 +0200
Marcel Vollweiler wrote:
This patch fixes an issue with global_load assembler functions leading
to an "invalid operand for instruction" error since in different LLVM
versions those functions use either one or two registe
s lost from libgcc if we build it
for DImode/TImode rather than SImode/DImode, a change we make in a later
patch in this series.
I can probably self-approve this, but I'll give Andrew Stubbs a chance
to comment.
Thanks,
Julian
2021-06-18 Julian Brown
gcc/
* config/gcn/gcn.md (mulsi
o fix up the result afterwards.
These patterns are lost from libgcc if we build it for DImode/TImode
rather than SImode/DImode, a change we make in a later patch in this
series.
I can probably self-approve this, but I'll give Andrew Stubbs a chance
to comment.
Thanks,
Julian
2021-06-18 Ju
h alternatives for all operations that
might be needed). Those gaps are filled in by this patch, or by the
preceding patches in the series.
I can probably self-approve this, but I'll give Andrew Stubbs a chance
to comment.
Thanks,
Julian
2021-06-18 Julian Brown
gcc/
* config/
On 18/06/2021 15:19, Julian Brown wrote:
This patch changes the argument and return types for the libgcc __udivsi3
and __umodsi3 helper functions for GCN to USItype instead of SItype.
This is probably just cosmetic in practice.
I can probably self-approve this, but I'll give Andrew Stu
On 22/06/2021 18:14, Hafiz Abid Qadeer wrote:
Currently we don't get any call frame information for the amdgcn target.
This patch makes necessary adjustments to generate CFI that can work with
ROCGDB (ROCm 3.8+).
gcc/
* config/gcn/gcn.c (move_callee_saved_registers): Emit CFI notes for
On 22/06/2021 18:14, Hafiz Abid Qadeer wrote:
As size of address is bigger than registers in amdgcn, we are forced to use
DW_CFA_def_cfa_expression to make an expression that concatenates multiple
registers for the value of the CFA. This then prohibits us from using many
of the dwarf ops which e
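As a rough illustration of what "concatenating multiple registers" means here, assume two 32-bit registers hold the low and high halves of a 64-bit address. The helper name and the lo/hi pairing below are assumptions for illustration only; they are not taken from the patch or from the DWARF expression it emits.

```c
#include <stdint.h>

/* Build a 64-bit address from two 32-bit register values.
   LO supplies bits 0..31 and HI supplies bits 32..63.  */
uint64_t
concat_regs (uint32_t lo, uint32_t hi)
{
  return ((uint64_t) hi << 32) | lo;
}
```

For example, concat_regs (0x89abcdef, 0x01234567) yields 0x0123456789abcdef.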
On 22/06/2021 18:14, Hafiz Abid Qadeer wrote:
Map GCN address spaces to the proposed DWARF address spaces defined by AMD at
https://llvm.org/docs/AMDGPUUsage.html#amdgpu-dwarf-address-class-mapping-table
gcc/
* config/gcn/gcn.c: Include dwarf2.h.
(gcn_addr_space_debug): New func
On 23/06/2021 10:53, Tobias Burnus wrote:
+ additionally the following features which were available in C and C++
+ before: depobj, mutexinoutset and
I realise that you did not invent this awkward wording, but I'd prefer ...
"the following features that were previously only availabl
On 14/11/2019 12:43, Kwok Cheung Yeung wrote:
Hello
This patch fixes an issue seen in the following test cases on AMD GCN:
libgomp.oacc-fortran/gemm.f90
libgomp.oacc-fortran/gemm-2.f90
libgomp.c/for-5-test_ttdpfs_ds128_auto.c
libgomp.c/for-5-test_ttdpfs_ds128_guided32.c
libgomp.c/for-5-test_ttd
Hi,
This patch adds some necessary bits to enable OpenACC testing for
amdgcn offloading.
The two "check_effective" procedures are not actually needed yet, but
later patches to test cases will use them.
OK to commit?
Thanks
Andrew
Enable OpenACC GCN testing.
2019-11-14 And
om offload kernels is not recommended in
production, but can be useful in development.
OK to commit?
Thanks
Andrew
Add tests for print from offload target.
2019-11-14 Andrew Stubbs
libgomp/
* testsuite/libgomp.c/target-print-1.c: New file.
* testsuite/libgomp.fortran/target-print-1.f9
On 14/11/2019 17:05, Jakub Jelinek wrote:
On Thu, Nov 14, 2019 at 04:47:49PM +0000, Andrew Stubbs wrote:
This patch adds new libgomp tests to ensure that C "printf" and Fortran
"write" work correctly within offload kernels. Both should work for amdgcn,
but nvptx uses the
On 14/11/2019 15:30, Kwok Cheung Yeung wrote:
GCN 5 has commonly-used global memory instructions that specify the
address as [SGPR address] + [VGPR offset] + [constant offset], and we
often want the VGPR offset to be zero, so v0 is currently reserved for
that purpose.
However, v1 contains [0,
On 14/11/2019 15:30, Kwok Cheung Yeung wrote:
The set of fixed registers is adjusted by the
TARGET_CONDITIONAL_REGISTER_USAGE hook, but this needs to be done on a
per-function basis, whereas the hook is normally called once during GCC
initialization before any functions have been processed (whi
On 14/11/2019 15:32, Kwok Cheung Yeung wrote:
This patch restricts non-kernel functions to using a maximum of 64 SGPRs
and 24 VGPRs.
Kernels can request various pieces of information from the HSA runtime,
and these will be loaded into the registers consecutively before the
kernel executes. Th
On 14/11/2019 15:33, Kwok Cheung Yeung wrote:
The kernel attributes are changed to request at least 64 SGPRs and 24
VGPRs (i.e. the non-kernel maximum, otherwise the callees may not have
enough registers to run in) for non-leaf kernels to take advantage of
the reduced number of registers used i
On 14/11/2019 15:34, Kwok Cheung Yeung wrote:
This patch unfixes the registers for the hard frame pointer so that they
can be used for other purposes if the frame pointer is not in use.
This patch is dependent on the commit 'Support using multiple registers
to hold the frame pointer' (r277895)
On 15/11/2019 12:21, Jakub Jelinek wrote:
On Thu, Nov 14, 2019 at 04:36:38PM +0000, Andrew Stubbs wrote:
This patch adds some necessary bits to enable OpenACC testing for amdgcn
offloading.
The two "check_effective" procedures are not actually needed yet, but later
patches to test
On 15/11/2019 12:43, Jakub Jelinek wrote:
APUs, such as Carrizo, are shared memory. DGPUs, such as Fiji and Vega, have
their own memory. A DGPU can access host memory, provided that it has been
set up just so, but that is very slow, and I don't know of a way to do that
without still having to copy
On 15/11/2019 15:51, Kwok Cheung Yeung wrote:
On 15/11/2019 11:32 am, Andrew Stubbs wrote:
On 14/11/2019 15:33, Kwok Cheung Yeung wrote:
The kernel attributes are changed to request at least 64 SGPRs and 24
VGPRs (i.e. the non-kernel maximum, otherwise the callees may not
have enough
On 15/11/2019 21:44, Julian Brown wrote:
+static void
+hsa_memory_copy_wrapper (void *dst, const void *src, size_t len)
+{
+ hsa_status_t status = hsa_fns.hsa_memory_copy_fn (dst, src, len);
+
+ if (status == HSA_STATUS_SUCCESS)
+return;
+
+ /* It appears that the copy fails if the source
On 15/11/2019 21:44, Julian Brown wrote:
@@ -2732,13 +2732,9 @@ wait_for_queue_nonfull (struct goacc_asyncqueue *aq)
{
if (aq->queue_n == ASYNC_QUEUE_SIZE)
{
- pthread_mutex_lock (&aq->mutex);
-
/* Queue is full. Wait for it to not be full. */
while (aq->queue_n
On 15/11/2019 21:44, Julian Brown wrote:
This patch checks that cfun is valid in the gcn_asm_output_symbol_ref
function. This prevents a crash when that function is called with NULL
cfun, i.e. when outputting debug symbols.
OK?
OK, although that FIXME still baffles me.
Andrew
On 15/11/2019 21:44, Julian Brown wrote:
This patch flips the switch to enable worker partitioning on AMD GCN.
OK?
This is OK, although I think we could just remove that flag now.
Andrew
This patch adds GCN special casing for most of the OpenACC libgomp tests
that require it. It also disables one testcase that explicitly uses CUDA.
OK to commit?
Andrew
Update OpenACC tests for amdgcn
2019-11-19 Andrew Stubbs
libgomp/
* testsuite/libgomp.oacc-c-c++-common/acc_prof-init-1
sn't match. The code is still correct for the purpose of the testcase
either way, however, so I'm removing the over-fussy match.
Andrew
Update loop-1.c test for amdgcn
2019-11-19 Andrew Stubbs
gcc/testsuite/
* gcc.dg/tree-ssa/loop-1.c: Change amdgcn assembler scan.
diff --git a/
The attached patch assigns the "(int) x" to a temporary and passes that
to the function instead.
OK to commit?
--
Andrew Stubbs
CodeSourcery / Mentor Graphics
Normalize GOACC_parallel_keyed async parameter.
2019-11-22 Andrew Stubbs
gcc/
* omp-expand.c (expand_omp_target): Pass s
I've committed the attached. The patch adjusts the GCN kernel metadata
so that it is correct for GFX9 devices.
The existing implementation was correct for GFX8, and seems to work on
GFX9, but wasn't technically correct.
--
Andrew Stubbs
CodeSourcery / Mentor Graphics
Use GFX9
ocation remains unchanged for non-offload compiles (this is only
really used for running the testsuite).
--
Andrew Stubbs
CodeSourcery / Mentor Graphics
Limit LDS usage.
2019-11-22 Andrew Stubbs
gcc/
* config/gcn/gcn.c (OMP_LDS_SIZE): Define.
(ACC_LDS_SIZE): Define.
(OTHER_LDS_SIZE): Def
On 25/11/2019 11:14, Tobias Burnus wrote:
This patch adds "gcc_unreachable ();" as suggested by me (cf. below).
It also silences the -Wunused-variable + 'no return statement' warnings.
OK for the trunk?
OK.
Thanks, Tobias.
Andrew
On 25/11/2019 14:17, Tobias Burnus wrote:
The compiler warns that funcs_tail and vars_tails are unused – they,
funcs_ids/var_ids and struct id_map seem to be copy-n-paste leftovers
from gcc/config/nvptx/mkoffload.c.
Additionally, COMMENT_PREFIX does not seem to be used anywhere. (In the
who
On 02/12/2019 14:23, Thomas Schwinge wrote:
Hi!
On 2019-11-15T13:43:04+0100, Jakub Jelinek wrote:
On Fri, Nov 15, 2019 at 12:38:06PM +0000, Andrew Stubbs wrote:
On 15/11/2019 12:21, Jakub Jelinek wrote:
I'm surprised by the set acc_mem_shared 0, I thought gcn is a shared memory
offlo
wed-by tag, sorry).
Andrew
Enable OpenACC GCN testing.
2019-12-03 Andrew Stubbs
libgomp/
* testsuite/lib/libgomp.exp (offload_target_to_openacc_device_type):
Recognize amdgcn.
(check_effective_target_openacc_amdgcn_accel_present): New proc.
(check_effective_target_openacc_amdgcn_acce
On 02/12/2019 14:43, Thomas Schwinge wrote:
Hi!
On 2019-11-12T13:29:13+0000, Andrew Stubbs wrote:
--- a/include/gomp-constants.h
+++ b/include/gomp-constants.h
@@ -174,6 +174,7 @@ enum gomp_map_kind
#define GOMP_DEVICE_NVIDIA_PTX 5
#define GOMP_DEVICE_INTEL_MIC 6
tcase now compiles, although not quite correctly, but that's another
issue (pr92772).
Andrew
Add missing amdgcn vcondu patterns
2019-12-03 Andrew Stubbs
gcc/
* config/gcn/gcn-valu.md: Change "vcondu" patterns to use VEC_1REG_MODE
for the data mode.
diff --git a/gcc/config/g
On 18/11/2022 17:20, Tobias Burnus wrote:
This patch adds two builtins (getting end-of-stack pointer and
a Boolean answer whether it was the first call to the builtin on this
thread).
The idea is to replace some hard-coded values in newlib, permitting to move
later to a manually allocated stac
On 18/11/2022 17:41, Tobias Burnus wrote:
Attached is the updated/rediffed version, which now uses the builtin
instead of the 'asm("s8")'.
The code in principle works; that is: If no private stack variables are
copied, it works.
Or in other words: reverse-offload target regions that don't use
On 19/11/2022 10:46, Tobias Burnus wrote:
On 18.11.22 18:49, Andrew Stubbs wrote:
On 18/11/2022 17:20, Tobias Burnus wrote:
This looks wrong:
+ /* stackbase = (stack_segment_decr & 0x)
+ + stack_wave_offset);
+ seg_size = dispatch_ptr->private_segme
On 21/11/2022 13:40, Tobias Burnus wrote:
Working on the builtins, I realized that I mixed up (again) bits and bytes.
While 'uint64_t var[2]' has a size of 128 bits, 'char var[128]' has a
size of 128 bytes.
Thus, there is sufficient space for 16 pointer-size/uint64_t values but I
only need 6.
T
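The bits-versus-bytes arithmetic above can be checked directly with sizeof. This is just a small sketch; the two helper names are made up for illustration.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Size in bits of 'uint64_t var[2]': 2 * 64 = 128 bits.  */
size_t
bits_in_u64_pair (void)
{
  return sizeof (uint64_t[2]) * CHAR_BIT;
}

/* Number of uint64_t-sized slots that fit in 'char var[128]':
   128 bytes / 8 bytes = 16 slots (assuming 8-bit chars).  */
size_t
u64_slots_in_char128 (void)
{
  return sizeof (char[128]) / sizeof (uint64_t);
}
```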
On 28/11/2022 07:40, Tobias Burnus wrote:
It turned out that cprop cleverly propagated the unspec_volatile
to the preceding (pseudo)register, permitting to remove the
'set (s0) (pseudoregister)' at -O2. Unfortunately, it does
matter whether the assignment is done to 's2' (previously: pseudoregis
On 29/11/2022 15:56, Paul-Antoine Arras wrote:
Hi all,
This patch adds support for 'gfx803' as an alias for 'fiji' in OpenMP
context selectors, so as to be consistent with LLVM. It also adds test
cases checking all supported AMD ISAs are properly recognised when used
in a 'declare variant' co
On 29/11/2022 18:26, Tobias Burnus wrote:
Hi PA, hi Andrew, hi Jakub, hi all,
On 29.11.22 16:56, Paul-Antoine Arras wrote:
This patch adds support for 'gfx803' as an alias for 'fiji' in OpenMP
context selectors, [...]
I think this should be documented somewhere. We have
https://gcc.gnu.org/on
On 09/09/2022 15:31, Jakub Jelinek wrote:
--- a/gcc/tree-if-conv.cc
+++ b/gcc/tree-if-conv.cc
@@ -1074,13 +1076,19 @@ if_convertible_stmt_p (gimple *stmt,
vec<data_reference_p> refs)
tree fndecl = gimple_call_fndecl (stmt);
if (fndecl)
{
+ /* We can vectorize some builtins and
On 01/12/2022 11:10, Paul-Antoine Arras wrote:
+ if (TARGET_FIJI)
\
+ builtin_define ("__FIJI__"); \
+ else if (TARGET_VEGA10)
\
+
On 30/11/2022 15:37, Jakub Jelinek wrote:
On Wed, Nov 30, 2022 at 03:17:30PM +0000, Andrew Stubbs wrote:
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16.c
@@ -0,0 +1,89 @@
+/* { dg-require-effective-target vect_simd_clones } */
+/* { dg-additional-options "-fopenmp-simd -
On 01/12/2022 14:35, Paul-Antoine Arras wrote:
I believe this patch addresses your comments regarding the GCN bits.
The new builtins are consistent with the LLVM naming convention (lower
case, canonical name). For gfx803, I also kept '__fiji__' to be
consistent with -march=fiji.
Is it OK for
On 08/12/2022 12:11, Jakub Jelinek wrote:
On Thu, Jul 07, 2022 at 11:34:33AM +0100, Andrew Stubbs wrote:
Implement the OpenMP pinned memory trait on Linux hosts using the mlock
syscall. Pinned allocations are performed using mmap, not malloc, to ensure
that they can be unpinned safely when
On 08/12/2022 14:02, Tobias Burnus wrote:
On 08.12.22 13:51, Andrew Stubbs wrote:
On 08/12/2022 12:11, Jakub Jelinek wrote:
On Thu, Jul 07, 2022 at 11:34:33AM +0100, Andrew Stubbs wrote:
Implement the OpenMP pinned memory trait on Linux hosts using the mlock
syscall. Pinned allocations are
Hi all,
This patch removes the "sorry" message for the OpenMP "requires
dynamic_allocators" feature in C, C++ and Fortran.
The clause is supposed to state that the user code will not work without
the omp_alloc/omp_free and omp_init_allocator/omp_destroy_allocator and
these things *are* prese
This patch is submitted now for review and so I can commit a backport of it
to the OG11 branch, but isn't suitable for mainline until stage 1.
The patch implements support for omp_low_lat_mem_space and
omp_low_lat_mem_alloc on NVPTX offload devices. The omp_pteam_mem_alloc,
omp_cgroup_mem_alloc a
On 20/12/2021 15:58, Andrew Stubbs wrote:
In order to support the %dynamic_smem_size PTX feature it is necessary
to bump the minimum supported PTX version from 3.1 (~2013) to 4.1 (~2014).
Tobias has pointed out, privately, that the default version is both
documented and encoded in the -mptx
This is now backported to the devel/omp/gcc-11 branch (OG11).
Andrew
On 09/12/2021 11:41, Andrew Stubbs wrote:
On 02/12/2021 16:43, Jakub Jelinek wrote:
On Thu, Dec 02, 2021 at 04:31:36PM +0000, Andrew Stubbs wrote:
On 02/12/2021 16:05, Andrew Stubbs wrote:
On 02/12/2021 12:58, Jakub
This patch implements the OpenMP pinned memory trait for Linux hosts. On
other hosts and on devices the trait becomes a no-op (instead of being
rejected).
The memory is locked via the mlock syscall, which is both the "correct"
way to do it on Linux, and a problem because the default ulimit for
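For illustration only, a pinned allocation on Linux using the mlock syscall could be sketched as below. This is not the libgomp implementation; the function names are hypothetical, and the choice of mmap for the backing memory is an assumption of this sketch. Note that mlock can fail when the locked-memory ulimit is exceeded, which the sketch reports by returning NULL.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <sys/mman.h>

/* Allocate SIZE bytes of anonymous memory and lock it into RAM.
   Returns NULL if either the mapping or the mlock call fails
   (for example, because RLIMIT_MEMLOCK is too low).  */
void *
pinned_alloc (size_t size)
{
  void *p = mmap (NULL, size, PROT_READ | PROT_WRITE,
                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (p == MAP_FAILED)
    return NULL;

  if (mlock (p, size) != 0)
    {
      munmap (p, size);
      return NULL;
    }
  return p;
}

/* Release memory obtained from pinned_alloc.  Unmapping the range
   also removes the memory lock, so no separate munlock is needed.  */
void
pinned_free (void *p, size_t size)
{
  if (p != NULL)
    munmap (p, size);
}
```

A caller would check the result of pinned_alloc for NULL and fall back to ordinary memory, or report an error, depending on what the allocator traits ask for.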
On 04/01/2022 15:55, Jakub Jelinek wrote:
The usual libgomp way of doing this wouldn't be to use #ifdef __linux__, but
instead add libgomp/config/linux/allocator.c that includes some headers,
defines some macros and then includes the generic allocator.c.
OK, good point, I can do that.
I think
On 05/01/2022 10:24, Tom de Vries wrote:
On 12/21/21 12:33, Andrew Stubbs wrote:
On 20/12/2021 15:58, Andrew Stubbs wrote:
In order to support the %dynamic_smem_size PTX feature it is
necessary to bump the minimum supported PTX version from 3.1 (~2013)
to 4.1 (~2014).
Tobias has pointed out