On 15/06/2023 10:58, Richard Biener wrote:
On Thu, 15 Jun 2023, Andrew Stubbs wrote:
On 14/06/2023 15:29, Richard Biener wrote:
On 14.06.2023 at 16:27, Andrew Stubbs wrote:
On 14/06/2023 12:54, Richard Biener via Gcc-patches wrote:
This implements fully masked vectorization or a masked
On 14/06/2023 15:29, Richard Biener wrote:
On 14.06.2023 at 16:27, Andrew Stubbs wrote:
On 14/06/2023 12:54, Richard Biener via Gcc-patches wrote:
This implements fully masked vectorization or a masked epilog for
AVX512 style masks which single themselves out by representing
each lane
On 14/06/2023 12:54, Richard Biener via Gcc-patches wrote:
This implements fully masked vectorization or a masked epilog for
AVX512 style masks which single themselves out by representing
each lane with a single bit and by using integer modes for the mask
(both is much like GCN).
AVX512 is also s
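As an aside, the one-bit-per-lane integer-mode mask described in this snippet can be modelled in plain C. The following is only an illustrative sketch (the names are mine, not GCC's):

```c
#include <stdint.h>

/* Illustrative sketch only: an AVX512/GCN-style mask is a plain
   integer with one bit per lane.  Lane i of the result takes a[i]
   where bit i of the mask is set, else b[i].  */
enum { LANES = 8 };

void
masked_select (uint8_t mask, const int *a, const int *b, int *out)
{
  for (int lane = 0; lane < LANES; lane++)
    out[lane] = ((mask >> lane) & 1) ? a[lane] : b[lane];
}
```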
This patch allows vectorization when operators are available as
libfuncs, rather than only as insns.
This will be useful for amdgcn where we plan to vectorize loops that
contain integer division or modulus, but don't want to generate inline
instructions for the division algorithm every time.
On 09/06/2023 10:02, Richard Sandiford wrote:
Andrew Stubbs writes:
On 07/06/2023 20:42, Richard Sandiford wrote:
I don't know if this helps (probably not), but we have a similar
situation on AArch64: a 64-bit mode like V8QI can be doubled to a
128-bit vector or to a pair of 64-bit ve
On 07/06/2023 20:42, Richard Sandiford wrote:
I don't know if this helps (probably not), but we have a similar
situation on AArch64: a 64-bit mode like V8QI can be doubled to a
128-bit vector or to a pair of 64-bit vectors. We used V16QI for
the former and "V2x8QI" for the latter. V2x8QI is for
On 06/06/2023 16:33, Tobias Burnus wrote:
Andrew: Does the GCN change look okay to you?
This patch permits the use of GCN devices with 'omp requires
unified_address', which in principle works already, except that the
requirement handling disabled it.
(It also updates libgomp.texi for this chan
On 30/05/2023 07:26, Richard Biener wrote:
On Fri, May 26, 2023 at 4:35 PM Andrew Stubbs wrote:
Hi all,
I want to implement a vector DIVMOD libfunc for amdgcn, but I can't just
do it because the GCC middle-end models DIVMOD's return value as
"complex int" type, and
OK.
Andrew
On 26/05/2023 15:58, Tobias Burnus wrote:
(Update the syntax of the amdgcn commandline option in anticipation of
later patches; while -m(no-)xnack is in mainline since
r12-2396-gaad32a00b7d2b6 (for PR100208), -mxnack (contrary to
-msram-ecc) is currently mostly a stub for later pat
Hi all,
I want to implement a vector DIVMOD libfunc for amdgcn, but I can't just
do it because the GCC middle-end models DIVMOD's return value as
"complex int" type, and there are no vector equivalents of that type.
Therefore, this patch adds minimal support for "complex vector int"
modes.
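To illustrate why the "complex int" model matters here: each lane of a vector DIVMOD produces a (quotient, remainder) pair. This scalar C sketch (my naming, not GCC internals) shows the per-lane shape that needs a vector analogue:

```c
/* Sketch: each lane of a vector DIVMOD yields a (quotient, remainder)
   pair -- the per-lane analogue of the middle end's "complex int"
   return type.  */
typedef struct { int quot; int rem; } divmod_result;

divmod_result
divmod1 (int a, int b)
{
  return (divmod_result) { a / b, a % b };
}

void
vec_divmod (int n, const int *a, const int *b, divmod_result *out)
{
  for (int i = 0; i < n; i++)
    out[i] = divmod1 (a[i], b[i]);
}
```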
On 05/05/2023 12:10, Tobias Burnus wrote:
Probably added for symmetry with out_mode/out_n, but ultimately not used.
That function was added in commit
r13-6423-gce9cd7258d0 amdgcn: Enable SIMD vectorization of math
functions
Tested the removal by building with that patch applied.
OK for mainl
I've committed this patch to fix a couple of bugs introduced in the
recent CMul patch.
First, the fmsubadd insn was accidentally all adds and no subtracts.
Second, there were input dependencies on the undefined output register
which caused the compiler to reserve unnecessary slots in the stac
I've committed this to the devel/omp/gcc-12 branch.
The patch fixes a concurrency issue where the spin-locks didn't work
well if many GPU threads tried to free low-latency memory all at once.
Adding a short sleep instruction is enough for the hardware thread to
yield and allow another to proc
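The fix described in this snippet can be sketched on the host with C11 atomics. This is only an analogy: on the GPU the yield is a short hardware sleep instruction, for which sched_yield stands in here.

```c
#include <stdatomic.h>
#include <sched.h>

/* Sketch of a test-and-set spin-lock that yields between attempts so
   contending threads can make progress, analogous to the fix above.
   On the GPU the yield is a short sleep instruction; on a POSIX host
   sched_yield is a rough stand-in.  */
static atomic_flag lock_flag = ATOMIC_FLAG_INIT;

void
spin_lock (void)
{
  while (atomic_flag_test_and_set_explicit (&lock_flag,
                                            memory_order_acquire))
    sched_yield ();  /* let another thread run before retrying */
}

void
spin_unlock (void)
{
  atomic_flag_clear_explicit (&lock_flag, memory_order_release);
}
```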
The hardfp division patch exposed a flaw in the ldexp pattern at -O0;
the compiler was trying to use out-of-range immediates on VOP3
instruction encodings.
This patch changes the constraints appropriately, and also takes the
opportunity to combine the two patterns into one using the newly
ava
Recent patches have enabled new capabilities on AMD GCN, but not all the
testsuite features were enabled. The hardfp divide patch actually had a
test regression because the expected results were too conservative.
This patch corrects both issues.
Andrew
amdgcn: update target-supports.exp
The b
This patch switches amdgcn from using softfp for division to using the
hardware support. There isn't a single division instruction, but there
is an instruction sequence that gives the necessary accuracy.
This implementation also allows fully vectorized division, so gives good
performance impro
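For flavour, the "instruction sequence that gives the necessary accuracy" is typically built around Newton-Raphson refinement of a reciprocal estimate. The sketch below only illustrates the principle; the actual GCN sequence differs in detail, and positive divisors are assumed for brevity:

```c
#include <math.h>

/* Sketch of the idea behind a division instruction *sequence*:
   refine a rough reciprocal estimate with Newton-Raphson steps,
   x' = x * (2 - d * x).  The relative error squares on each step,
   so a power-of-two initial estimate (error <= 0.5) converges to
   full double precision in six steps.  Positive d assumed.  */
double
nr_divide (double a, double d)
{
  int e;
  frexp (d, &e);               /* extract the binary exponent of d */
  double x = ldexp (1.0, -e);  /* power-of-two reciprocal estimate */
  for (int i = 0; i < 6; i++)
    x = x * (2.0 - d * x);     /* Newton-Raphson refinement step */
  return a * x;
}
```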
Hi Andre,
I don't have a cascadelake device to test on, nor any knowledge about
what makes it different from regular x86_64.
If the cascadelake device is supposed to work the same as other x86_64
devices for these vectors then the test has found a bug in the compiler
and you should be lookin
I've committed this patch to add a missing vector operator on amdgcn.
The architecture doesn't have a 64-bit not instruction so we didn't have
an insn for it, but the vectorizer didn't like that and caused the
v64df_pow function to use 2MB of stack frame. This is a problem when you
typically h
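A standard way to supply a missing bitwise-not is to expand it as xor with all-ones; presumably this is the kind of expansion the added insn enables. A host-side sketch (not the actual gcn.md pattern):

```c
#include <stdint.h>

/* Sketch: bitwise NOT is XOR with all-ones, so a machine without a
   64-bit not instruction can synthesise it from an existing xor.  */
uint64_t
not_via_xor (uint64_t x)
{
  return x ^ UINT64_MAX;
}
```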
On 27/03/2023 12:26, Thomas Schwinge wrote:
Hi!
On 2023-03-27T09:27:31+, "Stubbs, Andrew" wrote:
-----Original Message-----
From: Thomas Schwinge
Sent: 24 March 2023 15:50
On 2022-01-04T15:32:17+0000, Andrew Stubbs
wrote:
This patch implements the OpenMP pinned memory trait
This patch fixes a bug in which the function prologue would save more
registers to the stack than there was space allocated. This would cause
data corruption when the epilogue restored the registers if a child
function had overwritten that memory.
The problem was caused by insn constraints tha
This patch adds new pseudo-insns for no-op vector extractions.
These were previously modelled as simple move instructions, but the
register allocator has unhelpful special handling for these that
triggered spills to memory. Modelling them as a vec_select does the
right thing in the register al
On 21/03/2023 13:42, Andrew Jenner wrote:
This patch allows GCC to use the accumulator VGPR registers on CDNA1 and
later architectures. The backend does not yet attempt to make use of the
matrix acceleration instructions, but the new registers are still useful
as fast space for register spills.
On 21/03/2023 13:35, Andrew Jenner wrote:
I have updated this patch to incorporate the feedback from Andrew
Stubbs. Tested on CDNA2 GFX90a.
gcc/ChangeLog:
* config/gcn/gcn-protos.h (gcn_expand_dpp_swap_pairs_insn)
(gcn_expand_dpp_distribute_even_insn
On 21/03/2023 12:14, Jakub Jelinek wrote:
Hi!
As mentioned in the PR, vect-simd-clone-1[678]{,f}.c tests FAIL on
x86_64-linux with -m64/-march=cascadelake or -m32/-march=cascadelake,
there are 3 matches for the calls rather than expected two.
As suggested by Richi, this patch changes those tests
On 13/03/2023 12:25, Tobias Burnus wrote:
Found when comparing '-v -Wl,-v' output, as despite -save-temps
multiple runs yielded differing results.
Fixed as attached.
OK for mainline?
OK.
Andrew
On 08/03/2023 11:06, Tobias Burnus wrote:
Next try – this time with both patches.
On 08.03.23 12:05, Tobias Burnus wrote:
Hi Andrew,
attached are two patches related to GCN, one for libgomp.texi
documenting an env var
and a release-notes update in www docs.
OK? Comments?
LGTM
Andrew
On 03/03/2023 17:05, Paul-Antoine Arras wrote:
On 02/03/2023 at 18:18, Andrew Stubbs wrote:
On 01/03/2023 16:56, Paul-Antoine Arras wrote:
This patch introduces instruction patterns for conditional min and max
operations (cond_{f|s|u}{max|min}) in the GCN machine description. It
also allows
On 02/03/2023 15:07, Kwok Cheung Yeung wrote:
Hello
I've made the suggested changes. Should I hold off on committing this
until GCC 13 has been branched off?
No need, amdgcn is not a primary target and this stuff won't affect
anyone else. Please go ahead and commit.
Andrew
On 01/03/2023 16:56, Paul-Antoine Arras wrote:
This patch introduces instruction patterns for conditional min and max
operations (cond_{f|s|u}{max|min}) in the GCN machine description. It
also allows the exec register to be saved in SGPRs to avoid spilling to
memory.
Tested on GCN3 Fiji gfx803
On 01/03/2023 10:52, Andre Vieira (lists) wrote:
On 01/03/2023 10:01, Andrew Stubbs wrote:
> On 28/02/2023 23:01, Kwok Cheung Yeung wrote:
>> Hello
>>
>> This patch implements the TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION
>> target hook for the AMD GCN a
On 28/02/2023 23:01, Kwok Cheung Yeung wrote:
Hello
This patch implements the TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION
target hook for the AMD GCN architecture, such that when vectorized,
calls to builtin standard math functions such as asinf, exp, pow etc.
are converted to calls to the r
This patch fixes a bug in which libgomp doesn't know what to do with
attached pointers in Fortran derived types when using Unified Shared
Memory instead of explicit mappings.
I've committed it to the devel/omp/gcc-12 branch (OG12) and will fold it
into the next rebase/repost of the USM patches
On 10/02/2023 09:11, Jakub Jelinek wrote:
I've tried to fix the -flto thing and I can't figure out how. The problem
seems to be that there are two dump files from the two compiler invocations
and it scans the wrong one. Aarch64 has the same problem.
Two dumps are because it is in a dg-do run te
On 16/02/2023 21:11, Thomas Schwinge wrote:
--- /dev/null
+++ b/libgomp/basic-allocator.c
+#ifndef BASIC_ALLOC_YIELD
+#deine BASIC_ALLOC_YIELD
+#endif
In file included from [...]/libgomp/config/nvptx/allocator.c:49:
[...]/libgomp/config/nvptx/../../basic-allocator.c:52:2: error: in
On 17/02/2023 08:12, Thomas Schwinge wrote:
Hi Andrew!
On 2023-02-16T23:06:44+0100, I wrote:
On 2023-02-16T16:17:32+, "Stubbs, Andrew via Gcc-patches"
wrote:
The mmap implementation was not optimized for a lot of small allocations, and I
can't see that issue changing here
That's corre
These patches implement an LDS memory allocator for OpenMP on AMD.
1. 230216-basic-allocator.patch
Separate the allocator from NVPTX so the code can be shared.
2. 230216-amd-low-lat.patch
Allocate the memory, adjust the default address space, and hook up the
allocator.
They will need to be
On 14/02/2023 12:54, Thomas Schwinge wrote:
Hi Andrew!
On 2022-01-13T11:13:51+, Andrew Stubbs wrote:
Updated patch: this version fixes some missed cases of malloc in the
realloc implementation.
Right, and as it seems I've run into another issue: a stray 'free'.
On 09/02/2023 20:13, Andrew Jenner wrote:
This patch introduces instruction patterns for complex number operations
in the GCN machine description. These patterns are cmul, cmul_conj,
vec_addsub, vec_fmaddsub, vec_fmsubadd, cadd90, cadd270, cmla and cmls
(cmla_conj and cmls_conj were not found t
On 13/02/2023 14:38, Thomas Schwinge wrote:
Hi!
On 2022-03-08T11:30:55+, Hafiz Abid Qadeer wrote:
From: Andrew Stubbs
Add a new option. It will be used in follow-up patches.
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
+@option{-foffload-memory=pinned} forces all host
I presume I've been CC'd on this conversation because weird vector
architecture problems have happened to me before. :)
However, I'm not sure I can help much because AMD GCN does not use
BImode vectors at all. This is partly because loading boolean values
into a GCN vector would have 31 paddin
On 10/02/2023 15:11, Thomas Schwinge wrote:
Hi!
Re OpenMP 'pinned' memory allocator trait semantics vs. 'omp_realloc':
On 2022-01-13T13:53:03+, Andrew Stubbs wrote:
On 05/01/2022 17:07, Andrew Stubbs wrote:
[...], I'm working on an implementation using mmap ins
On 10/02/2023 14:21, Thomas Schwinge wrote:
Is the correct fix the following (conceptually like
'linux_memspace_alloc' cited above), or is there something that I fail to
understand?
static void *
linux_memspace_calloc (omp_memspace_handle_t memspace, size_t size, int
pin)
{
The -mstack-size option has been marked obsolete in favour of setting an
environment variable at runtime ("GCN_STACK_SIZE"), but some testcases
still need the option set or they have stack overflow. I could change
them to use the envvar, but my testing setup uses remote execution which
doesn't su
On 02/02/2023 14:59, Tobias Burnus wrote:
Maybe it becomes better reviewable with an attached patch ...
On 02.02.23 15:31, Tobias Burnus wrote:
Now that the stack handling has been changed for AMDGCN, this patch
enables reverse offload.
(cf. today's "[committed] amdgcn, libgomp: Manually alloca
I've committed this patch to change the ways stacks are initialized on
amdgcn. The patch only touches GCN files, or the GCN-only portions of
libgomp files, so I'm allowing it despite stage 4 because I want the ABI
change done for GCC 13, and because it enables Tobias's reverse-offload
patch tha
On 01/02/2023 15:35, Paul-Antoine Arras wrote:
This patch introduces an instruction pattern for conditional shift
operations (cond_{ashl|ashr|lshr}) in the GCN machine description.
Tested on GCN3 Fiji gfx803.
OK to commit?
The changelog will need to be wrapped to 80 columns.
OK otherwise.
A
I changed it to use 128-byte alignment to match the GPU cache-lines.
Committed to OG12.
Andrew
On 11/01/2023 18:05, Andrew Stubbs wrote:
This patch fixes a runtime issue I encountered with the AMD GCN Unified
Shared Memory implementation.
We were using regular malloc'd memory confi
This patch fixes a runtime issue I encountered with the AMD GCN Unified
Shared Memory implementation.
We were using regular malloc'd memory configured into USM mode, but
there were random intermittent crashes. I can't be completely sure, but
my best guess is that the HSA driver is using malloc
Here's a new version of the patch.
On 01/12/2022 14:16, Jakub Jelinek wrote:
+void __attribute__((noinline))
You should use noipa attribute instead of noinline on callers
which aren't declare simd (on declare simd it would prevent cloning
which is essential for the declare simd behavior), so t
I've committed this patch to the devel/omp/gcc-12 branch. It fixes some
missed cases in the Unified Shared Memory implementation that were
especially noticeable in Fortran because the sizes of arrays are known.
This patch will have to be folded into the mainline USM patches that
were submitted
On 08/12/2022 14:02, Tobias Burnus wrote:
On 08.12.22 13:51, Andrew Stubbs wrote:
On 08/12/2022 12:11, Jakub Jelinek wrote:
On Thu, Jul 07, 2022 at 11:34:33AM +0100, Andrew Stubbs wrote:
Implement the OpenMP pinned memory trait on Linux hosts using the mlock
syscall. Pinned allocations are
On 08/12/2022 12:11, Jakub Jelinek wrote:
On Thu, Jul 07, 2022 at 11:34:33AM +0100, Andrew Stubbs wrote:
Implement the OpenMP pinned memory trait on Linux hosts using the mlock
syscall. Pinned allocations are performed using mmap, not malloc, to ensure
that they can be unpinned safely when
On 01/12/2022 14:35, Paul-Antoine Arras wrote:
I believe this patch addresses your comments regarding the GCN bits.
The new builtins are consistent with the LLVM naming convention (lower
case, canonical name). For gfx803, I also kept '__fiji__' to be
consistent with -march=fiji.
Is it OK for
On 30/11/2022 15:37, Jakub Jelinek wrote:
On Wed, Nov 30, 2022 at 03:17:30PM +, Andrew Stubbs wrote:
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-simd-clone-16.c
@@ -0,0 +1,89 @@
+/* { dg-require-effective-target vect_simd_clones } */
+/* { dg-additional-options "-fopenmp-simd -
On 01/12/2022 11:10, Paul-Antoine Arras wrote:
+ if (TARGET_FIJI) \
+   builtin_define ("__FIJI__"); \
+ else if (TARGET_VEGA10) \
+
On 09/09/2022 15:31, Jakub Jelinek wrote:
--- a/gcc/tree-if-conv.cc
+++ b/gcc/tree-if-conv.cc
@@ -1074,13 +1076,19 @@ if_convertible_stmt_p (gimple *stmt,
vec refs)
tree fndecl = gimple_call_fndecl (stmt);
if (fndecl)
{
+ /* We can vectorize some builtins and
On 29/11/2022 18:26, Tobias Burnus wrote:
Hi PA, hi Andrew, hi Jakub, hi all,
On 29.11.22 16:56, Paul-Antoine Arras wrote:
This patch adds support for 'gfx803' as an alias for 'fiji' in OpenMP
context selectors, [...]
I think this should be documented somewhere. We have
https://gcc.gnu.org/on
On 29/11/2022 15:56, Paul-Antoine Arras wrote:
Hi all,
This patch adds support for 'gfx803' as an alias for 'fiji' in OpenMP
context selectors, so as to be consistent with LLVM. It also adds test
cases checking all supported AMD ISAs are properly recognised when used
in a 'declare variant' co
On 28/11/2022 07:40, Tobias Burnus wrote:
It turned out that cprop cleverly propagated the unspec_volatile
to the preceding (pseudo)register, permitting removal of the
'set (s0) (pseudoregister)' at -O2. Unfortunately, it does
matter whether the assignment is done to 's2' (previously: pseudoregis
On 21/11/2022 13:40, Tobias Burnus wrote:
Working on the builtins, I realized that I mixed up (again) bits and
bytes. While 'uint64_t var[2]' has a size of 128 bits, 'char var[128]'
has a size of 128 bytes. Thus, there is sufficient space for 16
pointer-size/uint64_t values but I only need 6.
T
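The bits-versus-bytes mix-up described above can be made concrete with sizeof, which counts bytes, not bits:

```c
#include <stdint.h>

/* sizeof counts bytes: 'uint64_t var[2]' occupies 16 bytes (128 bits),
   while 'char var[128]' occupies 128 bytes (1024 bits).  On an LP64
   target, 128 bytes hold 16 pointer-sized/uint64_t values.  */
unsigned
bits_of_u64_pair (void)
{
  return sizeof (uint64_t[2]) * 8;
}

unsigned
bytes_of_char_128 (void)
{
  return sizeof (char[128]);
}
```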
On 19/11/2022 10:46, Tobias Burnus wrote:
On 18.11.22 18:49, Andrew Stubbs wrote:
On 18/11/2022 17:20, Tobias Burnus wrote:
This looks wrong:
+ /* stackbase = (stack_segment_decr & 0x)
+ + stack_wave_offset);
+ seg_size = dispatch_ptr->private_segme
On 18/11/2022 17:41, Tobias Burnus wrote:
Attached is the updated/rediffed version, which now uses the builtin
instead of the 'asm("s8").
The code in principle works; that is: If no private stack variables are
copied, it works.
Or in other words: reverse-offload target regions that don't use
On 18/11/2022 17:20, Tobias Burnus wrote:
This patch adds two builtins (getting end-of-stack pointer and
a Boolean answer whether it was the first call to the builtin on this
thread).
The idea is to replace some hard-coded values in newlib, permitting a
later move to a manually allocated stac
On 16/11/2022 11:42, Tobias Burnus wrote:
This is a part of a patch by Andrew (hi!) - namely that part that only
adds the
__builtin_gcn_kernarg_ptr. More is planned, see below.
The short term benefit of this patch is to permit replacing hardcoded
numbers
by a builtin – like in libgomp (see pa
On 08/11/2022 14:35, Kwok Cheung Yeung wrote:
Hello
This patch adds three extra builtins for the vectorized forms of the
abs, floorf and floor math functions, which are implemented by native
GCN instructions. I have also added a test to check that they generate
the expected assembler instruct
On 03/11/2022 17:47, Kwok Cheung Yeung wrote:
Hello
This patch fixes a bug introduced in a previous patch adding support for
generating native instructions for the exp2 and log2 patterns. The
problem is that the name of the instruction implementing the exp2
operation is v_exp (and not v_exp2)
My recent patch to add additional vector lengths didn't address the
vector reductions yet.
This patch adds the missing support. Shorter vectors use fewer reduction
steps, and the means to extract the final value has been adjusted.
Still lacking are any useful costs, so for loops the vect p
This patch adds patterns for the fmin and fmax operators, for scalars,
vectors, and vector reductions.
The compiler uses smin and smax for most floating-point optimizations,
etc., but not where the user calls fmin/fmax explicitly. On amdgcn the
hardware min/max instructions are already IEEE c
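The distinction matters because an smin/smax-style compare-and-select does not match IEEE fmin/fmax semantics when a NaN appears. A quick host-side illustration (not GCN code):

```c
#include <math.h>

/* A naive compare-and-select "min": this is what smin-style selection
   computes, and it does not treat NaN the way IEEE fmin requires
   (fmin must return the numeric operand when one input is NaN).  */
double
naive_min (double a, double b)
{
  return a < b ? a : b;
}
```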
A function parameter was left over from a previous draft of my
multiple-vector-length patch. This patch silences the harmless warning.
Andrew
amdgcn: Silence unused parameter warning
gcc/ChangeLog:
* config/gcn/gcn.cc (gcn_simd_clone_compute_vecsize_and_simdlen):
Set base_type a
On 24/10/2022 19:06, Richard Biener wrote:
On 24.10.2022 at 18:51, Andrew Stubbs wrote:
I've committed this to the OG12 branch to remove some test failures. We
probably ought to have something on mainline also, but a proper fix would be
better.
Without this, the libgomp.oac
I've committed this to the OG12 branch to remove some test failures. We
probably ought to have something on mainline also, but a proper fix
would be better.
Without this, the libgomp.oacc-c-c++-common/private-variables.c testcase
fails to compile due to an ICE. The OpenACC worker broadcasting
I've committed this patch to the devel/omp/gcc-12 branch. I will have to
fold it into my previous OpenMP memory management patch series when I
repost it.
The GFX908 (MI100) devices only partially support the Unified Shared
Memory model that we have, and only then with additional kernel boot
p
I've committed this patch to the devel/omp/gcc-12 branch. I will have to
fold it into my previous OpenMP memory management patch series when I
repost it.
The patch changes the internal memory allocation method such that memory
is allocated in the regular heap and then marked as "coarse-grained
This patch fixes a problem in which fatal errors inside mutex-locked
regions (i.e. basically anything in the plugin) will cause it to hang up
trying to take the lock to clean everything up.
Using abort() instead of exit(1) bypasses the atexit handlers and solves
the problem.
OK for mainline?
On 14/10/2022 08:07, Richard Biener wrote:
On Tue, 11 Oct 2022, Richard Sandiford wrote:
Richard Biener writes:
On Mon, 10 Oct 2022, Andrew Stubbs wrote:
On 10/10/2022 12:03, Richard Biener wrote:
The following picks up the prototype by Ju-Zhe Zhong for vectorizing
first order recurrences
On 12/10/2022 15:29, Tobias Burnus wrote:
On 29.09.22 18:24, Andrew Stubbs wrote:
On 27/09/2022 14:16, Tobias Burnus wrote:
Andrew did suggest a while back to piggyback on the console_output
handling,
avoiding another atomic access. - If this is still wanted, I like to
have some
guidance
On 11/10/2022 12:29, Richard Biener wrote:
On Tue, Oct 11, 2022 at 1:03 PM Andrew Stubbs wrote:
This patch series adds additional vector sizes for the amdgcn backend.
The hardware supports any arbitrary vector length up to 64-lanes via
masking, but GCC cannot (yet) make full use of them due
The testsuite needs a few tweaks following my patches to add multiple vector
sizes for amdgcn.
gcc/testsuite/ChangeLog:
* gcc.dg/pr104464.c: Xfail on amdgcn.
* gcc.dg/signbit-2.c: Likewise.
* gcc.dg/signbit-5.c: Likewise.
* gcc.dg/vect/bb-slp-68.c: Likewise.
Another example of the vectorizer needing explicit insns where the scalar
expander just works.
gcc/ChangeLog:
* config/gcn/gcn-valu.md (neg2): New define_expand.
---
gcc/config/gcn/gcn-valu.md | 13 +
1 file changed, 13 insertions(+)
diff --git a/gcc/config/gcn/gcn-valu.md
The vector sizes are simulated using implicit masking, but they make life
easier for the autovectorizer and SLP passes.
gcc/ChangeLog:
* config/gcn/gcn-modes.def (VECTOR_MODE): Add new modes
V32QI, V32HI, V32SI, V32DI, V32TI, V32HF, V32SF, V32DF,
V16QI, V16HI, V16SI, V16
Implements vec_init when the input is a vector of smaller vectors, or of
vector MEM types, or a smaller vector duplicated several times.
gcc/ChangeLog:
* config/gcn/gcn-valu.md (vec_init): New.
* config/gcn/gcn.cc (GEN_VN): Add andvNsi3, subvNsi3.
(GEN_VNM): Add gathervNm
Add vec_extract expanders for all valid pairs of vector types.
gcc/ChangeLog:
* config/gcn/gcn-protos.h (get_exec): Add prototypes for two variants.
* config/gcn/gcn-valu.md
(vec_extract): New define_expand.
* config/gcn/gcn.cc (get_exec): Export the existing func
GET_MODE_NUNITS isn't a compile-time constant, so we end up with many
impossible insns in the machine description. Adding MODE_VF allows the insns
to be eliminated completely.
gcc/ChangeLog:
* config/gcn/gcn-valu.md
(2): Use MODE_VF.
(2): Likewise.
* config/gcn/g
ssues, but rather existing
problems that did not show up because the code did not previously
vectorize. Expanding the testcase to allow 64-lane vectors shows the
same problems there.
I shall backport these patches to the OG12 branch shortly.
Andrew
Andrew Stubbs (6):
amdgcn: add multiple ve
On 10/10/2022 12:03, Richard Biener wrote:
The following picks up the prototype by Ju-Zhe Zhong for vectorizing
first order recurrences. That solves two TSVC missed optimization PRs.
There's a new scalar cycle def kind, vect_first_order_recurrence
and it's handling of the backedge value vectori
On 29/09/2022 14:46, Richard Biener wrote:
It's not the nicest way of carrying the information but short of inventing
new modes I can't see something better (well, another optab). I see
the GCN backend expects a constant in operand 3 but the docs don't
specify the operand has to be a CONST_INT,
I've committed this small clean up. It silences a warning.
Andrew
amdgcn: remove unused variable
This was left over from a previous version of the SIMD clone patch.
gcc/ChangeLog:
* config/gcn/gcn.cc (gcn_simd_clone_compute_vecsize_and_simdlen):
Remove unused elt_bits variable.
On 27/09/2022 14:16, Tobias Burnus wrote:
@@ -422,6 +428,12 @@ struct agent_info
if it has been. */
bool initialized;
+ /* Flag whether the HSA program that consists of all the modules has been
+ finalized. */
+ bool prog_finalized;
+ /* Flag whether the HSA OpenMP's requires
On 29/09/2022 10:24, Richard Sandiford wrote:
Otherwise:
operand0[0] = operand1 < operand2;
for (i = 1; i < operand3; i++)
operand0[i] = operand0[i - 1] && (operand1 + i < operand2);
looks like a "length and mask" operation, which IIUC is also what
RVV wanted? (Wasn't at the Cauldro
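The pseudocode quoted above runs directly as a scalar model. This sketch (my naming, purely illustrative) packs the per-lane results into an integer mask:

```c
#include <stdint.h>

/* Scalar model of the "length and mask" operation quoted above:
   lane 0 is active if start < limit, and lane i stays active only
   while lane i-1 is active and start + i < limit.  */
uint64_t
while_ult_mask (uint64_t start, uint64_t limit, int nlanes)
{
  uint64_t mask = 0;
  for (int i = 0; i < nlanes; i++)
    {
      int prev = i == 0 ? 1 : (int) ((mask >> (i - 1)) & 1);
      if (prev && start + i < limit)
        mask |= (uint64_t) 1 << i;
    }
  return mask;
}
```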
On 29/09/2022 08:52, Richard Biener wrote:
On Wed, Sep 28, 2022 at 5:06 PM Andrew Stubbs wrote:
This patch is a prerequisite for some amdgcn patches I'm working on to
support shorter vector lengths (having fixed 64 lanes tends to miss
optimizations, and masking is not supported everywher
This patch is a prerequisite for some amdgcn patches I'm working on to
support shorter vector lengths (having fixed 64 lanes tends to miss
optimizations, and masking is not supported everywhere yet).
The problem is that, unlike AArch64, I'm not using different mask modes
for different sized ve
On 13/09/2022 12:03, Paul-Antoine Arras wrote:
Hello,
This patch intends to backport e90af965e5c by Jakub Jelinek to
devel/omp/gcc-12.
The original patch was described here:
https://gcc.gnu.org/pipermail/gcc-patches/2022-September/601189.html
I've merged and committed it for you.
Andrew
On 09/09/2022 13:20, Tobias Burnus wrote:
However, the pre-existing 'sqrt' problem still is real. It also applies
to reverse sqrt ("v_rsq"), but that's for whatever reason not used for GCN.
This patch now adds a commandline flag - off by default - to choose
whether this behavior is wanted. I d
On 08/09/2022 21:38, Kwok Cheung Yeung wrote:
Hello
This patch adds support for some additional floating-point operations,
in scalar and vector modes, which are natively supported by the AMD GCN
instruction set, but haven't been implemented in GCC yet. With the
exception of frexp, these imple
On 31/08/2022 09:29, Jakub Jelinek wrote:
On Tue, Aug 30, 2022 at 06:54:49PM +0200, Rainer Orth wrote:
--- a/gcc/omp-simd-clone.cc
+++ b/gcc/omp-simd-clone.cc
@@ -504,7 +504,10 @@ simd_clone_adjust_return_type (struct cgraph_node *node)
veclen = node->simdclone->vecsize_int;
else
On 09/08/2022 14:23, Andrew Stubbs wrote:
Enable and configure SIMD clones for amdgcn. This affects both the __simd__
function attribute, and the OpenMP "declare simd" directive.
Note that the masked SIMD variants are generated, but the middle end doesn't
actually support c
On 26/08/2022 12:04, Jakub Jelinek wrote:
gcc/ChangeLog:
* doc/tm.texi: Regenerate.
* omp-simd-clone.cc (simd_clone_adjust_return_type): Allow zero
vecsize.
(simd_clone_adjust_argument_types): Likewise.
* target.def (compute_vecsize_and_simdlen): Document
There has been support for generating "inbranch" SIMD clones for a long time,
but nothing actually uses them (as far as I can see).
This patch adds support for a subset of possible cases (those using
mask_mode == VOIDmode). The other cases fail to vectorize, just as before,
so there should be n
Enable and configure SIMD clones for amdgcn. This affects both the __simd__
function attribute, and the OpenMP "declare simd" directive.
Note that the masked SIMD variants are generated, but the middle end doesn't
actually support calling them yet.
gcc/ChangeLog:
* config/gcn/gcn.cc (g
The vecsize_int/vecsize_float hooks assume that all arguments will use
the same bitsize, and vary the number of lanes according to the element size,
but this is inappropriate on targets where the number of lanes is fixed and
the bitsize varies (i.e. amdgcn).
With this change the vecsize can
201 - 300 of 998 matches