On 2/17/22 18:24, Tobias Burnus wrote:
SM version (-misa=)
[Patch adds -misa=sm_70]
* The compiler supports internally: SM_30, SM_35, SM_53, SM_70, SM_75,
SM_80.
I'd formulate it like: it uses SM_70 internally to accurately formulate
when certain insns can be used.
I think it makes sense
On 2/17/22 18:24, Tobias Burnus wrote:
PTX version (-mptx=)
[patch adds -mptx=6.0 as option]
* Currently supported internally are 3.1 (CUDA 5.0, used by GCC <= 11),
6.0 (CUDA 9.0, current GCC 12 default), 6.3 (CUDA 10.0), 7.0 (CUDA 11.0)
* -mptx= supports 3.1, 6.3, 7.0 – but not the internal
Hi,
For the nvptx port, with -mptx-comment we have in pr53465.s:
...
// #APP
// 9 "gcc/testsuite/gcc.c-torture/execute/pr53465.c" 1
// Start: Added by -minit-regs=3:
// #NO_APP
mov.u32 %r26, 0;
// #APP
// 9
Hi,
Add functionality that indicates which insns are added by -minit-regs, such
that for instance we have for pr53465.s:
...
// #APP
// 9 "gcc/testsuite/gcc.c-torture/execute/pr53465.c" 1
// Start: Added by -minit-regs=3:
// #NO_APP
mov.u32 %r26, 0;
Hi,
When running the libgomp testsuite on x86_64 with nvptx accelerator, we run
into:
...
XPASS: libgomp.c/../libgomp.c-c++-common/pr96390.c (test for excess errors)
FAIL: libgomp.c/../libgomp.c-c++-common/pr96390.c execution test
...
The problem is that we're expecting the following ptxas
Hi,
On nvptx I see the following FAIL:
...
FAIL: gcc.dg/sibcall-3.c execution test
...
The test-case states that "this test is xfailed on targets without sibcall
patterns".
The nvptx port doesn't have a sibcall pattern, so add an xfail. Likewise in
two similar test-cases.
Tested on nvptx.
Hi,
Some test-cases in gcc/testsuite/gcc.target/nvptx contain mptx
settings, which are paired with misa settings, in order to have the mptx
version support the misa version.
Since commit decde11183bd ("[nvptx] Choose -mptx default based on -misa"),
this is no longer necessary.
Remove the mptx
On 2/21/22 08:54, Richard Biener wrote:
On Sun, Feb 20, 2022 at 11:50 PM Tom de Vries via Gcc-patches
wrote:
Hi,
With nvptx target, driver version 510.47.03 and board GT 1030, we run into:
...
FAIL: gcc.c-torture/execute/pr53465.c -O1 execution test
FAIL: gcc.c-torture/execute/pr53465.c
Hi,
With nvptx target, driver version 510.47.03 and board GT 1030, we run into:
...
FAIL: gcc.c-torture/execute/pr53465.c -O1 execution test
FAIL: gcc.c-torture/execute/pr53465.c -O2 execution test
FAIL: gcc.c-torture/execute/pr53465.c -O3 -g execution test
...
while the test-cases pass with
Hi,
We currently generate this code for an atomic store:
...
.reg.u32 %r21;
atom.exch.b32 %r21,[%r22],%r23;
...
where %r21 is set but unused.
Use the ptx bit bucket operand '_' instead, such that we have:
...
atom.exch.b32 _,[%r22],%r23;
...
[ Note that the same problem still occurs for this
Hi,
In nvptx_reorg_uniform_simt we have a loop:
...
for (insn = get_insns (); insn; insn = next)
{
next = NEXT_INSN (insn);
if (!(CALL_P (insn) && nvptx_call_insn_is_syscall_p (insn))
&& !(NONJUMP_INSN_P (insn)
&& GET_CODE (PATTERN (insn)) == PARALLEL
Hi,
With the default ptx isa 6.0, we have for uniform-simt-1.c:
...
@%r33 atom.global.cas.b32 %r26, [a], %r28, %r29;
shfl.sync.idx.b32 %r26, %r26, %r32, 31, 0x;
...
The atomic insn is predicated by -muniform-simt, and the subsequent insn does
a warp
On 2/15/22 12:08, Thomas Schwinge wrote:
Hi Tom!
On 2022-02-15T11:52:29+0100, Tom de Vries wrote:
On 2/15/22 08:34, Thomas Schwinge wrote:
For my understanding:
Thanks for your explanations!
It is expected that this changes, for example (similar elsewhere)
On 2/15/22 08:34, Thomas Schwinge wrote:
Hi Tom!
For my understanding:
On 2022-02-10T10:13:10+0100, Tom de Vries via Gcc-patches
wrote:
The ptx isa specifies (for pre-sm_7x) that atomic operations on shared memory
locations do not guarantee atomicity with respect to normal store
Hi,
Require effective target non_strict_prototype in a few test-cases.
Tested on nvptx.
Committed to trunk.
Thanks,
- Tom
[testsuite] Require non_strict_prototype in a few tests
gcc/testsuite/ChangeLog:
2022-02-10 Tom de Vries
* gcc.c-torture/compile/pr100576.c: Require
Hi,
Require effective target alloca in a few test-cases.
Tested on nvptx.
Committed to trunk.
Thanks,
- Tom
[testsuite] Require alloca support in a few tests
gcc/testsuite/ChangeLog:
2022-02-10 Tom de Vries
* c-c++-common/Walloca-larger-than.c: Require effective target alloca.
Hi,
With GOMP_NVPTX_JIT=-O0 and -mptx=3.1, I run into:
...
FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/acc_prof-version-1.c \
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O2 \
execution test
...
The problem is that we're generating a diverging branch around
Hi,
Add spinlock test-cases for nvptx.
Strictly speaking, these are invalid OpenACC, because they're not guaranteed
to terminate.
But I've tested these without problems on cards from nvidia architectures
Kepler, Maxwell, Pascal and Turing (though on Turing, that's what you expect
given that
Hi,
The OpenACC execution model states that implementing a critical
section across workers using atomic operations and a busy-wait loop may never
succeed, since the scheduler may suspend the worker that owns the lock, in
which case the worker waiting on the lock can never complete.
Add a
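The busy-wait pattern described above can be sketched in plain C. This is only an illustration and not the test-case from the patch: the names lock, spin_lock, spin_unlock and locked_increment are hypothetical, and the sketch relies on the GCC __atomic builtins.

```c
#include <assert.h>

/* Hypothetical lock word: 0 = free, 1 = held.  */
static int lock;

/* Busy-wait until the exchange returns 0, meaning we took a free lock.  */
static void
spin_lock (int *l)
{
  while (__atomic_exchange_n (l, 1, __ATOMIC_ACQUIRE) != 0)
    ; /* Spin.  If the owner is never scheduled, this never terminates.  */
}

/* Release the lock; only the current owner may call this.  */
static void
spin_unlock (int *l)
{
  assert (__atomic_load_n (l, __ATOMIC_RELAXED) == 1);
  __atomic_store_n (l, 0, __ATOMIC_RELEASE);
}

/* Critical section: increment *counter while holding the lock.  */
static int
locked_increment (int *counter)
{
  spin_lock (&lock);
  int v = ++*counter;
  spin_unlock (&lock);
  return v;
}
```

On a host CPU with a fair scheduler the loop in spin_lock eventually exits. The concern raised above is that an accelerator scheduler gives no such guarantee for the worker that owns the lock.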
Hi,
For sm_7x atomic stores we fall back on expand_atomic_store, but this
results in using membar.sys for shared stores.
Fix this by adding an nvptx_atomic_store insn that adds a membar.cta for a
shared store.
Tested on x86_64 with nvptx accelerator.
Committed to trunk.
Thanks,
- Tom
[nvptx]
Hi,
The ptx isa specifies (for pre-sm_7x) that atomic operations on shared memory
locations do not guarantee atomicity with respect to normal store instructions
to the same address.
This can be fixed by:
- inserting barriers between normal stores and atomic operations to a common
address
-
Hi,
There's a nvidia driver JIT bug that mishandles this code (minimized from
builtin-arith-overflow-15.c):
...
int main (void) {
signed char r;
unsigned char y = (unsigned char) 0x80;
if (__builtin_sub_overflow ((unsigned char)0, (unsigned char)y, &r))
__builtin_abort ();
return 0;
}
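For reference, a small host-side sketch of why the test-case expects no abort: 0 - 0x80 is -128, which fits in a signed char, so the builtin should report no overflow. The wrapper name sub_overflows is hypothetical, and an 8-bit char is assumed.

```c
/* Returns nonzero if 0 - y does not fit in a signed char.  The result,
   wrapped if necessary, is stored in *r by __builtin_sub_overflow.  */
static int
sub_overflows (unsigned char y, signed char *r)
{
  return __builtin_sub_overflow ((unsigned char) 0, y, r);
}
```

With y = 0x80 a correct implementation returns 0 and stores -128. With y = 0x81 the mathematical result is -129, which is out of range, so it returns nonzero.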
On 2/8/22 14:09, Roger Sayle wrote:
Many thanks to Thomas Schwinge for confirming my hypothesis that the
register
usage regression, PR target/104345, is solely due to libgcc's _muldc3
function.
In addition to the isinf functionality in the previously proposed nvptx
patch at
On 2/3/22 22:00, Roger Sayle wrote:
This patch addresses the "increased register pressure" regression on
nvptx-none caused by my change to transition the backend to a
STORE_FLAG_VALUE = 1 target. This improved code generation for the
more common case of producing 0/1 Boolean values,
On 1/16/22 12:49, Roger Sayle wrote:
This patch adds support for nvptx's BImode and.pred, or.pred and
xor.pred instructions. Technically, nvptx.md previously defined
andbi3, iorbi3 and xorbi3 instructions, but the assembly language
mnemonic output for these was incorrect (e.g. and.b1) and
On 1/14/22 10:54, Roger Sayle wrote:
Now that the middle-end MULT_HIGHPART_EXPR pieces are in place, this
patch adds support for nvptx's mul.hi.s64 and mul.hi.u64 instructions,
as previously reviewed (provisionally pre-approved) back in August 2020:
On 1/10/22 11:58, Roger Sayle wrote:
One of the unusual target features of the Nvidia PTX ISA is that it
doesn't provide QI mode (byte sized) operations or registers.
[ FWIW: I recently happened to check this, and it actually supports
.u8/.s8/.b8 regs, but indeed just for very few
On 1/8/22 13:21, Roger Sayle wrote:
This patch adds more support for _Float16 (HFmode) to the nvptx backend.
Currently negation, absolute value and floating point comparisons are
implemented by promoting to float (SFmode). This patch adds suitable
define_insns to nvptx.md, most conditional on
Hi,
With the commit "[nvptx] Choose -mptx default based on -misa" I introduced a
use of PTX_ISA_SM70, without adding it first.
Add it, as well as the corresponding TARGET_SM70.
Build for x86_64 with nvptx accelerator.
Committed to trunk.
Thanks,
- Tom
[nvptx] Unbreak build, add PTX_ISA_SM70
On 2/8/22 14:24, Tobias Burnus wrote:
Hi Tom,
if I understand the patch correctly, -misa=sm_53 -mptx=3.1 will ...
On 08.02.22 13:57, Tom de Vries via Gcc-patches wrote:
Furthermore, using the -mptx option is a bit user-unfriendly.
Say we want to compile for sm_80. We can use -misa=sm_80
On 2/8/22 13:57, Tom de Vries via Gcc-patches wrote:
+static const char *
+sm_version_to_string (enum ptx_isa sm)
+{
+ switch (sm)
+{
+case PTX_ISA_SM30:
+ return "30";
+case PTX_ISA_SM35:
+ return "35";
+case PTX_ISA_SM53:
+ return "53
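The quoted hunk is cut off here. As a standalone illustration of the idea (mapping an SM enumerator to its version string), a sketch might look like the following. The enum ptx_isa_sketch and its SKETCH_ enumerators are hypothetical stand-ins, not the backend's definitions, and the cases beyond those visible in the hunk are assumed from the SM versions named elsewhere in the thread.

```c
#include <stddef.h>

/* Hypothetical stand-in for the backend's enum ptx_isa.  */
enum ptx_isa_sketch
{
  SKETCH_SM30,
  SKETCH_SM35,
  SKETCH_SM53,
  SKETCH_SM70,
  SKETCH_SM75,
  SKETCH_SM80
};

/* Map an SM enumerator to the digits used in "sm_NN".
   Returns NULL for a value outside the enum.  */
static const char *
sm_version_to_string_sketch (enum ptx_isa_sketch sm)
{
  switch (sm)
    {
    case SKETCH_SM30:
      return "30";
    case SKETCH_SM35:
      return "35";
    case SKETCH_SM53:
      return "53";
    case SKETCH_SM70:
      return "70";
    case SKETCH_SM75:
      return "75";
    case SKETCH_SM80:
      return "80";
    }
  return NULL;
}
```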
Hi,
While testing with driver version 390.147 I ran into the problem that it
doesn't support ptx isa version 6.3 (the new default), only 6.1.
Furthermore, using the -mptx option is a bit user-unfriendly.
Say we want to compile for sm_80. We can use -misa=sm_80 to specify that, but
then run
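One way to picture "choose -mptx based on -misa" is a lookup that picks the lowest supported PTX version that can target the requested SM version. The sketch below is not the logic of the patch. The function and table names are hypothetical, the supported versions (3.1, 6.0, 6.3, 7.0) are taken from the list earlier in this thread, and the PTX version that introduced each SM target is an assumption based on NVIDIA's PTX ISA documentation. Versions are encoded as major * 10 + minor.

```c
#include <stddef.h>

/* PTX versions assumed to be supported by the compiler.  */
static const int supported_ptx[] = { 31, 60, 63, 70 };

/* Assumed PTX version that introduced each SM target.  */
struct sm_intro
{
  int sm;
  int ptx;
};

static const struct sm_intro sm_intro_table[] = {
  { 30, 30 }, { 35, 31 }, { 53, 42 }, { 70, 60 }, { 75, 63 }, { 80, 70 },
};

/* Return the lowest supported PTX version that can target SM,
   or 0 if SM is unknown or no supported version is new enough.  */
static int
lowest_ptx_for_sm (int sm)
{
  size_t n_sm = sizeof sm_intro_table / sizeof sm_intro_table[0];
  size_t n_ptx = sizeof supported_ptx / sizeof supported_ptx[0];

  for (size_t i = 0; i < n_sm; i++)
    if (sm_intro_table[i].sm == sm)
      for (size_t j = 0; j < n_ptx; j++)
        if (supported_ptx[j] >= sm_intro_table[i].ptx)
          return supported_ptx[j];
  return 0;
}
```

A real default may also apply a floor, for example to match what the installed driver accepts, which this sketch does not model.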
Hi,
On nvptx, I run into an execution failure in test-case
gcc.dg/tree-ssa/builtin-sprintf.c because the test-case uses the 'hh'
modifier.
The port uses newlib, which by default does not support that modifier.
There's a configure option --enable-newlib-io-c99-formats to enable this
support, but
Hi,
In PR target/104364, two problems were reported:
- in muniform-simt mode, an atom.cas insn is no longer executed in the
"master lane" only.
- in msoft-stack mode, an __atomic_compare_exchange_n on stack memory is
translated assuming it accesses local memory, while that's not the case.
On 2/2/22 09:30, Tobias Burnus wrote:
This patch updates the documentation for Tom's change of the default
-mptx= version - mentioning also -mptx=7.0.
I forgot whether ptx = 7.0 was working fine or whether there was
a reason not to mention it.
A ptx version is experimental if all sm versions
On 2/3/22 10:40, Thomas Schwinge wrote:
Hi Tom!
On 2021-05-19T14:56:17+0200, I wrote:
On 2020-08-12T15:57:23+0200, Tom de Vries wrote:
When enabling sync_int_long for nvptx, we run into a failure in
gcc.dg/pr86314.c:
...
nvptx-run: error getting kernel result: operation not supported on \
Hi,
On a GT 1030, with driver version 470.94 and -mptx=3.1 I run into:
...
FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/parallel-dims.c \
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none \
-O2 execution test
...
which minimizes to the same test-case as listed in commit
Hi,
On a GT 1030 (sm_61), with driver version 470.94 I run into:
...
FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/parallel-dims.c \
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none \
-O2 execution test
...
which minimizes to the same test-case as listed in commit
Hi,
With the following example, minimized from parallel-dims.c:
...
int
main (void)
{
int vectors_max = -1;
#pragma acc parallel num_gangs (1) num_workers (1) copy (vectors_max)
{
for (int i = 0; i < 2; i++)
for (int j = 0; j < 2; j++)
#pragma acc loop vector reduction
Hi,
In ptx isa 6.0, a new barrier instruction was added, and bar.sync was
redefined as barrier.sync.aligned.
The aligned modifier indicates that all threads in a CTA will execute the same
barrier instruction.
This seems fine for a form "bar.sync 0".
But a "bar.sync %rx,64" (as used for vector
Hi,
When running libgomp test-case reduction-7.c on an nvptx accelerator
(T400, driver version 470.86) and GOMP_NVPTX_JIT=-O0, I run into:
...
reduction-7.exe:reduction-7.c:312: v_p_2: \
Assertion `out[j * 32 + i] == (i + j) * 2' failed.
FAIL:
Hi,
The ptx insn atom doesn't support local memory. In case of doing an atomic
operation on local memory, we run into:
...
operation not supported on global/shared address space
...
This is the cuGetErrorString message for CUDA_ERROR_INVALID_ADDRESS_SPACE.
The message is somewhat confusing
Hi,
When I run the libgomp test-case reduction-cplx-dbl.c on an nvptx accelerator
(T400, driver version 470.86), I run into:
...
FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/reduction-cplx-dbl.c \
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O0 \
execution test
Hi,
When running libgomp test-case broadcast-many.c on an nvptx accelerator
(T400, driver version 470.86), I run into:
...
libgomp: The Nvidia accelerator has insufficient resources to launch \
'main$_omp_fn$0' with num_workers = 32 and vector_length = 32; \
recompile the program with
Hi,
When running the libgomp testsuite with GOMP_NVPTX_JIT=-O0 using an nvptx
accelerator (Nvidia T400, 2GB), I run into:
...
libgomp: cuCtxSynchronize error: unspecified launch failure \
(perhaps abort was called)
libgomp: cuMemFree_v2 error: unspecified launch failure
libgomp: device
[ was: Re: [RFC] ldist: Recognize rawmemchr loop patterns ]
On 1/31/22 16:00, Richard Biener wrote:
I'm running into PR56888 (
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56888 ) on nvptx due to
this, f.i. in gcc/testsuite/gcc.c-torture/execute/builtins/strlen.c,
where
On 9/17/21 10:08, Richard Biener via Gcc-patches wrote:
On Mon, Sep 13, 2021 at 4:53 PM Stefan Schulze Frielinghaus
wrote:
On Mon, Sep 06, 2021 at 11:56:21AM +0200, Richard Biener wrote:
On Fri, Sep 3, 2021 at 10:01 AM Stefan Schulze Frielinghaus
wrote:
On Fri, Aug 20, 2021 at 12:35:58PM
Hi,
Add a few test-cases that test expansion of __atomic_exchange.
Tested on nvptx.
Committed to trunk.
Thanks,
- Tom
[nvptx] Add gcc.target/nvptx/atomic-exchange-*.c test-cases
gcc/testsuite/ChangeLog:
2022-01-12 Tom de Vries
* gcc.target/nvptx/atomic-exchange-1.c: New test.
Hi,
Fix a few issues in test-cases gcc.target/nvptx/atomic_fetch-*.c:
- atomic_fetch-1.c uses scan-assembler instead of scan-assembler-times,
which is less accurate
- atomic_fetch-2.c only contains negative testing using
scan-assembler-not
- the test-cases use stack variables to generate
On 1/6/22 17:42, Roger Sayle wrote:
Happy New Year for 2022. This is a simple patch, now that the
nvptx backend has transitioned to STORE_FLAG_VALUE=1, that adds
support for NVidia's cnot instruction, that implements C/C++
style logical negation.
Happy new year to you too :)
LGTM, please
On 1/6/22 10:29, Tom de Vries wrote:
At first glance, the above behaviour doesn't look like a too short timeout.
Using patch below, this passes for me, I'm currently doing a full build
and test to confirm.
Looks like it has to do with:
...
For sm_6x and earlier architectures, atom
On 1/5/22 15:36, Andrew Stubbs wrote:
On 05/01/2022 13:04, Tom de Vries wrote:
On 1/5/22 12:08, Tom de Vries wrote:
The allocators-1.c test-case doesn't compile because:
...
FAIL: libgomp.c/allocators-1.c (test for excess errors)
Excess errors:
On 1/5/22 12:08, Tom de Vries wrote:
The allocators-1.c test-case doesn't compile because:
...
FAIL: libgomp.c/allocators-1.c (test for excess errors)
Excess errors:
/home/vries/oacc/trunk/source-gcc/libgomp/testsuite/libgomp.c/allocators-1.c:7:22:
sorry, unimplemented: ' ' clause on
On 1/5/22 11:33, Andrew Stubbs wrote:
On 05/01/2022 10:24, Tom de Vries wrote:
On 12/21/21 12:33, Andrew Stubbs wrote:
On 20/12/2021 15:58, Andrew Stubbs wrote:
In order to support the %dynamic_smem_size PTX feature it is
necessary to bump the minimum supported PTX version from 3.1 (~2013)
On 12/20/21 16:58, Andrew Stubbs wrote:
This patch is submitted now for review and so I can commit a backport of it
to the OG11 branch, but isn't suitable for mainline until stage 1.
The patch implements support for omp_low_lat_mem_space and
omp_low_lat_mem_alloc on NVPTX offload devices. The
On 12/21/21 12:33, Andrew Stubbs wrote:
On 20/12/2021 15:58, Andrew Stubbs wrote:
In order to support the %dynamic_smem_size PTX feature it is necessary
to bump the minimum supported PTX version from 3.1 (~2013) to 4.1
(~2014).
Tobias has pointed out, privately, that the default version is
On 10/5/21 19:48, Roger Sayle wrote:
This patch to the nvptx backend changes the backend's STORE_FLAG_VALUE
from -1 to 1, by using BImode predicates and selp instructions, instead
of set instructions (almost always followed by integer negation).
Historically, it was reasonable (though rare)
On 9/17/21 5:41 PM, Roger Sayle wrote:
>
> This patch adds upon my previous patch to prototype HFmode support on
> nvptx, which includes adding new target macros TARGET_SM75 and TARGET_SM80.
I've moved those parts into this patch.
> Tobias Burnus has questioned "whether it makes sense to add
Hi,
Add support for ptx isa version 7.0, required for the addition of -misa=sm_75
and -misa=sm_80.
Tested by setting the default ptx isa version to 7.0, and doing a build and
libgomp test run.
Committed to trunk.
Thanks,
- Tom
[nvptx] Add -mptx=7.0
gcc/ChangeLog:
*
On 12/12/21 5:08 PM, Tobias Burnus wrote:
> I want to WITHDRAW that patch.
>
> I should read _emails_ before acting on _commit_ logs ...
>
heh :)
> Reason is given by Tom at:
> https://gcc.gnu.org/pipermail/gcc-patches/2021-December/586649.html
>
> However, we still shouldn't forget to update
On 9/17/21 10:24 AM, Tobias Burnus wrote:
> Hi Roger,
>
> some more generic remarks not specific to using new ISA features.
>
> On 17.09.21 00:53, Roger Sayle wrote:
>
>> Whilst there I also added -misa=sm_75 and -misa=sm_80 which are points
>> where other useful instructions were added to the
On 8/27/21 12:07 PM, Roger Sayle wrote:
>
> This patch introduces some new define_insn rules to the nvptx backend,
> to perform sign-extension of a truncation (from and to the same mode),
> using a single cvt instruction. As an example, the following function
>
> int foo(int x) { return
On 8/30/21 12:54 PM, Tobias Burnus wrote:
> Document Roger's patch
> https://gcc.gnu.org/g:3c496e92d795a8fe5c527e3c5b5a6606669ae50d
>
> OK? Suggestions?
>
LGTM.
Thanks,
- Tom
On 8/20/21 12:54 AM, Roger Sayle wrote:
>
> This patch adds a __PTX_ISA__ predefined macro to the nvptx backend that
> allows code to check the compute model being targeted by the compiler.
Hi Roger,
The naming __PTX_ISA__ is consistent with the naming of -misa=sm_30/sm_35.
The