Re: [Patch] nvptx: Add -mptx=6.0 + -misa=sm_70

2022-02-22 Thread Tom de Vries via Gcc-patches
On 2/17/22 18:24, Tobias Burnus wrote: SM version (-misa=) [Patch adds -misa=sm_70] * The compiler supports internally: SM_30, SM_35, SM_53, SM_70, SM_75, SM_80. I'd formulate it like: it uses SM_70 internally to accurately formulate when certain insns can be used. I think it makes sense

Re: [Patch] nvptx: Add -mptx=6.0 + -misa=sm_70

2022-02-22 Thread Tom de Vries via Gcc-patches
On 2/17/22 18:24, Tobias Burnus wrote: PTX version (-mptx=) [patch adds -mptx=6.0 as option] * Currently supported internally are 3.1 (CUDA 5.0, used by GCC <= 11),   6.0 (CUDA 9.0, current GCC 12 default), 6.3 (CUDA 10.0), 7.0 (CUDA 11.0) * -mptx= supports 3.1, 6.3, 7.0 – but not the internal

[PATCH][final] Handle compiler-generated asm insn

2022-02-22 Thread Tom de Vries via Gcc-patches
Hi, For the nvptx port, with -mptx-comment we have in pr53465.s: ... // #APP // 9 "gcc/testsuite/gcc.c-torture/execute/pr53465.c" 1 // Start: Added by -minit-regs=3: // #NO_APP mov.u32 %r26, 0; // #APP // 9

[committed][nvptx] Add -mptx-comment

2022-02-22 Thread Tom de Vries via Gcc-patches
Hi, Add functionality that indicates which insns are added by -minit-regs, such that for instance we have for pr53465.s: ... // #APP // 9 "gcc/testsuite/gcc.c-torture/execute/pr53465.c" 1 // Start: Added by -minit-regs=3: // #NO_APP mov.u32 %r26, 0;

[committed][libgomp, testsuite, nvptx] Fix pr96390.c without CUDA

2022-02-22 Thread Tom de Vries via Gcc-patches
Hi, When running the libgomp testsuite on x86_64 with nvptx accelerator, we run into: ... XPASS: libgomp.c/../libgomp.c-c++-common/pr96390.c (test for excess errors) FAIL: libgomp.c/../libgomp.c-c++-common/pr96390.c execution test ... The problem is that we're expecting the following ptxas

[committed][nvptx] Xfail sibcall execution tests

2022-02-22 Thread Tom de Vries via Gcc-patches
Hi, On nvptx I see the following FAIL: ... FAIL: gcc.dg/sibcall-3.c execution test ... The test-case states that "this test is xfailed on targets without sibcall patterns". The nvptx port doesn't have a sibcall pattern, so add an xfail. Likewise in two similar test-cases. Tested on nvptx.

[committed][nvptx, testsuite] Remove mptx settings in gcc.target/nvptx tests

2022-02-22 Thread Tom de Vries via Gcc-patches
Hi, Some test-cases in gcc/testsuite/gcc.target/nvptx contain mptx settings, which are paired with misa settings, in order to have the mptx version support the misa version. Since commit decde11183bd ("[nvptx] Choose -mptx default based on -misa"), this is no longer necessary. Remove the mptx

Re: [RFC][nvptx] Initialize ptx regs

2022-02-21 Thread Tom de Vries via Gcc-patches
On 2/21/22 08:54, Richard Biener wrote: On Sun, Feb 20, 2022 at 11:50 PM Tom de Vries via Gcc-patches wrote: Hi, With nvptx target, driver version 510.47.03 and board GT 1030 I, we run into: ... FAIL: gcc.c-torture/execute/pr53465.c -O1 execution test FAIL: gcc.c-torture/execute/pr53465.c

[RFC][nvptx] Initialize ptx regs

2022-02-20 Thread Tom de Vries via Gcc-patches
Hi, With nvptx target, driver version 510.47.03 and board GT 1030 I, we run into: ... FAIL: gcc.c-torture/execute/pr53465.c -O1 execution test FAIL: gcc.c-torture/execute/pr53465.c -O2 execution test FAIL: gcc.c-torture/execute/pr53465.c -O3 -g execution test ... while the test-cases pass with

[committed][nvptx] Use _ as destination operand of atom.exch

2022-02-19 Thread Tom de Vries via Gcc-patches
Hi, We currently generate this code for an atomic store: ... .reg.u32 %r21; atom.exch.b32 %r21,[%r22],%r23; ... where %r21 is set but unused. Use the ptx bit bucket operand '_' instead, such that we have: ... atom.exch.b32 _,[%r22],%r23; ... [ Note that the same problem still occurs for this

[committed][nvptx] Don't skip atomic insns in nvptx_reorg_uniform_simt

2022-02-19 Thread Tom de Vries via Gcc-patches
Hi, In nvptx_reorg_uniform_simt we have a loop: ... for (insn = get_insns (); insn; insn = next) { next = NEXT_INSN (insn); if (!(CALL_P (insn) && nvptx_call_insn_is_syscall_p (insn)) && !(NONJUMP_INSN_P (insn) && GET_CODE (PATTERN (insn)) == PARALLEL

[committed][nvptx] Use nvptx_warpsync / nvptx_uniform_warp_check for -muniform-simt

2022-02-19 Thread Tom de Vries via Gcc-patches
Hi, With the default ptx isa 6.0, we have for uniform-simt-1.c: ... @%r33 atom.global.cas.b32 %r26, [a], %r28, %r29; shfl.sync.idx.b32 %r26, %r26, %r32, 31, 0x; ... The atomic insn is predicated by -muniform-simt, and the subsequent insn does a warp

Re: [committed][nvptx] Handle pre-sm_7x shared atomic store using atomic exchange

2022-02-15 Thread Tom de Vries via Gcc-patches
On 2/15/22 12:08, Thomas Schwinge wrote: Hi Tom! On 2022-02-15T11:52:29+0100, Tom de Vries wrote: On 2/15/22 08:34, Thomas Schwinge wrote: For my understanding: Thanks for your explanations! It is expected that this changes, for example (similar elsewhere)

Re: [committed][nvptx] Handle pre-sm_7x shared atomic store using atomic exchange

2022-02-15 Thread Tom de Vries via Gcc-patches
On 2/15/22 08:34, Thomas Schwinge wrote: Hi Tom! For my understanding: On 2022-02-10T10:13:10+0100, Tom de Vries via Gcc-patches wrote: The ptx isa specifies (for pre-sm_7x) that atomic operations on shared memory locations do not guarantee atomicity with respect to normal store

[committed][testsuite] Require non_strict_prototype in a few tests

2022-02-10 Thread Tom de Vries via Gcc-patches
Hi, Require effective target non_strict_prototype in a few test-cases. Tested on nvptx. Committed to trunk. Thanks, - Tom [testsuite] Require non_strict_prototype in a few tests gcc/testsuite/ChangeLog: 2022-02-10 Tom de Vries * gcc.c-torture/compile/pr100576.c: Require

[committed][testsuite] Require alloca support in a few tests

2022-02-10 Thread Tom de Vries via Gcc-patches
Hi, Require effective target alloca in a few test-cases. Tested on nvptx. Committed to trunk. Thanks, - Tom [testsuite] Require alloca support in a few tests gcc/testsuite/ChangeLog: 2022-02-10 Tom de Vries * c-c++-common/Walloca-larger-than.c: Require effective target alloca.

[committed][nvptx] Handle asm insn in prevent_branch_around_nothing

2022-02-10 Thread Tom de Vries via Gcc-patches
Hi, With GOMP_NVPTX_JIT=-00 and -mptx=3.1, I run into: ... FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/acc_prof-version-1.c \ -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O2 \ execution test ... The problem is that we're generating a diverging branch around

[PATCH][libgomp, nvptx] Add spinlock test-cases

2022-02-10 Thread Tom de Vries via Gcc-patches
Hi, Add spinlock test-cases for nvptx. Strictly speaking, these are invalid openACC, because they're not guaranteed to terminate. But I've tested these without problems on cards from nvidia architectures Kepler, Maxwell, Pascal and Turing (though on Turing, that's what you expect given that

[PATCH][libgomp, openacc] Add terminating spinlock test-cases

2022-02-10 Thread Tom de Vries via Gcc-patches
Hi, The OpenACC execution model states that implementing a critical section across workers using atomic operations and a busy-wait loop may never succeed, since the scheduler may suspend the worker that owns the lock, in which case the worker waiting on the lock can never complete. Add a

[committed][nvptx] Handle sm_7x shared atomic store more optimal

2022-02-10 Thread Tom de Vries via Gcc-patches
Hi, For sm_7x atomic stores we fall back on expand_atomic_store, but this results in using membar.sys for shared stores. Fix this by adding an nvptx_atomic_store insn that adds a membar.cta for a shared store. Tested on x86_64 with nvptx accelerator. Committed to trunk. Thanks, - Tom [nvptx]

[committed][nvptx] Handle pre-sm_7x shared atomic store using atomic exchange

2022-02-10 Thread Tom de Vries via Gcc-patches
Hi, The ptx isa specifies (for pre-sm_7x) that atomic operations on shared memory locations do not guarantee atomicity with respect to normal store instructions to the same address. This can be fixed by: - inserting barriers between normal stores and atomic operations to a common address -

[committed][nvptx] Workaround sub.u16 driver JIT bug

2022-02-10 Thread Tom de Vries via Gcc-patches
Hi, There's a nvidia driver JIT bug that mishandles this code (minimized from builtin-arith-overflow-15.c): ... int main (void) { signed char r; unsigned char y = (unsigned char) 0x80; if (__builtin_sub_overflow ((unsigned char)0, (unsigned char)y, )) __builtin_abort (); return 0; }

Re: [PATCH] nvptx: Tweak constraints on copysign instructions.

2022-02-10 Thread Tom de Vries via Gcc-patches
On 2/8/22 14:09, Roger Sayle wrote: Many thanks to Thomas Schwinge for confirming my hypothesis that the register usage regression, PR target/104345, is solely due to libgcc's _muldc3 function. In addition to the isinf functionality in the previously proposed nvptx patch at

Re: [PATCH] PR target/104345: Use nvptx "set" instruction for cond ? -1 : 0.

2022-02-10 Thread Tom de Vries via Gcc-patches
On 2/3/22 22:00, Roger Sayle wrote: This patch addresses the "increased register pressure" regression on nvptx-none caused by my change to transition the backend to a STORE_FLAG_VALUE = 1 target. This improved code generation for the more common case of producing 0/1 Boolean values,

Re: [PATCH] nvptx: Fix and use BI mode logic instructions (e.g. and.pred).

2022-02-10 Thread Tom de Vries via Gcc-patches
On 1/16/22 12:49, Roger Sayle wrote: This patch adds support for nvptx's BImode and.pred, or.pred and xor.pred instructions. Technically, nvptx.md previously defined andbi3, iorbi3 and xorbi3 instructions, but the assembly language mnemonic output for these was incorrect (e.g. and.b1) and

Re: [PATCH] nvptx: Add support for 64-bit mul.hi (and other) instructions.

2022-02-10 Thread Tom de Vries via Gcc-patches
On 1/14/22 10:54, Roger Sayle wrote: Now that the middle-end MULT_HIGHPART_EXPR pieces are in place, this patch adds support for nvptx's mul.hi.s64 and mul.hi.u64 instructions, as previously reviewed (provisionally pre-approved) back in August 2020:

Re: [PATCH] nvptx: Expand QI mode operations using SI mode instructions.

2022-02-10 Thread Tom de Vries via Gcc-patches
On 1/10/22 11:58, Roger Sayle wrote: One of the unusual target features of the Nvidia PTX ISA is that it doesn't provide QI mode (byte sized) operations or registers. [ FWIW: I recently happened to check this, and it actually supports .u8/.s8/.b8 regs, but indeed just for very few

Re: [PATCH] nvptx: Improved support for HFMode including neghf2 and abshf2.

2022-02-10 Thread Tom de Vries via Gcc-patches
On 1/8/22 13:21, Roger Sayle wrote: This patch adds more support for _Float16 (HFmode) to the nvptx backend. Currently negation, absolute value and floating point comparisons are implemented by promoting to float (SFmode). This patch adds suitable define_insns to nvptx.md, most conditional on

[committed][nvptx] Unbreak build, add PTX_ISA_SM70

2022-02-08 Thread Tom de Vries via Gcc-patches
Hi, With the commit "[nvptx] Choose -mptx default based on -misa" I introduced a use of PTX_ISA_SM70, without adding it first. Add it, as well as the corresponding TARGET_SM70. Build for x86_64 with nvptx accelerator. Committed to trunk. Thanks, - Tom [nvptx] Unbreak build, add PTX_ISA_SM70

Re: [committed][nvptx] Choose -mptx default based on -misa

2022-02-08 Thread Tom de Vries via Gcc-patches
On 2/8/22 14:24, Tobias Burnus wrote: Hi Tom, if I understand the patch correctly, -misa=sm_53 -mptx=3.1 will ... On 08.02.22 13:57, Tom de Vries via Gcc-patches wrote: Furthermore, using the -mptx option is a bit user-unfriendly. Say we want to compile for sm_80.  We can use -misa=sm_80

Re: [committed][nvptx] Choose -mptx default based on -misa

2022-02-08 Thread Tom de Vries via Gcc-patches
On 2/8/22 13:57, Tom de Vries via Gcc-patches wrote: +static const char * +sm_version_to_string (enum ptx_isa sm) +{ + switch (sm) +{ +case PTX_ISA_SM30: + return "30"; +case PTX_ISA_SM35: + return "35"; +case PTX_ISA_SM53: + return "53

[committed][nvptx] Choose -mptx default based on -misa

2022-02-08 Thread Tom de Vries via Gcc-patches
Hi, While testing with driver version 390.147 I ran into the problem that it doesn't support ptx isa version 6.3 (the new default), only 6.1. Furthermore, using the -mptx option is a bit user-unfriendly. Say we want to compile for sm_80. We can use -misa=sm_80 to specify that, but then run

[committed][testsuite] Require c99_runtime to run builtin-sprintf.c

2022-02-08 Thread Tom de Vries via Gcc-patches
Hi, On nvptx, I run into an execution failure in test-case gcc.dg/tree-ssa/builtin-sprintf.c because the test-case uses the 'hh' modifier. The port uses newlib, which does by default not support that modifier. There's a configure option --enable-newlib-io-c99-formats to enable this support, but

[committed][testsuite] Require c99_runtime to run builtin-sprintf.c

2022-02-08 Thread Tom de Vries via Gcc-patches
Hi, On nvptx, I run into an execution failure in test-case gcc.dg/tree-ssa/builtin-sprintf.c because the test-case uses the 'hh' modifier. The port uses newlib, which does by default not support that modifier. There's a configure option --enable-newlib-io-c99-formats to enable this support, but

[committed][nvptx] Fix .local atomic regressions

2022-02-08 Thread Tom de Vries via Gcc-patches
Hi, In PR target/104364, two problems were reported: - in muniform-simt mode, an atom.cas insn is no longer executed in the "master lane" only. - in msoft-stack mode, an __atomic_compare_exchange_n on stack memory is translated assuming it accesses local memory, while that's not the case.

Re: [Patch][wwwdocs + gcc] nvptx – update for -mptx change – gcc-12/changes.html + gcc/docs/invoke.texi

2022-02-04 Thread Tom de Vries via Gcc-patches
On 2/2/22 09:30, Tobias Burnus wrote: This patch updates the documentation for Tom's change of the default -mptx= version - mentioning also -mptx=7.0. I forgot whether ptx = 7.0 was working fine or whether there was a reason not to mention it. A ptx version is experimental if all sm versions

Re: Add 'libgomp.oacc-c-c++-common/private-atomic-1.c' [PR83812] (was: [PATCH][testsuite, nvptx] Add effective target sync_int_long_stack)

2022-02-03 Thread Tom de Vries via Gcc-patches
On 2/3/22 10:40, Thomas Schwinge wrote: Hi Tom! On 2021-05-19T14:56:17+0200, I wrote: On 2020-08-12T15:57:23+0200, Tom de Vries wrote: When enabling sync_int_long for nvptx, we run into a failure in gcc.dg/pr86314.c: ... nvptx-run: error getting kernel result: operation not supported on \

[committed][nvptx] Add uniform_warp_check insn

2022-02-01 Thread Tom de Vries via Gcc-patches
Hi, On a GT 1030, with driver version 470.94 and -mptx=3.1 I run into: ... FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/parallel-dims.c \ -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none \ -O2 execution test ... which minimizes to the same test-case as listed in commit

[committed][nvptx] Add bar.warp.sync

2022-02-01 Thread Tom de Vries via Gcc-patches
Hi, On a GT 1030 (sm_61), with driver version 470.94 I run into: ... FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/parallel-dims.c \ -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none \ -O2 execution test ... which minimizes to the same test-case as listed in commit

[committed][nvptx] Update default ptx isa to 6.3

2022-02-01 Thread Tom de Vries via Gcc-patches
Hi, With the following example, minimized from parallel-dims.c: ... int main (void) { int vectors_max = -1; #pragma acc parallel num_gangs (1) num_workers (1) copy (vectors_max) { for (int i = 0; i < 2; i++) for (int j = 0; j < 2; j++) #pragma acc loop vector reduction

[committed][nvptx] Update bar.sync for ptx isa 6.0

2022-02-01 Thread Tom de Vries via Gcc-patches
Hi, In ptx isa 6.0, a new barrier instruction was added, and bar.sync was redefined as barrier.sync.aligned. The aligned modifier indicates that all threads in a CTA will execute the same barrier instruction. The seems fine for a form "bar.sync 0". But a "bar.sync %rx,64" (as used for vector

[committed][nvptx] Handle nop in prevent_branch_around_nothing

2022-02-01 Thread Tom de Vries via Gcc-patches
Hi, When running libgomp test-case reduction-7.c on an nvptx accelerator (T400, driver version 470.86) and GOMP_NVPTX_JIT=-O0, I run into: ... reduction-7.exe:reduction-7.c:312: v_p_2: \ Assertion `out[j * 32 + i] == (i + j) * 2' failed. FAIL:

[committed][nvptx] Add some support for .local atomics

2022-02-01 Thread Tom de Vries via Gcc-patches
Hi, The ptx insn atom doesn't support local memory. In case of doing an atomic operation on local memory, we run into: ... operation not supported on global/shared address space ... This is the cuGetErrorString message for CUDA_ERROR_INVALID_ADDRESS_SPACE. The message is somewhat confusing

[committed][nvptx] Fix reduction lock

2022-02-01 Thread Tom de Vries via Gcc-patches
Hi, When I run the libgomp test-case reduction-cplx-dbl.c on an nvptx accelerator (T400, driver version 470.86), I run into: ... FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/reduction-cplx-dbl.c \ -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O0 \ execution test

[committed][libgomp, testsuite] Fix insufficient resources in test-cases

2022-01-31 Thread Tom de Vries via Gcc-patches
Hi, When running libgomp test-case broadcast-many.c on an nvptx accelerator (T400, driver version 470.86), I run into: ... libgomp: The Nvidia accelerator has insufficient resources to launch \ 'main$_omp_fn$0' with num_workers = 32 and vector_length = 32; \ recompile the program with

[committed][libgomp, testsuite] Reduce recursion depth in declare_target-*.f90

2022-01-31 Thread Tom de Vries via Gcc-patches
Hi, When running the libgomp testsuite with GOMP_NVPTX_JIT=-O0 using an nvptx accelerator (Nvidia T400, 2GB), I run into: ... libgomp: cuCtxSynchronize error: unspecified launch failure \ (perhaps abort was called) libgomp: cuMemFree_v2 error: unspecified launch failure libgomp: device

[PATCH][ldist] Don't add lib calls with -fno-tree-loop-distribute-patterns

2022-01-31 Thread Tom de Vries via Gcc-patches
[ was: Re: [RFC] ldist: Recognize rawmemchr loop patterns ] On 1/31/22 16:00, Richard Biener wrote: I'm running into PR56888 ( https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56888 ) on nvptx due to this, f.i. in gcc/testsuite/gcc.c-torture/execute/builtins/strlen.c, where

Re: [RFC] ldist: Recognize rawmemchr loop patterns

2022-01-31 Thread Tom de Vries via Gcc-patches
On 9/17/21 10:08, Richard Biener via Gcc-patches wrote: On Mon, Sep 13, 2021 at 4:53 PM Stefan Schulze Frielinghaus wrote: On Mon, Sep 06, 2021 at 11:56:21AM +0200, Richard Biener wrote: On Fri, Sep 3, 2021 at 10:01 AM Stefan Schulze Frielinghaus wrote: On Fri, Aug 20, 2021 at 12:35:58PM

[committed][nvptx] Add gcc.target/nvptx/atomic-exchange-*.c test-cases

2022-01-12 Thread Tom de Vries via Gcc-patches
Hi, Add a few test-cases that test expansion of __atomic_exchange. Tested on nvptx. Committed to trunk. Thanks, - Tom [nvptx] Add gcc.target/nvptx/atomic-exchange-*.c test-cases gcc/testsuite/ChangeLog: 2022-01-12 Tom de Vries * gcc.target/nvptx/atomic-exchange-1.c: New test.

[committed][nvptx] Improve gcc.target/nvptx/atomic_fetch-*.c test-cases

2022-01-12 Thread Tom de Vries via Gcc-patches
Hi, Fix a few issues in test-cases gcc.target/nvptx/atomic_fetch-*.c: - atomic_fetch-1.c uses scan-assembler instead of scan-assembler-times, which is less accurate - atomic_fetch-2.c only contains negative testing using scan-assembler-not - the test-cases use stack variables to generate

Re: [PATCH] nvptx: Add support for PTX's cnot instruction.

2022-01-07 Thread Tom de Vries via Gcc-patches
On 1/6/22 17:42, Roger Sayle wrote: Happy New Year for 2022. This is a simple patch, now that the nvptx backend has transitioned to STORE_FLAG_VALUE=1, that adds support for NVidia's cnot instruction, that implements C/C++ style logical negation. Happy newyear to you too :) LGTM, please

Re: [PATCH] libgomp, OpenMP, nvptx: Low-latency memory allocator

2022-01-06 Thread Tom de Vries via Gcc-patches
On 1/6/22 10:29, Tom de Vries wrote: At first glance, the above behaviour doesn't look like a too short timeout. Using patch below, this passes for me, I'm currently doing a full build and test to confirm. Looks like it has to do with: ... For sm_6x and earlier architectures, atom

Re: [PATCH] libgomp, OpenMP, nvptx: Low-latency memory allocator

2022-01-06 Thread Tom de Vries via Gcc-patches
On 1/5/22 15:36, Andrew Stubbs wrote: On 05/01/2022 13:04, Tom de Vries wrote: On 1/5/22 12:08, Tom de Vries wrote: The allocators-1.c test-case doesn't compile because: ... FAIL: libgomp.c/allocators-1.c (test for excess errors) Excess errors:

Re: [PATCH] libgomp, OpenMP, nvptx: Low-latency memory allocator

2022-01-05 Thread Tom de Vries via Gcc-patches
On 1/5/22 12:08, Tom de Vries wrote: The allocators-1.c test-case doesn't compile because: ... FAIL: libgomp.c/allocators-1.c (test for excess errors) Excess errors: /home/vries/oacc/trunk/source-gcc/libgomp/testsuite/libgomp.c/allocators-1.c:7:22: sorry, unimplemented: '    ' clause on

Re: [PATCH] nvptx: bump default to PTX 4.1

2022-01-05 Thread Tom de Vries via Gcc-patches
On 1/5/22 11:33, Andrew Stubbs wrote: On 05/01/2022 10:24, Tom de Vries wrote: On 12/21/21 12:33, Andrew Stubbs wrote: On 20/12/2021 15:58, Andrew Stubbs wrote: In order to support the %dynamic_smem_size PTX feature is is necessary to bump the minimum supported PTX version from 3.1 (~2013)

Re: [PATCH] libgomp, OpenMP, nvptx: Low-latency memory allocator

2022-01-05 Thread Tom de Vries via Gcc-patches
On 12/20/21 16:58, Andrew Stubbs wrote: This patch is submitted now for review and so I can commit a backport it to the OG11 branch, but isn't suitable for mainline until stage 1. The patch implements support for omp_low_lat_mem_space and omp_low_lat_mem_alloc on NVPTX offload devices. The

Re: [PATCH] nvptx: bump default to PTX 4.1

2022-01-05 Thread Tom de Vries via Gcc-patches
On 12/21/21 12:33, Andrew Stubbs wrote: On 20/12/2021 15:58, Andrew Stubbs wrote: In order to support the %dynamic_smem_size PTX feature is is necessary to bump the minimum supported PTX version from 3.1 (~2013) to 4.1 (~2014). Tobias has pointed out, privately, that the default version is

Re: [PATCH] Transition nvptx backend to STORE_FLAG_VALUE = 1

2022-01-04 Thread Tom de Vries via Gcc-patches
On 10/5/21 19:48, Roger Sayle wrote: This patch to the nvptx backend changes the backend's STORE_FLAG_VALUE from -1 to 1, by using BImode predicates and selp instructions, instead of set instructions (almost always followed by integer negation). Historically, it was reasonable (through rare)

Re: [PATCH] nvptx: Adds uses of -misa=sm_75 and -misa=sm_80

2021-12-15 Thread Tom de Vries via Gcc-patches
On 9/17/21 5:41 PM, Roger Sayle wrote: > > This patch adds upon my previous patch to prototype HFmode support on > nvptx, which includes adding new target macros TARGET_SM75 and TARGET_SM80. I've mode those parts into this patch. > Tobias Burnus has questioned "whether it makes sense to add

[committed][nvptx] Add -mptx=7.0

2021-12-15 Thread Tom de Vries via Gcc-patches
Hi, Add support for ptx isa version 7.0, required for the addition of -misa=sm_75 and -misa=sm_80. Tested by setting the default ptx isa version to 7.0, and doing a build and libgomp test run. Committed to trunk. Thanks, - Tom [nvptx] Add -mptx=7.0 gcc/ChangeLog: *

Re: [Patch] invoke.texi: Add sm_35 to Nvidia PTX's -misa=

2021-12-12 Thread Tom de Vries via Gcc-patches
On 12/12/21 5:08 PM, Tobias Burnus wrote: > I want to WITHDRAW that patch. > > I should read _emails_ before acting on _commit_ logs ... > heh :) > Reason is given by Tom at: > https://gcc.gnu.org/pipermail/gcc-patches/2021-December/586649.html > > However, we still shouldn't forget to update

Re: [PATCH] nvptx: Add (experimental) support for HFmode with -misa=sm_53

2021-12-12 Thread Tom de Vries via Gcc-patches
On 9/17/21 10:24 AM, Tobias Burnus wrote: > Hi Roger, > > some more generic remarks not specific to using new ISA features. > > On 17.09.21 00:53, Roger Sayle wrote: > >> Whilst there I also added -misa=sm_75 and -misa=sm_80 which are points >> where other useful instructions were added to the

Re: [PATCH] nvptx: Use cvt to perform sign-extension of truncation.

2021-12-08 Thread Tom de Vries via Gcc-patches
On 8/27/21 12:07 PM, Roger Sayle wrote: > > This patch introduces some new define_insn rules to the nvptx backend, > to perform sign-extension of a truncation (from and to the same mode), > using a single cvt instruction. As an example, the following function > > int foo(int x) { return

Re: [wwwdocs] gcc-12/changes.html: nvptx - new __PTX_SM__ macro

2021-08-31 Thread Tom de Vries via Gcc-patches
On 8/30/21 12:54 PM, Tobias Burnus wrote: > Document Roger's patch > https://gcc.gnu.org/g:3c496e92d795a8fe5c527e3c5b5a6606669ae50d > > OK? Suggestions? > LGTM. Thanks, - Tom

Re: [PATCH] nvptx: Add a __PTX_ISA__ predefined macro based on target ISA.

2021-08-24 Thread Tom de Vries via Gcc-patches
On 8/20/21 12:54 AM, Roger Sayle wrote: > > This patch adds a __PTX_ISA__ predefined macro to the nvptx backend that > allows code to check the compute model being targeted by the compiler. Hi Roger, The naming __PTX_ISA__ is consistent with the naming of -misa=sm_30/sm_35. The

<    1   2