[patch] various OpenACC reduction enhancements

2018-06-29 Thread Cesar Philippidis
The following patch set includes various OpenACC reduction enhancements present in og8. These include the following individual og8 commits:
* (4469fc4) [Fortran] Permit reductions in gfc_omp_clause_copy_ctor
* (704f1a2) [nvptx, OpenACC] vector reductions
* (8a35c89) [OpenACC] Fix a reduction

Re: [patch] Update support for Fortran arrays in OpenACC

2018-06-29 Thread Cesar Philippidis
On 06/29/2018 10:49 AM, Jakub Jelinek wrote: > On Fri, Jun 29, 2018 at 10:33:56AM -0700, Cesar Philippidis wrote: >> @@ -1044,21 +1046,6 @@ gfc_omp_finish_clause (tree c, gimple_seq *pre_p) >> return; >> >>tree decl = OMP_CLAUSE_DECL (c); >> - >

[patch] Update support for Fortran arrays in OpenACC

2018-06-29 Thread Cesar Philippidis
e reported line number in fortran combined OpenACC directives Is this patch OK for trunk? It bootstrapped / regression tested cleanly for x86_64 with nvptx offloading. Thanks, Cesar 2018-06-29 Cesar Philippidis gcc/fortran/ * trans-array.c (gfc_trans_array_bounds): Add an INIT_VLA ar

Re: [patch] adjust default nvptx launch geometry for OpenACC offloaded regions

2018-06-29 Thread Cesar Philippidis
Ping. Cesar On 06/20/2018 02:59 PM, Cesar Philippidis wrote: > At present, the nvptx libgomp plugin does not take into account the > amount of shared resources on GPUs (mostly shared-memory and register > usage) when selecting the default num_gangs and num_workers. In certain > si

Re: [patch] adjust default nvptx launch geometry for OpenACC offloaded regions

2018-06-21 Thread Cesar Philippidis
On 06/20/2018 03:15 PM, Tom de Vries wrote: > On 06/20/2018 11:59 PM, Cesar Philippidis wrote: >> Now it follows the formula contained in >> the "CUDA Occupancy Calculator" spreadsheet that's distributed with CUDA. > > Any reason we're not using the cuda

[patch] adjust default nvptx launch geometry for OpenACC offloaded regions

2018-06-20 Thread Cesar Philippidis
is patch OK for trunk? Thanks, Cesar 2018-06-20 Cesar Philippidis gcc/ * config/nvptx/nvptx.c (PTX_GANG_DEFAULT): Delete define. (PTX_DEFAULT_RUNTIME_DIM): New define. (nvptx_goacc_validate_dims): Use it to allow the runtime to dynamically allocate

Re: [OpenACC] Update OpenACC data clause semantics to the 2.5 behavior - runtime

2018-06-20 Thread Cesar Philippidis
On 06/20/2018 10:03 AM, Jakub Jelinek wrote: > On Wed, Jun 20, 2018 at 09:59:29AM -0700, Cesar Philippidis wrote: >> If it means anything, we have a significant async change that removes >> the async_refcount field in that struct. > > Wasn't async_refcount removed 2 y

Re: [OpenACC] Update OpenACC data clause semantics to the 2.5 behavior - runtime

2018-06-20 Thread Cesar Philippidis
On 06/20/2018 09:45 AM, Jakub Jelinek wrote: > On Tue, Jun 19, 2018 at 10:01:20AM -0700, Cesar Philippidis wrote: >> >From 53ee03231c5e6e4747b4ef01335079a2d4a98480 Mon Sep 17 00:00:00 2001 >> From: Cesar Philippidis >> Date: Tue, 19 Jun 2018 09:33:04 -0700 >> Subjec

Re: [OpenACC] Update OpenACC data clause semantics to the 2.5 behavior - runtime tests

2018-06-19 Thread Cesar Philippidis
This patch updates the existing OpenACC libgomp runtime tests with the new OpenACC 2.5 data clause semantics. Is it OK for trunk? Cesar 2018-06-19 Chung-Lin Tang Thomas Schwinge Cesar Philippidis libgomp/ * testsuite/libgomp.oacc-c-c++-common/data-already-1.c: Update test case

Re: [OpenACC] Update OpenACC data clause semantics to the 2.5 behavior - compiler tests

2018-06-19 Thread Cesar Philippidis
This patch updates the existing OpenACC compiler tests with the new OpenACC 2.5 data clause semantics. Is it OK for trunk? Cesar 2018-06-19 Chung-Lin Tang Thomas Schwinge Cesar Philippidis gcc/testsuite/ * c-c++-common/goacc/declare-1.c: Update test case to utilize OpenACC 2.5

Re: [OpenACC] Update OpenACC data clause semantics to the 2.5 behavior - runtime

2018-06-19 Thread Cesar Philippidis
This patch implements the OpenACC 2.5 data clause semantics in libgomp. Is it OK for trunk? Cesar 2018-06-19 Chung-Lin Tang Thomas Schwinge Cesar Philippidis libgomp/ * libgomp.h (struct splay_tree_key_s): Add dynamic_refcount member. (gomp_acc_remove_pointer): Update

Re: [OpenACC] Update OpenACC data clause semantics to the 2.5 behavior - middle end

2018-06-19 Thread Cesar Philippidis
This patch implements the OpenACC 2.5 data clause semantics in the middle end. Is it OK for trunk? Cesar 2018-06-19 Chung-Lin Tang Thomas Schwinge Cesar Philippidis gcc/c-family/ * c-pragma.h (enum pragma_omp_clause): Add PRAGMA_OACC_CLAUSE_{FINALIZE,IF_PRESENT}. Remove

Re: [OpenACC] Update OpenACC data clause semantics to the 2.5 behavior - Fortran

2018-06-19 Thread Cesar Philippidis
This patch implements the OpenACC 2.5 data clause semantics in the Fortran FE. Is it OK for trunk? Cesar 2018-06-19 Chung-Lin Tang Thomas Schwinge Cesar Philippidis gcc/fortran/ * gfortran.h (gfc_omp_clauses): Add unsigned if_present, finalize bitfields. * openmp.c (enum

Re: [OpenACC] Update OpenACC data clause semantics to the 2.5 behavior - C

2018-06-19 Thread Cesar Philippidis
This patch implements the OpenACC 2.5 data clause semantics in the C FE. Is it OK for trunk? Cesar 2018-06-19 Chung-Lin Tang Thomas Schwinge Cesar Philippidis gcc/c/ * c-parser.c (c_parser_omp_clause_name): Add support for finalize and if_present. Make present_or_{copy,copyin

Re: [OpenACC] Update OpenACC data clause semantics to the 2.5 behavior - C++

2018-06-19 Thread Cesar Philippidis
This patch implements the OpenACC 2.5 data clause semantics in the C++ FE. Is it OK for trunk? Cesar 2018-06-19 Chung-Lin Tang Thomas Schwinge Cesar Philippidis gcc/cp/ * parser.c (cp_parser_omp_clause_name): Add support for finalize and if_present. Make present_or_{copy

Re: [OpenACC] Update OpenACC data clause semantics to the 2.5 behavior

2018-06-19 Thread Cesar Philippidis
Ping. To make this patch easier to review, I'll split it into individual patches for each major component in follow-up emails. Cesar On 05/25/2018 01:01 PM, Cesar Philippidis wrote: > This patch updates GCC to support OpenACC 2.5's data clause semantics. > In OpenACC

[PATCH] handle OpenMP/OpenACC regions inside Fortran character functions (PR85703)

2018-06-13 Thread Cesar Philippidis
ement decoders, and therein lies the problem. The fix is to reset gfc_matching_function early in those functions. Is this OK for trunk and GCC 8? Thanks, Cesar 2018-06-13 Cesar Philippidis PR fortran/85703 gcc/fortran/ * parse.c (decode_oacc_directive): Set gfc_matching_function to

[PATCH] update error reporting for OpenACC wait (PR85702)

2018-06-13 Thread Cesar Philippidis
ch to trunk as obvious. Cesar 2018-06-13 Cesar Philippidis PR fortran/85702 gcc/fortran/ * openmp.c (gfc_match_oacc_wait): Use %C to report error location. gcc/testsuite/ * gfortran.dg/goacc/pr85702.f90: New test. >From 07022efa1ba4a58fa12c3f8a3b911fba32a5df1b Mon Sep 17 00:00:00 2

[PATCH] Reject function and subroutine arguments in OpenACC declare data clauses (PR85701)

2018-06-05 Thread Cesar Philippidis
, this may have to be revisited. I tested this patch on x86_64 with nvptx offloading. Is it OK for trunk and the stable branches? Thanks, Cesar 2018-06-05 Cesar Philippidis PR fortran/85701 gcc/fortran/ * openmp.c (gfc_resolve_oacc_declare): Error on functions and subroutine data clause

[PATCH] fix checking error with OpenACC reference types variables (PR85879)

2018-05-31 Thread Cesar Philippidis
. Is this OK for trunk? Thanks, Cesar 2018-05-31 Chung-Lin Tang Cesar Philippidis PR middle-end/85879 gcc/ * gimplify.c (gimplify_adjust_omp_clauses): Add 'remove = true' when emitting error on private/firstprivate reductions. * omp-low.c (lower_omp_target): Avoid refe

[OpenACC] Update OpenACC data clause semantics to the 2.5 behavior

2018-05-25 Thread Cesar Philippidis
ns to the fortran FE. Is this patch OK for trunk? I tested with x86_64-linux with nvptx acceleration. Thanks, Cesar 2018-05-25 Chung-Lin Tang Thomas Schwinge Cesar Philippidis gcc/c-family/ * c-pragma.h (enum pragma_omp_clause): Add PRAGMA_OA

Re: Fix PR85782: C++ ICE with continue statements inside acc loops

2018-05-18 Thread Cesar Philippidis
Ping. For reference, I've attached the patch for gcc7. Cesar On 05/15/2018 07:11 AM, Cesar Philippidis wrote: > This patch resolves the issue in PR85782, which involves a C++ ICE > caused by OpenACC loops which contain continue statements. The problem > is that genericize_continu

Fix PR85782: C++ ICE with continue statements inside acc loops

2018-05-15 Thread Cesar Philippidis
cause cp_genericize_r uses if statements to check for statement types instead of a huge switch statement. Cesar 2018-05-15 Cesar Philippidis PR c++/85782 gcc/cp/ * cp-gimplify.c (cp_genericize_r): Call genericize_omp_for_stmt for OACC_LOOPs. gcc/testsuite/ * c-c++-common/goacc/pr85782.c

Re: [og7] Update deviceptr handling in Fortran

2018-05-09 Thread Cesar Philippidis
c-c++-common/goacc/deviceptr-4.c -std=c++98 (test for excess > errors) I forgot to update the expected data mapping in deviceptr-4.c. Now, instead of implicitly adding a 'copy' clause for known deviceptr variables, the gimplifier will assign a force_deviceptr clause. I've ap

[og7] Backport libgomp gomp_copy_host2dev coalesce optimization from trunk

2018-05-07 Thread Cesar Philippidis
This patch backports Jakub's gomp_copy_host2dev optimization from <https://gcc.gnu.org/ml/gcc-patches/2017-10/msg01800.html>. There were a couple of changes required due to the new async infrastructure in og7. I've applied this patch to og7. Cesar 2018-05-07 Thomas Schw

[og7] Update deviceptr handling in Fortran

2018-05-07 Thread Cesar Philippidis
ls on at least one legacy driver. Cesar 2018-05-07 Cesar Philippidis gcc/fortran/ * trans-openmp.c (gfc_omp_finish_clause): Don't create pointer data mappings for deviceptr clauses. (gfc_trans_omp_clauses_1): Likewise. gcc/ * gimplify.c (enum gimplify_omp_var_data): Add GOVD_DEVICETP

[PATCH] cleanup libgomp's coalesce chunk data structures

2018-05-02 Thread Cesar Philippidis
er by introducing a new gomp_coalesce_chunk structure with explicit start and end members. Beyond that, there are no functional changes in this patch. Is it OK for trunk? I tested it against x86_64-linux with nvptx acceleration. Thanks, Cesar 2018-05-02 Cesar Philippidis libgomp/ * target

Re: [PATCH] Update nvptx newlib installation requirements

2018-04-24 Thread Cesar Philippidis
On 04/24/2018 12:10 AM, Richard Biener wrote: > That's great news! Note that we usually keep copies of build dependences at > ftp://gcc.gnu.org/pub/gcc/infrastructure/ and there's currently no nvptx > newlib > variant there. Maybe you can prepare a tarball that's ready to plug into gcc > source

[PATCH] Update nvptx newlib installation requirements

2018-04-23 Thread Cesar Philippidis
Thanks, Cesar [nvptx] Update newlib dependency. 2018-04-23 Cesar Philippidis gcc/ * doc/install.texi: Update newlib dependency for nvptx. diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi index 64ad2445a33..121a821857f 100644 --- a/gcc/doc/install.texi +++ b/gcc/doc/install.texi @@ -4

Re: [PATCH] Handle empty infinite loops in OpenACC for PR84955

2018-04-12 Thread Cesar Philippidis
On 04/12/2018 11:27 AM, H.J. Lu wrote: > On Wed, Apr 11, 2018 at 12:30 PM, Cesar Philippidis > wrote: >> On 04/09/2018 04:31 AM, Richard Biener wrote: >>> On Fri, 6 Apr 2018, Jakub Jelinek wrote: >>> >>>> On Fri, Apr 06, 2018 at 06:48:52AM -0700, Cesar

Re: [PATCH] Handle empty infinite loops in OpenACC for PR84955

2018-04-11 Thread Cesar Philippidis
On 04/09/2018 04:31 AM, Richard Biener wrote: > On Fri, 6 Apr 2018, Jakub Jelinek wrote: > >> On Fri, Apr 06, 2018 at 06:48:52AM -0700, Cesar Philippidis wrote: >>> 2018-04-06 Cesar Philippidis >>> >>> PR middle-end/84955 >>> >>>

[og7] Enable worker partitioning with warp-sized vector_length

2018-04-10 Thread Cesar Philippidis
. Consequently, not all of the CUDA threads were being utilized when vector_length = 32 (which is the default case). I've committed this patch to openacc-gcc-7-branch which allows warp-sized vectors to nest inside worker-partitioned loops. Cesar 2018-04-10 Cesar Philippidis gcc/ * config/

[PATCH] Handle empty infinite loops in OpenACC for PR84955

2018-04-06 Thread Cesar Philippidis
18-04-06 Cesar Philippidis PR middle-end/84955 gcc/ * cfgloop.c (flow_loops_find): Add assert. * omp-expand.c (expand_oacc_for): Add dummy false branch for tiled basic blocks without omp continue statements. * tree-cfg.c (execute_fixup_cfg): Handle calls to internal functi

Re: [og7] vector_length extension part 2: Generalize state propagation and synchronization

2018-03-30 Thread Cesar Philippidis
On 03/30/2018 07:45 AM, Tom de Vries wrote: > On 03/30/2018 03:07 AM, Tom de Vries wrote: >> On 03/02/2018 05:55 PM, Cesar Philippidis wrote: >>> As a follow up patch will show, the nvptx BE falls back to using >>> vector_length = 32 when a vector loop is nested in

Re: [PATCH,nvptx] Fix PR85056

2018-03-28 Thread Cesar Philippidis
On 03/27/2018 01:17 AM, Tom de Vries wrote: > On 03/26/2018 11:57 PM, Cesar Philippidis wrote: >> As noted in PR85056, the nvptx BE isn't declaring external arrays using >> PTX array notation. Specifically, it's emitting code that's missing the >> empty angle b

[PATCH,nvptx] Fix PR85056

2018-03-26 Thread Cesar Philippidis
this patch OK for trunk if the results come back clean? Thanks, Cesar 2018-03-26 Cesar Philippidis gcc/ PR target/85056 * config/nvptx/nvptx.c (nvptx_assemble_decl_begin): Add '[]' to extern array declarations. gcc/testsuite/ * testsuite/gcc.target/nvptx/pr85056.c: New test.

Re: [og7] vector_length extension part 4: target hooks and automatic parallelism

2018-03-26 Thread Cesar Philippidis
On 03/26/2018 07:14 AM, Tom de Vries wrote: > On 03/02/2018 08:18 PM, Cesar Philippidis wrote: >> diff --git a/gcc/omp-offload.c b/gcc/omp-offload.c >> index ba3f4317f4e..f15ce6b8f8d 100644 >> --- a/gcc/omp-offload.c >> +++ b/gcc/omp-offload.c >> @@ -626,7 +626,8

Re: [og7] vector_length extension part 2: Generalize state propagation and synchronization

2018-03-22 Thread Cesar Philippidis
On 03/22/2018 10:51 AM, Tom de Vries wrote: > On 03/22/2018 06:24 PM, Cesar Philippidis wrote: >> On 03/22/2018 09:18 AM, Tom de Vries wrote: >> >>> That's obviously not good enough. >>> >>> When I compile this test-case: >>> ... >>

Re: [og7] vector_length extension part 2: Generalize state propagation and synchronization

2018-03-22 Thread Cesar Philippidis
On 03/22/2018 10:39 AM, Tom de Vries wrote: > On 03/02/2018 05:55 PM, Cesar Philippidis wrote: >> +  rtx red_partition; /* Similar to bcast_partition, except for vector >> +    reductions.  */ > > Shouldn't this be in "[og7] vector_length extension part 3: r

Re: [og7] vector_length extension part 2: Generalize state propagation and synchronization

2018-03-22 Thread Cesar Philippidis
arguments for maxntid. Cesar >From 11035dc92884146dc4d974156adcb260568db785 Mon Sep 17 00:00:00 2001 From: Cesar Philippidis Date: Thu, 22 Mar 2018 08:05:53 -0700 Subject: [PATCH] emit .maxntid hint --- gcc/config/nvptx/nvptx.c | 19 +++ gcc/config/nvptx/nvptx.h | 2 ++

Re: [og7] vector_length extension part 2: Generalize state propagation and synchronization

2018-03-22 Thread Cesar Philippidis
On 03/22/2018 07:44 AM, Tom de Vries wrote: > On 03/02/2018 05:55 PM, Cesar Philippidis wrote: >> The attached patch generalizes the worker state propagation and >> synchronization code to handle large vectors. When the vector_length is >> larger than a CUDA warp, the nvptx B

Re: [og7] vector_length extension part 2: Generalize state propagation and synchronization

2018-03-22 Thread Cesar Philippidis
On 03/22/2018 07:23 AM, Tom de Vries wrote: > On 03/02/2018 05:55 PM, Cesar Philippidis wrote: > >> (nvptx_declare_function_name): Emit a .maxntid directive hint and >> call nvptx_init_oacc_workers. > >> + >> +  /* Emit a .maxntid hint to help the PTX JIT

Re: [og7] vector_length extension part 2: Generalize state propagation and synchronization

2018-03-22 Thread Cesar Philippidis
On 03/22/2018 06:43 AM, Tom de Vries wrote: > On 03/22/2018 04:59 AM, Cesar Philippidis wrote: >> On 03/21/2018 10:10 AM, Tom de Vries wrote: >>> Changing the code generation scheme for workers is fine, but obviously >>> that should be a minimal, separate patch

Re: [og7] vector_length extension part 2: Generalize state propagation and synchronization

2018-03-21 Thread Cesar Philippidis
On 03/21/2018 10:10 AM, Tom de Vries wrote: > On 03/02/2018 05:55 PM, Cesar Philippidis wrote: >> In addition, nvptx_cta_sync and the corresponding nvptx_barsync insn, >> have been extended to take a barrier ID and a thread count. The idea >> here is to assign one barrier fo

Re: [og7] vector_length extension part 4: target hooks and automatic parallelism

2018-03-21 Thread Cesar Philippidis
On 03/21/2018 08:49 AM, Tom de Vries wrote: > On 03/02/2018 08:18 PM, Cesar Philippidis wrote: > >> og7-vl-part4-hooks.diff > >> diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c >> index 5642941c6a3..507c8671704 100644 >> --- a/gcc/config/nvptx/

[og7] backport fix for PR84952

2018-03-20 Thread Cesar Philippidis
to nvptx_cta_sync so that function can be used for both large vector_lengths along with workers. Other than that, I didn't have to make any changes to his patch. Cesar 2018-03-20 Cesar Philippidis gcc/ * config/nvptx/nvptx.c (nvptx_single): Revert changes from 7445a4d40. Backport fro

Re: [og7] Update nvptx_fork/join barrier placement

2018-03-19 Thread Cesar Philippidis
On 03/19/2018 10:02 AM, Tom de Vries wrote: > On 03/19/2018 03:55 PM, Cesar Philippidis wrote: >>> Note that this changes ordering of the vector-neutering jump and >>> worker-neutering jump at the end. In principle, this should not be >>> harmful, but it viol

Re: [og7] Update nvptx_fork/join barrier placement

2018-03-19 Thread Cesar Philippidis
On 03/19/2018 07:04 AM, Tom de Vries wrote: > On 03/09/2018 05:55 PM, Cesar Philippidis wrote: >> On 03/09/2018 08:21 AM, Tom de Vries wrote: >>> On 03/09/2018 12:31 AM, Cesar Philippidis wrote: >>>> Nvidia Volta GPUs now support warp-level synchronization. >&g

[og7] Backport PR74048 and PR81352 nvptx fixes

2018-03-12 Thread Cesar Philippidis
cts unused parallelism (in this case, num_workers was being set but there was no worker partitioned loop). That problem went away with an extra dg-warning line. Cesar 2018-03-12 Cesar Philippidis Backport from trunk: 2018-01-25 Tom de Vries PR target/84028 gcc/ * config/nvptx/nvptx.c (nv

Re: [og7] Update nvptx_fork/join barrier placement

2018-03-09 Thread Cesar Philippidis
On 03/09/2018 08:21 AM, Tom de Vries wrote: > On 03/09/2018 12:31 AM, Cesar Philippidis wrote: >> Nvidia Volta GPUs now support warp-level synchronization. > > Well, let's try to make that statement a bit more precise. > > All Nvidia architectures have supported synch

Re: [og7] vector_length extension part 1: generalize function and variable names

2018-03-09 Thread Cesar Philippidis
On 03/09/2018 07:29 AM, Thomas Schwinge wrote: > On Thu, 1 Mar 2018 13:17:01 -0800, Cesar Philippidis > wrote: >> To reduce the size of the final patch, >> I've separated all of the misc. function and variable renaming into this >> patch. > > Yes, please

[og7] Update nvptx_fork/join barrier placement

2018-03-08 Thread Cesar Philippidis
vptx_cta_sync. Cesar 2018-03-08 Cesar Philippidis gcc/ * config/nvptx/nvptx.c (nvptx_single): Adjust placement of nvptx_fork and nvptx_join neutering labels. (nvptx_process_pars): Place the CTA barrier at the beginning of the join block. diff --git a/gcc/config/nvptx/nvptx.c b/gcc/

[og7] vector_length extension part 5: libgomp and tests

2018-03-02 Thread Cesar Philippidis
c-7-branch once the reduction changes have been approved. Cesar 2018-03-02 Cesar Philippidis libgomp/ * plugin/plugin-nvptx.c (nvptx_exec): Adjust calculations of workers and vectors. * testsuite/libgomp.oacc-c-c++-common/vred2d-128.c: New test. * testsuite/libgomp.oacc-fortran/gemm.f90: Ne

[og7] vector_length extension part 4: target hooks and automatic parallelism

2018-03-02 Thread Cesar Philippidis
code. Overall, the changes in this patch are mild. I'll apply it to openacc-gcc-7-branch after Tom approves the reduction patch. Cesar 2018-03-02 Cesar Philippidis gcc/ * config/nvptx/nvptx.c (NVPTX_GOACC_VL_WARP): Define. (nvptx_goacc_needs_vl_warp): New function. (nvptx_goac

[og7] vector_length extension part 3: reductions

2018-03-02 Thread Cesar Philippidis
finalizer will be slow. However, that's a project for another day. I'll commit this patch to openacc-gcc-7-branch after Tom reviews the new nvptx_red_partition insn. Cesar 2018-03-02 Cesar Philippidis gcc/ * config/nvptx/nvptx-protos.h (nvptx_output_red_partition): Decl

[og7] vector_length extension part 2: Generalize state propagation and synchronization

2018-03-02 Thread Cesar Philippidis
oversial. I'll commit this patch to openacc-gcc-7-branch once the other patches are ready. There will be three more patches in this series. Cesar 2018-03-02 Cesar Philippidis gcc/ * config/nvptx/nvptx.c (oacc_bcast_partition): Declare. (nvptx_init_axis_predicate): Initi

[og7] vector_length extension part 1: generalize function and variable names

2018-03-01 Thread Cesar Philippidis
it will be used in other places, including nvptx_validate_dims and the nvptx reduction handling code. This patch has been committed to openacc-gcc-7-branch. Cesar 2018-03-01 Cesar Philippidis gcc/ * config/nvptx/nvptx.c (PTX_VECTOR_LENGTH, PTX_WORKER_LENGTH, PTX_DEFAULT_RUNTIME_DIM): Move

[og7] Properly handle alloca'd arrays in OpenACC data mappings

2018-01-31 Thread Cesar Philippidis
this problem would have been detected sooner. I'm considering moving the PTX .param pass later, possibly during oaccdevlow. But that will have to wait for some other time. I've applied this patch to openacc-gcc-7-branch. Cesar Properly handle alloca'd OpenACC data mappings 2018-01-3

[og7] Enable firstprivate OpenACC reductions

2018-01-31 Thread Cesar Philippidis
nside gimplify.c:omp_add_variable. I know that it's been a while since you last worked on this. Let me know if you have any state on that code, otherwise I'll handle the cleanup. Cesar Enable firstprivate OpenACC reductions 2018-01-31 Cesar Philippidis gcc/ * gimplify.c (omp_add_variable): Allow certain Op

[og7] Privatize independent OpenACC reductions

2018-01-26 Thread Cesar Philippidis
committee argue that the reduction variable in inner-reduction.c should be firstprivate, not copy. Cesar Privatize independent OpenACC reductions. 2018-01-26 Cesar Philippidis gcc/ * gimplify.c (oacc_privatize_reduction): New function. (omp_add_variable): Use it to determine if a reduction va

[og7] Build libffi during bootstrap.

2018-01-25 Thread Cesar Philippidis
al variables at runtime. Cesar Build libffi during bootstrap. 2018-01-25 Cesar Philippidis * Makefile.def: Bootstrap module libffi. Add libffi dependency to all-target-libgomp. * Makefile.in: Regenerate. * configure.ac: Add libffi to bootstrap_target_libs when libgomp is bootstrapped. * config

[og7,nvptx] Backport CUDA 9 support from trunk.

2018-01-19 Thread Cesar Philippidis
into trunk. This patch keeps both trunk and og7 consistent. Cesar [nvptx] Backport CUDA 9 support from trunk. 2018-01-19 Cesar Philippidis Backport from mainline: 2018-01-19 Cesar Philippidis PR target/83790 gcc/ * config/nvptx/nvptx.c (output_init_frag): diff --git a/gcc/config/nvpt

[og7] backport fix for PR83920

2018-01-19 Thread Cesar Philippidis
I've backported the patch Tom committed to trunk to fix PR83920 to openacc-gcc-7-branch in revision d0a1e0fa43ca4004fde33707cb6a93c01cb11507. No changes were required for og7. The original email can be found here . Cesar

Re: [PATCH,PTX] Add support for CUDA 9

2018-01-18 Thread Cesar Philippidis
On 12/19/2017 04:39 PM, Tom de Vries wrote: > On 12/20/2017 12:25 AM, Cesar Philippidis wrote: >> og7-ptx-cuda9.diff >> >> >> 2017-12-19  Cesar Philippidis  >> >> gcc/ >> * config/nvptx/nvptx.c (output_init_frag): Don't use generic addres

[PATCH,NVPTX] Fix PR83920

2018-01-17 Thread Cesar Philippidis
In PR83920, I encountered an nvptx bug where live predicate variables were clobbered before their value was broadcast. Apparently, there were problems in certain versions of the CUDA driver where the JIT would generate wrong code for shfl broadcasts. The attached patch teaches nvptx_single not to a

Re: [PATCH,PTX] Add support for CUDA 9

2018-01-17 Thread Cesar Philippidis
On 12/27/2017 01:16 AM, Tom de Vries wrote: > On 12/21/2017 06:19 PM, Cesar Philippidis wrote: >> My test results are somewhat inconsistent. On MG's build servers, there >> are no regressions in CUDA 8. > > Ack. > >> On my laptop, there are fewer regressions

Re: [PATCH,WIP] Use functional parameters for data mappings in OpenACC child functions

2017-12-21 Thread Cesar Philippidis
On 12/18/2017 02:58 PM, Cesar Philippidis wrote: > Jakub, > > I'd like your thoughts on the following problem. > > One of the offloading bottlenecks with GPU acceleration in OpenACC is > the nontrivial offloaded function invocation overhead. At present, GCC > gener

Re: [PATCH,PTX] Add support for CUDA 9

2017-12-21 Thread Cesar Philippidis
On 12/20/2017 03:15 PM, Tom de Vries wrote: > On 12/20/2017 11:59 PM, Cesar Philippidis wrote: >> On 12/19/2017 04:39 PM, Tom de Vries wrote: >>> On 12/20/2017 12:25 AM, Cesar Philippidis wrote: >>>> In CUDA 9, Nvidia removed support for treating the labels of fun

Re: [PATCH,PTX] Add support for CUDA 9

2017-12-20 Thread Cesar Philippidis
On 12/19/2017 04:39 PM, Tom de Vries wrote: > On 12/20/2017 12:25 AM, Cesar Philippidis wrote: >> In CUDA 9, Nvidia removed support for treating the labels of functions >> as generic address spaces as part of their PTX 6.0 changes. More >> specifically, >> <http:/

[PATCH,PTX] Add support for CUDA 9

2017-12-19 Thread Cesar Philippidis
eneric address space when initializing variables using a label address. Is this OK for trunk? Thanks, Cesar 2017-12-19 Cesar Philippidis gcc/ * config/nvptx/nvptx.c (output_init_frag): Don't use generic address spaces for function labels. gcc/testsuite/ * gcc.target/nvptx/indirect_call.

[PATCH,WIP] Use functional parameters for data mappings in OpenACC child functions

2017-12-18 Thread Cesar Philippidis
spend too much time on it if we decide to go with a different approach. Any thoughts are welcome. By the way, next we'll be working on increasing vector_length on nvptx targets. In conjunction with that, we'll be simplifying the OpenACC execution model in the nvptx BE, along with adding

[patch] Fix bug in an OpenACC async test case

2017-12-01 Thread Cesar Philippidis
runs too fast on the GPU so that the copy'ed out data is correct, or Nvidia's CUDA runtime blocks all device->host data transfers until the GPU is no longer processing the data. I suspect it's the former. I've applied this patch to trunk and og7 as obvious. Cesar 2017-1

Re: [PATCH, 2/4] Handle GOMP_OPENACC_NVPTX_{DISASM,SAVE_TEMPS} in libgomp nvptx plugin

2017-11-07 Thread Cesar Philippidis
On 07/04/2017 03:05 AM, Tom de Vries wrote: > On 07/03/2017 04:24 PM, Tom de Vries wrote: >> On 07/03/2017 04:08 PM, Thomas Schwinge wrote: >>> Hi! >>> >>> On Mon, 26 Jun 2017 17:29:11 +0200, Jakub Jelinek >>> wrote: On Mon, Jun 26, 2017 at 03:26:57PM +, Joseph Myers wrote: > On Mon,

Re: [RFC PATCH] Coalesce host to device transfers in libgomp

2017-10-24 Thread Cesar Philippidis
On 10/24/2017 02:55 AM, Jakub Jelinek wrote: > People from NVidia reported privately unexpected amount of host2dev > transfers for #pragma omp target*. Did they mention which program they were testing? > The code even had comments like: >/* FIXME: Perhaps add some smarts, lik

[og7] Enable 0-length array data mappings for implicit data clauses

2017-10-11 Thread Cesar Philippidis
n another compiler which goes through the trouble of mapping the dynamic array to the accelerator automatically, but that's beyond the scope of this patch. Cesar 2017-10-11 Cesar Philippidis gcc/ * gimplify.c (oacc_default_clause): Create implicit 0-length array data clauses for point

[og7] Enable fortran derived types in acc enter/exit data

2017-10-11 Thread Cesar Philippidis
appen in the near term. Cesar 2017-10-11 Cesar Philippidis gcc/fortran/ * openmp.c (match_acc): Add new argument derived_types. Propagate it to gfc_match_omp_clauses. (gfc_match_oacc_enter_data): Update call to match_acc. (gfc_match_oacc_exit_data): Likewise. gcc/testsuite/ * gfortran.dg/

[og7] Allow the accelerator to have more offloaded functions than the host

2017-10-11 Thread Cesar Philippidis
umber of offloaded functions. As a temporary workaround, this patch teaches libgomp to allow the accelerator to possess more offloaded functions than the host. I've applied this patch to openacc-gcc-7-branch. Is it also suitable for trunk? Cesar 2017-10-11 Cesar Philippidis libgomp/

Re: [gomp4] OpenACC async re-work

2017-10-10 Thread Cesar Philippidis
On 10/10/2017 11:08 AM, Thomas Schwinge wrote: > Reported by Cesar for a test case similar to the one below, where we > observe: > > acc_prof-cuda-1.exe: [...]/libgomp/oacc-profiling.c:592: > goacc_profiling_dispatch_p: Assertion `thr->prof_info == NULL' failed. > > This is because of: > >

[OpenACC] Don't restrict ACC wait arguments to constants in fortran

2017-09-21 Thread Cesar Philippidis
emove this restriction. I'll backport this patch to og7 shortly. Cesar 2017-09-21 Cesar Philippidis gcc/fortran/ * openmp.c (gfc_match_oacc_wait): Don't restrict wait directive arguments to constant integers. gcc/testsuite/ * gfortran.dg/goacc/wait.f90: New test. diff --git a/gc

[OpenACC] Add support for floating point type in GOMP_MAP_FIRSTPRIVATE_INT

2017-09-20 Thread Cesar Philippidis
before stage1 closes. Still, this patch does represent an incremental improvement by itself. Is this patch OK for trunk? Maybe a follow-up patch should enable floating-point GOMP_MAP_FIRSTPRIVATE_INT values in OpenMP. Cesar 2017-09-20 Cesar Philippidis gcc/ * omp-low.c

[OpenACC] Enable SIMD vectorization on vector loops

2017-09-13 Thread Cesar Philippidis
tch OK for trunk? Cesar 2017-09-13 Cesar Philippidis gcc/ * omp-offload.c (oacc_xform_loop): Enable SIMD vectorization on non-SIMT targets in acc vector loops. diff --git a/gcc/omp-offload.c b/gcc/omp-offload.c index 2d4fd411680..9d5b8bef649 100644 --- a/gcc/omp-offload.c +++ b/gcc/omp

[og7] dynamic num_workers

2017-08-11 Thread Cesar Philippidis
rally improves usability because it doesn't require the end user to rebuild their program multiple times to find the optimal num_workers. That's because the aforementioned GOMP_OPENACC_DIM environment variable can be used at run time. Cesar 2017-08-11 Cesar Philippidis gcc/ * confi

[og7] Adjust k80 resources

2017-08-11 Thread Cesar Philippidis
ce this failure, but it does show up in various OpenACC tests such as cloverleaf. I'll try to create a reduced test case that uses a lot of hardware registers later. Cesar 2017-08-11 Cesar Philippidis libgomp/ * plugin/cuda/cuda.h (CUdevice_attribute): Add CU_DEVICE_ATTRIBUTE_COMP

[og7] Fix libgomp.oacc-c/asyncwait-2.c

2017-08-01 Thread Cesar Philippidis
allow variables. Cesar 2017-08-01 Cesar Philippidis Thomas Schwinge gcc/ * omp-expand.c (expand_omp_target): Don't expect OMP_CLAUSE_WAIT_EXPR to be a constant expression. diff --git a/gcc/omp-expand.c b/gcc/omp-expand.c index 8301dcb0de5..bf1f127d8d6 100644 --- a/gcc/omp-exp

Re: [gomp4] OpenACC async re-work

2017-07-25 Thread Cesar Philippidis
On 07/25/2017 05:51 AM, Chung-Lin Tang wrote: > On 2017/6/29 6:31 AM, Cesar Philippidis wrote: > Attached is the updated version of the patch, re-tested. > > Thomas, do you need some more time to look over it? Or should I commit it > first? I'm not too concerned about

Re: [PATCH] update edge profile info in nvptx.c

2017-07-20 Thread Cesar Philippidis
On 07/20/2017 06:04 AM, Tom de Vries wrote: > On 07/13/2017 06:53 PM, Cesar Philippidis wrote: >> Similarly, for nvptx vector reductions, when it comes time to initialize >> the reduction variable, the nvptx BE constructs a branch so that only >> vector lanes 1 to vector_len

Re: [PATCH] allow deferred-shape pointers in OpenACC data clauses

2017-07-16 Thread Cesar Philippidis
On 07/16/2017 10:28 AM, Thomas Koenig wrote: > Am 14.07.2017 um 16:11 schrieb Cesar Philippidis: >> This patch teaches the fortran FE to allow deferred-shape pointers to be >> used in OpenACC data clauses. While the spec states that arrays must be >> contiguous, I belie

[PATCH] allow deferred-shape pointers in OpenACC data clauses

2017-07-14 Thread Cesar Philippidis
there is one. Is this OK for trunk? I tested it on x86_64 with nvptx offloading. Cesar 2017-07-14 Cesar Philippidis gcc/fortran/ * openmp.c (check_array_not_assumed): Don't error on noncontiguous deferred-shape pointers. gcc/testsuite/ * gfortran.dg/goacc/deferred-shape-pointer.f90

[PATCH] update edge profile info in nvptx.c

2017-07-13 Thread Cesar Philippidis
on to the gang and worker reductions, I set the probability of the new edge introduced for the vector reduction to even. Is this OK for trunk? Cesar 2017-07-13 Cesar Philippidis gcc * config/nvptx/nvptx.c (nvptx_lockless_update): Update edge profiling information. (nvptx_lockfull_u

[PATCH] dynamically set default num_gangs in OpenACC

2017-07-05 Thread Cesar Philippidis
them into a single patch. Is this OK for trunk? Thanks, Cesar 2017-07-05 Cesar Philippidis libgomp/ * plugin/cuda/cuda.h (CUdevice_attribute): Add CU_DEVICE_ATTRIBUTE_MAX_SHARED_MEMORY_PER_MULTIPROCESSOR. (CUfunction_attribute): Add CU_FUNC_ATTRIBUTE_BINARY_VERSION. * plug

[PATCH] Fix PR77765

2017-06-29 Thread Cesar Philippidis
his patch OK for trunk and gcc7? Thanks, Cesar 2017-06-29 Cesar Philippidis PR fortran/77765 gcc/fortran/ * openmp.c (gfc_match_oacc_routine): Check if proc_name exists before comparing the routine name against it. gcc/testsuite/ * gfortran.dg/goacc/pr77765.f90: New test. diff --g

Re: [gomp4] OpenACC async re-work

2017-06-28 Thread Cesar Philippidis
On 06/27/2017 03:56 AM, Chung-Lin Tang wrote: > On 2017/6/27 6:45 AM, Cesar Philippidis wrote: >>> (1) Instead of essentially implementing the entire OpenACC async support >>> inside the plugin, we now use an opaque 'goacc_asyncqueue' implemented >>>

Re: [gomp4] OpenACC async re-work

2017-06-26 Thread Cesar Philippidis
I still need more time to review this, but ... On 06/24/2017 12:54 AM, Chung-Lin Tang wrote: > Hi Cesar, Thomas, > This patch is the re-implementation of OpenACC async we talked about. > The changes are rather large, so I am putting it here for a few days before > actually committing them to gomp-

[gomp4] Properly handle allocatable scalars in acc update.

2017-06-08 Thread Cesar Philippidis
. Now gfc_trans_omp_clauses_1 can handle the allocatable scalar update directly. I've applied this patch to gomp-4_0-branch. Cesar 2017-06-08 Cesar Philippidis gcc/fortran/ * gfortran.h gfc_omp_clauses: Add update_allocatable. * trans-openmp.c (gfc_trans_omp_clauses_1): Set update_allocatable bit

[gomp4] Update libgomp documentation

2017-05-16 Thread Cesar Philippidis
ly impacts libgomp. And I was tempted to just copy the complete texinfo.tex file from texinfo 6.3, like it was last done in 2012. However, I wasn't sure if that would have required any strange dependencies. I've applied this patch to gomp-4_0-branch. Cesar 2017-05-16 Cesar Philippidis

[gomp4] Handle parallel reductions explicitly initialized by the user

2017-05-12 Thread Cesar Philippidis
trunk? I'll hopefully start working on trunk again in a week or so. Cesar 2017-05-12 Cesar Philippidis gcc/ * config/nvptx/nvptx.c (nvptx_goacc_reduction_init): Don't update LHS if it doesn't exist. libgomp/ * testsuite/libgomp.oacc-c-c++-common/par-reduction-3.c: New test

[gomp4] correct an implicit data clause bug

2017-05-11 Thread Cesar Philippidis
which leads to bogus data clauses and failures at run time. I've applied this patch to gomp-4_0-branch to fix that. Cesar 2017-05-11 Cesar Philippidis gcc/ * gimplify.c (gomp_needs_data_present): Ensure that the correct decl is matched when scanning clauses in the enclosing acc data

[gomp4] OpenACC update if_present runtime support

2017-05-09 Thread Cesar Philippidis
've applied this patch to gomp-4_0-branch. Cesar 2017-05-09 Cesar Philippidis gcc/ * gimplify.c (gimplify_omp_target_update): Relax OpenACC update data mappings to GOMP_MAP_{TO,FROM} when the user specifies if_present. gcc/testsuite/ * c-c++-common/goacc/update-if_present-1.c: Update test
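For illustration, a minimal C sketch of how the if_present clause on the update directive is meant to be used. With if_present, the update is skipped for data that is not present on the device instead of being treated as an error. The functions are hypothetical: sync_to_host can be called both inside and outside a data region that maps the array. A host-only build (no -fopenacc) ignores all the pragmas.

```c
#include <assert.h>
#include <stddef.h>

/* Refresh the host copy of A[0:N].  Because of if_present, this is a
   no-op when A is not mapped on the device (e.g. when called outside
   any acc data region).  */
static void sync_to_host(double *a, size_t n)
{
#pragma acc update host(a[0:n]) if_present
}

/* Sum A on the host after making sure the host copy is current.  */
double checked_sum(double *a, size_t n)
{
  double s = 0.0;
  assert(a != NULL || n == 0);
  sync_to_host(a, n);
  for (size_t i = 0; i < n; i++)
    s += a[i];
  return s;
}

/* Scale A on the device, then sum it on the host.  Inside the data
   region A is present, so the update in sync_to_host really copies the
   device results back before they are summed.  */
double scale_and_sum(double *a, size_t n, double f)
{
  double s;
#pragma acc data copy(a[0:n])
  {
#pragma acc parallel loop present(a[0:n])
    for (size_t i = 0; i < n; i++)
      a[i] *= f;
    s = checked_sum(a, n);
  }
  return s;
}
```

Calling checked_sum directly on an unmapped array and calling it from inside scale_and_sum both work, which is the point of the clause: one helper does not need to know whether its caller has mapped the data.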

[gomp4] Add front end support for the if_present clause with the update directive

2017-05-04 Thread Cesar Philippidis
with goacc.exp are built with -fopenacc, but for some reason the tests in g++.dg/goacc/ are still run without -fopenacc for g++.dg/dg.exp. Maybe there's something wrong with g++.dg/goacc/goacc.exp's handling of .C files? This patch has been committed to gomp-4_0-branch. Cesar 2017-05-04 Cesar P

[gomp4] Don't mark OpenACC auto loops as independent inside acc parallel regions

2017-05-03 Thread Cesar Philippidis
n the loop accordingly. This patch, which I've applied to gomp-4_0-branch, makes GCC comply with this new behavior. Cesar 2017-05-03 Cesar Philippidis gcc/ * omp-low.c (lower_oacc_head_mark): Don't mark OpenACC auto loops as independent inside acc parallel regions. gcc/testsuite/
