Hi! On 2018-02-22T12:23:25+0100, Tom de Vries <tom_devr...@mentor.com> wrote: > when using cuda 9 nvprof with an openacc executable, the executable hangs. > > The scenario resulting in the hang is as follows: > 1. goacc_lazy_initialize calls gomp_mutex_lock (&acc_device_lock) > 2. goacc_lazy_initialize calls acc_init_1 > 3. acc_init_1 calls goacc_profiling_dispatch (&prof_info, > &device_init_event_info, &api_info); > 4. goacc_profiling_dispatch calls the registered callback in the cuda > profiling library > 5. the registered call back calls acc_get_device_type > 6. acc_get_device_type calls gomp_mutex_lock (&acc_device_lock) > 7. The lock is not recursive, so we have deadlock > > The registered callback in cuda 8 does not call acc_get_device_type, so > the hang doesn't occur there.
(ACK for the general problem description/analysis.) > This patch fixes the hang by detecting in acc_get_device_type that the > calling thread is a thread that is currently initializing the openacc > part of the libgomp library, and returning acc_device_none, which is a > legal value given that the openacc standard states "If the device type > has not yet been selected, the value acc_device_none may be returned". (This specific way of resolving the issue I still have to look into. This may need a more general solution, to make all such libgomp OpenACC entry points re-entrant.) > Committed to og7 branch. What Frederik has discovered today in the hard way... is that the og9 version of this patch did get its code altered in a way so that it no longer resolves the problem it's meant to resolve -- the hang was back. On Git-mirror-based openacc-gcc-9-branch that's: commit 84af3c5a2fbb5023057e2ca319b0c22f5f7d4795 Author: Julian Brown <jul...@codesourcery.com> AuthorDate: Tue Feb 26 16:00:54 2019 -0800 Commit: Kwok Cheung Yeung <k...@codesourcery.com> CommitDate: Fri May 31 13:40:07 2019 -0700 Fix hang when running oacc exec with CUDA 9.0 nvprof 2018-09-20 Tom de Vries <tdevr...@suse.de> Cesar Philippidis <ce...@codesourcery.com> libgomp/ [...] ..., which got cherry-picked (automated, without any review) into current devel/omp/gcc-9 in commit f752d880a5abc591a25ad22fb892363f6520bcf1. Of course, it would've helped tremendously had the original og7 commit included a test case... :'-/ (... by simply reproducing the nested calls that CUDA 9 nvprof seems to be doing.) Still without a test case, for now I have pushed the attached patch to devel/omp/gcc-9 in commit 9ae129017c7fc1fa638d6beedd3802b515ca692b 'Fix og9 "Fix hang when running oacc exec with CUDA 9.0 nvprof"'. Grüße Thomas ----------------- Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander Walter
>From 9ae129017c7fc1fa638d6beedd3802b515ca692b Mon Sep 17 00:00:00 2001 From: Thomas Schwinge <tho...@codesourcery.com> Date: Wed, 25 Mar 2020 17:57:02 +0100 Subject: [PATCH] Fix og9 "Fix hang when running oacc exec with CUDA 9.0 nvprof" Compared to the original og7 version, and still-good og8 version, the og9 version of this patch did get its code altered in a way so that it no longer resolves the problem it's meant to resolve -- the hang was back. libgomp/ * oacc-init.c (acc_init_1): Move 'acc_init_state' logic to where it belongs. --- libgomp/ChangeLog.omp | 5 +++++ libgomp/oacc-init.c | 10 +++++----- 2 files changed, 10 insertions(+), 5 deletions(-) diff --git a/libgomp/ChangeLog.omp b/libgomp/ChangeLog.omp index 88957864a69..75c45917998 100644 --- a/libgomp/ChangeLog.omp +++ b/libgomp/ChangeLog.omp @@ -1,3 +1,8 @@ +2020-03-25 Thomas Schwinge <tho...@codesourcery.com> + + * oacc-init.c (acc_init_1): Move 'acc_init_state' logic to where + it belongs. + 2019-11-22 Kwok Cheung Yeung <k...@codesourcery.com> * testsuite/libgomp.oacc-fortran/lib-16.f90: Fix async-safety issue. diff --git a/libgomp/oacc-init.c b/libgomp/oacc-init.c index beeeb48c106..765fa2f3b95 100644 --- a/libgomp/oacc-init.c +++ b/libgomp/oacc-init.c @@ -231,6 +231,11 @@ acc_dev_num_out_of_range (acc_device_t d, int ord, int ndevs) static struct gomp_device_descr * acc_init_1 (acc_device_t d, acc_construct_t parent_construct, int implicit) { + gomp_mutex_lock (&acc_init_state_lock); + acc_init_state = initializing; + acc_init_thread = pthread_self (); + gomp_mutex_unlock (&acc_init_state_lock); + bool check_not_nested_p; if (implicit) { @@ -293,11 +298,6 @@ acc_init_1 (acc_device_t d, acc_construct_t parent_construct, int implicit) struct gomp_device_descr *base_dev, *acc_dev; int ndevs; - gomp_mutex_lock (&acc_init_state_lock); - acc_init_state = initializing; - acc_init_thread = pthread_self (); - gomp_mutex_unlock (&acc_init_state_lock); - base_dev = resolve_device (d, true); ndevs = base_dev->get_num_devices_func (); -- 2.17.1