Hi!

On 2018-02-22T12:23:25+0100, Tom de Vries <tom_devr...@mentor.com> wrote:
> when using cuda 9 nvprof with an openacc executable, the executable hangs.
>
> The scenario resulting in the hang is as follows:
> 1. goacc_lazy_initialize calls gomp_mutex_lock (&acc_device_lock)
> 2. goacc_lazy_initialize calls acc_init_1
> 3. acc_init_1 calls goacc_profiling_dispatch (&prof_info,
>     &device_init_event_info, &api_info);
> 4. goacc_profiling_dispatch calls the registered callback in the cuda
>     profiling library
> 5. the registered callback calls acc_get_device_type
> 6. acc_get_device_type calls gomp_mutex_lock (&acc_device_lock)
> 7. The lock is not recursive, so we have deadlock
>
> The registered callback in cuda 8 does not call acc_get_device_type, so
> the hang doesn't occur there.

(ACK for the general problem description/analysis.)

> This patch fixes the hang by detecting in acc_get_device_type that the
> calling thread is a thread that is currently initializing the openacc
> part of the libgomp library, and returning acc_device_none, which is a
> legal value given that the openacc standard states "If the device type
> has not yet been selected, the value acc_device_none may be returned".

(This specific way of resolving the issue I still have to look into.
This may need a more general solution, to make all such libgomp OpenACC
entry points re-entrant.)

> Committed to og7 branch.

What Frederik has discovered today the hard way... is that the og9
version of this patch had its code altered in such a way that it no
longer resolves the problem it's meant to resolve -- the hang was back.
On Git-mirror-based openacc-gcc-9-branch that's:

    commit 84af3c5a2fbb5023057e2ca319b0c22f5f7d4795
    Author:     Julian Brown <jul...@codesourcery.com>
    AuthorDate: Tue Feb 26 16:00:54 2019 -0800
    Commit:     Kwok Cheung Yeung <k...@codesourcery.com>
    CommitDate: Fri May 31 13:40:07 2019 -0700

        Fix hang when running oacc exec with CUDA 9.0 nvprof

        2018-09-20  Tom de Vries  <tdevr...@suse.de>
                    Cesar Philippidis  <ce...@codesourcery.com>

                libgomp/
                [...]

..., which got cherry-picked (automated, without any review) into current
devel/omp/gcc-9 in commit f752d880a5abc591a25ad22fb892363f6520bcf1.

Of course, it would've helped tremendously had the original og7 commit
included a test case...  :'-/ (... by simply reproducing the nested calls
that CUDA 9 nvprof seems to be doing.)

Still without a test case, for now I have pushed the attached patch to
devel/omp/gcc-9 in commit 9ae129017c7fc1fa638d6beedd3802b515ca692b 'Fix
og9 "Fix hang when running oacc exec with CUDA 9.0 nvprof"'.


Grüße
 Thomas


-----------------
Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander Walter
From 9ae129017c7fc1fa638d6beedd3802b515ca692b Mon Sep 17 00:00:00 2001
From: Thomas Schwinge <tho...@codesourcery.com>
Date: Wed, 25 Mar 2020 17:57:02 +0100
Subject: [PATCH] Fix og9 "Fix hang when running oacc exec with CUDA 9.0
 nvprof"

Compared to the original og7 version, and still-good og8 version, the og9
version of this patch did get its code altered in a way so that it no longer
resolves the problem it's meant to resolve -- the hang was back.

	libgomp/
	* oacc-init.c (acc_init_1): Move 'acc_init_state' logic to where
	it belongs.
---
 libgomp/ChangeLog.omp |  5 +++++
 libgomp/oacc-init.c   | 10 +++++-----
 2 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/libgomp/ChangeLog.omp b/libgomp/ChangeLog.omp
index 88957864a69..75c45917998 100644
--- a/libgomp/ChangeLog.omp
+++ b/libgomp/ChangeLog.omp
@@ -1,3 +1,8 @@
+2020-03-25  Thomas Schwinge <tho...@codesourcery.com>
+
+	* oacc-init.c (acc_init_1): Move 'acc_init_state' logic to where
+	it belongs.
+
 2019-11-22  Kwok Cheung Yeung  <k...@codesourcery.com>
 
 	* testsuite/libgomp.oacc-fortran/lib-16.f90: Fix async-safety issue.
diff --git a/libgomp/oacc-init.c b/libgomp/oacc-init.c
index beeeb48c106..765fa2f3b95 100644
--- a/libgomp/oacc-init.c
+++ b/libgomp/oacc-init.c
@@ -231,6 +231,11 @@ acc_dev_num_out_of_range (acc_device_t d, int ord, int ndevs)
 static struct gomp_device_descr *
 acc_init_1 (acc_device_t d, acc_construct_t parent_construct, int implicit)
 {
+  gomp_mutex_lock (&acc_init_state_lock);
+  acc_init_state = initializing;
+  acc_init_thread = pthread_self ();
+  gomp_mutex_unlock (&acc_init_state_lock);
+
   bool check_not_nested_p;
   if (implicit)
     {
@@ -293,11 +298,6 @@ acc_init_1 (acc_device_t d, acc_construct_t parent_construct, int implicit)
   struct gomp_device_descr *base_dev, *acc_dev;
   int ndevs;
 
-  gomp_mutex_lock (&acc_init_state_lock);
-  acc_init_state = initializing;
-  acc_init_thread = pthread_self ();
-  gomp_mutex_unlock (&acc_init_state_lock);
-
   base_dev = resolve_device (d, true);
 
   ndevs = base_dev->get_num_devices_func ();
-- 
2.17.1
