I've committed this to the devel/omp/gcc-12 branch.

The patch fixes a concurrency issue where the spin-locks didn't work well if many GPU threads tried to free low-latency memory all at once.

Adding a short sleep instruction is enough for the hardware thread to yield and allow another to proceed. The alloc routine already had this feature, so this just corrects an accidental omission.
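
To illustrate the idea, here is a minimal host-side sketch of a spin loop that yields on each failed attempt. It is not the libgomp code: the names sketch_lock, sketch_acquire, sketch_release, sketch_locked_increment and SKETCH_YIELD are hypothetical, and sched_yield() merely stands in for whatever BASIC_ALLOC_YIELD expands to on the GPU.

```c
#include <sched.h>
#include <stdatomic.h>

/* Hypothetical stand-in for BASIC_ALLOC_YIELD.  On a host CPU we simply
   ask the scheduler to run another thread; the real macro is assumed to
   expand to a target-specific sleep instruction instead.  */
#define SKETCH_YIELD sched_yield ()

static atomic_int sketch_lock = 0;   /* 0 = free, 1 = held.  */
static int sketch_counter = 0;       /* Data protected by sketch_lock.  */

/* Acquire the lock, yielding after every failed attempt so that the
   current holder gets a chance to make progress and release it.  */
static void
sketch_acquire (void)
{
  do
    {
      int expected = 0;
      if (atomic_compare_exchange_weak_explicit (&sketch_lock, &expected, 1,
                                                 memory_order_acquire,
                                                 memory_order_relaxed))
        break;
      /* Spin.  */
      SKETCH_YIELD;
    }
  while (1);
}

/* Release the lock so that another waiter can take it.  */
static void
sketch_release (void)
{
  atomic_store_explicit (&sketch_lock, 0, memory_order_release);
}

/* Example critical section: bump the counter under the lock and return
   the new value.  */
static int
sketch_locked_increment (void)
{
  sketch_acquire ();
  int value = ++sketch_counter;
  sketch_release ();
  return value;
}
```

Without the yield, a waiter that keeps retrying can starve the thread that holds the lock; yielding in the failure path is what lets the holder finish and release it.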

This patch will get folded into the previous OG12 patch series when I repost it for mainline.

Andrew

amdgcn, openmp: Fix concurrency in low-latency allocator

The previous code works fine on Fiji and Vega 10 devices, but bogs down in the
spin locks on Vega 20 or newer.  Adding the sleep instructions fixes the
problem.

libgomp/ChangeLog:

        * basic-allocator.c (basic_alloc_free): Use BASIC_ALLOC_YIELD.
        (basic_alloc_realloc): Use BASIC_ALLOC_YIELD.

diff --git a/libgomp/basic-allocator.c b/libgomp/basic-allocator.c
index b4b9e4ba13a..a61828e48a0 100644
--- a/libgomp/basic-allocator.c
+++ b/libgomp/basic-allocator.c
@@ -188,6 +188,7 @@ basic_alloc_free (char *heap, void *addr, size_t size)
          break;
        }
       /* Spin.  */
+      BASIC_ALLOC_YIELD;
     }
   while (1);
 
@@ -267,6 +268,7 @@ basic_alloc_realloc (char *heap, void *addr, size_t oldsize,
          break;
        }
       /* Spin.  */
+      BASIC_ALLOC_YIELD;
     }
   while (1);
 
