https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88706

--- Comment #1 from Tom de Vries <vries at gcc dot gnu.org> ---
(In reply to Tom de Vries from comment #0)
> I think the same problem exists for the other work around in
> nvptx_adjust_parallelism, this one:
> ...
>   /* FIXME: This is overly conservative; worker and vector loop will
>      eventually be combined.  */
>   if (wv)
>     return inner_mask & ~GOMP_DIM_MASK (GOMP_DIM_WORKER);
> ...
> It's just harder to spot because the workaround doesn't affect vector length.

Confirmed.
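The quoted workaround does nothing more than clear the worker bit from inner_mask. The minimal C sketch below shows that mask arithmetic in isolation.

It rests on a few assumptions:
- The GOMP_DIM_* definitions are assumed to mirror include/gomp-constants.h.
- The function name wv_workaround_sketch is hypothetical.
- So is the way wv is computed here (both the worker and vector bits set in inner_mask).
- The real nvptx_adjust_parallelism does more than this, and its other paths are not reproduced.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed to mirror the dimension constants in include/gomp-constants.h.  */
#define GOMP_DIM_GANG    0
#define GOMP_DIM_WORKER  1
#define GOMP_DIM_VECTOR  2
#define GOMP_DIM_MAX     3
#define GOMP_DIM_MASK(X) (1u << (X))

/* Hypothetical stand-in for the "if (wv)" workaround branch only.
   Assumption: wv means inner_mask has both worker and vector bits set.
   Any other mask is returned unchanged here; the real function's
   non-workaround handling is not modelled.  */
static unsigned
wv_workaround_sketch (unsigned inner_mask)
{
  /* Masks only use the low GOMP_DIM_MAX bits (gang, worker, vector).  */
  assert ((inner_mask >> GOMP_DIM_MAX) == 0);

  bool wv = (inner_mask & GOMP_DIM_MASK (GOMP_DIM_WORKER))
            && (inner_mask & GOMP_DIM_MASK (GOMP_DIM_VECTOR));

  if (wv)
    /* Same expression as the workaround: drop the worker bit.  */
    return inner_mask & ~GOMP_DIM_MASK (GOMP_DIM_WORKER);

  return inner_mask;
}
```

For example, an inner_mask of worker|vector (0x6) comes back as vector only (0x4). Gang|worker|vector (0x7) comes back as gang|vector (0x5).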

With this additional patch:
...
@@ -5695,7 +5696,10 @@ nvptx_adjust_parallelism (unsigned inner_mask, unsigned outer_mask)
   /* FIXME: This is overly conservative; worker and vector loop will
      eventually be combined.  */
   if (wv)
-    return inner_mask & ~GOMP_DIM_MASK (GOMP_DIM_WORKER);
+    {
+      fprintf (stderr, "worker-vector loop workaround applied in %s\n",
+               current_function_name ());
+      return inner_mask & ~GOMP_DIM_MASK (GOMP_DIM_WORKER);
+    }

   /* It's difficult to guarantee that warps in large vector_lengths
      will remain convergent when a vector loop is nested inside a
...

we see, for the first case (vector_length set on the parallel directive, no
-fopenacc-dim=):
...
oa.vector_length in nvptx_adjust_parallelism: 128
oa.vector_length in nvptx_adjust_parallelism: 128
oa.vector_length in nvptx_adjust_parallelism: 128
oa.vector_length in nvptx_adjust_parallelism: 128
oa.vector_length in nvptx_adjust_parallelism: 128
oa.vector_length in nvptx_adjust_parallelism: 128
worker-vector loop workaround applied in test2._omp_fn.1
oa.vector_length in nvptx_adjust_parallelism: 128
oa.vector_length in nvptx_adjust_parallelism: 128
oa.vector_length in nvptx_adjust_parallelism: 128
oa.vector_length in nvptx_adjust_parallelism: 32
oa.vector_length in nvptx_adjust_parallelism: 32
oa.vector_length in nvptx_adjust_parallelism: 32
oa.vector_length in nvptx_adjust_parallelism: 32
oa.vector_length in nvptx_adjust_parallelism: 32
oa.vector_length in nvptx_adjust_parallelism: 32
...

and, for the second case (no vector_length set on the parallel directive, using
-fopenacc-dim=):
...
oa.vector_length in nvptx_adjust_parallelism: 32
oa.vector_length in nvptx_adjust_parallelism: 32
oa.vector_length in nvptx_adjust_parallelism: 32
oa.vector_length in nvptx_adjust_parallelism: 32
oa.vector_length in nvptx_adjust_parallelism: 32
oa.vector_length in nvptx_adjust_parallelism: 32
oa.vector_length in nvptx_adjust_parallelism: 32
oa.vector_length in nvptx_adjust_parallelism: 32
oa.vector_length in nvptx_adjust_parallelism: 32
oa.vector_length in nvptx_adjust_parallelism: 32
oa.vector_length in nvptx_adjust_parallelism: 32
oa.vector_length in nvptx_adjust_parallelism: 32
oa.vector_length in nvptx_adjust_parallelism: 32
oa.vector_length in nvptx_adjust_parallelism: 32
oa.vector_length in nvptx_adjust_parallelism: 32
...
