https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99932
Tom de Vries changed:
What       |Removed |Added
Keywords   |        |testsuite-fail
Resolution |---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99932
--- Comment #15 from Tom de Vries ---
(In reply to Tom de Vries from comment #14)
> An observation when playing around with vector-length-128-4.c:
Another observation:
...
$L11:
ld.u64 %r108,[%r109];
st.u64 [%r112],%r108;
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99932
--- Comment #14 from Tom de Vries ---
An observation when playing around with vector-length-128-4.c: there are two
ways in which I can make the example pass.
1. add barrier.sync.aligned 0 or membar.cta after the first broadcast receive
2. unroll
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99932
--- Comment #13 from Tom de Vries ---
(In reply to Tom de Vries from comment #10)
> [ FTR, T400, driver 470.94 ]
>
> Interestingly, changing the default ptx version to 6.3 makes the minimal
> test-case pass, as well as the full parallel-dims.c
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99932
--- Comment #12 from Tom de Vries ---
Created attachment 52285
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52285&action=edit
Cuda reproducer non-32 vector length
[ On T400, driver version 470.94 ]
NVCC SASS:
...
$ ./do.sh
NVCC SASS,
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99932
--- Comment #11 from Tom de Vries ---
(In reply to Tom de Vries from comment #10)
> Rerunning the entire testsuite though shows that the non-32-vector-length
> test-cases are still failing.
Minimal example:
...
int
main (void)
{
#pragma acc
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99932
--- Comment #10 from Tom de Vries ---
[ FTR, T400, driver 470.94 ]
Interestingly, changing the default PTX version to 6.3 makes the minimal
test-case pass, as well as the full parallel-dims.c.
The only code changes are shfl -> shfl.sync and
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99932
--- Comment #9 from Tom de Vries ---
Created attachment 52273
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52273&action=edit
New cuda reproducer
$ ./do.sh
DRIVER SASS, ptxas=-O0:
+ /home/vries/cuda/11.4.3/bin/nvcc vector-max.cu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99932
--- Comment #8 from Tom de Vries ---
New minimal oacc example:
...
int
main (void)
{
  int vectors_max = -1;
#pragma acc parallel \
  num_gangs (1) num_workers (1) \
  copy (vectors_max)
  {
    for
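
The snippet above is cut off at the loop. Purely as an illustration, and not as a reconstruction of the actual test-case, the following sketch shows the general shape of a vector-loop max reduction with a non-32 vector length. The function name max_loop_index, the loop body, and the vector_length (128) clause are assumptions made for this sketch; built without -fopenacc, the pragmas are ignored and the loop simply runs sequentially on the host.

```c
/* Hypothetical sketch, NOT the test-case from this PR: a max reduction
   over a vector loop using a non-32 vector length.  */

int
max_loop_index (int n)
{
  int vectors_max = -1;
#pragma acc parallel \
  num_gangs (1) num_workers (1) vector_length (128) \
  copy (vectors_max)
  {
    /* Each vector lane handles some iterations; the reduction combines
       the per-lane maxima into a single value.  */
#pragma acc loop vector reduction (max: vectors_max)
    for (int i = 0; i < n; i++)
      if (i > vectors_max)
        vectors_max = i;
  }
  /* Largest loop index seen, or -1 if the loop did not iterate.  */
  return vectors_max;
}
```

Calling max_loop_index (128) should give 127 whether the loop runs on the host or is offloaded, so a mismatch on the device would point at the reduction or broadcast code rather than at the source.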
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99932
--- Comment #7 from Tom de Vries ---
(In reply to Tom de Vries from comment #6)
> (In reply to Tom de Vries from comment #5)
> > Filed https://developer.nvidia.com/nvidia_bug/3299227
>
> Nvidia reported it will be fixed in the next major CUDA
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99932
--- Comment #6 from Tom de Vries ---
(In reply to Tom de Vries from comment #5)
> Filed https://developer.nvidia.com/nvidia_bug/3299227
Nvidia reported that it will be fixed in the next major CUDA release. I've asked
about driver fixes.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99932
--- Comment #5 from Tom de Vries ---
Filed https://developer.nvidia.com/nvidia_bug/3299227
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99932
--- Comment #4 from Tom de Vries ---
Created attachment 50662
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50662&action=edit
Updated cuda reproducer
Slimmed down further, eliminated gang/worker reduction parts.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99932
--- Comment #3 from Tom de Vries ---
Created attachment 50660
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50660&action=edit
Cuda reproducer
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99932
--- Comment #2 from Tom de Vries ---
Minimal example:
...
$ cat libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-dims.c
int
main (void)
{
  int vectors_max = -1;
#pragma acc parallel \
  num_gangs (1) \
  num_workers (1) \
  vector_length
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99932
--- Comment #1 from Tom de Vries ---
(In reply to Thomas Schwinge from comment #0)
> We're seeing OpenACC/nvptx offloading execution regressions (including a lot
> of timeouts) starting with CUDA 11.2-era Nvidia Driver 460.27.04. Confirmed
>