[Bug target/99932] OpenACC/nvptx offloading execution regressions starting with CUDA 11.2-era Nvidia Driver 460.27.04

2022-02-02 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99932

Tom de Vries changed:

What|Removed|Added
Keywords||testsuite-fail
Resolution|---|

[Bug target/99932] OpenACC/nvptx offloading execution regressions starting with CUDA 11.2-era Nvidia Driver 460.27.04

2022-01-26 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99932 --- Comment #15 from Tom de Vries --- (In reply to Tom de Vries from comment #14) > An observation when playing around with vector-length-128-4.c: Another observation: ... $L11: ld.u64 %r108,[%r109]; st.u64 [%r112],%r108;

[Bug target/99932] OpenACC/nvptx offloading execution regressions starting with CUDA 11.2-era Nvidia Driver 460.27.04

2022-01-26 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99932 --- Comment #14 from Tom de Vries --- An observation when playing around with vector-length-128-4.c: there are two ways in which I can make the example pass. 1. add barrier.sync.aligned 0 or membar.cta after first broad-cast receive 2. unroll

[Bug target/99932] OpenACC/nvptx offloading execution regressions starting with CUDA 11.2-era Nvidia Driver 460.27.04

2022-01-25 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99932 --- Comment #13 from Tom de Vries --- (In reply to Tom de Vries from comment #10) > [ FTR, T400, driver 470.94 ] > > Interestingly, changing the default ptx version to 6.3 makes the minimal > test-case pass, as well as the full parallel-dims.c

[Bug target/99932] OpenACC/nvptx offloading execution regressions starting with CUDA 11.2-era Nvidia Driver 460.27.04

2022-01-25 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99932 --- Comment #12 from Tom de Vries --- Created attachment 52285 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52285&action=edit Cuda reproducer non-32 vector length [ On T400, driver version 470.94 ] NVCC SASS: ... $ ./do.sh NVCC SASS,

[Bug target/99932] OpenACC/nvptx offloading execution regressions starting with CUDA 11.2-era Nvidia Driver 460.27.04

2022-01-25 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99932 --- Comment #11 from Tom de Vries --- (In reply to Tom de Vries from comment #10) > Rerunning the entire testsuite though shows that the non-32-vector-length > test-cases are still failing. Minimal example: ... int main (void) { #pragma acc

[Bug target/99932] OpenACC/nvptx offloading execution regressions starting with CUDA 11.2-era Nvidia Driver 460.27.04

2022-01-24 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99932 --- Comment #10 from Tom de Vries --- [ FTR, T400, driver 470.94 ] Interestingly, changing the default ptx version to 6.3 makes the minimal test-case pass, as well as the full parallel-dims.c. The only code changes are shfl -> shfl.sync and

[Bug target/99932] OpenACC/nvptx offloading execution regressions starting with CUDA 11.2-era Nvidia Driver 460.27.04

2022-01-24 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99932 --- Comment #9 from Tom de Vries --- Created attachment 52273 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52273&action=edit New cuda reproducer $ ./do.sh DRIVER SASS, ptxas=-O0: + /home/vries/cuda/11.4.3/bin/nvcc vector-max.cu

[Bug target/99932] OpenACC/nvptx offloading execution regressions starting with CUDA 11.2-era Nvidia Driver 460.27.04

2022-01-24 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99932 --- Comment #8 from Tom de Vries --- New minimal oacc example: ... int main (void) { int vectors_max = -1; #pragma acc parallel\ num_gangs (1) num_workers (1) \ copy (vectors_max) { for
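
[ Illustration only, not the truncated test case from comment #8. The sketch below shows the general shape of a single-gang, single-worker OpenACC region that reduces a maximum over a vector loop. The function name vector_index_max, its parameter n, and the loop body are assumptions made for this example; the actual reproducer is in the bug report. Built without -fopenacc the pragmas are ignored and the loop runs serially on the host. ]

```c
/* Hypothetical helper: returns the largest iteration index seen by a
   vector-partitioned loop of n iterations, or -1 when n <= 0.
   With OpenACC enabled, the loop is offloaded with one gang and one
   worker; vector lanes combine their results through a max reduction.  */
int
vector_index_max (int n)
{
  int vectors_max = -1;

#pragma acc parallel num_gangs (1) num_workers (1) copy (vectors_max)
  {
#pragma acc loop vector reduction (max: vectors_max)
    for (int i = 0; i < n; i++)
      if (i > vectors_max)
        vectors_max = i;
  }

  return vectors_max;
}
```

Calling vector_index_max (32) on the host should yield 31; a different result from an offloaded build would point at the kind of vector-reduction miscompare discussed in this thread.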

[Bug target/99932] OpenACC/nvptx offloading execution regressions starting with CUDA 11.2-era Nvidia Driver 460.27.04

2021-12-09 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99932 --- Comment #7 from Tom de Vries --- (In reply to Tom de Vries from comment #6) > (In reply to Tom de Vries from comment #5) > > Filed https://developer.nvidia.com/nvidia_bug/3299227 > > Nvidia reported it will be fixed in the next major cuda

[Bug target/99932] OpenACC/nvptx offloading execution regressions starting with CUDA 11.2-era Nvidia Driver 460.27.04

2021-04-27 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99932 --- Comment #6 from Tom de Vries --- (In reply to Tom de Vries from comment #5) > Filed https://developer.nvidia.com/nvidia_bug/3299227 Nvidia reported it will be fixed in the next major cuda release. I've asked about driver fixes.

[Bug target/99932] OpenACC/nvptx offloading execution regressions starting with CUDA 11.2-era Nvidia Driver 460.27.04

2021-04-24 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99932 --- Comment #5 from Tom de Vries --- Filed https://developer.nvidia.com/nvidia_bug/3299227

[Bug target/99932] OpenACC/nvptx offloading execution regressions starting with CUDA 11.2-era Nvidia Driver 460.27.04

2021-04-23 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99932 --- Comment #4 from Tom de Vries --- Created attachment 50662 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50662&action=edit Updated cuda reproducer Slimmed down further, eliminated gang/worker reduction parts.

[Bug target/99932] OpenACC/nvptx offloading execution regressions starting with CUDA 11.2-era Nvidia Driver 460.27.04

2021-04-23 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99932 --- Comment #3 from Tom de Vries --- Created attachment 50660 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50660&action=edit Cuda reproducer

[Bug target/99932] OpenACC/nvptx offloading execution regressions starting with CUDA 11.2-era Nvidia Driver 460.27.04

2021-04-23 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99932 --- Comment #2 from Tom de Vries --- Minimal example: ... $ cat libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-dims.c int main (void) { int vectors_max = -1; #pragma acc parallel \ num_gangs (1) \ num_workers (1) \ vector_length

[Bug target/99932] OpenACC/nvptx offloading execution regressions starting with CUDA 11.2-era Nvidia Driver 460.27.04

2021-04-22 Thread vries at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99932 --- Comment #1 from Tom de Vries --- (In reply to Thomas Schwinge from comment #0) > We're seeing OpenACC/nvptx offloading execution regressions (including a lot > of timeouts) starting with CUDA 11.2-era Nvidia Driver 460.27.04. Confirmed >