[Bug target/122783] gcc 15,16 refuses to compile correctly with nvptx target blackwell cards (sm_120) CUDA_ERROR_INVALID_CONTEXT (error 201), CUDA_ERROR_NOT_FOUND (error 500)

2026-01-19 Thread sjames at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122783

Sam James  changed:

   What|Removed |Added

 Resolution|MOVED   |WORKSFORME

[Bug target/122783] gcc 15,16 refuses to compile correctly with nvptx target blackwell cards (sm_120) CUDA_ERROR_INVALID_CONTEXT (error 201), CUDA_ERROR_NOT_FOUND (error 500)

2025-12-22 Thread schulz.benjamin at googlemail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122783

Benjamin Schulz  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |MOVED

--- Comment #5 from Benjamin Schulz  ---

Hi, the CUDA sanitizer errors seem to be just GCC probing for the available CUDA
runtimes.
I have now tried out the miscompiled matrix multiplication with an old
GTX 1660 Ti Super from 2018. The miscompilation appears there too, so it
has nothing to do with my new Blackwell RTX 5060 card.

I also made a better reproducer where one can see what goes wrong in the
matrix multiplication. The problem is connected to my struct's use of templates.

It does not happen if the data type is written directly into the class,
without templates, and it does not happen with -O1.

So I am closing this bug and moving the issue to a new bug where the problem
is described more clearly:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=123272

with a more fitting reproducer, which should be easier to assess for people
with knowledge of assembly.

[Bug target/122783] gcc 15,16 refuses to compile correctly with nvptx target blackwell cards (sm_120) CUDA_ERROR_INVALID_CONTEXT (error 201), CUDA_ERROR_NOT_FOUND (error 500)

2025-12-07 Thread schulz.benjamin at googlemail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122783

--- Comment #4 from Benjamin Schulz  ---
I also want to note that this happens with a recent NVIDIA driver
(580.105.08:0/580) and CUDA 13 (with nvidia-cuda-toolkit-13.02), where Sam
made a GCC patch that drops sm_50 (https://bugs.gentoo.org/965845) and allows
compilation with CUDA 13.

So the wrong code generation does not depend on the nvidia-drivers or
cuda-toolkit versions.

And since clang compiles even rather complicated OpenMP code correctly, I guess
a malfunction of my card can be excluded.

I still have my old GPU, so I would be willing to lend my RTX 5060 Ti to a
developer for a weekend if that would help to provide a fix. But the
investigation will probably take longer than that.

For clang and NVIDIA, I found this document on Blackwell intrinsics. Perhaps
it could help a bit, since clang compiles correct CUDA code for my system:

https://llvm.org/devmtg/2025-04/slides/technical_talk/ozen_blackwell.pdf

Clang compiles fine, unless I use the message-passing interface Open MPI
together with offloading. Then clang shows CUDA memory errors even if I do not
actually use any OpenMP code; just configuring the clang offload compiler
suffices. The Open MPI developers say this is a clang problem, apparently in
how it initializes the runtime:
https://github.com/open-mpi/ompi/issues/13431#issuecomment-3558265950

So one cannot copy clang's entire approach into GCC, but perhaps some of its
(working) Blackwell support can be adapted.