[Bug target/122783] gcc 15,16 refuses to compile correctly with nvptx target blackwell cards (sm_120) CUDA_ERROR_INVALID_CONTEXT (error 201), CUDA_ERROR_NOT_FOUND (error 500)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122783

Sam James changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|MOVED                       |WORKSFORME
[Bug target/122783] gcc 15,16 refuses to compile correctly with nvptx target blackwell cards (sm_120) CUDA_ERROR_INVALID_CONTEXT (error 201), CUDA_ERROR_NOT_FOUND (error 500)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122783

Benjamin Schulz changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |MOVED

--- Comment #5 from Benjamin Schulz ---
Hi,

the CUDA sanitizer errors seem to be just GCC probing for the available CUDA runtimes.

I have now tested the miscompilation in the matrix multiplication on an old GTX 1660 Ti Super from 2018. The miscompilation appears there too, so it has nothing to do with my new Blackwell RTX 5060 card.

I also made a better reproducer in which one can see what goes wrong in that matrix multiplication. The problem is connected to my struct using templates. It does not happen if the type of the data is written directly in the class, without templates, and it does not happen with -O1.

So I am closing this bug and moving it to a new bug where the problem is described more clearly:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=123272

The new bug has a more fitting reproducer, which should be easier to assess for people who know assembler.
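For readers without the new bug at hand, the general shape of code described in comment #5 (a templated struct holding the data, used in an offloaded matrix multiplication) might look roughly like the sketch below. This is only an illustration under assumptions: it is not the reproducer attached to bug 123272, and the names MatrixView and matmul are hypothetical. Without -fopenmp the pragma is ignored and the loops run on the host, which gives a reference result to compare an offloaded -O1 or -O2 build against.

```cpp
// Hypothetical sketch, NOT the reproducer from bug 123272.
#include <cassert>
#include <cstddef>
#include <vector>

// Templated, non-owning view of a row-major matrix.
// The element type is the template parameter T (assumed layout).
template <typename T>
struct MatrixView {
    T* data;
    std::size_t rows;
    std::size_t cols;
};

// Computes c = a * b. When built with -fopenmp and an offload target,
// the collapsed i/j loops are offloaded; otherwise they run on the host.
template <typename T>
void matmul(const MatrixView<T>& a, const MatrixView<T>& b, MatrixView<T>& c) {
    assert(a.cols == b.rows && c.rows == a.rows && c.cols == b.cols);
    const T* pa = a.data;
    const T* pb = b.data;
    T* pc = c.data;
    const std::size_t n = a.rows, k = a.cols, m = b.cols;

    #pragma omp target teams distribute parallel for collapse(2) \
        map(to: pa[0:n * k], pb[0:k * m]) map(from: pc[0:n * m])
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < m; ++j) {
            T sum = T(0);
            for (std::size_t l = 0; l < k; ++l)
                sum += pa[i * k + l] * pb[l * m + j];
            pc[i * m + j] = sum;
        }
}
```

A caller would wrap three std::vector<double> buffers in MatrixView<double> objects, call matmul, and compare the output of a host-only build with that of an offloaded build at different optimization levels.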
[Bug target/122783] gcc 15,16 refuses to compile correctly with nvptx target blackwell cards (sm_120) CUDA_ERROR_INVALID_CONTEXT (error 201), CUDA_ERROR_NOT_FOUND (error 500)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122783

--- Comment #4 from Benjamin Schulz ---
I also want to note that this happens with a recent NVIDIA driver, 580.105.08 : 0/580, and CUDA 13 (with nvidia-cuda-toolkit-13.02), for which Sam made a GCC patch that drops sm_50 and allows compilation with cuda-13:

https://bugs.gentoo.org/965845

So the wrong code generation does not depend on the nvidia-drivers or the cuda-toolkit version. And since clang compiles even rather complicated OpenMP code correctly, I guess one can exclude a malfunction of my card.

I still have my old GPU. I would be willing to lend my RTX 5060 Ti to a developer for a weekend if that helps in providing a fix. But it will probably take longer than that to investigate the source of these problems.

For clang and NVIDIA, I found this document on Blackwell intrinsics. Perhaps it could help a bit, since clang compiles correct CUDA code for my system:

https://llvm.org/devmtg/2025-04/slides/technical_talk/ozen_blackwell.pdf

Clang compiles fine unless I use the message passing interface OpenMPI with offloading. In that case clang shows memory errors with CUDA even if I do not actually use any OpenMP code; just configuring the clang offload compiler suffices. The OpenMPI developers say that this is a clang problem, apparently occurring when it initializes the runtime:

https://github.com/open-mpi/ompi/issues/13431#issuecomment-3558265950

So one cannot copy the entire approach of clang into GCC, but perhaps one could adapt some of its (working) Blackwell support somehow.
