DickJC123 commented on a change in pull request #15551: Bypass cuda/cudnn checks if no driver.
URL: https://github.com/apache/incubator-mxnet/pull/15551#discussion_r304199281
 
 

 ##########
 File path: src/common/cuda_utils.cc
 ##########
 @@ -44,8 +44,15 @@ namespace cuda {
 // Dynamic init here will emit a warning if runtime and compile-time cuda lib versions mismatch.
 // Also if the user has recompiled their source to a version no longer tested by upstream CI.
 bool cuda_version_check_performed = []() {
-  // Don't bother with checks if there are no GPUs visible (e.g. with CUDA_VISIBLE_DEVICES="")
-  if (dmlc::GetEnv("MXNET_CUDA_VERSION_CHECKING", true) && Context::GetGPUCount() > 0) {
+  // MXNet might be built on a machine with a cuda toolkit, but no GPUs or GPU driver.
+  // To allow that machine to execute say: python -c 'import mxnet; print(mxnet.__version__)',
+  // we won't perform a check if there is no driver.  Any actual attempt to use the cuda API's
+  // will yield the desired message: CUDA driver version is insufficient for CUDA runtime version.
+  int cuda_driver_version = 0;
+  CUDA_CALL(cudaDriverGetVersion(&cuda_driver_version));
+  // Also, don't bother with checks if there are no GPUs visible (e.g. with CUDA_VISIBLE_DEVICES="")
+  if (dmlc::GetEnv("MXNET_CUDA_VERSION_CHECKING", true) && cuda_driver_version > 0
+                                                        && Context::GetGPUCount() > 0) {
 
 Review comment:
   Per your suggestion, I have reworked the PR and now have GetGPUCount return 
0 if cuda_driver_version == 0.
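
   To make the intended behavior concrete, here is a minimal sketch of a GPU
count that short-circuits to 0 when no driver is present. It is not the code in
the PR: the namespace, the `CudaProbe` struct, and its two callbacks are
hypothetical stand-ins for the real CUDA queries, so the example builds on a
machine without the toolkit.

```cpp
#include <functional>

namespace gpu_count_sketch {

// Hypothetical stand-ins for the real CUDA queries (assumption: in a real
// build these would wrap cudaDriverGetVersion and cudaGetDeviceCount).
struct CudaProbe {
  std::function<int()> driver_version;  // 0 is taken to mean "no driver"
  std::function<int()> device_count;    // number of visible GPUs
};

// Report 0 GPUs when there is no driver, without querying devices at all.
inline int GetGPUCount(const CudaProbe& probe) {
  if (probe.driver_version() == 0) {
    return 0;
  }
  return probe.device_count();
}

}  // namespace gpu_count_sketch
```

   With this shape, a caller that only tests `GetGPUCount() > 0` no longer needs
a separate `cuda_driver_version > 0` term in its condition.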
   
   Also, I now think the best way to avoid affecting non-GPU platforms is to
perform the cuda/cudnn checks at the point where the user creates a GPU
context, rather than during dynamic initialization of libmxnet.so as the
current approach does.
   
   Since context creation is defined in ./include/mxnet/base.h, and I need a
non-header file to ensure the lib version warning is emitted only once, I've
moved my prior work from ./src/common/cuda_utils.cc to a new file,
./src/base.cc.  This follows the existing code placement of (for example)
resource.h/resource.cc.
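
   As a rough illustration of checking at GPU context creation, here is a
minimal, self-contained sketch. It is not the code in the PR: the namespace,
`CheckLibVersionsOnce`, and the `checks_run` counter are hypothetical names,
the `Context` struct is a simplified stand-in for the one in base.h, and the
real version comparisons are replaced by a log line.

```cpp
#include <iostream>

namespace context_check_sketch {

int checks_run = 0;  // instrumentation for this example only

// Hypothetical helper. In the arrangement described above it would be
// defined out of line in a .cc file; the function-local static makes the
// body run at most once, even if many GPU contexts are created.
void CheckLibVersionsOnce() {
  static const bool done = [] {
    ++checks_run;
    std::cerr << "[sketch] cuda/cudnn version checks would run here\n";
    return true;
  }();
  (void)done;  // silence unused-variable warnings
}

// Simplified stand-in for the real Context type.
struct Context {
  enum DeviceType { kCPU, kGPU };
  DeviceType dev_type;
  int dev_id;

  // CPU contexts never trigger the checks.
  static Context CPU() { return Context{kCPU, 0}; }

  // The first GPU context triggers the checks; later ones skip them.
  static Context GPU(int dev_id) {
    CheckLibVersionsOnce();
    return Context{kGPU, dev_id};
  }
};

}  // namespace context_check_sketch
```

   In this sketch a process that only ever creates CPU contexts never reaches
the check, which is the property the comment is after for non-GPU platforms.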

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services
