On Thu, Aug 11, 2016 at 12:46 PM, Ryan Novosielski <novos...@rutgers.edu> wrote:
> I’ll try adding the Gres debugging, but is there some way to figure out what 
> this alleged device “819275” is (this number will change with each job).

Weird, indeed. /dev/nv* devices should be character devices with major
number 195 (195:x), and slurmd should log something like this:
Allowing access to device c 195:0 rwm
Not allowing access to device c 195:1 rwm
Not allowing access to device c 195:2 rwm
Not allowing access to device c 195:3 rwm

The fact that it's "actively" allowing access to bogus device 819275
makes me think slurmd considers it to be the actual GPU device, except
that it got the wrong major number for it.
What does "ls -al /dev/nv*" look like on the GPU node? And which
version of Slurm is it?
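
If it's easier than eyeballing the ls output, the major:minor pairs can
also be read straight from the device nodes. This is only a minimal
sketch, assuming Python 3 is available on the GPU node; the function
names describe_rdev and list_devices are made up for illustration and
have nothing to do with Slurm itself. It prints each node in the same
"c MAJOR:MINOR" form that slurmd logs, so the two can be compared:

```python
import glob
import os
import stat


def describe_rdev(mode, rdev):
    """Return 'c MAJOR:MINOR' or 'b MAJOR:MINOR' for a device node, else None."""
    if stat.S_ISCHR(mode):
        kind = "c"  # character device, which is what the GPU nodes should be
    elif stat.S_ISBLK(mode):
        kind = "b"  # block device
    else:
        return None  # not a device node at all
    return f"{kind} {os.major(rdev)}:{os.minor(rdev)}"


def list_devices(pattern="/dev/nv*"):
    """Yield (path, description) for each device node matching the glob."""
    for path in sorted(glob.glob(pattern)):
        st = os.stat(path)
        desc = describe_rdev(st.st_mode, st.st_rdev)
        if desc is not None:
            yield path, desc


if __name__ == "__main__":
    for path, desc in list_devices():
        print(f"{desc}  {path}")
```

Any entry whose major number is not 195 would be worth a closer look.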

Cheers,
-- 
Kilian
