[slurm-users] Specify a gpu ID

2021-06-02 Thread Ahmad Khalifa
How to send a job to a particular gpu card using its ID (0,1,2...etc)?

Re: [slurm-users] Specify a gpu ID

2021-06-03 Thread Paul Brunk
Hi: I've not tried to do that. But the below discussion might help: https://bugs.schedmd.com/show_bug.cgi?id=2626

Re: [slurm-users] Specify a gpu ID

2021-06-04 Thread Stephan Roth
On 03.06.21 07:11, Ahmad Khalifa wrote:
> How to send a job to a particular gpu card using its ID (0,1,2...etc)?
Why do you need to access a GPU based on its ID? If it's to select a certain GPU type, there are other methods you can use. You could create partitions for the same GPU types or add f

Re: [slurm-users] Specify a gpu ID

2021-06-04 Thread Ahmad Khalifa
Because there are failing GPUs that I'm trying to avoid. On Fri, Jun 4, 2021 at 5:04 AM Stephan Roth wrote: > On 03.06.21 07:11, Ahmad Khalifa wrote: > > How to send a job to a particular gpu card using its ID (0,1,2...etc)? > > Why do you need to access a GPU based on its ID? > > If its to sele

Re: [slurm-users] Specify a gpu ID

2021-06-04 Thread Jason Simms
Unpopular opinion: remove the failing GPU. JLS On Fri, Jun 4, 2021 at 2:07 PM Ahmad Khalifa wrote: > Because there are failing GPUs that I'm trying to avoid. > > On Fri, Jun 4, 2021 at 5:04 AM Stephan Roth > wrote: > >> On 03.06.21 07:11, Ahmad Khalifa wrote: >> > How to send a job to a partic

Re: [slurm-users] Specify a gpu ID

2021-06-04 Thread Ahmad Khalifa
I can't make hardware changes, but I still want to make use of the cluster. Let's keep the discussion on how to get slurm to do it, if that's possible. On Fri, Jun 4, 2021 at 11:13 AM Jason Simms wrote: > Unpopular opinion: remove the failing GPU. > > JLS > > On Fri, Jun 4, 2021 at 2:07 PM Ahmad

Re: [slurm-users] Specify a gpu ID

2021-06-04 Thread Christopher Samuel
On 6/4/21 11:04 am, Ahmad Khalifa wrote:
> Because there are failing GPUs that I'm trying to avoid.
Could you remove them from your gres.conf and adjust slurm.conf to match? If you're using cgroups enforcement for devices (ConstrainDevices=yes in cgroup.conf) then that should render them inacc
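A minimal sketch of the gres.conf approach suggested in this message, for illustration only. The node name gpunode01, the four-GPU layout, and the choice of /dev/nvidia2 as the failing card are assumptions, not details from the thread. It also assumes the GPUs are listed by hand in gres.conf rather than discovered through AutoDetect.

```
# gres.conf: list only the healthy devices (hypothetical node name and paths)
NodeName=gpunode01 Name=gpu File=/dev/nvidia0
NodeName=gpunode01 Name=gpu File=/dev/nvidia1
# /dev/nvidia2 (assumed here to be the failing card) is deliberately left out
NodeName=gpunode01 Name=gpu File=/dev/nvidia3

# slurm.conf: the GPU count for the node must match gres.conf
# (any other parameters on the existing NodeName line stay as they are)
GresTypes=gpu
NodeName=gpunode01 Gres=gpu:3

# cgroup.conf: the device constraint mentioned in the message
ConstrainDevices=yes
```

With a layout like this, Slurm would only ever hand out the three listed devices, and the cgroup device constraint should keep jobs from opening the omitted one. The Slurm daemons would likely need to be restarted for a changed GPU count to take effect.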

Re: [slurm-users] Specify a gpu ID

2021-06-04 Thread Jason Simms
You don't need to chide me for making what is, to me, a reasonable solution. *You* may not be able to make hardware changes, but why the people who can would want failing GPUs remaining in a system is anathema to my approach to cluster management. In other words, I do not recommend you try to find

Re: [slurm-users] Specify a gpu ID

2021-06-04 Thread Ahmad Khalifa
Thank you for your input Jason, I wasn't trying to "chide" you in any way. I appreciate your contribution to the discussion. On Fri, Jun 4, 2021 at 11:37 AM Jason Simms wrote: > You don't need to chide me for making what is, to me, a reasonable > solution. *You* may not be able to make hardware

Re: [slurm-users] Specify a gpu ID

2021-06-04 Thread Fuzzy Rogers
My only thought here, which is a little off-kilter, would be to get a stupid do-nothing job assigned to the failing GPU for 100,000 hours… It might take a bit of work - and some to and fro - but “fake occupy” the failing GPU and every other job will maneuver around it. Again - it’s not a great sol

Re: [slurm-users] Specify a gpu ID

2021-06-04 Thread Kilian Cavalotti
On Wed, Jun 2, 2021 at 10:13 PM Ahmad Khalifa wrote:
> How to send a job to a particular gpu card using its ID (0,1,2...etc)?
Well, you can't, because:
1. GPU ids are something of a relative concept: https://bugs.schedmd.com/show_bug.cgi?id=10933
2. requesting specific GPUs is not supported: ht

Re: [slurm-users] Specify a gpu ID

2021-06-04 Thread Valerio Bellizzomi
On Wed, 2021-06-02 at 22:11 -0700, Ahmad Khalifa wrote:
> How to send a job to a particular gpu card using its ID (0,1,2...etc)?
If your GPUs are CUDA I can't help, but if you have OpenCL GPUs then your program can call getDeviceIDs() and select the GPU by number. Starting