I have a machine with a quad-core CPU and an Nvidia GPU with 3500+ cores. I want to run around 10 jobs in parallel on the GPU (mostly CUDA-based jobs).
PROBLEM: Each job requests only 100 shards and usually runs for about a minute, so I would expect to be able to run 3500/100 = 35 jobs in parallel. However, Slurm runs only 4 jobs in parallel and keeps the rest in the queue.

I have this in slurm.conf and gres.conf:

slurm.conf:

# GPU
GresTypes=gpu,shard

# COMPUTE NODES
PartitionName=pzero Nodes=ALL Default=YES MaxTime=INFINITE State=UP
PartitionName=pgpu Nodes=hostgpu MaxTime=INFINITE State=UP
NodeName=hostgpu NodeAddr=x.x.x.x Gres=gpu:gtx_1080_ti:1,shard:3500 CPUs=4 Boards=1 SocketsPerBoard=1 CoresPerSocket=4 ThreadsPerCore=1 RealMemory=64255 State=UNKNOWN

----------------------

gres.conf:

Name=gpu Type=gtx_1080_ti File=/dev/nvidia0 Count=1
Name=shard Count=3500 File=/dev/nvidia0
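
To make the expected-versus-observed numbers concrete, here is a rough sketch of the arithmetic. It assumes that each job is allocated one CPU (which is what Slurm normally does when a job does not request a CPU count) and that CPUs on the node are not oversubscribed; neither assumption is confirmed by the configuration shown. The function name max_concurrent_jobs is hypothetical and only for illustration.

```python
# Node resources, copied from the NodeName line in slurm.conf above.
NODE_CPUS = 4
NODE_SHARDS = 3500

# Per-job request: 100 shards is stated in the post.
# ASSUMPTION: each job is allocated 1 CPU and CPUs are not oversubscribed.
SHARDS_PER_JOB = 100
CPUS_PER_JOB = 1


def max_concurrent_jobs(node_cpus, node_shards, cpus_per_job, shards_per_job):
    """Return how many jobs fit at once when both CPUs and shards are consumed."""
    by_shards = node_shards // shards_per_job  # limit from the shard count alone
    by_cpus = node_cpus // cpus_per_job        # limit from the CPU count alone
    return min(by_shards, by_cpus)             # the scarcer resource decides


limit = max_concurrent_jobs(NODE_CPUS, NODE_SHARDS, CPUS_PER_JOB, SHARDS_PER_JOB)
print(f"by shards: {NODE_SHARDS // SHARDS_PER_JOB}, "
      f"by CPUs: {NODE_CPUS // CPUS_PER_JOB}, combined limit: {limit}")
```

Under these assumptions the shard count alone would allow 35 jobs, but the CPU count would allow only 4, which matches the number of jobs observed running. This is only one possible explanation, not a confirmed diagnosis.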