Hi Loris,

We have a completely separate test system, complete with a few worker nodes and its own slurmctld/slurmdbd, so we can test Slurm upgrades etc.
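For anyone without spare hardware, one way to approximate such a test system on a single VM is Slurm's multiple-slurmd mode. The sketch below is illustrative only: the host name, node names and ports are made up, and it assumes slurmd was built with the `--enable-multiple-slurmd` configure option.

```
# slurm.conf fragment (hypothetical names/ports): two emulated nodes on one host
SlurmctldHost=testbox
NodeName=n1 NodeHostname=localhost Port=17001
NodeName=n2 NodeHostname=localhost Port=17002
PartitionName=test Nodes=n1,n2 Default=YES State=UP

# Start one slurmd per emulated node, selecting the node name with -N:
#   slurmd -N n1
#   slurmd -N n2
```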
Sean

--
Sean Crosby | Senior DevOpsHPC Engineer and HPC Team Lead
Research Computing Services | Business Services
The University of Melbourne, Victoria 3010 Australia


On Mon, 7 Dec 2020 at 19:01, Loris Bennett <loris.benn...@fu-berlin.de> wrote:

> UoM notice: External email. Be cautious of links, attachments, or
> impersonation attempts
>
> Hi Sean,
>
> Thanks for the code - it looks like you have put a lot more thought into
> it than I have into mine. I'll certainly have to look at handling the
> 'tres-per-*' options.
>
> By the way, how do you do your testing? As I don't have a test
> cluster, currently I'm doing "open heart" testing, but I really need a
> minimal test cluster, maybe using VMs.
>
> Cheers,
>
> Loris
>
> Sean Crosby <scro...@unimelb.edu.au> writes:
>
> > Hi Loris,
> >
> > This is our submit filter for what you're asking. It checks for both
> > --gres and --gpus:
> >
> > ESLURM_INVALID_GRES = 2072
> > ESLURM_BAD_TASK_COUNT = 2025
> > if (job_desc.partition ~= slurm.NO_VAL) then
> >     if (job_desc.partition ~= nil) then
> >         if (string.match(job_desc.partition, "gpgpu") or
> >             string.match(job_desc.partition, "gpgputest")) then
> >             -- slurm.log_info("slurm_job_submit (lua): detect job for gpgpu partition")
> >             -- Alert on invalid GPU count, e.g. gpu:0 or gpu:p100:0
> >             if (job_desc.gres and string.find(job_desc.gres, "gpu")) then
> >                 local numgpu = string.match(job_desc.gres, ":%d+$")
> >                 if (numgpu ~= nil) then
> >                     numgpu = numgpu:gsub(':', '')
> >                     if (tonumber(numgpu) < 1) then
> >                         slurm.log_user("Invalid GPGPU count specified in GRES, must be greater than 0")
> >                         return ESLURM_INVALID_GRES
> >                     end
> >                 end
> >             else
> >                 -- Alternatively, use the --gpus options in newer versions of Slurm
> >                 if (job_desc.tres_per_node == nil) then
> >                     if (job_desc.tres_per_socket == nil) then
> >                         if (job_desc.tres_per_task == nil) then
> >                             slurm.log_user("You tried submitting to a GPGPU partition, but you didn't request one with GRES or GPUS")
> >                             return ESLURM_INVALID_GRES
> >                         else
> >                             if (job_desc.num_tasks == slurm.NO_VAL) then
> >                                 slurm.user_msg("--gpus-per-task option requires --tasks specification")
> >                                 return ESLURM_BAD_TASK_COUNT
> >                             end
> >                         end
> >                     end
> >                 end
> >             end
> >         end
> >     end
> > end
> >
> > Let me know if you improve it, please? We're always on the hunt to fix up
> > some of the logic in the submit filter.
> >
> > Cheers,
> > Sean
> >
> > --
> > Sean Crosby | Senior DevOpsHPC Engineer and HPC Team Lead
> > Research Computing Services | Business Services
> > The University of Melbourne, Victoria 3010 Australia
> >
> > On Fri, 4 Dec 2020 at 23:58, Loris Bennett <loris.benn...@fu-berlin.de> wrote:
> >
> >     UoM notice: External email. Be cautious of links, attachments, or
> >     impersonation attempts
> >
> >     Hi,
> >
> >     I want to reject jobs that don't specify any GPUs when accessing our GPU
> >     partition and have the following in job_submit.lua:
> >
> >     if (job_desc.partition == "gpu" and job_desc.gres == nil) then
> >         slurm.log_user(string.format("Please request GPU resources in the partition 'gpu', " ..
> >                                      "e.g. '#SBATCH --gres=gpu:1' " ..
> >                                      "Please see 'man sbatch' for more details)"))
> >         slurm.log_info(string.format("check_parameters: user '%s' did not request GPUs in partition 'gpu'",
> >                                      username))
> >         return slurm.ERROR
> >     end
> >
> >     If GRES is not given for the GPU partition, this produces
> >
> >     sbatch: error: Please request GPU resources in the partition 'gpu', e.g. '#SBATCH --gres=gpu:1' Please see 'man sbatch' for more details)
> >     sbatch: error: Batch job submission failed: Unspecified error
> >
> >     My questions are:
> >
> >     1. Is there a better error to return? The 'slurm.ERROR' produces the
> >        generic second error line above (slurm_errno.h just seems to have
> >        ESLURM_MISSING_TIME_LIMIT and ESLURM_INVALID_KNL as errors a plugin
> >        might raise). This is misleading, since the error is in fact known
> >        and specific.
> >     2. Am I right in thinking that 'job_desc' does not, as of 20.02.06, have
> >        a 'gpus' field corresponding to the sbatch/srun option '--gpus'?
> >
> >     Cheers,
> >
> >     Loris
> >
> >     --
> >     Dr. Loris Bennett (Hr./Mr.)
> >     ZEDAT, Freie Universität Berlin       Email loris.benn...@fu-berlin.de
>
> --
> Dr. Loris Bennett (Hr./Mr.)
> ZEDAT, Freie Universität Berlin       Email loris.benn...@fu-berlin.de
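On question 1: instead of `slurm.ERROR`, a job_submit Lua plugin can return a numeric Slurm errno, which is exactly what Sean's filter does, so sbatch prints a specific message rather than "Unspecified error". A minimal sketch of the same rejection logic on that pattern; the value 2072 for ESLURM_INVALID_GRES is taken from the filter in this thread and should be checked against slurm_errno.h for your installed release:

```lua
-- Sketch (not a drop-in): reject GPU-less jobs in the "gpu" partition
-- by returning a specific errno instead of slurm.ERROR.
-- Verify the numeric value against slurm_errno.h for your Slurm version.
local ESLURM_INVALID_GRES = 2072

function slurm_job_submit(job_desc, part_list, submit_uid)
    if (job_desc.partition == "gpu" and job_desc.gres == nil) then
        -- Message shown to the submitting user by sbatch/srun
        slurm.log_user("Please request GPU resources in the partition 'gpu', " ..
                       "e.g. '#SBATCH --gres=gpu:1'. See 'man sbatch' for details.")
        return ESLURM_INVALID_GRES
    end
    return slurm.SUCCESS
end
```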