Re: [slurm-users] 20.11.1 on Cray: job_submit.lua: SO loaded on CtlD restart: script skipped when job submitted

2020-12-16 Thread Chris Samuel
On 16/12/20 6:21 pm, Kevin Buckley wrote: "The skip is occurring, in src/lua/slurm_lua.c, because of this trap." That looks right to me; that's Doug's code, which checks whether the file has been updated since slurmctld last read it in. If it has, then it'll reload it, but if it hasn't then

[slurm-users] 20.11.1 on Cray: job_submit.lua: SO loaded on CtlD restart: script skipped when job submitted

2020-12-16 Thread Kevin Buckley
Probably not specific to 20.11.1, nor to a Cray, but has anyone out there seen anything like this? As the slurmctld restarts, after upping the debug level, it all looks hunky-dory: [2020-12-17T09:23:46.204] debug3: Trying to load plugin /opt/slurm/20.11.1/lib64/slurm/job_submit_cray_aries.so

Re: [slurm-users] using resources effectively?

2020-12-16 Thread Weijun Gao
Thank you, Michael! I've tried the following example:
    NodeName=gpunode01 Gres=gpu:1 Sockets=2 CoresPerSocket=28 ThreadsPerCore=2 State=UNKNOWN RealMemory=38
    PartitionName=gpu MaxCPUsPerNode=56 MaxMemPerNode=19 Nodes=gpunode01 Default=NO MaxTime=1-0 State=UP

[slurm-users] Questions about sacctmgr load filename

2020-12-16 Thread Richard Lefebvre
Hi, I would like to do the equivalent of:
    sacctmgr -i add user namef account=grpa
    sacctmgr -i add user nameg account=grpa
    ...
    sacctmgr -i add user namez account=grpa
but with an "sacctmgr -i load filename" in which filename contains the grpa account with the list of users. The documentation mentions the
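For illustration, a rough sketch of the flat-file format that sacctmgr dump writes and sacctmgr load reads back in (the cluster name, descriptions and exact fields here are assumptions; namef..namez are the users from the example above):
    Cluster - 'mycluster'
    Parent - 'root'
    Account - 'grpa':Description='Group A':Organization='grpa'
    Parent - 'grpa'
    User - 'namef':DefaultAccount='grpa'
    User - 'nameg':DefaultAccount='grpa'
    User - 'namez':DefaultAccount='grpa'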

Re: [slurm-users] using resources effectively?

2020-12-16 Thread Renfro, Michael
We have overlapping partitions for GPU work and some kinds of non-GPU work (both large-memory and regular-memory jobs). For 28-core nodes with 2 GPUs, we have:
    PartitionName=gpu MaxCPUsPerNode=16 … Nodes=gpunode[001-004]
    PartitionName=any-interactive MaxCPUsPerNode=12 …
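A minimal sketch of that overlapping-partition pattern in slurm.conf, with hypothetical node definitions and limits (the options elided with "…" above are not reconstructed here):
    NodeName=gpunode[001-004] Sockets=2 CoresPerSocket=14 ThreadsPerCore=1 RealMemory=192000 Gres=gpu:2 State=UNKNOWN
    PartitionName=gpu             Nodes=gpunode[001-004] MaxCPUsPerNode=16 State=UP
    PartitionName=any-interactive Nodes=gpunode[001-004] MaxCPUsPerNode=12 State=UP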

[slurm-users] using resources effectively?

2020-12-16 Thread Weijun Gao
Hi, say I have a Slurm node with 1 x GPU and 112 x CPU cores, and:
    1) there is a job running on the node using the GPU and 20 x CPU cores
    2) there is a job waiting in the queue asking for 1 x GPU and 20 x CPU cores
Is it possible to a) let a new job asking for 0 x GPU and 20 x

[slurm-users] Constraint multiple counts not working

2020-12-16 Thread Jeffrey T Frey
On a cluster running Slurm 17.11.8 (cons_res) I can submit a job that requests e.g. 2 nodes with unique features on each: $ sbatch --nodes=2 --ntasks-per-node=1 --constraint="[256GB*1&192GB*1]" … The job is submitted and runs as expected: on 1 node with feature "256GB" and 1 node with

Re: [slurm-users] getting fairshare

2020-12-16 Thread Paul Edmon
You can use the -o option to select which fields you want it to print. The last column is the FairShare score. The equation is part of the Slurm documentation: https://slurm.schedmd.com/priority_multifactor.html If you are using the Classic Fairshare you can look at our documentation:
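For example, something along these lines should print the relevant columns, ending with the FairShare score (the field names match the default sshare output; check the sshare man page for the exact list in your version):
    sshare -a -o Account,User,RawShares,NormShares,RawUsage,EffectvUsage,FairShare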

[slurm-users] getting fairshare

2020-12-16 Thread Erik Bryer
$ sshare -a
    Account  User  RawShares  NormShares  RawUsage  EffectvUsage  FairShare
    -------- ----- ---------- ----------- --------- ------------- ----------
    root                      0.00        158       1.00
    root

Re: [slurm-users] Query for minimum memory required in partition

2020-12-16 Thread Paul Edmon
We do this here using the job_submit.lua script. Here is an example:
    if part == "bigmem" then
        if (job_desc.pn_min_memory ~= 0) then
            if (job_desc.pn_min_memory < 19 or job_desc.pn_min_memory > 2147483646) then
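A self-contained sketch of that kind of check in a job_submit.lua (the memory thresholds are made up, and the partition test uses job_desc.partition rather than the `part` variable from the truncated snippet above):
    -- job_submit.lua sketch: bound per-node memory requests on a "bigmem" partition
    function slurm_job_submit(job_desc, part_list, submit_uid)
        if job_desc.partition == "bigmem" then
            -- mirrors the ~= 0 guard above; thresholds below are hypothetical, in MB
            if job_desc.pn_min_memory ~= 0 then
                if job_desc.pn_min_memory < 190000 or job_desc.pn_min_memory > 2147483646 then
                    slurm.log_user("bigmem jobs must request a per-node memory in the allowed range")
                    return slurm.ERROR
                end
            end
        end
        return slurm.SUCCESS
    end

    function slurm_job_modify(job_desc, job_rec, part_list, modify_uid)
        return slurm.SUCCESS
    end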

Re: [slurm-users] gres names

2020-12-16 Thread Erik Bryer
I just found an error in my attempt. I ran on saga-test02 while I'd made the change to saga-test01. Things are working better now. Thanks, Erik

[slurm-users] Query for minimum memory required in partition

2020-12-16 Thread Sistemas NLHPC
Hello, good afternoon. I have a query: currently in our cluster we have different partitions: one partition called slims with 48 GB of RAM, one partition called general with 192 GB of RAM, and one partition called largemem with 768 GB of RAM. Is it possible to restrict access to the largemem partition and for

Re: [slurm-users] gres names

2020-12-16 Thread Erik Bryer
Hi Loris, That actually makes some sense. There is one thing that troubles me, though. If, on a VM with no GPUs, I define...
    NodeName=saga-test01 CPUS=2 SocketsPerBoard=1 CoresPerSocket=2 ThreadsPerCore=1 RealMemory=1800 State=UNKNOWN Gres=gpu:gtx1080ti:4
...and try to run the following, I get
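Related aside: a gpu GRES like that is usually backed by a matching gres.conf entry pointing at real device files, which a GPU-less VM won't have. For reference, a sketch of what it might look like on a node that actually has the cards (device paths are assumed):
    # gres.conf (hypothetical; requires the devices to exist)
    NodeName=saga-test01 Name=gpu Type=gtx1080ti File=/dev/nvidia[0-3]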

Re: [slurm-users] slurm/munge problem: invalid credentials

2020-12-16 Thread Ole Holm Nielsen
Hi Olaf, Since you are testing Slurm, perhaps my Slurm Wiki page may be of interest to you: https://wiki.fysik.dtu.dk/niflheim/Slurm_installation There is a discussion about the setup of Munge. Best regards, Ole On 12/15/20 5:48 PM, Olaf Gellert wrote: Hi all, we are setting up a new test

Re: [slurm-users] [EXT] slurm/munge problem: invalid credentials

2020-12-16 Thread Sean Crosby
Hi Olaf, Check the firewalls between your compute node and the Slurm controller to make sure that they can contact each other. Slurmctld needs to contact the SlurmdPort (default 6818), and slurmd needs to contact the SlurmctldPort (default 6817). Also the other compute nodes need to be able to
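A quick way to sanity-check that reachability from a shell, assuming nc (netcat) is available (hostnames are placeholders):
    # from a compute node: can we reach slurmctld?
    nc -zv slurm-controller 6817
    # from the controller: can we reach slurmd on a compute node?
    nc -zv computenode01 6818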

[slurm-users] Tuto in building a slurm minimal in a single server

2020-12-16 Thread Richard Randriatoamanana
Hi, After days of surfing the net and looking for talks/tutorials on the SchedMD website, I didn't really find a tutorial (that works in a systemd environment) on how to install, configure and deploy a Slurm system on a single compute server with many cores and lots of memory. Explanations and tutorials on administration I
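For what it's worth, a very rough sketch of a minimal slurm.conf for a single box running both slurmctld and slurmd (hostname, core/memory counts and paths are placeholders, and munge plus the slurmctld/slurmd systemd units still have to be set up separately):
    ClusterName=single
    SlurmctldHost=myserver
    AuthType=auth/munge
    SchedulerType=sched/backfill
    SelectType=select/cons_tres
    SelectTypeParameters=CR_Core_Memory
    StateSaveLocation=/var/spool/slurmctld
    SlurmdSpoolDir=/var/spool/slurmd
    NodeName=myserver CPUs=64 RealMemory=256000 State=UNKNOWN
    PartitionName=all Nodes=myserver Default=YES MaxTime=INFINITE State=UP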

Re: [slurm-users] slurm/munge problem: invalid credentials

2020-12-16 Thread Ward Poelmans
On 15/12/2020 17:48, Olaf Gellert wrote: So munge seems to work as far as I can tell. What else does Slurm use munge for? Are hostnames part of the authentication? Do I have to worry about the time "Thu Jan 01 01:00:00 1970"? I'm not an expert, but I know that hostnames are part of munge
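The usual cross-host check, which exercises exactly the credential questions above (node name is a placeholder), is to encode a credential on one host and decode it on the other:
    munge -n | ssh computenode01 unmunge
    ssh computenode01 munge -n | unmunge
If the clocks on the two hosts disagree badly, unmunge will typically report the credential as expired or rewound.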