Re: [slurm-users] 20.11.1 on Cray: job_submit.lua: SO loaded on CtlD restart: script skipped when job submitted

2020-12-16 Thread Chris Samuel
On 16/12/20 6:21 pm, Kevin Buckley wrote: "The skip is occurring, in src/lua/slurm_lua.c, because of this trap." That looks right to me; that's Doug's code, which checks whether the file has been updated since slurmctld last read it in. If it has, then it'll reload it, but if it hasn't then

[slurm-users] 20.11.1 on Cray: job_submit.lua: SO loaded on CtlD restart: script skipped when job submitted

2020-12-16 Thread Kevin Buckley
Probably not specific to 20.11.1, nor to a Cray, but has anyone out there seen anything like this? As the slurmctld restarts, after upping the debug level, it all looks hunky-dory: [2020-12-17T09:23:46.204] debug3: Trying to load plugin /opt/slurm/20.11.1/lib64/slurm/job_submit_cray_aries.so

Re: [slurm-users] using resources effectively?

2020-12-16 Thread Weijun Gao
Thank you, Michael! I've tried the following example:
    NodeName=gpunode01 Gres=gpu:1 Sockets=2 CoresPerSocket=28 ThreadsPerCore=2 State=UNKNOWN RealMemory=38
    PartitionName=gpu MaxCPUsPerNode=56 MaxMemPerNode=19 Nodes=gpunode01 Default=NO MaxTime=1-0 State=UP

[slurm-users] Questions about sacctmgr load filename

2020-12-16 Thread Richard Lefebvre
Hi, I would like to do the equivalent of:
    sacctmgr -i add user namef account=grpa
    sacctmgr -i add user nameg account=grpa
    ...
    sacctmgr -i add user namez account=grpa
but with an "sacctmgr -i load filename" in which filename contains the grpa account with the list of users. The documentation mentions the
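For illustration, a rough sketch of the flat-file format that sacctmgr dump writes and sacctmgr load reads back in (the cluster name, descriptions and exact fields here are assumptions; namef..namez are the users from the example above):
    Cluster - 'mycluster'
    Parent - 'root'
    Account - 'grpa':Description='Group A':Organization='grpa'
    Parent - 'grpa'
    User - 'namef':DefaultAccount='grpa'
    User - 'nameg':DefaultAccount='grpa'
    User - 'namez':DefaultAccount='grpa'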

Re: [slurm-users] using resources effectively?

2020-12-16 Thread Renfro, Michael
We have overlapping partitions for GPU work and some kinds of non-GPU work (both large-memory and regular-memory jobs). For 28-core nodes with 2 GPUs, we have:
    PartitionName=gpu MaxCPUsPerNode=16 … Nodes=gpunode[001-004]
    PartitionName=any-interactive MaxCPUsPerNode=12 …
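A minimal sketch of that overlapping-partition pattern in slurm.conf, with hypothetical node definitions and limits (the options elided with "…" above are not reconstructed here):
    NodeName=gpunode[001-004] Sockets=2 CoresPerSocket=14 ThreadsPerCore=1 RealMemory=192000 Gres=gpu:2 State=UNKNOWN
    PartitionName=gpu             Nodes=gpunode[001-004] MaxCPUsPerNode=16 State=UP
    PartitionName=any-interactive Nodes=gpunode[001-004] MaxCPUsPerNode=12 State=UP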

[slurm-users] using resources effectively?

2020-12-16 Thread Weijun Gao
Hi, say I have a Slurm node with 1 x GPU and 112 x CPU cores, and:
    1) there is a job running on the node using the GPU and 20 x CPU cores
    2) there is a job waiting in the queue asking for 1 x GPU and 20 x CPU cores
Is it possible to a) let a new job asking for 0 x GPU and 20 x

[slurm-users] Constraint multiple counts not working

2020-12-16 Thread Jeffrey T Frey
On a cluster running Slurm 17.11.8 (cons_res) I can submit a job that requests e.g. 2 nodes with unique features on each: $ sbatch --nodes=2 --ntasks-per-node=1 --constraint="[256GB*1&192GB*1]" … The job is submitted and runs as expected: on 1 node with feature "256GB" and 1 node with

Re: [slurm-users] getting fairshare

2020-12-16 Thread Paul Edmon
You can use the -o option to select which fields you want it to print. The last column is the FairShare score. The equation is part of the Slurm documentation: https://slurm.schedmd.com/priority_multifactor.html If you are using the Classic Fairshare you can look at our documentation:
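For example, something along these lines should print the relevant columns, ending with the FairShare score (the field names match the default sshare output; check the sshare man page for the exact list in your version):
    sshare -a -o Account,User,RawShares,NormShares,RawUsage,EffectvUsage,FairShare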

[slurm-users] getting fairshare

2020-12-16 Thread Erik Bryer
$ sshare -a
    Account  User  RawShares  NormShares  RawUsage  EffectvUsage  FairShare
    -------- ----- ---------- ----------- --------- ------------- ----------
    root                      0.00        158       1.00
    root

Re: [slurm-users] Query for minimum memory required in partition

2020-12-16 Thread Paul Edmon
We do this here using the job_submit.lua script. Here is an example:
    if part == "bigmem" then
        if (job_desc.pn_min_memory ~= 0) then
            if (job_desc.pn_min_memory < 19 or job_desc.pn_min_memory > 2147483646) then
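A self-contained sketch of that kind of check in a job_submit.lua (the memory thresholds are made up, and the partition test uses job_desc.partition rather than the `part` variable from the truncated snippet above):
    -- job_submit.lua sketch: bound per-node memory requests on a "bigmem" partition
    function slurm_job_submit(job_desc, part_list, submit_uid)
        if job_desc.partition == "bigmem" then
            -- mirrors the ~= 0 guard above; thresholds below are hypothetical, in MB
            if job_desc.pn_min_memory ~= 0 then
                if job_desc.pn_min_memory < 190000 or job_desc.pn_min_memory > 2147483646 then
                    slurm.log_user("bigmem jobs must request a per-node memory in the allowed range")
                    return slurm.ERROR
                end
            end
        end
        return slurm.SUCCESS
    end

    function slurm_job_modify(job_desc, job_rec, part_list, modify_uid)
        return slurm.SUCCESS
    end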

Re: [slurm-users] gres names

2020-12-16 Thread Erik Bryer
I just found an error in my attempt. I ran on saga-test02 while I'd made the change to saga-test01. Things are working better now. Thanks, Erik

[slurm-users] Query for minimum memory required in partition

2020-12-16 Thread Sistemas NLHPC
Hello, good afternoon. I have a query: currently in our cluster we have different partitions: one partition called slims with 48 GB of RAM, one partition called general with 192 GB of RAM, and one partition called largemem with 768 GB of RAM. Is it possible to restrict access to the largemem partition and for

Re: [slurm-users] gres names

2020-12-16 Thread Erik Bryer
Hi Loris, That actually makes some sense. There is one thing that troubles me, though. If, on a VM with no GPUs, I define...
    NodeName=saga-test01 CPUS=2 SocketsPerBoard=1 CoresPerSocket=2 ThreadsPerCore=1 RealMemory=1800 State=UNKNOWN Gres=gpu:gtx1080ti:4
...and try to run the following, I get
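Related aside: a gpu GRES like that is usually backed by a matching gres.conf entry pointing at real device files, which a GPU-less VM won't have. For reference, a sketch of what it might look like on a node that actually has the cards (device paths are assumed):
    # gres.conf (hypothetical; requires the devices to exist)
    NodeName=saga-test01 Name=gpu Type=gtx1080ti File=/dev/nvidia[0-3]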

Re: [slurm-users] slurm/munge problem: invalid credentials

2020-12-16 Thread Ole Holm Nielsen
Hi Olaf, Since you are testing Slurm, perhaps my Slurm Wiki page may be of interest to you: https://wiki.fysik.dtu.dk/niflheim/Slurm_installation There is a discussion about the setup of Munge. Best regards, Ole On 12/15/20 5:48 PM, Olaf Gellert wrote: Hi all, we are setting up a new test

Re: [slurm-users] [EXT] slurm/munge problem: invalid credentials

2020-12-16 Thread Sean Crosby
Hi Olaf, Check the firewalls between your compute node and the Slurm controller to make sure that they can contact each other. Slurmctld needs to contact the SlurmdPort (default 6818), and slurmd needs to contact the SlurmctldPort (default 6817). Also the other compute nodes need to be able to
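A quick way to sanity-check that reachability from a shell, assuming nc (netcat) is available (hostnames are placeholders):
    # from a compute node: can we reach slurmctld?
    nc -zv slurm-controller 6817
    # from the controller: can we reach slurmd on a compute node?
    nc -zv computenode01 6818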

[slurm-users] Tuto in building a slurm minimal in a single server

2020-12-16 Thread Richard Randriatoamanana
Hi, After days of surfing the net and looking for talks/tutorials on the SchedMD website, I didn't really find a tutorial (that works in a systemd environment) on how to install, configure and deploy a Slurm system on a single compute server with many cores and lots of memory. Explanations and tutorials on administration I
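For what it's worth, a very rough sketch of a minimal slurm.conf for a single box running both slurmctld and slurmd (hostname, core/memory counts and paths are placeholders, and munge plus the slurmctld/slurmd systemd units still have to be set up separately):
    ClusterName=single
    SlurmctldHost=myserver
    AuthType=auth/munge
    SchedulerType=sched/backfill
    SelectType=select/cons_tres
    SelectTypeParameters=CR_Core_Memory
    StateSaveLocation=/var/spool/slurmctld
    SlurmdSpoolDir=/var/spool/slurmd
    NodeName=myserver CPUs=64 RealMemory=256000 State=UNKNOWN
    PartitionName=all Nodes=myserver Default=YES MaxTime=INFINITE State=UP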

Re: [slurm-users] slurm/munge problem: invalid credentials

2020-12-16 Thread Ward Poelmans
On 15/12/2020 17:48, Olaf Gellert wrote: So munge seems to work as far as I can tell. What else does Slurm use munge for? Are hostnames part of the authentication? Do I have to worry about the time "Thu Jan 01 01:00:00 1970"? I'm not an expert, but I know that hostnames are part of munge
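The usual cross-host check, which exercises exactly the credential questions above (node name is a placeholder), is to encode a credential on one host and decode it on the other:
    munge -n | ssh computenode01 unmunge
    ssh computenode01 munge -n | unmunge
If the clocks on the two hosts disagree badly, unmunge will typically report the credential as expired or rewound.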