Re: [slurm-users] Single user consuming all resources of the cluster

2018-02-06 Thread Matteo F
Thanks Bill, I really appreciate the time you spent giving this detailed answer. I will have a look at the plugin system, as the integration with our accounting system would be a nice feature. @Chris thanks, I've had a look at GrpTRES, but I'll probably go with the Spank route. Best, Matteo On 6 Febr

[slurm-users] Slurm version 17.11.3 available

2018-02-06 Thread Tim Wickberg
We are pleased to announce the availability of Slurm version 17.11.3. This includes over 44 fixes made since 17.11.2 was released last month, including one issue that can result in stray processes when a job is canceled during a long-running prolog script. Slurm can be downloaded from https:/

Re: [slurm-users] Problem with nodes appear as DOWN (Not responding) slurm 17.02.9

2018-02-06 Thread Marcin Stolarek
Check the ReturnToService parameter in slurm.conf. On Mon, 5 Feb 2018 at 20:30, Guy - wrote: > Hi, > I've compiled and installed Slurm on Ubuntu. It works great, but if I take > a node down by running slurmd stop and start, it keeps appearing as DOWN > (Not responding) > The only fix is restarting slu

Re: [slurm-users] LAST TASK ID

2018-02-06 Thread Michael Gutteridge
Hi, the environment variable SLURM_ARRAY_TASK_MAX might be used for this as well, e.g.:

    if [ "$SLURM_ARRAY_TASK_ID" -eq "$SLURM_ARRAY_TASK_MAX" ]
    then
        # last task
    fi

Though I'd caution that if you need this to run after all the jobs in the array are _complete_, you should use a job
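The comparison suggested above can be wrapped in a small function. This is a minimal sketch only: `is_last_task` is a hypothetical helper name, not something Slurm provides, and it assumes the script runs inside an array job where Slurm sets both variables. As the reply cautions, the task with the highest index is not necessarily the last one to finish.

```shell
#!/bin/bash
# Sketch only: is_last_task is a hypothetical helper name, not part of Slurm.
# SLURM_ARRAY_TASK_ID and SLURM_ARRAY_TASK_MAX are set by Slurm inside array jobs.
is_last_task() {
    # Outside an array job both variables are unset, so report "not last".
    [ -n "${SLURM_ARRAY_TASK_ID:-}" ] || return 1
    [ -n "${SLURM_ARRAY_TASK_MAX:-}" ] || return 1
    # Succeeds only when this task carries the highest index of the array.
    [ "$SLURM_ARRAY_TASK_ID" -eq "$SLURM_ARRAY_TASK_MAX" ]
}

# Usage inside the batch script, after the per-task work:
#   if is_last_task; then
#       echo "highest-index task reached"
#   fi
```

The guards on empty variables keep the function quiet when the same script is run outside an array job, where an unguarded `-eq` test would print an error.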

[slurm-users] LAST TASK ID

2018-02-06 Thread david martin
Hi, I'm running a batch array script and would like to execute a command after the last task:

    #SBATCH --array 1-10%10:1
    sh myscript.R inputdir/file.${SLURM_ARRAY_TASK_ID}
    # Would like to run a command after the last task

For example, when I was using SGE there was something like this: | if($

[slurm-users] spank plugin parameter max length ?

2018-02-06 Thread Tueur Volvo
Hello, I created a Spank plugin and I have a problem. My plugin adds a new parameter, --hbm. If I run this srun command, it works: srun --hbm="tototututititatatetetyt" hostname But if I add one character, my Slurm job "freezes" and stays in R status: srun --hbm="tototututititatatetetyty" hostname So ma

Re: [slurm-users] Single user consuming all resources of the cluster

2018-02-06 Thread Bill Barth
Chris probably gives the Slurm-iest way to do this, but we use a Spank plugin that counts the jobs that a user has in queue (running and waiting) and sets a hard cap on how many they can have. This should probably be scaled to the size of the system and the partition they are submitting to, but

[slurm-users] Is QOS always inherited explicitly?

2018-02-06 Thread Loris Bennett
Hi, [I didn't get an answer to this when I tacked it onto the end of another question (which I also didn't get an answer to :-/), so I'm starting a new thread.] The documentation for 'sacctmgr' says Note: the QOS that can be used at a given account in the hierarchy are inherited by the child

Re: [slurm-users] Single user consuming all resources of the cluster

2018-02-06 Thread Christopher Samuel
On 06/02/18 21:40, Matteo F wrote: I've tried to limit the number of running jobs using QOS -> MaxJobsPerAccount, but this wouldn't stop a user from just filling up the cluster with fewer (but bigger) jobs. You probably want to look at what you can do with the slurmdbd database and associations. Th

[slurm-users] Single user consuming all resources of the cluster

2018-02-06 Thread Matteo F
Hello there. I've just set up a small Slurm cluster for our on-premise computation needs (nothing too exotic, just a bunch of R scripts). The system "works" in the sense that users are able to submit jobs, but I have an issue with resource management: a single user can consume all resources of