[slurm-dev] Re: Single gres.conf file and multiple GPUs

2015-01-14 Thread Kilian Cavalotti
Hi Jared, On Wed, Jan 14, 2015 at 2:14 PM, Jared David Baker wrote: > NodeName=loren[01-60] Name=gpu Type=k20x File=/dev/nvidia[0-3] I don't think you can aggregate multiple GPUs on a single line (at least that was the case in 14.03). So you would have to split it up over 4 lines: NodeName=lor
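A minimal sketch of the split suggested above, assuming the node range, GRES name, type, and device paths from Jared's quoted line. The reply is cut off in this listing, so the exact lines Kilian wrote are not shown; the helper name `split_gres_line` is hypothetical and only builds one gres.conf line per device.

```python
def split_gres_line(node_spec, name, gres_type, device_prefix, first, last):
    """Return one gres.conf line per device index in [first, last]."""
    return [
        f"NodeName={node_spec} Name={name} Type={gres_type} "
        f"File={device_prefix}{index}"
        for index in range(first, last + 1)
    ]


# Values taken from the quoted line: loren[01-60], k20x, /dev/nvidia[0-3].
gres_lines = split_gres_line("loren[01-60]", "gpu", "k20x", "/dev/nvidia", 0, 3)

if __name__ == "__main__":
    # Print the four lines so they can be pasted into gres.conf.
    print("\n".join(gres_lines))
```

Whether a given Slurm release also accepts the bracketed `File=` range on a single line is left open here; the sketch only covers the per-device form.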

[slurm-dev] Re: Single gres.conf file and multiple GPUs

2015-01-14 Thread Franco Broi
I didn't know you could have node names in the gres.conf file; I thought you needed one file per node. Anyway, in my setup the gres count is in the gres.conf file and not in the node specification in slurm.conf. On Wed, 2015-01-14 at 14:13 -0800, Jared David Baker wrote: > Hello all, > > >

[slurm-dev] Single gres.conf file and multiple GPUs

2015-01-14 Thread Jared David Baker
Hello all, I've been playing with the config of Slurm-14.11.3 for a bit now, but after reading much of the documentation on the gres.conf and slurm.conf files, I'm not seeing the behavior the documentation describes. I have the following in slurm.conf: -- GresTypes=gpu NodeName=lore

[slurm-dev] Re: agent waited too long for nodes to respond, sending batch request anyway...

2015-01-14 Thread Robert Zeigler
Although the job is allocated to the node, Slurm won't try to actually send the job to the node until the node's slurmd is responsive. So there is a window of opportunity to update the IP address before Slurm starts trying to send jobs to the node. The difficult part is that it's simplest to set th

[slurm-dev] Re: agent waited too long for nodes to respond, sending batch request anyway...

2015-01-14 Thread Anatoliy Kovalenko
How can we do it? I mean that as soon as Slurm receives a job and launches the "ResumeProgram", it already knows the old IP address of the node (the one the node had last time). Thus, while the node is booting and its information is being updated, Slurm returns a message that an error has occurr

[slurm-dev] Re: Restricting number of jobs per user in partition

2015-01-14 Thread Chrysovalantis Paschoulas
Hi Loris! What I would do in your case is the following: - First option: I would use 2 QoSs: the first one would be the default "normal" QoS with the normal limits and then second would be the "restrict" QoS with the more restricting limits for the specific partition. In each user association

[slurm-dev] Re: Restricting number of jobs per user in partition

2015-01-14 Thread Trey Dockendorf
Mapping a given partition to a specific QOS can be done using the job_submit plugins. We use the lua job submit plugin to assign a QOS based on the partition requested by a user. Some of our users' applications (OSG related) have no way to specify a QOS, so we do it for them. Older example of wha
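The mapping logic described above can be sketched as follows. This is not the Lua plugin from the message (that example is cut off in this listing), and it does not use the job_submit API; it only illustrates, in Python, choosing a QOS from the requested partition when the user supplied none. The partition and QOS names are placeholders.

```python
# Hypothetical partition -> QOS table; all names are placeholders.
PARTITION_QOS = {"osg": "osg_qos", "test": "restrict"}
DEFAULT_QOS = "normal"


def assign_qos(partition, requested_qos=None):
    """Keep an explicitly requested QOS, else derive one from the partition."""
    if requested_qos:
        return requested_qos
    return PARTITION_QOS.get(partition, DEFAULT_QOS)
```

Leaving an explicit request untouched is a design choice of this sketch; a site could equally decide to override it.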

[slurm-dev] Re: agent waited too long for nodes to respond, sending batch request anyway...

2015-01-14 Thread Robert Zeigler
The trick is having your resume program update the IP address of the newly booted node before it returns. Robert GATAATGCTATTTCTTTAACGAA > On Jan 14, 2015, at 4:51 AM, Anatoliy Kovalenko > wrote: > > Our script executes the suggested operation every time when ResumeProgram i
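A rough sketch of that ordering, assuming the address is set with `scontrol update NodeName=... NodeAddr=...` as in the elastic computing guide. The `boot_node` callable is hypothetical: it stands in for whatever site-specific step boots the node and reports its new IP address.

```python
import subprocess


def build_update_command(node_name, ip_address):
    """Build the scontrol call that records the node's current address."""
    return ["scontrol", "update", f"NodeName={node_name}", f"NodeAddr={ip_address}"]


def resume_node(node_name, boot_node, run=subprocess.run):
    """Boot the node, then update its address before returning to Slurm."""
    # boot_node is a hypothetical site-specific hook returning the new IP.
    ip_address = boot_node(node_name)
    # The update happens inside the resume step, before this function returns.
    run(build_update_command(node_name, ip_address), check=True)
    return ip_address
```

Passing `run` as a parameter keeps the sketch testable without a Slurm installation.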

[slurm-dev] Re: Restricting number of jobs per user in partition

2015-01-14 Thread Loris Bennett
Hi, Uwe Sauter writes: > Hi, > > accounts are structured in a tree, where every branch inherits the > limits from its parent. The leaves of such a branch then would be users. > You can then change the limit of a branch account without having to edit > each of your 300 user accounts. I have se

[slurm-dev] Re: Restricting number of jobs per user in partition

2015-01-14 Thread Nathan Harper
I've looked at this in the past - however, when I use your example (replacing with real data) I just get a 'Nothing Modified' response -- *Nathan Harper* // IT Systems Architect *e: * nathan.har...@cfms.org.uk // *t: * 0117 906 1104 // *m: * 07875 510891 // *w: * www.cfms.org.uk

[slurm-dev] Re: Restricting number of jobs per user in partition

2015-01-14 Thread Mehdi Denou
Hi, Raw example: sacctmgr modify user where name=foo cluster=my_cluster account=bar partition=part1 set maxjobs=5 On 14/01/2015 13:59, tejas.deshpa...@wipro.com wrote: Hi Team, I also need a similar configuration to restrict the number of procs per user using associations. Could
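For scripting the raw example above across several users or partitions, a minimal sketch that only assembles the same sacctmgr argument list; it does not run anything. The helper name `build_maxjobs_command` is hypothetical, and the values are the placeholders from the message.

```python
import shlex


def build_maxjobs_command(user, cluster, account, partition, max_jobs):
    """Assemble the sacctmgr call that sets MaxJobs on one association."""
    return [
        "sacctmgr", "modify", "user", "where",
        f"name={user}", f"cluster={cluster}",
        f"account={account}", f"partition={partition}",
        "set", f"maxjobs={max_jobs}",
    ]


command = build_maxjobs_command("foo", "my_cluster", "bar", "part1", 5)

if __name__ == "__main__":
    # Show the command line instead of executing it.
    print(shlex.join(command))
```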

[slurm-dev] Re: Restricting number of jobs per user in partition

2015-01-14 Thread Loris Bennett
Hi Tejas, writes: > Hi Team, > > I also need a similar configuration to restrict the number of > procs per user using associations. Could you please suggest a > command or any related configuration? > > > Thanks, > Tejas I'm not sure what you mean by "procs". However, using 'sac

[slurm-dev] Re: Restricting number of jobs per user in partition

2015-01-14 Thread Uwe Sauter
Hi, accounts are structured in a tree, where every branch inherits the limits from its parent. The leaves of such a branch then would be users. You can then change the limit of a branch account without having to edit each of your 300 user accounts. A user may then also be connected to several b

[slurm-dev] Re: Restricting number of jobs per user in partition

2015-01-14 Thread Uwe Sauter
Please have a look into the documentation at http://slurm.schedmd.com , esp. the man pages of sacctmgr and slurm.conf and the pages about accounting, qos and resource limits. Without any warranty these are the options in slurm.conf I needed: EnforcePartLimits=YES AccountingStorageEnforce=associa

[slurm-dev] Re: Restricting number of jobs per user in partition

2015-01-14 Thread Loris Bennett
Hi, Uwe Sauter writes: > Hi, > > an association is the combination of > > * QoS > * partition > * account > * cluster If I understand correctly, an association actually seems to be a combination of user (rather than QoS), cluster, partition, and account, with each association potentially suppo

[slurm-dev] Re: Restricting number of jobs per user in partition

2015-01-14 Thread tejas.deshpande
Hi Team, I also need a similar configuration to restrict the number of procs per user using associations. Could you please suggest a command or any related configuration? Thanks, Tejas -Original Message- From: Uwe Sauter [mailto:uwe.sauter...@gmail.com] Sent: Wednesday, J

[slurm-dev] Re: Restricting number of jobs per user in partition

2015-01-14 Thread Uwe Sauter
Hi, an association is the combination of * QoS * partition * account * cluster With that in mind you can restrict jobs per user at the partition level; you just have to add the partition to the association rules. Regards, Uwe On 14.01.2015 at 11:22, Loris Bennett wrote: > > Hi Uwe,

[slurm-dev] Re: agent waited too long for nodes to respond, sending batch request anyway...

2015-01-14 Thread Anatoliy Kovalenko
Our script executes the suggested operation every time ResumeProgram starts. This is proposed in step 4 of the documentation at http://slurm.schedmd.com/elastic_computing.html Also, if we must do this operation "by hand" every time, then dynamic allocation of nodes for its jobs by slurm d

[slurm-dev] Re: Restricting number of jobs per user in partition

2015-01-14 Thread Loris Bennett
Hi Uwe, Uwe Sauter writes: > Hi, > > have a look into Slurm accounting/QoS. There are options to limit jobs > per user, jobs per group, etc. pp. > > http://slurm.schedmd.com/accounting.html > http://slurm.schedmd.com/qos.html I saw that a QOS can restrict the number of jobs per user, but how c

[slurm-dev] Re: Restricting number of jobs per user in partition

2015-01-14 Thread Loris Bennett
Hello Anatoliy, Anatoliy Kovalenko writes: > Re: [slurm-dev] Restricting number of jobs per user in partition > > https://computing.llnl.gov/linux/slurm/resource_limits.html MaxJobs = > The total number of jobs able to run at any given time from this > association. If this limit is reached new

[slurm-dev] Re: Restricting number of jobs per user in partition

2015-01-14 Thread Uwe Sauter
Hi, have a look at Slurm accounting/QoS. There are options to limit jobs per user, jobs per group, etc. http://slurm.schedmd.com/accounting.html http://slurm.schedmd.com/qos.html Regards, Uwe On 14.01.2015 at 10:14, Loris Bennett wrote: > > Hi, > > I have a test partition i

[slurm-dev] Re: Restricting number of jobs per user in partition

2015-01-14 Thread Anatoliy Kovalenko
https://computing.llnl.gov/linux/slurm/resource_limits.html *MaxJobs = The total number of jobs able to run at any given time from this association. If this limit is reached new jobs will be queued but only allowed to run after previous jobs complete from this association.* 2015-01-14 11:14 GMT+02

[slurm-dev] Restricting number of jobs per user in partition

2015-01-14 Thread Loris Bennett
Hi, I have a test partition in which I would like to be able to restrict the maximum number of jobs a user may have running concurrently. Is this possible? Cheers, Loris -- This signature is currently under construction.