On Tue, Nov 06, 2012 at 12:52:04PM -0700, Danny Auble wrote:

> 
> This is at least partially fixed in 2.5.
> 
> https://github.com/SchedMD/slurm/commit/177f85e7f7695eea6336658ee89b69ce5cc0f839
> 
> The same kind of thing could be done for GrpWall.  I am guessing it is 
> the same issue.  The patch should work with 2.4 if you didn't want to 
> wait for 2.5.

Thanks Danny, that looks just like what I was shooting for. Will try it out.

Thanks!

Paddy

> 
> Danny
> 
> On 11/06/2012 10:35 AM, Paddy Doyle wrote:
> > Hi again,
> >
> > I'd just like to raise the issue of GrpCPUMins and GrpWall causing running
> > jobs to be killed, when limits are reached.
> >
> > I personally think this is a bit heavy-handed.
> >
> > I would prefer the system to prevent the job from being started, rather than
> > killing a running job.
> >
> > This obviously would require (much) more logic at the job launch stage
> > to calculate requested time * allocated cpus, and check if that added to
> > the current usage would bring it over the limit. If you take into account
> > multiple users in an association submitting multiple jobs, I appreciate that
> > this is a non-trivial issue. It has shades of GOLD pre-allocation of time,
> > of which I don't have fond memories!
> >
> >
> > Perhaps a compromise might be an additional slurm.conf boolean value,
> > something like:
> >
> > AccountingStorageEnforceAllowFinish=true
> >
> > (that's a terrible name!)
> >
> > It could default to false, to preserve the current behaviour, but if set to
> > true, it would allow running jobs to finish, even if they run over the
> > limit.
> >
> > That way it's less cruel to users, but they still end up going over the
> > limit, and it affects their future jobs, rather than their currently
> > running jobs. Sure, a user could end up having multiple jobs go over the
> > limit, but eventually they won't be able to run.
> >
> > To implement this, you'd need additional slurm.conf parsing logic, and then
> > in the src/slurmctld/job_mgr.c:job_time_limit() function you'd have an
> > additional boolean check in each of the usage checks, similar to my
> > previously proposed patch.
> >
> >
> > Any thoughts / comments?
> >
> > Thanks,
> > Paddy
> >
> 
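
For anyone following the thread, here is a rough sketch of the two ideas
from my original mail quoted above: the launch-time check (requested time *
allocated cpus, added to current usage) and the "allow finish" boolean. It is
only an illustration under loud assumptions: the assoc_usage_t struct, both
function names and the allow_finish flag are made up for this example, none
of it is the actual code in job_mgr.c or in Danny's commit, and it ignores
the multiple-users/multiple-jobs problem entirely.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified view of an association's limit and usage.
 * This is NOT one of Slurm's real structures; it exists only for this sketch. */
typedef struct {
    uint64_t grp_cpu_mins;   /* GrpCPUMins limit, in CPU-minutes */
    uint64_t used_cpu_mins;  /* CPU-minutes already consumed */
    bool     has_limit;      /* false means no GrpCPUMins limit is set */
} assoc_usage_t;

/* Launch-time check: would (time limit * cpus) added to the current usage
 * go over the limit? Returns true if the job may start.
 * Assumption: other running jobs' outstanding time is not counted. */
static bool job_fits_cpu_mins(const assoc_usage_t *assoc,
                              uint32_t time_limit_mins, uint32_t cpus)
{
    if (!assoc->has_limit)
        return true;
    if (assoc->used_cpu_mins >= assoc->grp_cpu_mins)
        return false;   /* limit already reached, nothing left to grant */

    /* Widen before multiplying so large requests cannot overflow 32 bits. */
    uint64_t requested = (uint64_t)time_limit_mins * cpus;
    uint64_t remaining = assoc->grp_cpu_mins - assoc->used_cpu_mins;
    return requested <= remaining;
}

/* Run-time check, following the proposed compromise: with the hypothetical
 * allow_finish flag set, a running job is never killed for exceeding the
 * limit; otherwise it is killed once usage reaches the limit. */
static bool should_kill_running_job(const assoc_usage_t *assoc,
                                    bool allow_finish)
{
    if (!assoc->has_limit || allow_finish)
        return false;
    return assoc->used_cpu_mins >= assoc->grp_cpu_mins;
}
```

With a 1000 CPU-minute limit and 900 already used, a 10-minute job on 10 cpus
would be allowed to start and the same job on 11 cpus would not. Once usage
hits 1000, a running job is only killed when allow_finish is false.
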

-- 
Paddy Doyle
Trinity Centre for High Performance Computing,
Lloyd Building, Trinity College Dublin, Dublin 2, Ireland.
Phone: +353-1-896-3725
http://www.tchpc.tcd.ie/
