On Tue, Nov 06, 2012 at 12:52:04PM -0700, Danny Auble wrote:
>
> This is at least partially fixed in 2.5.
>
> https://github.com/SchedMD/slurm/commit/177f85e7f7695eea6336658ee89b69ce5cc0f839
>
> The same kind of thing could be done for GrpWall. I am guessing it is
> the same issue. The patch should work with 2.4 if you didn't want to
> wait for 2.5.

Thanks Danny, that looks just like what I was shooting for. Will try it out.

Thanks!

Paddy

>
> Danny
>
> On 11/06/2012 10:35 AM, Paddy Doyle wrote:
> > Hi again,
> >
> > I'd just like to raise the issue of GrpCPUMins and GrpWall causing running
> > jobs to be killed when limits are reached.
> >
> > I personally think this is a bit heavy-handed.
> >
> > I would prefer the system to prevent the job from being started, rather than
> > killing a running job.
> >
> > This obviously would require (much) more logic at the job launch stage
> > to calculate requested time * allocated cpus, and check if that added to
> > the current usage would bring it over the limit. If you take into account
> > multiple users in an association submitting multiple jobs, I appreciate that
> > this is a non-trivial issue. It has shades of GOLD pre-allocation of time,
> > of which I don't have fond memories!
> >
> > Perhaps a compromise might be an additional slurm.conf boolean value,
> > something like:
> >
> > AccountingStorageEnforceAllowFinish=true
> >
> > (that's a terrible name!)
> >
> > It could default to false, to preserve the current behaviour, but if set to
> > true, it would allow running jobs to finish, even if they run over the
> > limit.
> >
> > That way it's less cruel to users, but they still end up going over the
> > limit, and it affects their future jobs, rather than their currently
> > running jobs. Sure, a user could end up having multiple jobs go over the
> > limit, but eventually they won't be able to run.
> >
> > To implement this, you'd need additional slurm.conf parsing logic, and then
> > in the src/slurmctld/job_mgr.c:job_time_limit() function you'd have an
> > additional boolean check in each of the usage checks, similar to my
> > previously proposed patch.
> >
> > Any thoughts / comments?
> >
> > Thanks,
> > Paddy
>

--
Paddy Doyle
Trinity Centre for High Performance Computing,
Lloyd Building, Trinity College Dublin, Dublin 2, Ireland.
Phone: +353-1-896-3725
http://www.tchpc.tcd.ie/
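
For reference, the two checks discussed in the quoted message can be sketched roughly as follows. This is only an illustration in Python, not Slurm code: the `Assoc` class, both function names, and the `allow_finish` flag (standing in for the proposed, hypothetical `AccountingStorageEnforceAllowFinish` option) are assumptions made for the example. It also ignores the time already committed to other running jobs in the same association, which is the non-trivial part noted above.

```python
from dataclasses import dataclass


@dataclass
class Assoc:
    """Hypothetical stand-in for an association's accounting state."""
    grp_cpu_mins: float   # GrpCPUMins limit for the association
    used_cpu_mins: float  # CPU-minutes already consumed


def would_exceed_at_launch(assoc, time_limit_mins, alloc_cpus):
    """Launch-time check: requested time * allocated CPUs, added to current usage."""
    requested = time_limit_mins * alloc_cpus
    return assoc.used_cpu_mins + requested > assoc.grp_cpu_mins


def should_kill_running_job(assoc, allow_finish=False):
    """Run-time check with the proposed boolean.

    allow_finish=False keeps the current behaviour (kill once the limit
    is reached); allow_finish=True lets the running job complete.
    """
    over_limit = assoc.used_cpu_mins >= assoc.grp_cpu_mins
    return over_limit and not allow_finish
```

With a limit of 1000 CPU-minutes and 900 already used, a 60-minute job on 2 CPUs (120 CPU-minutes) would be refused at launch, while a 30-minute job on 2 CPUs would still fit.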