Re: [slurm-users] sacct end time for failed jobs

2019-03-05 Thread Chris Samuel
On Tuesday, 5 March 2019 10:07:30 AM PST Brian Andrus wrote:

> Does anyone have a process they use to handle empty (aka "Unknown") end
> times for jobs that are not running?

What does:

sacctmgr list runawayjobs

say?

--
Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA
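For anyone wanting to see which jobs are affected before cleaning anything up, here is a rough sketch (not an official tool) that picks out jobs whose end time is "Unknown" even though they are not in an active state. It assumes pipe-delimited text with a header row, resembling what `sacct --parsable2 --format=JobID,State,End` prints. The function name, the sample job records, and the choice of which states count as "active" are all hypothetical and for illustration only.

```python
# States treated as "still active" for this sketch; an illustrative
# assumption, not an exhaustive list of Slurm job states.
ACTIVE_STATES = {"RUNNING", "PENDING", "SUSPENDED"}


def jobs_missing_end_time(sacct_text):
    """Return JobIDs whose End is 'Unknown' but whose State is not active.

    Expects pipe-delimited text with a header line that contains at
    least the columns JobID, State and End.
    """
    lines = [ln for ln in sacct_text.splitlines() if ln.strip()]
    if not lines:
        return []
    # Map column name -> position using the header row.
    index = {name: i for i, name in enumerate(lines[0].split("|"))}
    suspects = []
    for line in lines[1:]:
        fields = line.split("|")
        # Keep only the first word of the state, e.g. "CANCELLED by 1000".
        state_words = fields[index["State"]].split()
        state = state_words[0] if state_words else ""
        if fields[index["End"]] == "Unknown" and state not in ACTIVE_STATES:
            suspects.append(fields[index["JobID"]])
    return suspects


# Hypothetical sample data, made up for this example.
SAMPLE = """JobID|State|End
1001|COMPLETED|2019-03-04T12:00:00
1002|FAILED|Unknown
1003|RUNNING|Unknown
1004|NODE_FAIL|Unknown
"""

if __name__ == "__main__":
    print(jobs_missing_end_time(SAMPLE))
```

This only reports candidates; it does not change anything in the accounting database.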

Re: [slurm-users] How to enable QOS correctly?

2019-03-05 Thread Chris Samuel
On Tuesday, 5 March 2019 10:09:42 AM PST Matthew BETTINGER wrote:

> Everyone's qos is 'normal' so,
> sacctmgr show associations format=account,user,qos

That's listing the QOS they have access to, but you should check that their defaultqos is also set to "normal". Though if it wasn't then I

Re: [slurm-users] How to enable QOS correctly?

2019-03-05 Thread Matthew BETTINGER
Hey there

Everyone's qos is 'normal' so,

sacctmgr show associations format=account,user,qos

Account  User      QOS
-------  --------  ------
dgstaff  j0458951  normal
ep       l0525429  normal
...

We tried t

Re: [slurm-users] sacct end time for failed jobs

2019-03-05 Thread Brian Andrus
Hmm. I have it as an issue as well as several jobs that are in the db without an end time, even though they are not running. Not sure how that happened, but I do want to find a good way to clean it up. Without an end time, sacct reports the jobs as if they continue to run and the total elapsed tim

Re: [slurm-users] How to enable QOS correctly?

2019-03-05 Thread Matthew BETTINGER
So here is a default partition PartitionName=BDW AllowGroups=ALL AllowAccounts=ALL AllowQos=ALL AllocNodes=ALL Default=YES QoS=N/A DefaultTime=01:00:00 DisableRootJobs=NO ExclusiveUser=NO GraceTime=0 Hidden=NO MaxNodes=UNLIMITED MaxTime=1-00:00:00 MinNodes=1 LLN=NO MaxCPUsPerNode=UNL

Re: [slurm-users] Slurm message aggregation

2019-03-05 Thread Christopher Samuel
On 3/5/19 6:58 AM, Paul Edmon wrote: We tried it once back when they first introduced it and shelved it after we found that we didn't really need it. Thanks Paul. -- Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA

Re: [slurm-users] How to enable QOS correctly?

2019-03-05 Thread Christopher Samuel
On 3/5/19 7:37 AM, Matthew BETTINGER wrote: Every time we attempt this no one can submit a job, slurm says waiting on resources I believe. We have accounting enabled and everyone is a member of the default qos group "normal". Is it also their default QOS? Do you still have the slurmctld l

Re: [slurm-users] How to enable QOS correctly?

2019-03-05 Thread Michael Gutteridge
Hi It might be useful to see the configuration of the partition and how the QOS is set up... but at first blush I suspect you may need to set OverPartQOS (https://slurm.schedmd.com/resource_limits.html) to get the QOS limit to take precedence over the limit in the partition. However, the "reason"

[slurm-users] How to enable QOS correctly?

2019-03-05 Thread Matthew BETTINGER
Hey slurm gurus. We have been trying to enable slurm QOS on a cray system here off and on for quite a while but can never get it working. Every time we try to enable QOS we disrupt the cluster and users and have to fall back. I'm not sure what we are doing wrong. We run a pretty open system

Re: [slurm-users] Slurm message aggregation

2019-03-05 Thread Paul Edmon
We tried it once back when they first introduced it and shelved it after we found that we didn't really need it. -Paul Edmon- On 3/4/19 2:26 PM, Christopher Samuel wrote: Hi folks, Anyone here tried Slurm's message aggregation (MsgAggregationParams in slurm.conf) at all? All the best, Chri