I'm running slurm on a Cray XE6m, and a relatively small one at that...
with fair-share enabled. Today we had an episode where a single user was
effectively snagging the entire machine with small, short (wallclock) jobs
while all other users had to wait.

He's using job arrays (--array=1-21), requesting 4 nodes per array member,
generally with a 1-hr wall clock limit, and occupying a significant portion
of the machine.

I've already restricted him (sacctmgr) to maxjobs=16, maxsubmit=32. If I
understand the parameters correctly, that should reduce the number of jobs,
and thus, nodes, he's consuming at a given time.
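
To put rough numbers on that, here is a minimal sketch of the worst case
under the job cap. It assumes every running job holds the 4 nodes mentioned
above; the sacctmgr line in the comment is roughly how such limits are set,
with "someuser" as a placeholder rather than the real account name.

```shell
#!/bin/bash
# Worst-case node footprint for one user under a running-job cap.
# Assumption: each running job holds the 4 nodes described above.
max_jobs=16        # per-user MaxJobs limit
nodes_per_job=4    # nodes requested by each array member
peak_nodes=$((max_jobs * nodes_per_job))
echo "peak nodes held by one user: ${peak_nodes}"

# Limits of this kind are typically set with something like
# (someuser is a placeholder):
#   sacctmgr modify user where name=someuser set MaxJobs=16 MaxSubmitJobs=32
```

So the cap on running jobs only bounds his footprint at maxjobs times the
per-job node count, which on a small machine may still be a large share.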

I'm considering telling him to restrict the number of simultaneous array
members to n=4, and reducing his maxnodes to 1, which will result in a
slower run, but should still be manageable.
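
As a sketch of what that batch script header might look like: this assumes
the installed slurm release accepts the "%" throttle in --array (I have not
verified that for this system), and the echo line is only a stand-in for the
real workload.

```shell
#!/bin/bash
#SBATCH --array=1-21%4      # 21 array members, at most 4 running at once
#SBATCH --nodes=1           # one node per member instead of four
#SBATCH --time=01:00:00     # 1-hr wall clock limit

# Peak footprint under these settings: concurrent members x nodes each.
throttle=4
nodes_per_member=1
peak_nodes=$((throttle * nodes_per_member))

# SLURM_ARRAY_TASK_ID is set by slurm for each array member; the
# fallback lets the script run outside a job for a quick check.
echo "array member ${SLURM_ARRAY_TASK_ID:-none}; peak nodes: ${peak_nodes}"
```

With both settings in place, the array would hold at most 4 nodes at a time
instead of 4 nodes for every running member.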

Are there other options/parameters I'm missing here, in configuring slurm
for fair-share? It doesn't seem like it was a very fair distribution of
resources...

Thanks
gerry
-- 
Gerry Creager
NSSL/CIMMS
405.325.6371
++++++++++++++++++++++
“Big whorls have little whorls,
That feed on their velocity;
And little whorls have lesser whorls,
And so on to viscosity.”
Lewis Fry Richardson (1881-1953)
