I'm running Slurm on a Cray XE6m (a relatively small one at that) with fair-share enabled. Today we had an episode where a single user was effectively monopolizing the entire machine with small, short-wallclock jobs while all other users had to wait.
He's using job arrays (--array=1-21), requesting 4 nodes and generally a 1-hour wallclock limit, and occupying a significant portion of the machine. I've already restricted him (via sacctmgr) to maxjobs=16 and maxsubmit=32. If I understand the parameters correctly, that should reduce the number of jobs, and thus nodes, he's consuming at any given time.

I'm considering telling him to restrict the number of simultaneous array members to n=4, and reducing his maxnodes to 1, which will result in a slower run but should still be manageable.

Are there other options/parameters I'm missing here in configuring Slurm for fair-share? It doesn't seem like it was a very fair distribution of resources...

Thanks
gerry
--
Gerry Creager
NSSL/CIMMS
405.325.6371
++++++++++++++++++++++
“Big whorls have little whorls,
That feed on their velocity;
And little whorls have lesser whorls,
And so on to viscosity.”
Lewis Fry Richardson (1881-1953)
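P.S. To sanity-check my reasoning about the limits, here is a rough back-of-the-envelope sketch of the peak node count for one of his arrays. It assumes (and I may be wrong on this) that each array member counts as one running job against maxjobs and that the 4-node request applies to each member. The function and parameter names are just mine for illustration, not Slurm settings. As I understand it, the n=4 throttle would be written as --array=1-21%4 at submit time.

```python
# Rough estimate of the peak nodes a single job array can hold at once.
# Assumptions (not confirmed): every array member counts as one job
# against maxjobs, and each member requests `nodes_per_task` nodes.

def peak_nodes(array_tasks, nodes_per_task, max_running=None, array_throttle=None):
    """Return the largest node count the array could occupy at one time."""
    running = array_tasks
    if max_running is not None:       # per-user cap on running jobs (maxjobs)
        running = min(running, max_running)
    if array_throttle is not None:    # cap on simultaneous array members (%N)
        running = min(running, array_throttle)
    return running * nodes_per_task


scenarios = {
    "no limits": peak_nodes(21, 4),
    "maxjobs=16": peak_nodes(21, 4, max_running=16),
    "maxjobs=16, throttle 4": peak_nodes(21, 4, max_running=16, array_throttle=4),
    "maxjobs=16, throttle 4, 1 node": peak_nodes(21, 1, max_running=16, array_throttle=4),
}

for name, nodes in scenarios.items():
    print(f"{name}: {nodes} nodes")
```

If those assumptions hold, a single array goes from a possible 84 nodes with no limits, to 64 with maxjobs=16, to 16 with the throttle, and to 4 with the throttle plus one node per member. This only covers one array; several arrays submitted together would still share the same maxjobs cap.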
