Thanks!

That looks like something that could be useful indeed. We are stuck with version 2.2.7 for the moment, though, and if I understood the docs correctly, most of the partition-based parameters were introduced in later versions. We might upgrade at some point in the future, though.

It also seems like the scheduling problems are not directly related to the setup of the partitions, but rather to some odd behaviour on the part of SLURM. I'll write a separate post about that, as it seems it might be a bug.

Cheers, and thanks again,
    Mikael J.
    http://www.iki.fi/~mpjohans/


On Mon, 20 Oct 2014, Paul Edmon wrote:

I advise using the following SchedulerParameters: partition_job_depth and bf_max_job_part. This will force the scheduler to schedule jobs for each partition. Otherwise it will take a strictly top-down approach.


This is what we run:

#  default_queue_depth should be some multiple of the partition_job_depth,
#  ideally number_of_partitions * partition_job_depth.
SchedulerParameters=default_queue_depth=5700,partition_job_depth=100,bf_interval=1,bf_continue,bf_window=2880,bf_resolution=3600,bf_max_job_test=50000,bf_max_job_part=50000,bf_max_job_user=1,bf_max_job_start=100,max_rpc_cnt=8

These parameters work well for a cluster of 50,000 cores, 57 queues, and about 40,000 jobs per day. We are running 14.03.8.
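
As a quick check of the sizing rule in the config comment above (default_queue_depth = number_of_partitions * partition_job_depth), here is a minimal sketch in Python. The helper name scheduler_parameters and its arguments are made up for illustration and are not part of SLURM; the script only does the arithmetic and joins the options into one line.

```python
def scheduler_parameters(num_partitions, partition_job_depth, extra=()):
    """Build a SchedulerParameters line using the sizing rule from the comment.

    Hypothetical helper: it only formats text, it does not talk to SLURM.
    """
    # Rule of thumb from the post: one partition_job_depth "slot" per partition.
    default_queue_depth = num_partitions * partition_job_depth
    params = [
        f"default_queue_depth={default_queue_depth}",
        f"partition_job_depth={partition_job_depth}",
        *extra,  # any further options, passed through unchanged
    ]
    return "SchedulerParameters=" + ",".join(params)


# 57 queues and a per-partition depth of 100, as in the configuration above.
line = scheduler_parameters(57, 100, extra=["bf_continue", "bf_max_job_part=50000"])
print(line)
```

With 57 partitions and partition_job_depth=100 this gives default_queue_depth=5700, which matches the value in the line above. For a different site, only the two numbers would need to change; whether the rule of thumb suits another workload is something to verify locally.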

-Paul Edmon-
