Miguel,

While you make a good argument for basing the job size factor on the number of active nodes, this could confuse users. You may get users who wonder why their priority is too low and look to you to explain how it was calculated. It is easier to say that the job size component is the number of nodes their job requested divided by the (fixed) number of nodes in the cluster.
As the number of active nodes varies over time, a 10-node job could receive a job size factor of 10/68, and an instant later the two bad nodes could come back up. If a second 10-node job were then submitted, it would receive a job size factor of 10/70, and its user could complain that this is not fair.

Don

> -----Original Message-----
> From: Moe Jette [mailto:[email protected]]
> Sent: Wednesday, August 22, 2012 7:59 AM
> To: slurm-dev
> Subject: [slurm-dev] Re: Issue with Job Size Factor of Multifactor plugin
>
> Quoting Miguel Méndez <[email protected]>:
>
> > Hi,
> >
> > Job Size Factor in Multifactor Priority Plugin gets its value considering
> > relative job size, and this size is relative to "node_record_count". The
> > problems I see with this are two:
> >
> > - "node_record_count" includes my login node, which is never going to be
> > used to run jobs. I would solve this by just subtracting one from this value.
>
> Only compute nodes are needed in your node list. The login node would
> generally not be included.
>
> >
> > - "node_record_count" includes all existing nodes in the cluster, no
> > matter whether they are down. I think Job Size priority should be relative to
> > the maximum size of a job that could be run if there were no other jobs
> > running in the cluster. So if I have a 70 node cluster, with 2 nodes down,
> > and a 10 node job, priority for this job should be 10/68, not 10/70.
> >
> > What would be the easiest way of getting the number of allocated or idle
> > nodes? I have been through slurmctld and sinfo code, but I understand they
> > use loops for this, and I would prefer not having to do this every time I
> > recalculate priorities.
>
> bit_set_count(avail_node_bitmap) will give you the count of nodes up
> and available very quickly in the slurmctld daemon.
>
> >
> > Thanks,
> >
> > Miguel
