Bill,

I may be wrong (corrections welcomed), but I'm pretty sure you'll have to use a database query. My understanding is that the decayed usage is stored as a single usage_raw value per association (https://github.com/SchedMD/slurm/blob/f8025c1484838ecbe3e690fa565452d990123361/src/plugins/priority/multifactor/priority_multifactor.c#L1119). There is no history of any kind.

You would have to write a fairly complex query to get an accurate representation, or write some code that recreates the way Slurm does it. If you look at _apply_decay() and _apply_new_usage() in src/plugins/priority/multifactor/priority_multifactor.c, you can see everything that happens. Basically, once per decay thread iteration, each association's usage_raw and the job's cputime for that time period are calculated and decayed accordingly. This can happen many, many times over the length of a job. If a job terminates before reaching its timelimit, the remaining allocated cputime is immediately added all at once (https://github.com/SchedMD/slurm/blob/f8025c1484838ecbe3e690fa565452d990123361/src/plugins/priority/multifactor/priority_multifactor.c#L1036).
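To give a rough idea of what "recreating" this from job records might look like, here is a minimal Python sketch. It is not Slurm's code. The function name decayed_usage, the (start, end, cpus) job tuples, and the step parameter are all made up for illustration, and it assumes a plain half-life exponential decay applied once per step, with new usage added undecayed after each decay. Slurm's exact handling of new usage and of early-terminating jobs may differ, so treat the numbers as approximate.

```python
def decayed_usage(jobs, now, half_life, step=300):
    """Approximate one association's decayed usage at time `now`.

    jobs      -- list of (start, end, cpus) tuples, times in epoch seconds
                 (hypothetical input format, e.g. pulled from accounting data)
    half_life -- decay half-life in seconds (assumed exponential decay)
    step      -- length of one simulated decay iteration in seconds
    """
    if not jobs:
        return 0.0

    t = min(start for start, _end, _cpus in jobs)
    usage = 0.0
    while t < now:
        t_next = min(t + step, now)
        dt = t_next - t

        # Decay everything accumulated so far by the elapsed time.
        usage *= 0.5 ** (dt / half_life)

        # Add the cpu-seconds each job consumed during this period.
        for start, end, cpus in jobs:
            overlap = min(end, t_next) - max(start, t)
            if overlap > 0:
                usage += overlap * cpus

        t = t_next
    return usage


if __name__ == "__main__":
    # One 1-CPU job that ran for 100 seconds, half-life of 100 seconds.
    jobs = [(0, 100, 1)]
    print(decayed_usage(jobs, now=100, half_life=100, step=100))
    print(decayed_usage(jobs, now=200, half_life=100, step=100))
```

Because the decay is applied per step, the result depends somewhat on the step size, which is part of why a single database query has trouble matching what sshare reports.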

Those are some of the issues that you may run into while creating a database tool for this.

I could be mistaken on some of the details but that is my understanding of the code (we looked recently for an unrelated reason).

Ryan

On 07/14/2014 02:15 PM, Bill Wichser wrote:

Is there any way to get a better view of fairshare than the "sshare" command?

Under PBS, there was the diagnose -f command, which showed the breakdown for each set time period that went into calculating this value. What was nice about this was that I could point a group to this command, or cut and paste its output, showing that you have been using 20% over the last 30 days even though you haven't run anything in the last three days.

It's a much more difficult problem when asked now. I have no tool which shows the value, and its decay, over time. So I'm wondering if anyone has a method to demonstrate that, yes, this fairshare value is correct and here is why. Or do I just need to figure out a database query to cull this information?

Thanks,
Bill

--
Ryan Cox
Operations Director
Fulton Supercomputing Lab
Brigham Young University
