Howdy.
I am using my own load formula "cores_in_use" with the following scheduler
settings:
# qconf -ssconf
algorithm default
schedule_interval 0:0:15
maxujobs 0
queue_sort_method seqno
job_load_adjustments cores_in_use=1
load_adjustment_decay_time 00:02:00
load_formula -cores_in_use
schedd_job_info true
flush_submit_sec 5
flush_finish_sec 5
# qconf -sconf | fgrep load_report_time
load_report_time 00:00:20
My complex for "cores_in_use" is set to:
$ qconf -sc | grep cores_in_use
cores_in_use cu INT <= YES YES 0 1000
The "cores_in_use" value is populated by a script that counts all cores in use (serial
and parallel) on the node. I do this to pack jobs onto one node before moving on to the next node.
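For readers unfamiliar with how such a script feeds a value to Grid Engine, here is a minimal sketch. It assumes the script runs as a load sensor speaking the usual begin/end protocol on stdin and stdout. The function names and the body of count_cores_in_use are hypothetical placeholders, not the actual script used here.

```shell
#!/bin/sh
# Sketch of a load sensor that reports cores_in_use.
# Assumption: the real script runs as a Grid Engine load sensor.

# Hypothetical placeholder: the real script would count the cores used by
# serial and parallel jobs on this node. Here it always prints 0.
count_cores_in_use() {
    echo 0
}

# Print one load report for the given host in host:name:value form,
# wrapped in the begin/end markers.
report_load() {
    echo "begin"
    echo "$1:cores_in_use:$(count_cores_in_use)"
    echo "end"
}

# Main loop: each line read from stdin triggers one report,
# and the line "quit" ends the sensor.
run_sensor() {
    host=$(uname -n)
    while read -r line; do
        if [ "$line" = "quit" ]; then
            break
        fi
        report_load "$host"
    done
}

# Only start the loop when asked to, so the functions can be sourced and tested.
if [ "${RUN_SENSOR:-0}" = "1" ]; then
    run_sensor
fi
```

Running the file with RUN_SENSOR=1 set starts the loop. A report is only produced when the execution daemon asks for one, so the value the scheduler sees can lag behind reality by up to load_report_time plus the scheduling interval.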
When I submit a job and wait long enough (about 30-40 seconds) for the nodes to report
back their "cores_in_use" before submitting the next one, everything works: jobs are packed correctly.
If I submit a job and wait only a few seconds (say 8), the nodes have not had enough time to
report back their "cores_in_use" and my jobs don't pack. That is expected, of course, since
cores_in_use has not yet been updated.
This is where the GE scheduler options "job_load_adjustments" and
"load_adjustment_decay_time" come in, to artificially correct this. I have the scheduler set with:
job_load_adjustments cores_in_use=1
load_adjustment_decay_time 00:02:00
which, from my understanding, should artificially set "cores_in_use" to 1 immediately and keep
it there for the next 2 minutes, so that if I submit another job a few seconds later,
"cores_in_use" is already set to 1 core.
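For reference, sched_conf(5) describes the correction as decaying linearly from its full value at dispatch to 0 after load_adjustment_decay_time, rather than holding constant. The sketch below only illustrates that arithmetic, assuming a single dispatched job. The function name adjusted_load is hypothetical, and how the scheduler treats a fractional correction on an INT complex is not covered here.

```shell
#!/bin/sh
# adjusted_load ADJ DECAY_SECONDS ELAPSED_SECONDS
# Prints the remaining load correction under a linear decay.
# DECAY_SECONDS must be greater than 0.
adjusted_load() {
    awk -v adj="$1" -v decay="$2" -v t="$3" 'BEGIN {
        remaining = 1 - t / decay          # fraction of the correction left
        if (remaining < 0) remaining = 0   # fully decayed after the decay time
        printf "%.3f\n", adj * remaining
    }'
}

# cores_in_use=1 with a decay time of 00:02:00 (120 seconds)
for t in 0 8 60 120; do
    printf '%3ss after dispatch: +%s\n' "$t" "$(adjusted_load 1 120 "$t")"
done
```

Under this assumption, a second job submitted 8 seconds after the first would see a correction of about 0.933 rather than a full 1.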
But this is not working. I don't know the syntax well enough to tell whether I am
doing this correctly. Do I have my complex set up wrong? Is anything else
missing?
A note: I do know that a submitted parallel job will count as only 1 core under
"cores_in_use=1" until my script corrects the cores_in_use count, and that is OK.
Thanks,
Joseph