Howdy.

I am using my own load formula "cores_in_use" with the following scheduler 
settings:

# qconf -ssconf
algorithm                         default
schedule_interval                 0:0:15
maxujobs                          0
queue_sort_method                 seqno
job_load_adjustments              cores_in_use=1
load_adjustment_decay_time        00:02:00
load_formula                      -cores_in_use
schedd_job_info                   true
flush_submit_sec                  5
flush_finish_sec                  5

# qconf -sconf  |  fgrep load_report_time
load_report_time             00:00:20

My complex for "cores_in_use" is set to:
$ qconf -sc | grep cores_in_use
cores_in_use        cu         INT       <=    YES YES        0        1000

The "cores_in_use" value is populated by a script that counts all cores in use (serial 
and parallel) on the node. I do this to pack jobs onto one node before moving on to the next.
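
For readers who have not written one, a load sensor of this kind can be sketched roughly as below. This is a minimal sketch and not the actual script from this setup: list_local_jobs is a hypothetical, site-specific placeholder, and the node name is taken from `uname -n` on the assumption that it matches the host name Grid Engine uses. The begin / host:name:value / end exchange and the "quit" request follow the usual load sensor convention.

```shell
#!/bin/sh
# Sketch of a load sensor for the custom "cores_in_use" value.

# HYPOTHETICAL placeholder: print one "job_id slots" line per job running
# on this host. How that list is obtained is site-specific.
list_local_jobs() {
    :
}

# Sum the second column (slots) of the lines read from stdin.
sum_slots() {
    awk '{ total += $2 } END { print total + 0 }'
}

# Print one load report in the begin / host:name:value / end format.
emit_report() {
    printf 'begin\n%s:cores_in_use:%s\nend\n' "$1" "$2"
}

# Answer each request line from the execution daemon until it sends "quit".
sensor_loop() {
    host=$(uname -n)   # assumption: matches the host name known to Grid Engine
    while read -r request; do
        [ "$request" = "quit" ] && return 0
        emit_report "$host" "$(list_local_jobs | sum_slots)"
    done
}

# When installed as a load sensor, the script would end with a call to: sensor_loop
```

With the placeholder left empty, every report carries the value 0; the count only becomes meaningful once list_local_jobs is filled in for the site.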

When I submit a job and wait long enough (about 30-40 seconds) for the nodes to report 
back their "cores_in_use", everything works great: jobs are packed correctly.

If I submit a job and wait only a few seconds (say 8 seconds), the nodes have not had enough time to report back their "cores_in_use", and my jobs don't pack (which is expected, of course, since cores_in_use has not yet been updated).
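
The 30-40 second figure is about what the settings above would suggest under a simple model. Assume (this is an assumption, not something taken from the Grid Engine documentation) that a fresh value can take up to one load_report_time to reach the qmaster and up to one schedule_interval more before the scheduler acts on it. The arithmetic is then:

```shell
# Convert an h:m:s value such as 0:0:15 or 00:00:20 to seconds.
to_seconds() {
    echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

load_report_time=$(to_seconds 00:00:20)    # from qconf -sconf
schedule_interval=$(to_seconds 0:0:15)     # from qconf -ssconf

# Assumed worst case: one full report period plus one full scheduling interval.
echo "worst-case staleness: $((load_report_time + schedule_interval)) seconds"
# prints: worst-case staleness: 35 seconds
```

An 8 second gap between submissions falls well inside that window.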

So now enter the GE scheduler options "job_load_adjustments" and 
"load_adjustment_decay_time" to artificially correct this. I have the scheduler set with:

    job_load_adjustments                 cores_in_use=1
    load_adjustment_decay_time    00:02:00

which, from my understanding, should artificially set "cores_in_use" to 1 immediately and for the next 2 minutes, so that if I submit another job a few seconds later, "cores_in_use" is already set to 1 core.
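
For comparison, sched_conf(5) describes the correction as being added to the host's value when a job is dispatched and then decayed linearly to 0 over load_adjustment_decay_time, rather than held constant. A minimal sketch of that model follows; the function name adjusted_load and its argument order are made up for illustration, and it says nothing about how the scheduler treats fractional values for an INT complex.

```shell
# adjusted_load REPORTED ADJUSTMENT ELAPSED_SECONDS DECAY_SECONDS
# Linear-decay model: the full adjustment applies at dispatch time and
# shrinks to 0 once DECAY_SECONDS have passed.
adjusted_load() {
    awk -v r="$1" -v a="$2" -v t="$3" -v d="$4" 'BEGIN {
        remaining = (t < d) ? a * (1 - t / d) : 0
        printf "%.2f\n", r + remaining
    }'
}

adjusted_load 0 1 0 120     # right after dispatch: 1.00
adjusted_load 0 1 60 120    # halfway through the decay time: 0.50
adjusted_load 0 1 120 120   # decay time elapsed: 0.00
```

Under this model a second job submitted a few seconds after the first would see a value just under 1 on the first node, assuming the adjustment is applied to "cores_in_use" at all.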

But this is not working. I don't know the syntax well enough to know whether I am 
doing this correctly. Do I have my complex set up wrong? Is anything else 
missing?

A note: I do know that a submitted parallel job will only count as 1 under 
"cores_in_use=1" until my script corrects the cores_in_use count, and this is OK.

Thanks,
Joseph

_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
