[gridengine users] Using multiple queues inherits s_rt & h_rt

Joseph Farran Thu, 28 May 2015 12:30:44 -0700

Hi all.

I am not sure if this is a bug or the way Grid Engine works.

We have several queues our users submit jobs to. One of the queues, "free64", has a 3-day wall-clock limit:


$ qconf -sq free64 | grep "_rt"
s_rt                  72:00:00
h_rt                  72:05:00

Another queue, "bio", does not:

$ qconf -sq bio | grep "_rt"
s_rt                  INFINITY
h_rt                  INFINITY
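
To compare the two queues' limits side by side, a small helper along these lines may be useful. This is only a sketch: `rt_to_seconds` and `show_rt_limits` are hypothetical names, not Grid Engine commands, and the parsing assumes `qconf -sq` prints limits as HH:MM:SS or INFINITY, as in the output above.

```shell
#!/bin/sh
# Convert a Grid Engine time limit (HH:MM:SS or INFINITY) to seconds.
# rt_to_seconds is a hypothetical helper, not part of Grid Engine.
rt_to_seconds() {
    case "$1" in
        INFINITY|infinity) echo "INFINITY" ;;
        *:*:*)
            # Split HH:MM:SS on colons and add up the seconds.
            echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
            ;;
        *) echo "$1" ;;  # assumed to already be a plain number of seconds
    esac
}

# Print s_rt and h_rt in seconds for each queue named on the command line.
# Requires qconf on the PATH; show_rt_limits is also a hypothetical name.
show_rt_limits() {
    for q in "$@"; do
        qconf -sq "$q" |
        awk '$1 == "s_rt" || $1 == "h_rt" { print $1, $2 }' |
        while read -r name value; do
            echo "$q $name $(rt_to_seconds "$value")"
        done
    done
}
```

Calling `show_rt_limits free64 bio` would list one line per queue and limit. With the values shown above, the free64 limits work out to 259200 and 259500 seconds, which makes them easy to compare against a killed job's wall-clock time.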

When a user submits a job to both queues with "-q free64,bio", jobs that run longer than 3 days are killed whether they land on the "free64" or the "bio" queue. Why are jobs that land on the "bio" queue being killed after 3 days?


The jobs are also using GE checkpoint restart:

$ qconf -sckpt restart
ckpt_name          restart
interface          USERDEFINED
ckpt_command       NONE
migr_command       NONE
restart_command    NONE
clean_command      none
ckpt_dir           $SGE_O_WORKDIR
signal             usr1
when               xsr

Is checkpoint restart the cause of this? My guess is that a job that first landed on the free64 queue picked up the 3-day wall-clock limit, and when it was restarted on the bio queue it inherited that 3-day limit from free64. If that is what is happening, is it a bug? Is there a workaround?
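
One way to test that guess might be to look at the accounting records for a killed job and see which queue each run landed on and how long it ran. The sketch below filters `qacct -j <job_id>` output down to a few fields. `summarize_runs` is a hypothetical helper, and it assumes that each accounting record starts with a line of `=` characters and that a restarted job leaves one record per run.

```shell
#!/bin/sh
# Filter `qacct -j <job_id>` output down to the fields relevant here.
# summarize_runs is a hypothetical helper; it reads accounting text on stdin.
summarize_runs() {
    awk '
        BEGIN { run = 0 }
        /^=+$/ { run++; next }   # separator line starts a new accounting record
        $1 == "qname" || $1 == "hostname" || $1 == "start_time" ||
        $1 == "end_time" || $1 == "ru_wallclock" || $1 == "failed" ||
        $1 == "exit_status" {
            name = $1
            $1 = ""              # drop the field name, keep the value
            sub(/^ +/, "")
            print "run " run ": " name " = " $0
        }
    '
}
```

It would be used as `qacct -j <job_id> | summarize_runs`. If a job's first run shows qname free64 and a later run shows qname bio with a wall-clock time close to the free64 limit, that would be consistent with the limit being carried over.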


Joseph
_______________________________________________
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users
