Hi William.

It is a large job array, with each task ending up in either the bio or the free64 queue.

I think the issue is that a task that *starts* on the free64 queue picks up its h_rt limit. When it is restarted and lands on the bio queue, which has no time limit, it still gets killed because the original h_rt sticks.
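To make the hypothesis concrete, here is a minimal Python sketch that compares two models. In the "sticky" model, a restarted task keeps the h_rt of the queue it was first dispatched to. In the non-sticky model, it gets the limit of the queue it is running in now. The queue limits come from the qconf output quoted below. The sticky behaviour is only the hypothesis under discussion, not confirmed Grid Engine semantics, and the function names are hypothetical.

```python
# Hypothetical model of the "sticky h_rt" hypothesis; not Grid Engine code.
# h_rt values in seconds, taken from the qconf output in this thread.
# None stands for INFINITY (no limit).
H_RT_SECONDS = {
    "free64": 72 * 3600 + 5 * 60,  # h_rt 72:05:00
    "bio": None,                   # h_rt INFINITY
}


def effective_h_rt(dispatch_history, sticky):
    """Return the hard wall-clock limit (seconds, or None) for a task's current run.

    dispatch_history: queue names in dispatch order; the last entry is the
    queue the task is running in now.
    sticky: True models the hypothesis that the limit picked up at the first
    dispatch survives a restart.
    """
    queue = dispatch_history[0] if sticky else dispatch_history[-1]
    return H_RT_SECONDS[queue]


def is_killed(runtime_seconds, limit):
    """A task is killed once its runtime exceeds a finite limit."""
    return limit is not None and runtime_seconds > limit


four_days = 4 * 24 * 3600
history = ["free64", "bio"]  # started on free64, restarted on bio

print(is_killed(four_days, effective_h_rt(history, sticky=True)))   # True
print(is_killed(four_days, effective_h_rt(history, sticky=False)))  # False
```

Only the sticky model reproduces the reported symptom: a task that ends up on bio is killed shortly after the 3-day mark.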

I'm not sure whether this answers your question.

Joseph


On 05/29/2015 05:12 AM, William Hay wrote:
On Thu, 28 May 2015 19:27:07 +0000
Joseph Farran <[email protected]> wrote:

Hi all.

I am not sure if this is a bug or the way Grid Engine works.

We have several queues that our users submit jobs to. One of the queues,
"free64", has a 3-day wall-clock limit:

$ qconf -sq free64 | grep "_rt"
s_rt                  72:00:00
h_rt                  72:05:00

Another queue, "bio", has no such limit:

$ qconf -sq bio | grep "_rt"
s_rt                  INFINITY
h_rt                  INFINITY
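As a worked check on these limits, the Python sketch below converts the `_rt` lines shown above into seconds. It assumes values are either INFINITY or colon-separated `[[hh:]mm:]ss`, which covers the output in this thread. The helper names are hypothetical, and the parser does not try to cover every time format Grid Engine accepts.

```python
def parse_time(value):
    """Convert an [[hh:]mm:]ss value to seconds; INFINITY becomes None."""
    if value.upper() == "INFINITY":
        return None
    seconds = 0
    for part in value.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def parse_rt_limits(qconf_output):
    """Extract s_rt/h_rt from `qconf -sq <queue> | grep "_rt"` output."""
    limits = {}
    for line in qconf_output.splitlines():
        fields = line.split()
        if len(fields) == 2 and fields[0] in ("s_rt", "h_rt"):
            limits[fields[0]] = parse_time(fields[1])
    return limits


# The two outputs quoted above, pasted verbatim.
FREE64 = """\
s_rt                  72:00:00
h_rt                  72:05:00
"""
BIO = """\
s_rt                  INFINITY
h_rt                  INFINITY
"""

print(parse_rt_limits(FREE64))  # {'s_rt': 259200, 'h_rt': 259500}
print(parse_rt_limits(BIO))     # {'s_rt': None, 'h_rt': None}
```

On free64, the hard limit sits 300 seconds after the soft one. On bio, neither limit is set.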

When a user submits a job to both queues with "-q free64,bio", jobs that
run longer than 3 days are killed whether they land on the "free64" or the
"bio" queue. Why are jobs that land on the "bio" queue being
killed after 3 days?

Are you sure the whole job is in the bio queue? Might a slave task be
in the free64 queue?
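William's question suggests a second explanation. The Python sketch below assumes that a parallel job dies as soon as any of its master or slave tasks exceeds the h_rt of the queue hosting that task. Under that assumption, one slave task on free64 is enough to cap the whole job at 72:05:00, even if the master runs on bio. The function name is hypothetical, and the assumption is a simplification, not documented scheduler behaviour.

```python
# h_rt in seconds per queue, from the qconf output in this thread.
# None stands for INFINITY.
H_RT_SECONDS = {"free64": 72 * 3600 + 5 * 60, "bio": None}


def job_wall_clock_limit(task_queues):
    """Return the tightest finite h_rt across the queues hosting a job's
    master and slave tasks, or None if every queue is unlimited.

    Assumes (hypothetically) that the job ends when any one task hits its limit.
    """
    finite = [H_RT_SECONDS[q] for q in task_queues if H_RT_SECONDS[q] is not None]
    return min(finite) if finite else None


print(job_wall_clock_limit(["bio", "bio"]))     # None: all tasks on bio
print(job_wall_clock_limit(["bio", "free64"]))  # 259500: a slave task on free64
```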

_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
