Hi,

On 12.02.2016 at 20:34, Wagner, Justin wrote:

> I’m running SoGE v8.1.0 and we notice from time to time that qstat doesn’t 
> always report all of the jobs that are currently executing.

Sometimes there is a small gap between the states "qw", "t" and "r". I would 
assume that the transition between states is not atomic.
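
If the polling application has to stay with qstat, one way to tolerate such a 
gap is to treat a job as finished only after it has been missing from several 
consecutive polls. A minimal sketch in Python, assuming each poll has already 
been reduced to the set of job ids that qstat listed; the function name and 
the threshold of three misses are made up for illustration:

```python
from typing import Iterable, Set


def confirmed_finished(job_id: str,
                       snapshots: Iterable[Set[str]],
                       misses_required: int = 3) -> bool:
    """Return True once job_id is absent from `misses_required` polls in a row.

    `snapshots` yields, per poll, the set of job ids that were listed.
    A single missing listing (e.g. during a state change) is ignored.
    """
    misses = 0
    for listed in snapshots:
        if job_id in listed:
            misses = 0  # the job showed up again, so the earlier miss was a gap
        else:
            misses += 1
            if misses >= misses_required:
                return True
    return False
```

The threshold trades detection latency against robustness: with a 10 second 
poll interval and three required misses, completion is reported roughly 30 
seconds late.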


> Is this a known issue with qstat? Is there a fix for it in newer versions 
> of the code?
>  
> Here is the context:
>  
> This causes a problem to systems we have that poll qstat to determine if jobs 
> have completed or not.  In fact I’ve never noticed the problem when running 
> qstat myself, but the problem seems to only present itself to applications 
> that are periodically polling qstat.

Would "-hold_jid <wc_job_list>" help? It can also include job names and wildcards.
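
In case the follow-up work is itself a job, the dependency can be handed to 
the scheduler at submission time instead of polling. A small sketch in Python 
that only builds the qsub argument list; the helper name, the script name and 
the job names in the usage line are hypothetical:

```python
from typing import List, Sequence


def qsub_with_hold(script: str, wait_for: Sequence[str]) -> List[str]:
    """Build a qsub command that holds `script` until the given jobs are done.

    `wait_for` may contain job ids, job names or wildcard patterns; they are
    joined into the comma-separated list that -hold_jid expects.
    """
    if not wait_for:
        raise ValueError("need at least one job id, name or pattern to wait for")
    return ["qsub", "-hold_jid", ",".join(wait_for), script]
```

Something like subprocess.run(qsub_with_hold("post.sh", ["1234", "prep_*"])) 
would then submit it. Passing the arguments as a list keeps the local shell 
from expanding the wildcard.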


>  I know there is a workaround where you can query qacct via “qacct -j 
> job_number” to see if the job is done, so I don’t necessarily need any 
> suggested workarounds.

It could also mean that the job got rescheduled (in this case you end up with 
several entries in the accounting file for one and the same job), not to 
mention parallel jobs where you can get a bunch of entries (depending on the PE 
setting "accounting_summary").
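
So a script that checks "qacct -j" should expect more than one record per job. 
A rough sketch in Python for splitting the output into records, assuming the 
records are separated by a line of "=" characters and consist of "key value" 
lines; please check this against the output of your installation, the function 
name is made up:

```python
from typing import Dict, List


def split_qacct_records(text: str) -> List[Dict[str, str]]:
    """Split `qacct -j` style text into one dict per accounting record.

    Assumption: a line starting with '=====' begins a new record and every
    other non-empty line is a key followed by whitespace and a value.
    """
    records: List[Dict[str, str]] = []
    current: Dict[str, str] = {}
    for line in text.splitlines():
        if line.startswith("====="):
            if current:
                records.append(current)
            current = {}
            continue
        parts = line.split(None, 1)
        if len(parts) == 2:
            current[parts[0]] = parts[1].strip()
    if current:
        records.append(current)
    return records
```

If this returns more than one record for a job, a single matching entry must 
not be taken as proof that the job has finished for good.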

What's the goal behind it? Do you want to start another job or start something 
external?

Besides looking into DRMAA, there is also a small tool `qevent` to trigger an 
external script in case a job/task finishes.

- Reuti


> I’m simply asking if this is a known issue with qstat, and if there is a fix 
> for it in newer versions of the code.
>  
> Thanks,
>  
> Justin
> _______________________________________________
> users mailing list
> users@gridengine.org
> https://gridengine.org/mailman/listinfo/users

