Hi,

> On 11.02.2015 at 19:28, Michael Stauffer <[email protected]> wrote:
> 
> Hi,
> 
> Is there a way to easily query if a job is idle or otherwise stuck even 
> though the queue state says it's running? I've seen some old jobs that are 
> listed as running in the queue, but upon investigation on their compute node 
> there is no CPU activity associated with the processes, and there are no 
> error messages in the output files.

You can check the consumed CPU time by looking at the "usage" line in the 
`qstat -j <job_id>` output.
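
As a rough sketch of how that check could be scripted: sample the "usage" line 
twice and see whether the cpu value has advanced. The function names and the 
60-second interval are made up for illustration, and the parser assumes the 
cpu field looks like `cpu=[days:]hh:mm:ss`, which may differ between Grid 
Engine versions. It only reads the first "usage" line, so array or parallel 
jobs would need more care.

```python
import re
import subprocess
import time


def parse_cpu_seconds(qstat_output):
    """Return the CPU seconds from the first 'usage' line, or None if absent."""
    for line in qstat_output.splitlines():
        if not line.startswith("usage"):
            continue
        match = re.search(r"\bcpu=([0-9:]+)", line)
        if match:
            # Assumed format: [days:]hours:minutes:seconds
            fields = [int(f) for f in match.group(1).split(":")]
            multipliers = [1, 60, 3600, 86400]
            return sum(v * m for v, m in zip(reversed(fields), multipliers))
    return None


def cpu_advanced(job_id, interval=60):
    """Sample `qstat -j <job_id>` twice; True if CPU time grew, None if unknown."""
    def sample():
        result = subprocess.run(
            ["qstat", "-j", str(job_id)],
            capture_output=True, text=True, check=True,
        )
        return parse_cpu_seconds(result.stdout)

    first = sample()
    time.sleep(interval)
    second = sample()
    if first is None or second is None:
        return None
    return second > first
```

The reported usage is only refreshed periodically, so a very short interval 
would make every job look idle. A result of False only says that no CPU time 
was accumulated during the interval, not why.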

Any logic that reliably indicates whether a job is stuck in an infinite loop 
or still computing won't be easy to implement and will most likely depend on 
the particular application, for example on whether there are output or 
scratch files which can be checked too. But even then, the same output may be 
written to them repeatedly.
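
For the file-based check, a minimal sketch could look at how long ago any of 
a job's output or scratch files was last modified. Which files to watch and 
what age counts as "stale" are application-specific assumptions, and the 
helper name is made up. As said above, a recent write does not prove the job 
is making progress.

```python
import os
import time


def seconds_since_last_write(paths, now=None):
    """Return seconds since the newest modification among existing paths.

    Returns None if none of the paths exist.
    """
    now = time.time() if now is None else now
    mtimes = [os.path.getmtime(p) for p in paths if os.path.exists(p)]
    if not mtimes:
        return None
    return now - max(mtimes)
```

A wrapper script could flag a job for manual inspection when this value 
exceeds some threshold chosen for the application in question.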

We even have jobs which (apparently) compute fine, but only by manual 
investigation can one say that the computed values converge to a wrong state 
or oscillate between states and will never stop.

-- Reuti

> I can devise a script to do this, but if there's already something for this 
> I'd just use that. Thanks.
> 
> -M
> _______________________________________________
> users mailing list
> [email protected]
> https://gridengine.org/mailman/listinfo/users

