If I recall correctly, only rank 0 of the job is suspended. This is by
design: not all parallel jobs are able to handle suspend/resume
correctly. For example, you can get TCP timeouts and similar problems.
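
To illustrate the effect (this is not Grid Engine code, only a minimal
sketch on a single POSIX host): two idle child processes stand in for
rank 0 and rank 1, and the helper names signal_all and demo are
hypothetical. Stopping only the first child leaves the second one
running, which matches NAMD continuing on the other nodes. Something
would have to forward the signal to every rank to stop them all, and
for ranks on remote nodes that forwarding would need its own mechanism,
which the sketch does not cover.

```python
import os
import signal
import subprocess
import sys


def signal_all(pids, sig):
    """Send `sig` to every PID in `pids`; return the PIDs that were signalled."""
    delivered = []
    for pid in pids:
        try:
            os.kill(pid, sig)
            delivered.append(pid)
        except ProcessLookupError:
            pass  # process has already exited
    return delivered


def demo():
    """Return (rank0 stopped, rank1 stopped, rank1 stopped after forwarding)."""
    # Two idle children stand in for two ranks of a parallel job (assumption).
    sleeper = [sys.executable, "-c", "import time; time.sleep(30)"]
    ranks = [subprocess.Popen(sleeper) for _ in range(2)]
    try:
        # Suspend only "rank 0".
        os.kill(ranks[0].pid, signal.SIGSTOP)
        # WUNTRACED makes waitpid report a stopped (not only exited) child.
        _, status0 = os.waitpid(ranks[0].pid, os.WUNTRACED)
        rank0_stopped = os.WIFSTOPPED(status0)

        # Non-blocking check on "rank 1": pid 0 means no state change yet.
        pid1, status1 = os.waitpid(ranks[1].pid, os.WNOHANG | os.WUNTRACED)
        rank1_stopped = pid1 != 0 and os.WIFSTOPPED(status1)

        # Forward the stop signal to every rank, then wait for rank 1 to stop.
        signal_all([p.pid for p in ranks], signal.SIGSTOP)
        _, status1 = os.waitpid(ranks[1].pid, os.WUNTRACED)
        rank1_stopped_after_forward = os.WIFSTOPPED(status1)

        return rank0_stopped, rank1_stopped, rank1_stopped_after_forward
    finally:
        # SIGKILL also terminates stopped processes; reap them afterwards.
        for p in ranks:
            p.kill()
            p.wait()


if __name__ == "__main__":
    print(demo())
```

Resuming would be the same call with signal.SIGCONT in place of
signal.SIGSTOP.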

Rayson



On Mon, Jun 11, 2012 at 3:53 PM, Joseph Farran <[email protected]> wrote:
> Hi.
>
> With the help of this group, I've been able to make good progress on setting
> up OGE 2011.11 with our cluster.
>
> I am testing the Suspend & Resume features. They work great for serial
> jobs, but I am not able to get parallel jobs suspended.
>
> I created a simple Parallel Environment (PE) called mpi and submitted a
> NAMD job to it, which runs just fine. I then tried suspending it using the
> qmon 'suspend' button. qmon says that it suspended the job, and qstat
> confirms that the job is suspended with the 's' flag. However, on the
> nodes where NAMD is running, NAMD continues to run.
>
> What am I missing to be able to suspend PE jobs, given that suspension
> works for serial jobs?
>
> Joseph
>
>
>
> _______________________________________________
> users mailing list
> [email protected]
> https://gridengine.org/mailman/listinfo/users

