If I recall correctly, only rank 0 of the job is suspended. That was a deliberate design choice, because not all parallel jobs can handle suspend/resume correctly: for example, the ranks that keep running can hit TCP timeouts and similar failures while their peer is stopped.
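To illustrate the effect (this is only a sketch, not how Grid Engine itself is implemented), the snippet below starts a few plain sleeper processes as hypothetical stand-ins for MPI ranks, sends SIGSTOP to the first one only, and reports which of them are actually stopped. It assumes a POSIX system; the function name and the sleeper command are made up for the example.

```python
import os
import signal
import subprocess
import sys


def suspend_only_first(n=2):
    """Start n sleeper processes, SIGSTOP only the first, report who is stopped."""
    # Hypothetical stand-ins for the ranks of a parallel job.
    sleeper = [sys.executable, "-c", "import time; time.sleep(30)"]
    procs = [subprocess.Popen(sleeper) for _ in range(n)]
    try:
        # Signal only the first process, like suspending "rank 0" alone.
        os.kill(procs[0].pid, signal.SIGSTOP)

        # WUNTRACED makes waitpid report a stopped (not exited) child.
        _, status = os.waitpid(procs[0].pid, os.WUNTRACED)
        stopped = [os.WIFSTOPPED(status)]

        # The other processes never received a signal; WNOHANG returns
        # pid 0 immediately when there is no state change to report.
        for p in procs[1:]:
            pid, status = os.waitpid(p.pid, os.WNOHANG | os.WUNTRACED)
            stopped.append(pid != 0 and os.WIFSTOPPED(status))
        return stopped
    finally:
        # SIGKILL also terminates stopped processes, so cleanup is safe.
        for p in procs:
            p.kill()
            p.wait()


if __name__ == "__main__":
    print(suspend_only_first(3))
```

The processes that were not signalled keep running, which matches what you see on the nodes: the job shows as suspended in qstat, but the NAMD processes on the other hosts are still consuming CPU.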

Rayson

On Mon, Jun 11, 2012 at 3:53 PM, Joseph Farran <[email protected]> wrote:
> Hi.
>
> With the help of this group, I've been able to make good progress on setting
> up OGE 2011.11 with our cluster.
>
> I am testing the Suspend & Resume features and it works great for serial
> jobs but not able to get Parallel jobs suspended.
>
> I created a simple Parallel Environment (PE) called mpi and I submitted a
> NAMD job to it and it runs just fine. I then tried suspending it using
> qmon 'suspend' button and it says that it suspended the job and qstat also
> confirms that job is suspended with the 's' flag, however looking at the
> nodes on which NAMD is running, NAMD continues to run.
>
> What am I missing with respect to being able to suspend PE jobs since it
> works for serial jobs?
>
> Joseph
>
> _______________________________________________
> users mailing list
> [email protected]
> https://gridengine.org/mailman/listinfo/users
