Thanks for the clarification.
This is a NAMD run, so I am launching it via "charmrun" and not mpirun.
If the OGE code suspends via rank 0, I would think that charmrun and/or any
other parallel job launcher would suspend as well, no?
I will try an mpirun job next to see if it behaves differently and suspends
correctly or not.
Joseph
On 06/11/2012 01:32 PM, Rayson Ho wrote:
To clarify: rank 0 in the previous email = the parallel job launcher
(e.g. mpirun) process - usually running on the rank 0 machine.
A few years ago, we added code to allow every process to get the
suspend signal (only for the tight-integration case), but Sun at that
time did not integrate it into the tree so we will need to start the
discussion again and see if it really is a good idea to suspend
parallel jobs.
Rayson
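To illustrate the distinction Rayson describes above, here is a toy Python
model (not Grid Engine code) of which processes would receive the suspend
signal. The Proc class, the suspend_targets function, the host names and the
PIDs are all hypothetical; the only thing taken from the thread is that the
launcher alone is signalled today, and that the proposed tight-integration
change would signal every process.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Proc:
    host: str       # node the process runs on (hypothetical name)
    pid: int        # process ID on that node (hypothetical value)
    launcher: bool  # True for the job launcher, e.g. mpirun or charmrun


def suspend_targets(procs, all_processes=False):
    """Return the processes that would receive the suspend signal.

    all_processes=False models the behaviour described in the thread:
    only the launcher ("rank 0") is signalled.
    all_processes=True models the proposed change in which every
    process of a tightly integrated job is signalled.
    """
    if all_processes:
        return list(procs)
    return [p for p in procs if p.launcher]


# Hypothetical two-node job: one launcher plus three compute processes.
job = [
    Proc("node1", 100, True),
    Proc("node1", 101, False),
    Proc("node2", 200, False),
    Proc("node2", 201, False),
]

signalled = suspend_targets(job)
still_running = [p for p in job if p not in signalled]
print(len(still_running), "of", len(job), "processes keep running")
```

In this model the compute processes never see the signal, which would match
qstat reporting the job as suspended while NAMD keeps running on the nodes.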
On Mon, Jun 11, 2012 at 4:21 PM, Rayson Ho <[email protected]> wrote:
Only rank 0 of the job is suspended, if I recall correctly - it was
designed that way specifically because not all parallel jobs are able to
handle suspend/restart correctly - for example, you can get TCP timeouts
and similar problems.
Rayson
On Mon, Jun 11, 2012 at 3:53 PM, Joseph Farran <[email protected]> wrote:
Hi.
With the help of this group, I've been able to make good progress on setting
up OGE 2011.11 with our cluster.
I am testing the Suspend & Resume features, and they work great for serial
jobs, but I am not able to get parallel jobs suspended.
I created a simple Parallel Environment (PE) called mpi and submitted a
NAMD job to it, and it runs just fine. I then tried suspending it using the
qmon 'suspend' button. qmon says that it suspended the job, and qstat also
confirms that the job is suspended with the 's' flag. However, looking at
the nodes on which NAMD is running, NAMD continues to run.
What am I missing with respect to being able to suspend PE jobs since it
works for serial jobs?
Joseph
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users