On Apr 13, 2011, at 10:29 AM, Jack Bryan wrote:

> Hi,
>
> If I cannot ssh to a worker node, it means that my program cannot work
> correctly?

No, that's not true. People thought you were on a cluster using ssh as the launcher. From prior notes, you were using Torque, so not being allowed to ssh is just an admin thing.

> I can run it on 32 nodes * 4 cores/node parallel processes. But, for larger
> parallel processes,
> 128 nodes * 1 cpu/node, it is killed by signal 9.
>
> Is this a reason?

No, it isn't.

> thanks
>
> > Date: Wed, 13 Apr 2011 05:59:10 -0700
> > From: n...@aol.com
> > To: us...@open-mpi.org
> > Subject: Re: [OMPI users] OMPI monitor each process behavior
> >
> > On 4/12/2011 8:55 PM, Jack Bryan wrote:
> > >
> > > I need to monitor the memory usage of each parallel process on a linux
> > > Open MPI cluster.
> > >
> > > But, top, ps command cannot help here because they only show the head
> > > node information.
> > >
> > > I need to follow the behavior of each process on each cluster node.
> > Did you consider ganglia et al?
> > >
> > > I cannot use ssh to access each node.
> > How can MPI run?
> > >
> > > The program takes 8 hours to finish.
> > >
> >
> > --
> > Tim Prince
> > _______________________________________________
> > users mailing list
> > us...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
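
On the original question of watching per-process memory without ssh access: one option is to have each process report its own usage from inside the job. What follows is only a minimal sketch in Python, assuming Linux compute nodes with /proc mounted. The function names parse_status_kb and memory_report are made up for illustration and are not part of Open MPI. A C or C++ MPI program could read /proc/self/status in the same way.

```python
import os
import socket

# VmRSS is the current resident set size, VmHWM its peak ("high water mark").
FIELDS = ("VmRSS", "VmHWM")


def parse_status_kb(text, fields=FIELDS):
    """Pull selected 'Name:  <number> kB' entries out of /proc/<pid>/status text."""
    found = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if sep and name in fields:
            parts = rest.split()
            if parts and parts[0].isdigit():
                found[name] = int(parts[0])
    return found


def memory_report(rank):
    """Build one log line for the calling process (hypothetical helper).

    The rank is passed in by the caller, so this works with any launcher.
    """
    with open("/proc/self/status") as f:  # Linux-only assumption
        usage = parse_status_kb(f.read())
    return "rank=%s host=%s pid=%d rss_kb=%s peak_kb=%s" % (
        rank,
        socket.gethostname(),
        os.getpid(),
        usage.get("VmRSS"),
        usage.get("VmHWM"),
    )
```

Each rank could call memory_report() every so often and write the line to its own log file or to stderr, which the launcher normally forwards to the job output. With Open MPI the rank can be taken from MPI_Comm_rank, or from the OMPI_COMM_WORLD_RANK environment variable that mpirun is expected to set for each process (worth checking on your installation).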