Your job may be queued rather than running because no resources
are available (all nodes are busy).
Try qstat -a to check its state.
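
To make the qstat suggestion concrete, here is a minimal sketch that reads the state of one job and explains it. The state letters are the common Torque ones and may differ on other PBS variants; the describe_state helper and the JOBID default are illustrative assumptions, not part of any PBS tool.

```shell
#!/bin/bash
# Sketch: translate a PBS/Torque job-state letter into a short explanation.
# The letters below are the common Torque ones; other PBS variants may differ.
describe_state() {
    case "$1" in
        Q) echo "queued: waiting for resources" ;;
        R) echo "running" ;;
        H) echo "held" ;;
        E) echo "exiting" ;;
        C) echo "completed" ;;
        *) echo "unknown state: $1" ;;
    esac
}

# JOBID is a placeholder; substitute the id that qsub printed.
JOBID=${JOBID:-48270.clusterName}

# Only query the scheduler when qstat is actually installed.
if command -v qstat >/dev/null 2>&1; then
    # "qstat -f" prints a line such as "job_state = Q" for the job.
    state=$(qstat -f "$JOBID" 2>/dev/null | awk -F' = ' '/job_state/ {print $2}')
    describe_state "${state:-?}"
fi
```

If the job stays in the queued state, an interactive "qsub -I" will sit at "waiting for job ... to start" until the scheduler can allocate the requested resources.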

Posting a code snippet with all your MPI calls may prove effective.
You might get a trove of advice for a thrift of effort.

Jeff Squyres wrote:
Check the man page for qsub for proper use.


On Oct 25, 2010, at 1:49 PM, Jack Bryan wrote:

thanks

I use qsub -I nsga2_job.sh
        qsub: waiting for job 48270.clusterName to start

With qstat,
I found that the job name is none and no results show up. No shell prompt appears; the command line just hangs there with no response. Any help is appreciated.
Thanks

Jack
Oct. 25 2010

From: jsquy...@cisco.com
Date: Mon, 25 Oct 2010 13:39:30 -0400
To: us...@open-mpi.org
Subject: Re: [OMPI users] Open MPI program cannot complete

Can you use the interactive mode of PBS to get 5 cores on 1 node? IIRC,
"qsub -I ..."?

Then you get a shell prompt with your allocated cores and can run stuff
interactively. I don't know if your site allows this, but interactive debugging
here might be *significantly* easier than trying to automate some debugging.


On Oct 25, 2010, at 1:35 PM, Jack Bryan wrote:

thanks

I have to use #PBS to submit any jobs on my cluster. I cannot use the command line to launch a job on my cluster. This is my script:
--------------------------------------
#!/bin/bash
#PBS -N jobname
#PBS -l walltime=00:08:00,nodes=1
#PBS -q queuename
COMMAND=/mypath/myprog
NCORES=5

cd $PBS_O_WORKDIR
NODES=`cat $PBS_NODEFILE | wc -l`
NPROC=$(( $NCORES * $NODES ))

mpirun -np $NPROC --mca btl self,sm,openib $COMMAND

-------------------------------------------
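
As a sanity check on the arithmetic in the script above, the following sketch reproduces the NPROC computation against a stand-in node file. The temporary file and the host name node01 are assumptions for illustration; in a real job, PBS sets $PBS_NODEFILE itself, typically with one line per allocated processor slot.

```shell
#!/bin/bash
# Stand-in for the file PBS would provide (hypothetical contents: one slot on one node).
PBS_NODEFILE=$(mktemp)
echo "node01" > "$PBS_NODEFILE"

NCORES=5
# wc -l counts lines, i.e. allocated slots, which is not always the number of hosts.
NODES=$(( $(wc -l < "$PBS_NODEFILE") ))
NPROC=$(( NCORES * NODES ))
echo "NODES=$NODES NPROC=$NPROC"

rm -f "$PBS_NODEFILE"
```

With a one-line node file this yields 5 processes. If the request were changed so that the node file held one line per core (for example nodes=1:ppn=5 on Torque), the same arithmetic would give 25, so it is worth printing NPROC before the mpirun line.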

Where should I put the gdb command (gdb --batch -ex 'bt full' -ex 'info reg' -pid ZOMBIE_PID) in the script? And how do I get ZOMBIE_PID from the script? Any help is appreciated.
thanks

Oct. 25 2010

Date: Mon, 25 Oct 2010 19:24:35 +0200
From: j...@59a2.org
To: us...@open-mpi.org
Subject: Re: [OMPI users] Open MPI program cannot complete

On Mon, Oct 25, 2010 at 19:07, Jack Bryan <dtustud...@hotmail.com> wrote:
> I need to use #PBS parallel job script to submit a job on MPI cluster.

Is it not possible to reproduce locally? Most clusters have a way to submit an
interactive job (which would let you start this thing and then inspect
individual processes). Ashley's Padb suggestion will certainly be better in a
non-interactive environment.

> Where should I put the (gdb --batch -ex 'bt full' -ex 'info reg' -pid ZOMBIE_PID) in the script ?

Is control returning to your script after rank 0 has exited? In that case, you
can just put this on the next line.

> How to get the ZOMBIE_PID ?

"ps" from the command line, or getpid() from C code.

Jed
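
One way the two answers above could be combined in a job script is sketched below, assuming control does return to the script after mpirun and that the leftover ranks run on the same node as the script. The dump_stacks helper and the GDB_CMD override are hypothetical names introduced here; the gdb options are the ones quoted in the thread.

```shell
#!/bin/bash
# GDB_CMD is a hypothetical override so the helper can be dry-run without gdb.
GDB_CMD=${GDB_CMD:-gdb}

# Print a full backtrace and register dump for each PID given as an argument.
dump_stacks() {
    local pid
    for pid in "$@"; do
        echo "=== PID $pid ==="
        "$GDB_CMD" --batch -ex 'bt full' -ex 'info reg' -pid "$pid"
    done
}

# In the PBS script this would go on the line after mpirun, for example:
#   mpirun -np $NPROC --mca btl self,sm,openib $COMMAND
#   dump_stacks $(pgrep -u "$USER" -x myprog)
# pgrep -x matches the process name exactly; "myprog" stands for the
# executable behind $COMMAND in the script posted earlier.
```

Setting GDB_CMD=echo prints the gdb command lines instead of running them, which is a cheap way to confirm that the PID lookup finds the processes you expect before attaching a debugger.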

_______________________________________________ users mailing list 
us...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/users

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/

