From: dtustud...@hotmail.com
To: ash...@pittman.co.uk
Subject: RE: [OMPI users] Open MPI program cannot complete
List-Post: users@lists.open-mpi.org
Date: Mon, 25 Oct 2010 16:53:32 -0600

Thanks.

But I cannot see the attachment in the email.

Would you please send it again, and also copy it to my other email address:
tomview...@yahoo.com

Thanks
Oct. 25 2010

> Subject: Re: [OMPI users] Open MPI program cannot complete
> From: ash...@pittman.co.uk
> Date: Mon, 25 Oct 2010 23:41:32 +0100
> To: dtustud...@hotmail.com
> 
> 
> Thanks, that tells me a lot.
> 
> Try the attached padb; I've added the patch for you and removed the -w option. 
>  Can you run it and send me back the output, please?
> 
> Ashley.
> 
> On 25 Oct 2010, at 23:29, Jack Bryan wrote:
> 
> > Thanks
> > 
> > > Here is the output:
> > 
> > -bash-3.2$ qstat -fB
> > Server: clusterName
> >     server_state = Active
> >     scheduling = True
> >     total_jobs = 26
> >     state_count = Transit:0 Queued:7 Held:0 Waiting:0 Running:18 Exiting:0
> >     acl_hosts = clustername
> >     default_queue = normal
> >     log_events = 511
> >     mail_from = adm
> >     query_other_jobs = True
> >     resources_assigned.nodect = 246
> >     scheduler_iteration = 600
> >     node_check_rate = 150
> >     tcp_timeout = 6
> >     mom_job_sync = True
> >     pbs_version = 2.4.2
> >     keep_completed = 300
> >     submit_hosts = clusterName
> >     next_job_number = 48293
> >     net_counter = 2 9 6
> > 
> > -bash-3.2$ qstat -w -n
> > qstat: invalid option -- w
> > 
> > 
> > > At which line of the padb file in the bin directory should I put this patch?
> > > -----------------------------------------------------
> > > --- padb (revision 401)
> > > +++ padb (working copy)
> > > @@ -2824,6 +2824,7 @@
> > > foreach my $server (@servers) {
> > > pbs_get_lqsub( $user, $server ); # get job list by qsub
> > > }
> > > + print Dumper \%pbs_tabjobs;
> > > return \%pbs_tabjobs;
> > > }
> > > ----------------------------------------
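
On where the added line goes: in unified diff format, the header `@@ -2824,6 +2824,7 @@` says the hunk covers six lines starting at line 2824 of the original file and seven lines in the patched file, and the single `+` line sits directly before `return \%pbs_tabjobs;`. A minimal sketch of making that one-line edit by script rather than by hand is shown here. It assumes the `return` statement appears exactly once in the file; the function name `insert_debug_line` is illustrative and not part of padb.

```python
def insert_debug_line(source: str) -> str:
    """Return `source` with the Dumper call added just before the return line.

    Assumption: the target return statement occurs exactly once; otherwise
    the function refuses to guess and raises ValueError.
    """
    target = r"return \%pbs_tabjobs;"
    debug = r"print Dumper \%pbs_tabjobs;"

    lines = source.splitlines(keepends=True)
    hits = [i for i, line in enumerate(lines) if line.strip() == target]
    if len(hits) != 1:
        raise ValueError(
            f"expected exactly one '{target}' line, found {len(hits)}"
        )

    i = hits[0]
    # Reuse the indentation of the return line for the inserted line.
    indent = lines[i][: len(lines[i]) - len(lines[i].lstrip())]
    lines.insert(i, f"{indent}{debug}\n")
    return "".join(lines)
```

To use it, read the text of the padb script, pass it through `insert_debug_line`, and write the result back, keeping a backup copy of the original file first.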
> > 
> > Any help is appreciated.
> > 
> > thanks
> > 
> > Jack 
> > 
> > Oct. 25 2010
> > 
> > 
> > 
> > > Subject: Re: [OMPI users] Open MPI program cannot complete
> > > From: ash...@pittman.co.uk
> > > Date: Mon, 25 Oct 2010 22:54:21 +0100
> > > To: dtustud...@hotmail.com
> > > 
> > > 
> > > [off list]
> > > 
> > > > The PBS support was added by a third party, so I've not used it in anger 
> > > > myself; it appears you are doing the correct thing as far as I can tell.
> > > 
> > > Can you send me the output of the following two commands and also apply 
> > > the patch below to padb (you can do this just in the bin dir - it's a 
> > > perl script) and send me the output when you run that as well?
> > > 
> > > qstat -fB
> > > qstat -w -n
> > > 
> > > --- padb (revision 401)
> > > +++ padb (working copy)
> > > @@ -2824,6 +2824,7 @@
> > > foreach my $server (@servers) {
> > > pbs_get_lqsub( $user, $server ); # get job list by qsub
> > > }
> > > + print Dumper \%pbs_tabjobs;
> > > return \%pbs_tabjobs;
> > > }
> > > 
> > > On 25 Oct 2010, at 22:30, Jack Bryan wrote:
> > > 
> > > > Thanks
> > > > 
> > > > I have downloaded 
> > > > http://padb.googlecode.com/files/padb-3.2-beta1.tar.gz
> > > > 
> > > > and followed the instructions of INSTALL file and installed it at 
> > > > /mypath/padb32 
> > > > 
> > > > But, I got:
> > > > 
> > > > -bash-3.2$ padb -Ormgr=pbs -Q 48279.cluster
> > > > Job 48279.cluster is not active
> > > > 
> > > > Actually, the job was running. 
> > > > 
> > > > > I have installed bin at
> > > > > /mypath/padb32/bin
> > > > > 
> > > > > and libexec at
> > > > > /lustre/jxding/padb32/libexec
> > > > 
> > > > When I installed it, I used 
> > > > 
> > > > ./configure --prefix=/mypath/padb32
> > > > 
> > > > I got 
> > > > -----------------------------
> > > > 
> > > > checking for a BSD-compatible install... /usr/bin/install -c
> > > > checking whether build environment is sane... yes
> > > > checking for a thread-safe mkdir -p... /bin/mkdir -p
> > > > checking for gawk... gawk
> > > > checking whether make sets $(MAKE)... yes
> > > > checking for gcc... gcc
> > > > checking whether the C compiler works... yes
> > > > checking for C compiler default output file name... a.out
> > > > checking for suffix of executables...
> > > > checking whether we are cross compiling... no
> > > > checking for suffix of object files... o
> > > > checking whether we are using the GNU C compiler... yes
> > > > checking whether gcc accepts -g... yes
> > > > checking for gcc option to accept ISO C89... none needed
> > > > checking for style of include used by make... GNU
> > > > checking dependency style of gcc... gcc3
> > > > checking whether gcc and cc understand -c and -o together... yes
> > > > configure: creating ./config.status
> > > > config.status: creating Makefile
> > > > config.status: creating src/Makefile
> > > > config.status: executing depfiles commands
> > > > 
> > > > -------------------------------
> > > > 
> > > > -bash-3.2$ make
> > > > Making all in src
> > > > make[1]: Entering directory `/mypath/padb32/padb-3.2-beta1/src'
> > > > gcc -DPACKAGE_NAME=\"\" -DPACKAGE_TARNAME=\"\" -DPACKAGE_VERSION=\"\" 
> > > > -DPACKAGE_STRING=\"\" -DPACKAGE_BUGREPORT=\"\" -DPACKAGE_URL=\"\" 
> > > > -DPACKAGE=\"padb\" -DVERSION=\"3.2-beta1\" -I. -Wall -g -O2 -MT 
> > > > minfo-minfo.o -MD -MP -MF .deps/minfo-minfo.Tpo -c -o minfo-minfo.o 
> > > > `test -f 'minfo.c' || echo './'`minfo.c
> > > > > minfo.c: In function ‘find_sym’:
> > > > minfo.c:158: warning: dereferencing type-punned pointer will break 
> > > > strict-aliasing rules
> > > > > minfo.c: In function ‘main’:
> > > > minfo.c:649: warning: type-punning to incomplete type might break 
> > > > strict-aliasing rules
> > > > minfo.c:650: warning: type-punning to incomplete type might break 
> > > > strict-aliasing rules
> > > > mv -f .deps/minfo-minfo.Tpo .deps/minfo-minfo.Po
> > > > gcc -Wall -g -O2 -ldl -o minfo minfo-minfo.o
> > > > make[1]: Leaving directory `/mypath/padb32/padb-3.2-beta1/src'
> > > > make[1]: Entering directory `/mypath/padb32/padb-3.2-beta1'
> > > > make[1]: Nothing to be done for `all-am'.
> > > > make[1]: Leaving directory `/mypath/padb32/padb-3.2-beta1'
> > > > -------------------------------------------------
> > > > 
> > > > -bash-3.2$ make install
> > > > Making install in src
> > > > make[1]: Entering directory `/mypath/padb32/padb-3.2-beta1/src'
> > > > make[2]: Entering directory `/mypath/padb32/padb-3.2-beta1/src'
> > > > test -z "/lustre/jxding/padb32/bin" || /bin/mkdir -p 
> > > > "/mypath/padb32/bin"
> > > > /usr/bin/install -c padb '/lustre/jxding/padb32/bin'
> > > > test -z "/lustre/jxding/padb32/libexec" || /bin/mkdir -p 
> > > > "/mypath/padb32/libexec"
> > > > /usr/bin/install -c minfo '/lustre/jxding/padb32/libexec'
> > > > make[2]: Nothing to be done for `install-data-am'.
> > > > make[2]: Leaving directory `/mypath/padb32/padb-3.2-beta1/src'
> > > > make[1]: Leaving directory `/mypath/padb32/padb-3.2-beta1/src'
> > > > make[1]: Entering directory `/mypath/padb32/padb-3.2-beta1'
> > > > make[2]: Entering directory `/mypath/padb32/padb-3.2-beta1'
> > > > make[2]: Nothing to be done for `install-exec-am'.
> > > > make[2]: Nothing to be done for `install-data-am'.
> > > > make[2]: Leaving directory `/mypath/padb32/padb-3.2-beta1'
> > > > make[1]: Leaving directory `/mypath/padb32/padb-3.2-beta1'
> > > > -bash-3.2$ make installcheck
> > > > Making installcheck in src
> > > > make[1]: Entering directory `/mypath/padb32/padb-3.2-beta1/src'
> > > > make[1]: Nothing to be done for `installcheck'.
> > > > make[1]: Leaving directory `/mypath/padb32/padb-3.2-beta1/src'
> > > > make[1]: Entering directory `/mypath/padb32/padb-3.2-beta1'
> > > > make[1]: Nothing to be done for `installcheck-am'.
> > > > make[1]: Leaving directory `/mypath/padb32/padb-3.2-beta1'
> > > > --------------------------------------------------
> > > > 
> > > > > Is there something wrong with what I have done?
> > > > 
> > > > Any help is appreciated. 
> > > > 
> > > > thanks
> > > > 
> > > > Jack
> > > > 
> > > > Oct. 25 2010
> > > > 
> > > > 
> > > > > From: ash...@pittman.co.uk
> > > > > Date: Mon, 25 Oct 2010 20:40:18 +0100
> > > > > To: us...@open-mpi.org
> > > > > Subject: Re: [OMPI users] Open MPI program cannot complete
> > > > > 
> > > > > 
> > > > > On 25 Oct 2010, at 20:18, Jack Bryan wrote:
> > > > > 
> > > > > > Thanks
> > > > > > I have downloaded 
> > > > > > http://padb.googlecode.com/files/padb-3.0.tgz
> > > > > > 
> > > > > > > and compiled it.
> > > > > > > 
> > > > > > > But there is no user manual, and I cannot use it with padb -aQ.
> > > > > 
> > > > > > The -a flag is a shortcut for all jobs; if you are providing a jobid 
> > > > > > (which is normally numeric) then don't set the -a flag.
> > > > > 
> > > > > > > Do you have a user manual on how to use it?
> > > > > 
> > > > > > In my previous mail I was assuming you were using orte to launch the 
> > > > > > jobs, but if you are using PBS then you'll need to use the 3.2 beta, as 
> > > > > > the PBS code is new. Alternatively, you could find the host where the 
> > > > > > PBS script itself runs and check if the "ompi-ps" command gives you 
> > > > > > any output; if it does, then you could run padb from there, giving it 
> > > > > > the orte jobid.
> > > > > 
> > > > > > A bit of background about resource managers (in which I'm including 
> > > > > > orte and PBS): padb supports many resource managers and tries to 
> > > > > > automatically detect which ones you have installed on your system. If 
> > > > > > you don't specify one then it'll see what is installed; if there is 
> > > > > > more than one resource manager installed then it'll see which of them 
> > > > > > claim to have active jobs. If only one resource manager meets this 
> > > > > > criterion then it'll pick that one, hence 99% of the time it should 
> > > > > > just work. If more than one resource manager claims to have active 
> > > > > > jobs then padb will refuse to run and ask the user to specify one 
> > > > > > explicitly.
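
The selection rule described above can be sketched roughly as follows. This only illustrates the logic as stated in the message and is not padb's actual Perl code; the function name, its arguments, and the handling of cases the message does not cover (nothing installed, or no resource manager reporting active jobs) are assumptions.

```python
def pick_resource_manager(installed, has_active_jobs, requested=None):
    """Choose a resource manager name following the rule described above.

    installed       -- list of detected resource manager names, e.g. ["pbs", "orte"]
    has_active_jobs -- dict mapping a name to True if it claims active jobs
    requested       -- name given explicitly by the user (as with -Ormgr=...), or None
    """
    # An explicit choice always wins.
    if requested is not None:
        return requested

    # Assumption: the message does not say what happens when nothing is found.
    if not installed:
        raise RuntimeError("no resource manager detected")

    # Only one installed: nothing to disambiguate.
    if len(installed) == 1:
        return installed[0]

    # Several installed: keep those that claim to have active jobs.
    active = [name for name in installed if has_active_jobs.get(name, False)]
    if len(active) == 1:
        return active[0]
    if len(active) > 1:
        raise RuntimeError(
            "more than one resource manager has active jobs; specify one explicitly"
        )

    # Assumption: no active jobs anywhere is also treated as an error here.
    raise RuntimeError("no installed resource manager reports active jobs")
```

For instance, with both `pbs` and `orte` installed and only `pbs` reporting active jobs, the sketch returns `pbs`; with both reporting jobs it raises, which mirrors the case where the user has to pass `-Ormgr=pbs` or `-Ormgr=orte` by hand.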
> > > > > 
> > > > > You should try the following in order once you have 3.2 installed.
> > > > > 
> > > > > padb -Ormgr=pbs -Q <myjob>
> > > > > 
> > > > > Or - find the node where the PBS script is being executed, check that 
> > > > > the ompi-ps command is returning the jobid and then run
> > > > > 
> > > > > padb -Ormgr=orte -Q <openmpi_jobid>
> > > > > 
> > > > > Ashley,
> > > > > 
> > > > > -- 
> > > > > 
> > > > > Ashley Pittman, Bath, UK.
> > > > > 
> > > > > Padb - A parallel job inspection tool for cluster computing
> > > > > http://padb.pittman.org.uk
> > > > > 
> > > > > 
> > > > > _______________________________________________
> > > > > users mailing list
> > > > > us...@open-mpi.org
> > > > > http://www.open-mpi.org/mailman/listinfo.cgi/users
> > > 
> > > -- 
> > > 
> > > Ashley Pittman, Bath, UK.
> > > 
> > > Padb - A parallel job inspection tool for cluster computing
> > > http://padb.pittman.org.uk
> > > 
> 