From: dtustud...@hotmail.com
To: ash...@pittman.co.uk
Subject: RE: [OMPI users] Open MPI program cannot complete
List-Post: users@lists.open-mpi.org
Date: Mon, 25 Oct 2010 16:53:32 -0600

thanks

But, I cannot see the attachment in the email.

Would you please send me again ? and also copy to another my email:
tomview...@yahoo.com

thanks

Oct. 25 2010

> Subject: Re: [OMPI users] Open MPI program cannot complete
> From: ash...@pittman.co.uk
> Date: Mon, 25 Oct 2010 23:41:32 +0100
> To: dtustud...@hotmail.com
>
> Thanks, that's tells me a lot.
>
> Try the attached padb, I've added the patch for you and remove the -w option.
> Can you run it and send me back the output please.
>
> Ashley.
>
> On 25 Oct 2010, at 23:29, Jack Bryan wrote:
>
> > Thanks
> >
> > Here is the
> >
> > -bash-3.2$ qstat -fB
> > Server: clusterName
> >     server_state = Active
> >     scheduling = True
> >     total_jobs = 26
> >     state_count = Transit:0 Queued:7 Held:0 Waiting:0 Running:18 Exiting:0
> >     acl_hosts = clustername
> >     default_queue = normal
> >     log_events = 511
> >     mail_from = adm
> >     query_other_jobs = True
> >     resources_assigned.nodect = 246
> >     scheduler_iteration = 600
> >     node_check_rate = 150
> >     tcp_timeout = 6
> >     mom_job_sync = True
> >     pbs_version = 2.4.2
> >     keep_completed = 300
> >     submit_hosts = clusterName
> >     next_job_number = 48293
> >     net_counter = 2 9 6
> >
> > -bash-3.2$ qstat -w -n
> > qstat: invalid option -- w
> >
> > Which line should I put the
> > -----------------------------------------------------
> > --- padb (revision 401)
> > +++ padb (working copy)
> > @@ -2824,6 +2824,7 @@
> >      foreach my $server (@servers) {
> >          pbs_get_lqsub( $user, $server ); # get job list by qsub
> >      }
> > +    print Dumper \%pbs_tabjobs;
> >      return \%pbs_tabjobs;
> >  }
> > ----------------------------------------
> >
> > in the bin file padb
> >
> > Any help is appreciated.
> >
> > thanks
> >
> > Jack
> >
> > Oct. 25 2010
> >
> > > Subject: Re: [OMPI users] Open MPI program cannot complete
> > > From: ash...@pittman.co.uk
> > > Date: Mon, 25 Oct 2010 22:54:21 +0100
> > > To: dtustud...@hotmail.com
> > >
> > > [off list]
> > >
> > > The PBS support was added by a third-party so I've not used it in anger
> > > myself, it appears you are doing the correct thing as far as I can tell.
> > >
> > > Can you send me the output of the following two commands and also apply
> > > the patch below to padb (you can do this just in the bin dir - it's a
> > > perl script) and send me the output when you run that as well?
> > >
> > > qstat -fB
> > > qstat -w -n
> > >
> > > --- padb (revision 401)
> > > +++ padb (working copy)
> > > @@ -2824,6 +2824,7 @@
> > >      foreach my $server (@servers) {
> > >          pbs_get_lqsub( $user, $server ); # get job list by qsub
> > >      }
> > > +    print Dumper \%pbs_tabjobs;
> > >      return \%pbs_tabjobs;
> > >  }
> > >
> > > On 25 Oct 2010, at 22:30, Jack Bryan wrote:
> > >
> > > > Thanks
> > > >
> > > > I have downloaded
> > > > http://padb.googlecode.com/files/padb-3.2-beta1.tar.gz
> > > >
> > > > and followed the instructions of INSTALL file and installed it at
> > > > /mypath/padb32
> > > >
> > > > But, I got:
> > > >
> > > > -bash-3.2$ padb -Ormgr=pbs -Q 48279.cluster
> > > > Job 48279.cluster is not active
> > > >
> > > > Actually, the job was running.
> > > >
> > > > I have installed
> > > > bin at
> > > >
> > > > /mypath/padb32/bin
> > > >
> > > > libexec at
> > > > /lustre/jxding/padb32/libexec
> > > >
> > > > When I installed it, I used
> > > >
> > > > ./configure --prefix=/mypath/padb32
> > > >
> > > > I got
> > > > -----------------------------
> > > >
> > > > checking for a BSD-compatible install... /usr/bin/install -c
> > > > checking whether build environment is sane... yes
> > > > checking for a thread-safe mkdir -p... /bin/mkdir -p
> > > > checking for gawk... gawk
> > > > checking whether make sets $(MAKE)... yes
> > > > checking for gcc... gcc
> > > > checking whether the C compiler works... yes
> > > > checking for C compiler default output file name... a.out
> > > > checking for suffix of executables...
> > > > checking whether we are cross compiling... no
> > > > checking for suffix of object files... o
> > > > checking whether we are using the GNU C compiler... yes
> > > > checking whether gcc accepts -g... yes
> > > > checking for gcc option to accept ISO C89... none needed
> > > > checking for style of include used by make... GNU
> > > > checking dependency style of gcc... gcc3
> > > > checking whether gcc and cc understand -c and -o together... yes
> > > > configure: creating ./config.status
> > > > config.status: creating Makefile
> > > > config.status: creating src/Makefile
> > > > config.status: executing depfiles commands
> > > >
> > > > -------------------------------
> > > >
> > > > -bash-3.2$ make
> > > > Making all in src
> > > > make[1]: Entering directory `/mypath/padb32/padb-3.2-beta1/src'
> > > > gcc -DPACKAGE_NAME=\"\" -DPACKAGE_TARNAME=\"\" -DPACKAGE_VERSION=\"\"
> > > > -DPACKAGE_STRING=\"\" -DPACKAGE_BUGREPORT=\"\" -DPACKAGE_URL=\"\"
> > > > -DPACKAGE=\"padb\" -DVERSION=\"3.2-beta1\" -I. -Wall -g -O2 -MT
> > > > minfo-minfo.o -MD -MP -MF .deps/minfo-minfo.Tpo -c -o minfo-minfo.o
> > > > `test -f 'minfo.c' || echo './'`minfo.c
> > > > minfo.c: In function ‘find_sym’:
> > > > minfo.c:158: warning: dereferencing type-punned pointer will break
> > > > strict-aliasing rules
> > > > minfo.c: In function ‘main’:
> > > > minfo.c:649: warning: type-punning to incomplete type might break
> > > > strict-aliasing rules
> > > > minfo.c:650: warning: type-punning to incomplete type might break
> > > > strict-aliasing rules
> > > > mv -f .deps/minfo-minfo.Tpo .deps/minfo-minfo.Po
> > > > gcc -Wall -g -O2 -ldl -o minfo minfo-minfo.o
> > > > make[1]: Leaving directory `/mypath/padb32/padb-3.2-beta1/src'
> > > > make[1]: Entering directory `/mypath/padb32/padb-3.2-beta1'
> > > > make[1]: Nothing to be done for `all-am'.
> > > > make[1]: Leaving directory `/mypath/padb32/padb-3.2-beta1'
> > > > -------------------------------------------------
> > > >
> > > > -bash-3.2$ make install
> > > > Making install in src
> > > > make[1]: Entering directory `/mypath/padb32/padb-3.2-beta1/src'
> > > > make[2]: Entering directory `/mypath/padb32/padb-3.2-beta1/src'
> > > > test -z "/lustre/jxding/padb32/bin" || /bin/mkdir -p
> > > > "/mypath/padb32/bin"
> > > > /usr/bin/install -c padb '/lustre/jxding/padb32/bin'
> > > > test -z "/lustre/jxding/padb32/libexec" || /bin/mkdir -p
> > > > "/mypath/padb32/libexec"
> > > > /usr/bin/install -c minfo '/lustre/jxding/padb32/libexec'
> > > > make[2]: Nothing to be done for `install-data-am'.
> > > > make[2]: Leaving directory `/mypath/padb32/padb-3.2-beta1/src'
> > > > make[1]: Leaving directory `/mypath/padb32/padb-3.2-beta1/src'
> > > > make[1]: Entering directory `/mypath/padb32/padb-3.2-beta1'
> > > > make[2]: Entering directory `/mypath/padb32/padb-3.2-beta1'
> > > > make[2]: Nothing to be done for `install-exec-am'.
> > > > make[2]: Nothing to be done for `install-data-am'.
> > > > make[2]: Leaving directory `/mypath/padb32/padb-3.2-beta1'
> > > > make[1]: Leaving directory `/mypath/padb32/padb-3.2-beta1'
> > > > -bash-3.2$ make installcheck
> > > > Making installcheck in src
> > > > make[1]: Entering directory `/mypath/padb32/padb-3.2-beta1/src'
> > > > make[1]: Nothing to be done for `installcheck'.
> > > > make[1]: Leaving directory `/mypath/padb32/padb-3.2-beta1/src'
> > > > make[1]: Entering directory `/mypath/padb32/padb-3.2-beta1'
> > > > make[1]: Nothing to be done for `installcheck-am'.
> > > > make[1]: Leaving directory `/mypath/padb32/padb-3.2-beta1'
> > > > --------------------------------------------------
> > > >
> > > > Are there something wrong with what I have done ?
> > > >
> > > > Any help is appreciated.
> > > >
> > > > thanks
> > > >
> > > > Jack
> > > >
> > > > Oct. 25 2010
> > > >
> > > > > From: ash...@pittman.co.uk
> > > > > Date: Mon, 25 Oct 2010 20:40:18 +0100
> > > > > To: us...@open-mpi.org
> > > > > Subject: Re: [OMPI users] Open MPI program cannot complete
> > > > >
> > > > > On 25 Oct 2010, at 20:18, Jack Bryan wrote:
> > > > >
> > > > > > Thanks
> > > > > > I have downloaded
> > > > > > http://padb.googlecode.com/files/padb-3.0.tgz
> > > > > >
> > > > > > and compile it.
> > > > > >
> > > > > > But, no user manual, I can not use it by padb -aQ.
> > > > >
> > > > > The -a flag is a shortcut to all jobs, if you are providing a jobid
> > > > > (which is normally numeric) then don't set the -a flag.
> > > > >
> > > > > > Do you have use manual about how to use it ?
> > > > >
> > > > > In my previous mail I was assuming you were using orte to launch the
> > > > > jobs but if you are using PBS then you'll need to use the 3.2 beta as
> > > > > the PBS code is new, alternatively you could find the host where the
> > > > > PBS script itself runs and check of the "ompi-ps" command gives you
> > > > > any output, if it does then you could run it from there giving it the
> > > > > orte jobid.
> > > > >
> > > > > A bit of background about resource managers (in which I'm including
> > > > > orte and PBS), padb supports many resource managers and tries to
> > > > > automatically detect which ones you have installed on your system. If
> > > > > you don't specify one then it'll see what is installed, if there is
> > > > > more than one resource manager installed then it'll see which of them
> > > > > claim to have active jobs - if only one resource manager meets this
> > > > > criteria then it'll pick that one - hence 99% of the time it should
> > > > > just work. If more than one resource manager claims to have active
> > > > > jobs then padb will refuse to run but ask the user to specify one
> > > > > explicitly.
> > > > >
> > > > > You should try the following in order once you have 3.2 installed.
> > > > >
> > > > > padb -Ormgr=pbs -Q <myjob>
> > > > >
> > > > > Or - find the node where the PBS script is being executed, check that
> > > > > the ompi-ps command is returning the jobid and then run
> > > > >
> > > > > padb -Ormgr=orte -Q <openmpi_jobid>
> > > > >
> > > > > Ashley,
> > > > >
> > > > > --
> > > > >
> > > > > Ashley Pittman, Bath, UK.
> > > > >
> > > > > Padb - A parallel job inspection tool for cluster computing
> > > > > http://padb.pittman.org.uk
> > > > >
> > > > > _______________________________________________
> > > > > users mailing list
> > > > > us...@open-mpi.org
> > > > > http://www.open-mpi.org/mailman/listinfo.cgi/users
> > > > _______________________________________________
> > > > users mailing list
> > > > us...@open-mpi.org
> > > > http://www.open-mpi.org/mailman/listinfo.cgi/users
> > >
> > > --
> > >
> > > Ashley Pittman, Bath, UK.
> > >
> > > Padb - A parallel job inspection tool for cluster computing
> > > http://padb.pittman.org.uk
> >
> --
>
> Ashley Pittman, Bath, UK.
>
> Padb - A parallel job inspection tool for cluster computing
> http://padb.pittman.org.uk
>
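
For anyone reading this thread later: the resource manager selection Ashley describes in the quoted message (use the one specified; otherwise see what is installed; if several are installed, keep the ones claiming active jobs; refuse if that is still ambiguous) can be sketched roughly as below. This is only an illustration in Python, not padb's actual Perl code. The function name `select_rmgr`, the exception `RmgrSelectionError`, and the dictionary input are all hypothetical, and treating "several installed, none with active jobs" as an error is an assumption, since the message does not cover that case.

```python
from typing import Dict, List, Optional


class RmgrSelectionError(RuntimeError):
    """Raised when no single resource manager can be chosen (hypothetical)."""


def select_rmgr(detected: Dict[str, List[str]],
                requested: Optional[str] = None) -> str:
    """Choose a resource manager name.

    `detected` maps each installed resource manager (e.g. "pbs", "orte")
    to the list of job ids it reports as active. This input shape is an
    assumption made for the sketch.
    """
    # An explicit choice (like -Ormgr=pbs) bypasses auto-detection.
    if requested is not None:
        return requested

    if not detected:
        raise RmgrSelectionError("no resource manager detected")

    # Only one installed: nothing to disambiguate.
    if len(detected) == 1:
        return next(iter(detected))

    # Several installed: keep only those claiming active jobs.
    active = [name for name, jobs in detected.items() if jobs]
    if len(active) == 1:
        return active[0]

    # More than one claims active jobs: refuse and ask the user.
    # Assumption: the zero-active case is handled the same way.
    raise RmgrSelectionError(
        "cannot choose a resource manager automatically; specify one explicitly"
    )
```

With this sketch, `select_rmgr({"pbs": ["48279.cluster"], "orte": []})` picks `"pbs"`, while two managers that both report active jobs raise the error unless `requested` is given, which mirrors the advice to pass `-Ormgr=pbs` or `-Ormgr=orte` explicitly.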