Return-Path: jsquy...@cisco.com
X-OriginalArrivalTime: 29 Sep 2008 21:51:21.0482 (UTC) 
FILETIME=[82A41EA0:01C9227D]

If the job never started, it sounds like your Torque is not set up properly.

You probably want to take this conversation back to the Torque list; unfortunately, this is not the right place to get Torque help.

Sorry!



On Sep 29, 2008, at 5:30 PM, Zhiliang Hu wrote:

At 02:15 PM 9/29/2008 -0700, you wrote:
It sounds like you may not have set up passwordless ssh between all
your nodes.

Doug Reeder

That's not the case. Passwordless ssh is set up and works fine;
that's how I can run "mpirun -np 6 -machinefile ......" without problems.
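One way to confirm that claim non-interactively is a loop like the following. It is a minimal sketch: `check_ssh_hosts` is a hypothetical helper, and the host names in the usage line are simply the ones listed in server_priv/nodes further down. `-o BatchMode=yes` makes ssh fail instead of prompting for a password, so any host that still needs one is reported as FAILED.

```shell
#!/bin/sh
# Hypothetical helper: read one host name per line on stdin and report
# whether a non-interactive ssh login to each host succeeds.
# SSH may be overridden (e.g. for testing); it defaults to the real ssh.
check_ssh_hosts() {
    failed=0
    while read -r host; do
        [ -z "$host" ] && continue
        # BatchMode=yes: fail at once instead of asking for a password.
        # </dev/null keeps ssh from swallowing the rest of the host list.
        if ${SSH:-ssh} -o BatchMode=yes -o ConnectTimeout=5 "$host" true \
               </dev/null >/dev/null 2>&1; then
            echo "ok:     $host"
        else
            echo "FAILED: $host"
            failed=1
        fi
    done
    return "$failed"
}

# Usage (node names assumed from server_priv/nodes):
#   printf 'node%03d\n' 1 2 3 4 5 6 7 | check_ssh_hosts
```

Note that under Torque with tm support, Open MPI starts remote processes through Torque rather than ssh, so a clean result here does not by itself show that the Torque side is healthy.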

Zhiliang


On Sep 29, 2008, at 2:12 PM, Zhiliang Hu wrote:

At 10:45 PM 9/29/2008 +0200, you wrote:
On 29.09.2008 at 22:33, Zhiliang Hu wrote:

At 07:37 PM 9/29/2008 +0200, Reuti wrote:

"-l nodes=6:ppn=2" is all I use to specify the node request:

This might help: http://www.open-mpi.org/faq/?category=tm
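A prerequisite for the approach in that FAQ is an Open MPI build with Torque (tm) support, which `ompi_info` lists as MCA components named `tm`. The sketch below scans that output for a tm launcher. `has_tm_support` is a hypothetical helper, and the component type names it matches (`plm`, or `pls` in older releases) are an assumption about the `ompi_info` output format, which varies between versions.

```shell
#!/bin/sh
# Hypothetical helper: read `ompi_info` output on stdin and succeed only
# if a tm (Torque/PBS) launcher component is listed.
# Assumption: matching lines look like "MCA plm: tm (...)" or, in older
# Open MPI releases, "MCA pls: tm (...)".
has_tm_support() {
    grep -Eq 'MCA (plm|pls): tm'
}

# Usage on the head node (needs Open MPI's ompi_info in PATH):
#   ompi_info | has_tm_support && echo "tm launcher present" \
#                              || echo "no tm support (configure --with-tm)"
```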

Essentially, the examples given on that web page are no different from
what I did.
The only new thing is "qsub -I", which I suppose is for interactive mode.
When I ran this:

qsub -I -l nodes=7 mpiblastn.sh

It hangs at "qsub: waiting for job 798.nagrp2.ansci.iastate.edu to
start".
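When a job waits like this, one thing worth checking is how many nodes pbs_server currently considers usable; if fewer than the requested seven are free, the scheduler will typically keep the job queued. `pbsnodes -a` prints one block per node containing a `state = ...` line. The sketch below counts the free ones. `count_free_nodes` is a hypothetical helper, and it assumes the `state = free` layout of Torque's `pbsnodes -a` output.

```shell
#!/bin/sh
# Hypothetical helper: read `pbsnodes -a` output on stdin and print how
# many nodes report "state = free". Nodes reported as down, offline,
# or job-exclusive are not counted.
count_free_nodes() {
    awk '$1 == "state" && $2 == "=" && $3 == "free" { n++ } END { print n + 0 }'
}

# Usage on the Torque server:
#   pbsnodes -a | count_free_nodes
```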


UNIX_PROMPT> qsub -l nodes=6:ppn=2 /path/to/mpi_program
where "mpi_program" is a file with one line:
/path/to/mpirun -np 12 /path/to/my_program

Can you please try this job script instead:

#!/bin/sh
set | grep PBS
/path/to/mpirun /path/to/my_program

Everything should be handled by Open MPI automatically. The "set"
shell command lists all defined variables for further analysis, so you can check which variables Torque has set.
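A slightly more verbose variant of that job script is sketched below as an illustration, not as Reuti's script. It moves the resource request into `#PBS` directives and, besides the PBS_* variables, prints the contents of `$PBS_NODEFILE`, the file in which Torque lists one line per allocated slot. The `/path/to/...` entries are placeholders as above, and `count_slots` is a hypothetical helper.

```shell
#!/bin/sh
#PBS -l nodes=6:ppn=2
#PBS -j oe
# Sketch of a diagnostic Torque job script; /path/to/... are placeholders.

# Hypothetical helper: count allocated slots (one line per slot in
# $PBS_NODEFILE); prints 0 when the file is not available.
count_slots() {
    if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
        wc -l < "$PBS_NODEFILE" | tr -d ' '
    else
        echo 0
    fi
}

# Only do the real work when running inside a Torque job.
if [ -n "${PBS_JOBID:-}" ]; then
    env | grep '^PBS_' | sort          # variables exported by Torque
    echo "Allocated slots: $(count_slots)"
    sort "$PBS_NODEFILE" | uniq -c     # slots per host
    # With tm support, mpirun takes the hosts and -np from Torque.
    /path/to/mpirun /path/to/my_program
fi
```

If the job's .o file shows no PBS_* variables at all, the script most likely never ran inside a Torque job.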

-- Reuti

The "set | grep PBS" part produced no output.

Strange. Did you check the .o and .e files of the job? -- Reuti

There is nothing in either the -o or the -e output. I had to kill the job.
I checked the Torque server log (/var/spool/torque/server_logs); it shows:

09/29/2008 15:52:16;0100;PBS_Server;Job;799.xxx.xxx.xxx;enqueuing into default, state 1 hop 1
09/29/2008 15:52:16;0008;PBS_Server;Job;799.xxx.xxx.xxx;Job Queued at request of z...@xxx.xxx.xxx, owner = z...@xxx.xxx.xxx, job name = mpiblastn.sh, queue = default
09/29/2008 15:52:16;0040;PBS_Server;Svr;xxx.xxx.xxx;Scheduler sent command new
09/29/2008 15:52:16;0008;PBS_Server;Job;799.xxx.xxx.xxx;Job Modified at request of schedu...@xxx.xxx.xxx
09/29/2008 15:52:27;0008;PBS_Server;Job;799.xxx.xxx.xxx;Job deleted at request of z...@xxx.xxx.xxx
09/29/2008 15:52:27;0100;PBS_Server;Job;799.xxx.xxx.xxx;dequeuing from default, state EXITING
09/29/2008 15:52:27;0040;PBS_Server;Svr;xxx.xxx.xxx;Scheduler sent command term
09/29/2008 15:52:47;0001;PBS_Server;Svr;PBS_Server;is_request, bad attempt to connect from 172.16.100.1:1021 (address not trusted - check entry in server_priv/nodes)

where server_priv/nodes contains:
node001 np=4
node002 np=4
node003 np=4
node004 np=4
node005 np=4
node006 np=4
node007 np=4

which was set up by the vendor.

What does "address not trusted" mean?
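The log line itself names the suspect: pbs_server refused a connection from 172.16.100.1 and points at server_priv/nodes. A minimal sketch for pulling every such address out of the server logs follows. `untrusted_addrs` is a hypothetical helper, and it assumes the message format shown in the log excerpt above. Each address can then be looked up (for example with `getent hosts`), and one plausible check is whether the host it belongs to appears in server_priv/nodes under a name that resolves to that same address.

```shell
#!/bin/sh
# Hypothetical helper: read Torque server log lines on stdin and print each
# distinct address that pbs_server rejected as "address not trusted".
# Assumes the message format seen in the server_logs excerpt.
untrusted_addrs() {
    sed -n 's/.*bad attempt to connect from \([0-9.]*\):[0-9]* (address not trusted.*/\1/p' |
        sort -u
}

# Usage (log directory as quoted above):
#   cat /var/spool/torque/server_logs/* | untrusted_addrs
#   getent hosts 172.16.100.1   # which name does the rejected address map to?
```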

Zhiliang




_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users


--
Jeff Squyres
Cisco Systems
