At 10:45 PM 9/29/2008 +0200, you wrote:
>On 29.09.2008 at 22:33, Zhiliang Hu wrote:
>
>>At 07:37 PM 9/29/2008 +0200, Reuti wrote:
>>
>>>>"-l nodes=6:ppn=2" is all I have to specify the node requests:
>>>
>>>this might help: http://www.open-mpi.org/faq/?category=tm
>>
>>Essentially, the examples given on this web page are no different
>>from what I did.
>>The only new thing is, I suppose "qsub -I " is for interactive mode.
>>When I did this:
>>
>> qsub -I -l nodes=7 mpiblastn.sh
>>
>>It hangs on "qsub: waiting for job 798.nagrp2.ansci.iastate.edu to
>>start".
>>
>>
>>>>UNIX_PROMPT> qsub -l nodes=6:ppn=2 /path/to/mpi_program
>>>>where "mpi_program" is a file with one line:
>>>> /path/to/mpirun -np 12 /path/to/my_program
>>>
>>>Can you please try this jobscript instead:
>>>
>>>#!/bin/sh
>>>set | grep PBS
>>>/path/to/mpirun /path/to/my_program
>>>
>>>All should be handled by Open MPI automatically. With the "set" bash
>>>command you will get a list with all defined variables for further
>>>analysis; and where you can check for the variables set by Torque.
>>>
>>>-- Reuti
>>
>>"set | grep PBS" part had nothing in output.
>
>Strange - you checked the .o and .e files of the job? - Reuti

There is nothing in either the -o or the -e output. I had to kill the job.

I checked the Torque log (/var/spool/torque/server_logs), and it shows:

09/29/2008 15:52:16;0100;PBS_Server;Job;799.xxx.xxx.xxx;enqueuing into default, state 1 hop 1
09/29/2008 15:52:16;0008;PBS_Server;Job;799.xxx.xxx.xxx;Job Queued at request of z...@xxx.xxx.xxx, owner = z...@xxx.xxx.xxx, job name = mpiblastn.sh, queue = default
09/29/2008 15:52:16;0040;PBS_Server;Svr;xxx.xxx.xxx;Scheduler sent command new
09/29/2008 15:52:16;0008;PBS_Server;Job;799.xxx.xxx.xxx;Job Modified at request of schedu...@xxx.xxx.xxx
09/29/2008 15:52:27;0008;PBS_Server;Job;799.xxx.xxx.xxx;Job deleted at request of z...@xxx.xxx.xxx
09/29/2008 15:52:27;0100;PBS_Server;Job;799.xxx.xxx.xxx;dequeuing from default, state EXITING
09/29/2008 15:52:27;0040;PBS_Server;Svr;xxx.xxx.xxx;Scheduler sent command term
09/29/2008 15:52:47;0001;PBS_Server;Svr;PBS_Server;is_request, bad attempt to connect from 172.16.100.1:1021 (address not trusted - check entry in server_priv/nodes)

The server_priv/nodes file, which was set up by the vendor, contains:

node001 np=4
node002 np=4
node003 np=4
node004 np=4
node005 np=4
node006 np=4
node007 np=4

What does "address not trusted" mean?

Zhiliang
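
In case it helps anyone searching a longer server_logs file for the same message, here is a rough Python sketch that picks out the rejected addresses and reads the node names from server_priv/nodes, so the two can be compared by hand. The field labels and function names are my own, guessed from the six ';'-separated fields in the lines quoted above; they are not official Torque names, and the script only reads text, it does not query Torque.

```python
import re

# Labels are assumptions based on the six ';'-separated fields seen in the
# quoted log lines; they are not taken from Torque documentation.
FIELDS = ("timestamp", "code", "server", "kind", "object", "message")

# Matches the "bad attempt to connect from IP:PORT" text shown in the log.
UNTRUSTED = re.compile(r"bad attempt to connect from (\d+\.\d+\.\d+\.\d+):(\d+)")


def parse_line(line):
    """Split one log line into a dict, or return None if it has fewer fields."""
    parts = line.rstrip("\n").split(";", 5)
    if len(parts) != 6:
        return None
    return dict(zip(FIELDS, parts))


def untrusted_addresses(lines):
    """Return the set of IP addresses reported as 'address not trusted'."""
    found = set()
    for line in lines:
        rec = parse_line(line)
        if rec is None or "address not trusted" not in rec["message"]:
            continue
        match = UNTRUSTED.search(rec["message"])
        if match:
            found.add(match.group(1))
    return found


def parse_nodes(text):
    """Read 'name np=N' entries into {name: N}; N is None if np= is absent."""
    nodes = {}
    for entry in text.splitlines():
        tokens = entry.split()
        if not tokens:
            continue  # skip blank lines
        count = None
        for tok in tokens[1:]:
            if tok.startswith("np="):
                count = int(tok[3:])
        nodes[tokens[0]] = count
    return nodes


# Two lines copied from the log excerpt above, used as sample input.
sample_log = [
    "09/29/2008 15:52:27;0040;PBS_Server;Svr;xxx.xxx.xxx;Scheduler sent command term",
    "09/29/2008 15:52:47;0001;PBS_Server;Svr;PBS_Server;is_request, bad attempt "
    "to connect from 172.16.100.1:1021 (address not trusted - check entry in "
    "server_priv/nodes)",
]
sample_nodes = "node001 np=4\nnode002 np=4\n"

print(sorted(untrusted_addresses(sample_log)))
print(parse_nodes(sample_nodes))
```

With the addresses and node names side by side, the next step suggested by the log message itself would be to check whether 172.16.100.1 corresponds to one of the names listed in server_priv/nodes; the script does not attempt that lookup.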