Hi all,

I've got a problem with my cluster, which uses OpenMPI + Torque + Maui.

I can submit 50 different single-process jobs and the batch system will run all 50 in parallel, but I can't get an MPI job to run on more than 1 node. I assumed it must be my PBS script, but I have tried just about every config I can find or think of and still have no luck.

Running pbsnodes -a produces the following. The output continues up to 16 nodes (the second half are quad-cores); I'm only showing the first 2 for brevity.

tuxta@WPCluster:~$ pbsnodes -a
WPCluster.workstation.griffith.edu.au
     state = free
     np = 4
     ntype = cluster
     status = rectime=1321274238,varattr=,jobs=,state=free,netload=87819550163,gres=,loadave=0.00,ncpus=4,physmem=4047980kb,availmem=11466240kb,totmem=11860068kb,idletime=1574,nusers=1,nsessions=1,sessions=27627,uname=Linux WPCluster 2.6.32-34-server #77-Ubuntu SMP Tue Sep 13 20:54:38 UTC 2011 x86_64,opsys=linux

node02
     state = free
     np = 2
     ntype = cluster
     status = rectime=1321274239,varattr=,jobs=,state=free,netload=191116409,gres=,loadave=0.00,ncpus=2,physmem=1021584kb,availmem=8642904kb,totmem=8832648kb,idletime=2258,nusers=0,nsessions=? 15201,sessions=? 15201,uname=Linux node02 2.6.32-34-server #77-Ubuntu SMP Tue Sep 13 20:54:38 UTC 2011 x86_64,opsys=linux
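
Since the full listing is long, the per-node slot counts can be condensed into one line per node. This is only a rough sketch: the function name np_per_node is made up for illustration, and it assumes the output layout shown above (an unindented node name followed by "np = N" attribute lines).

```shell
#!/bin/bash
# Hypothetical helper: summarise "pbsnodes -a" output as "<node> <np>" pairs.
# It reads the pbsnodes text on stdin, so it also works on saved output.
np_per_node() {
    awk '
        # An unindented line with a single word is taken to be a node name.
        /^[^ \t]/ && NF == 1 { node = $1 }
        # An attribute line of the form "np = N" gives the slot count.
        $1 == "np" && $2 == "=" { print node, $3 }
    '
}

# On the cluster (assumes pbsnodes is on the PATH):
#   pbsnodes -a | np_per_node
```

With this summary it is easy to see which nodes could satisfy a given ppn value in a request such as nodes=2:ppn=4.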


The following script works fine when nodes=1, but when I set nodes=2 it never runs; it just stays in state E rather than R.


#!/bin/bash

#PBS -N Hello_Test
#PBS -l nodes=2:ppn=4

cd $PBS_O_WORKDIR

mpiexec -np 8 hello
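
One way to gather more information would be a job that only logs which hosts Torque hands to it, without starting MPI at all. The following is a minimal sketch, not a fix: it assumes Torque exports PBS_NODEFILE (a file with one hostname per allocated slot), and the function name slots_per_host and the job name are made up for illustration.

```shell
#!/bin/bash

#PBS -N Nodefile_Test
#PBS -l nodes=2:ppn=4

# Hypothetical helper: print "<host> <slots>" for each host in a node file.
slots_per_host() {
    # Count how many times each hostname appears, then sort by hostname.
    awk 'NF { count[$1]++ } END { for (h in count) print h, count[h] }' "$1" | sort
}

# PBS_NODEFILE is only set inside a running Torque job; outside a job
# this block is skipped and the script prints nothing.
if [ -n "${PBS_NODEFILE:-}" ]; then
    echo "Hosts allocated to job ${PBS_JOBID:-unknown}:"
    slots_per_host "$PBS_NODEFILE"
fi
```

If this job also sits in state E with nodes=2, the problem is probably not in the mpiexec line.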


I'm not sure what other info would be helpful, so I don't want to put heaps of stuff here. If any more info is needed, just let me know.

Does anyone have any ideas?

Regards

Tuxta



--
SLUG - Sydney Linux User's Group Mailing List - http://slug.org.au/
Subscription info and FAQs: http://slug.org.au/faq/mailinglists.html
