Re: [OMPI users] Error when using OpenMPI with SGE multiple hosts

2010-11-18 Thread Chris Jewell
0001 [exec5:10861] [[59432,0],5] odls:default:fork binding child [[59432,1],6] to cpus 0001 Hope that helps clear up the confusion! Please say it does, my head hurts... Chris -- Dr Chris Jewell Department of Statistics University of Warwick Coventry CV4 7AL UK Tel: +44 (0)24 7615 0778

Re: [OMPI users] Error when using OpenMPI with SGE multiple hosts

2010-11-17 Thread Chris Jewell
ly different to what I had before, but our cluster is busier today :-) Chris

Re: [OMPI users] Error when using OpenMPI with SGE multiple hosts

2010-11-16 Thread Chris Jewell
] [[23443,0],6] odls:default:fork binding child [[23443,1],7] to cpus 0001 [exec1:30779] [[23443,0],5] odls:default:fork binding child [[23443,1],6] to cpus 0001 [exec3:12818] [[23443,0],3] odls:default:fork binding child [[23443,1],4] to cpus 0001 . C

Re: [OMPI users] Error when using OpenMPI with SGE multiple hosts

2010-11-16 Thread Chris Jewell
cluster.stats.local: PE_HOSTFILE shows slots=1 [exec4:17384] ras:gridengine: exec1.cluster.stats.local: PE_HOSTFILE shows slots=1 [exec4:17384] ras:gridengine: exec5.cluster.stats.local: PE_HOSTFILE shows slots=1 Chris

Re: [OMPI users] Error when using OpenMPI with SGE multiple hosts

2010-11-16 Thread Chris Jewell
t then I guess we're into the realm of setting allocation_rule. Is it going to be worth looking at creating a patch for this? I don't know much of the internals of SGE -- would it be hard work to do? I've not that much time to dedicate towards it, but I could put some effort in if necessary... Chris

Re: [OMPI users] Error when using OpenMPI with SGE multiple hosts

2010-11-15 Thread Chris Jewell
for the 2nd job? > > --td That's exactly it. Each MPI process needs to be bound to 1 processor in a way that reflects GE's slot allocation scheme. C

Re: [OMPI users] Error when using OpenMPI with SGE multiple hosts

2010-11-15 Thread Chris Jewell
currently... Cheers, Chris

Re: [OMPI users] Error when using OpenMPI with SGE multiple hosts

2010-11-15 Thread Chris Jewell
are using a version that > predates the change. > > Thanks > Ralph Hi Ralph, I'm using OMPI version 1.4.2. I can upgrade and try it out if necessary. Is there anything I can give you as potential debug material? Cheers, Chris

Re: [OMPI users] Error when using OpenMPI with SGE multiple hosts

2010-11-15 Thread Chris Jewell
.stats.local 1 batch.q@exec2.cluster.stats.local 0,1:0,2 exec5.cluster.stats.local 1 batch.q@exec5.cluster.stats.local 0,1:0,2 So the pe_hostfile still doesn't give an accurate representation of the binding allocation for use by OpenMPI. Question: is there a system file or command that I co

Re: [OMPI users] Error when using OpenMPI with SGE multiple hosts

2010-11-15 Thread Chris Jewell
re allocated where on the cluster, and write an OpenMPI rankfile. Any thoughts on that? Cheers, Chris
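
As a rough illustration of the rankfile idea raised above, here is a minimal Python sketch, not anything posted in the thread. It assumes each pe_hostfile line has four whitespace-separated fields (host, slots, queue, binding) and that the binding field is a colon-separated list of socket,core pairs, as in the "0,1:0,2" entries quoted in this thread. The function names parse_pe_hostfile and make_rankfile are hypothetical. It emits one "rank N=host slot=socket:core" line per slot, taking the listed cores in order, so the result is only as accurate as the binding column itself, which the thread reports as unreliable.

```python
def parse_pe_hostfile(text):
    """Parse pe_hostfile text into (host, slots, [(socket, core), ...]) tuples.

    Assumed line layout (not confirmed beyond the excerpts in the thread):
        <host> <slots> <queue> <binding>
    where <binding> looks like "0,1:0,2" or the literal "UNDEFINED".
    """
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        host, slots, _queue, binding = fields[:4]
        pairs = []
        if binding != "UNDEFINED":
            for pair in binding.split(":"):
                socket, core = pair.split(",")
                pairs.append((int(socket), int(core)))
        entries.append((host, int(slots), pairs))
    return entries


def make_rankfile(entries):
    """Build rankfile text with one rank per slot, using listed cores in order."""
    lines = []
    rank = 0
    for host, slots, pairs in entries:
        if len(pairs) < slots:
            # Not enough binding information to pin every slot on this host.
            raise ValueError(f"{host}: {slots} slots but {len(pairs)} cores listed")
        for socket, core in pairs[:slots]:
            lines.append(f"rank {rank}={host} slot={socket}:{core}")
            rank += 1
    return "\n".join(lines)


# Sample input modelled on the pe_hostfile lines quoted in the thread.
sample = (
    "exec2.cluster.stats.local 1 batch.q@exec2.cluster.stats.local 0,1:0,2\n"
    "exec5.cluster.stats.local 1 batch.q@exec5.cluster.stats.local 0,1:0,2\n"
)
rankfile_text = make_rankfile(parse_pe_hostfile(sample))
print(rankfile_text)
```

With the sample above, each host has one slot but two cores listed, so the sketch simply takes the first core on each host. That is exactly the ambiguity complained about in the thread: the file alone does not say which of the listed cores belongs to this job's slot.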

Re: [OMPI users] Error when using OpenMPI with SGE multiple hosts

2010-11-13 Thread Chris Jewell
JSV to remove core binding for the MPI jobs (but retain it for serial and SMP jobs). Any more ideas?? Cheers, Chris (PS. Dave: how is my alma mater these days??)

Re: [OMPI users] Error when using OpenMPI with SGE multiple hosts

2010-10-05 Thread Chris Jewell
> > It looks to me like your remote nodes aren't finding the orted executable. I > suspect the problem is that you need to forward the path and ld_library_path > to the remote nodes. Use the mpirun -x option to do so. Hi, problem sorted. It was actually caused by the system I currently use t

[OMPI users] Error when using OpenMPI with SGE multiple hosts

2010-10-04 Thread Chris Jewell
e suggest either what is wrong, or how I might progress with getting more information? Many thanks, Chris