[exec5:10861] [[59432,0],5] odls:default:fork binding child [[59432,1],6] to cpus 0001
Hope that helps clear up the confusion! Please say it does, my head hurts...
Chris
--
Dr Chris Jewell
Department of Statistics
University of Warwick
Coventry
CV4 7AL
UK
Tel: +44 (0)24 7615 0778
ly different to what I
had before, but our cluster is busier today :-)
Chris
] [[23443,0],6] odls:default:fork binding child [[23443,1],7] to cpus 0001
[exec1:30779] [[23443,0],5] odls:default:fork binding child [[23443,1],6] to cpus 0001
[exec3:12818] [[23443,0],3] odls:default:fork binding child [[23443,1],4] to cpus 0001
C
cluster.stats.local: PE_HOSTFILE shows slots=1
[exec4:17384] ras:gridengine: exec1.cluster.stats.local: PE_HOSTFILE shows slots=1
[exec4:17384] ras:gridengine: exec5.cluster.stats.local: PE_HOSTFILE shows slots=1
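(For reference, the PE_HOSTFILE that the gridengine ras module is reporting on here is just a whitespace-separated file, one line per host: hostname, slot count, queue, and a processor/binding field. A rough sketch of pulling the slot counts out of it, assuming the standard layout and that PE_HOSTFILE is set in the job's environment:

import os

with open(os.environ["PE_HOSTFILE"]) as f:
    for line in f:
        fields = line.split()
        if not fields:
            continue
        host, slots = fields[0], int(fields[1])
        print(f"{host}: {slots} slot(s)")

That is only a sketch of reading the file, of course, not of what Open MPI does with it.)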
Chris
t then I guess we're into the realm of setting
allocation_rule.
Is it going to be worth looking at creating a patch for this? I don't know
much of the internals of SGE -- would it be hard work to do? I've not that
much time to dedicate towards it, but I could put some effort in if necessary...
Chris
> for the 2nd job?
>
> --td
That's exactly it. Each MPI process needs to be bound to 1 processor in a way
that reflects GE's slot allocation scheme.
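To make that requirement concrete, here is a minimal, purely illustrative sketch (Linux-only, and not what Open MPI itself does internally) of pinning a process to a single core; the core number is made up for the example:

import os

allocated_core = 3  # hypothetical: the one core GE handed to this slot
os.sched_setaffinity(0, {allocated_core})  # pin the calling process (Linux-only)
print("bound to cpus:", sorted(os.sched_getaffinity(0)))

Each rank would need the equivalent of that, but driven by GE's actual per-slot allocation rather than a hard-coded number.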
C
currently...
Cheers,
Chris
> are using a version that predates the change.
>
> Thanks
> Ralph
Hi Ralph,
I'm using OMPI version 1.4.2. I can upgrade and try it out if necessary. Is
there anything I can give you as potential debug material?
Cheers,
Chris
exec2.cluster.stats.local 1 batch.q@exec2.cluster.stats.local 0,1:0,2
exec5.cluster.stats.local 1 batch.q@exec5.cluster.stats.local 0,1:0,2
So the pe_hostfile still doesn't give an accurate representation of the binding allocation for use by OpenMPI. Question: is there a system file or command that I could use to find out which cores are allocated where on the cluster, and write an OpenMPI rankfile?
Any thoughts on that?
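If a reliable source for that per-host core information can be found, turning it into a rankfile is mechanical. A rough sketch, with the allocation data and the file name made up purely for illustration:

allocation = {
    "exec2.cluster.stats.local": [(0, 1)],  # (socket, core) pairs -- example data only
    "exec5.cluster.stats.local": [(0, 2)],
}

rank = 0
with open("rankfile", "w") as rf:
    for host, cores in allocation.items():
        for socket, core in cores:
            rf.write(f"rank {rank}={host} slot={socket}:{core}\n")
            rank += 1

The job could then be launched with something like mpirun -rf rankfile, assuming the rankfile mapper is available in the OMPI build in use.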
Cheers,
Chris
JSV to remove core binding for the
MPI jobs (but retain it for serial and SMP jobs). Any more ideas??
Cheers,
Chris
(PS. Dave: how is my alma mater these days??)
>
> It looks to me like your remote nodes aren't finding the orted executable. I
> suspect the problem is that you need to forward the path and ld_library_path
> to the remote nodes. Use the mpirun -x option to do so.
Hi, problem sorted. It was actually caused by the system I currently use t
e suggest either what is wrong, or how I might progress
with getting more information?
Many thanks,
Chris