Hi

> > thank you very much for your answer. I have compiled your program
> > and get different behaviours for openmpi-1.6.4rc3 and openmpi-1.9.
> > 
> > I get the following output for openmpi-1.9 (different outputs !!!).
> > 
> > sunpc1 rankfiles 104 mpirun --report-bindings --rankfile myrankfile
> >  ./a.out
> > [sunpc1:26554] MCW rank 0 bound to socket 0[core 0[hwt 0]], 
> >   socket 0[core 1[hwt 0]]: [B/B][./.]
> > unbound
> > 
> > sunpc1 rankfiles 105 mpirun --report-bindings --rankfile myrankfile_0
> >  ./a.out
> > [sunpc1:26557] MCW rank 0 bound to socket 0[core 0[hwt 0]]:   [B/.][./.]
> > bind to 0
> 
> I think what's happening is that although you specified "0:0" or "0:1"
> in the rankfile, the string "0,0" or "0,1" is getting passed in (at
> least in the runs I looked at).  That colon became a comma.  So, it's
> just by accident that myrankfile_0 is working out all right.

It works for 0:0 and 1:1, but not for 0:1 and 1:0. By the way, the
machine is a Sun Ultra 40.

sunpc1 rankfiles 104 ompi_info | grep "MPI:"
                Open MPI: 1.9a1r28035
sunpc1 rankfiles 105 cat myrankfile_*
rank 0=sunpc1 slot=0:0
rank 0=sunpc1 slot=0:1
rank 0=sunpc1 slot=1:0
rank 0=sunpc1 slot=1:1
sunpc1 rankfiles 106 cc check.c 
sunpc1 rankfiles 107 mpirun --report-bindings \
  --rankfile myrankfile_0 ./a.out
bind to 0
[sunpc1:26988] MCW rank 0 bound to socket 0[core 0[hwt 0]]:
  [B/.][./.]

sunpc1 rankfiles 108 mpirun --report-bindings \
  --rankfile myrankfile_1 ./a.out
[sunpc1:26991] MCW rank 0 bound to socket 0[core 0[hwt 0]],
  socket 0[core 1[hwt 0]]: [B/B][./.]
unbound

sunpc1 rankfiles 109 mpirun --report-bindings \
  --rankfile myrankfile_2 ./a.out
[sunpc1:26994] MCW rank 0 bound to socket 1[core 2[hwt 0]],
  socket 1[core 3[hwt 0]]: [./.][B/B]
unbound

sunpc1 rankfiles 110 mpirun --report-bindings \
  --rankfile myrankfile_3 ./a.out
[sunpc1:26997] MCW rank 0 bound to socket 1[core 3[hwt 0]]:
  [./.][./B]
bind to 3
sunpc1 rankfiles 111 


> Could someone who knows the code better than I do help me narrow this
> down?  E.g., where is the rankfile parsed?  For what it's worth, by
> the time mpirun reaches orte_odls_base_default_get_add_procs_data(),
> orte_job_data already contains the corrupted cpu_bitmap string.


Thank you very much in advance for any help.


Kind regards

Siegmar
