Hi,

today I tried a different rankfile and once more ran into a problem. :-((
> > thank you very much for your patch. I have applied the patch to
> > openmpi-1.6.4rc4.
> >
> > Open MPI: 1.6.4rc4r28022
> >   : [B .][. .] (slot list 0:0)
> >   : [. B][. .] (slot list 0:1)
> >   : [B B][. .] (slot list 0:0-1)
> >   : [. .][B .] (slot list 1:0)
> >   : [. .][. B] (slot list 1:1)
> >   : [. .][B B] (slot list 1:0-1)
> >   : [B B][B B] (slot list 0:0-1,1:0-1)
>
> That looks great. I'll file a CMR to get this patch into 1.6.
> Unless you indicate otherwise, I'll assume this issue is understood
> for 1.6.

Rankfile rf_6 is the same as last time. I have added one more line in
rf_7, and I have switched the order of the hosts in rf_8. Everything is
still fine with rf_6, but I don't get any output for rank 1 with rf_7,
and I get an error for rf_8. Both machines use the same hardware.

sunpc1 rankfiles 106 cat rf_6
# mpiexec -report-bindings -rf rf_6 hostname
rank 0=sunpc1 slot=0:0-1,1:0-1

sunpc1 rankfiles 107 cat rf_7
# mpiexec -report-bindings -rf rf_7 hostname
rank 0=sunpc1 slot=0:0-1,1:0-1
rank 1=sunpc0 slot=0:0-1

sunpc1 rankfiles 108 cat rf_8
# mpiexec -report-bindings -rf rf_8 hostname
rank 0=sunpc0 slot=0:0-1,1:0-1
rank 1=sunpc1 slot=0:0-1

sunpc1 rankfiles 109 mpiexec -report-bindings -rf rf_6 hostname
[sunpc1:09779] MCW rank 0 bound to socket 0[core 0-1] socket 1[core 0-1]:
  [B B][B B] (slot list 0:0-1,1:0-1)

sunpc1 rankfiles 110 mpiexec -report-bindings -rf rf_7 hostname
[sunpc1:09782] MCW rank 0 bound to socket 0[core 0-1] socket 1[core 0-1]:
  [B B][B B] (slot list 0:0-1,1:0-1)

sunpc1 rankfiles 111 mpiexec -report-bindings -rf rf_8 hostname
--------------------------------------------------------------------------
The rankfile that was used claimed that a host was either not allocated
or oversubscribed its slots. Please review your rank-slot assignments
and your host allocation to ensure a proper match. Also, some systems
may require using full hostnames, such as "host1.example.com" (instead
of just plain "host1").
  Host: sunpc0
--------------------------------------------------------------------------

I get the following output if I use sunpc0 as the local host.

sunpc0 rankfiles 102 mpiexec -report-bindings -rf rf_6 hostname
--------------------------------------------------------------------------
All nodes which are allocated for this job are already filled.
--------------------------------------------------------------------------

sunpc0 rankfiles 103 mpiexec -report-bindings -rf rf_7 hostname
--------------------------------------------------------------------------
The rankfile that was used claimed that a host was either not allocated
or oversubscribed its slots. Please review your rank-slot assignments
and your host allocation to ensure a proper match. Also, some systems
may require using full hostnames, such as "host1.example.com" (instead
of just plain "host1").

  Host: sunpc1
--------------------------------------------------------------------------

sunpc0 rankfiles 104 mpiexec -report-bindings -rf rf_8 hostname
[sunpc0:19027] MCW rank 0 bound to socket 0[core 0-1] socket 1[core 0-1]:
  [B B][B B] (slot list 0:0-1,1:0-1)

I get the following output if I use tyr as the local host.

tyr rankfiles 218 mpiexec -report-bindings -rf rf_6 hostname
--------------------------------------------------------------------------
All nodes which are allocated for this job are already filled.
--------------------------------------------------------------------------

tyr rankfiles 219 mpiexec -report-bindings -rf rf_7 hostname
--------------------------------------------------------------------------
All nodes which are allocated for this job are already filled.
--------------------------------------------------------------------------

tyr rankfiles 220 mpiexec -report-bindings -rf rf_8 hostname
--------------------------------------------------------------------------
All nodes which are allocated for this job are already filled.
--------------------------------------------------------------------------

Do you have any ideas why this happens? Thank you very much in advance
for any help.

Kind regards

Siegmar
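
P.S.: For reference, each line of the rankfiles above uses Open MPI's
rankfile syntax "rank <n>=<host> slot=<socket>:<core-list>". So, for
example, the first line of rf_6,

  rank 0=sunpc1 slot=0:0-1,1:0-1

asks that rank 0 run on sunpc1, bound to cores 0-1 of socket 0 and
cores 0-1 of socket 1 (i.e. all four cores), which matches the
[B B][B B] binding map that -report-bindings prints in the working
cases.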