Good evening Ralph,

On 15/12/2010 18:45, Ralph Castain wrote:
It looks like all the messages are flowing within a single job (all three processes mentioned in the error have the same identifier). The only possibility I can think of is that somehow you are reusing ports; is it possible your system doesn't have enough ports to support all the procs?

It seems every worker node has a range of almost 30k ports available:
> ssh r33i0n0 cat /proc/sys/net/ipv4/ip_local_port_range
32768    61000

AFAIK this is the only way I can get this information.
Are these 30k ports enough?
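For reference, a minimal Python sketch that reads that range and counts it, assuming the standard Linux procfs path used in the command above. Both bounds are inclusive, so 32768 to 61000 gives 28233 ports:

```python
from pathlib import Path

def port_range_size(text: str) -> int:
    """Number of ports in an ip_local_port_range value such as '32768\t61000'."""
    low, high = (int(x) for x in text.split())
    return high - low + 1  # first and last port are both usable

def local_port_range_size(path: str = "/proc/sys/net/ipv4/ip_local_port_range") -> int:
    """Read the local (ephemeral) port range from procfs (Linux only) and count it."""
    return port_range_size(Path(path).read_text())

if __name__ == "__main__":
    print(port_range_size("32768    61000"))  # 28233
```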

The question is: is OpenMPI opening ports from every node to every other node?
If so, I could understand why it runs out of ports when
I increase the number of nodes.

But is there a way (an MCA parameter?) to prevent OpenMPI from opening so many ports? Apart from the rank 0 node, every MPI process only needs to communicate with its 8 nearest neighbour nodes. So there should be a switch somewhere telling OpenMPI to open a port ONLY when needed, but I did not find it in the ompi_info output ;-)
Which one is it?
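To make the question concrete, here is a rough back-of-the-envelope estimate in Python. It rests on my own hypothetical simplifications, not on how OpenMPI actually behaves: one TCP connection per remote peer process, each consuming one local port on the node, with the extra traffic to rank 0 ignored. The job size in the example (512 nodes, 8 processes per node) is made up for illustration:

```python
# Ports in ip_local_port_range above (inclusive bounds).
PORT_RANGE = 61000 - 32768 + 1

def full_mesh_ports_per_node(nodes: int, procs_per_node: int) -> int:
    """Assumed worst case: every local process connects to every process on every other node."""
    remote_peers = (nodes - 1) * procs_per_node
    return procs_per_node * remote_peers

def neighbour_ports_per_node(procs_per_node: int, neighbour_nodes: int = 8) -> int:
    """Assumed case: each local process only connects to the processes on its neighbour nodes."""
    return procs_per_node * neighbour_nodes * procs_per_node

if __name__ == "__main__":
    nodes, ppn = 512, 8  # hypothetical job size
    print(full_mesh_ports_per_node(nodes, ppn), "vs", PORT_RANGE)  # 32704 vs 28233
    print(neighbour_ports_per_node(ppn))  # 512
```

Under these assumptions the full-mesh count grows with the number of nodes while the neighbour-only count stays constant, which would be consistent with the failures appearing only at larger node counts.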

Thanks,
Best,
G.
