On Dec 15, 2010, at 1:11 PM, Gilbert Grosdidier wrote:

> Ralph,
> 
> I am not using the TCP BTL, only the OPENIB one. Does this change the number of 
> sockets in use per node, please?

I believe the openib btl opens sockets for connection purposes, so the count is 
likely the same. An IB person can confirm that...
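
If you want a quick sanity check, something along these lines run on one of the busy 
nodes should give a rough count of established TCP connections (just a sketch - 
netstat options vary a bit between distros, and it counts everything on the node, 
not just MPI traffic):

  ssh r33i0n0 'netstat -tn | grep -c ESTABLISHED'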

> 
> But I suspect the ORTE daemons are communicating only through TCP anyway, 
> right?

Yes

> 
> Also, is there anybody in the OpenMPI team using an SGI Altix cluster with a 
> high number of nodes
> (1k nodes, i.e. 8k cores) whom I could ask about the right setup?

Not sure, but 1k nodes isn't particularly large - we launch on much larger 
clusters without problems.

My guess is that you have some zombie procs running that are messing up the 
communication. You said you had a lot of crashed jobs that might have 
left zombies in their wake. Can you clean those up?
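
Something like the following, run on each node (or pushed out over ssh/pdsh), should 
clear out any leftovers - just a sketch, so double-check first that you aren't killing 
a job you still care about, and substitute your own executable name:

  pkill -u $USER orted
  pkill -u $USER <your_application>

(orted is the daemon mpirun starts on each node; the second line catches any orphaned 
application ranks.) If your install includes the orte-clean utility, that does much 
the same job and also removes stale session directories.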


> 
> Thanks,   Best,    G.
> 
> 
> 
> On 15/12/2010 21:03, Ralph Castain wrote:
>> On Dec 15, 2010, at 12:30 PM, Gilbert Grosdidier wrote:
>> 
>>> Good evening, Ralph,
>>> 
>>> On 15/12/2010 18:45, Ralph Castain wrote:
>>>> It looks like all the messages are flowing within a single job (all three 
>> processes mentioned in the error have the same identifier). The only 
>> possibility I can think of is that somehow you are reusing ports - is it 
>>>> possible your system doesn't have enough ports to support all the procs?
>>>> 
>>> It seems every worker node has a range of almost 30k ports available:
>>>> ssh r33i0n0 cat /proc/sys/net/ipv4/ip_local_port_range
>>> 32768    61000
>>> 
>>> This is AFAIK the only way I can get info about this.
>>> Are these 30k ports enough?
>> Depends on how many nodes there are in your system.
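>> If that range turns out to be too small, your admins can widen the ephemeral port 
>> range with something along these lines (needs root, it is a system-wide setting, 
>> and the exact bounds here are just an example):
>> 
>>   sysctl -w net.ipv4.ip_local_port_range="9000 65500"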
>> 
>>> The question is: is OpenMPI opening ports from every node towards every other 
>>> node?
>>> In that case I could understand why it runs out of ports when
>>> I increase the number of nodes.
>> Yes - in two ways:
>> 
>> 1. each ORTE daemon opens a port to every other daemon in the system. Thus, 
>> you need at least M ports if your job is running across M nodes
>> 
>> 2. each MPI process will open a direct port to any other MPI process that it 
>> communicates with. So if you have N processes on a node, and they only 
>> communicate to the 8 nearest neighbor nodes (each of which have N 
>> processes), and you are using the TCP btl, then you will consume an 
>> additional 8*N*N sockets on each node.
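>> 
>> To put rough numbers on that (back-of-the-envelope, assuming your 1k nodes, 8 procs 
>> per node, and the 8-neighbor pattern you describe): about 1000 daemon sockets per 
>> node plus 8 * 8 * 8 = 512 MPI sockets per node, so on the order of 1500 sockets in 
>> total - well inside your ~28k ephemeral port range. So the range itself shouldn't 
>> be the limit unless stale connections from crashed jobs are still holding ports.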
>> 
>>> But: is there a way (an MCA param?) to prevent OpenMPI from opening so 
>>> many ports?
>>> Indeed, apart from the rank 0 node, every MPI process will need to communicate 
>>> with ONLY
>>> the 8 (nearest) neighbour nodes. So, there should be a switch somewhere 
>>> telling OpenMPI
>>> to open a port ONLY when needed, but I did not find it among the ompi_info 
>>> stuff ;-)
>> It only opens a port when something actually tries to communicate - we never 
>> open ports in advance.
>> 
>>> Which one is it?
>>> 
>>> Thanks,   Best,   G.

