I confirm that raising the max 'open files' limit to 2048 allows launching up to 510 processes per node.
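For reference, a sketch of making the higher limit persist across logins — this assumes pam_limits is active (typical on CentOS 6); the username is taken from the prompt below and the fragment is illustrative, not from the Open MPI docs:

```shell
# /etc/security/limits.conf fragment (assumption: pam_limits is enabled).
# Raises the per-user open-files limit from the default 1024 to 2048:
#   fsimula  soft  nofile  2048
#   fsimula  hard  nofile  2048

# Per-session alternative (what was done in the transcript below):
ulimit -n 2048
```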

By the way, I just discovered that launching the processes while logged directly onto the host, instead of on the front-end machine, yields a clearer error message that would probably have tipped me off sooner:

[cut]
[fsimula@q012 ~]$ mpirun -np 255 -host q012 uptime | wc -l
[q012.qng:31455] [[22942,0],0] ORTE_ERROR_LOG: The system limit on number of pipes a process can open was reached in file base/odls_base_default_fns.c at line 1739
--------------------------------------------------------------------------
mpirun was unable to start the specified application as it encountered an error
on node q012.qng. More information may be available above.
--------------------------------------------------------------------------

[fsimula@q012 ~]$ ulimit -n 2048
[fsimula@q012 ~]$ mpirun -np 510 -host q012 uptime | wc -l
510
[/cut]
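Incidentally, the two ceilings observed in this thread (254 processes at a 1024-descriptor limit, 510 at 2048) are consistent with a simple model: roughly 4 descriptors per launched process plus a fixed overhead of about 8 for mpirun itself. These constants are inferred from the numbers above, not taken from the Open MPI documentation:

```shell
# Back-of-the-envelope check (assumed constants, inferred from the observed
# ceilings of 254 and 510 -- not from Open MPI documentation):
per_proc=4     # descriptors mpirun appears to use per launched process
overhead=8     # descriptors mpirun appears to keep for itself
for limit in 1024 2048; do
    echo "$limit -> $(( (limit - overhead) / per_proc )) processes"
done
```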

Many thanks to both Jeff and Ralph for pointing me in the right direction.
Francesco

On 2013-09-11 09:46, Jeff Squyres (jsquyres) wrote:
As Ralph said, you're probably running out of file descriptors;
mpirun uses a few (2-3? I don't remember offhand) for each MPI process
launched.

There are many factors that can cause limits like this -- file
descriptors are only one.  It very much depends on the configuration
of the machine on which you're running.  My point: Sorry, but it'll
likely take some experimentation on your part to figure out how many
you can run on a single machine.
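Jeff's suggested experimentation could be scripted. Here is a hypothetical probe (find_max_np is my own name for it, not an Open MPI tool) that walks the process count upward until the launch fails:

```shell
# find_max_np LAUNCHER START: prints the largest count N for which
# "LAUNCHER N" still succeeds, scanning upward from START.
# Hypothetical helper -- not part of Open MPI.
find_max_np() {
    launcher=$1
    n=$2
    while $launcher "$n" >/dev/null 2>&1; do
        n=$(( n + 1 ))
    done
    echo $(( n - 1 ))
}

# Usage against the node in this thread ('true' keeps each launch cheap):
#   find_max_np "mpirun -host q012 -np" 250
```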


On Sep 10, 2013, at 4:10 PM, Francesco Simula
<francesco.sim...@roma1.infn.it> wrote:

Dear forum,

I should probably apologize in advance for the very basic question, but I wasn't able to find an answer elsewhere: how do I find the maximum number of processes that mpirun can concurrently instantiate on a single host of a cluster?

If I launch (on a CentOS 6.3 cluster with dual quad-core Xeon nodes, equipped with Open MPI 1.5.4 and IB HCAs, though I think the latter is of no consequence):

[cut]
mpirun -np 250 -host q012 hostname
[/cut]

I expect and obtain 250 rows of:
[cut]
q012.qng
[/cut]

The same holds for 251, 252, 253 and 254, BUT not for 255, which instead returns:

[cut]
--------------------------------------------------------------------------
mpirun was unable to start the specified application as it encountered an error
on node q012. More information may be available above.
--------------------------------------------------------------------------
[/cut]

I know that 250 processes is quite an oversubscription for a single node with no more than 8 real cores, but I wanted to see the actual performance degradation instead of a crash.

Which hard limit (in Open MPI or in the system) am I hitting that prevents running 255 MPI processes on a single host?

The output of ulimit -a for the user is:

[cut]
ulimit -a
core file size          (blocks, -c) 1000000
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 95054
max locked memory       (kbytes, -l) unlimited
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 100000
cpu time               (seconds, -t) unlimited
max user processes              (-u) 1024
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
[/cut]

Many thanks,
Francesco
_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users
