Hello everyone,

I am using openmpi-1.10.2 and calling the `spawn_multiple` MPI function
inside a for-loop. In each iteration, my program spawns N workers, makes
some changes to the input for the next iteration, and then proceeds to the
next iteration.
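
A minimal sketch of the spawn-and-disconnect pattern described above,
assuming mpi4py. It is not the actual program: `run_iterations`,
`worker_script`, and the `Barrier` call standing in for the real data
exchange are placeholders for illustration only.

```python
import sys


def run_iterations(comm, worker_script, n_workers, n_iterations):
    """Spawn n_workers per iteration and disconnect them before the next one.

    comm is expected to behave like an mpi4py intracommunicator
    (for example MPI.COMM_SELF). Returns the total number of workers spawned.
    """
    total_spawned = 0
    for _ in range(n_iterations):
        # Spawn_multiple takes parallel lists: one command, argument list,
        # and process count per entry.
        commands = [sys.executable] * n_workers
        args = [[worker_script] for _ in range(n_workers)]
        maxprocs = [1] * n_workers
        intercomm = comm.Spawn_multiple(commands, args=args, maxprocs=maxprocs)
        try:
            # Placeholder for the real exchange of inputs and results.
            intercomm.Barrier()
        finally:
            # Disconnect is collective: each worker must also call
            # MPI.Comm.Get_parent().Disconnect() before it exits.
            intercomm.Disconnect()
        total_spawned += n_workers
    return total_spawned
```

With mpi4py this would be called as, for example,
run_iterations(MPI.COMM_SELF, "worker.py", 8, 40), where "worker.py" is a
hypothetical worker script that performs the matching Barrier and
Disconnect on the communicator returned by MPI.Comm.Get_parent().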

After about 40 iterations, I get the following error:

ORTE_ERROR_LOG: The system limit on number of children a process can have was reached in file odls_default_module.c at line 928

However, I believe I am successfully disconnecting the workers at the end
of each iteration, and I am never creating more than 8 workers at a time. I
am running on a single node with 16 cores.
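
One way to check whether some per-process resource grows from one
iteration to the next is to log the limits and the number of open file
descriptors in the spawning process at the start of each iteration. This is
a diagnostic sketch only: that the ORTE error is tied to these particular
limits is an assumption, and the helper names `describe_limits` and
`count_open_fds` are hypothetical. It uses only the Python standard library
and assumes a Unix-like system.

```python
import os
import resource


def describe_limits():
    """Return {limit name: (soft, hard)} for limits relevant to spawning."""
    limits = {}
    for name in ("RLIMIT_NPROC", "RLIMIT_NOFILE"):
        # Not every platform defines every limit, so skip missing ones.
        if hasattr(resource, name):
            limits[name] = resource.getrlimit(getattr(resource, name))
    return limits


def count_open_fds():
    """Count this process's open file descriptors, or None if unavailable."""
    fd_dir = "/proc/self/fd"  # Linux-specific location
    if not os.path.isdir(fd_dir):
        return None
    return len(os.listdir(fd_dir))
```

If the value returned by count_open_fds() rises steadily across iterations
even though the workers are disconnected, that would suggest something is
not being released between spawns.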

I previously received help on this list in this
<https://www.open-mpi.org/community/lists/users/2016/06/29445.php> thread,
where Nathan Hjelm and Ralph Castain found a bug that was leading to a
different error. As a workaround for now, Ralph suggested adding
"-mca btl tcp,sm,self" to the mpirun command line. I also use the
"-oversubscribe" option. My full command line looks like this:

mpiexec -np 16 -oversubscribe -mca btl tcp,sm,self python ../../structopt/genetic.py genetic.in.json

I use mpi4py, which is why python is run as the executable.

I am hoping someone might know why I am getting the "system limit on number
of children" error or if someone has had a similar error in the past. I
couldn't find anything on Google.

Please let me know if I can give you additional information to help.

Thank you,
Jason
_______________________________________________
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users
