Hello everyone, I am using openmpi-1.10.2 and calling the `spawn_multiple` MPI function inside a for-loop. In each iteration, my program spawns N workers, makes some changes to the input for the next iteration, and then proceeds to the next iteration.
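The loop described above can be sketched roughly as follows. This is a simplified illustration, not the actual structopt code: the function name `run_iterations`, the worker script name `worker.py`, the use of `COMM_SELF`, and the one-result-per-worker protocol are all assumptions made for the example. The `MPI` module is passed in as a parameter so the loop can be exercised without an MPI runtime.

```python
import sys


def run_iterations(MPI, n_iterations, n_workers, worker_script="worker.py"):
    """Spawn n_workers per iteration and disconnect before the next one.

    MPI is expected to be the mpi4py.MPI module. worker_script is a
    hypothetical placeholder for whatever the workers actually run.
    Returns one list of worker results per iteration.
    """
    all_results = []
    for iteration in range(n_iterations):
        # One command / argument list / process count per worker.
        commands = [sys.executable] * n_workers
        args = [[worker_script, str(iteration)] for _ in range(n_workers)]
        maxprocs = [1] * n_workers

        # Assumption: each parent rank spawns its own workers via COMM_SELF.
        intercomm = MPI.COMM_SELF.Spawn_multiple(
            commands, args=args, maxprocs=maxprocs
        )
        try:
            # Assumed protocol: every worker sends exactly one result back.
            results = [intercomm.recv(source=rank) for rank in range(n_workers)]
        finally:
            # Workers are assumed to call MPI.Comm.Get_parent().Disconnect()
            # on their side, since Disconnect is collective.
            intercomm.Disconnect()
        all_results.append(results)
    return all_results
```

With mpi4py installed, this would be called as `run_iterations(MPI, 40, 8)` after `from mpi4py import MPI`, with each worker sending one result to its parent and then disconnecting.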
After a few iterations (~40), I get the following error:

    ORTE_ERROR_LOG: The system limit on number of children a process can have was reached in file odls_default_module.c at line 928

However, I believe I am successfully disconnecting the workers at the end of each iteration, and I never create more than 8 workers at a time. I am running on a single node with 16 cores.

I received some help from you previously in this <https://www.open-mpi.org/community/lists/users/2016/06/29445.php> thread, where Nathan Hjelm / Ralph Castain found a bug that was leading to a different error. Ralph gave me a workaround for now: adding "-mca btl tcp,sm,self" to the mpirun command line. I also use the "-oversubscribe" option. My full command line looks like this:

    mpiexec -np 16 -oversubscribe -mca btl tcp,sm,self python ../../structopt/genetic.py genetic.in.json

I use mpi4py, which is why python is run as the executable.

I am hoping someone might know why I am getting the "system limit on number of children" error, or has seen a similar error in the past. I couldn't find anything on Google. Please let me know if I can provide additional information to help.

Thank you,
Jason