That line of code is here: https://github.com/open-mpi/ompi/blob/v1.10.2/orte/mca/plm/lsf/plm_lsf_module.c#L346 (Unfortunately we didn't catch the rc from lsb_launch to see why it failed - I'll fix that).
So it looks like LSF failed to launch our daemon on one or more remote machines. This could be an LSF issue on one of the machines in your allocation. One thing to try is a blaunch from the command line to launch one process per node in your allocation (which is similar to what we are trying to do in this function). I would expect that to fail, but it might show you which machine is problematic.

On Fri, Sep 15, 2017 at 7:15 AM, Jing Gong <gongj...@kth.se> wrote:

> Hi,
>
> We tried to run a job of openfoam with 4480 cpus using the IBM LSF system
> but got the following error messages:
>
> ...
> [bs209:16251] [[25529,0],0] ORTE_ERROR_LOG: The specified application
> failed to start in file
> /software/OpenFOAM/ThirdParty-v1606+/openmpi-1.10.2/orte/mca/plm/lsf/plm_lsf_module.c
> at line 346
> [bs209:16251] lsb_launch failed: 0
> ...
>
> The openfoam is built by openmpi 1.10.2 within its Thirdparty package and
> it works fine around 2000 cpus on the same cluster.
>
> Is the issue related to the LSF system? Are there any openmpi flags
> available to diagnose the problem ?
>
> Thanks a lot.
>
> Regards, Jing
>
> _______________________________________________
> users mailing list
> users@lists.open-mpi.org
> https://lists.open-mpi.org/mailman/listinfo/users

--
Josh Hursey
IBM Spectrum MPI Developer
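P.S. A rough sketch of the per-node blaunch check suggested above, in case it helps. It assumes it is run from inside the LSF job allocation (blaunch only works there), that LSF has set LSB_MCPU_HOSTS in the usual "host slots host slots ..." form, and that running `hostname` remotely is an acceptable probe. The helper names are made up for illustration and are not part of Open MPI or LSF. It also probes hosts one at a time, which is not exactly what the plm/lsf module does, so treat it as a diagnostic aid only.

```python
import os
import subprocess


def hosts_from_mcpu(mcpu_hosts):
    """Return the unique hosts, in order, from an LSB_MCPU_HOSTS-style
    string such as "hostA 16 hostB 16" (host name, slot count, ...)."""
    tokens = mcpu_hosts.split()
    # Host names sit at the even positions; slot counts at the odd ones.
    return list(dict.fromkeys(tokens[0::2]))


def blaunch_command(host, remote_cmd=("hostname",)):
    """Build the argument list for launching remote_cmd on one host."""
    return ["blaunch", host, *remote_cmd]


def probe(hosts, timeout=60):
    """Run the probe command on each host and collect the failures."""
    failures = {}
    for host in hosts:
        try:
            result = subprocess.run(
                blaunch_command(host),
                capture_output=True,
                text=True,
                timeout=timeout,
            )
            if result.returncode != 0:
                failures[host] = (
                    result.stderr.strip() or f"exit code {result.returncode}"
                )
        except (OSError, subprocess.TimeoutExpired) as exc:
            # blaunch missing from PATH, or the remote launch hung.
            failures[host] = str(exc)
    return failures


if __name__ == "__main__":
    # An empty or unset LSB_MCPU_HOSTS yields no hosts and no probes.
    hosts = hosts_from_mcpu(os.environ.get("LSB_MCPU_HOSTS", ""))
    for host, reason in probe(hosts).items():
        print(f"{host}: {reason}")
```

Any host that shows up in the output would be the first place to look on the LSF side.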