The line of code that reports this error is here:

https://github.com/open-mpi/ompi/blob/v1.10.2/orte/mca/plm/lsf/plm_lsf_module.c#L346
(Unfortunately, we don't capture the return code from lsb_launch, so the
log message doesn't tell us why it failed. I'll fix that.)

So it looks like LSF failed to launch our daemon on one or more remote
machines. This could be an LSF issue on one of the machines in your
allocation. One thing to try is running blaunch from the command line to
launch one process per node in your allocation (which is similar to what
this function does). I would expect that to fail as well, but it might
show you which machine is problematic.
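
A minimal sketch of that check, written as a Python script to run from
inside the same LSF job allocation. It assumes the LSF-provided
LSB_MCPU_HOSTS environment variable (alternating host names and slot
counts, e.g. "hostA 16 hostB 16") and the single-host form
`blaunch <host> <command>`. The function names, the `hostname` probe
command, and the timeout are illustrative choices, not part of Open MPI
or LSF.

```python
import os
import subprocess


def unique_hosts(mcpu_hosts):
    """Return each host named in an LSB_MCPU_HOSTS-style string once, in order.

    Assumed format: alternating host names and slot counts,
    e.g. "hostA 16 hostB 16".
    """
    tokens = mcpu_hosts.split()
    # Every other token is a host name; dict.fromkeys drops repeats
    # while preserving first-seen order.
    return list(dict.fromkeys(tokens[0::2]))


def probe_host(host, command=("hostname",), timeout=120):
    """Run one task on `host` via blaunch.

    Returns None on success, or a short error description on failure.
    (Hypothetical helper; the probe command and timeout are arbitrary.)
    """
    argv = ["blaunch", host, *command]
    try:
        result = subprocess.run(argv, capture_output=True, text=True,
                                timeout=timeout)
    except FileNotFoundError:
        return "blaunch not found on PATH"
    except subprocess.TimeoutExpired:
        return f"timed out after {timeout}s"
    if result.returncode != 0:
        return f"exit code {result.returncode}: {result.stderr.strip()}"
    return None


def main():
    mcpu_hosts = os.environ.get("LSB_MCPU_HOSTS")
    if not mcpu_hosts:
        print("LSB_MCPU_HOSTS is not set; run this inside an LSF job.")
        return
    hosts = unique_hosts(mcpu_hosts)
    failures = {}
    # Probe hosts one at a time so each failure maps to a specific machine.
    for host in hosts:
        error = probe_host(host)
        if error is not None:
            failures[host] = error
    print(f"probed {len(hosts)} hosts, {len(failures)} failed")
    for host, error in failures.items():
        print(f"  {host}: {error}")


if __name__ == "__main__":
    main()
```

Probing one host at a time makes a failure easy to attribute to a
specific machine, but it does not reproduce the single all-at-once launch
that lsb_launch performs. Since the job works at around 2000 CPUs but not
at 4480, a problem that only appears at scale may not show up this way.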


On Fri, Sep 15, 2017 at 7:15 AM, Jing Gong <gongj...@kth.se> wrote:

> Hi,
>
> We tried to run an OpenFOAM job with 4480 CPUs under IBM LSF
> but got the following error messages:
>
> ...
>
> [bs209:16251] [[25529,0],0] ORTE_ERROR_LOG: The specified application failed to start in file /software/OpenFOAM/ThirdParty-v1606+/openmpi-1.10.2/orte/mca/plm/lsf/plm_lsf_module.c at line 346
> [bs209: 16251] lsb_launch failed: 0
> ...
>
> OpenFOAM is built with Open MPI 1.10.2 from its ThirdParty package, and
> it works fine with around 2000 CPUs on the same cluster.
>
> Is the issue related to the LSF system? Are there any Open MPI flags
> available to diagnose the problem?
>
> Thanks a lot.
>
> Regards, Jing
>
> _______________________________________________
> users mailing list
> users@lists.open-mpi.org
> https://lists.open-mpi.org/mailman/listinfo/users
>



-- 
Josh Hursey
IBM Spectrum MPI Developer
