I'm trying to use MPI_Comm_spawn with the MPI_Info "host" key to spawn processes from a process that was not started with mpirun. This works when the host key is set to the local machine's hostname, but it does not work when I specify other hosts.
I'm using version 1.3a1r19602. I need to use orte_launch_agent to set up my environment a bit before orted is started, but it fails with the errors listed below. When I run orted directly on the command line with some of the verbosity flags set to "11", I receive the same messages.

Does anybody have any suggestions?

Thank you,
Will

[fqdn:24761] mca: base: components_open: Looking for ess components
[fqdn:24761] mca: base: components_open: opening ess components
[fqdn:24761] mca: base: components_open: found loaded component env
[fqdn:24761] mca: base: components_open: component env has no register function
[fqdn:24761] mca: base: components_open: component env open function successful
[fqdn:24761] mca: base: components_open: found loaded component hnp
[fqdn:24761] mca: base: components_open: component hnp has no register function
[fqdn:24761] mca: base: components_open: component hnp open function successful
[fqdn:24761] mca: base: components_open: found loaded component singleton
[fqdn:24761] mca: base: components_open: component singleton has no register function
[fqdn:24761] mca: base: components_open: component singleton open function successful
[fqdn:24761] mca: base: components_open: found loaded component slurm
[fqdn:24761] mca: base: components_open: component slurm has no register function
[fqdn:24761] mca: base: components_open: component slurm open function successful
[fqdn:24761] mca: base: components_open: found loaded component tool
[fqdn:24761] mca: base: components_open: component tool has no register function
[fqdn:24761] mca: base: components_open: component tool open function successful
[fqdn:24761] mca:base:select: Auto-selecting ess components
[fqdn:24761] mca:base:select:( ess) Querying component [env]
[fqdn:24761] mca:base:select:( ess) Skipping component [env]. Query failed to return a module
[fqdn:24761] mca:base:select:( ess) Querying component [hnp]
[fqdn:24761] mca:base:select:( ess) Skipping component [hnp]. Query failed to return a module
[fqdn:24761] mca:base:select:( ess) Querying component [singleton]
[fqdn:24761] mca:base:select:( ess) Skipping component [singleton]. Query failed to return a module
[fqdn:24761] mca:base:select:( ess) Querying component [slurm]
[fqdn:24761] mca:base:select:( ess) Skipping component [slurm]. Query failed to return a module
[fqdn:24761] mca:base:select:( ess) Querying component [tool]
[fqdn:24761] mca:base:select:( ess) Skipping component [tool]. Query failed to return a module
[fqdn:24761] mca:base:select:( ess) No component selected!
[fqdn:24761] [[INVALID],INVALID] ORTE_ERROR_LOG: Not found in file runtime/orte_init.c at line 125
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  orte_ess_base_select failed
  --> Returned value Not found (-13) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
[fqdn:24761] [[INVALID],INVALID] ORTE_ERROR_LOG: Not found in file orted/orted_main.c at line 315