I would like to mpirun across nodes that do not share a filesystem and might have the executable in different directories. For example, node0 has the executable at /tmp/job42/mpitest and node1 has it at /tmp/job100/mpitest.
If you grant that I have an ssh wrapper script (set as the orte/plm_rsh_agent**) that cds to the directory containing the executable on each worker node before launching orted, is there a way to tell the worker-node orted processes to run the executable from the current working directory rather than from the absolute path that (I presume) the head node process advertises?

I've tried setting orte_remote_tmpdir_base separately for each worker orted process, but then I get an error about having both global_tmpdir and remote_tmpdir set. If I then set local_tmpdir to match the head node, I'm back at square one.

I know this sounds fairly convoluted, but I'm updating helper scripts for HTCondor so that its parallel universe can work with newer MPI versions (I'm dealing with similar headaches trying to get hydra to cooperate). By default, condor places each "job" (i.e. an sshd+orted process) in a sandbox, and we cannot know the names of the sandbox directories ahead of time or assume that they will be the same across nodes. The easiest way to deal with this would be to assume the executable lies on a shared fs, but the fewer assumptions from our POV the better. (Even better would be if someone /really/ wants to build in condor support, as has been done for other launchers; that's beyond me right now.)

**Also, which is the correct parameter for setting the rsh agent? ompi_info (and mpirun) says orte_rsh_agent is deprecated, but the online docs seem to suggest that plm_rsh_agent is deprecated. I'm using version 1.8.1.

Thanks for any insight you can provide.

Jason Patton
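
For concreteness, the kind of ssh wrapper described above could look roughly like the following Python sketch. It is only an illustration, not the actual HTCondor helper script: the SANDBOX_DIRS table and the function names are hypothetical (in practice the sandbox directories would have to be discovered at run time), and it assumes the agent is invoked as `<agent> <host> <command...>`.

```python
import shlex
import subprocess

# Hypothetical lookup table: worker host -> sandbox directory on that host.
# The paths mirror the example in this post; a real wrapper would have to
# discover them at run time.
SANDBOX_DIRS = {
    "node0": "/tmp/job42",
    "node1": "/tmp/job100",
}


def build_ssh_command(host, sandbox_dir, remote_argv):
    """Return the argv for an ssh call that cds into sandbox_dir first.

    ssh itself joins the remote command words with single spaces before
    handing them to the remote shell, so the same is done here; only the
    directory name is shell-quoted.
    """
    remote = "cd {} || exit 1; {}".format(
        shlex.quote(sandbox_dir), " ".join(remote_argv)
    )
    return ["ssh", host, remote]


def run_agent(argv):
    """Entry point sketch: argv = [agent_name, host, remote command words...]."""
    host, remote_argv = argv[1], argv[2:]
    return subprocess.call(build_ssh_command(host, SANDBOX_DIRS[host], remote_argv))
```

With this sketch, `build_ssh_command("node1", "/tmp/job100", ["orted", "--daemonize"])` yields an ssh call whose remote command starts in /tmp/job100. That only changes the working directory of orted; whether orted then resolves the executable relative to that directory is exactly the open question.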