Dear Reuti,

Thanks a lot, you're right! But why did the default behavior change while the
value of this parameter stayed the same?

2.1.0: MCA plm rsh: parameter "plm_rsh_agent" (current value: "ssh : rsh",
data source: default, level: 2 user/detail, type: string, synonyms:
pls_rsh_agent, orte_rsh_agent)
    The command used to launch executables on remote nodes (typically
    either "ssh" or "rsh")

1.10.6: MCA plm: parameter "plm_rsh_agent" (current value: "ssh : rsh",
data source: default, level: 2 user/detail, type: string, synonyms:
pls_rsh_agent, orte_rsh_agent)
    The command used to launch executables on remote nodes (typically
    either "ssh" or "rsh")

So the code that handles this parameter must have changed, perhaps in how SGE
is detected? Do you know of a way to restore the old behavior (e.g. a configure
option)? Otherwise all my users will have to add this option themselves.
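
One possibility that might avoid per-user changes, sketched here under the
assumption that plm_rsh_agent can be set like any other MCA parameter: Open MPI
reads defaults from MCA parameter files (per user in
$HOME/.openmpi/mca-params.conf, site-wide in <prefix>/etc/openmpi-mca-params.conf)
and from environment variables named OMPI_MCA_<parameter>. The helper name
set_mca_default is hypothetical, and the value "foo" is simply the placeholder
from your suggestion; I have not verified that it brings back the qrsh launch.

```shell
#!/bin/sh
# Sketch only: set an MCA parameter once instead of on every mpirun call.
# set_mca_default is a hypothetical helper, not part of Open MPI.

# Append "name = value" to an MCA parameter file unless the file already
# contains a line that sets the same parameter.
set_mca_default() {
    file=$1
    name=$2
    value=$3
    mkdir -p "$(dirname "$file")"
    if ! grep -q "^${name}[[:space:]]*=" "$file" 2>/dev/null; then
        printf '%s = %s\n' "$name" "$value" >> "$file"
    fi
}

# Alternative: export the parameter as an environment variable, e.g. from the
# SGE job script. "foo" is the placeholder value, assumed rather than tested.
OMPI_MCA_plm_rsh_agent=foo
export OMPI_MCA_plm_rsh_agent
```

A user would call it as
set_mca_default "$HOME/.openmpi/mca-params.conf" plm_rsh_agent foo; for a
site-wide default the same line would go into the openmpi-mca-params.conf file
under the etc directory of the installation instead.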

Thanks again, and have a nice day

Ado Arnolds

On 22.03.2017 13:58, Reuti wrote:
> Hi,
> 
>> On 22.03.2017 at 10:44, Heinz-Ado Arnolds
>> <arno...@mpa-garching.mpg.de> wrote:
>>
>> Dear users and developers,
>>
>> first of all many thanks for all the great work you have done for OpenMPI!
>>
>> Up to OpenMPI-1.10.6 the mechanism for starting orted was to use SGE/qrsh:
>>  mpirun -np 8 --map-by ppr:4:node ./myid
>>  /opt/sge-8.1.8/bin/lx-amd64/qrsh -inherit -nostdin -V <DNS-Name of Remote 
>> Machine> orted --hnp-topo-sig 2N:2S:2L3:20L2:20L1:20C:40H:x86_64 -mca ess 
>> "env" -mca orte_ess_jobid "1621884928" -mca orte_ess_vpid 1 -mca 
>> orte_ess_num_procs "2" -mca orte_hnp_uri "1621884928.0;tcp://<IP-addr of 
>> Master>:41031" -mca plm "rsh" -mca rmaps_base_mapping_policy "ppr:4:node" 
>> --tree-spawn
>>
>> Now with OpenMPI-2.1.0 (and the release candidates) "ssh" is used to start 
>> orted:
>>  mpirun -np 8 --map-by ppr:4:node -mca mca_base_env_list OMP_NUM_THREADS=5 
>> ./myid
>>  /usr/bin/ssh -x <DNS-Name of Remote Machine>     
>> PATH=/afs/...../openmpi-2.1.0/bin:$PATH ; export PATH ; 
>> LD_LIBRARY_PATH=/afs/...../openmpi-2.1.0/lib:$LD_LIBRARY_PATH ; export 
>> LD_LIBRARY_PATH ; 
>> DYLD_LIBRARY_PATH=/afs/...../openmpi-2.1.0/lib:$DYLD_LIBRARY_PATH ; export 
>> DYLD_LIBRARY_PATH ;   /afs/...../openmpi-2.1.0/bin/orted --hnp-topo-sig 
>> 2N:2S:2L3:20L2:20L1:20C:40H:x86_64 -mca ess "env" -mca ess_base_jobid 
>> "1626013696" -mca ess_base_vpid 1 -mca ess_base_num_procs "2" -mca 
>> orte_hnp_uri "1626013696.0;usock;tcp://<IP-addr of Master>:43019" -mca 
>> plm_rsh_args "-x" -mca plm "rsh" -mca rmaps_base_mapping_policy "ppr:4:node" 
>> -mca pmix "^s1,s2,cray"
>>
>> qrsh sets up the environment properly on the remote side, so environment
>> variables from job scripts are transferred correctly. With the ssh variant
>> the environment is not set up properly on the remote side, and there seem
>> to be problems handling Kerberos tickets and/or AFS tokens.
>>
>> Is there any way to make 2.1.0 behave like 1.10.6 again (i.e. use
>> SGE/qrsh)? Is there an MCA parameter for this?
>>
>> If you need more info, please let me know. (The job-submitting machine and
>> the target cluster are the same in all tests. The software resides in AFS
>> directories visible on all machines. Parameter "plm_rsh_disable_qrsh"
>> current value: "false")
> 
> It looks like `mpirun` still needs:
> 
> -mca plm_rsh_agent foo
> 
> to allow SGE to be detected.
> 
> -- Reuti
> 
> 
> 
> _______________________________________________
> users mailing list
> users@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/users
> 