Hi,

> On 22.03.2017 at 10:44, Heinz-Ado Arnolds 
> <arno...@mpa-garching.mpg.de> wrote:
> 
> Dear users and developers,
> 
> first of all many thanks for all the great work you have done for OpenMPI!
> 
> Up to OpenMPI-1.10.6 the mechanism for starting orted was to use SGE/qrsh:
>  mpirun -np 8 --map-by ppr:4:node ./myid
>  /opt/sge-8.1.8/bin/lx-amd64/qrsh -inherit -nostdin -V <DNS-Name of Remote 
> Machine> orted --hnp-topo-sig 2N:2S:2L3:20L2:20L1:20C:40H:x86_64 -mca ess 
> "env" -mca orte_ess_jobid "1621884928" -mca orte_ess_vpid 1 -mca 
> orte_ess_num_procs "2" -mca orte_hnp_uri "1621884928.0;tcp://<IP-addr of 
> Master>:41031" -mca plm "rsh" -mca rmaps_base_mapping_policy "ppr:4:node" 
> --tree-spawn
> 
> Now with OpenMPI-2.1.0 (and the release candidates) "ssh" is used to start 
> orted:
>  mpirun -np 8 --map-by ppr:4:node -mca mca_base_env_list OMP_NUM_THREADS=5 
> ./myid
>  /usr/bin/ssh -x <DNS-Name of Remote Machine>     
> PATH=/afs/...../openmpi-2.1.0/bin:$PATH ; export PATH ; 
> LD_LIBRARY_PATH=/afs/...../openmpi-2.1.0/lib:$LD_LIBRARY_PATH ; export 
> LD_LIBRARY_PATH ; 
> DYLD_LIBRARY_PATH=/afs/...../openmpi-2.1.0/lib:$DYLD_LIBRARY_PATH ; export 
> DYLD_LIBRARY_PATH ;   /afs/...../openmpi-2.1.0/bin/orted --hnp-topo-sig 
> 2N:2S:2L3:20L2:20L1:20C:40H:x86_64 -mca ess "env" -mca ess_base_jobid 
> "1626013696" -mca ess_base_vpid 1 -mca ess_base_num_procs "2" -mca 
> orte_hnp_uri "1626013696.0;usock;tcp://<IP-addr of Master>:43019" -mca 
> plm_rsh_args "-x" -mca plm "rsh" -mca rmaps_base_mapping_policy "ppr:4:node" 
> -mca pmix "^s1,s2,cray"
> 
> qrsh set the environment properly on the remote side, so that environment 
> variables from job scripts are properly transferred. With the ssh variant the 
> environment is not set properly on the remote side, and it seems that there 
> are handling problems with Kerberos tickets and/or AFS tokens.
> 
> Is there any way to revert the 2.1.0 behavior to the 1.10.6 (use SGE/qrsh) 
> one? Are there mca params to set this?
> 
> If you need more info, please let me know. (Job submitting machine and target 
> cluster are the same with all tests. SW is residing in AFS directories 
> visible on all machines. Parameter "plm_rsh_disable_qrsh" current value: 
> "false")

It looks like `mpirun` in 2.1.0 still needs an explicit:

-mca plm_rsh_agent foo

before SGE is detected and qrsh is used to start orted.
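
If you would rather not change the `mpirun` line in every job script, the same parameter can also be set through the environment, since Open MPI reads MCA parameters from `OMPI_MCA_<name>` variables. A minimal sketch, assuming "foo" is only a placeholder value as above and that the job runs inside an SGE allocation (I have not verified this on your setup):

```shell
# Sketch only: "foo" is the placeholder value from the option above.
# Setting OMPI_MCA_plm_rsh_agent is equivalent to passing
# "-mca plm_rsh_agent foo" on the mpirun command line.
export OMPI_MCA_plm_rsh_agent=foo

# Then start the job as in the original report (left commented out here,
# as it needs an Open MPI installation and an SGE job context):
# mpirun -np 8 --map-by ppr:4:node ./myid
```

You can then check with `ps` on the master node whether orted is started through qrsh again instead of /usr/bin/ssh.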

-- Reuti


