I have already sent it, on Thu, May 28, 2015 at 10:21 AM.

On May 28, 2015 at 20:07, Mike Dubman <mi...@dev.mellanox.co.il> wrote:

It is fine to recompile the OMPI from HPC-X to apply site defaults (for example, the choice of job scheduler; the OMPI shipped with HPC-X is compiled with ssh support only, etc.).
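For example, a minimal rebuild sketch (the prefix path and the scheduler flag here are illustrative assumptions; adjust to your site):

$ cd openmpi-1.8.5                        # OMPI source matching your HPC-X version
$ ./configure --prefix=$HOME/ompi-rebuilt \
              --with-mxm=$HPCX_MXM_DIR \
              --with-slurm                # or support for whatever launcher your site uses
$ make -j8 install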

If the ssh launcher is working on your system, then the OMPI from HPC-X should work as well.
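A quick sanity check of the ssh launcher with the HPC-X mpirun would be something like (node names are just an example):

$ $HPCX_MPI_DIR/bin/mpirun -mca plm rsh -host node5,node153 -np 2 hostname

If both hostnames come back, the launcher side is fine.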

Could you please send Alina (in CC) the command line and its output from the hpcx/ompi failure?

Thanks


On Thu, May 28, 2015 at 7:33 PM, Timur Ismagilov <tismagilov@mail.ru> wrote:

Is it normal to have to rebuild Open MPI from HPC-X?
Why don't the prebuilt binaries work?




Thursday, May 28, 2015, 14:01 +03:00 from Alina Sklarevich <alinas@dev.mellanox.co.il>:

Thank you for this info.

If 'yalla' now works for you, is there anything that is still wrong?

Thanks,
Alina.

On Thu, May 28, 2015 at 10:21 AM, Timur Ismagilov <tismagilov@mail.ru> wrote:

I'm sorry for the delay.

Here it is:
(I used a 5 min time limit)
/gpfs/NETHOME/oivt1/nicevt/itf/sources/hpcx-v1.3.330-icc-OFED-1.5.4.1-redhat6.2-x86_64/ompi-mellanox-v1.8/bin/mpirun -x LD_PRELOAD=/gpfs/NETHOME/oivt1/nicevt/itf/sources/hpcx-v1.3.330-icc-OFED-1.5.4.1-redhat6.2-x86_64/mxm/debug/lib/libmxm.so -x MXM_LOG_LEVEL=data -x MXM_IB_PORTS=mlx4_0:1 -x MXM_SHM_KCOPY_MODE=off --mca pml yalla --hostfile hostlist ./hello 1> hello_debugMXM_n-2_ppn-2.out 2> hello_debugMXM_n-2_ppn-2.err

P.S.
yalla works fine with the rebuilt ompi: --with-mxm=$HPCX_MXM_DIR






Tuesday, May 26, 2015, 16:22 +03:00 from Alina Sklarevich <alinas@dev.mellanox.co.il>:

Hi Timur,

HPCX has a debug version of MXM. Can you please add the following to your command line with pml yalla in order to use it and attach the output? 
"-x LD_PRELOAD=$HPCX_MXM_DIR/debug/lib/libmxm.so -x MXM_LOG_LEVEL=data"

Also, could you please attach the entire output of "$HPCX_MPI_DIR/bin/ompi_info -a"?

Thank you,
Alina. 

On Tue, May 26, 2015 at 3:39 PM, Mike Dubman <miked@dev.mellanox.co.il> wrote:
Alina - could you please take a look?
Thx


---------- Forwarded message ----------
From: Timur Ismagilov <tismagilov@mail.ru>
Date: Tue, May 26, 2015 at 12:40 PM
Subject: Re[12]: [OMPI users] MXM problem
To: Open MPI Users <users@open-mpi.org>
Cc: Mike Dubman <miked@dev.mellanox.co.il>


It does not work on a single node:

1) host: $  $HPCX_MPI_DIR/bin/mpirun -x MXM_IB_PORTS=mlx4_0:1 -x MXM_SHM_KCOPY_MODE=off -host node5 -mca pml yalla -x MXM_TLS=ud,self,shm --prefix $HPCX_MPI_DIR -mca plm_base_verbose 5  -mca oob_base_verbose 10 -mca rml_base_verbose 10 --debug-daemons  -np 1 ./hello &> yalla.out                                

2) host: $  $HPCX_MPI_DIR/bin/mpirun -x MXM_IB_PORTS=mlx4_0:1 -x MXM_SHM_KCOPY_MODE=off -host node5  --mca pml cm --mca mtl mxm --prefix $HPCX_MPI_DIR -mca plm_base_verbose 5  -mca oob_base_verbose 10 -mca rml_base_verbose 10 --debug-daemons -np 1 ./hello &> cm_mxm.out

I've attached the yalla.out and cm_mxm.out to this email.



Tuesday, May 26, 2015, 11:54 +03:00 from Mike Dubman <miked@dev.mellanox.co.il>:

Does it work from a single node?
Could you please run with the options below and attach the output?

 -mca plm_base_verbose 5  -mca oob_base_verbose 10 -mca rml_base_verbose 10 --debug-daemons

On Tue, May 26, 2015 at 11:38 AM, Timur Ismagilov <tismagilov@mail.ru> wrote:

1. mxm_perf_test - OK.
2. no_tree_spawn  - OK.
3. ompi yalla and "--mca pml cm --mca mtl mxm" still do not work (I use the prebuilt ompi-1.8.5 from hpcx-v1.3.330)
3.a) host:$  $HPCX_MPI_DIR/bin/mpirun -x MXM_IB_PORTS=mlx4_0:1 -x MXM_SHM_KCOPY_MODE=off -host node5,node153  --mca pml cm --mca mtl mxm --prefix $HPCX_MPI_DIR ./hello
--------------------------------------------------------------------------                               
A requested component was not found, or was unable to be opened.  This                                   
means that this component is either not installed or is unable to be                                     
used on your system (e.g., sometimes this means that shared libraries                                    
that the component requires are unable to be found/loaded).  Note that                                   
Open MPI stopped checking at the first component that it did not find.                                   
                                                                                                         
Host:      node153                                                                                       
Framework: mtl                                                                                           
Component: mxm                                                                                           
--------------------------------------------------------------------------                               
[node5:113560] PML cm cannot be selected                                                                 
--------------------------------------------------------------------------                               
No available pml components were found!                                                                  
                                                                                                         
This means that there are no components of this type installed on your                                   
system or all the components reported that they could not be used.                                       
                                                                                                         
This is a fatal error; your MPI process is likely to abort.  Check the                                   
output of the "ompi_info" command and ensure that components of this                                     
type are available on your system.  You may also wish to check the                                       
value of the "component_path" MCA parameter and ensure that it has at                                    
least one directory that contains valid MCA components.                                                  
--------------------------------------------------------------------------                               
[node153:44440] PML cm cannot be selected                                                                
-------------------------------------------------------                                                  
Primary job  terminated normally, but 1 process returned                                                 
a non-zero exit code.. Per user-direction, the job has been aborted.                                     
-------------------------------------------------------                                                  
--------------------------------------------------------------------------                               
mpirun detected that one or more processes exited with non-zero status, thus causing                     
the job to be terminated. The first process to do so was:                                                
                                                                                                         
  Process name: [[43917,1],0]                                                                            
  Exit code:    1                                                                                        
--------------------------------------------------------------------------                               
[login:110455] 1 more process has sent help message help-mca-base.txt / find-available:not-valid         
[login:110455] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages        
[login:110455] 1 more process has sent help message help-mca-base.txt / find-available:none-found        
                           
3.b) host:$  $HPCX_MPI_DIR/bin/mpirun -x MXM_IB_PORTS=mlx4_0:1 -x MXM_SHM_KCOPY_MODE=off -host node5,node153 -mca pml yalla --prefix $HPCX_MPI_DIR ./hello
--------------------------------------------------------------------------                               
A requested component was not found, or was unable to be opened.  This                                   
means that this component is either not installed or is unable to be                                     
used on your system (e.g., sometimes this means that shared libraries                                    
that the component requires are unable to be found/loaded).  Note that                                   
Open MPI stopped checking at the first component that it did not find.                                   
                                                                                                         
Host:      node153                                                                                       
Framework: pml                                                                                           
Component: yalla                                                                                         
--------------------------------------------------------------------------                               
*** An error occurred in MPI_Init                                                                        
--------------------------------------------------------------------------                               
It looks like MPI_INIT failed for some reason; your parallel process is                                  
likely to abort.  There are many reasons that a parallel process can                                     
fail during MPI_INIT; some of which are due to configuration or environment                              
problems.  This failure appears to be an internal failure; here's some                                   
additional information (which may only be relevant to an Open MPI                                        
developer):                                                                                              
                                                                                                         
  mca_pml_base_open() failed                                                                             
  --> Returned "Not found" (-13) instead of "Success" (0)                                                
--------------------------------------------------------------------------                               
*** on a NULL communicator                                                                               
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,                                 
***    and potentially your MPI job)                                                                     
[node153:43979] Local abort before MPI_INIT completed successfully; not able to aggregate error messages,
 and not able to guarantee that all other processes were killed!                                         
-------------------------------------------------------                                                  
Primary job  terminated normally, but 1 process returned                                                 
a non-zero exit code.. Per user-direction, the job has been aborted.                                     
-------------------------------------------------------                                                  
--------------------------------------------------------------------------                               
mpirun detected that one or more processes exited with non-zero status, thus causing                     
the job to be terminated. The first process to do so was:                                                
                                                                                                         
  Process name: [[44992,1],1]                                                                            
  Exit code:    1                                                                                        
--------------------------------------------------------------------------                               



host:$  echo $HPCX_MPI_DIR                                                                         
/gpfs/NETHOME/oivt1/nicevt/itf/sources/hpcx-v1.3.330-icc-OFED-1.5.4.1-redhat6.2-x86_64/ompi-mellanox-v1.8
host:$ ompi_info | grep pml                                                                        
                 MCA pml: v (MCA v2.0, API v2.0, Component v1.8.5)                                       
                 MCA pml: cm (MCA v2.0, API v2.0, Component v1.8.5)                                      
                 MCA pml: bfo (MCA v2.0, API v2.0, Component v1.8.5)                                     
                 MCA pml: ob1 (MCA v2.0, API v2.0, Component v1.8.5)                                     
                 MCA pml: yalla (MCA v2.0, API v2.0, Component v1.8.5) 
host: tests$  ompi_info | grep mtl                                   
                 MCA mtl: mxm (MCA v2.0, API v2.0, Component v1.8.5)

P.S.
Possible error in the FAQ? (http://www.open-mpi.org/faq/?category=openfabrics#mxm)

47. Does Open MPI support MXM?
............
NOTE: Please note that the 'yalla' pml is available only from Open MPI v1.9 and above
...........
But here we have (or appear to have) yalla in ompi 1.8.5.



Tuesday, May 26, 2015, 9:53 +03:00 from Mike Dubman <miked@dev.mellanox.co.il>:

Hi Timur,

Here it goes:

wget ftp://bgate.mellanox.com/hpc/hpcx/custom/v1.3/hpcx-v1.3.330-icc-OFED-1.5.4.1-redhat6.2-x86_64.tbz

Please let me know if it works for you, and we will add the 1.5.4.1 MOFED to the default distribution list.
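In case it helps, a minimal setup sketch after downloading (the exports just mirror the ones you already use for v1.3.0-327):

$ tar xjf hpcx-v1.3.330-icc-OFED-1.5.4.1-redhat6.2-x86_64.tbz
$ export HPCX_HOME=$PWD/hpcx-v1.3.330-icc-OFED-1.5.4.1-redhat6.2-x86_64
$ export HPCX_MPI_DIR=$HPCX_HOME/ompi-mellanox-v1.8
$ export HPCX_MXM_DIR=$HPCX_HOME/mxm
$ export OPAL_PREFIX=$HPCX_MPI_DIR
$ export PATH=$HPCX_MPI_DIR/bin:$PATH
$ export LD_LIBRARY_PATH=$HPCX_MPI_DIR/lib:$HPCX_MXM_DIR/lib:$LD_LIBRARY_PATH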

M


On Mon, May 25, 2015 at 9:38 PM, Timur Ismagilov <tismagilov@mail.ru> wrote:
Thanks a lot.

Monday, May 25, 2015, 21:28 +03:00 from Mike Dubman <miked@dev.mellanox.co.il>:

I will send you the link tomorrow.

On Mon, May 25, 2015 at 9:15 PM, Timur Ismagilov <tismagilov@mail.ru> wrote:
Where can I find MXM for OFED 1.5.4.1?


Monday, May 25, 2015, 21:11 +03:00 from Mike Dubman <miked@dev.mellanox.co.il>:

BTW, the OFED on your system is 1.5.4.1, while the HPC-X in use is built for OFED 1.5.3.

Seems like an ABI issue between the OFED versions.
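A quick way to see which verbs library the HPC-X MXM actually binds to (a diagnostic sketch; ofed_info may not be installed everywhere):

$ ldd $HPCX_MXM_DIR/lib/libmxm.so | grep -i verbs
$ ofed_info | head -1

If libibverbs resolves to a version other than the one your OFED provides, that would point to the ABI mismatch.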

On Mon, May 25, 2015 at 8:59 PM, Timur Ismagilov <tismagilov@mail.ru> wrote:

I did as you said, but got an error:


node1$ export MXM_IB_PORTS=mlx4_0:1
node1$  ./mxm_perftest                                                                            
Waiting for connection...                                                                                
Accepted connection from 10.65.0.253                                                                     
[1432576262.370195] [node153:35388:0]         shm.c:65   MXM  WARN  Could not open the KNEM device file at /dev/knem : No such file or directory. Won't use knem.                                                 
Failed to create endpoint: No such device                                                                

node2$ export MXM_IB_PORTS=mlx4_0:1
node2$ ./mxm_perftest node1  -t send_lat                                                       
[1432576262.367523] [node158:99366:0]         shm.c:65   MXM  WARN  Could not open the KNEM device file at /dev/knem : No such file or directory. Won't use knem.
Failed to create endpoint: No such device




Monday, May 25, 2015, 20:31 +03:00 from Mike Dubman <miked@dev.mellanox.co.il>:

scif is an OFA device from Intel.
Can you please set export MXM_IB_PORTS=mlx4_0:1 explicitly and retry?

On Mon, May 25, 2015 at 8:26 PM, Timur Ismagilov <tismagilov@mail.ru> wrote:

Hi Mike,
this is what I have:

$ echo $LD_LIBRARY_PATH | tr ":" "\n"
/gpfs/NETHOME/oivt1/nicevt/itf/sources/hpcx-v1.3.0-327-icc-OFED-1.5.3-redhat6.2/fca/lib               
/gpfs/NETHOME/oivt1/nicevt/itf/sources/hpcx-v1.3.0-327-icc-OFED-1.5.3-redhat6.2/hcoll/lib             
/gpfs/NETHOME/oivt1/nicevt/itf/sources/hpcx-v1.3.0-327-icc-OFED-1.5.3-redhat6.2/mxm/lib               
/gpfs/NETHOME/oivt1/nicevt/itf/sources/hpcx-v1.3.0-327-icc-OFED-1.5.3-redhat6.2/ompi-mellanox-v1.8/lib
(plus Intel compiler paths)

$ echo $OPAL_PREFIX                                                                     
/gpfs/NETHOME/oivt1/nicevt/itf/sources/hpcx-v1.3.0-327-icc-OFED-1.5.3-redhat6.2/ompi-mellanox-v1.8

I don't use LD_PRELOAD.

In the attached file (ompi_info.out) you will find the output of the ompi_info -l 9 command.

P.S.
node1 $ ./mxm_perftest
node2 $  ./mxm_perftest node1  -t send_lat
[1432568685.067067] [node151:87372:0]         shm.c:65   MXM  WARN  Could not open the KNEM device file at /dev/knem : No such file or directory. Won't use knem.         (I don't have knem)
[1432568685.069699] [node151:87372:0]      ib_dev.c:531  MXM  WARN  skipping device scif0 (vendor_id/part_id = 0x8086/0x0) - not a Mellanox device                               (???)
Failed to create endpoint: No such device

$  ibv_devinfo                                         
hca_id: mlx4_0                                                  
        transport:                      InfiniBand (0)          
        fw_ver:                         2.10.600                
        node_guid:                      0002:c903:00a1:13b0     
        sys_image_guid:                 0002:c903:00a1:13b3     
        vendor_id:                      0x02c9                  
        vendor_part_id:                 4099                    
        hw_ver:                         0x0                     
        board_id:                       MT_1090120019           
        phys_port_cnt:                  2                       
                port:   1                                       
                        state:                  PORT_ACTIVE (4)
                        max_mtu:                4096 (5)        
                        active_mtu:             4096 (5)        
                        sm_lid:                 1               
                        port_lid:               83              
                        port_lmc:               0x00            
                                                                
                port:   2                                       
                        state:                  PORT_DOWN (1)   
                        max_mtu:                4096 (5)        
                        active_mtu:             4096 (5)        
                        sm_lid:                 0               
                        port_lid:               0               
                        port_lmc:               0x00            

Best regards,
Timur.


Monday, May 25, 2015, 19:39 +03:00 from Mike Dubman <miked@dev.mellanox.co.il>:

Hi Timur,
It seems that the yalla component was not found in your OMPI tree.
Can it be that your mpirun is not from HPC-X? Can you please check LD_LIBRARY_PATH, PATH, LD_PRELOAD, and OPAL_PREFIX to make sure they point to the right mpirun?

Also, could you please check that yalla is present in the ompi_info -l 9 output?
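For example (the grep is just to filter the long listing):

$ $HPCX_MPI_DIR/bin/ompi_info -l 9 | grep -i yalla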

Thanks

On Mon, May 25, 2015 at 7:11 PM, Timur Ismagilov <tismagilov@mail.ru> wrote:
I can password-less ssh to all nodes:
base$ ssh node1
node1$ssh node2
Last login: Mon May 25 18:41:23
node2$ssh node3
Last login: Mon May 25 16:25:01
node3$ssh node4
Last login: Mon May 25 16:27:04
node4$

Is this correct?

With ompi-1.9 I do not have the no-tree-spawn problem.


Monday, May 25, 2015, 9:04 -07:00 from Ralph Castain <rhc@open-mpi.org>:

I can’t speak to the mxm problem, but the no-tree-spawn issue indicates that you don’t have password-less ssh authorized between the compute nodes.
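A quick non-interactive check from the login node (node names are just an example) would be:

$ ssh node5 ssh node14 true && echo OK

If that prompts for a password or hangs, tree spawn between compute nodes will fail even though login-to-node ssh works.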


On May 25, 2015, at 8:55 AM, Timur Ismagilov <tismagilov@mail.ru> wrote:

Hello!

I use ompi-v1.8.4 from hpcx-v1.3.0-327-icc-OFED-1.5.3-redhat6.2;
OFED-1.5.4.1;
CentOS release 6.2;
infiniband 4x FDR



I have two problems:
1. I cannot use mxm:
1.a) $mpirun --mca pml cm --mca mtl mxm -host node5,node14,node28,node29 -mca plm_rsh_no_tree_spawn 1 -np 4 ./hello
--------------------------------------------------------------------------                               
A requested component was not found, or was unable to be opened.  This                                   
means that this component is either not installed or is unable to be                                     
used on your system (e.g., sometimes this means that shared libraries                                    
that the component requires are unable to be found/loaded).  Note that                                   
Open MPI stopped checking at the first component that it did not find.                                   
                                                                                                         
Host:      node14                                                                                        
Framework: pml                                                                                           
Component: yalla                                                                                         
--------------------------------------------------------------------------                               
*** An error occurred in MPI_Init                                                                        
--------------------------------------------------------------------------                               
It looks like MPI_INIT failed for some reason; your parallel process is                                  
likely to abort.  There are many reasons that a parallel process can                                     
fail during MPI_INIT; some of which are due to configuration or environment                              
problems.  This failure appears to be an internal failure; here's some                                   
additional information (which may only be relevant to an Open MPI                                        
developer):                                                                                              
                                                                                                         
  mca_pml_base_open() failed                                                                             
  --> Returned "Not found" (-13) instead of "Success" (0)                                                
--------------------------------------------------------------------------                               
*** on a NULL communicator                                                                               
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,                                 
***    and potentially your MPI job)                                                                     
*** An error occurred in MPI_Init                                                                        
[node28:102377] Local abort before MPI_INIT completed successfully; not able to aggregate error messages,
 and not able to guarantee that all other processes were killed!                                         
*** on a NULL communicator                                                                               
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,                                 
***    and potentially your MPI job)                                                                     
[node29:105600] Local abort before MPI_INIT completed successfully; not able to aggregate error messages,
 and not able to guarantee that all other processes were killed!                                         
*** An error occurred in MPI_Init                                                                        
*** on a NULL communicator                                                                               
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,                                 
***    and potentially your MPI job)                                                                     
[node5:102409] Local abort before MPI_INIT completed successfully; not able to aggregate error messages,
and not able to guarantee that all other processes were killed!                                          
*** An error occurred in MPI_Init                                                                        
*** on a NULL communicator                                                                               
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,                                 
***    and potentially your MPI job)                                                                     
[node14:85284] Local abort before MPI_INIT completed successfully; not able to aggregate error messages,
and not able to guarantee that all other processes were killed!                                          
-------------------------------------------------------                                                  
Primary job  terminated normally, but 1 process returned                                                 
a non-zero exit code.. Per user-direction, the job has been aborted.                                     
-------------------------------------------------------                                                  
--------------------------------------------------------------------------                               
mpirun detected that one or more processes exited with non-zero status, thus causing                     
the job to be terminated. The first process to do so was:                                                
                                                                                                         
  Process name: [[9372,1],2]
  Exit code:    1                                                                                        
--------------------------------------------------------------------------                               
[login:08295] 3 more processes have sent help message help-mca-base.txt / find-available:not-valid       
[login:08295] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages         
[login:08295] 3 more processes have sent help message help-mpi-runtime / mpi_init:startup:internal-failure

1.b) $mpirun --mca pml yalla -host node5,node14,node28,node29 -mca plm_rsh_no_tree_spawn 1 -np 4 ./hello
--------------------------------------------------------------------------                              
A requested component was not found, or was unable to be opened.  This                                  
means that this component is either not installed or is unable to be                                    
used on your system (e.g., sometimes this means that shared libraries                                   
that the component requires are unable to be found/loaded).  Note that                                  
Open MPI stopped checking at the first component that it did not find.                                  
                                                                                                        
Host:      node5                                                                                        
Framework: pml                                                                                          
Component: yalla                                                                                        
--------------------------------------------------------------------------                              
*** An error occurred in MPI_Init                                                                       
*** on a NULL communicator                                                                              
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,                                
***    and potentially your MPI job)                                                                    
[node5:102449] Local abort before MPI_INIT completed successfully; not able to aggregate error messages,
and not able to guarantee that all other processes were killed!                                         
--------------------------------------------------------------------------                              
It looks like MPI_INIT failed for some reason; your parallel process is                                 
likely to abort.  There are many reasons that a parallel process can                                    
fail during MPI_INIT; some of which are due to configuration or environment                             
problems.  This failure appears to be an internal failure; here's some                                  
additional information (which may only be relevant to an Open MPI                                       
developer):                                                                                             
                                                                                                        
  mca_pml_base_open() failed                                                                            
  --> Returned "Not found" (-13) instead of "Success" (0)                                               
--------------------------------------------------------------------------                              
-------------------------------------------------------                                                 
Primary job  terminated normally, but 1 process returned                                                
a non-zero exit code.. Per user-direction, the job has been aborted.                                    
-------------------------------------------------------                                                 
*** An error occurred in MPI_Init                                                                       
*** on a NULL communicator                                                                              
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,                                
***    and potentially your MPI job)                                                                    
[node14:85325] Local abort before MPI_INIT completed successfully; not able to aggregate error messages,
and not able to guarantee that all other processes were killed!                                         
--------------------------------------------------------------------------                              
mpirun detected that one or more processes exited with non-zero status, thus causing                    
the job to be terminated. The first process to do so was:                                               
                                                                                                        
  Process name: [[9619,1],0]                                                                            
  Exit code:    1                                                                                       
--------------------------------------------------------------------------                              
[login:08552] 1 more process has sent help message help-mca-base.txt / find-available:not-valid         
[login:08552] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages        

2. I cannot remove -mca plm_rsh_no_tree_spawn 1 from the mpirun command line:

$mpirun -host node5,node14,node28,node29 -np 4 ./hello
sh: -c: line 0: syntax error near unexpected token `--tree-spawn'
sh: -c: line 0: `( test ! -r ./.profile || . ./.profile; OPAL_PREFIX=/gpfs/NETHOME/oivt1/nicevt/itf/sources/hpcx-v1.3.0-327-icc-OFED-1.5.3-redhat6.2/ompi-mellanox-v1.8 ; export OPAL_PREFIX; PATH=/gpfs/NETHOME/oivt1/nicevt/itf/sources/hpcx-v1.3.0-327-icc-OFED-1.5.3-redhat6.2/ompi-mellanox-v1.8/bin:$PATH ; export PATH ; LD_LIBRARY_PATH=/gpfs/NETHOME/oivt1/nicevt/itf/sources/hpcx-v1.3.0-327-icc-OFED-1.5.3-redhat6.2/ompi-mellanox-v1.8/lib:$LD_LIBRARY_PATH ; export LD_LIBRARY_PATH ; DYLD_LIBRARY_PATH=/gpfs/NETHOME/oivt1/nicevt/itf/sources/hpcx-v1.3.0-327-icc-OFED-1.5.3-redhat6.2/ompi-mellanox-v1.8/lib:$DYLD_LIBRARY_PATH ; export DYLD_LIBRARY_PATH ;   /gpfs/NETHOME/oivt1/nicevt/itf/sources/hpcx-v1.3.0-327-icc-OFED-1.5.3-redhat6.2/ompi-mellanox-v1.8/bin/orted --hnp-topo-sig 2N:2S:2L3:16L2:16L1:16C:32H:x86_64 -mca ess "env" -mca orte_ess_jobid "625606656" -mca orte_ess_vpid 3 -mca orte_ess_num_procs "5" -mca orte_parent_uri "625606656.1;tcp://10.65.0.105,10.64.0.105,10.67.0.105:56862" -mca orte_hnp_uri "625606656.0;tcp://10.65.0.2,10.67.0.2,83.149.214.101,10.64.0.2:54893" --mca pml "yalla" -mca plm_rsh_no_tree_spawn "0" -mca plm "rsh" ) --tree-spawn'
--------------------------------------------------------------------------                               
ORTE was unable to reliably start one or more daemons.                                                   
This usually is caused by:                                                                               
                                                                                                         
* not finding the required libraries and/or binaries on                                                  
  one or more nodes. Please check your PATH and LD_LIBRARY_PATH                                          
  settings, or configure OMPI with --enable-orterun-prefix-by-default                                    
                                                                                                         
* lack of authority to execute on one or more specified nodes.                                           
  Please verify your allocation and authorities.                                                         
                                                                                                         
* the inability to write startup files into /tmp (--tmpdir/orte_tmpdir_base).                            
  Please check with your sys admin to determine the correct location to use.                             
                                                                                                         
*  compilation of the orted with dynamic libraries when static are required                              
  (e.g., on Cray). Please check your configure cmd line and consider using                               
  one of the contrib/platform definitions for your system type.                                          
                                                                                                         
* an inability to create a connection back to mpirun due to a                                            
  lack of common network interfaces and/or no route found between                                        
  them. Please check network connectivity (including firewalls                                           
  and network routing requirements).                                                                     
--------------------------------------------------------------------------                               
mpirun: abort is already in progress...hit ctrl-c again to forcibly terminate                          
                                                                                                         

Thank you for your comments.
 
Best regards,
Timur.
 



--

Kind Regards,

M.