[OMPI users] Run-time issues with openmpi-2.0.2 and gcc

2017-04-13 Thread Vincent Drach
Dear mailing list, We are experiencing run-time failures on a small cluster with openmpi-2.0.2 and gcc 6.3 and gcc 5.4. The job starts normally and lots of communications are performed. After 5-10 minutes the connection to the hosts is closed and the following error message is reported: --

[OMPI users] Support for 50G/100G HCA in openmpi

2017-04-13 Thread Devesh Sharma via users
Hello list, I am trying to run IMB using openmpi-2.0.1/2.1.0 on a 50G 2-node cluster in my lab, but the test does not start. It fails with the following error: Starting for 0 th iteration. Using openmpi LOGPATH: /MPI/Logs/openmpi/imb/runlog-openmpi-np6-n2-0 ---
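
Before debugging IMB itself, a minimal two-rank send/recv check can confirm basic connectivity between the two nodes. This is only a sketch, assuming a standard mpicc/mpirun setup; the file name and host names in the comments are placeholders:

    #include <mpi.h>
    #include <stdio.h>

    /* Minimal two-rank check: rank 0 sends one int to rank 1, which echoes
     * it back.  Build with e.g. "mpicc sanity.c -o sanity" and run with
     * "mpirun -np 2 -H node1,node2 ./sanity" (host names are placeholders). */
    int main(int argc, char **argv)
    {
        int rank, value = 42;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        if (rank == 0) {
            MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            printf("rank 0: round trip ok, value=%d\n", value);
        } else if (rank == 1) {
            MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Send(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
        }
        MPI_Finalize();
        return 0;
    }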

Re: [OMPI users] Run-time issues with openmpi-2.0.2 and gcc

2017-04-13 Thread Reuti
Hi, > On 13.04.2017 at 11:00, Vincent Drach wrote: > > > Dear mailing list, > > We are experiencing run-time failures on a small cluster with openmpi-2.0.2 > and gcc 6.3 and gcc 5.4. > The job starts normally and lots of communications are performed. After 5-10 > minutes the connection to the

Re: [OMPI users] Run-time issues with openmpi-2.0.2 and gcc

2017-04-13 Thread Gilles Gouaillardet
Vincent, Can you try a small program such as examples/ring_c.c? Does your app do MPI_Comm_spawn and friends? Can you post your mpirun command line? Are you using a batch manager? This error message is typical of unresolved libraries. (E.g. "ssh host ldd orted" fails to resolve some libs because
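
For reference, a minimal ring program in the spirit of examples/ring_c.c (a sketch, not the shipped example verbatim) passes a token once around MPI_COMM_WORLD; the comment also notes the kind of ldd check Gilles suggests, where the exact orted path depends on the installation prefix:

    #include <mpi.h>
    #include <stdio.h>

    /* Pass a token once around the ranks of MPI_COMM_WORLD.
     * Remote library resolution can be checked with e.g.
     * "ssh host ldd $(which orted)" (path depends on the install prefix). */
    int main(int argc, char **argv)
    {
        int rank, size, token;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        if (size < 2) { MPI_Finalize(); return 0; }
        int next = (rank + 1) % size;
        int prev = (rank + size - 1) % size;
        if (rank == 0) {
            token = 0;
            MPI_Send(&token, 1, MPI_INT, next, 0, MPI_COMM_WORLD);
            MPI_Recv(&token, 1, MPI_INT, prev, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            printf("rank 0 received token %d after one trip around %d ranks\n",
                   token, size);
        } else {
            MPI_Recv(&token, 1, MPI_INT, prev, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            token++;
            MPI_Send(&token, 1, MPI_INT, next, 0, MPI_COMM_WORLD);
        }
        MPI_Finalize();
        return 0;
    }

If this also fails after a few minutes, the problem is unlikely to be in the application itself.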

[OMPI users] openmpi hang observed with RoCE transport and P_Write_Indv test

2017-04-13 Thread Sriharsha Basavapatna via users
Hi, I'm seeing an issue with Open MPI version 2.0.1. The setup uses 2 nodes with 1 process on each node and the test case is P_Write_Indv. The problem occurs when the test reaches the 4 MB message size in NON-AGGREGATE mode. The test just hangs at that point. Here's the exact command/options being
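
For context, a minimal MPI-IO sketch in the style of an individual-file-pointer write test is shown below. It is illustrative only: the file name and block size are arbitrary and this is not the IMB-IO source.

    #include <mpi.h>
    #include <stdlib.h>

    /* Each rank writes its own 4 MB block to a shared file at a
     * rank-specific offset, using non-collective (individual) writes. */
    int main(int argc, char **argv)
    {
        const MPI_Offset block = 4 * 1024 * 1024;
        int rank;
        MPI_File fh;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        char *buf = malloc(block);
        for (MPI_Offset i = 0; i < block; i++) buf[i] = (char)rank;
        MPI_File_open(MPI_COMM_WORLD, "imb_io_test.dat",
                      MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
        MPI_File_write_at(fh, rank * block, buf, (int)block, MPI_BYTE,
                          MPI_STATUS_IGNORE);
        MPI_File_close(&fh);
        free(buf);
        MPI_Finalize();
        return 0;
    }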

Re: [OMPI users] openmpi hang observed with RoCE transport and P_Write_Indv test

2017-04-13 Thread Jeff Squyres (jsquyres)
Can you try the latest version of Open MPI? There have been bug fixes in the MPI one-sided area. Try Open MPI v2.1.0, or v2.0.2 if you want to stick with the v2.0.x series. I think there have been some post-release one-sided fixes, too -- you may also want to try nightly snapshots on both of
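
For anyone wanting a smaller reproducer of the one-sided code paths Jeff mentions, a minimal MPI_Put sketch between two ranks (illustrative only, not the IMB test itself) is:

    #include <mpi.h>
    #include <stdio.h>

    /* Rank 0 puts one int into rank 1's window; fences delimit the epoch. */
    int main(int argc, char **argv)
    {
        int rank, local = 0, payload = 123;
        MPI_Win win;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Win_create(&local, sizeof(int), sizeof(int), MPI_INFO_NULL,
                       MPI_COMM_WORLD, &win);
        MPI_Win_fence(0, win);
        if (rank == 0)
            MPI_Put(&payload, 1, MPI_INT, 1, 0, 1, MPI_INT, win);
        MPI_Win_fence(0, win);
        if (rank == 1)
            printf("rank 1 window now holds %d\n", local);
        MPI_Win_free(&win);
        MPI_Finalize();
        return 0;
    }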

Re: [OMPI users] scheduling to real cores, not using hyperthreading (openmpi-2.1.0)

2017-04-13 Thread Heinz-Ado Arnolds
Dear Gilles, thanks a lot for your response! 1. You're right, my mistake: I forgot to "export" OMP_PROC_BIND in my job script. Now this example is working nearly as expected: [pascal-1-07:25617] MCW rank 0 bound to socket 0[core 0[hwt 0-1]], socket 0[core 1[hwt 0-1]], socket 0[core
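
A small hybrid report program along the lines of the "./myid" used elsewhere in this thread (a sketch; the original myid source is not shown) prints the CPU each MPI rank / OpenMP thread lands on, provided OMP_PROC_BIND and OMP_NUM_THREADS are exported in the job script:

    #define _GNU_SOURCE
    #include <mpi.h>
    #include <omp.h>
    #include <sched.h>
    #include <stdio.h>

    /* Print the CPU each OpenMP thread of each MPI rank runs on.
     * Build with a gcc-based mpicc: "mpicc -fopenmp myid.c -o myid".
     * Remember to export OMP_PROC_BIND (and OMP_NUM_THREADS) in the job
     * script, otherwise the OpenMP runtime never sees the setting. */
    int main(int argc, char **argv)
    {
        int rank;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        #pragma omp parallel
        {
            printf("rank %d thread %d on cpu %d\n",
                   rank, omp_get_thread_num(), sched_getcpu());
        }
        MPI_Finalize();
        return 0;
    }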

Re: [OMPI users] scheduling to real cores, not using hyperthreading (openmpi-2.1.0)

2017-04-13 Thread Heinz-Ado Arnolds
On 13.04.2017 15:20, gil...@rist.or.jp wrote: ... > in your second case, there are 2 things: > - MPI binds to socket, that is why two MPI tasks are assigned the same > hyperthreads > - the GNU OpenMP runtime seems unable to figure out that 2 processes use the > same cores, and hence ends up binding >

Re: [OMPI users] scheduling to real cores, not using hyperthreading (openmpi-2.1.0)

2017-04-13 Thread r...@open-mpi.org
You can always specify a particular number of cpus to use for each process by adding it to the map-by directive: mpirun -np 8 --map-by ppr:2:socket:pe=5 --use-hwthread-cpus -report-bindings --mca plm_rsh_agent "qrsh" ./myid would map 2 processes to each socket, binding each process to 5 HTs on

Re: [OMPI users] scheduling to real cores, not using hyperthreading (openmpi-2.1.0)

2017-04-13 Thread Heinz-Ado Arnolds
Dear Ralph, thanks a lot for this valuable advice. Binding now works as expected! Since adding the ":pe=" option I'm getting warnings: WARNING: a request was made to bind a process. While the system supports binding the process itself, at least one node does NOT support binding memory to

Re: [OMPI users] openmpi hang observed with RoCE transport and P_Write_Indv test

2017-04-13 Thread Sriharsha Basavapatna via users
Hi Jeff, The same problem is seen with OpenMPI v2.1.0 too. Thanks, -Harsha On Thu, Apr 13, 2017 at 4:41 PM, Jeff Squyres (jsquyres) wrote: > Can you try the latest version of Open MPI? There have been bug fixes in the > MPI one-sided area. > > Try Open MPI v2.1.0, or v2.0.2 if you want to st