Hi all,

We installed the latest release of Open MPI, 1.10.1, on our Linux cluster and are 
seeing some performance issues with it. We benchmarked the installation with the 
MD simulation package LAMMPS (http://lammps.sandia.gov/). Compared to our 
previous installation of version 1.8.4, 1.10.1 is nearly three times slower on 
average when running on multiple nodes. The run times across four compute nodes 
are as follows:

Run     1.10.1   1.8.4
1       0:09:39  0:09:21
2       0:50:29  0:09:23
3       0:50:29  0:09:28
4       0:13:38  0:09:27
5       0:10:43  0:09:34
Avg     0:27:00  0:09:27


Times are given in hours:minutes:seconds. Five runs were done for each version, 
and the averaged run time is listed in the last row. Single-node tests show the 
same run times for both 1.10.1 and 1.8.4.


We use SLURM as our job scheduler; the submit script for the LAMMPS job is below 
(a quick sanity check of the MPI environment it picks up is sketched after the 
script):

"#!/bin/sh
#SBATCH -N 4
#SBATCH -n 64
#SBATCH --mem=2g
#SBATCH --time=00:50:00
#SBATCH --error=job.%J.err
#SBATCH --output=job.%J.out

module load compiler/gcc/4.7
export PATH=$PATH:/util/opt/openmpi/1.10.1/gcc/4.7/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/util/opt/openmpi/1.10.1/gcc/4.7/lib
export INCLUDE=$INCLUDE:/util/opt/openmpi/1.10.1/gcc/4.7/include

mpirun lmp_ompi_g++ < in.snr"
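
As a first sanity check on the environment (just a sketch on our side, assuming 
nothing beyond the paths set in the script above), we plan to confirm inside the 
same batch environment that the job really resolves to the 1.10.1 installation 
and not to an older copy earlier in PATH or LD_LIBRARY_PATH:

    which mpirun                             # should point into /util/opt/openmpi/1.10.1/gcc/4.7/bin
    mpirun --version                         # should report 1.10.1
    ldd $(which lmp_ompi_g++) | grep -i mpi  # MPI libraries should resolve under /util/opt/openmpi/1.10.1/gcc/4.7/lib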

The "lmp_ompi_g++" binary is compiled against gcc/4.7 and openmpi/1.10.1. The 
compiler flags and MPI information can be found in the attachments. The problem 
here as you can see is the unstable performance for v-1.10.1. I wonder if this 
is a configuration issue at the compilation stage.
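
To rule out a build-time issue, one check we plan to run (a sketch only; it 
assumes the openib BTL was built into this installation, which the attached 
"ompi_info --all" output should confirm) is to list the available BTL components 
and watch which transport an actual run selects:

    ompi_info | grep -i btl                   # openib should appear among the MCA btl components
    mpirun --mca btl_base_verbose 100 lmp_ompi_g++ < in.snr
                                              # verbose BTL output shows which transport the processes pick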


Below is some information I gathered according to the "Getting Help" page.

Version of Open MPI that we are using:
  Open MPI version: 1.10.1
  Open MPI repo revision: v1.10.0-178-gb80f802
  Open MPI release date: Nov 03, 2015

"config.log" and "ompi_info --all" information are enclosed in the attachment.

Network information:
1. OpenFabrics version
Mellanox/vendor 2.4-1.0.4 
Download:<http://www.mellanox.com/page/mlnx_ofed_eula?mtag=linux_sw_drivers&mrequest=downloads&mtype=ofed&mver=MLNX_OFED-2.4-1.0.4&mname=MLNX_OFED_LINUX-2.4-1.0.4-rhel6.6-x86_64.tgz>

2. Linux version
Scientific Linux release 6.6
2.6.32-504.23.4.el6.x86_64

3. subnet manager
OpenSM

4. ibv_devinfo
hca_id: mlx4_0
        transport:                      InfiniBand (0)
        fw_ver:                         2.9.1000
        node_guid:                      0002:c903:0050:6190
        sys_image_guid:                 0002:c903:0050:6193
        vendor_id:                      0x02c9
        vendor_part_id:                 26428
        hw_ver:                         0xB0
        board_id:                       MT_0D90110009
        phys_port_cnt:                  1
                port:   1
                        state:                  PORT_ACTIVE (4)
                        max_mtu:                4096 (5)
                        active_mtu:             4096 (5)
                        sm_lid:                 1
                        port_lid:               34
                        port_lmc:               0x00
                        link_layer:             InfiniBand

5. ifconfig
em1       Link encap:Ethernet  HWaddr D0:67:E5:F9:20:76
          inet addr:10.138.25.3  Bcast:10.138.255.255  Mask:255.255.0.0
          inet6 addr: fe80::d267:e5ff:fef9:2076/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:28977969 errors:0 dropped:0 overruns:0 frame:0
          TX packets:67069501 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:3588666680 (3.3 GiB)  TX bytes:8145183622 (7.5 GiB)

Ifconfig uses the ioctl access method to get the full address information, 
which limits hardware addresses to 8 bytes.
Because Infiniband address has 20 bytes, only the first 8 bytes are displayed 
correctly.
Ifconfig is obsolete! For replacement check ip.
ib0       Link encap:InfiniBand  HWaddr 
A0:00:02:20:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00
          inet addr:10.137.25.3  Bcast:10.137.255.255  Mask:255.255.0.0
          inet6 addr: fe80::202:c903:50:6191/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:2044  Metric:1
          RX packets:1776 errors:0 dropped:0 overruns:0 frame:0
          TX packets:418 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1024
          RX bytes:131571 (128.4 KiB)  TX bytes:81418 (79.5 KiB)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:40310687 errors:0 dropped:0 overruns:0 frame:0
          TX packets:40310687 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:45601859442 (42.4 GiB)  TX bytes:45601859442 (42.4 GiB)

6. ulimit -l
unlimited
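
The fabric itself looks healthy from the output above, so another comparison we 
are considering (again only a sketch, assuming both the openib and tcp BTLs are 
present in this build) is to time the same job with the transport pinned 
explicitly and see whether the multi-node slowdown follows the interconnect 
selection:

    mpirun --mca btl openib,sm,self lmp_ompi_g++ < in.snr   # pin to InfiniBand (openib BTL)
    mpirun --mca btl tcp,sm,self    lmp_ompi_g++ < in.snr   # pin to TCP for comparison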

Please let me know if any more information is needed.

Thanks,
Jingchao

Dr. Jingchao Zhang
Holland Computing Center
University of Nebraska-Lincoln
402-472-6400

Attachment: MPIConfig.tar.bz2
