Binding is somehow involved in this; I do not believe vader or
openib is involved here.
Could you please run again with the two ompi versions, but in the *same*
job?
Also, before invoking mpirun, could you run
env | grep SLURM
Per your SLURM request, you are running 64 tasks on 4 nodes.
With 1.8.4, you end up running 14+14+14+22 tasks (not ideal, but quite
balanced).
With 1.10.1, you end up running 2+2+12+48 tasks (very unbalanced),
so it is quite unfair to compare these two runs.
Also, still in the same job, can you add a third run with 1.10.1 and the
following options:
mpirun --hetero-nodes -bind-to core -map-by core ...
and see if that helps?
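For reference, a minimal sketch of what such a combined job script could
look like (the 1.8.4 install path and the lmp_ompi_g++_184 binary name are
placeholders for whatever your site actually uses; in.wall.2d is the
example input discussed further down in this thread):
#!/bin/sh
#SBATCH -N 4
#SBATCH -n 64
#SBATCH --time=00:50:00
module load compiler/gcc/4.7
# show how SLURM describes the allocation before any mpirun is invoked
env | grep SLURM
# run 1: binary built against 1.8.4 (path and binary name are placeholders)
export PATH=/util/opt/openmpi/1.8.4/gcc/4.7/bin:$PATH
export LD_LIBRARY_PATH=/util/opt/openmpi/1.8.4/gcc/4.7/lib:$LD_LIBRARY_PATH
mpirun lmp_ompi_g++_184 < in.wall.2d
# run 2: binary built against 1.10.1, default mapping/binding
export PATH=/util/opt/openmpi/1.10.1/gcc/4.7/bin:$PATH
export LD_LIBRARY_PATH=/util/opt/openmpi/1.10.1/gcc/4.7/lib:$LD_LIBRARY_PATH
mpirun lmp_ompi_g++ < in.wall.2d
# run 3: 1.10.1 again, with explicit per-core mapping and binding
mpirun --hetero-nodes -bind-to core -map-by core lmp_ompi_g++ < in.wall.2d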
Cheers,
Gilles
On 12/17/2015 6:47 AM, Jingchao Zhang wrote:
Those jobs were launched with mpirun. Please see the attached files
for the binding report with OMPI_MCA_hwloc_base_report_bindings=1.
Here is a snapshot for v-1.10.1:
[c2613.tusker.hcc.unl.edu:12049] MCW rank 0 is not bound (or bound to all available processors)
[c2613.tusker.hcc.unl.edu:12049] MCW rank 1 is not bound (or bound to all available processors)
[c2615.tusker.hcc.unl.edu:11136] MCW rank 2 is not bound (or bound to all available processors)
[c2615.tusker.hcc.unl.edu:11136] MCW rank 3 is not bound (or bound to all available processors)
[c2907.tusker.hcc.unl.edu:64131] MCW rank 9 is not bound (or bound to all available processors)
[c2907.tusker.hcc.unl.edu:64131] MCW rank 10 is not bound (or bound to all available processors)
[c2907.tusker.hcc.unl.edu:64131] MCW rank 11 is not bound (or bound to all available processors)
[c2907.tusker.hcc.unl.edu:64131] MCW rank 12 is not bound (or bound to all available processors)
[c2907.tusker.hcc.unl.edu:64131] MCW rank 13 is not bound (or bound to all available processors)
[c2907.tusker.hcc.unl.edu:64131] MCW rank 14 is not bound (or bound to all available processors)
[c2907.tusker.hcc.unl.edu:64131] MCW rank 15 is not bound (or bound to all available processors)
[c2907.tusker.hcc.unl.edu:64131] MCW rank 4 is not bound (or bound to all available processors)
[c2907.tusker.hcc.unl.edu:64131] MCW rank 5 is not bound (or bound to all available processors)
[c2907.tusker.hcc.unl.edu:64131] MCW rank 6 is not bound (or bound to all available processors)
[c2907.tusker.hcc.unl.edu:64131] MCW rank 7 is not bound (or bound to all available processors)
[c2907.tusker.hcc.unl.edu:64131] MCW rank 8 is not bound (or bound to all available processors)
The report for 1.8.4 doesn't have this issue. Any
suggestions to resolve it?
Thanks,
Jingchao
Dr. Jingchao Zhang
Holland Computing Center
University of Nebraska-Lincoln
402-472-6400
------------------------------------------------------------------------
*From:* users <users-boun...@open-mpi.org> on behalf of Ralph Castain
<r...@open-mpi.org>
*Sent:* Wednesday, December 16, 2015 1:52 PM
*To:* Open MPI Users
*Subject:* Re: [OMPI users] performance issue with OpenMPI 1.10.1
When I see such issues, I immediately start to think about binding
patterns. How are these jobs being launched - with mpirun or srun?
What do you see if you set OMPI_MCA_hwloc_base_report_bindings=1 in
your environment?
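For example, this can be set in the job script right before the launch; a
minimal sketch (--report-bindings is the equivalent mpirun command-line
option):
# set in the environment before calling mpirun
export OMPI_MCA_hwloc_base_report_bindings=1
mpirun lmp_ompi_g++ < in.snr
# or, equivalently, ask for the report on the command line
mpirun --report-bindings lmp_ompi_g++ < in.snr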
On Dec 16, 2015, at 11:15 AM, Jingchao Zhang <zh...@unl.edu> wrote:
Hi Gilles,
The LAMMPS jobs for both versions are pure MPI. In the SLURM script,
64 cores are requested from 4 nodes, so it's 64 MPI tasks, not
necessarily evenly distributed across the nodes. (Each node is
equipped with 64 cores.)
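As an aside, if an even layout is wanted regardless of the Open MPI
version, it can be requested explicitly; a sketch, assuming 16 tasks per
node (4 x 16 = 64) is acceptable:
# ask SLURM itself for an even distribution
#SBATCH -N 4
#SBATCH --ntasks-per-node=16
# or ask Open MPI to map a fixed number of processes per node
mpirun -map-by ppr:16:node lmp_ompi_g++ < in.wall.2d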
I can reproduce the performance issue using the LAMMPS example
"VISCOSITY/in.wall.2d". The run time difference is a jaw-dropping 20
seconds (v-1.8.4) vs. 45 minutes (v-1.10.1). Among the multiple tests, I
do have one v-1.10.1 job that finished in 20 seconds; again, unstable
performance. We also tested other software packages such as cp2k, VASP
and Quantum Espresso, and they all show similar issues.
Here are the decomposed MPI timings from the LAMMPS job outputs.
v-1.8.4 (Job execution time: 00:00:20)
Loop time of 8.94962 on 64 procs for 50000 steps with 1020 atoms
Pair time (%) = 0.270092 (3.01791)
Neigh time (%) = 0.0842548 (0.941435)
Comm time (%) = 3.3474 (37.4027)
Outpt time (%) = 0.00901061 (0.100682)
Other time (%) = 5.23886 (58.5373)
v-1.10.1 (Job execution time: 00:45:50)
Loop time of 2003.07 on 64 procs for 50000 steps with 1020 atoms
Pair time (%) = 0.346776 (0.0173122)
Neigh time (%) = 0.18047 (0.00900966)
Comm time (%) = 535.836 (26.7508)
Outpt time (%) = 1.68608 (0.0841748)
Other time (%) = 1465.02 (73.1387)
I wonder if you could share the config.log and ompi_info output from your
v-1.10.1 build. Hopefully we can find a solution by comparing the
configuration differences. We have been playing with the cma and vader
parameters, but with no luck.
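For reference, this sort of parameter can be toggled per run without
rebuilding; a sketch (the exact parameter name,
btl_vader_single_copy_mechanism, is assumed here for this release series):
# disable the vader BTL entirely for one run ("^" means exclude)
mpirun --mca btl ^vader lmp_ompi_g++ < in.wall.2d
# or keep vader but turn off its CMA single-copy path (assumed parameter name)
mpirun --mca btl_vader_single_copy_mechanism none lmp_ompi_g++ < in.wall.2d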
Thanks,
Jingchao
Dr. Jingchao Zhang
Holland Computing Center
University of Nebraska-Lincoln
402-472-6400
------------------------------------------------------------------------
*From:* users <users-boun...@open-mpi.org> on behalf of Gilles Gouaillardet
<gil...@rist.or.jp>
*Sent:* Tuesday, December 15, 2015 12:11 AM
*To:* Open MPI Users
*Subject:* Re: [OMPI users] performance issue with OpenMPI 1.10.1
Hi,
First, can you check how many MPI tasks and OpenMP threads are used
with both ompi versions?
/* it should be 16 MPI tasks x no OpenMP threads */
Can you also post the MPI task timing breakdown (from the output) for
both versions?
I tried a simple test with the VISCOSITY/in.wall.2d example and did not
observe any performance difference.
Can you reproduce the performance drop with an input file from the
examples directory? If not, can you post your in.snr input file?
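A quick way to confirm both counts from within the job, without modifying
the LAMMPS input (a sketch; setting OMP_NUM_THREADS only matters if the
binary was actually built with OpenMP support):
# show the task count and layout computed by the launcher
# (runs a no-op on every rank)
mpirun --display-map true
# make the OpenMP thread count explicit for a pure-MPI run
export OMP_NUM_THREADS=1
mpirun lmp_ompi_g++ < in.wall.2d
# LAMMPS also reports the task count itself: "Loop time of ... on 64 procs"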
Cheers,
Gilles
On 12/15/2015 7:18 AM, Jingchao Zhang wrote:
Hi all,
We installed the latest release of OpenMPI, 1.10.1, on our Linux
cluster and found that it has some performance issues. We tested the
OpenMPI performance with the MD simulation package LAMMPS
(http://lammps.sandia.gov/). Compared to our previous installation
of version 1.8.4, 1.10.1 is nearly three times slower when running on
multiple nodes. Run times across four computing nodes are as follows:
Run    1.10.1     1.8.4
1      0:09:39    0:09:21
2      0:50:29    0:09:23
3      0:50:29    0:09:28
4      0:13:38    0:09:27
5      0:10:43    0:09:34
Ave    0:27:00    0:09:27
Units are hour:minute:second. Five tests were run for each case, and
the averaged run time is listed in the last row. Single-node tests
give the same run times for both 1.10.1 and 1.8.4.
We use SLURM as our job scheduler; the submit script for the LAMMPS
job is shown below:
"#!/bin/sh
#SBATCH -N 4
#SBATCH -n 64
#SBATCH --mem=2g
#SBATCH --time=00:50:00
#SBATCH --error=job.%J.err
#SBATCH --output=job.%J.out
module load compiler/gcc/4.7
export PATH=$PATH:/util/opt/openmpi/1.10.1/gcc/4.7/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/util/opt/openmpi/1.10.1/gcc/4.7/lib
export INCLUDE=$INCLUDE:/util/opt/openmpi/1.10.1/gcc/4.7/include
mpirun lmp_ompi_g++ < in.snr"
The "lmp_ompi_g++" binary is compiled against gcc/4.7 and
openmpi/1.10.1. The compiler flags and MPI information can be found
in the attachments. The problem here as you can see is the
unstable performance for v-1.10.1. I wonder if this is a
configuration issue at the compilation stage.
Below are some information I gathered according to the "Getting
Help" page.
Version of Open MPI that we are using:
Open MPI version: 1.10.1
Open MPI repo revision: v1.10.0-178-gb80f802
Open MPI release date: Nov 03, 2015
"config.log" and "ompi_info --all" information are enclosed in the
attachment.
Network information:
1. OpenFabrics version
Mellanox/vendor 2.4-1.0.4
Download: <http://www.mellanox.com/page/mlnx_ofed_eula?mtag=linux_sw_drivers&mrequest=downloads&mtype=ofed&mver=MLNX_OFED-2.4-1.0.4&mname=MLNX_OFED_LINUX-2.4-1.0.4-rhel6.6-x86_64.tgz>
2. Linux version
Scientific Linux release 6.6
2.6.32-504.23.4.el6.x86_64
3. subnet manager
OpenSM
4. ibv_devinfo
hca_id: mlx4_0
    transport:       InfiniBand (0)
    fw_ver:          2.9.1000
    node_guid:       0002:c903:0050:6190
    sys_image_guid:  0002:c903:0050:6193
    vendor_id:       0x02c9
    vendor_part_id:  26428
    hw_ver:          0xB0
    board_id:        MT_0D90110009
    phys_port_cnt:   1
        port: 1
            state:       PORT_ACTIVE (4)
            max_mtu:     4096 (5)
            active_mtu:  4096 (5)
            sm_lid:      1
            port_lid:    34
            port_lmc:    0x00
            link_layer:  InfiniBand
5. ifconfig
em1    Link encap:Ethernet  HWaddr D0:67:E5:F9:20:76
       inet addr:10.138.25.3  Bcast:10.138.255.255  Mask:255.255.0.0
       inet6 addr: fe80::d267:e5ff:fef9:2076/64 Scope:Link
       UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
       RX packets:28977969 errors:0 dropped:0 overruns:0 frame:0
       TX packets:67069501 errors:0 dropped:0 overruns:0 carrier:0
       collisions:0 txqueuelen:1000
       RX bytes:3588666680 (3.3 GiB)  TX bytes:8145183622 (7.5 GiB)
Ifconfig uses the ioctl access method to get the full address
information, which limits hardware addresses to 8 bytes.
Because Infiniband address has 20 bytes, only the first 8 bytes are
displayed correctly.
Ifconfig is obsolete! For replacement check ip.
ib0    Link encap:InfiniBand  HWaddr A0:00:02:20:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00
       inet addr:10.137.25.3  Bcast:10.137.255.255  Mask:255.255.0.0
       inet6 addr: fe80::202:c903:50:6191/64 Scope:Link
       UP BROADCAST RUNNING MULTICAST  MTU:2044  Metric:1
       RX packets:1776 errors:0 dropped:0 overruns:0 frame:0
       TX packets:418 errors:0 dropped:0 overruns:0 carrier:0
       collisions:0 txqueuelen:1024
       RX bytes:131571 (128.4 KiB)  TX bytes:81418 (79.5 KiB)
lo     Link encap:Local Loopback
       inet addr:127.0.0.1  Mask:255.0.0.0
       inet6 addr: ::1/128 Scope:Host
       UP LOOPBACK RUNNING  MTU:65536  Metric:1
       RX packets:40310687 errors:0 dropped:0 overruns:0 frame:0
       TX packets:40310687 errors:0 dropped:0 overruns:0 carrier:0
       collisions:0 txqueuelen:0
       RX bytes:45601859442 (42.4 GiB)  TX bytes:45601859442 (42.4 GiB)
6. ulimit -l
unlimited
Please let me know if more information is needed.
Thanks,
Jingchao
Dr. Jingchao Zhang
Holland Computing Center
University of Nebraska-Lincoln
402-472-6400