If it helps, I can try to compile OpenMPI with debug information to get 
more details on the reported error. However, it would be good if someone could 
tell me the necessary compile flags (on top of -O0 -g), and it would probably 
take me 1-2 weeks to do it.

Michael


-------- Original message --------
From: Gilles Gouaillardet <gilles.gouaillar...@gmail.com>
Date: 29/07/2015 14:17 (GMT+01:00)
To: Open MPI Users <us...@open-mpi.org>
Subject: Re: [OMPI users] Invalid read of size 4 (Valgrind error) with OpenMPI 
1.8.7

Thomas,

Can you please elaborate?
I checked the code of opal_os_dirpath_create and could not find where such a 
thing can happen.

Thanks,

Gilles

On Wednesday, July 29, 2015, Thomas Jahns <ja...@dkrz.de> wrote:
Hello,

On 07/28/15 17:34, Schlottke-Lakemper, Michael wrote:
That’s what I suspected. Thank you for your confirmation.

You are mistaken: the allocation is 51 bytes long, i.e. the valid bytes are at 
offsets 0 to 50. Since the 4-byte read starts at offset 48, the bytes at 
offsets 48, 49, 50 and 51 get read, and the last of these lies outside the 
allocation. It probably does no harm in practice at the moment, because 
virtually all allocators pad each allocation up to a multiple of some power 
of 2. Still, it means the program is incorrect according to the definition of 
any programming language involved (might be C, C++ or Fortran).
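
The offset arithmetic can be checked with a small sketch. This is only an
illustration of the numbers in the Valgrind report, not code from Open MPI;
the function name bytes_read_past_end is made up for this example.

```python
def bytes_read_past_end(block_size: int, offset: int, read_size: int) -> int:
    """Return how many bytes of a read lie outside a block of block_size bytes.

    Hypothetical helper for illustration only; it models a read of read_size
    bytes starting at the given offset inside a heap block.
    """
    last_valid = block_size - 1         # valid offsets are 0 .. block_size - 1
    last_read = offset + read_size - 1  # last offset touched by the read
    return max(0, min(read_size, last_read - last_valid))


# Numbers from the report: a 4-byte read, 48 bytes inside a block of size 51.
touched = list(range(48, 48 + 4))
print(touched)                          # [48, 49, 50, 51]
print(bytes_read_past_end(51, 48, 4))   # 1
```

With a 52-byte block the same read would stay inside the allocation, which is
why padding in the allocator tends to hide the problem in practice.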

Regards, Thomas

On 25 Jul 2015, at 16:10 , Ralph Castain <r...@open-mpi.org> wrote:

Looks to me like a false positive - we do malloc some space, and do access
different parts of it. However, it looks like we are inside the space at all
times.

I’d suppress it


On Jul 23, 2015, at 12:47 AM, Schlottke-Lakemper, Michael
<m.schlottke-lakem...@aia.rwth-aachen.de> wrote:

Hi folks,

recently we’ve been getting a Valgrind error in PMPI_Init for our suite of
regression tests:

==5922== Invalid read of size 4
==5922==    at 0x61CC5C0: opal_os_dirpath_create (in
/aia/opt/openmpi-1.8.7/lib64/libopen-pal.so.6.2.2)
==5922==    by 0x5F207E5: orte_session_dir (in
/aia/opt/openmpi-1.8.7/lib64/libopen-rte.so.7.0.6)
==5922==    by 0x5F34F04: orte_ess_base_app_setup (in
/aia/opt/openmpi-1.8.7/lib64/libopen-rte.so.7.0.6)
==5922==    by 0x7E96679: rte_init (in
/aia/opt/openmpi-1.8.7/lib64/openmpi/mca_ess_env.so)
==5922==    by 0x5F12A77: orte_init (in
/aia/opt/openmpi-1.8.7/lib64/libopen-rte.so.7.0.6)
==5922==    by 0x509883C: ompi_mpi_init (in
/aia/opt/openmpi-1.8.7/lib64/libmpi.so.1.6.2)
==5922==    by 0x50B843A: PMPI_Init (in
/aia/opt/openmpi-1.8.7/lib64/libmpi.so.1.6.2)
==5922==    by 0xEBA79C: ZFS::run() (in
/aia/r018/scratch/mic/.zfstester/.zacc_cron/zacc_cron_r9063/zfs_gnu_production)
==5922==    by 0x4DC243: main (in
/aia/r018/scratch/mic/.zfstester/.zacc_cron/zacc_cron_r9063/zfs_gnu_production)
==5922==  Address 0x710f670 is 48 bytes inside a block of size 51 alloc'd
==5922==    at 0x4C29110: malloc (in
/usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==5922==    by 0x61CC572: opal_os_dirpath_create (in
/aia/opt/openmpi-1.8.7/lib64/libopen-pal.so.6.2.2)
==5922==    by 0x5F207E5: orte_session_dir (in
/aia/opt/openmpi-1.8.7/lib64/libopen-rte.so.7.0.6)
==5922==    by 0x5F34F04: orte_ess_base_app_setup (in
/aia/opt/openmpi-1.8.7/lib64/libopen-rte.so.7.0.6)
==5922==    by 0x7E96679: rte_init (in
/aia/opt/openmpi-1.8.7/lib64/openmpi/mca_ess_env.so)
==5922==    by 0x5F12A77: orte_init (in
/aia/opt/openmpi-1.8.7/lib64/libopen-rte.so.7.0.6)
==5922==    by 0x509883C: ompi_mpi_init (in
/aia/opt/openmpi-1.8.7/lib64/libmpi.so.1.6.2)
==5922==    by 0x50B843A: PMPI_Init (in
/aia/opt/openmpi-1.8.7/lib64/libmpi.so.1.6.2)
==5922==    by 0xEBA79C: ZFS::run() (in
/aia/r018/scratch/mic/.zfstester/.zacc_cron/zacc_cron_r9063/zfs_gnu_production)
==5922==    by 0x4DC243: main (in
/aia/r018/scratch/mic/.zfstester/.zacc_cron/zacc_cron_r9063/zfs_gnu_production)
==5922==

What is weird is that it seems to depend on the pbs/torque session we’re in:
sometimes the error does not occur at all and all tests run fine (this is in
fact the only Valgrind error we’re having at the moment). Other times every
single test we’re running has this error.

Has anyone seen this or might be able to offer an explanation? If it is a
false-positive, I’d be happy to suppress it :)

Thanks a lot in advance

Michael

P.S.: This error is not covered/suppressed by the default ompi suppression
file in $PREFIX/share/openmpi.
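
For anyone who does decide to suppress this report, a minimal sketch of a
Memcheck suppression entry matching the top two frames of the trace above
might look like the following. The entry name on the first line is an
arbitrary label chosen here for illustration; the frame names are taken from
the Valgrind output.

```
{
   opal_os_dirpath_create_addr4
   Memcheck:Addr4
   fun:opal_os_dirpath_create
   fun:orte_session_dir
}
```

Saved to a file (say ompi-dirpath.supp, a hypothetical name), it could be
passed to Valgrind with --suppressions=ompi-dirpath.supp. Running Valgrind
with --gen-suppressions=all prints entries in this format for each reported
error, which is a more reliable way to get an exact match.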


--
Michael Schlottke-Lakemper

SimLab Highly Scalable Fluids & Solids Engineering
Jülich Aachen Research Alliance (JARA-HPC)
RWTH Aachen University
Wüllnerstraße 5a
52062 Aachen
Germany

Phone: +49 (241) 80 95188
Fax: +49 (241) 80 92257
Mail: m.schlottke-lakem...@aia.rwth-aachen.de
Web: http://www.jara.org/jara-hpc

_______________________________________________
users mailing list
us...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post:
http://www.open-mpi.org/community/lists/users/2015/07/27303.php

--
Thomas Jahns
HD(CP)^2
Abteilung Anwendungssoftware

Deutsches Klimarechenzentrum GmbH
Bundesstraße 45a • D-20146 Hamburg • Germany

Phone:  +49 40 460094-151
Fax:    +49 40 460094-270
Email:  Thomas Jahns <ja...@dkrz.de>
URL:    http://www.dkrz.de

Geschäftsführer: Prof. Dr. Thomas Ludwig
Sitz der Gesellschaft: Hamburg
Amtsgericht Hamburg HRB 39784
