Hi Gilles,

OK, I patched the file. Without valgrind, it crashed at MPI_File_close:

*** Error in `/pmi/cmpbib/compilation_BIB_gcc-4.5.1_64bit/COMPILE_AUTO/GIREF/bin/Test.NormesEtProjectionChamp.dev': free(): invalid next size (normal): 0x0000000004b6c950 ***
======= Backtrace: =========
/lib64/libc.so.6(+0x7ac56)[0x7fab5692bc56]
/lib64/libc.so.6(+0x7b9d3)[0x7fab5692c9d3]
/opt/openmpi-1.8.4rc3_debug/lib64/openmpi/mca_io_romio.so(ADIOI_Free_fn+0x5f)[0x7fab4c1b9920]
/opt/openmpi-1.8.4rc3_debug/lib64/openmpi/mca_io_romio.so(mca_io_romio_dist_MPI_File_close+0xc6)[0x7fab4c185afa]
/opt/openmpi-1.8.4rc3_debug/lib64/openmpi/mca_io_romio.so(mca_io_romio_file_close+0x2be)[0x7fab4c180e88]
/opt/openmpi-1.8.4rc3_debug/lib64/libmpi.so.1(+0x4c09c)[0x7fab574ed09c]
/opt/openmpi-1.8.4rc3_debug/lib64/libmpi.so.1(+0x4af4b)[0x7fab574ebf4b]
/opt/openmpi-1.8.4rc3_debug/lib64/libmpi.so.1(ompi_file_close+0xd7)[0x7fab574eca0d]
/opt/openmpi-1.8.4rc3_debug/lib64/libmpi.so.1(PMPI_File_close+0xc1)[0x7fab57572e62]
/pmi/cmpbib/compilation_BIB_gcc-4.5.1_64bit/COMPILE_AUTO/GIREF/lib/libgiref_dev_Champs.so(_ZN18GISLectureEcritureIdE9litGISMPIESsR13GroupeInfoSurIdERSs+0x258f)[0x7fab658bb637]
/pmi/cmpbib/compilation_BIB_gcc-4.5.1_64bit/COMPILE_AUTO/GIREF/lib/libgiref_dev_Champs.so(_ZN5Champ16importeParalleleERKSs+0x2ae)[0x7fab65898f0e]
/pmi/cmpbib/compilation_BIB_gcc-4.5.1_64bit/COMPILE_AUTO/GIREF/bin/Test.NormesEtProjectionChamp.dev[0x4d0def]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x7fab568d2a15]
/pmi/cmpbib/compilation_BIB_gcc-4.5.1_64bit/COMPILE_AUTO/GIREF/bin/Test.NormesEtProjectionChamp.dev[0x4b1429]


I will launch it under valgrind now, but since the run lasts 20 minutes, I
will only send the result tomorrow.

Anyway, thank you very much! :-)

Eric

On 12/14/2014 10:26 PM, Gilles Gouaillardet wrote:
Eric,

Here is a patch for the v1.8 series; it fixes a one-byte overflow.

Valgrind should stop complaining and, assuming this overflow is the root
cause of the memory corruption, the patch could also fix your program.

That being said, shared_fp_fname is limited to 255 characters (this is
hard-coded), so even when the name is correctly truncated to 255 characters
(instead of 256), the behavior could still be somewhat random.

/* from ADIOI_Shfp_fname :
   If the real file is /tmp/thakur/testfile, the shared-file-pointer
    file will be /tmp/thakur/.testfile.shfp.xxxx, where xxxx is

FWIW, xxxx is a random number that takes between 1 and 10 characters
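
The truncation concern can be illustrated with a minimal C sketch. This is
not the ROMIO code and not the patch itself: build_shfp_name, SHFP_NAME_MAX,
and the exact name layout are assumptions for illustration, modeled only on
the /tmp/thakur/.testfile.shfp.xxxx pattern quoted above. It builds the name
in a 256-byte buffer with snprintf, which always NUL-terminates, and reports
truncation instead of writing past the end of the buffer.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical buffer size: 255 characters plus the terminating NUL. */
#define SHFP_NAME_MAX 256

/*
 * Build "<dir>/.<base>.shfp.<id>" from a file path into out[SHFP_NAME_MAX].
 * Returns 0 on success, -1 if the result does not fit (out then holds a
 * NUL-terminated, truncated name).
 */
int build_shfp_name(const char *path, unsigned int id,
                    char out[SHFP_NAME_MAX])
{
    assert(path != NULL && out != NULL);

    const char *slash = strrchr(path, '/');
    int n;

    if (slash == NULL) {
        /* No directory component: just prefix the file name with a dot. */
        n = snprintf(out, SHFP_NAME_MAX, ".%s.shfp.%u", path, id);
    } else {
        int dir_len = (int)(slash - path) + 1;   /* keep the trailing '/' */
        n = snprintf(out, SHFP_NAME_MAX, "%.*s.%s.shfp.%u",
                     dir_len, path, slash + 1, id);
    }

    /* snprintf returns the length it wanted to write, excluding the NUL. */
    return (n < 0 || n >= SHFP_NAME_MAX) ? -1 : 0;
}
```

With a return value of -1 the caller knows the name was cut short and can
refuse to use it, rather than ending up with a name that depends on where
the cut happened.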

Could you please give this patch a try and let us know the results?

Cheers,

Gilles

On 2014/12/15 11:52, Eric Chamberland wrote:
Hi again,

Some new hints that might help:

1- With valgrind: if I run the same test case, with the same data, but
moved to a shorter path+filename, then *valgrind* does *not* complain!
2- Without valgrind: *sometimes*, the test case with the long
path+filename passes without "segfaulting"!
3- The problem seems to happen at the fourth file I try to open using the
procedure described below.

Also, I was wondering about this: in this 2-process test case (both
processes running on the same node), I:

1- open the file collectively (it resides on a local SSD drive on my
computer)
2- call MPI_File_read_at_all to read a long int and 3 chars (11 bytes)
3- stop (because I detect that I am not reading my MPIIO file format)
4- close the file
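
Steps 2 and 3 above amount to reading a small fixed-size header and
rejecting the file when it does not match. A minimal sketch of such a check
follows. The actual file format is not described in this thread, so the
field order, the "GIS" tag, and the names parse_header and HEADER_SIZE are
assumptions for illustration, as is a 64-bit platform where a long int
occupies 8 bytes (which is consistent with 8 + 3 = 11 bytes).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* 8-byte integer followed by a 3-character tag: 11 bytes in total. */
#define HEADER_SIZE 11

/* Hypothetical tag; the real format's marker is not given in the thread. */
static const char EXPECTED_TAG[3] = { 'G', 'I', 'S' };

/*
 * Decode an 11-byte header from buf (as filled by a read call).
 * Returns 1 and stores the integer in *value if the tag matches,
 * 0 if the buffer does not look like the expected format.
 */
int parse_header(const unsigned char buf[HEADER_SIZE], int64_t *value)
{
    assert(buf != NULL && value != NULL);

    /* memcpy avoids unaligned access and strict-aliasing problems. */
    memcpy(value, buf, sizeof *value);

    /* The tag is compared as raw bytes; it is not NUL-terminated. */
    return memcmp(buf + sizeof *value, EXPECTED_TAG, sizeof EXPECTED_TAG) == 0;
}
```

In the scenario described, a return value of 0 would correspond to step 3:
the caller stops reading and goes on to close the file.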

A guess (FWIW): could process rank 0, for example, close the file too
quickly and thereby destroy the string reserved for the filename while it
is still in use by process rank 1, which could be accessing it through
shared memory on the same node?

Thanks,

Eric

On 12/14/2014 02:06 PM, Eric Chamberland wrote:
Hi,

I finally (thanks for fixing oversubscribing) tested my problem with
collective MPI I/O against 1.8.4rc3.

The problem is still there. In this 2-process example, process rank 1
dies with a segfault while process rank 0 waits indefinitely...

Running with valgrind, I found these errors, which may give some hints:

*************************************************
Rank 1:
*************************************************
On process rank 1, without valgrind, the run ends with either a
segmentation violation, memory corruption, or an invalid free.

But when running with valgrind, it reports:

==16715== Invalid write of size 2
==16715==    at 0x4C2E793: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:915)
==16715==    by 0x1F60AA91: opal_convertor_unpack (opal_convertor.c:321)
==16715==    by 0x25AA8DD3: mca_pml_ob1_recv_frag_callback_match
(pml_ob1_recvfrag.c:225)
==16715==    by 0x2544110C: mca_btl_vader_check_fboxes
(btl_vader_fbox.h:220)
==16715==    by 0x25443577: mca_btl_vader_component_progress
(btl_vader_component.c:695)
==16715==    by 0x1F5F0F27: opal_progress (opal_progress.c:207)
==16715==    by 0x1ACB40B3: opal_condition_wait (condition.h:93)
==16715==    by 0x1ACB4201: ompi_request_wait_completion (request.h:381)
==16715==    by 0x1ACB4305: ompi_request_default_wait (req_wait.c:39)
==16715==    by 0x26BA2FFB: ompi_coll_tuned_bcast_intra_generic
(coll_tuned_bcast.c:254)
==16715==    by 0x26BA36F7: ompi_coll_tuned_bcast_intra_binomial
(coll_tuned_bcast.c:385)
==16715==    by 0x26B94289: ompi_coll_tuned_bcast_intra_dec_fixed
(coll_tuned_decision_fixed.c:258)
==16715==    by 0x1ACD55F2: PMPI_Bcast (pbcast.c:110)
==16715==    by 0x2FE1CC48: ADIOI_Shfp_fname (shfp_fname.c:67)
==16715==    by 0x2FDEB493: mca_io_romio_dist_MPI_File_open (open.c:177)
==16715==    by 0x2FDE3B0D: mca_io_romio_file_open
(io_romio_file_open.c:40)
==16715==    by 0x1AD52344: module_init (io_base_file_select.c:455)
==16715==    by 0x1AD51DFA: mca_io_base_file_select
(io_base_file_select.c:238)
==16715==    by 0x1ACA582F: ompi_file_open (file.c:130)
==16715==    by 0x1AD30DA3: PMPI_File_open (pfile_open.c:94)
==16715==    by 0x13F9B36F:
PAIO::ouvreFichierMPIIO(PAGroupeProcessus&, std::string const&, int,
ompi_file_t*&, bool) (PAIO.cc:290)
==16715==    by 0xCA44252:
GISLectureEcriture<double>::litGISMPI(std::string,
GroupeInfoSur<double>&, std::string&) (GISLectureEcriture.icc:411)
==16715==    by 0xCA23F0D: Champ::importeParallele(std::string const&)
(Champ.cc:951)
==16715==    by 0x4D0DEE: main (Test.NormesEtProjectionChamp.cc:789)
==16715==  Address 0x32ef3e50 is 0 bytes after a block of size 256
alloc'd
==16715==    at 0x4C2C5A4: malloc (vg_replace_malloc.c:296)
==16715==    by 0x2FE1C78E: ADIOI_Malloc_fn (malloc.c:50)
==16715==    by 0x2FE1C951: ADIOI_Shfp_fname (shfp_fname.c:25)
==16715==    by 0x2FDEB493: mca_io_romio_dist_MPI_File_open (open.c:177)
==16715==    by 0x2FDE3B0D: mca_io_romio_file_open
(io_romio_file_open.c:40)
==16715==    by 0x1AD52344: module_init (io_base_file_select.c:455)
==16715==    by 0x1AD51DFA: mca_io_base_file_select
(io_base_file_select.c:238)
==16715==    by 0x1ACA582F: ompi_file_open (file.c:130)
==16715==    by 0x1AD30DA3: PMPI_File_open (pfile_open.c:94)
==16715==    by 0x13F9B36F:
PAIO::ouvreFichierMPIIO(PAGroupeProcessus&, std::string const&, int,
ompi_file_t*&, bool) (PAIO.cc:290)
==16715==    by 0xCA44252:
GISLectureEcriture<double>::litGISMPI(std::string,
GroupeInfoSur<double>&, std::string&) (GISLectureEcriture.icc:411)
==16715==    by 0xCA23F0D: Champ::importeParallele(std::string const&)
(Champ.cc:951)
==16715==    by 0x4D0DEE: main (Test.NormesEtProjectionChamp.cc:789)
...
...
==16715== Invalid write of size 1
==16715==    at 0x4C2E7BB: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:915)
==16715==    by 0x1F60AA91: opal_convertor_unpack (opal_convertor.c:321)
==16715==    by 0x25AA8DD3: mca_pml_ob1_recv_frag_callback_match
(pml_ob1_recvfrag.c:225)
==16715==    by 0x2544110C: mca_btl_vader_check_fboxes
(btl_vader_fbox.h:220)
==16715==    by 0x25443577: mca_btl_vader_component_progress
(btl_vader_component.c:695)
==16715==    by 0x1F5F0F27: opal_progress (opal_progress.c:207)
==16715==    by 0x1ACB40B3: opal_condition_wait (condition.h:93)
==16715==    by 0x1ACB4201: ompi_request_wait_completion (request.h:381)
==16715==    by 0x1ACB4305: ompi_request_default_wait (req_wait.c:39)
==16715==    by 0x26BA2FFB: ompi_coll_tuned_bcast_intra_generic
(coll_tuned_bcast.c:254)
==16715==    by 0x26BA36F7: ompi_coll_tuned_bcast_intra_binomial
(coll_tuned_bcast.c:385)
==16715==    by 0x26B94289: ompi_coll_tuned_bcast_intra_dec_fixed
(coll_tuned_decision_fixed.c:258)
==16715==    by 0x1ACD55F2: PMPI_Bcast (pbcast.c:110)
==16715==    by 0x2FE1CC48: ADIOI_Shfp_fname (shfp_fname.c:67)
==16715==    by 0x2FDEB493: mca_io_romio_dist_MPI_File_open (open.c:177)
==16715==    by 0x2FDE3B0D: mca_io_romio_file_open
(io_romio_file_open.c:40)
==16715==    by 0x1AD52344: module_init (io_base_file_select.c:455)
==16715==    by 0x1AD51DFA: mca_io_base_file_select
(io_base_file_select.c:238)
==16715==    by 0x1ACA582F: ompi_file_open (file.c:130)
==16715==    by 0x1AD30DA3: PMPI_File_open (pfile_open.c:94)
==16715==    by 0x13F9B36F:
PAIO::ouvreFichierMPIIO(PAGroupeProcessus&, std::string const&, int,
ompi_file_t*&, bool) (PAIO.cc:290)
==16715==    by 0xCA44252:
GISLectureEcriture<double>::litGISMPI(std::string,
GroupeInfoSur<double>&, std::string&) (GISLectureEcriture.icc:411)
==16715==    by 0xCA23F0D: Champ::importeParallele(std::string const&)
(Champ.cc:951)
==16715==    by 0x4D0DEE: main (Test.NormesEtProjectionChamp.cc:789)
==16715==  Address 0x32ef3e60 is 16 bytes after a block of size 256
alloc'd
==16715==    at 0x4C2C5A4: malloc (vg_replace_malloc.c:296)
==16715==    by 0x2FE1C78E: ADIOI_Malloc_fn (malloc.c:50)
==16715==    by 0x2FE1C951: ADIOI_Shfp_fname (shfp_fname.c:25)
==16715==    by 0x2FDEB493: mca_io_romio_dist_MPI_File_open (open.c:177)
==16715==    by 0x2FDE3B0D: mca_io_romio_file_open
(io_romio_file_open.c:40)
==16715==    by 0x1AD52344: module_init (io_base_file_select.c:455)
==16715==    by 0x1AD51DFA: mca_io_base_file_select
(io_base_file_select.c:238)
==16715==    by 0x1ACA582F: ompi_file_open (file.c:130)
==16715==    by 0x1AD30DA3: PMPI_File_open (pfile_open.c:94)
==16715==    by 0x13F9B36F:
PAIO::ouvreFichierMPIIO(PAGroupeProcessus&, std::string const&, int,
ompi_file_t*&, bool) (PAIO.cc:290)
==16715==    by 0xCA44252:
GISLectureEcriture<double>::litGISMPI(std::string,
GroupeInfoSur<double>&, std::string&) (GISLectureEcriture.icc:411)
==16715==    by 0xCA23F0D: Champ::importeParallele(std::string const&)
(Champ.cc:951)
==16715==    by 0x4D0DEE: main (Test.NormesEtProjectionChamp.cc:789)
==16715==

*************************************************
Rank 0:
*************************************************

==16714== Invalid read of size 1
==16714==    at 0x4C2CA74: __strrchr_sse42 (vg_replace_strmem.c:194)
==16714==    by 0x2FE1CAB7: ADIOI_Shfp_fname (shfp_fname.c:51)
==16714==    by 0x2FDEB493: mca_io_romio_dist_MPI_File_open (open.c:177)
==16714==    by 0x2FDE3B0D: mca_io_romio_file_open
(io_romio_file_open.c:40)
==16714==    by 0x1AD52344: module_init (io_base_file_select.c:455)
==16714==    by 0x1AD51DFA: mca_io_base_file_select
(io_base_file_select.c:238)
==16714==    by 0x1ACA582F: ompi_file_open (file.c:130)
==16714==    by 0x1AD30DA3: PMPI_File_open (pfile_open.c:94)
==16714==    by 0x13F9B36F:
PAIO::ouvreFichierMPIIO(PAGroupeProcessus&, std::string const&, int,
ompi_file_t*&, bool) (PAIO.cc:290)
==16714==    by 0xCA44252:
GISLectureEcriture<double>::litGISMPI(std::string,
GroupeInfoSur<double>&, std::string&) (GISLectureEcriture.icc:411)
==16714==    by 0xCA23F0D: Champ::importeParallele(std::string const&)
(Champ.cc:951)
==16714==    by 0x4D0DEE: main (Test.NormesEtProjectionChamp.cc:789)
==16714==  Address 0x219377d0 is 0 bytes after a block of size 256
alloc'd
==16714==    at 0x4C2C5A4: malloc (vg_replace_malloc.c:296)
==16714==    by 0x2FE1C78E: ADIOI_Malloc_fn (malloc.c:50)
==16714==    by 0x2FE1C951: ADIOI_Shfp_fname (shfp_fname.c:25)
==16714==    by 0x2FDEB493: mca_io_romio_dist_MPI_File_open (open.c:177)
==16714==    by 0x2FDE3B0D: mca_io_romio_file_open
(io_romio_file_open.c:40)
==16714==    by 0x1AD52344: module_init (io_base_file_select.c:455)
==16714==    by 0x1AD51DFA: mca_io_base_file_select
(io_base_file_select.c:238)
==16714==    by 0x1ACA582F: ompi_file_open (file.c:130)
==16714==    by 0x1AD30DA3: PMPI_File_open (pfile_open.c:94)
==16714==    by 0x13F9B36F:
PAIO::ouvreFichierMPIIO(PAGroupeProcessus&, std::string const&, int,
ompi_file_t*&, bool) (PAIO.cc:290)
==16714==    by 0xCA44252:
GISLectureEcriture<double>::litGISMPI(std::string,
GroupeInfoSur<double>&, std::string&) (GISLectureEcriture.icc:411)
==16714==    by 0xCA23F0D: Champ::importeParallele(std::string const&)
(Champ.cc:951)
==16714==    by 0x4D0DEE: main (Test.NormesEtProjectionChamp.cc:789)
==16714==
...
==16714== Invalid read of size 1
==16714==    at 0x4C2D034: strlen (vg_replace_strmem.c:412)
==16714==    by 0x2FE1CB81: ADIOI_Shfp_fname (shfp_fname.c:61)
==16714==    by 0x2FDEB493: mca_io_romio_dist_MPI_File_open (open.c:177)
==16714==    by 0x2FDE3B0D: mca_io_romio_file_open
(io_romio_file_open.c:40)
==16714==    by 0x1AD52344: module_init (io_base_file_select.c:455)
==16714==    by 0x1AD51DFA: mca_io_base_file_select
(io_base_file_select.c:238)
==16714==    by 0x1ACA582F: ompi_file_open (file.c:130)
==16714==    by 0x1AD30DA3: PMPI_File_open (pfile_open.c:94)
==16714==    by 0x13F9B36F:
PAIO::ouvreFichierMPIIO(PAGroupeProcessus&, std::string const&, int,
ompi_file_t*&, bool) (PAIO.cc:290)
==16714==    by 0xCA44252:
GISLectureEcriture<double>::litGISMPI(std::string,
GroupeInfoSur<double>&, std::string&) (GISLectureEcriture.icc:411)
==16714==    by 0xCA23F0D: Champ::importeParallele(std::string const&)
(Champ.cc:951)
==16714==    by 0x4D0DEE: main (Test.NormesEtProjectionChamp.cc:789)
==16714==  Address 0x219377d0 is 0 bytes after a block of size 256
alloc'd
==16714==    at 0x4C2C5A4: malloc (vg_replace_malloc.c:296)
==16714==    by 0x2FE1C78E: ADIOI_Malloc_fn (malloc.c:50)
==16714==    by 0x2FE1C951: ADIOI_Shfp_fname (shfp_fname.c:25)
==16714==    by 0x2FDEB493: mca_io_romio_dist_MPI_File_open (open.c:177)
==16714==    by 0x2FDE3B0D: mca_io_romio_file_open
(io_romio_file_open.c:40)
==16714==    by 0x1AD52344: module_init (io_base_file_select.c:455)
==16714==    by 0x1AD51DFA: mca_io_base_file_select
(io_base_file_select.c:238)
==16714==    by 0x1ACA582F: ompi_file_open (file.c:130)
==16714==    by 0x1AD30DA3: PMPI_File_open (pfile_open.c:94)
==16714==    by 0x13F9B36F:
PAIO::ouvreFichierMPIIO(PAGroupeProcessus&, std::string const&, int,
ompi_file_t*&, bool) (PAIO.cc:290)
==16714==    by 0xCA44252:
GISLectureEcriture<double>::litGISMPI(std::string,
GroupeInfoSur<double>&, std::string&) (GISLectureEcriture.icc:411)
==16714==    by 0xCA23F0D: Champ::importeParallele(std::string const&)
(Champ.cc:951)
==16714==    by 0x4D0DEE: main (Test.NormesEtProjectionChamp.cc:789)
...
==16714== Invalid read of size 2
==16714==    at 0x4C2E79E: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:915)
==16714==    by 0x2543FADC: vader_prepare_src (btl_vader_module.c:590)
==16714==    by 0x25AB17AA: mca_bml_base_prepare_src (bml.h:341)
==16714==    by 0x25AB4207: mca_pml_ob1_send_request_start_prepare
(pml_ob1_sendreq.c:620)
==16714==    by 0x25AA3519: mca_pml_ob1_send_request_start_btl
(pml_ob1_sendreq.h:397)
==16714==    by 0x25AA3766: mca_pml_ob1_send_request_start_seq
(pml_ob1_sendreq.h:460)
==16714==    by 0x25AA41E1: mca_pml_ob1_isend (pml_ob1_isend.c:171)
==16714==    by 0x26BA2AF5: ompi_coll_tuned_bcast_intra_generic
(coll_tuned_bcast.c:112)
==16714==    by 0x26BA36F7: ompi_coll_tuned_bcast_intra_binomial
(coll_tuned_bcast.c:385)
==16714==    by 0x26B94289: ompi_coll_tuned_bcast_intra_dec_fixed
(coll_tuned_decision_fixed.c:258)
==16714==    by 0x1ACD55F2: PMPI_Bcast (pbcast.c:110)
==16714==    by 0x2FE1CBE5: ADIOI_Shfp_fname (shfp_fname.c:63)
==16714==    by 0x2FDEB493: mca_io_romio_dist_MPI_File_open (open.c:177)
==16714==    by 0x2FDE3B0D: mca_io_romio_file_open
(io_romio_file_open.c:40)
==16714==    by 0x1AD52344: module_init (io_base_file_select.c:455)
==16714==    by 0x1AD51DFA: mca_io_base_file_select
(io_base_file_select.c:238)
==16714==    by 0x1ACA582F: ompi_file_open (file.c:130)
==16714==    by 0x1AD30DA3: PMPI_File_open (pfile_open.c:94)
==16714==    by 0x13F9B36F:
PAIO::ouvreFichierMPIIO(PAGroupeProcessus&, std::string const&, int,
ompi_file_t*&, bool) (PAIO.cc:290)
==16714==    by 0xCA44252:
GISLectureEcriture<double>::litGISMPI(std::string,
GroupeInfoSur<double>&, std::string&) (GISLectureEcriture.icc:411)
==16714==    by 0xCA23F0D: Champ::importeParallele(std::string const&)
(Champ.cc:951)
==16714==    by 0x4D0DEE: main (Test.NormesEtProjectionChamp.cc:789)
==16714==  Address 0x219377d0 is 0 bytes after a block of size 256
alloc'd
==16714==    at 0x4C2C5A4: malloc (vg_replace_malloc.c:296)
==16714==    by 0x2FE1C78E: ADIOI_Malloc_fn (malloc.c:50)
==16714==    by 0x2FE1C951: ADIOI_Shfp_fname (shfp_fname.c:25)
==16714==    by 0x2FDEB493: mca_io_romio_dist_MPI_File_open (open.c:177)
==16714==    by 0x2FDE3B0D: mca_io_romio_file_open
(io_romio_file_open.c:40)
==16714==    by 0x1AD52344: module_init (io_base_file_select.c:455)
==16714==    by 0x1AD51DFA: mca_io_base_file_select
(io_base_file_select.c:238)
==16714==    by 0x1ACA582F: ompi_file_open (file.c:130)
==16714==    by 0x1AD30DA3: PMPI_File_open (pfile_open.c:94)
==16714==    by 0x13F9B36F:
PAIO::ouvreFichierMPIIO(PAGroupeProcessus&, std::string const&, int,
ompi_file_t*&, bool) (PAIO.cc:290)
==16714==    by 0xCA44252:
GISLectureEcriture<double>::litGISMPI(std::string,
GroupeInfoSur<double>&, std::string&) (GISLectureEcriture.icc:411)
==16714==    by 0xCA23F0D: Champ::importeParallele(std::string const&)
(Champ.cc:951)
==16714==    by 0x4D0DEE: main (Test.NormesEtProjectionChamp.cc:789)
...
==16714== Invalid read of size 2
==16714==    at 0x4C2E790: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:915)
==16714==    by 0x2543FADC: vader_prepare_src (btl_vader_module.c:590)
==16714==    by 0x25AB17AA: mca_bml_base_prepare_src (bml.h:341)
==16714==    by 0x25AB4207: mca_pml_ob1_send_request_start_prepare
(pml_ob1_sendreq.c:620)
==16714==    by 0x25AA3519: mca_pml_ob1_send_request_start_btl
(pml_ob1_sendreq.h:397)
==16714==    by 0x25AA3766: mca_pml_ob1_send_request_start_seq
(pml_ob1_sendreq.h:460)
==16714==    by 0x25AA41E1: mca_pml_ob1_isend (pml_ob1_isend.c:171)
==16714==    by 0x26BA2AF5: ompi_coll_tuned_bcast_intra_generic
(coll_tuned_bcast.c:112)
==16714==    by 0x26BA36F7: ompi_coll_tuned_bcast_intra_binomial
(coll_tuned_bcast.c:385)
==16714==    by 0x26B94289: ompi_coll_tuned_bcast_intra_dec_fixed
(coll_tuned_decision_fixed.c:258)
==16714==    by 0x1ACD55F2: PMPI_Bcast (pbcast.c:110)
==16714==    by 0x2FE1CBE5: ADIOI_Shfp_fname (shfp_fname.c:63)
==16714==    by 0x2FDEB493: mca_io_romio_dist_MPI_File_open (open.c:177)
==16714==    by 0x2FDE3B0D: mca_io_romio_file_open
(io_romio_file_open.c:40)
==16714==    by 0x1AD52344: module_init (io_base_file_select.c:455)
==16714==    by 0x1AD51DFA: mca_io_base_file_select
(io_base_file_select.c:238)
==16714==    by 0x1ACA582F: ompi_file_open (file.c:130)
==16714==    by 0x1AD30DA3: PMPI_File_open (pfile_open.c:94)
==16714==    by 0x13F9B36F:
PAIO::ouvreFichierMPIIO(PAGroupeProcessus&, std::string const&, int,
ompi_file_t*&, bool) (PAIO.cc:290)
==16714==    by 0xCA44252:
GISLectureEcriture<double>::litGISMPI(std::string,
GroupeInfoSur<double>&, std::string&) (GISLectureEcriture.icc:411)
==16714==    by 0xCA23F0D: Champ::importeParallele(std::string const&)
(Champ.cc:951)
==16714==    by 0x4D0DEE: main (Test.NormesEtProjectionChamp.cc:789)
==16714==  Address 0x219377d2 is 2 bytes after a block of size 256
alloc'd
==16714==    at 0x4C2C5A4: malloc (vg_replace_malloc.c:296)
==16714==    by 0x2FE1C78E: ADIOI_Malloc_fn (malloc.c:50)
==16714==    by 0x2FE1C951: ADIOI_Shfp_fname (shfp_fname.c:25)
==16714==    by 0x2FDEB493: mca_io_romio_dist_MPI_File_open (open.c:177)
==16714==    by 0x2FDE3B0D: mca_io_romio_file_open
(io_romio_file_open.c:40)
==16714==    by 0x1AD52344: module_init (io_base_file_select.c:455)
==16714==    by 0x1AD51DFA: mca_io_base_file_select
(io_base_file_select.c:238)
==16714==    by 0x1ACA582F: ompi_file_open (file.c:130)
==16714==    by 0x1AD30DA3: PMPI_File_open (pfile_open.c:94)
==16714==    by 0x13F9B36F:
PAIO::ouvreFichierMPIIO(PAGroupeProcessus&, std::string const&, int,
ompi_file_t*&, bool) (PAIO.cc:290)
==16714==    by 0xCA44252:
GISLectureEcriture<double>::litGISMPI(std::string,
GroupeInfoSur<double>&, std::string&) (GISLectureEcriture.icc:411)
==16714==    by 0xCA23F0D: Champ::importeParallele(std::string const&)
(Champ.cc:951)
==16714==    by 0x4D0DEE: main (Test.NormesEtProjectionChamp.cc:789)
...
==16714== Invalid read of size 1
==16714==    at 0x4C2E7B8: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:915)
==16714==    by 0x2543FADC: vader_prepare_src (btl_vader_module.c:590)
==16714==    by 0x25AB17AA: mca_bml_base_prepare_src (bml.h:341)
==16714==    by 0x25AB4207: mca_pml_ob1_send_request_start_prepare
(pml_ob1_sendreq.c:620)
==16714==    by 0x25AA3519: mca_pml_ob1_send_request_start_btl
(pml_ob1_sendreq.h:397)
==16714==    by 0x25AA3766: mca_pml_ob1_send_request_start_seq
(pml_ob1_sendreq.h:460)
==16714==    by 0x25AA41E1: mca_pml_ob1_isend (pml_ob1_isend.c:171)
==16714==    by 0x26BA2AF5: ompi_coll_tuned_bcast_intra_generic
(coll_tuned_bcast.c:112)
==16714==    by 0x26BA36F7: ompi_coll_tuned_bcast_intra_binomial
(coll_tuned_bcast.c:385)
==16714==    by 0x26B94289: ompi_coll_tuned_bcast_intra_dec_fixed
(coll_tuned_decision_fixed.c:258)
==16714==    by 0x1ACD55F2: PMPI_Bcast (pbcast.c:110)
==16714==    by 0x2FE1CBE5: ADIOI_Shfp_fname (shfp_fname.c:63)
==16714==    by 0x2FDEB493: mca_io_romio_dist_MPI_File_open (open.c:177)
==16714==    by 0x2FDE3B0D: mca_io_romio_file_open
(io_romio_file_open.c:40)
==16714==    by 0x1AD52344: module_init (io_base_file_select.c:455)
==16714==    by 0x1AD51DFA: mca_io_base_file_select
(io_base_file_select.c:238)
==16714==    by 0x1ACA582F: ompi_file_open (file.c:130)
==16714==    by 0x1AD30DA3: PMPI_File_open (pfile_open.c:94)
==16714==    by 0x13F9B36F:
PAIO::ouvreFichierMPIIO(PAGroupeProcessus&, std::string const&, int,
ompi_file_t*&, bool) (PAIO.cc:290)
==16714==    by 0xCA44252:
GISLectureEcriture<double>::litGISMPI(std::string,
GroupeInfoSur<double>&, std::string&) (GISLectureEcriture.icc:411)
==16714==    by 0xCA23F0D: Champ::importeParallele(std::string const&)
(Champ.cc:951)
==16714==    by 0x4D0DEE: main (Test.NormesEtProjectionChamp.cc:789)
==16714==  Address 0x219377e0 is 16 bytes after a block of size 256
alloc'd
==16714==    at 0x4C2C5A4: malloc (vg_replace_malloc.c:296)
==16714==    by 0x2FE1C78E: ADIOI_Malloc_fn (malloc.c:50)
==16714==    by 0x2FE1C951: ADIOI_Shfp_fname (shfp_fname.c:25)
==16714==    by 0x2FDEB493: mca_io_romio_dist_MPI_File_open (open.c:177)
==16714==    by 0x2FDE3B0D: mca_io_romio_file_open
(io_romio_file_open.c:40)
==16714==    by 0x1AD52344: module_init (io_base_file_select.c:455)
==16714==    by 0x1AD51DFA: mca_io_base_file_select
(io_base_file_select.c:238)
==16714==    by 0x1ACA582F: ompi_file_open (file.c:130)
==16714==    by 0x1AD30DA3: PMPI_File_open (pfile_open.c:94)
==16714==    by 0x13F9B36F:
PAIO::ouvreFichierMPIIO(PAGroupeProcessus&, std::string const&, int,
ompi_file_t*&, bool) (PAIO.cc:290)
==16714==    by 0xCA44252:
GISLectureEcriture<double>::litGISMPI(std::string,
GroupeInfoSur<double>&, std::string&) (GISLectureEcriture.icc:411)
==16714==    by 0xCA23F0D: Champ::importeParallele(std::string const&)
(Champ.cc:951)
==16714==    by 0x4D0DEE: main (Test.NormesEtProjectionChamp.cc:789)
...


I should point out that with MPICH 3.1.3, I cannot reproduce this bad
behavior.

Also, the segfault is not always there: running the same code with other
inputs gave me trouble-free results, with or without valgrind. I noticed
that the problem appears more frequently with longer "paths".

Please, help!

Thanks,

Eric

ompi_info -all :
http://www.giref.ulaval.ca/~ericc/ompi_bug/ompi_info.all.184rc3.txt.gz
config.log:
http://www.giref.ulaval.ca/~ericc/ompi_bug/config.184rc3.log.gz

_______________________________________________
users mailing list
us...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post:
http://www.open-mpi.org/community/lists/users/2014/12/25983.php
Link to this post:
http://www.open-mpi.org/community/lists/users/2014/12/25986.php
