On Fri, Jun 17, 2016 at 07:03:29PM +0900, Gilles Gouaillardet wrote:
> ROMIO is imported from an outdated mpich.
> Could you give the latest mpich a try?
> 
> That will be helpful to figure out whether this bug has already been fixed.

Just installed mpich-3.2 ... and the results remain unchanged vs. openmpi:

njoly> /local/mpich/3.2/bin/mpicc -g -O0 -o sample sample.c
njoly> /local/mpich/3.2/bin/mpirun -n 2 ./sample ufs:data.txt
rank0 ... 000000000022222222224444444444
rank1 ... 111111111133333333335555555555
njoly> /local/mpich/3.2/bin/mpirun -n 2 ./sample nfs:data.txt

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   PID 13961 RUNNING AT lanfeust.sis.pasteur.fr
=   EXIT CODE: 139
=   CLEANING UP REMAINING PROCESSES
=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Segmentation fault (signal 11)
This typically refers to a problem with your application.
Please see the FAQ page for debugging suggestions

njoly@lanfeust [tmp/mpinfs]> gdb sample sample.core
[...]
(gdb) bt
#0  0x00007adfc551979f in memcpy () from /usr/lib/libc.so.12
#1  0x00000000004ca8c9 in ADIOI_NFS_ReadStrided ()
#2  0x0000000000444956 in MPIOI_File_read ()
#3  0x0000000000444abe in PMPI_File_read ()
#4  0x0000000000403219 in main (argc=2, argv=0x7f7fffea4128) at sample.c:63
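
For reference, the attached sample.c is not reproduced here; a hypothetical
minimal reproducer of the same pattern (a derived memory datatype read
through MPI_BOTTOM on a strided file view, which is what ends up in
ADIOI_NFS_ReadStrided()) could look like the sketch below. The datatype
and file layout are assumptions chosen to match the rank0/rank1 output
above, not the original code:

/*
 * Hypothetical minimal reproducer (NOT the original sample.c): each rank
 * reads three 10-byte blocks of the shared file into a derived memory
 * datatype addressed via MPI_BOTTOM, the code path that reaches
 * ADIOI_NFS_ReadStrided() when the file is opened with an nfs: prefix.
 */
#include <assert.h>
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, i, rc;
    char buf[3][11] = { { 0 } };        /* 3 blocks of 10 chars + NUL */
    int lens[3] = { 10, 10, 10 };
    MPI_Aint addrs[3];
    MPI_Datatype memtype, filetype;
    MPI_File fh;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    assert(argc == 2);

    /* Memory layout: absolute addresses of the 3 blocks, read via MPI_BOTTOM. */
    for (i = 0; i < 3; i++)
        MPI_Get_address(buf[i], &addrs[i]);
    MPI_Type_create_hindexed(3, lens, addrs, MPI_CHAR, &memtype);
    MPI_Type_commit(&memtype);

    /* File layout: every other 10-byte block, i.e. a non-contiguous view. */
    MPI_Type_vector(3, 10, 20, MPI_CHAR, &filetype);
    MPI_Type_commit(&filetype);

    rc = MPI_File_open(MPI_COMM_WORLD, argv[1], MPI_MODE_RDONLY,
                       MPI_INFO_NULL, &fh);
    assert(rc == MPI_SUCCESS);          /* fires for unsupported fs: prefixes */
    MPI_File_set_view(fh, (MPI_Offset)(rank * 10), MPI_CHAR, filetype,
                      "native", MPI_INFO_NULL);
    rc = MPI_File_read(fh, MPI_BOTTOM, 1, memtype, MPI_STATUS_IGNORE);
    assert(rc == MPI_SUCCESS);
    MPI_File_close(&fh);

    printf("rank%d ... %s%s%s\n", rank, buf[0], buf[1], buf[2]);

    MPI_Type_free(&memtype);
    MPI_Type_free(&filetype);
    MPI_Finalize();
    return 0;
}

With a 60-byte data.txt made of ten '0's, ten '1's, ... ten '5's, this
would produce the interleaved rank0/rank1 lines shown above.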

> Cheers,
> 
> Gilles
> 
> Nicolas Joly <nj...@pasteur.fr> wrote:
> >On Fri, Jun 17, 2016 at 10:15:28AM +0200, Vincent Huber wrote:
> >> Dear Mr. Joly,
> >> 
> >> 
> >> I have tried your code on my MacBook Pro (see below for details) to investigate
> >> that behavior.
> >
> >Thanks for testing.
> >
> >> Looking at openmpi-1.10.3/ompi/mca/io/romio/romio/adio/common/ad_fstype.c to
> >> get the list of file systems I can test, I have tried the following:
> >> 
> >> mpirun -np 2 ./sample ufs:data.txt
> >
> >Works.
> >
> >> mpirun -np 2 ./sample nfs:data.txt
> >
> >Crashes with SIGSEGV in ADIOI_NFS_ReadStrided().
> >
> >Made a quick and dirty test by replacing ADIOI_NFS_ReadStrided() with
> >ADIOI_GEN_ReadStrided() in the ADIO_NFS_operations structure
> >(ad_nfs/ad_nfs.c) ... and this fixed the problem.
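
A sketch of what that one-line change amounts to, assuming the stock layout
of the ROMIO function table in ad_nfs/ad_nfs.c (the exact context lines may
differ between ROMIO versions):

--- adio/ad_nfs/ad_nfs.c
+++ adio/ad_nfs/ad_nfs.c
@@ struct ADIOI_Fns_struct ADIO_NFS_operations = {
-    ADIOI_NFS_ReadStrided, /* ReadStrided */
+    ADIOI_GEN_ReadStrided, /* ReadStrided: generic fallback instead of the NFS-specific path */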
> >
> >> mpirun -np 2 ./sample pfs:data.txt
> >> mpirun -np 2 ./sample piofs:data.txt
> >> mpirun -np 2 ./sample panfs:data.txt
> >> mpirun -np 2 ./sample hfs:data.txt
> >> mpirun -np 2 ./sample xfs:data.txt
> >> mpirun -np 2 ./sample sfs:data.txt
> >> mpirun -np 2 ./sample pvfs:data.txt
> >> mpirun -np 2 ./sample zoidfs:data.txt
> >> mpirun -np 2 ./sample ftp:data.txt
> >> mpirun -np 2 ./sample lustre:data.txt
> >> mpirun -np 2 ./sample bgl:data.txt
> >> mpirun -np 2 ./sample bglockless:data.txt
> >
> >I don't have access to these filesystems ... The tool fails when
> >trying to open the file; it's the corresponding assert that fires.
> >
> >> mpirun -np 2 ./sample testfs:data.txt
> >
> >This one crashes with SIGSEGV, but in ADIOI_Flatten().
> >
> >I also tried with ompio and it seems to work.
> >
> >mpirun --mca io ompio -np 2 ./sample data.txt
> >rank0 ... 000000000022222222224444444444
> >rank1 ... 111111111133333333335555555555
> >
> >> The only one that does not crash is ufs.
> >> That is not the answer you are looking for, but here are my two cents.
> >
> >Thanks.
> >
> >>  gcc --version
> >> Configured with:
> >> --prefix=/Applications/Xcode.app/Contents/Developer/usr
> >> --with-gxx-include-dir=/usr/include/c++/4.2.1
> >> Apple LLVM version 7.0.0 (clang-700.0.72)
> >> Target: x86_64-apple-darwin15.5.0
> >> Thread model: posix
> >> 
> >> 
> >> and
> >> 
> >> 
> >> mpirun --version
> >> mpirun (Open MPI) 1.10.2
> >> 
> >> 
> >> ?
> >> 
> >> 
> >> 2016-06-14 17:42 GMT+02:00 Nicolas Joly <nj...@pasteur.fr>:
> >> 
> >> 
> >> >
> >> > Hi,
> >> >
> >> > At work, I have some MPI codes that use custom datatypes to
> >> > call MPI_File_read with MPI_BOTTOM ... It mostly works, except when
> >> > the underlying filesystem is NFS, where it crashes with SIGSEGV.
> >> >
> >> > The attached sample (code + data) works just fine with 1.10.1 on my
> >> > NetBSD/amd64 workstation using the UFS romio backend, but crashes when
> >> > switched to NFS:
> >> >
> >> > njoly@issan [~]> mpirun --version
> >> > mpirun (Open MPI) 1.10.1
> >> > njoly@issan [~]> mpicc -g -Wall -o sample sample.c
> >> > njoly@issan [~]> mpirun -n 2 ./sample ufs:data.txt
> >> > rank1 ... 111111111133333333335555555555
> >> > rank0 ... 000000000022222222224444444444
> >> > njoly@issan [~]> mpirun -n 2 ./sample nfs:data.txt
> >> > [issan:20563] *** Process received signal ***
> >> > [issan:08879] *** Process received signal ***
> >> > [issan:20563] Signal: Segmentation fault (11)
> >> > [issan:20563] Signal code: Address not mapped (1)
> >> > [issan:20563] Failing at address: 0xffffffffb1309240
> >> > [issan:08879] Signal: Segmentation fault (11)
> >> > [issan:08879] Signal code: Address not mapped (1)
> >> > [issan:08879] Failing at address: 0xffffffff881b0420
> >> > [issan:08879] [ 0] [issan:20563] [ 0] 0x7dafb14a52b0
> >> > <__sigtramp_siginfo_2> at /usr/lib/libc.so.12
> >> > [issan:20563] *** End of error message ***
> >> > 0x78b9886a52b0 <__sigtramp_siginfo_2> at /usr/lib/libc.so.12
> >> > [issan:08879] *** End of error message ***
> >> > --------------------------------------------------------------------------
> >> > mpirun noticed that process rank 0 with PID 20563 on node issan exited on
> >> > signal 11 (Segmentation fault).
> >> > --------------------------------------------------------------------------
> >> > njoly@issan [~]> gdb sample sample.core
> >> > GNU gdb (GDB) 7.10.1
> >> > [...]
> >> > Core was generated by `sample'.
> >> > Program terminated with signal SIGSEGV, Segmentation fault.
> >> > #0  0x000078b98871971f in memcpy () from /usr/lib/libc.so.12
> >> > [Current thread is 1 (LWP 1)]
> >> > (gdb) bt
> >> > #0  0x000078b98871971f in memcpy () from /usr/lib/libc.so.12
> >> > #1  0x000078b974010edf in ADIOI_NFS_ReadStrided () from
> >> > /usr/pkg/lib/openmpi/mca_io_romio.so
> >> > #2  0x000078b97400bacf in MPIOI_File_read () from
> >> > /usr/pkg/lib/openmpi/mca_io_romio.so
> >> > #3  0x000078b97400bc72 in mca_io_romio_dist_MPI_File_read () from
> >> > /usr/pkg/lib/openmpi/mca_io_romio.so
> >> > #4  0x000078b988e72b38 in PMPI_File_read () from 
> >> > /usr/pkg/lib/libmpi.so.12
> >> > #5  0x00000000004013a4 in main (argc=2, argv=0x7f7fff7b0f00) at 
> >> > sample.c:63
> >> >
> >> > Thanks.
> >> >
> >> > --
> >> > Nicolas Joly
> >> >
> >> > Cluster & Computing Group
> >> > Biology IT Center
> >> > Institut Pasteur, Paris.
> >> >
> >> 
> >> -- 
> >> Research Engineer, PhD
> >> CeMoSiS <http://www.cemosis.fr> - vincent.hu...@cemosis.fr
> >> Tel: +33 (0)3 68 85 02 06
> >> IRMA - 7, rue René Descartes
> >> 67 000 Strasbourg
> >
> >
> >-- 
> >Nicolas Joly
> >
> >Cluster & Computing Group
> >Biology IT Center
> >Institut Pasteur, Paris.
-- 
Nicolas Joly

Cluster & Computing Group
Biology IT Center
Institut Pasteur, Paris.
