On Fri, Jun 17, 2016 at 07:03:29PM +0900, Gilles Gouaillardet wrote:
> Romio is imported from an mpich that is not up to date.
> Could you give the latest mpich a try ?
>
> That will be helpful to figure out whether this bug has already been fixed.
Just installed mpich-3.2 ... and the results remain unchanged vs. openmpi:

njoly> /local/mpich/3.2/bin/mpicc -g -O0 -o sample sample.c
njoly> /local/mpich/3.2/bin/mpirun -n 2 ./sample ufs:data.txt
rank0 ... 000000000022222222224444444444
rank1 ... 111111111133333333335555555555
njoly> /local/mpich/3.2/bin/mpirun -n 2 ./sample nfs:data.txt

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   PID 13961 RUNNING AT lanfeust.sis.pasteur.fr
=   EXIT CODE: 139
=   CLEANING UP REMAINING PROCESSES
=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Segmentation fault (signal 11)
This typically refers to a problem with your application.
Please see the FAQ page for debugging suggestions

njoly@lanfeust [tmp/mpinfs]> gdb sample sample.core
[...]
(gdb) bt
#0  0x00007adfc551979f in memcpy () from /usr/lib/libc.so.12
#1  0x00000000004ca8c9 in ADIOI_NFS_ReadStrided ()
#2  0x0000000000444956 in MPIOI_File_read ()
#3  0x0000000000444abe in PMPI_File_read ()
#4  0x0000000000403219 in main (argc=2, argv=0x7f7fffea4128) at sample.c:63

> Cheers,
>
> Gilles
>
> Nicolas Joly <nj...@pasteur.fr> wrote:
> >On Fri, Jun 17, 2016 at 10:15:28AM +0200, Vincent Huber wrote:
> >> Dear Mr. Joly,
> >>
> >> I have tried your code on my MacBook Pro (cf. infra for details) to
> >> detail that behavior.
> >
> >Thanks for testing.
> >
> >> Looking at openmpi-1.10.3/ompi/mca/io/romio/romio/adio/common/ad_fstype.c
> >> to get the list of file systems I can test, I have tried the following:
> >>
> >> mpirun -np 2 ./sample ufs:data.txt
> >
> >Works.
> >
> >> mpirun -np 2 ./sample nfs:data.txt
> >
> >Crashes with SIGSEGV in ADIOI_NFS_ReadStrided().
> >
> >I made a quick and dirty test by replacing ADIOI_NFS_ReadStrided() with
> >ADIOI_GEN_ReadStrided() in the ADIO_NFS_operations structure
> >(ad_nfs/ad_nfs.c) ... and this fixed the problem.
> >
> >> mpirun -np 2 ./sample pfs:data.txt
> >> mpirun -np 2 ./sample piofs:data.txt
> >> mpirun -np 2 ./sample panfs:data.txt
> >> mpirun -np 2 ./sample hfs:data.txt
> >> mpirun -np 2 ./sample xfs:data.txt
> >> mpirun -np 2 ./sample sfs:data.txt
> >> mpirun -np 2 ./sample pvfs:data.txt
> >> mpirun -np 2 ./sample zoidfs:data.txt
> >> mpirun -np 2 ./sample ftp:data.txt
> >> mpirun -np 2 ./sample lustre:data.txt
> >> mpirun -np 2 ./sample bgl:data.txt
> >> mpirun -np 2 ./sample bglockless:data.txt
> >
> >I don't have access to these filesystems ... The tool fails when trying
> >to open the file; that's the corresponding assert that fires.
> >
> >> mpirun -np 2 ./sample testfs:data.txt
> >
> >This one also crashes with SIGSEGV, but in ADIOI_Flatten().
> >
> >I also tried with ompio and it seems to work:
> >
> >mpirun --mca io ompio -np 2 ./sample data.txt
> >rank0 ... 000000000022222222224444444444
> >rank1 ... 111111111133333333335555555555
> >
> >> The only one not to crash is ufs.
> >> That is not the answer you are looking for, but my two cents?
> >
> >Thanks.
> >
> >> gcc --version
> >> Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr
> >> --with-gxx-include-dir=/usr/include/c++/4.2.1
> >> Apple LLVM version 7.0.0 (clang-700.0.72)
> >> Target: x86_64-apple-darwin15.5.0
> >> Thread model: posix
> >>
> >> and
> >>
> >> mpirun --version
> >> mpirun (Open MPI) 1.10.2
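For reference, the "quick and dirty" ad_nfs.c test mentioned above boils down
to swapping one entry in the NFS backend's function-pointer table. The
fragment below is only an abridged sketch from memory, not verbatim ROMIO
source: the table type name and the surrounding entries (and their order)
vary between versions, and only the strided-read slot matters here.

/* adio/ad_nfs/ad_nfs.c -- abridged, illustrative only */
struct ADIOI_Fns_struct ADIO_NFS_operations = {
    ADIOI_NFS_Open,
    /* ... other function-pointer entries left unchanged ... */
    /* ADIOI_NFS_ReadStrided,     original entry, crashes as shown above */
    ADIOI_GEN_ReadStrided,        /* generic fallback used for the test */
    /* ... remaining entries left unchanged ... */
};

With that swap the nfs: case goes through the generic strided-read path and
the sample completes, which suggests the problem is in the NFS-specific
ADIOI_NFS_ReadStrided() implementation rather than in the sample code.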
> >> 2016-06-14 17:42 GMT+02:00 Nicolas Joly <nj...@pasteur.fr>:
> >>
> >> > Hi,
> >> >
> >> > At work, I have some MPI codes that use custom datatypes to call
> >> > MPI_File_read with MPI_BOTTOM ... It mostly works, except when the
> >> > underlying filesystem is NFS, where it crashes with SIGSEGV.
> >> >
> >> > The attached sample (code + data) works just fine with 1.10.1 on my
> >> > NetBSD/amd64 workstation using the UFS romio backend, but crashes
> >> > when switched to NFS:
> >> >
> >> > njoly@issan [~]> mpirun --version
> >> > mpirun (Open MPI) 1.10.1
> >> > njoly@issan [~]> mpicc -g -Wall -o sample sample.c
> >> > njoly@issan [~]> mpirun -n 2 ./sample ufs:data.txt
> >> > rank1 ... 111111111133333333335555555555
> >> > rank0 ... 000000000022222222224444444444
> >> > njoly@issan [~]> mpirun -n 2 ./sample nfs:data.txt
> >> > [issan:20563] *** Process received signal ***
> >> > [issan:08879] *** Process received signal ***
> >> > [issan:20563] Signal: Segmentation fault (11)
> >> > [issan:20563] Signal code: Address not mapped (1)
> >> > [issan:20563] Failing at address: 0xffffffffb1309240
> >> > [issan:08879] Signal: Segmentation fault (11)
> >> > [issan:08879] Signal code: Address not mapped (1)
> >> > [issan:08879] Failing at address: 0xffffffff881b0420
> >> > [issan:08879] [ 0] [issan:20563] [ 0] 0x7dafb14a52b0
> >> > <__sigtramp_siginfo_2> at /usr/lib/libc.so.12
> >> > [issan:20563] *** End of error message ***
> >> > 0x78b9886a52b0 <__sigtramp_siginfo_2> at /usr/lib/libc.so.12
> >> > [issan:08879] *** End of error message ***
> >> > --------------------------------------------------------------------------
> >> > mpirun noticed that process rank 0 with PID 20563 on node issan exited
> >> > on signal 11 (Segmentation fault).
> >> > --------------------------------------------------------------------------
> >> > njoly@issan [~]> gdb sample sample.core
> >> > GNU gdb (GDB) 7.10.1
> >> > [...]
> >> > Core was generated by `sample'.
> >> > Program terminated with signal SIGSEGV, Segmentation fault.
> >> > #0  0x000078b98871971f in memcpy () from /usr/lib/libc.so.12
> >> > [Current thread is 1 (LWP 1)]
> >> > (gdb) bt
> >> > #0  0x000078b98871971f in memcpy () from /usr/lib/libc.so.12
> >> > #1  0x000078b974010edf in ADIOI_NFS_ReadStrided () from
> >> >    /usr/pkg/lib/openmpi/mca_io_romio.so
> >> > #2  0x000078b97400bacf in MPIOI_File_read () from
> >> >    /usr/pkg/lib/openmpi/mca_io_romio.so
> >> > #3  0x000078b97400bc72 in mca_io_romio_dist_MPI_File_read () from
> >> >    /usr/pkg/lib/openmpi/mca_io_romio.so
> >> > #4  0x000078b988e72b38 in PMPI_File_read () from /usr/pkg/lib/libmpi.so.12
> >> > #5  0x00000000004013a4 in main (argc=2, argv=0x7f7fff7b0f00) at sample.c:63
> >> >
> >> > Thanks.
> >> >
> >> > --
> >> > Nicolas Joly
> >> >
> >> > Cluster & Computing Group
> >> > Biology IT Center
> >> > Institut Pasteur, Paris.
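Since the attachment itself is not reproduced in the quotes above, here is a
minimal stand-alone analogue of the failing pattern: a non-contiguous file
view plus a memory datatype built from absolute addresses, so that
MPI_File_read() is called with MPI_BOTTOM. It is only a sketch, not the
actual sample.c; the file name, block sizes and layout below are assumed
from the rank output shown earlier.

/* sketch.c -- minimal analogue of the reported pattern, not the original
 * sample.c.  Assumed build/run:
 *   mpicc -g -Wall -o sketch sketch.c
 *   mpirun -n 2 ./sketch nfs:data.txt
 */
#include <stdio.h>
#include <mpi.h>

#define BLK  10   /* characters per block, matching the output above */
#define NBLK  3   /* blocks read per rank */

int main(int argc, char **argv)
{
    int rank, nprocs, lens[NBLK];
    char buf[NBLK * BLK + 1];
    MPI_Aint addrs[NBLK];
    MPI_Datatype filetype, memtype;
    MPI_File fh;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* File side: each rank sees every nprocs-th block of BLK characters. */
    MPI_Type_vector(NBLK, BLK, nprocs * BLK, MPI_CHAR, &filetype);
    MPI_Type_commit(&filetype);

    /* Memory side: absolute addresses into buf, for use with MPI_BOTTOM. */
    for (int i = 0; i < NBLK; i++) {
        MPI_Get_address(buf + i * BLK, &addrs[i]);
        lens[i] = BLK;
    }
    MPI_Type_create_hindexed(NBLK, lens, addrs, MPI_CHAR, &memtype);
    MPI_Type_commit(&memtype);

    MPI_File_open(MPI_COMM_WORLD, argv[1], MPI_MODE_RDONLY,
                  MPI_INFO_NULL, &fh);
    MPI_File_set_view(fh, (MPI_Offset)rank * BLK, MPI_CHAR, filetype,
                      "native", MPI_INFO_NULL);
    /* The reported crash happens inside this call, in
     * ADIOI_NFS_ReadStrided(), when the nfs: prefix selects the NFS
     * backend. */
    MPI_File_read(fh, MPI_BOTTOM, 1, memtype, &status);
    MPI_File_close(&fh);

    buf[NBLK * BLK] = '\0';
    printf("rank%d ... %s\n", rank, buf);

    MPI_Type_free(&memtype);
    MPI_Type_free(&filetype);
    MPI_Finalize();
    return 0;
}

Run with 2 ranks against a 60-character data.txt laid out as six blocks of
'0' through '5', the ufs: prefix should print the two interleaved block sets
as above, while the nfs: prefix goes through the NFS strided-read path and,
if the backtraces above are representative, should hit the same code.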
> >> --
> >> Research Engineer (PhD)
> >> CeMoSiS <http://www.cemosis.fr> - vincent.hu...@cemosis.fr
> >> Tel: +33 (0)3 68 85 02 06
> >> IRMA - 7, rue René Descartes
> >> 67 000 Strasbourg
> >
> >--
> >Nicolas Joly
> >
> >Cluster & Computing Group
> >Biology IT Center
> >Institut Pasteur, Paris.

--
Nicolas Joly

Cluster & Computing Group
Biology IT Center
Institut Pasteur, Paris.