Update:

George suggested that I try with the 1.8.2 rc3 and that one resolves the hindexed_block segfault that I was seeing with ompi. the I/O part now works with ompio, but needs the patches from Rob in ROMIO to work correctly.

The 2nd issue with collective I/O where some processes participate with 0 sized datatypes created with hindexed and hvector, is still unresolved.

Thanks,
Mohamad

On 8/6/2014 11:50 AM, Mohamad Chaarawi wrote:
Hi all,

I'm seeing some problems with dervided datatype construction and I/O with OpenMPI 1.8.1.

I have replicated them in the attached program.
The first issue is that MPI_Type_create_hindexed_block() always sefgaults. Usage of this routine is commented out in the program. (I have a separate email thread with George and Edgar about this).

The other issue is a segfault in MPI_File_set_view, when I have ranks > 0 creating the derived datatypes with count 0, and rank 0 creating a derived datatype of count NUM_BLOCKS. If I use MPI_Type_contiguous to create the 0 sized file and memory datatypes (instead of hindexed and hvector) it works fine.
To replicate, run the program with 2 or more procs:

mpirun -np 2 ./hindexed_io mpi_test_file

[jam:15566] *** Process received signal ***
[jam:15566] Signal: Segmentation fault (11)
[jam:15566] Signal code: Address not mapped (1)
[jam:15566] Failing at address: (nil)
[jam:15566] [ 0] [0xfcd440]
[jam:15566] [ 1] /scr/chaarawi/install/ompi/lib/libmpi.so.1(ADIOI_Flatten_datatype+0x17a)[0xc80f2a] [jam:15566] [ 2] /scr/chaarawi/install/ompi/lib/libmpi.so.1(ADIO_Set_view+0x1c1)[0xc72a6d] [jam:15566] [ 3] /scr/chaarawi/install/ompi/lib/libmpi.so.1(mca_io_romio_dist_MPI_File_set_view+0x69b)[0xc8d11b] [jam:15566] [ 4] /scr/chaarawi/install/ompi/lib/libmpi.so.1(mca_io_romio_file_set_view+0x7c)[0xc4f7c5] [jam:15566] [ 5] /scr/chaarawi/install/ompi/lib/libmpi.so.1(PMPI_File_set_view+0x1e6)[0xb32f7e]
[jam:15566] [ 6] ./hindexed_io[0x8048aa6]
[jam:15566] [ 7] /lib/libc.so.6(__libc_start_main+0xdc)[0x7d5ebc]
[jam:15566] [ 8] ./hindexed_io[0x80487e1]
[jam:15566] *** End of error message ***

If I use --mca io ompio with 2 or more procs, the program segfaults in write_at_all (regardless of what routine is used to construct a 0 sized datatype):

[jam:15687] *** Process received signal ***
[jam:15687] Signal: Floating point exception (8)
[jam:15687] Signal code: Integer divide-by-zero (1)
[jam:15687] Failing at address: 0x3e29b7
[jam:15687] [ 0] [0xe56440]
[jam:15687] [ 1] /scr/chaarawi/install/ompi/lib/libmpi.so.1(ompi_io_ompio_set_explicit_offset+0x9d)[0x3513bc] [jam:15687] [ 2] /scr/chaarawi/install/ompi/lib/libmpi.so.1(ompio_io_ompio_file_write_at_all+0x3e)[0x35869a] [jam:15687] [ 3] /scr/chaarawi/install/ompi/lib/libmpi.so.1(mca_io_ompio_file_write_at_all+0x66)[0x358650] [jam:15687] [ 4] /scr/chaarawi/install/ompi/lib/libmpi.so.1(MPI_File_write_at_all+0x1b3)[0x1f46f3]
[jam:15687] [ 5] ./hindexed_io[0x8048b07]
[jam:15687] [ 6] /lib/libc.so.6(__libc_start_main+0xdc)[0x7d5ebc]
[jam:15687] [ 7] ./hindexed_io[0x80487e1]
[jam:15687] *** End of error message ***

If I use mpich 3.1.2 , I don't see those issues.

Thanks,
Mohamad


_______________________________________________
users mailing list
us...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post: 
http://www.open-mpi.org/community/lists/users/2014/08/24931.php

Reply via email to