On Mon, Aug 11, 2014 at 10:41 AM, Rob Latham <r...@mcs.anl.gov> wrote:
>
> On 08/11/2014 08:54 AM, George Bosilca wrote:
>
>> The patch related to ticket #4597 is zapping only the datatypes where
>> the user explicitly provided a zero count.
>>
>> We can argue about LB and UB, but I have a hard time understanding the
>> rationale of allowing zero count only for LB and UB. If it is required
>> by the standard we can easily support it (the line in the patch has to
>> move a little down in the code).
>>
>
> ROMIO's type flattening code is primitive: the zero-length blocks for UB
> and LB were the only way to encode the extent of the type, without calling
> back into the MPI implementation's type-inquiry routines.
>
> *I* don't care how OpenMPI deals with UB and LB. It was *you* who
> suggested one might need to look a bit more closely at how OpenMPI type
> processing handles those markers:
>
> http://www.open-mpi.org/community/lists/users/2014/05/24325.php

I have absolutely no issue with this approach. I was basically trying to
figure out if the ticket was closed too early or not.

  George.

>
> ==rob
>
>> George.
>>
>> On Mon, Aug 11, 2014 at 9:44 AM, Rob Latham <r...@mcs.anl.gov> wrote:
>>
>> On 08/10/2014 07:32 PM, Mohamad Chaarawi wrote:
>>
>> Update:
>>
>> George suggested that I try with the 1.8.2 rc3 and that one resolves
>> the hindexed_block segfault that I was seeing with ompi. the I/O part
>> now works with ompio, but needs the patches from Rob in ROMIO to work
>> correctly.
>>
>> The 2nd issue with collective I/O where some processes participate
>> with 0 sized datatypes created with hindexed and hvector, is still
>> unresolved.
>>
>> I think this ticket was closed a bit too early:
>>
>> https://svn.open-mpi.org/trac/ompi/ticket/4597
>>
>> I don't know OpenMPI's type processing at all, but if it's like
>> ROMIO, you cannot simply zap blocks of zero length: some zero
>> length blocks indicate upper bound and lower bound.
>>
>> or maybe it's totally unrelated. There was a flurry of datatype
>> bugs reported against both MPICH and OpenMPI in May of this year and
>> I am sure I am confusing several issues.
>>
>> ==rob
>>
>> Thanks,
>> Mohamad
>>
>> On 8/6/2014 11:50 AM, Mohamad Chaarawi wrote:
>>
>> Hi all,
>>
>> I'm seeing some problems with derived datatype construction and I/O
>> with OpenMPI 1.8.1.
>>
>> I have replicated them in the attached program.
>> The first issue is that MPI_Type_create_hindexed_block() always
>> segfaults. Usage of this routine is commented out in the program. (I
>> have a separate email thread with George and Edgar about this).
>>
>> The other issue is a segfault in MPI_File_set_view, when I have ranks
>> > 0 creating the derived datatypes with count 0, and rank 0 creating a
>> derived datatype of count NUM_BLOCKS. If I use MPI_Type_contiguous to
>> create the 0 sized file and memory datatypes (instead of hindexed and
>> hvector) it works fine.
>>
>> To replicate, run the program with 2 or more procs:
>>
>> mpirun -np 2 ./hindexed_io mpi_test_file
>>
>> [jam:15566] *** Process received signal ***
>> [jam:15566] Signal: Segmentation fault (11)
>> [jam:15566] Signal code: Address not mapped (1)
>> [jam:15566] Failing at address: (nil)
>> [jam:15566] [ 0] [0xfcd440]
>> [jam:15566] [ 1] /scr/chaarawi/install/ompi/lib/libmpi.so.1(ADIOI_Flatten_datatype+0x17a)[0xc80f2a]
>> [jam:15566] [ 2] /scr/chaarawi/install/ompi/lib/libmpi.so.1(ADIO_Set_view+0x1c1)[0xc72a6d]
>> [jam:15566] [ 3] /scr/chaarawi/install/ompi/lib/libmpi.so.1(mca_io_romio_dist_MPI_File_set_view+0x69b)[0xc8d11b]
>> [jam:15566] [ 4] /scr/chaarawi/install/ompi/lib/libmpi.so.1(mca_io_romio_file_set_view+0x7c)[0xc4f7c5]
>> [jam:15566] [ 5] /scr/chaarawi/install/ompi/lib/libmpi.so.1(PMPI_File_set_view+0x1e6)[0xb32f7e]
>> [jam:15566] [ 6] ./hindexed_io[0x8048aa6]
>> [jam:15566] [ 7] /lib/libc.so.6(__libc_start_main+0xdc)[0x7d5ebc]
>> [jam:15566] [ 8] ./hindexed_io[0x80487e1]
>> [jam:15566] *** End of error message ***
>>
>> If I use --mca io ompio with 2 or more procs, the program segfaults in
>> write_at_all (regardless of what routine is used to construct a 0
>> sized datatype):
>>
>> [jam:15687] *** Process received signal ***
>> [jam:15687] Signal: Floating point exception (8)
>> [jam:15687] Signal code: Integer divide-by-zero (1)
>> [jam:15687] Failing at address: 0x3e29b7
>> [jam:15687] [ 0] [0xe56440]
>> [jam:15687] [ 1] /scr/chaarawi/install/ompi/lib/libmpi.so.1(ompi_io_ompio_set_explicit_offset+0x9d)[0x3513bc]
>> [jam:15687] [ 2] /scr/chaarawi/install/ompi/lib/libmpi.so.1(ompio_io_ompio_file_write_at_all+0x3e)[0x35869a]
>> [jam:15687] [ 3] /scr/chaarawi/install/ompi/lib/libmpi.so.1(mca_io_ompio_file_write_at_all+0x66)[0x358650]
>> [jam:15687] [ 4] /scr/chaarawi/install/ompi/lib/libmpi.so.1(MPI_File_write_at_all+0x1b3)[0x1f46f3]
>> [jam:15687] [ 5] ./hindexed_io[0x8048b07]
>> [jam:15687] [ 6] /lib/libc.so.6(__libc_start_main+0xdc)[0x7d5ebc]
>> [jam:15687] [ 7] ./hindexed_io[0x80487e1]
>> [jam:15687] *** End of error message ***
>>
>> If I use mpich 3.1.2, I don't see those issues.
>>
>> Thanks,
>> Mohamad
>>
>> _______________________________________________
>> users mailing list
>> us...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>> Link to this post: http://www.open-mpi.org/community/lists/users/2014/08/24931.php
>>
>> --
>> Rob Latham
>> Mathematics and Computer Science Division
>> Argonne National Lab, IL USA
>
> --
> Rob Latham
> Mathematics and Computer Science Division
> Argonne National Lab, IL USA
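
Rob's point about zero-length blocks can be illustrated with a toy model. The sketch below is not ROMIO's or Open MPI's actual flattening code; the (offset, length) list and the helper names `extent` and `zap_zero_length` are assumptions made up for this illustration. It only shows that if zero-length blocks are how a flattened type records its LB and UB, then dropping every zero-length block also shrinks the extent computed from that list.

```python
# Toy model only: NOT the data structure used by ROMIO or Open MPI.
# A flattened type is modeled as a list of (offset, length) byte blocks.
# Zero-length blocks stand in for the LB and UB markers.

def extent(blocks):
    """Span from the lowest block start to the highest block end."""
    lo = min(off for off, _ in blocks)
    hi = max(off + length for off, length in blocks)
    return hi - lo

def zap_zero_length(blocks):
    """Drop every zero-length block, including any LB/UB markers."""
    return [(off, length) for off, length in blocks if length > 0]

# 8 bytes of data at offset 16, inside a type whose bounds are 0 and 64.
flattened = [(0, 0), (16, 8), (64, 0)]

full_extent = extent(flattened)                      # markers kept
zapped_extent = extent(zap_zero_length(flattened))   # markers removed
print(full_extent, zapped_extent)
```

In this model the extent drops from 64 bytes to 8 once the markers are removed, which is the kind of loss Rob describes. Whether Open MPI's datatype engine depends on such markers is exactly the open question in the thread.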