Well, the way you describe it, it sounds to me like it could be an atomics 
issue with this compiler version. What was your Open MPI configure line, and 
what network interconnect are you using?

An easy way to test this theory would be to force Open MPI to use the TCP 
interfaces (everything will be slower, however). You can do that by creating a 
directory called .openmpi in your home directory, and adding there a file 
called mca-params.conf.

The file should look something like this:

btl = tcp,self
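
For example, from a shell (just a sketch, assuming the default per-user MCA 
config location described above):

  mkdir -p ~/.openmpi
  echo "btl = tcp,self" >> ~/.openmpi/mca-params.conf

Alternatively, if you launch with mpirun rather than srun, you should be able 
to set the same thing for a single run with "mpirun --mca btl tcp,self ...", 
without touching any config file.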



Thanks
Edgar



> -----Original Message-----
> From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of Ryan
> Novosielski
> Sent: Wednesday, February 20, 2019 12:02 PM
> To: Open MPI Users <users@lists.open-mpi.org>
> Subject: Re: [OMPI users] HDF5 1.10.4 "make check" problems w/OpenMPI
> 3.1.3
> 
> Does it make any sense that it seems to work fine when OpenMPI and HDF5
> are built with GCC 7.4 and GCC 8.2, but /not/ when they are built with
> RHEL-supplied GCC 4.8.5? That appears to be the scenario. For the GCC 4.8.5
> build, I did try an XFS filesystem and it didn’t help. GPFS works fine for
> either of the 7.4 and 8.2 builds.
> 
> Just as a reminder, since it was reasonably far back in the thread, what I’m
> doing is running the “make check” tests in HDF5 1.10.4, in part because users
> use it, but also because it seems to have a good test suite and I can
> therefore verify the compiler and MPI stack installs. I get very little
> information, apart from it not working and getting that “Alarm clock” message.
> 
> I originally suspected I’d somehow built some component of this with a
> host-specific optimization that wasn’t working on some compute nodes. But I
> controlled for that and it didn’t seem to make any difference.
> 
> --
> ____
> || \\UTGERS,     |---------------------------*O*---------------------------
> ||_// the State  |         Ryan Novosielski - novos...@rutgers.edu
> || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus
> ||  \\    of NJ  | Office of Advanced Research Computing - MSB C630, Newark
>      `'
> 
> > On Feb 18, 2019, at 1:34 PM, Ryan Novosielski <novos...@rutgers.edu>
> wrote:
> >
> > It didn’t work any better with XFS, as it happens. Must be something else.
> I’m going to test some more and see if I can narrow it down any, as it seems
> to me that it did work with a different compiler.
> >
> > --
> > ____
> > || \\UTGERS,     |---------------------------*O*---------------------------
> > ||_// the State  |         Ryan Novosielski - novos...@rutgers.edu
> > || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus
> > ||  \\    of NJ  | Office of Advanced Research Computing - MSB C630, Newark
> >      `'
> >
> >> On Feb 18, 2019, at 12:23 PM, Gabriel, Edgar <egabr...@central.uh.edu>
> wrote:
> >>
> >> While I was working on something else, I let the tests run with Open MPI
> >> master (which is, for parallel I/O, equivalent to the upcoming v4.0.1
> >> release), and here is what I found for the HDF5 1.10.4 tests on my local
> >> desktop:
> >>
> >> In the testpar directory, there is in fact one test that fails for both
> >> ompio and romio321 in exactly the same manner.
> >> I used 6 processes as you did (although I used mpirun directly instead of
> >> srun...). Of the 13 tests in the testpar directory, 12 pass correctly
> >> (t_bigio, t_cache, t_cache_image, testphdf5, t_filters_parallel,
> >> t_init_term, t_mpi, t_pflush2, t_pread, t_prestart, t_pshutdown,
> >> t_shapesame).
> >>
> >> The one test that officially fails (t_pflush1) actually reports that it
> >> passed, but then throws a message indicating that MPI_Abort has been
> >> called, for both ompio and romio. I will try to investigate this test to
> >> see what is going on.
> >>
> >> That being said, your report shows an issue in t_mpi, which passes without
> >> problems for me. This was however not on GPFS; it was a local XFS file
> >> system. Running the tests on GPFS is on my todo list as well.
> >>
> >> Thanks
> >> Edgar
> >>
> >>
> >>
> >>> -----Original Message-----
> >>> From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of
> >>> Gabriel, Edgar
> >>> Sent: Sunday, February 17, 2019 10:34 AM
> >>> To: Open MPI Users <users@lists.open-mpi.org>
> >>> Subject: Re: [OMPI users] HDF5 1.10.4 "make check" problems
> >>> w/OpenMPI
> >>> 3.1.3
> >>>
> >>> I will also run our testsuite and the HDF5 testsuite on GPFS; I recently
> >>> got access to a GPFS file system, and will report back on that, but it
> >>> will take a few days.
> >>>
> >>> Thanks
> >>> Edgar
> >>>
> >>>> -----Original Message-----
> >>>> From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of
> >>>> Ryan Novosielski
> >>>> Sent: Sunday, February 17, 2019 2:37 AM
> >>>> To: users@lists.open-mpi.org
> >>>> Subject: Re: [OMPI users] HDF5 1.10.4 "make check" problems
> >>>> w/OpenMPI
> >>>> 3.1.3
> >>>>
> >>>> This is on GPFS. I'll try it on XFS to see if it makes any difference.
> >>>>
> >>>> On 2/16/19 11:57 PM, Gilles Gouaillardet wrote:
> >>>>> Ryan,
> >>>>>
> >>>>> What filesystem are you running on ?
> >>>>>
> >>>>> Open MPI defaults to the ompio component, except on Lustre
> >>>>> filesystems, where ROMIO is used. If the issue is related to ROMIO,
> >>>>> that could explain why you did not see any difference; in that case,
> >>>>> you might want to try another filesystem (a local filesystem or NFS,
> >>>>> for example).
> >>>>>
> >>>>>
> >>>>> Cheers,
> >>>>>
> >>>>> Gilles
> >>>>>
> >>>>> On Sun, Feb 17, 2019 at 3:08 AM Ryan Novosielski
> >>>>> <novos...@rutgers.edu> wrote:
> >>>>>>
> >>>>>> I verified that it makes it through to a bash prompt, but I’m a
> >>>>>> little less confident that nothing “make test” does clears it.
> >>>>>> Any recommendation for a way to verify?
> >>>>>>
> >>>>>> In any case, no change, unfortunately.
> >>>>>>
> >>>>>> Sent from my iPhone
> >>>>>>
> >>>>>>> On Feb 16, 2019, at 08:13, Gabriel, Edgar
> >>>>>>> <egabr...@central.uh.edu>
> >>>>>>> wrote:
> >>>>>>>
> >>>>>>> What file system are you running on?
> >>>>>>>
> >>>>>>> I will look into this, but it might be later next week. I just
> >>>>>>> wanted to emphasize that we are regularly running the parallel
> >>>>>>> hdf5 tests with ompio, and I am not aware of any outstanding
> >>>>>>> items that do not work (and are supposed to work). That being
> >>>>>>> said, I run the tests manually, and not the 'make test'
> >>>>>>> commands. Will have to check which tests are being run by that.
> >>>>>>>
> >>>>>>> Edgar
> >>>>>>>
> >>>>>>>> -----Original Message-----
> >>>>>>>> From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of Gilles Gouaillardet
> >>>>>>>> Sent: Saturday, February 16, 2019 1:49 AM
> >>>>>>>> To: Open MPI Users <users@lists.open-mpi.org>
> >>>>>>>> Subject: Re: [OMPI users] HDF5 1.10.4 "make check" problems w/OpenMPI 3.1.3
> >>>>>>>>
> >>>>>>>> Ryan,
> >>>>>>>>
> >>>>>>>> Can you
> >>>>>>>>
> >>>>>>>> export OMPI_MCA_io=^ompio
> >>>>>>>>
> >>>>>>>> and try again after you made sure this environment variable is
> >>>>>>>> passed by srun to the MPI tasks ?
> >>>>>>>>
> >>>>>>>> We have identified and fixed several issues specific to the
> >>>>>>>> (default) ompio component, so that could be a valid workaround
> >>>>>>>> until the next release.
> >>>>>>>>
> >>>>>>>> Cheers,
> >>>>>>>>
> >>>>>>>> Gilles
> >>>>>>>>
> >>>>>>>> Ryan Novosielski <novos...@rutgers.edu> wrote:
> >>>>>>>>> Hi there,
> >>>>>>>>>
> >>>>>>>>> Honestly don’t know which piece of this puzzle to look at or how
> >>>>>>>>> to get more information for troubleshooting. I successfully built
> >>>>>>>>> HDF5 1.10.4 with RHEL system GCC 4.8.5 and OpenMPI 3.1.3. Running
> >>>>>>>>> the “make check” in HDF5 is failing at the below point; I am using
> >>>>>>>>> a value of RUNPARALLEL='srun --mpi=pmi2 -p main -t 1:00:00 -n6 -N1'
> >>>>>>>>> and have a Slurm setup that’s otherwise properly configured.
> >>>>>>>>>
> >>>>>>>>> Thanks for any help you can provide.
> >>>>>>>>>
> >>>>>>>>> make[4]: Entering directory `/scratch/novosirj/install-files/hdf5-1.10.4-build-gcc-4.8-openmpi-3.1.3/testpar'
> >>>>>>>>> ============================
> >>>>>>>>> Testing  t_mpi
> >>>>>>>>> ============================
> >>>>>>>>> t_mpi  Test Log
> >>>>>>>>> ============================
> >>>>>>>>> srun: job 84126610 queued and waiting for resources
> >>>>>>>>> srun: job 84126610 has been allocated resources
> >>>>>>>>> srun: error: slepner023: tasks 0-5: Alarm clock
> >>>>>>>>> 0.01user 0.00system 20:03.95elapsed 0%CPU (0avgtext+0avgdata 5152maxresident)k
> >>>>>>>>> 0inputs+0outputs (0major+1529minor)pagefaults 0swaps
> >>>>>>>>> make[4]: *** [t_mpi.chkexe_] Error 1
> >>>>>>>>> make[4]: Leaving directory `/scratch/novosirj/install-files/hdf5-1.10.4-build-gcc-4.8-openmpi-3.1.3/testpar'
> >>>>>>>>> make[3]: *** [build-check-p] Error 1
> >>>>>>>>> make[3]: Leaving directory `/scratch/novosirj/install-files/hdf5-1.10.4-build-gcc-4.8-openmpi-3.1.3/testpar'
> >>>>>>>>> make[2]: *** [test] Error 2
> >>>>>>>>> make[2]: Leaving directory `/scratch/novosirj/install-files/hdf5-1.10.4-build-gcc-4.8-openmpi-3.1.3/testpar'
> >>>>>>>>> make[1]: *** [check-am] Error 2
> >>>>>>>>> make[1]: Leaving directory `/scratch/novosirj/install-files/hdf5-1.10.4-build-gcc-4.8-openmpi-3.1.3/testpar'
> >>>>>>>>> make: *** [check-recursive] Error 1
> >>>>>>>>>
> >>>>>>>>> --
> >>>>>>>>> ____
> >>>>>>>>> || \\UTGERS,     |---------------------------*O*---------------------------
> >>>>>>>>> ||_// the State  |         Ryan Novosielski - novos...@rutgers.edu
> >>>>>>>>> || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus
> >>>>>>>>> ||  \\    of NJ  | Office of Advanced Research Computing - MSB C630, Newark
> >>>>>>>>>      `'
> >>>>>
> >>>>
> >>>> --
> >>>> ____
> >>>> || \\UTGERS,     |----------------------*O*------------------------
> >>>> ||_// the State  |    Ryan Novosielski - novos...@rutgers.edu
> >>>> || \\ University | Sr. Technologist - 973/972.0922 ~*~ RBHS Campus
> >>>> ||  \\    of NJ  | Office of Advanced Res. Comp. - MSB C630, Newark
> >>>>     `'
> >

_______________________________________________
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users
