On Fri, 2020-06-05 at 19:52 -0400, Stephen Siegel via users wrote:
> Sure, I’ll ask the machine admins to update and let you know how it
> goes.
> In the meantime, I was just wondering if someone has run this little
> program with an up-to-date Open MPI and whether it worked.  If so, then
> I will know the problem is with our setup.

I don't know what version of Open MPI corresponds to
spectrum-mpi-10.3.1.2-20200121 on the ORNL Summit machine, but this
test passes with that implementation.

==rob

> Thanks
> -Steve
> 
> 
> > On Jun 5, 2020, at 7:45 PM, Jeff Squyres (jsquyres) <
> > jsquy...@cisco.com> wrote:
> > 
> > You cited Open MPI v2.1.1.  That's a pretty ancient version of Open
> > MPI.
> > 
> > Any chance you can upgrade to Open MPI 4.0.x?
> > 
> > 
> > 
> > > On Jun 5, 2020, at 7:24 PM, Stephen Siegel <sie...@udel.edu>
> > > wrote:
> > > 
> > > 
> > > 
> > > > On Jun 5, 2020, at 6:55 PM, Jeff Squyres (jsquyres) <
> > > > jsquy...@cisco.com> wrote:
> > > > 
> > > > On Jun 5, 2020, at 6:35 PM, Stephen Siegel via users <
> > > > users@lists.open-mpi.org> wrote:
> > > > > [ilyich:12946] 3 more processes have sent help message
> > > > > help-mpi-btl-base.txt / btl:no-nics
> > > > > [ilyich:12946] Set MCA parameter "orte_base_help_aggregate"
> > > > > to 0 to see all help / error messages
> > > > 
> > > > It looks like your output somehow doesn't include the actual
> > > > error message.
> > > 
> > > You’re right; on this first machine I did not include all of the
> > > output.  It is:
> > > 
> > > siegel@ilyich:~/372/code/mpi/io$ mpiexec -n 4 ./a.out
> > > --------------------------------------------------------------------------
> > > [[171,1],0]: A high-performance Open MPI point-to-point messaging
> > > module
> > > was unable to find any relevant network interfaces:
> > > 
> > > Module: OpenFabrics (openib)
> > > Host: ilyich
> > > 
> > > Another transport will be used instead, although this may result in
> > > lower performance.
> > > 
> > > NOTE: You can disable this warning by setting the MCA parameter
> > > btl_base_warn_component_unused to 0.
> > > --------------------------------------------------------------------------
> > > 
> > > So, I’ll ask my people to look into how they configured this.
> > > 
> > > However, on the second machine, which uses SLURM, it consistently
> > > hangs on this example, although many other MPI I/O examples work
> > > fine.
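> > > 
> > > To give a sense of the kind of test I mean, the general shape is a
> > > tiny MPI I/O program like the sketch below.  This is only an
> > > illustration (the file name and the particular write call are
> > > placeholders), not the exact code that hangs:
> > > 
> > > #include <mpi.h>
> > > #include <stdio.h>
> > > 
> > > /* illustrative MPI I/O test -- not the exact program in question */
> > > int main(int argc, char *argv[]) {
> > >     int rank, val;
> > >     MPI_File fh;
> > > 
> > >     MPI_Init(&argc, &argv);
> > >     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
> > >     val = rank;
> > >     /* each rank writes one int at its own offset, collectively */
> > >     MPI_File_open(MPI_COMM_WORLD, "testfile",
> > >                   MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL,
> > >                   &fh);
> > >     MPI_File_write_at_all(fh, (MPI_Offset)(rank * sizeof(int)), &val,
> > >                           1, MPI_INT, MPI_STATUS_IGNORE);
> > >     MPI_File_close(&fh);
> > >     if (rank == 0) printf("done\n");
> > >     MPI_Finalize();
> > >     return 0;
> > > }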
> > > 
> > > -Steve
> > > 
> > > 
> > > 
> > > 
> > > > That error message was sent to stderr, so you may not have
> > > > captured it if you only did "mpirun ... > foo.txt".  The actual
> > > > error message template is this:
> > > > 
> > > > -----
> > > > %s: A high-performance Open MPI point-to-point messaging module
> > > > was unable to find any relevant network interfaces:
> > > > 
> > > > Module: %s
> > > > Host: %s
> > > > 
> > > > Another transport will be used instead, although this may result in
> > > > lower performance.
> > > > 
> > > > NOTE: You can disable this warning by setting the MCA parameter
> > > > btl_base_warn_component_unused to 0.
> > > > -----
> > > > 
> > > > This is not actually an error -- just a warning.  It typically
> > > > means that your Open MPI has support for HPC-class networking and
> > > > saw some evidence of such networks on the nodes where your job
> > > > ran, but for some reason it ultimately didn't use any of those
> > > > interfaces and fell back to TCP.
> > > > 
> > > > I.e., your program ran correctly, but it may have run slower
> > > > than it could have if it were able to use HPC-class networks.
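> > > > 
> > > > FWIW, if you want to silence the warning or skip openib entirely,
> > > > you can set the relevant MCA parameters right on the command line;
> > > > something like:
> > > > 
> > > >   # quiet the "component unused" warning
> > > >   mpirun --mca btl_base_warn_component_unused 0 -n 4 ./a.out
> > > > 
> > > >   # or exclude the openib BTL altogether
> > > >   mpirun --mca btl ^openib -n 4 ./a.out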
> > > > 
> > > > -- 
> > > > Jeff Squyres
> > > > jsquy...@cisco.com
> > 
> > -- 
> > Jeff Squyres
> > jsquy...@cisco.com
> > 
