Hi Tohiko

If you compiled Open MPI on a computer with IB hardware, 
then copied the installation tree to another machine, 
or if you installed from an RPM or other package built on a
machine with IB, your Open MPI will have IB support enabled, I think, even if the 
machine where it is running does not have IB.
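
One way to see which case applies might be to check the build and the hardware
separately. The sketch below is only an illustration: the helper names
build_has_openib and machine_has_ib are made up here, and it assumes that
ompi_info and lspci are in your PATH and that ompi_info lists the component on
a line containing "btl: openib" (other Open MPI versions may print it
differently).

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of Open MPI.

# Reads `ompi_info` output on stdin; succeeds if an openib BTL is listed.
build_has_openib() {
    grep -q 'btl: openib'
}

# Reads `lspci` output on stdin; succeeds if an InfiniBand device is listed.
machine_has_ib() {
    grep -qi 'infiniband'
}

# A missing or failing command yields empty output, which counts as "not found".
ompi_out=$(ompi_info 2>/dev/null || true)
pci_out=$(lspci 2>/dev/null || true)

if printf '%s\n' "$ompi_out" | build_has_openib; then
    echo "Open MPI build: openib BTL present"
else
    echo "Open MPI build: openib BTL not listed"
fi

if printf '%s\n' "$pci_out" | machine_has_ib; then
    echo "Hardware: InfiniBand device found"
else
    echo "Hardware: no InfiniBand device found"
fi
```

If the build lists openib but the machine has no IB device, the warning is
expected, and restricting the run to tcp,sm,self as in the mpiexec line quoted
further down should avoid it.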

This is a matter of taste, but here is what I think,
regarding a previous question you sent.
I would rather compile Open MPI from source, on the machine[s] where it will
run, and install it with the same path on all machines [or in a single 
NFS-shared directory], 
to make things simpler. 
I would use the most homogeneous set of machines possible, to avoid too many 
headaches.
I.e. use the lowest common denominator, so to speak.
Say, everything x86_64, all with Ethernet only [or all with IB + Ethernet, but 
you don't seem to have IB, at least not on all machines].
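
To check the "same path on all machines" part, something along these lines
might help. It is a rough sketch under several assumptions: the hostfile is the
one named "hosts" from your mpirun command, with the hostname in the first
field of each line; passwordless ssh works to every host; and paths_agree is a
made-up helper name. Note that a non-interactive ssh session may see a
different PATH than your login shell does.

```shell
#!/bin/sh
# Sketch only: paths_agree is a hypothetical helper, not an Open MPI tool.

# Reads "host path" lines on stdin; succeeds only if every host reported
# a path and all reported paths are identical.
paths_agree() {
    awk '
        NF < 2 { missing = 1 }      # host with no mpirun found
        { seen[$2] = 1 }
        END {
            n = 0
            for (p in seen) n++
            exit (n == 1 && !missing) ? 0 : 1
        }'
}

if [ -r hosts ]; then
    # First field of each non-comment, non-empty line is taken as the hostname.
    report=$(awk '!/^#/ && NF { print $1 }' hosts | sort -u |
        while read -r h; do
            # -n keeps ssh from consuming the remaining host list on stdin;
            # BatchMode makes ssh fail instead of prompting for a password.
            printf '%s %s\n' "$h" \
                "$(ssh -n -o BatchMode=yes "$h" 'command -v mpirun' 2>/dev/null)"
        done)
    printf '%s\n' "$report"
    if printf '%s\n' "$report" | paths_agree; then
        echo "mpirun resolves to the same path on all hosts"
    else
        echo "mpirun path differs or is missing on some hosts"
    fi
fi
```

A matching path does not prove the builds are identical, but a mismatch is a
quick hint that the installations differ.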

I hope this helps,
Gus Correa

On Feb 15, 2012, at 1:27 AM, Tohiko Looka wrote:

> Mm... This is really strange
> I don't have that service and there is no ib* output in 'ifconfig -a' or 
> 'InfiniBand' in 'lspci'
> Which makes me believe that I don't have such a network. I also checked on an 
> identical computer on the same network with the same results.
> 
> What's strange is that these messages didn't use to show up and they don't 
> show up on that identical computer; only on mine, even though both computers 
> have the same hardware and Open MPI version, and are on the same network.
> 
> I guess I can safely ignore these warnings and run on Ethernet, but it would 
> be nice to know what happened there, in case anybody has an idea.
> 
> Thank you,
> 
> On Wed, Feb 15, 2012 at 12:52 AM, Gustavo Correa <g...@ldeo.columbia.edu> 
> wrote:
> Hi Tohiko
> 
> OpenFabrics network a.k.a. Infiniband a.k.a. IB.
> To check if the compute nodes have IB interfaces, try:
> 
> lspci [and search the output for InfiniBand]
> 
> To see if the IB interface is configured try:
> 
> ifconfig -a  [and search the output for ib0, ib1, or similar]
> 
> To check if the OFED module is up try:
> 
> 'service openibd status'
> 
> 
> As an alternative, you could also try to run your program over Ethernet, 
> avoiding InfiniBand,
> in case you don't have IB or if somehow it is broken.
> It is slower than InfiniBand, though.
> 
> Try something like this:
> 
> mpiexec -mca btl tcp,sm,self -np 4 ./my_mpi_program
> 
> I hope this helps,
> Gus Correa
> 
> On Feb 14, 2012, at 4:02 PM, Tohiko Looka wrote:
> 
> > Sorry for the noob question, but how do I check my network type and whether 
> > the OFED service is running correctly or not? And how do I run it?
> >
> > Thank you,
> >
> > On Tue, Feb 14, 2012 at 2:14 PM, Jeff Squyres <jsquy...@cisco.com> wrote:
> > Do you have an OpenFabrics-based network?  (e.g., InfiniBand or iWarp)
> >
> > If so, this error message usually means that OFED is either installed 
> > incorrectly, or is not running properly (e.g., its services didn't get 
> > started properly upon boot).
> >
> > If you don't have an OpenFabrics-based network, then it usually means that 
> > you have OpenFabrics services running when you really shouldn't (because 
> > you don't have any OpenFabrics-based devices).
> >
> >
> > On Feb 14, 2012, at 4:48 AM, Tohiko Looka wrote:
> >
> > > Greetings,
> > >
> > > Until today I was running my openmpi applications with no errors/warnings
> > > Today I restarted my computer (possibly after an automatic openmpi 
> > > update) and got these warnings when
> > > running my program
> > > [tohiko@kw12614 1d]$ mpirun -x LD_LIBRARY_PATH -hostfile hosts -np 10 
> > > hello
> > > librdmacm: couldn't read ABI version.
> > > librdmacm: assuming: 4
> > > CMA: unable to get RDMA device list
> > > --------------------------------------------------------------------------
> > > [[21652,1],0]: A high-performance Open MPI point-to-point messaging module
> > > was unable to find any relevant network interfaces:
> > >
> > > Module: OpenFabrics (openib)
> > >   Host: kw12614
> > >
> > > Another transport will be used instead, although this may result in
> > > lower performance.
> > > --------------------------------------------------------------------------
> > > [kw12614:03195] 10 more processes have sent help message 
> > > help-mpi-btl-base.txt / btl:no-nics
> > > [kw12614:03195] Set MCA parameter "orte_base_help_aggregate" to 0 to see 
> > > all help / error messages
> > >
> > >
> > > Is this normal? And how come it happened now?
> > > -- Tohiko
> > > _______________________________________________
> > > users mailing list
> > > us...@open-mpi.org
> > > http://www.open-mpi.org/mailman/listinfo.cgi/users
> >
> >
> > --
> > Jeff Squyres
> > jsquy...@cisco.com
> > For corporate legal information go to:
> > http://www.cisco.com/web/about/doing_business/legal/cri/
> >
> >
> 
> 

