Hi Ryan,

Thanks for confirming that this is an issue with mixing
different transports.  I will follow up by trying this with the current
trunk.  If things work okay there, then the best approach might be
for you to move forward to the trunk, if possible.  It's at a fairly
stable point now in anticipation of branching for the upcoming 1.3
release.  However, if it's still an issue on the trunk, then I will
file a defect on it and we'll get it resolved for the 1.3 release.
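
In the meantime, forcing everything over TCP should keep you running.
As a rough sketch (reusing the placeholder hostnames and paths from
your own appfile), just drop openib from the btl lists:

  # cat appfile  -- TCP-only fallback until the mixed-transport hang is fixed
  -np 1 --hostfile machfile --host <ppc_host_1> --mca btl sm,self,tcp /path/to/ppc/openmpi-1.2.5/examples/hello
  -np 1 --hostfile machfile --host <ppc_host_2> --mca btl sm,self,tcp /path/to/ppc/openmpi-1.2.5/examples/hello
  -np 1 --hostfile machfile --host <x86_host> --mca btl sm,self,tcp /path/to/x86/openmpi-1.2.5/examples/hello

and launch it as before with "mpirun --app appfile".  If you want to
see which transports actually get selected, adding "--mca
btl_base_verbose 30" to each line should show the BTL components as
they initialize.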

--Brad


On Thu, May 15, 2008 at 3:30 PM, Ryan Buckley <rbuck...@mc.com> wrote:

> Hello Brad,
>
> I removed the openib specifier from the btl lists in the appfile and the
> application ran fine over ethernet.  And yes, to confirm, if I attempt
> to mix systems with IB and systems without IB, the application
> hangs.
>
> Thanks,
>
> Ryan
>
>
>
> Hello Ryan,
>
> I have been running a similar heterogeneous setup in my lab, i.e., a
> mix of ppc64 and x86_64 systems connected by ethernet and InfiniBand.
> In trying to replicate your problem, what I see is that it is not an
> issue of processor heterogeneity, but rather an issue with
> heterogeneous transports.  Can you remove the openib specifier from
> the btl lists in the appfile and try again, i.e., force all
> inter-system communication over ethernet?  For me, that works.  But
> if I mix systems with IB and systems without IB, I, too, see a
> hang...even if the processor architectures are the same.  If you
> could confirm that your case is the same, then we can make sure we're
> chasing only one problem and not two.
>
>
> Thanks,
> --Brad
>
>
> Brad Benton
> IBM
>
>
> On Thu, May 1, 2008 at 11:02 AM, Ryan Buckley <rbuckley_at_[hidden]>
> wrote:
>
>
> > Hello,
> >
> > I am trying to run a simple Hello World MPI application in a
> > heterogeneous environment.  The machines are one x86 machine with a
> > standard 1Gb ethernet connection and two ppc machines with standard
> > 1Gb ethernet as well as a 10Gb InfiniBand switch between the two.
> > The Hello World program is the same hello_c.c that is included in
> > the examples directory of the Open MPI installation.
> >
> > The goal is to run heterogeneous applications across the three
> > aforementioned machines in the following manner:
> >
> > The x86 machine will use tcp to communicate with the two ppc
> > machines, while the ppc machines will communicate with one another
> > via the 10Gb InfiniBand link.
> >
> > x86 <--tcp--> ppc_1
> > x86 <--tcp--> ppc_2
> > ppc_1 <--openib--> ppc_2
> >
> > I am currently using a machfile set up as follows:
> >
> > # cat machfile
> > <ppc_host_1>
> > <ppc_host_2>
> > <x86_host>
> >
> > In addition, I am using an appfile set up as follows:
> >
> > # cat appfile
> > -np 1 --hostfile machfile --host <ppc_host_1> --mca btl
> > sm,self,tcp,openib /path/to/ppc/openmpi-1.2.5/examples/hello
> > -np 1 --hostfile machfile --host <ppc_host_2> --mca btl
> > sm,self,tcp,openib /path/to/ppc/openmpi-1.2.5/examples/hello
> > -np 1 --hostfile machfile --host <x86_host> --mca btl
> > sm,self,tcp /path/to/x86/openmpi-1.2.5/examples/hello
> >
> > I am running on the command line via
> >
> > # mpirun --app appfile
> >
> > I've also attached the output from 'ompi_info --all' from all
> > machines.
> >
> > Any suggestions would be much appreciated.
> >
> > Thanks,
> >
> > Ryan
> >
> >