On Wed, Aug 25, 2010 at 12:14 PM, Jeff Squyres wrote:
> It would simplify testing if you could get all the eth0's to be of one type
> and on the same subnet, and the same for eth1.
>
> Once you do that, try using just one of the networks by telling OMPI to use
> only one of the devices, something like this:
>
> mpirun --mca btl_tcp_if_include eth0 ...
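Spelled out, the two comparison runs Jeff suggests might look something like this (the interface names, process count, and benchmark binary are illustrative; substitute whatever your cluster actually uses):

```shell
# Restrict Open MPI's TCP BTL to one interface (here assumed to be the 10GigE):
mpirun --mca btl_tcp_if_include eth0 -np 256 ./IMB-MPI1 bcast

# ...then repeat the same benchmark over the other interface (assumed 1GigE):
mpirun --mca btl_tcp_if_include eth1 -np 256 ./IMB-MPI1 bcast
```

If the hang follows one interface, that points at that fabric (or its switch) rather than at Open MPI itself.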
Thanks Jeff! Just tried the exact test that you suggested.
On Wed, Aug 25, 2010 at 6:41 AM, John Hearns wrote:
> You could sort that out with udev rules on each machine.
Sure. I'd always wanted consistent names for the eth interfaces when I
set up the cluster but I couldn't get udev to co-operate. Maybe this
time! Let me try.
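For what it's worth, the standard approach on distros of that era is a persistent-net udev rule matched on each NIC's MAC address; the file path is the conventional one and the MAC addresses below are placeholders for your own:

```
# /etc/udev/rules.d/70-persistent-net.rules  (one copy per node)
# Pin each adapter to a fixed name by matching its MAC address.
# Replace the addresses with the real MACs of the 10GigE and 1GigE cards.
SUBSYSTEM=="net", ACTION=="add", ATTR{address}=="00:11:22:33:44:55", NAME="eth0"
SUBSYSTEM=="net", ACTION=="add", ATTR{address}=="00:11:22:33:44:66", NAME="eth1"
```

With consistent names in place on every node, the per-interface test above becomes meaningful cluster-wide.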
On Aug 24, 2010, at 6:26 PM, Rahul Nabar wrote:
>> Are all the eth0's on one subnet and all the eth2's on a different subnet?
>>
>> Or are all eth0's and eth2's all on the same subnet?
>
> Thanks Jeff! Different subnets. All 10GigE's are on 192.168.x.x and
> all 1GigE's are on 10.0.x.x
On Thu, Aug 19, 2010 at 9:03 PM, Rahul Nabar wrote:
> --
> gather:
> NP256 hangs
> NP128 hangs
> NP64 hangs
> NP32 OK
>
> Note: "gather" always hangs at the following line of the test:
>
On 24 August 2010 18:58, Rahul Nabar wrote:
> There are a few unusual things about the cluster. We are using a
> 10GigE ethernet fabric. Each node has dual eth adapters. One 1GigE and
> the other 10GigE. These are on separate subnets although the order of
> the eth interfaces is variable.
From: Rahul Nabar <rpna...@gmail.com>
Subject: Re: [OMPI users] IMB-MPI broadcast test stalls for large core counts: debug ideas?
To: "Open MPI Users" <us...@open-mpi.org>
Received: Wednesday, 25 August, 2010, 3:38 AM
On Mon, Aug 23, 2010 at 9:43 PM, Richard Treumann wrote:
> Bugs are always a possibility but unless there is something very unusual
> about the cluster and interconnect or this is an unstable version of MPI, it
> seems very unlikely this use of MPI_Bcast with so few tasks and
On Mon, Aug 23, 2010 at 8:39 PM, Randolph Pullen wrote:
>
> I have had a similar load related problem with Bcast.
Thanks Randolph! That's interesting to know! What was the hardware you
were using? Does your bcast fail at the exact same point too?
>
> I don't know
On Mon, Aug 23, 2010 at 6:39 PM, Richard Treumann wrote:
> It is hard to imagine how a total data load of 41,943,040 bytes could be a
> problem. That is really not much data. By the time the BCAST is done, each
> task (except root) will have received a single half meg message
Network saturation could produce arbitrarily long delays even though the total
data load we are talking about is really small. It is the responsibility of an MPI
library to do one of the following:
1) Use a reliable message protocol for each message (e.g. Infiniband RC or
TCP/IP)
2) detect lost packets and retransmit them
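Richard's figure is easy to sanity-check with back-of-the-envelope arithmetic (assuming, per his message, one half-megabyte message delivered to every non-root task):

```python
# Back-of-the-envelope check of the quoted Bcast data load.
msg_size = 512 * 1024    # "a single half meg message", in bytes
total_load = 41943040    # total byte count quoted in the thread

receivers = total_load // msg_size         # non-root tasks implied by the figures
assert receivers * msg_size == total_load  # the division is exact

print(receivers)  # -> 80
```

So the quoted load corresponds to about 80 receiving tasks at that message size, i.e. only tens of megabytes in total, which supports his point that raw data volume is not the problem.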
Subject: Re: [OMPI users] IMB-MPI broadcast test stalls for large core counts: debug ideas?
To: "Open MPI Users" <us...@open-mpi.org>
Received: Tuesday, 24 August, 2010, 9:39 AM

It is hard to imagine how a total data load of 41,943,040 bytes could be a problem.
From: Rahul Nabar <rpna...@gmail.com>
Subject: [OMPI users] IMB-MPI broadcast test stalls for large core counts: debug ideas?
To: "Open MPI Users" <us...@open-mpi.org>
Received: Friday, 20 August, 2010, 12:03 PM
My Intel IMB-MPI tests stall, but only in very specific cases: larger
packet sizes + large core counts. It only happens for the bcast, gather and
exchange tests, and only for the larger core counts (~256 cores). Other
tests like pingpong and sendrecv run fine even with larger core
counts.
e.g. This bcast