On Wed, Aug 25, 2010 at 12:14 PM, Jeff Squyres wrote:
> It would simplify testing if you could get all the eth0's to be of one type
> and on the same subnet, and the same for eth1.
>
> Once you do that, try using just one of the networks by telling OMPI to use
> only one of the devices, something like this:
>
> mpirun --mca btl_tcp_if_include eth0 ...

Thanks Jeff! Just tried the exact test that you suggested.

[rpnabar
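For reference, a hedged sketch of the full command line this kind of test amounts to (process count, host file and IMB binary path are placeholders; the second form assumes each node has only lo, eth0 and eth1, so excluding is equivalent to including, and lo is listed because setting an exclude list overrides the default):

    # restrict Open MPI's TCP traffic to the 10GigE port only
    mpirun -np 256 --hostfile ./hosts --mca btl tcp,sm,self \
           --mca btl_tcp_if_include eth0 ./IMB-MPI1 bcast

    # same thing from the other direction: exclude loopback and the 1GigE port
    mpirun -np 256 --hostfile ./hosts --mca btl tcp,sm,self \
           --mca btl_tcp_if_exclude lo,eth1 ./IMB-MPI1 bcast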
On Wed, Aug 25, 2010 at 6:41 AM, John Hearns wrote:
> You could sort that out with udev rules on each machine.
Sure. I'd always wanted consistent names for the eth interfaces when I
set up the cluster but I couldn't get udev to co-operate. Maybe this
time! Let me try.
> Look in the directory /et
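For what it's worth, a minimal sketch of the kind of rule John is pointing at (the MAC addresses are placeholders, one such file is needed per node, and the file name follows the usual persistent-net convention):

    # /etc/udev/rules.d/70-persistent-net.rules
    # 10GigE port
    SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="00:1b:21:aa:bb:cc", NAME="eth0"
    # 1GigE port
    SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="00:1e:68:11:22:33", NAME="eth1"

With MAC-based rules like these the names stay stable across reboots regardless of driver probe order.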
On Thu, Aug 19, 2010 at 9:03 PM, Rahul Nabar wrote:
> --
> gather:
> NP256 hangs
> NP128 hangs
> NP64 hangs
> NP32 OK
>
> Note: "gather" always hangs at the following line of the test:
> #bytes #repetitions
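The NP sweep above maps naturally onto IMB's -npmin option, so one job can reproduce the whole ladder (host file and binary path are placeholders; -npmin makes IMB run the benchmark at npmin, 2*npmin, ... up to the full -np):

    mpirun -np 256 --hostfile ./hosts ./IMB-MPI1 -npmin 32 gather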
On 24 August 2010 18:58, Rahul Nabar wrote:
> There are a few unusual things about the cluster. We are using a
> 10GigE ethernet fabric. Each node has dual eth adapters. One 1GigE and
> the other 10GigE. These are on separate subnets although the order of
> the eth interfaces is variable. i.e. 10G
On Tue, Aug 24, 2010 at 4:58 PM, Jeff Squyres wrote:
> Are all the eth0's on one subnet and all the eth2's on a different subnet?
>
> Or are all eth0's and eth2's all on the same subnet?
Thanks Jeff! Different subnets. All 10GigE's are on 192.168.x.x and
all 1GigE's are on 10.0.x.x
e.g.
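A hedged sketch of how one might double-check, node by node, which interface name actually carries which fabric (hostnames are placeholders; ethtool may need root on some systems):

    for h in node001 node002; do
        echo "== $h =="
        ssh $h 'ip -4 -o addr show; ethtool eth0 | grep Speed; ethtool eth1 | grep Speed'
    done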
On Mon, Aug 23, 2010 at 9:43 PM, Richard Treumann wrote:
> Bugs are always a possibility but unless there is something very unusual
> about the cluster and interconnect or this is an unstable version of MPI, it
> seems very unlikely this use of MPI_Bcast with so few tasks and only a 1/2
> MB messa

My MPI version is 1.4.1. This isn't the latest but still fairly
recent. So I assume th
On Mon, Aug 23, 2010 at 8:39 PM, Randolph Pullen
wrote:
>
> I have had a similar load related problem with Bcast.
Thanks Randolph! That's interesting to know! What was the hardware you
were using? Does your bcast fail at the exact same point too?
>
> I don't know what caused it though. With thi
On Mon, Aug 23, 2010 at 6:39 PM, Richard Treumann wrote:
> It is hard to imagine how a total data load of 41,943,040 bytes could be a
> problem. That is really not much data. By the time the BCAST is done, each
> task (except root) will have received a single half meg message from one
> sender. Th
Network saturation could produce arbitrarily long delays, but the total data load
we are talking about is really small. It is the responsibility of an MPI
library to do one of the following:
1) Use a reliable message protocol for each message (e.g. Infiniband RC or
TCP/IP)
2) detect lost packets and
On Sun, Aug 22, 2010 at 9:57 PM, Randolph Pullen <
randolph_pul...@yahoo.com.au> wrote:
> It's a long shot but could it be related to the total data volume?
> i.e. 524288 * 80 = 41943040 bytes active in the cluster
>
> Can you exceed this 41943040 data volume with a smaller message repeated
> more
Subject: [OMPI users] IMB-MPI broadcast test stalls for large core counts:
debug ideas?
To: "Open MPI Users"
Received: Friday, 20 August, 2010, 12:03 PM
My Intel IMB-MPI tests stall, but only in very specific cases: larger
packet sizes + large core counts. It only happens for the bcast, gather and
exchange tests, and only for the larger core counts (~256 cores). Other
tests like pingpong and sendrecv run fine even with larger core
counts.
e.g. This bcast tes
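On the debug-ideas front, one hedged suggestion is to rerun with Open MPI's BTL verbosity turned up so it logs which BTL components it selects and its TCP connection attempts (the level is arbitrary; host file and binary path are placeholders):

    mpirun -np 256 --hostfile ./hosts --mca btl_base_verbose 30 ./IMB-MPI1 bcast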