Re: [OMPI users] very bad parallel scaling of vasp using openmpi

2009-09-23 Thread Peter Kjellstrom
On Wednesday 23 September 2009, Rahul Nabar wrote:
> On Tue, Aug 18, 2009 at 5:28 PM, Gerry Creager wrote:
> > Most of that bandwidth is in marketing... Sorry, but it's not a high
> > performance switch.
>
> Well, how does one figure out what exactly is a "high performance
> switch"? IMHO 1G Eth...

Re: [OMPI users] very bad parallel scaling of vasp using openmpi

2009-09-23 Thread Joe Landman
Rahul Nabar wrote:
> On Tue, Aug 18, 2009 at 5:28 PM, Gerry Creager wrote:
> > Most of that bandwidth is in marketing... Sorry, but it's not a high
> > performance switch.
>
> Well, how does one figure out what exactly is a "high performance
> switch"? I've found this an exceedingly hard task. Like the OP posted...

Re: [OMPI users] very bad parallel scaling of vasp using openmpi

2009-09-23 Thread Rahul Nabar
On Tue, Aug 18, 2009 at 5:28 PM, Gerry Creager wrote:
> Most of that bandwidth is in marketing... Sorry, but it's not a high
> performance switch.

Well, how does one figure out what exactly is a "high performance switch"? I've found this an exceedingly hard task. Like the OP posted the Dell 6248...

Re: [OMPI users] very bad parallel scaling of vasp using openmpi

2009-08-24 Thread jimkress_58
... scaling problem). I do not believe VASP has done the same.

Jim

-----Original Message-----
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf Of Gus Correa
Sent: Tuesday, August 18, 2009 6:43 PM
To: Open MPI Users
Subject: Re: [OMPI users] very bad parallel scaling of...

Re: [OMPI users] very bad parallel scaling of vasp using openmpi

2009-08-18 Thread Gus Correa
Hi Craig, list

Independent of any issues with your GigE switch, which you may need to address, you may want to take a look at the performance of the default OpenMPI MPI_Alltoall algorithm, which you say is a cornerstone of VASP. You can perhaps try alternative algorithms for different message sizes...
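
A minimal sketch of how one might time MPI_Alltoall across a range of message sizes before trying alternative algorithms (illustrative only, not a program from this thread; the sizes and iteration count are arbitrary choices):

/* alltoall_bench.c -- time MPI_Alltoall over a range of per-peer message sizes.
 * Build: mpicc -O2 alltoall_bench.c -o alltoall_bench
 */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank, nprocs;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* Sweep per-peer message sizes from 1 KiB to 1 MiB. */
    for (size_t bytes = 1024; bytes <= (1 << 20); bytes *= 4) {
        char *sendbuf = malloc(bytes * nprocs);
        char *recvbuf = malloc(bytes * nprocs);
        if (!sendbuf || !recvbuf)
            MPI_Abort(MPI_COMM_WORLD, 1);

        const int iters = 20;
        MPI_Barrier(MPI_COMM_WORLD);
        double t0 = MPI_Wtime();
        for (int i = 0; i < iters; i++)
            MPI_Alltoall(sendbuf, (int)bytes, MPI_BYTE,
                         recvbuf, (int)bytes, MPI_BYTE, MPI_COMM_WORLD);
        double t = (MPI_Wtime() - t0) / iters;

        if (rank == 0)
            printf("%8zu bytes/peer: %.6f s per MPI_Alltoall\n", bytes, t);
        free(sendbuf);
        free(recvbuf);
    }
    MPI_Finalize();
    return 0;
}

Running it once per candidate algorithm in Open MPI's tuned collective component, e.g. mpirun --mca coll_tuned_use_dynamic_rules 1 --mca coll_tuned_alltoall_algorithm 2 -np 64 ./alltoall_bench, shows which choice holds up at VASP's message sizes.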

Re: [OMPI users] very bad parallel scaling of vasp using openmpi

2009-08-18 Thread Gerry Creager
Most of that bandwidth is in marketing... Sorry, but it's not a high performance switch.

Craig Plaisance wrote:
> The switch we are using (Dell Powerconnect 6248) has a switching fabric capacity of 184 Gb/s, which should be more than adequate for the 48 ports. Is this the same as backplane bandwidth?

Re: [OMPI users] very bad parallel scaling of vasp using openmpi

2009-08-18 Thread Patrick Geoffray
Craig Plaisance wrote:
> So is this a problem with the physical switch (we need a better switch) or with the configuration of the switch (we need to configure the switch or configure the OS to work with the switch)?

You may want to check whether you are dropping packets somewhere. You can look at the...
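
The preview cuts off before the suggested counters, but on Linux the per-interface drop and error counters live in /proc/net/dev (ifconfig and ethtool -S report the same and more; switch-side counters have to be read on the switch itself). A minimal sketch that dumps them, a hypothetical helper rather than anything from the thread:

/* drops.c -- print RX/TX drop and error counters from /proc/net/dev. */
#include <stdio.h>

int main(void)
{
    FILE *f = fopen("/proc/net/dev", "r");
    if (!f) { perror("/proc/net/dev"); return 1; }

    char line[512];
    /* Skip the two header lines. */
    fgets(line, sizeof line, f);
    fgets(line, sizeof line, f);

    while (fgets(line, sizeof line, f)) {
        char ifname[64];
        unsigned long long rx_bytes, rx_pkts, rx_errs, rx_drop;
        unsigned long long tx_bytes, tx_pkts, tx_errs, tx_drop;
        /* Per-line format: iface: rx bytes packets errs drop fifo frame
         * compressed multicast, then tx bytes packets errs drop ... */
        if (sscanf(line,
                   " %63[^:]: %llu %llu %llu %llu %*llu %*llu %*llu %*llu"
                   " %llu %llu %llu %llu",
                   ifname, &rx_bytes, &rx_pkts, &rx_errs, &rx_drop,
                   &tx_bytes, &tx_pkts, &tx_errs, &tx_drop) == 9)
            printf("%-8s rx_drop=%llu rx_errs=%llu tx_drop=%llu tx_errs=%llu\n",
                   ifname, rx_drop, rx_errs, tx_drop, tx_errs);
    }
    fclose(f);
    return 0;
}

Non-zero and growing drop counts on the compute nodes during a run would confirm loss on the host side; clean host counters point the finger at the switch.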

Re: [OMPI users] very bad parallel scaling of vasp using openmpi

2009-08-18 Thread Craig Plaisance
So is this a problem with the physical switch (we need a better switch) or with the configuration of the switch (we need to configure the switch or configure the OS to work with the switch)?

Re: [OMPI users] very bad parallel scaling of vasp using openmpi

2009-08-18 Thread Joe Landman
Craig Plaisance wrote:
> The switch we are using (Dell Powerconnect 6248) has a switching fabric capacity of 184 Gb/s, which should be more than adequate for the 48 ports. Is this the same as backplane bandwidth?

Yes. If you are getting the behavior you describe, you are not getting all that...
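
A back-of-the-envelope check on that spec (assuming all 48 ports are full-duplex gigabit): 48 ports x 1 Gb/s x 2 directions = 96 Gb/s, well under the advertised 184 Gb/s. So on paper the fabric is not the bottleneck; what the headline number does not capture is per-port buffer depth and packet loss under sustained many-to-many traffic, which is where this kind of throughput collapse usually shows up.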

Re: [OMPI users] very bad parallel scaling of vasp using openmpi

2009-08-18 Thread Craig Plaisance
The switch we are using (Dell Powerconnect 6248) has a switching fabric capacity of 184 Gb/s, which should be more than adequate for the 48 ports. Is this the same as backplane bandwidth?

Re: [OMPI users] very bad parallel scaling of vasp using openmpi

2009-08-18 Thread Joe Landman
Craig Plaisance wrote:
> ... mpich2 now and post the results. So, does anyone know what causes the wild oscillations in the throughput at larger message sizes and higher network traffic? Thanks!

Your switch can't handle this amount of traffic on its backplane. We have seen this often in similar...

Re: [OMPI users] very bad parallel scaling of vasp using openmpi

2009-08-18 Thread Craig Plaisance
I ran a TCP test using NetPIPE and got a throughput of 850 Mb/s at a message size of 128 KB; the latency was 50 us. At message sizes above 1000 KB, the throughput oscillated wildly between 850 Mb/s and values as low as 200 Mb/s. This test was done with no other network traffic. I then ran...
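
For what it's worth, the first two numbers are consistent with the usual latency/bandwidth model, throughput(S) ~ S / (L + S/B): with L = 50 us and B = 1 Gb/s, a 128 KB message costs about 50 us + 1049 us ~ 1.1 ms on the wire, i.e. roughly 950 Mb/s before TCP/IP and Ethernet framing overhead, so 850 Mb/s is close to line rate. The drops to 200 Mb/s at larger sizes are therefore hard to blame on latency; they look more like packet loss followed by TCP retransmission stalls.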

Re: [OMPI users] very bad parallel scaling of vasp using openmpi

2009-08-18 Thread Joe Landman
Craig Plaisance wrote:
> Hi - I have compiled VASP 4.6.34 using the Intel Fortran compiler 11.1 with Open MPI 1.3.3 on a cluster of 104 nodes running Rocks 5.2, each node having two quad-core Opterons and connected by gigabit Ethernet. Running in parallel on...

Latency of gigabit is likely your issue. Lower quality...

Re: [OMPI users] very bad parallel scaling of vasp using openmpi

2009-08-18 Thread jimkress_58
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf Of Jeff Squyres
Sent: Monday, August 17, 2009 9:24 PM
To: Open MPI Users
Cc: David Hibbitts
Subject: Re: [OMPI users] very bad parallel scaling of vasp using openmpi

You might want to run some performance testing of your...

Re: [OMPI users] very bad parallel scaling of vasp using openmpi

2009-08-17 Thread Jeff Squyres
You might want to run some performance testing of your TCP stacks and the switch -- use a non-MPI application such as NetPIPE (or others -- google around) and see what kind of throughput you get. Try it between individual server peers and then try running it simultaneously between a bunch of...
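
In the NetPIPE spirit, a crude standalone TCP streaming test fits in a few lines of C. This is a sketch assuming Linux/POSIX sockets; the port, 128 KiB buffer, and 1 GiB transfer volume are arbitrary choices, and tcp_stream is a hypothetical name, not a real tool:

/* tcp_stream.c -- crude one-way TCP throughput test (not NetPIPE).
 * Receiver: ./tcp_stream -s      Sender: ./tcp_stream <receiver-ip>
 */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

#define PORT  5201
#define BUFSZ (128 * 1024)            /* 128 KiB per write() */
#define TOTAL (1024LL * 1024 * 1024)  /* move 1 GiB in total */

static double now(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return tv.tv_sec + tv.tv_usec / 1e6;
}

int main(int argc, char **argv)
{
    char *buf = calloc(1, BUFSZ);
    struct sockaddr_in a = { .sin_family = AF_INET, .sin_port = htons(PORT) };

    if (argc > 1 && strcmp(argv[1], "-s") == 0) {        /* receiver */
        int ls = socket(AF_INET, SOCK_STREAM, 0), one = 1;
        setsockopt(ls, SOL_SOCKET, SO_REUSEADDR, &one, sizeof one);
        a.sin_addr.s_addr = htonl(INADDR_ANY);
        if (bind(ls, (struct sockaddr *)&a, sizeof a) < 0) {
            perror("bind"); return 1;
        }
        listen(ls, 1);
        int s = accept(ls, NULL, NULL);
        long long got = 0; ssize_t n;
        while ((n = read(s, buf, BUFSZ)) > 0)
            got += n;
        printf("received %lld bytes\n", got);
    } else if (argc > 1) {                               /* sender */
        int s = socket(AF_INET, SOCK_STREAM, 0);
        inet_pton(AF_INET, argv[1], &a.sin_addr);
        if (connect(s, (struct sockaddr *)&a, sizeof a) < 0) {
            perror("connect"); return 1;
        }
        double t0 = now();
        for (long long sent = 0; sent < TOTAL; sent += BUFSZ)
            if (write(s, buf, BUFSZ) < 0) { perror("write"); return 1; }
        close(s);  /* kernel buffers make the tail slightly optimistic */
        printf("%.0f Mb/s\n", TOTAL * 8 / (now() - t0) / 1e6);
    } else {
        fprintf(stderr, "usage: %s -s | %s <receiver-ip>\n", argv[0], argv[0]);
    }
    return 0;
}

Run the receiver on one node and the sender on another to get the per-pair baseline, then start several sender/receiver pairs through the switch at once and watch whether the per-pair number collapses.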

[OMPI users] very bad parallel scaling of vasp using openmpi

2009-08-17 Thread Craig Plaisance
Hi - I have compiled VASP 4.6.34 using the Intel Fortran compiler 11.1 with Open MPI 1.3.3 on a cluster of 104 nodes running Rocks 5.2, each node having two quad-core Opterons and connected by gigabit Ethernet. Running in parallel on one node (8 cores) runs very well, faster than any other cluster I have run it...
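
A number worth keeping in mind for this setup: the eight ranks on each node share one gigabit link, so in all-to-all phases each rank gets at most 1 Gb/s / 8 = 125 Mb/s (about 15 MB/s) of off-node bandwidth, on top of gigabit's roughly 50 us per-message latency. For a communication-heavy code like VASP that alone can wipe out the gain from adding nodes, before any switch pathology enters the picture.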