Re: [OMPI users] dynamic rules

2010-01-16 Thread Roman Martonak
> Terribly sorry, I should have checked my own notes thoroughly before
> giving others advice. One needs to give the dynamic rules file location
> on the command line:
>
> mpirun -mca coll_tuned_use_dynamic_rules 1 -mca coll_tuned_dynamic_rules_filename /home/.openmpi/dynamic_rules_file
>
> That works fine
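For reference, a minimal sketch of what the file named by coll_tuned_dynamic_rules_filename could contain is shown below. The layout follows the coll_tuned dynamic-rules format; the collective ID of 3 for alltoall and the "0 1 0 0" default rule are assumptions to be checked against your OpenMPI version, while the 8192/pairwise rule is the one quoted further down this page.

    1            # number of collectives configured in this file
    3            # collective ID (assumed: 3 = alltoall in coll_tuned)
    1            # number of communicator-size sections that follow
    64           # this section applies to communicators of 64 ranks and up
    2            # number of message-size rules in this section
    0 1 0 0      # from 0 bytes: algorithm 1 (basic linear), no fanout/segmentation
    8192 2 0 0   # from 8k: algorithm 2 (pairwise), no topo or segmentation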

Re: [OMPI users] dynamic rules

2010-01-15 Thread Roman Martonak
> I have done this according to a suggestion on this list, until a fix comes
> that makes it possible to change this via the command line:
>
> To choose bruck for all message sizes / MPI sizes with openmpi-1.4
>
> File $HOME/.openmpi/mca-params.conf (replace /homeX) so it points to
> the correct file:
> coll_tu
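A sketch of the two lines such an mca-params.conf would typically contain, reusing the parameter names and file path already given in the reply above; treat the exact path as an example rather than the poster's actual configuration:

    # enable dynamic selection rules for the tuned collectives
    coll_tuned_use_dynamic_rules = 1
    # point to the rules file that selects bruck for alltoall
    coll_tuned_dynamic_rules_filename = /home/.openmpi/dynamic_rules_file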

[OMPI users] dynamic rules

2010-01-15 Thread Roman Martonak
.local:08011] coll:base:comm_select: component available: tuned, priority: 30

Is there now a way to use other alltoall algorithms instead of the basic linear
algorithm in openmpi-1.4.x? Thanks in advance for any suggestion. Best regards
Roman Martonak
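One way to see which tuned alltoall algorithms and parameters a given 1.4.x installation exposes is to query the tuned component with ompi_info, for example as below; the exact option spelling, and whether dynamic rules must be enabled on the ompi_info command line for the algorithm parameters to show up, are assumptions that may differ between releases:

    ompi_info --mca coll_tuned_use_dynamic_rules 1 --param coll tuned | grep alltoall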

[OMPI users] mpi trace visualization

2009-05-30 Thread Roman Martonak
I would like to profile the MPI code using the VampirTrace integrated in
openmpi-1.3.2. In order to visualize the trace files, apart from the commercial
Vampir, is there some free viewer for the OTF files? Thanks Roman
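As a sketch of how the integrated VampirTrace is typically driven, assuming the -vt wrapper compilers that an OpenMPI 1.3.x build with VampirTrace installs next to the ordinary wrappers (names may vary with the build):

    mpicc-vt -O2 -o app app.c    # compile with VampirTrace instrumentation
    mpirun -np 64 ./app          # run as usual; OTF trace files are written on exit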

Re: [OMPI users] scaling problem with openmpi

2009-05-25 Thread Roman Martonak
I tried the settings suggested by Peter and they indeed help to improve things
much more. Running on 64 cores with the line (in dyn_rules)

8192 2 0 0 # 8k+, pairwise 2, no topo or segmentation

I get the following:
bw for 100 x 10 B : 1.9 Mbytes/s   time was: 65.4 ms
bw for 100 x 20 B

Re: [OMPI users] scaling problem with openmpi

2009-05-20 Thread Roman Martonak
I tried to run with the first dynamic rules file that Pavel proposed and it
works: the time per one MD step on 48 cores decreased from 2.8 s to 1.8 s, as
expected. It was clearly the basic linear algorithm that was causing the
problem. I will check the performance of bruck and pairwise on my HW. It

Re: [OMPI users] scaling problem with openmpi

2009-05-20 Thread Roman Martonak
values (to what value)? Best regards Roman

On Wed, May 20, 2009 at 10:39 AM, Peter Kjellstrom wrote:
> On Tuesday 19 May 2009, Peter Kjellstrom wrote:
>> On Tuesday 19 May 2009, Roman Martonak wrote:
>> > On Tue, May 19, 2009 at 3:29 PM, Peter Kjellstrom wrote:
>>

Re: [OMPI users] scaling problem with openmpi

2009-05-19 Thread Roman Martonak
On Tue, May 19, 2009 at 3:29 PM, Peter Kjellstrom wrote:
> On Tuesday 19 May 2009, Roman Martonak wrote:
> ...
>> openmpi-1.3.2          time per one MD step is 3.66 s
>>   ELAPSED TIME :   0 HOURS  1 MINUTES 25.90 SECONDS
>>  = ALL TO ALL COMM

Re: [OMPI users] scaling problem with openmpi

2009-05-19 Thread Roman Martonak
I am using CPMD 3.11.1, not cp2k. Below are the timings for 20 steps of MD for
32 water molecules (one of the standard CPMD benchmarks) with openmpi, mvapich
and Intel MPI, running on 64 cores (8 blades, each with 2 quad-core 2.2 GHz AMD
Barcelona CPUs). openmpi-1.3.2 time per

Re: [OMPI users] scaling problem with openmpi

2009-05-18 Thread Roman Martonak
I've been using --mca mpi_paffinity_alone 1 in all simulations. Concerning
"-mca mpi_leave_pinned 1", I tried it with openmpi 1.2.X versions and it made
no difference. Best regards Roman

On Mon, May 18, 2009 at 4:57 PM, Pavel Shamis (Pasha) wrote:
>
>> 1) I was told to add "-mca mpi_leave_
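For clarity, a hedged example of how the two options discussed here would be combined on one mpirun line; the process count and the executable name are placeholders, not taken from this thread:

    mpirun -np 64 --mca mpi_paffinity_alone 1 --mca mpi_leave_pinned 1 ./cpmd.x input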

Re: [OMPI users] scaling problem with openmpi

2009-05-16 Thread Roman Martonak
or four times bigger),
>> and then go up to a large number of processors also.
>> With a larger problem size the scaling may be better too
>> (but the runtimes will grow as well).
>>
>> Finally, since you are using Infiniband, I wonder if all the
>> nodes connect to each other with the same latency, or if some
>> p

[OMPI users] scaling problem with openmpi

2009-05-15 Thread Roman Martonak
Hello, I observe very poor scaling with openmpi on an HP blade system
consisting of 8 blades (each having 2 quad-core AMD Barcelona 2.2 GHz CPUs)
and interconnected with an Infiniband fabric. When running the standard cpmd
32 waters test, I observe the following scaling (the numbers are elapsed time) op