>Terribly sorry, I should have checked my own notes more thoroughly before
>giving others advice. One needs to give the dynamic rules file location
>on the command line:
>
>mpirun -mca coll_tuned_use_dynamic_rules 1 \
>       -mca coll_tuned_dynamic_rules_filename /home/.openmpi/dynamic_rules_file
>
>That works fine.
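For reference, a minimal sketch of what such a dynamic rules file might
contain, assuming the coll_tuned file format of this Open MPI series
(collective id 3 for alltoall and algorithm ids 1 = linear, 2 = pairwise
are taken from the coll_tuned component and should be verified against
the installed version):

1            # number of collectives with rules in this file
3            # collective id (assumed: 3 = alltoall)
1            # number of communicator-size blocks
64           # communicator size this block applies to
2            # number of message-size rules in this block
0    1 0 0   # from 0 bytes: basic linear (1), no topo, no segmentation
8192 2 0 0   # from 8 KiB:   pairwise (2), no topo, no segmentation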
>I have done this according to a suggestion on this list, until a fix comes
>that makes it possible to change it via the command line:
>
>To choose bruck for all message sizes / mpi sizes with openmpi-1.4
>
>File $HOME/.openmpi/mca-params.conf (replace /homeX so that it points to
>the correct file):
>coll_tuned_use_dynamic_rules = 1
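A plausible full version of that mca-params.conf is sketched below; the
rules filename and the forced algorithm id are assumptions for
illustration, not quoted from the original mail (3 is modified bruck in
coll_tuned's alltoall algorithm list):

coll_tuned_use_dynamic_rules = 1
# to force one algorithm (e.g. bruck) for all message sizes:
coll_tuned_alltoall_algorithm = 3
# or, instead, point at a per-message-size rules file (example path):
# coll_tuned_dynamic_rules_filename = /homeX/.openmpi/dynamic_rules_file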
.local:08011] coll:base:comm_select: component available:
tuned, priority: 30
Is there now a way to use other alltoall algorithms instead of the
basic linear algorithm in openmpi-1.4.x ?
Thanks in advance for any suggestion.
Best regards
Roman Martonak
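As a starting point (a sketch, not an answer from the thread), the
parameters that the 1.4-era tuned component exposes for alltoall can be
listed with ompi_info and then checked on the installation at hand:

ompi_info --param coll tuned | grep alltoall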
I would like to profile the MPI code using the VampirTrace integrated
in openmpi-1.3.2. In order to visualize the trace files, apart from the
commercial Vampir, is there some free viewer for the OTF files?
Thanks
Roman
I tried the settings suggested by Peter and it indeed helps to improve
performance even further. Running on 64 cores with the line (in dyn_rules)
8192 2 0 0 # 8k+, pairwise 2, no topo or segmentation
I get the following
bw for 100 x 10 B : 1.9 Mbytes/s time was: 65.4 ms
bw for 100 x 20 B
I tried to run with the first dynamic rules file that Pavel proposed and
it works; the time per one MD step on 48 cores decreased from 2.8 s to
1.8 s, as expected. It was clearly the basic linear algorithm that was
causing the problem. I will check the performance of bruck and pairwise
on my HW. It
values (to what value)?
Best regards
Roman
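If the command-line route works on this installation (the thread notes it
may need the dynamic-rules parameters as well), a quick A/B comparison
could look like the sketch below; the algorithm ids assume coll_tuned's
alltoall list (2 = pairwise, 3 = modified bruck) and ./cpmd.x is a
placeholder for the actual binary:

mpirun -mca coll_tuned_use_dynamic_rules 1 \
       -mca coll_tuned_alltoall_algorithm 2 ./cpmd.x   # pairwise
mpirun -mca coll_tuned_use_dynamic_rules 1 \
       -mca coll_tuned_alltoall_algorithm 3 ./cpmd.x   # modified bruck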
On Wed, May 20, 2009 at 10:39 AM, Peter Kjellstrom wrote:
> On Tuesday 19 May 2009, Peter Kjellstrom wrote:
>> On Tuesday 19 May 2009, Roman Martonak wrote:
>> > On Tue, May 19, 2009 at 3:29 PM, Peter Kjellstrom wrote:
>>
On Tue, May 19, 2009 at 3:29 PM, Peter Kjellstrom wrote:
> On Tuesday 19 May 2009, Roman Martonak wrote:
> ...
>> openmpi-1.3.2 time per one MD step is 3.66 s
>> ELAPSED TIME : 0 HOURS 1 MINUTES 25.90 SECONDS
>> = ALL TO ALL COMM
I am using CPMD 3.11.1, not cp2k. Below are the timings for 20 steps
of MD for 32 water molecules (one of the standard CPMD benchmarks) with
openmpi, mvapich and Intel MPI, running on 64 cores (8 blades, each with
2 quad-core 2.2 GHz AMD Barcelona CPUs).
openmpi-1.3.2 time per one MD step is 3.66 s
I've been using --mca mpi_paffinity_alone 1 in all simulations.
Concerning "-mca mpi_leave_pinned 1", I tried it with openmpi 1.2.X
versions and it makes no difference.
Best regards
Roman
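For completeness, a sketch (not from the original mail) of how those two
settings might be combined on one mpirun line; mpi_leave_pinned mainly
matters for RDMA interconnects such as the openib BTL, and the binary
name and input file are placeholders:

mpirun -np 64 --mca mpi_paffinity_alone 1 \
              --mca mpi_leave_pinned 1 ./cpmd.x input_file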
On Mon, May 18, 2009 at 4:57 PM, Pavel Shamis (Pasha) wrote:
>
>>
>> 1) I was told to add "-mca mpi_leave_pinned 1"
>> or four times bigger),
>> and then go up to a large number of processors also.
>> With a larger problem size the scaling may be better too
>> (but the runtimes will grow as well).
>>
>>
>> Finally, since you are using Infiniband, I wonder if all the
>> nodes connect to each other with the same latency, or if some
>> p
Hello,
I observe very poor scaling with openmpi on an HP blade system consisting
of 8 blades (each having 2 quad-core AMD Barcelona 2.2 GHz CPUs),
interconnected with an Infiniband fabric. When running the standard cpmd
32 waters test, I observe the following scaling (the numbers are elapsed
times)
op