> On May 9, 2019, at 12:37 AM, Joseph Schuchart via users wrote:
>
> Nathan,
>
> Over the last couple of weeks I made some more interesting observations
> regarding the latencies of accumulate operations on both Aries and InfiniBand
> systems:
>
> 1) There seems to be a significant difference between 64bit and 32bit
> operations
I will try to take a look at it today.
-Nathan
Benson,
I just gave 4.0.1 a shot and the behavior is the same (the reason I'm
stuck with 3.1.2 is a regression with `osc_rdma_acc_single_intrinsic` on
4.0 [1]).
The IB cluster has both Mellanox ConnectX-3 (w/ Haswell CPU) and
ConnectX-4 (w/ Skylake CPU) nodes; the effect is visible on both node types.
Hi,
Have you tried anything with Open MPI 4.0.1?
What are the specifications of the InfiniBand system you are using?
Benson
On 5/9/19 9:37 AM, Joseph Schuchart via users wrote:
Nathan,
Over the last couple of weeks I made some more interesting observations
regarding the latencies of accumulate operations on both Aries and
InfiniBand systems:
Nathan,
Over the last couple of weeks I made some more interesting observations
regarding the latencies of accumulate operations on both Aries and
InfiniBand systems:
1) There seems to be a significant difference between 64bit and 32bit
operations: on Aries, the average latency for compare-exchange
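The preview is cut off before the reported numbers. As a rough illustration of how such a 64bit-vs-32bit comparison can be measured, the sketch below times MPI_Compare_and_swap with MPI_UINT64_T and MPI_UINT32_T; the window layout, iteration count, and the choice of rank 1 targeting rank 0 are assumptions rather than details taken from the thread.

```
/* Minimal sketch (not the thread's benchmark): time compare-and-swap
 * latency for 64-bit vs 32-bit datatypes.  Run with at least 2 ranks. */
#include <mpi.h>
#include <stdint.h>
#include <stdio.h>

/* Average latency of `iters` CAS operations of `type` against offset 0 of
 * the window at `target`.  The 64-bit buffers also hold 32-bit operands. */
static double time_cas(MPI_Datatype type, int target, int iters, MPI_Win win)
{
  uint64_t compare = 0, origin = 1, result = 0;
  double t0 = MPI_Wtime();
  for (int i = 0; i < iters; ++i) {
    MPI_Compare_and_swap(&origin, &compare, &result, type, target, 0, win);
    MPI_Win_flush(target, win);
  }
  return (MPI_Wtime() - t0) / iters;
}

int main(int argc, char **argv)
{
  MPI_Init(&argc, &argv);
  int rank;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);

  uint64_t *base;
  MPI_Win win;
  MPI_Win_allocate(sizeof(uint64_t), sizeof(uint64_t), MPI_INFO_NULL,
                   MPI_COMM_WORLD, &base, &win);
  *base = 0;
  MPI_Win_lock_all(0, win);
  MPI_Barrier(MPI_COMM_WORLD);

  if (rank == 1) {                       /* rank 1 targets rank 0 */
    const int iters = 100000;            /* assumed iteration count */
    double t64 = time_cas(MPI_UINT64_T, 0, iters, win);
    double t32 = time_cas(MPI_UINT32_T, 0, iters, win);
    printf("avg CAS latency: 64bit %.3f us, 32bit %.3f us\n",
           t64 * 1e6, t32 * 1e6);
  }

  MPI_Barrier(MPI_COMM_WORLD);
  MPI_Win_unlock_all(win);
  MPI_Win_free(&win);
  MPI_Finalize();
  return 0;
}
```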
Ok, then it sounds like a regression. I will try to track it down today or
tomorrow.
-Nathan
On Nov 08, 2018, at 01:41 PM, Joseph Schuchart wrote:
Sorry for the delay, I wanted to make sure that I test the same version
on both Aries and IB: git master bbe5da4. I realized that I had
previously tested with 3.1.3 on the IB cluster, which ran fine.
Sorry for the delay, I wanted to make sure that I test the same version
on both Aries and IB: git master bbe5da4. I realized that I had
previously tested with 3.1.3 on the IB cluster, which ran fine. If I use
the same version I run into the same problem on both systems (with --mca
btl_openib_al
Quick scan of the program and it looks ok to me. I will dig deeper and see
if I can determine the underlying cause.

What Open MPI version are you using?

-Nathan

On Nov 08, 2018, at 11:10 AM, Joseph Schuchart wrote:

While using the mca parameter in a real application I noticed a strange
effect, which took me a while to figure out: It appears that on the
Aries network the accumulate operations are not atomic anymore.
While using the mca parameter in a real application I noticed a strange
effect, which took me a while to figure out: It appears that on the
Aries network the accumulate operations are not atomic anymore. I am
attaching a test program that shows the problem: all but one process
continuously in
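The description of the attached test program is truncated here. A minimal sketch of that kind of atomicity check, assuming the updates are atomic increments via MPI_Fetch_and_op and that rank 0 verifies the final count at the end (neither detail survives in the preview), could look like:

```
/* Hypothetical atomicity check: every rank except 0 atomically adds 1 to a
 * counter at rank 0; rank 0 then verifies the final sum.  Lost updates
 * would indicate that the accumulate path is not atomic. */
#include <mpi.h>
#include <stdint.h>
#include <stdio.h>

int main(int argc, char **argv)
{
  MPI_Init(&argc, &argv);
  int rank, size;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);

  const uint64_t iters = 100000;        /* assumed iteration count */
  uint64_t *counter;
  MPI_Win win;
  MPI_Win_allocate(sizeof(uint64_t), sizeof(uint64_t), MPI_INFO_NULL,
                   MPI_COMM_WORLD, &counter, &win);
  *counter = 0;
  MPI_Win_lock_all(0, win);
  MPI_Barrier(MPI_COMM_WORLD);

  if (rank != 0) {
    const uint64_t one = 1;
    uint64_t old;
    for (uint64_t i = 0; i < iters; ++i) {
      MPI_Fetch_and_op(&one, &old, MPI_UINT64_T, 0, 0, MPI_SUM, win);
      MPI_Win_flush(0, win);
    }
  }

  MPI_Barrier(MPI_COMM_WORLD);   /* all updates completed at the target */
  MPI_Win_unlock_all(win);       /* synchronizes rank 0's local copy */

  if (rank == 0) {
    uint64_t expected = (uint64_t)(size - 1) * iters;
    printf("counter=%llu expected=%llu -> %s\n",
           (unsigned long long)*counter, (unsigned long long)expected,
           *counter == expected ? "atomic" : "LOST UPDATES");
  }

  MPI_Win_free(&win);
  MPI_Finalize();
  return 0;
}
```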
Thanks a lot for the quick reply, setting
osc_rdma_acc_single_intrinsic=true does the trick for both shared and
exclusive locks and brings it down to <2us per operation. I hope that
the info key will make it into the next version of the standard, I
certainly have use for it :)
Cheers,
Joseph
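The info key Joseph refers to is not named in the surviving text. Purely to illustrate the mechanism of hinting at window creation, the existing MPI-3 `accumulate_ops` key can be passed like this:

```
/* Sketch of passing an accumulate-related hint when creating a window.
 * The "accumulate_ops" key below is a standard MPI-3 key, used here only
 * to illustrate the mechanism. */
#include <mpi.h>
#include <stdint.h>

MPI_Win create_counter_window(MPI_Comm comm, uint64_t **base)
{
  MPI_Info info;
  MPI_Info_create(&info);
  /* Promise that only a single op (plus no-op reads) will be used on this
   * window, which allows implementations to map accumulates to hardware
   * atomics more aggressively. */
  MPI_Info_set(info, "accumulate_ops", "same_op_no_op");

  MPI_Win win;
  MPI_Win_allocate(sizeof(uint64_t), sizeof(uint64_t), info, comm,
                   base, &win);
  MPI_Info_free(&info);
  return win;
}
```

Whether an implementation acts on such a hint is up to the implementation; in this thread the speedup was obtained instead through Open MPI's `osc_rdma_acc_single_intrinsic` MCA parameter.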
All of this is completely expected. Due to the requirements of the standard it
is difficult to make use of network atomics even for MPI_Compare_and_swap
(MPI_Accumulate and MPI_Get_accumulate spoil the party). If you want
MPI_Fetch_and_op to be fast, set this MCA parameter:
osc_rdma_acc_single_intrinsic=true
All,
I am currently experimenting with MPI atomic operations and wanted to
share some interesting results I am observing. The numbers below are
measurements from both an IB-based cluster and our Cray XC40. The
benchmarks look like the following snippet:
```
if (rank == 1) {
  uint64_t re
```
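The snippet is cut off by the archive. A hypothetical reconstruction of a fetch-and-op latency benchmark along these lines, with the iteration count, window setup, and the `res` variable name assumed rather than recovered from the original post:

```
/* Hypothetical reconstruction: rank 1 times repeated MPI_Fetch_and_op
 * increments on a counter owned by rank 0.  Run with at least 2 ranks. */
#include <mpi.h>
#include <stdint.h>
#include <stdio.h>

int main(int argc, char **argv)
{
  MPI_Init(&argc, &argv);
  int rank;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);

  uint64_t *base;
  MPI_Win win;
  MPI_Win_allocate(sizeof(uint64_t), sizeof(uint64_t), MPI_INFO_NULL,
                   MPI_COMM_WORLD, &base, &win);
  *base = 0;
  MPI_Barrier(MPI_COMM_WORLD);

  if (rank == 1) {
    uint64_t res, one = 1;
    const int iters = 100000;                  /* assumed iteration count */
    MPI_Win_lock(MPI_LOCK_SHARED, 0, 0, win);  /* or MPI_LOCK_EXCLUSIVE */
    double t0 = MPI_Wtime();
    for (int i = 0; i < iters; ++i) {
      MPI_Fetch_and_op(&one, &res, MPI_UINT64_T, 0, 0, MPI_SUM, win);
      MPI_Win_flush(0, win);
    }
    double avg = (MPI_Wtime() - t0) / iters;
    MPI_Win_unlock(0, win);
    printf("avg MPI_Fetch_and_op latency: %.3f us\n", avg * 1e6);
  }

  MPI_Barrier(MPI_COMM_WORLD);
  MPI_Win_free(&win);
  MPI_Finalize();
  return 0;
}
```

With Open MPI, the MCA parameter recommended above would be enabled at launch time, e.g. `mpirun --mca osc_rdma_acc_single_intrinsic true ./bench`.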