
Just as a test, can you run it with np 2?

(The PingPong test uses only 2 processes; any others just wait in MPI_Barrier.)
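
As a sanity check on the numbers quoted below, here is a small sketch that recomputes the Mbytes/sec column from the #bytes and t[usec] columns. The assumption that 1 Mbyte means 2**20 bytes is mine, inferred from the quoted figures rather than stated in the thread, and the function name is only illustrative.

```python
# Recompute the Mbytes/sec column of the quoted PingPong table.
# Assumption (inferred from the data, not stated in the thread):
# 1 Mbyte = 2**20 bytes.

MIB = 2**20


def bandwidth_mbytes_per_sec(nbytes, t_usec):
    """Throughput in Mbytes/sec for a message of nbytes taking t_usec."""
    return nbytes / (t_usec * 1e-6) / MIB


# Rows copied from the quoted table: (#bytes, t[usec], reported Mbytes/sec)
rows = [
    (524288, 310.01, 1612.88),
    (1048576, 730.62, 1368.69),
    (4194304, 2884.90, 1386.53),
]

for nbytes, t_usec, reported in rows:
    computed = bandwidth_mbytes_per_sec(nbytes, t_usec)
    print(f"{nbytes:>8} bytes: computed {computed:8.2f}, reported {reported:8.2f}")
```

The computed values agree with the reported ones up to the rounding of the printed times, so the bandwidth column is simply message size divided by the reported time.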

On 8/13/08, Daniël Mantione <daniel.manti...@clustervision.com> wrote:
> On Tue, 12 Aug 2008, Gus Correa wrote:
> > Hello Daniel and list
> >
> > Could it be a problem with memory bandwidth / contention in multi-core?
> Yes, I believe we are somehow limited by memory performance. Here are
> some numbers from a dual Opteron 2352 system, which has much more memory
> bandwidth:
> #---------------------------------------------------
> # Benchmarking PingPong
> # #processes = 2
> # ( 6 additional processes waiting in MPI_Barrier)
> #---------------------------------------------------
>        #bytes #repetitions      t[usec]   Mbytes/sec
>             0         1000         0.86         0.00
>             1         1000         0.97         0.98
>             2         1000         0.95         2.01
>             4         1000         0.96         3.97
>             8         1000         0.95         7.99
>            16         1000         0.96        15.85
>            32         1000         0.99        30.69
>            64         1000         0.97        63.09
>           128         1000         1.02       119.68
>           256         1000         1.18       207.25
>           512         1000         1.40       348.77
>          1024         1000         1.75       556.75
>          2048         1000         2.59       753.22
>          4096         1000         5.10       766.23
>          8192         1000         7.93       985.13
>         16384         1000        14.60      1070.57
>         32768         1000        27.92      1119.23
>         65536          640        46.67      1339.16
>        131072          320        86.03      1453.06
>        262144          160       163.16      1532.21
>        524288           80       310.01      1612.88
>       1048576           40       730.62      1368.69
>       2097152           20      1449.72      1379.57
>       4194304           10      2884.90      1386.53
> However, +/- 1200 MB/s (or +/- 1500 MB/s in the case of the AMD system) is
> not even close to the memory performance limits of the systems, so there
> should be room for optimization.
> After all, the openib btl manages to transfer the data from the memory of
> one process to the memory of another process just fine, and with more
> performance.
> > It has been reported in many mailing lists (mpich, beowulf, etc).
> > Here it seems to happen in dual-processor dual-core with our
> > memory-intensive programs.
> MPICH2 manages to get about 5GB/s in shared memory performance on the
> Xeon 5420 system.
> > Have you checked what happens to the shared memory runs as
> > you increase the number of active cores/processes?
> > Would it help to set the processor affinity in the shared memory runs?
> >
> > http://www.open-mpi.org/faq/?category=building#build-paffinity
> > http://www.open-mpi.org/faq/?category=tuning#using-paffinity
> Neither has any effect on the scores.
> Daniël
