Re: [OMPI devel] OMPI vs Scali performance comparisons

2009-03-18 Thread Tim Mattox
That might indicate the source of the bandwidth difference. Open MPI uses the compiler-supplied memcpy, which may or may not be particularly fast for a given machine/architecture. Scali could very well be using its own tuned memcpy. On the hulk and tank systems at IU (16 core Intel shared mem mach
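
To put a number on "memcpy bandwidth" for a given machine, one rough approach is to time a plain memory copy at several buffer sizes. The sketch below is illustrative only and is not taken from the thread: it uses Python's `ctypes.memmove` (which calls the C library's `memmove`, a stand-in for `memcpy`), and the function name `copy_bandwidth` and the chosen sizes are assumptions made for this example.

```python
import ctypes
import time


def copy_bandwidth(nbytes, repeats=20):
    """Return the best observed copy rate in bytes/second for one buffer size.

    Hypothetical helper: times the C library's memmove via ctypes, which
    only approximates what an MPI library's internal memcpy would achieve.
    """
    src = ctypes.create_string_buffer(nbytes)
    dst = ctypes.create_string_buffer(nbytes)
    ctypes.memmove(dst, src, nbytes)  # warm-up: touch both buffers once

    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        ctypes.memmove(dst, src, nbytes)
        best = min(best, time.perf_counter() - start)

    # Guard against a zero reading from a coarse timer.
    return nbytes / max(best, 1e-9)


if __name__ == "__main__":
    # Sizes chosen arbitrarily, from cache-resident to well beyond cache.
    for size in (4 * 1024, 64 * 1024, 1024 * 1024, 16 * 1024 * 1024):
        rate = copy_bandwidth(size)
        print(f"{size:>10} bytes: {rate / 1e9:.2f} GB/s")
```

For small buffers the Python call overhead dominates, so only the large-buffer figures are meaningful as a ceiling to compare against a pingpong bandwidth.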

Re: [OMPI devel] OMPI vs Scali performance comparisons

2009-03-18 Thread Eugene Loh
I don't have access to the machine where my colleague ran. On other machines, it appears that playing with eager or fragsize doesn't change much... and, in any case, OMPI bandwidth is up around memcpy bandwidth. So, maybe the first challenge is reproducing what he saw and/or getting access to

Re: [OMPI devel] OMPI vs Scali performance comparisons

2009-03-18 Thread Terry Dontje
George Bosilca wrote: Something like this. We can play with the eager size too, maybe 4K is too small. george. I guess I am curious why the larger buffer sizes work better? I am curious because we ran into a similar issue on one of our platforms and it turned out to be the non-temporal co

Re: [OMPI devel] OMPI vs Scali performance comparisons

2009-03-18 Thread George Bosilca
Something like this. We can play with the eager size too, maybe 4K is too small. george. On Mar 18, 2009, at 06:43 , Terry Dontje wrote: George Bosilca wrote: The default values for the large message fragments are not optimized for the new generation processors. This might be something

Re: [OMPI devel] OMPI vs Scali performance comparisons

2009-03-18 Thread Terry Dontje
George Bosilca wrote: The default values for the large message fragments are not optimized for the new generation processors. This might be something to investigate, in order to see if we can have the same bandwidth as they do or not. Are you suggesting bumping up the btl_sm_max_send_size value
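
As a back-of-the-envelope aid to the fragment-size discussion, the sketch below counts how many fragments a message would need under a deliberately simplified model: one eager fragment followed by fixed-size fragments of `max_send_size` bytes. The model, the function name `fragment_count`, and the sizes used are assumptions for illustration; they are not the actual Open MPI pipeline logic or its defaults.

```python
def fragment_count(message_bytes, eager_limit, max_send_size):
    """Count fragments under a simplified eager + fixed-size-fragment model.

    Assumption: the first fragment carries up to eager_limit bytes and the
    remainder is split into max_send_size chunks. Real BTL behaviour
    (headers, pipelining, RDMA paths) is ignored here.
    """
    if message_bytes <= eager_limit:
        return 1
    remaining = message_bytes - eager_limit
    # Ceiling division without floating point.
    return 1 + (remaining + max_send_size - 1) // max_send_size


if __name__ == "__main__":
    one_mib = 1 << 20
    # Illustrative values only: a 4K eager size as mentioned in the thread,
    # and two hypothetical max_send_size settings.
    for max_send in (32 * 1024, 128 * 1024):
        n = fragment_count(one_mib, 4 * 1024, max_send)
        print(f"max_send_size={max_send}: {n} fragments for a 1 MiB message")
```

Under this model, raising the fragment size cuts the per-fragment overhead roughly in proportion, which is one plausible reason larger values could help; it does not account for cache effects such as those raised elsewhere in the thread.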

Re: [OMPI devel] OMPI vs Scali performance comparisons

2009-03-17 Thread Eugene Loh
Jeff Squyres (jsquyres) wrote: I still think that the pml fast path fixes would be good. As do I. Again, I think one needs to go to the BTL sendi as soon as possible after entering the PML, which raised those thorny

Re: [OMPI devel] OMPI vs Scali performance comparisons

2009-03-17 Thread George Bosilca
The default values for the large message fragments are not optimized for the new generation processors. This might be something to investigate, in order to see if we can have the same bandwidth as they do or not. george. On Mar 17, 2009, at 18:23 , Eugene Loh wrote: A colleague of mine

Re: [OMPI devel] OMPI vs Scali performance comparisons

2009-03-17 Thread Jeff Squyres (jsquyres)
I still think that the pml fast path fixes would be good. -jms Sent from my PDA. No type good. - Original Message - From: devel-boun...@open-mpi.org Sent: Tue Mar 17 18:23:18 2009 Subject: [OMPI devel] OMPI vs Scali performance comparisons A colleague of mine ran some microkernels on

[OMPI devel] OMPI vs Scali performance comparisons

2009-03-17 Thread Eugene Loh
A colleague of mine ran some microkernels on an 8-way Barcelona box (Sun x2200M2 at 2.3 GHz). Here are some performance comparisons with Scali. The performance tests are modified versions of the HPCC pingpong tests. The OMPI version is the trunk with my "single-queue" fixes... otherwise, OMP