The fact that this exactly matches the time you measured with shared memory is 
suspicious. My guess is that you aren't actually using shared memory at all.

Does your "ompi_info" output show shared memory as being available? Jeff or 
others may be able to give you some params that would let you check to see if 
sm is actually being used between those procs.
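
For example (a rough sketch; the exact verbosity level and output wording vary 
by Open MPI version, and ./your_app is a placeholder for your executable):

  ompi_info | grep btl                                # an "sm" line means the sm BTL is built
  mpirun -np 2 --mca btl_base_verbose 30 ./your_app   # logs which BTL is chosen for each peer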



On Mar 28, 2011, at 7:51 AM, Michele Marena wrote:

> What happens with 2 processes on the same node with tcp?
> With --mca btl self,tcp my app runs in 23s.
> 
> 2011/3/28 Jeff Squyres (jsquyres) <jsquy...@cisco.com>
> Ah, I didn't catch before that there were more variables than just tcp vs. 
> shmem. 
> 
> What happens with 2 processes on the same node with tcp?
> 
> E.g., when both procs are on the same node, are you thrashing caches or memory?
> 
> Sent from my phone. No type good. 
> 
> On Mar 28, 2011, at 6:27 AM, "Michele Marena" <michelemar...@gmail.com> wrote:
> 
>> In any case, thank you Tim, Ralph, and Jeff.
>> My sequential application runs in 24s (wall clock time).
>> My parallel application runs in 13s with two processes on different nodes.
>> With shared memory, when two processes are on the same node, my app runs in 
>> 23s.
>> I don't understand why.
>> 
>> 2011/3/28 Jeff Squyres <jsquy...@cisco.com>
>> If your program runs faster across 3 processes, 2 of which are local to each 
>> other, with --mca btl tcp,self compared to --mca btl tcp,sm,self, then 
>> something is very, very strange.
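>> 
>> For example, the two runs being compared would look something like this (the 
>> hostfile name, process count, and ./your_app are placeholders for your setup):
>> 
>>   mpirun -np 3 --hostfile myhosts --mca btl tcp,self    ./your_app
>>   mpirun -np 3 --hostfile myhosts --mca btl tcp,sm,self ./your_app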
>> 
>> Tim cites all kinds of things that can cause slowdowns, but it's still very, 
>> very odd that simply enabling the shared memory communications channel 
>> in Open MPI *slows your overall application down*.
>> 
>> How much does your application slow down in wall clock time?  Seconds?  
>> Minutes?  Hours?  (anything less than 1 second is in the noise)
>> 
>> 
>> 
>> On Mar 27, 2011, at 10:33 AM, Ralph Castain wrote:
>> 
>> >
>> > On Mar 27, 2011, at 7:37 AM, Tim Prince wrote:
>> >
>> >> On 3/27/2011 2:26 AM, Michele Marena wrote:
>> >>> Hi,
>> >>> My application performs well without shared memory, but with
>> >>> shared memory I get worse performance than without it.
>> >>> Am I making a mistake? Is there something I'm not paying attention to?
>> >>> I know Open MPI uses the /tmp directory to allocate shared memory, and
>> >>> /tmp is on the local filesystem.
>> >>>
>> >>
>> >> I guess you mean shared memory message passing.  Among the relevant 
>> >> parameters may be the message size at which your implementation switches 
>> >> from a cached copy to a non-temporal one (if you are on a platform where 
>> >> that terminology is used).  If built with Intel compilers, for example, the 
>> >> copy may be performed by intel_fast_memcpy, whose default is to use 
>> >> non-temporal stores when the message exceeds some preset size, e.g. 
>> >> 50% of the smallest L2 cache for that architecture.
>> >> A quick search of past posts seems to indicate that Open MPI doesn't 
>> >> itself invoke non-temporal copies, but there appear to be several useful 
>> >> articles not connected with Open MPI.
>> >> In case guesses aren't sufficient, it's often necessary to profile 
>> >> (gprof, oprofile, VTune, ...) to pin this down.
>> >> If shared-memory messaging slows your application down, the question is 
>> >> whether this is due to excessive eviction of data from cache; that's not 
>> >> a simple question, as most recent CPUs have 3 levels of cache, your 
>> >> application may need more or less of the data that was in use prior to 
>> >> the message receipt, and it may immediately use only a small piece of a 
>> >> large message.
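>> >> 
>> >> As a rough sketch of the gprof route (assuming a GNU toolchain and the 
>> >> Open MPI compiler wrapper; your_app and the flags below are placeholders 
>> >> to adjust for your setup):
>> >> 
>> >>   mpicc -pg -O2 -o your_app your_app.c        # build with gprof instrumentation
>> >>   mpirun -np 2 --mca btl sm,self ./your_app   # run the case that shows the slowdown
>> >>   gprof ./your_app gmon.out > profile.txt     # see where the time goes
>> >> 
>> >> (With glibc you can set GMON_OUT_PREFIX so each rank writes its own 
>> >> gmon.out.<pid> instead of the ranks overwriting one another.)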
>> >
>> > There were several papers published in earlier years about shared memory 
>> > performance in the 1.2 series. There were known problems with that 
>> > implementation, which is why it was heavily revised for the 1.3/4 series.
>> >
>> > You might also look at the following links, though much of it has been 
>> > updated to the 1.3/4 series as we don't really support 1.2 any more:
>> >
>> > http://www.open-mpi.org/faq/?category=sm
>> >
>> > http://www.open-mpi.org/faq/?category=perftools
>> >
>> >
>> >>
>> >> --
>> >> Tim Prince
>> >
>> >
>> 
>> 
>> --
>> Jeff Squyres
>> jsquy...@cisco.com
>> For corporate legal information go to:
>> http://www.cisco.com/web/about/doing_business/legal/cri/
>> 
>> 
> 
