FWIW, I *always* report MPI application time in wall-clock seconds. I know that some people (even among the OMPI developers) disagree with me, but to me, there's nothing else you can measure that makes sense.

Case in point: when using the OpenFabrics network stack, very little time is spent in the kernel because OpenFabrics networks are designed to bypass the OS (e.g., we spin-poll in userspace for OpenFabrics message-passing progress). The same is true for shared memory (it's a "network" because we use it to pass messages between MPI processes). But what about TCP? When not using a TOE or other similar technology (i.e., 99.99% of the time), you are making OS syscalls.

Hence, running the same program over these three different networks can result in hugely different proportions of user vs. system time, even though it's the same app and the same algorithm. Granted, some of the networks are faster than the others, but the network should always be the slowest part of your computation (assuming you have a well-coded application). So which numbers should you report?
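To make that concrete, here's a minimal sketch (my illustration, not anything from OMPI) that prints the per-rank user/system split around a message-passing phase; a barrier stands in for the real workload. Run the same binary over TCP vs. OpenFabrics vs. shared memory and you'd likely see very different splits for the very same code:

#include <stdio.h>
#include <sys/time.h>
#include <sys/resource.h>
#include <mpi.h>

static double tv_secs(struct timeval tv) {
    return tv.tv_sec + tv.tv_usec / 1e6;
}

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    struct rusage before, after;
    getrusage(RUSAGE_SELF, &before);

    /* ...message-passing workload goes here; a barrier stands in for it... */
    MPI_Barrier(MPI_COMM_WORLD);

    getrusage(RUSAGE_SELF, &after);
    printf("rank %d: user %.3fs system %.3fs\n", rank,
           tv_secs(after.ru_utime) - tv_secs(before.ru_utime),
           tv_secs(after.ru_stime) - tv_secs(before.ru_stime));

    MPI_Finalize();
    return 0;
}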

In short: the MPI implementation is doing things for you behind the scenes. This raises some obvious questions:
1. Do you report the MPI execution times or not?
1a. If so, how do you account for the differences in network progression (and other issues) based on the type of network?
1b. If not, how can you separate the MPI time from your application time? (user/system does not make this differentiation; you need additional tools to separate MPI vs. application time -- one such tool is sketched below)
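For 1b, the MPI standard's profiling interface (PMPI) is the usual "additional tool". Here's a minimal sketch that intercepts just one call; a real tool would wrap them all:

#include <stdio.h>
#include <mpi.h>

static double time_in_mpi = 0.0;

/* The linker resolves the application's MPI_Barrier to this wrapper;
   PMPI_Barrier is the MPI implementation's real entry point. */
int MPI_Barrier(MPI_Comm comm) {
    double t0 = MPI_Wtime();
    int rc = PMPI_Barrier(comm);
    time_in_mpi += MPI_Wtime() - t0;
    return rc;
}

int MPI_Finalize(void) {
    printf("time spent in MPI_Barrier: %.3f s\n", time_in_mpi);
    return PMPI_Finalize();
}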

To me, only wall-clock execution time makes sense. The overall performance of your application *includes* the time necessary for MPI / message passing and everything else running on the machine. One of the major points of parallel computing is to make things go faster. To measure that, measure the wall-clock time of the application in serial, then measure the wall-clock execution time in parallel (perhaps for various np values). Then you can (hopefully) see clear, easy-to-understand speedup.

To avoid OS-induced jitter and other negative timing effects, most people turn off as many OS services as possible on the nodes they're running on, for both production and benchmarking codes (I typically leave such services enabled on my software development nodes, because they're helpful for debugging, etc.).
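Here's a minimal sketch of that measurement (compute() is just a placeholder for your real application work): time the run with MPI_Wtime() and report the maximum across ranks, since the slowest rank defines the wall-clock time of the job. Run it at np=1 and at various larger np values, then divide to get speedup.

#include <stdio.h>
#include <mpi.h>

static void compute(void) { /* application work goes here */ }

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    MPI_Barrier(MPI_COMM_WORLD);   /* start all ranks together */
    double t0 = MPI_Wtime();

    compute();

    double local = MPI_Wtime() - t0, elapsed;
    MPI_Reduce(&local, &elapsed, 1, MPI_DOUBLE, MPI_MAX, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("np=%d wall-clock: %.3f s\n", size, elapsed);

    MPI_Finalize();
    return 0;
}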

Is wall-clock execution time the only / best metric? Certainly not. But I strongly prefer it over user/system time -- I just don't think that user/system time tells you what most people think it does in a parallel+MPI context.



On Nov 14, 2008, at 4:32 AM, Raymond Wan wrote:


Hi Fabian,

Thank you for clarifying things and confirming some of the things that I thought. I guess I have a clearer understanding now.


Fabian Hänsel wrote:
Hmmmm, I guess user time does not matter since it is real time that
we are interested in reducing.


Right. Even if we *could* measure the user time of every MPI worker
process correctly, that's not what you are interested in: depending on
the algorithm, a significant amount of time can be spent waiting for MPI
messages to arrive -- that time would not count as user time, but it
also isn't 'wasted', since something important is happening.


The reason I was wondering is that some people in research papers compare their algorithm (system) with another by measuring user time, since it removes some of the effects of what the operating system does on behalf of the user's process. And some people, I guess, see this as a fairer comparison.

On the other hand, I guess I've realized the obvious -- that Open MPI doesn't reduce the efficiency of the algorithm. Even worse, increases in user time are an artifact of Open MPI, so user time is somewhat misleading if we are analyzing an algorithm. What MPI should do (if properly used) is reduce the real time, and that's what we should be reporting... even if it includes other things that we didn't want previously, like the time spent by the OS swapping memory, etc.

[Papers I've read with graphs that have "time" on the y-axis and "processors" on the x-axis rarely mention which time they are measuring... but it seems obvious now that it must be real time, since user time should [???] increase with more processors..... I think... of course, assuming we can total the user time across machines accurately.]

Thank you for your message(s)!  Think I got it now...  :-)

Ray



_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users


--
Jeff Squyres
Cisco Systems

