Julian Seward wrote:
> On Thursday, July 29, 2010, Barry L. Rountree wrote:
>> John Reiser wrote:
>>>>> ...  Valgrind could stop
>>>>> simulated CPU and revert back to the real CPU part way through
>>>>> program execution."
>>>>>
>>>>> I'm not seeing this mentioned anywhere else in the documentation.
>>>>> Does this capability still exist?
>>>> No .. that description I think is somewhat out of date.  There
>>>> is no provision for switching back to native execution.
>>> In case somebody is really motivated, then example code for such
>>> a feature for x86 can be found by searching for the string 'letgo' in
>>> http://bitwagon.com/valgrind+uml/valgrind-3.3.0-2007-12-27.patch.gz
>> And reversing that process would put you back into emulation mode?
>>
>> (Yes, that question is eliding a lot of hairy details.  But if the
>> program is started under valgrind and valgrind lets it go back into
>> native mode, then switching back to valgrind should /just/ [!] be a
>> question of updating valgrind's registers and hitting the big green
>> "GO" button, right?)
> 
> But how would you do that?  Once you switch to running on the
> real CPU, you lose all control and you no longer have the 
> ability to decide when to switch back to emulation.
> 
> Additionally, as John points out, the whole point of running
> on an emulator is to collect side-data about what's going on.
> For some tools (eg profilers) the missing parts of the execution
> are not too bad, you'll just get wrong statistics.  But for the
> error checking tools, at least Memcheck and the thread checkers,
> you'll get guaranteed absolute chaos.
> 
> So .. what is it you're _really_ trying to do?
> 

I'm trying to parallelize valgrind in general, and memcheck in particular.

The big picture looks like this:  you have a buggy serial program
that takes a long time to run.  You have a few supercomputers onsite.
Run the serial program simultaneously on N cores where each core only
instruments 1/Nth of the program.  That will get you N nonsensical
error reports which, when stitched together, should reduce to one
sensical error report.
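To make the bookkeeping concrete, here is a minimal C sketch under assumptions of my own: the dynamic instruction stream is cut into fixed windows of a hypothetical WINDOW_LEN instructions, dealt round-robin so that each window is instrumented by exactly one of the N workers, and the partial reports are stitched by a naive union that drops duplicate (pc, kind) records. The names worker_instruments, error_rec and stitch_report are hypothetical, not Valgrind APIs. Deduplication is the easy part; it does nothing about the spurious errors Julian describes when a worker's shadow state lacks the history it did not instrument.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical window size, in guest instructions. */
#define WINDOW_LEN 1000000ULL

/* Return 1 if worker k (0 <= k < n) should instrument the guest
   instruction whose global dynamic index is icount.  Windows are
   dealt out round-robin, so every index has exactly one owner. */
int worker_instruments(uint64_t icount, unsigned k, unsigned n)
{
    assert(n > 0 && k < n);
    return (icount / WINDOW_LEN) % n == k;
}

/* Hypothetical error record: faulting code address plus error kind. */
typedef struct {
    uint64_t pc;
    int kind;
} error_rec;

/* Append one worker's partial report to out[0..*m), skipping any
   (pc, kind) pair already present.  Stops adding once *m == cap. */
void stitch_report(const error_rec *rep, size_t len,
                   error_rec *out, size_t *m, size_t cap)
{
    for (size_t i = 0; i < len; i++) {
        size_t j = 0;
        while (j < *m &&
               !(out[j].pc == rep[i].pc && out[j].kind == rep[i].kind))
            j++;
        if (j == *m && *m < cap)
            out[(*m)++] = rep[i];   /* new error: keep it */
    }
}
```

Calling stitch_report once per worker, in worker order, yields the merged report with first-seen ordering preserved.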

Once that's working, porting it to MPI looks to be pretty straightforward.
If you want to instrument node 0 on an MPI application, use a PMPI library
that fires off N "fake" node 0s and run the parallel version of valgrind
on those.  Any MPI messages sent to the real node 0 get duplicated to the
fake node 0s and MPI messages sent from the fake node 0s get dropped on
the floor.  Things get a little more complicated with some of the more
obscure collective communication calls, but this is similar enough to work
I've done in the past that I think it's a tractable problem.
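The message rule for the fake node 0s can be written down as a small routing function. This C sketch assumes a hypothetical rank layout in which the fakes occupy the ranks just past the application's own; in a real PMPI library the MPI_Send wrapper would call PMPI_Send once per returned destination. The layout type, is_fake and route_send are illustrative names only, and collectives, wildcard receives and making each fake believe it is rank 0 are all left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout: application ranks 0..napp-1 are real; the
   "fake" node 0 replicas occupy ranks napp..napp+nfake-1. */
typedef struct {
    int napp;
    int nfake;
} layout;

int is_fake(const layout *l, int rank)
{
    return rank >= l->napp && rank < l->napp + l->nfake;
}

/* Compute the ranks that should actually receive a message that src
   addressed to application rank dst.  Writes up to cap ranks to out
   and returns how many it wrote:
     - sends from a fake node 0 are dropped (0 destinations);
     - sends to the real node 0 are also copied to every fake node 0;
     - everything else is delivered unchanged. */
size_t route_send(const layout *l, int src, int dst, int *out, size_t cap)
{
    size_t n = 0;
    assert(dst >= 0 && dst < l->napp);
    if (is_fake(l, src))
        return 0;
    if (n < cap)
        out[n++] = dst;
    if (dst == 0)
        for (int f = 0; f < l->nfake && n < cap; f++)
            out[n++] = l->napp + f;
    return n;
}
```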

Back to the implementation details:  I've been working my way through
the source, documentation and publications for the past couple of weeks
and I've got four different parallelization approaches sketched out.  If
John Reiser's patch works with the current Valgrind release and I can use that
as a template to go from native to valgrind, then that looks to be the
most straightforward approach.

As to how to notify the app that it's supposed to go back into valgrind,
that can be as simple as switching back in every X instructions and using
PAPI's hardware counters to figure out when that happens.  I could also
hack something up on
the MPI layer, but I'd like this to be usable for non-MPI applications
as well.
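The every-X-instructions idea amounts to a two-state scheduler driven by an instruction counter. This C sketch simulates the counter in software; in a real build the flip back into valgrind would come from a hardware counter overflow (PAPI provides PAPI_overflow, and the PAPI_TOT_INS preset counts retired instructions), which is not shown. The number of instructions spent under valgrind on each re-entry, emu_len, is a hypothetical parameter, as are the sched names.

```c
#include <assert.h>
#include <stdint.h>

enum mode { MODE_NATIVE, MODE_VALGRIND };

/* Hypothetical scheduler state; the counter is simulated here. */
typedef struct {
    enum mode mode;
    uint64_t native_len;   /* X: native instructions between re-entries */
    uint64_t emu_len;      /* instructions run under valgrind each time */
    uint64_t remaining;    /* instructions left in the current phase */
    unsigned switches;     /* number of mode changes so far */
} sched;

void sched_init(sched *s, uint64_t native_len, uint64_t emu_len)
{
    assert(native_len > 0 && emu_len > 0);
    s->mode = MODE_VALGRIND;   /* the program starts under valgrind */
    s->native_len = native_len;
    s->emu_len = emu_len;
    s->remaining = emu_len;
    s->switches = 0;
}

/* Retire one instruction; when the phase counter "overflows",
   flip to the other mode and reload the counter. */
void sched_step(sched *s)
{
    if (--s->remaining == 0) {
        s->mode = (s->mode == MODE_NATIVE) ? MODE_VALGRIND : MODE_NATIVE;
        s->remaining = (s->mode == MODE_NATIVE) ? s->native_len
                                                : s->emu_len;
        s->switches++;
    }
}
```

With native_len = X and emu_len = W, a fraction W / (W + X) of the retired instructions runs under valgrind.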

Comments are very welcome.  When I proposed this project I thought I could
get away with a combination of turning instrumentation on and off and
ignoring address ranges.  That only gets you down to the Nulgrind overhead,
and for this to be interesting the slowdown needs to be < 2x for large N.
I think that's doable, but I need to understand the internals of coregrind
first.
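A rough cost model shows why the Nulgrind floor matters. Assume each of the N workers runs the whole program, instruments 1/N of it at a slowdown S, and runs the rest at a slowdown B, ignoring switching and retranslation costs (S and B are symbols for this sketch, not measured values):

```latex
\begin{aligned}
\text{slowdown}(N) &= B\left(1 - \tfrac{1}{N}\right) + \tfrac{S}{N}
  \;\xrightarrow{\;N \to \infty\;}\; B, \\
\text{with } B = 1:\quad 1 + \tfrac{S - 1}{N} < 2 &\iff N > S - 1.
\end{aligned}
```

If the uninstrumented parts still run under Valgrind with instrumentation off, B is the Nulgrind overhead and no N gets below it, so if B is 2 or more the < 2x target is out of reach. With genuinely native execution B = 1, and for an illustrative S = 30, N = 64 gives 1 + 29/64, about 1.45x.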

Your thoughts?

Thanks,

Barry Rountree

> J
> 


_______________________________________________
Valgrind-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/valgrind-users
