Re: [Valgrind-developers] helgrind: various false negatives/positves

Konstantin Serebryany Wed, 30 Jan 2008 10:42:48 -0800

>
>
> My impression of MSMProp1 is that it is like a pure happens-before
> detector, except that it deals with locks using locksets rather than
> by generating happens-before edges in the graph.



I have the same impression :)


>
>
> Indeed, I wonder if it is possible to take the formal description of
> MSMProp1 and from it derive the formal description of a pure happens-
> before detector I posted earlier in this thread (Wed Jan 23 14:30:21 2008)
> by (1) removing all locksets from the MSMProp1 description, and
> (2) adding happens-before edges for lock/unlock events.  It would
> be interesting to try.


I shall try this... But after all we have exp-drd.
The problem with pure HB is (iiuc) that it has too many false negatives (not
mentioning performance and scalability):

// First:                             Second:
// 1. write
// 2. MU.Lock()
// 3. MU.Unlock()                      (sleep)
//                                    a. MU.Lock()
//                                    b. MU.Unlock();
//                                    c. write

Pure HB will not see a race between the writes because you have HB edge
between Unlock() and Lock().
Both MSMHelgrind and MSMProp1 detect this race (test47), while exp-drd does
not.


> I don't think you can have zero false positives and zero false
> negatives at the same time (holy grail ...), because all these state
> machines are different approximations to something which is NP-complete
> (and therefore which we cannot compute exactly).


Oh, sure.
I think that it makes sense to have several different machines in helgrind
so that a user can choose one that fits his application best.
For example, if an application never creates/join threads (except for
startup/shutdown) and does not use semahpores, cond vars, message queues,
etc, Eraser will be enough.
If on contrary the application never use locks (but uses barriers,
semaphores, message queues, etc) locksets are useless.
The applications I try to analyze use both locks and HB stuff, so I need
both.




> > We, however, still have this scan-all-memory in
> shadow_mem_make_NoAccess.
> > The comment says:
> >    7. Modify all shadow words, by removing ToDelete from the lockset
> >       of all ShM and ShR states.  Note this involves a complete scan
> >       over map_shmem, which is very expensive...
> >
> > Do we really need this?
> > If we don't do that and if a new lock gets allocated at the same memory
> > location we may miss some very weird race.
> > Any other reason for doing this?
>
> I only did it to be clean / safe.  Otherwise I think if you deallocate
> a lock and allocate a new one at the same address, very strange things
> will happen.
>
> Is it really needed?  I don't know.  But I don't know how to prove/argue
> that it is not needed.  Alternative question is, is it possible to do
> this cheaply/incrementally?


I thought about using unique lock id in locksets instead of Lock*.
It leads to a problem of mapping from the id to Lock* though... But we only
need this while reporting a race.



> There is also a different problem for the scaling of Helgrind to
> very large programs: the Lock structures are freed when locks are
> destroyed, but Segment and Thread are never freed.  So a program
> which generates a very large number of Segments, or Threads, will
> eventually cause Helgrind to run out of memory.  I can see that
> Helgrind's storage management strategy will need to be redesigned
> at some future point.


Yea... Perhaps we could  garbage-collect old segments and if some SVal still
refers to a deleted one, behave conservatively.
Not sure about threads... If a program creates more than 1M threads, we are
in trouble performance-wise anyway.



> Are you intending to make a revised MSMProp1 patch that can deal
> with larger programs?


Certainly. Right now it is ~60% slower than MSMHelgrind, but there are still
things to improve there.
I've been trying both machines on some of my applications. MSMProp1 reports
~30% more races.
Those extra races look similar to test10. Most (but not all!) of them look
harmless which makes me think about classifying data races somehow.
See e.g. http://www.eecs.umich.edu/~nsatish/DataRace-PLDI07.html.

Also with MSMProp1 I see less false positives on my apps.



> I would like to compare it to MSMHelgrind on OpenOffice and Firefox.



Trying to run OpenOffice and Firefox on my machine leads to this.

--1052-- VALGRIND INTERNAL ERROR: Valgrind received a signal 11 (SIGSEGV) -
exiting
--1052-- si_code=1;  Faulting address: 0x0;  sp: 0x478CDF4

valgrind: the 'impossible' happened:
   Killed by fatal signal
==1052==    at 0x3801979C: vgPlain_strlen (m_libcbase.c:232)
==1052==    by 0x38010941: evh__pre_mem_read_asciiz (hg_main.c:5763)
==1052==    by 0x38037D47: vgPlain_client_syscall (syswrap-main.c:850)
==1052==    by 0x38035899: vgPlain_scheduler (scheduler.c:790)
==1052==    by 0x38048C0D: run_a_thread_NORETURN (syswrap-linux.c:89)

I have better luck with konqueror, I'll try to compare the two machines on
it.

Any set of *unit* tests won't be enough to evaluate new machines, but it
would be very useful to make the first steps.
I beg everyone interested to contribute to racecheck_unittest or similar.


--kcc

-------------------------------------------------------------------------
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2008.
http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/

_______________________________________________
Valgrind-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/valgrind-developers

Re: [Valgrind-developers] helgrind: various false negatives/positves

Reply via email to