Author: sewardj
Date: 2007-11-05 00:25:53 +0000 (Mon, 05 Nov 2007)
New Revision: 7089

Log:
Finish first version of the manual.  Gaaaah!

Modified:
   branches/THRCHECK/thrcheck/docs/tc-manual.xml


Modified: branches/THRCHECK/thrcheck/docs/tc-manual.xml
===================================================================
--- branches/THRCHECK/thrcheck/docs/tc-manual.xml       2007-11-04 17:11:19 UTC 
(rev 7088)
+++ branches/THRCHECK/thrcheck/docs/tc-manual.xml       2007-11-05 00:25:53 UTC 
(rev 7089)
@@ -38,26 +38,35 @@
 
 <orderedlist>
  <listitem>
-  <para>Section FIXME: Misuses of the POSIX pthreads API.</para>
+  <para><link linkend="tc-manual.api-checks">
+        Misuses of the POSIX pthreads API.</link></para>
  </listitem>
  <listitem>
-  <para>Section FIXME: Potential deadlocks arising from lock ordering
-  problems.</para>
+  <para><link linkend="tc-manual.lock-orders">
+        Potential deadlocks arising from lock
+        ordering problems.</link></para>
  </listitem>
  <listitem>
-  <para>Section FIXME: Data races -- accessing memory without adequate
-  locking.</para>
+  <para><link linkend="tc-manual.data-races">
+        Data races -- accessing memory without adequate locking.
+        </link></para>
  </listitem>
 </orderedlist>
 
-<para>Section FIXME contains guidance on how to get the best out of Thrcheck.
-This is really a discussion about debugging strategies and how to
-organise your program to enhance its verifiability.</para>
+<para>Following those is a section containing 
+<link linkend="tc-manual.effective-use">
+hints and tips on how to get the best out of Thrcheck.</link>
+</para>
 
-<para>Section FIXME contains a summary of command-line options.</para>
+<para>Then there is a
+<link linkend="tc-manual.options">summary of command-line
+options.</link>
+</para>
 
-<para>Finally, Section FIXME contains a brief list of areas in which Thrcheck
-could be improved.</para>
+<para>Finally, there is 
+<link linkend="tc-manual.todolist">a brief summary of areas in which Thrcheck
+could be improved.</link>
+</para>
 
 </sect1>
 
@@ -71,7 +80,7 @@
 is therefore able to report on various common problems.  Although
 these are unglamourous errors, their presence can lead to undefined
 program behaviour and hard-to-find bugs later in execution.  The
-detected errors include:</para>
+detected errors are:</para>
 
 <itemizedlist>
  <listitem><para>unlocking an invalid mutex</para></listitem>
@@ -87,11 +96,20 @@
                  versa</para></listitem>
  <listitem><para>when a POSIX pthread function fails with an
                  error code that must be handled</para></listitem>
+ <listitem><para>when a thread exits whilst still holding locked
+                 locks</para></listitem>
+ <listitem><para>calling <computeroutput>pthread_cond_wait</computeroutput>
+                 with a not-locked mutex, or one locked by a different
+                 thread</para></listitem>
 </itemizedlist>
 
 <para>Checks pertaining to the validity of mutexes are generally also
 performed for reader-writer locks.</para>
 
+<para>Various kinds of this-can't-possibly-happen events are also
+reported.  These usually indicate bugs in the system threading
+library.</para>
+
 <para>Reported errors always contain a primary stack trace indicating
 where the error was detected.  They may also contain auxiliary stack
 traces giving additional information.  In particular, most errors
@@ -114,8 +132,9 @@
 <para>Thrcheck has a way of summarising thread identities, as
 evidenced here by the text "<computeroutput>Thread
 #1</computeroutput>".  This is so that it can speak about threads and
-sets of threads without overwhelming you with details.  See FIXME
-below for details.</para>
+sets of threads without overwhelming you with details.  See 
+<link linkend="tc-manual.data-races.errmsgs">below</link>
+for more information on interpreting error messages.</para>
 
 </sect1>
 
@@ -219,7 +238,7 @@
 
 
 <sect2 id="tc-manual.data-races.example" xreflabel="Simple Race">
-<title>A simple data race</title>
+<title>A Simple Data Race</title>
 
 <para>About the simplest possible example of a race is as follows.  In
 this program, it is impossible to know what the value
@@ -371,13 +390,20 @@
 attempts to write to it, a race is then reported.</para>
 
 <para>This technique is known as "lockset inference" and was
-introduced in FIXME.  It has been widely implemented since then.
-Thrcheck incorporates several refinements aimed at reducing the false
-error rate generated by a naive version of the algorithm.  In section
-FIXME a summary of the complete algorithm used by Thrcheck is
-presented.  First, however, it is important to understand details of
-transitions pertaining to the Exclusive-ownership state.</para>
+introduced in: "Eraser: A Dynamic Data Race Detector for Multithreaded
+Programs" (Stefan Savage, Michael Burrows, Greg Nelson, Patrick
+Sobalvarro and Thomas Anderson, ACM Transactions on Computer Systems,
+15(4):391-411, November 1997).</para>
 
+<para>Lockset inference has since been widely implemented, studied and
+extended.  Thrcheck incorporates several refinements aimed at avoiding
+the high false error rate that naive versions of the algorithm suffer
+from.  A 
+<link linkend="tc-manual.data-races.summary">summary of the complete
+algorithm used by Thrcheck</link> is presented below.  First, however,
+it is important to understand details of transitions pertaining to the
+Exclusive-ownership state.</para>
+
 </sect2>
 
 
@@ -449,10 +475,13 @@
 pthread_create and pthread_join calls, an error is still
 reported.</para>
 
-<para>This technique was introduced in FIXME with the name Thread
-Lifetime Segments.  Thrcheck implements an extended version of it.
-Specifically, Thrcheck allows transfer of exclusive ownership in the
-following situations:</para>
+<para>This technique was introduced with the name "thread lifetime
+segments" in "Runtime Checking of Multithreaded Applications with
+Visual Threads" (Jerry J. Harrow, Jr, Proceedings of the 7th
+International SPIN Workshop on Model Checking of Software Stanford,
+California, USA, August 2000, LNCS 1885, pp331--342).  Thrcheck
+implements an extended version of it.  Specifically, Thrcheck allows
+transfer of exclusive ownership in the following situations:</para>
 
 <itemizedlist>
  <listitem><para>At thread creation: a child can acquire ownership of
@@ -464,16 +493,16 @@
   (thread that is exiting) at the point it exited.</para>
  </listitem>
  <listitem><para>At condition variable signallings and broadcasts.  A
-  thread Twait which completes a pthread_cond_wait call as a result of
+  thread Tw which completes a pthread_cond_wait call as a result of
   a signal or broadcast on the same condition variable by some other
-  thread Tsig, may acquire ownership of memory held exclusively by
-  Tsig prior to the pthread_cond_signal/broadcast
+  thread Ts, may acquire ownership of memory held exclusively by
+  Ts prior to the pthread_cond_signal/broadcast
   call.</para>
  </listitem>
- <listitem><para>At semaphore posts (sem_post) calls.  A thread Twait
+ <listitem><para>At semaphore posts (sem_post) calls.  A thread Tw
   which completes a sem_wait call call as a result of a sem_post call
-  on the same semaphore by some other thread Tpost, may acquire
-  ownership of memory held exclusively by Tpost prior to the sem_post
+  on the same semaphore by some other thread Tp, may acquire
+  ownership of memory held exclusively by Tp prior to the sem_post
   call.</para>
  </listitem>
 </itemizedlist>
@@ -809,8 +838,9 @@
 
 <para>Also, this message implies that Thrcheck did not see any
 synchronisation event between threads #4 and #5 that would have
-allowed #5 to acquire exclusive ownership from #4.  See FIXME for a
-discussion of transfers of exclusive ownership states between
+allowed #5 to acquire exclusive ownership from #4.  See
+<link linkend="tc-manual.data-races.exclusive">above</link>
+for a discussion of transfers of exclusive ownership states between
 threads.</para>
 
 </sect2>
@@ -936,9 +966,10 @@
 
     <para>The root cause of this synchronisation lossage is
     particularly hard to understand, so an example is helpful.  It was
-    discussed at length by Arndt Muehlenfeldt [FIXME].  The canonical
-    POSIX-recommended usage scheme for condition variables is as
-    follows:</para>
+    discussed at length by Arndt Muehlenfeld ("Runtime Race Detection
+    in Multi-Threaded Programs", Dissertation, TU Graz, Austria).  The
+    canonical POSIX-recommended usage scheme for condition variables
+    is as follows:</para>
 
 <programlisting><![CDATA[
 b   is a Boolean condition, which is False most of the time
@@ -981,9 +1012,9 @@
   <listitem>
     <para>Make sure you are using a supported Linux distribution.  At
     present, Thrcheck only properly supports x86-linux and amd64-linux
-    with glibc-2.3 or later.  The latter restriction really says that
-    we only support the NPTL threading library.  The old LinuxThreads
-    library is not supported.</para>
+    with glibc-2.3 or later.  The latter restriction means we only
+    support the NPTL threading library.  The old LinuxThreads library
+    is not supported.</para>
 
     <para>Unsupported targets may work to varying degrees.  In
     particular ppc32-linux and ppc64-linux running NTPL should work,
@@ -992,6 +1023,19 @@
     the lwarx/stwcx instructions.</para>
   </listitem>
 
+  <listitem>
+    <para>POSIX requires that implementations of standard I/O (printf,
+    fprintf, fwrite, fread, etc) are thread safe.  Unfortunately GNU
+    libc implements this by using internal locking primitives that
+    Thrcheck is unable to intercept.  Consequently Thrcheck generates
+    many false race reports when you use these functions.</para>
+
+    <para>Thrcheck attempts to hide these errors using the standard
+    Valgrind error-suppression mechanism.  So, at least for simple
+    test cases, you don't see any.  Nevertheless, some may slip
+    through.  Just something to be aware of.</para>
+  </listitem>
+
 </orderedlist>
 
 </sect1>
@@ -1002,48 +1046,39 @@
 <sect1 id="tc-manual.options" xreflabel="Thrcheck Options">
 <title>Thrcheck Options</title>
 
-<para>Currently there is only one Thrcheck-specific option:</para>
+<para>The following end-user options are available:</para>
 
 <!-- start of xi:include in the manpage -->
 <variablelist id="tc.opts.list">
 
   <varlistentry id="opt.happens-before" xreflabel="--happens-before">
     <term>
-      <option><![CDATA[--happens-before=none|threads|condvars
-      [default: condvars] ]]></option>
+      <option><![CDATA[--happens-before=none|threads|all
+      [default: all] ]]></option>
     </term>
     <listitem>
-      <para>This option is mostly useful for debugging Thrcheck
-       itself.  It isn't much use to end users and is a bit difficult
-       to explain.
-      </para>
       <para>Thrcheck always regards locks as the basis for
        inter-thread synchronisation.  However, by default, before
        reporting a race error, Thrcheck will also check whether
        certain other kinds of inter-thread synchronisation events
        happened.  It may be that if such events took place, then no
        race really occurred, and so no error needs to be reported.
-       This enables Thrcheck to correctly handle the
-       worker-thread and worker-thread-pool idioms.
+       See <link linkend="tc-manual.data-races.exclusive">above</link>
+       for a discussion of transfers of exclusive ownership states
+       between threads.
       </para>
-      <para>With <varname>--happens-before=condvars</varname>, both
-       thread creation/joinage, and condition variable
-       signal/broadcast/waits are regarded as sources of
-       synchronisation, and so both the worker-thread and
-       worker-thread-pool idioms are correctly handled.  "Correctly
-       handled" means that Thrcheck will not falsely report race
-       errors for correct uses of these idioms.
+      <para>With <varname>--happens-before=all</varname>, the
+       following events are regarded as sources of synchronisation:
+       thread creation/joinage, condition variable
+       signal/broadcast/waits, and semaphore posts/waits.
       </para>
       <para>With <varname>--happens-before=threads</varname>, only
        thread creation/joinage events are regarded as sources of
-       synchronisation, and so only the worker-thread idiom is
-       correctly handled.  The worker-thread-pool is not correctly
-       handled.
+       synchronisation.
       </para>
       <para>With <varname>--happens-before=none</varname>, no events
        (apart, of course, from locking) are regarded as sources of
-       synchronisation.  And so neither the worker-thread nor
-       worker-thread-pool idioms are correctly handled.
+       synchronisation.
       </para>
       <para>Changing this setting from the default will increase your
        false-error rate but give little or no gain.  The only advantage
@@ -1055,34 +1090,146 @@
     </listitem>
   </varlistentry>
 
+  <varlistentry id="opt.trace-addr" xreflabel="--trace-addr">
+    <term>
+      <option><![CDATA[--trace-addr=0xXXYYZZ
+      ]]></option> and
+      <option><![CDATA[--trace-level=0|1|2 [default: 1]
+      ]]></option>
+    </term>
+    <listitem>
+      <para>Requests that Thrcheck produces a log of all state changes
+      to location 0xXXYYZZ.  This can be helpful in tracking down
+      tricky races.  <varname>--trace-level</varname> controls the
+      verbosity of the log.  At the default setting (1), a one-line
+      summary of is printed for each state change.  At level 2 a
+      complete stack trace is printed for each state change.</para>
+    </listitem>
+  </varlistentry>
+
 </variablelist>
 <!-- end of xi:include in the manpage -->
 
-</sect1>
+<!-- start of xi:include in the manpage -->
+<para>In addition, the following debugging options are available for
+Thrcheck:</para>
 
+<variablelist id="tc.debugopts.list">
 
-<sect1 id="tc-manual.otherstuff" xreflabel="Other Stuff">
-<title>Other Stuff</title>
+  <varlistentry id="opt.trace-malloc" xreflabel="--trace-malloc">
+    <term>
+      <option><![CDATA[--trace-malloc=no|yes [no]
+      ]]></option>
+    </term>
+    <listitem>
+      <para>Show all client malloc (etc) and free (etc) requests.</para>
+    </listitem>
+  </varlistentry>
 
-<para>FIXME: this section will contain other stuff that it is
-important to document:</para>
+  <varlistentry id="opt.gen-vcg" xreflabel="--gen-vcg">
+    <term>
+      <option><![CDATA[--gen-vcg=no|yes|yes-w-vts [no]
+      ]]></option>
+    </term>
+    <listitem>
+      <para>At exit, write to stderr a dump of the happens-before
+       graph computed by Thrcheck, in a format suitable for the VCG 
+        graph visualisation tool.  A suitable command line is:</para>
+      <para><computeroutput>valgrind --tool=thrcheck 
+        --gen-vcg=yes my_app 2&gt;&amp;1
+        | grep xxxxxx | sed "s/xxxxxx//g"
+        | xvcg -</computeroutput></para>
+      <para>With <varname>--gen-vcg=yes</varname>, the basic
+        happens-before graph is shown.  With 
+        <varname>--gen-vcg=yes-w-vts</varname>, the vector timestamp 
+        for each node is also shown.</para>
+    </listitem>
+  </varlistentry>
 
-<itemizedlist>
- <listitem><para>LOCK prefixes on x86/amd64
-                 instructions</para></listitem>
- <listitem><para>Reader-writer locks, and semaphores?
-                 </para></listitem>
- <listitem><para>Other stuff I forgot?
-                 </para></listitem>
-</itemizedlist>
+  <varlistentry id="opt.cmp-race-err-addrs" 
+                xreflabel="--cmp-race-err-addrs">
+    <term>
+      <option><![CDATA[--cmp-race-err-addrs=no|yes [no]
+      ]]></option>
+    </term>
+    <listitem>
+      <para>Controls whether or not race (data) addresses should be
+        taken into account when removing duplicates of race errors.
+        With <varname>--cmp-race-err-addrs=no</varname>, two otherwise
+        identical race errors will be considered to be the same if
+        their race addresses differ.  With
+        With <varname>--cmp-race-err-addrs=yes</varname> they will be
+        considered different.  This is provided to help make certain
+        regression tests work reliably.</para>
+    </listitem>
+  </varlistentry>
 
-Inconsistent cv/mx associations
-Thread exiting whilst holding locks
+  <varlistentry id="opt.tc-sanity-flags" xreflabel="--tc-sanity-flags">
+    <term>
+      <option><![CDATA[--tc-sanity-flags=<XXXXX> (X = 0|1) [00000]
+      ]]></option>
+    </term>
+    <listitem>
+      <para>Run extensive sanity checks on Thrcheck's internal
+        data structures at events defined by the bitstring, as
+        follows:</para>
+      <para><computeroutput>10000 </computeroutput>after changes to
+        the lock order acquisition graph</para>
+      <para><computeroutput>01000 </computeroutput>after every client
+        memory access (NB: not currently used)</para>
+      <para><computeroutput>00100 </computeroutput>after every client
+        memory range permission setting of 256 bytes or greater</para>
+      <para><computeroutput>00010 </computeroutput>after every client
+        lock or unlock event</para>
+      <para><computeroutput>00001 </computeroutput>after every client
+        thread creation or joinage event</para>
+      <para>Note these will make Thrcheck run very slowly, often to
+        the point of being completely unusable.</para>
+    </listitem>
+  </varlistentry>
 
-waiting for a condition variable without holding
-                   the associated mutex
+</variablelist>
+<!-- end of xi:include in the manpage -->
 
-better printing of lock cycles
+
 </sect1>
 
+<sect1 id="tc-manual.todolist" xreflabel="To Do List">
+<title>A To-Do List for Thrcheck</title>
+
+<para>The following is a list of loose ends which should be tidied up
+some time.</para>
+
+<itemizedlist>
+  <listitem><para>Track which mutexes are associated with which
+    condition variables, and emit a warning if this becomes
+    inconsistent.</para>
+  </listitem>
+  <listitem><para>For lock order errors, print the complete lock
+    cycle, rather than only doing for size-2 cycles as at
+    present.</para>
+  </listitem>
+  <listitem><para>Document the VALGRIND_HG_CLEAN_MEMORY client
+    request.</para>
+  </listitem>
+  <listitem><para>Possibly a client request to forcibly transfer
+    ownership of memory from one thread to another.  Requires further
+    consideration.</para>
+  </listitem>
+  <listitem><para>Add a new client request that marks an address range
+    as being "shared-modified with empty lockset" (the error state),
+    and describe how to use it.</para>
+  </listitem>
+  <listitem><para>Document races caused by gcc's thread-unsafe code
+    generation for speculative stores.  In the interim see
+    <computeroutput>http://gcc.gnu.org/ml/gcc/2007-10/msg00266.html
+    </computeroutput>
+    and <computeroutput>http://lkml.org/lkml/2007/10/24/673</computeroutput>.
+    </para>
+  </listitem>
+
+</itemizedlist>
+
+</sect1>
+
 </chapter>


-------------------------------------------------------------------------
This SF.net email is sponsored by: Splunk Inc.
Still grepping through log files to find problems?  Stop.
Now Search log events and configuration files using AJAX and a browser.
Download your FREE copy of Splunk now >> http://get.splunk.com/
_______________________________________________
Valgrind-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/valgrind-developers

Reply via email to