On Wed, Feb 20, 2008 at 11:51 PM, Julian Seward <[EMAIL PROTECTED]> wrote:
>  I would add that
>  POSIX pthreads is the de-facto standard way to do shared
>  memory programming, and MPI is the de-facto standard way to do
>  message passing.
>
>  I'm sure that message-passing has some failure modes (deadlocks)
>  in common with shared memory programming, and I wouldn't be at
>  all surprised to hear it could suffer from races too.

The following failures can occur in message-passing software:
* Deadlocks. A deadlock can occur when two or more threads or
processes pass messages synchronously and are waiting on each other
in a cycle. A deadlock can also occur when a message is sent
synchronously to a queue that has reached its maximal size, and the
messaging implementation blocks in that case.
* Race conditions. If two or more threads or processes send a message
to the same destination thread or process, and the receiver does not
specify which sender it wants to receive from, the order in which the
messages are received is nondeterministic -- a race condition.
* Message queue size problems. Some message-passing implementations
use queues with no bound on the queue size other than the amount of
available memory. For a programmer it can be very convenient not to
have to compute the maximal possible size of a message queue. I
consider this a design bug, however -- if it is not known at design
time how big a message queue can become, how can it be known that the
queue will fit in the available memory?

A typical problem in multithreaded software that uses message passing
instead of shared memory for interthread communication is performance
degradation due to cache misses. If each thread performs only a small
task, work has to be passed between threads frequently and many
context switches are needed. Each context switch carries a
performance penalty, not only because of the time needed for the
switch itself but, more importantly, because of the cache misses it
causes.

>  One of the things I have come to realise in the past year or so
>  is what a terrible programming model explicit shared-memory parallelism
>  is.  It's simply too hard for humans to understand and reason about
>  (in all but the most trivial of applications): even small threaded
>  programs are extremely hard to make sense of.

It depends. Although understanding concurrent activities is always
hard, it is possible to write multithreaded software that is
relatively easy to read and to maintain. What I have learned during
the past ten years about writing multithreaded software includes the
following:
* Encapsulate the mutex and thread concepts in objects; this makes
multithreaded software more compact and saves a lot of typing.
* Use scoped locking instead of explicit lock / unlock calls.
* Never use POSIX condition variables directly -- use higher-level
abstractions, e.g. the Mesa-style monitor concept. Using POSIX
condition variables directly makes it easy to introduce race
conditions.
* Encapsulate all thread-shared data in classes. Make sure these
classes have a limited number of data members and a limited number of
member functions. This makes it possible for a human to verify a
locking policy via source reading.
* Make sure that the locking policy can be verified by verifying one
class at a time. This implies that classes may never return references
to their members (which violates data hiding anyway).
* With regard to deadlock avoidance, assign a locking order to mutexes
and other synchronization objects -- a locking order is the order in
which nested locking must be performed.
* Make sure the locking order is verified at runtime. This is possible
either by using a threading library that supports this or via a tool
that verifies this at runtime. Verifying the locking order reduces the
complexity of deadlock detection from a multithreaded to a
single-threaded problem. This is a huge win for reproducibility.
* Make sure the software consists of modules, and that there is a
hierarchy between the modules. This makes it possible to label each
function call as either "high-level module calls low-level module"
(a downcall) or the reverse (a callback).
* Performing a downcall while a mutex is locked is OK. When
performing a callback, however, make sure that no mutexes are locked
by the thread from which the callback is performed.

The above guidelines have allowed me to develop embedded software
that works very well: e.g. a 70 KLOC application with about ten
threads never crashed or deadlocked at a customer site. Only one
threading-related bug was ever reported by a customer: strange data
was sometimes displayed in one specific data field. The cause was
that in one place shared data was not protected by locking.

Bart.

-------------------------------------------------------------------------
_______________________________________________
Valgrind-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/valgrind-developers
