On Wed, Feb 20, 2008 at 11:51 PM, Julian Seward <[EMAIL PROTECTED]> wrote:
> I would add that POSIX pthreads is the de-facto standard way to do
> shared memory programming, and MPI is the de-facto standard way to do
> message passing.
>
> I'm sure that message-passing has some failure modes (deadlocks)
> in common with shared memory programming, and I wouldn't be at
> all surprised to hear it could suffer from races too.
The following failures can occur in message-passing software:

* Deadlocks. A deadlock can occur when two or more threads or processes
pass messages synchronously and wait on each other in a cycle. A deadlock
can also occur when a thread synchronously sends a message to a queue that
has reached its maximal size, if the messaging implementation blocks in
that case.

* Race conditions. If two or more threads or processes send a message to
the same destination thread or process, and the receiver does not specify
which sender it wants to receive from, then the order in which the
messages arrive is a race condition.

* Message queue size problems. Some message-passing implementations place
no bound on the queue size other than the amount of available memory. For
a programmer it can be very convenient not to have to compute the maximal
possible size of a message queue. I consider this a design bug, however:
if it is not known at design time how big a message queue can become, how
can it be known that the queue will fit in the available memory?

A typical problem in multithreaded software that uses message passing
instead of shared memory for interthread communication is performance
degradation due to cache misses. If each thread performs only a small
task, then work has to be passed between threads frequently and many
context switches are needed. Each context switch carries a performance
penalty, not only because of the time needed for the switch itself but,
more importantly, because of the cache misses it causes.

> One of the things I have come to realise in the past year or so
> is what a terrible programming model explicit shared-memory parallelism
> is. It's simply too hard for humans to understand and reason about
> (in all but the most trivial of applications): even small threaded
> programs are extremely hard to make sense of.

It depends.
Although understanding concurrent activities is always hard, it is
possible to write multithreaded software that is relatively easy to read
and to maintain. What I have learned during the past ten years about
writing multithreaded software is, among other things, the following:

* Encapsulate the mutex and thread concepts in an object. This makes
multithreaded software more compact and saves a lot of typing.

* Use scoped locking instead of explicit lock / unlock calls.

* Never use POSIX threads condition variables directly -- use higher-level
abstractions instead, e.g. the Mesa-style monitor concept. Using POSIX
condition variables directly can easily introduce race conditions.

* Encapsulate all thread-shared data in classes. Make sure these classes
have a limited number of data members and a limited number of member
functions. This makes it possible for a human to verify a locking policy
by reading the source.

* Make sure that the locking policy can be verified by examining one class
at a time. This implies that classes may never return references to their
data members (which violates data hiding anyway).

* With regard to deadlock avoidance, assign a locking order to mutexes and
other synchronization objects -- a locking order is the order in which
nested locking must be performed.

* Make sure the locking order is verified at runtime, either by using a
threading library that supports this or via a tool that checks it at
runtime. Verifying the locking order reduces deadlock detection from a
multithreaded to a single-threaded problem, which is a huge win for
reproducibility.

* Make sure the software consists of modules, and that there exists a
hierarchy between the modules. This is necessary so that function calls
can be labeled either as "high-level module calls low-level module" (a
downcall) or as a "callback".

* Performing a downcall while a mutex is locked is OK.
When performing a callback, however, make sure that no mutexes are locked
by the thread from which the callback is performed.

The above guidelines have allowed me to develop (embedded) software that
works very well: e.g. a 70 KLOC embedded application with about ten
threads that never crashed or deadlocked at a customer site. Only one
threading-related bug was ever reported by a customer: strange data was
sometimes displayed in one specific data field. The cause was that in one
place shared data was not protected by locking.

Bart.

_______________________________________________
Valgrind-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/valgrind-developers
