Hi,

Yes, this is pretty much correct. I have some comments.

On 03/24/09 01:29 AM, Bert Miemietz wrote:
> Hello,
> 
> the basic question, after all, is the visibility of any change made by
> one thread to other threads. From everything written (your answers,
> links, documentation), what I understood and assume from the high-level
> C perspective is:
> 
> 1) Any pthread_mutex_unlock (mutex_exit) causes any modified data to be
> written / flushed to memory. Modified data is then visible to any other
> thread / cpu. If a later read access is under a mutex (after
> pthread_mutex_lock / mutex_enter) the data is for sure read from memory,
> so volatile is not needed here, as it is illustrated in the programming
> examples for cv_wait.

volatile is probably not needed, because the compiler will likely emit 
the store anyway: there's a function call (to mutex_unlock), and the 
compiler cannot know that mutex_unlock won't read the global data.

volatile = tells the compiler to always load and store the data from/to 
memory.

memory barriers = tell the hardware to keep the memory operations in a 
consistent order.

> 2) Because of 1), visibility is not an issue for buffers or data
> structures that are required to be accessed under a lock, for data
> consistency or program flow control reasons.

Yes, the mutex_unlock code should include the appropriate memory barrier 
operations to ensure that the store that unlocks the mutex is seen after 
the store to data protected by the mutex.

> 3) Updates to data types that can be read / written in a single (atomic)
> operation, and that are not protected by a mutex, are not guaranteed to
> be visible to other threads by simply using the volatile keyword.

Right - volatile tells the compiler to do the store; it doesn't tell the 
hardware to do anything.

> Instead they become visible after instructions that imply a flush of
> registers, store buffers, and caches to memory. What I read between the
> lines is that such instructions / actions are:
> - mutex_lock / unlock operations
> - spawning a new thread
> - terminating a thread

All those.


> - calling a function

No, a function call does not imply a memory barrier.

I think where you might be getting confused is in the following instance:

#include <stdio.h>

int global;
int a;

void func2(void);

void func1(void)
{
    global = 1;
    func2();
}

void func2(void)
{
    a = global;
    printf("a=%i\n", a);
}

So the compiler will put the store of global into the code for func1 
because it cannot be sure that func2 does not access the variable global.

func2 happens to read global, and since it is executed on the same 
processor it will always get the correct value (unless the processor 
runs an incredibly weak memory model).

If another thread were running just func2 while this thread were 
executing func1, then there is a requirement to pass the value of global 
to thread 2. On SPARC and x86 the memory model is such that they will 
see the correct value of global after the store. For some architectures 
with weaker memory models they might need to place an explicit barrier 
in to get the data off the chip.

So when do I really need memory barriers?

Here's a bit of pseudo code for setting some data up:

mutex_lock()
   data1=1;
   data2=2;
   data3=3;
   ...
mutex_unlock();

Suppose my thread gets the lock, writes the data out, and now wants to 
release the lock.

A release of the lock is a write of zero to the memory location of the lock:

lock->locked=0;

The issue is that I have a stream of stores pending in my store queue 
etc. I need those to be visible to other processors before I can do my 
store to the lock variable. Otherwise another thread might see the 
release of the lock before it sees the new values for the variables.

So my unlock is really:

memory_barrier();
lock->locked=0;

For example:
http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/lib/libc/sparc/gen/lock.s


> Visibility to a thread reading this data would require the volatile
> keyword.
> -----
> Things described under 3) are of course only of limited use, e.g. for a
> single flag or error value. Any more complex types and operations like
> "flags |= ERROR_A" might have a more or less unexpected result. When a
> second thread is doing "flags |= ERROR_B" at the same time, the final
> result might easily be "flags == ERROR_B", omitting the flag ERROR_A
> that was intended to be set by the first thread.

Exactly - these need to be protected by a mutex (or done atomically - 
man atomic_ops on S10) otherwise there's a data race.

> 
> Do you think 1) - 3) are correct?

Just about.

Regards,

Darryl.


-- 
Darryl Gove
Compiler Performance Engineering
Blog: http://blogs.sun.com/d/
Book: http://www.sun.com/books/catalog/solaris_app_programming.xml
