Date: Sat, 22 Nov 2014 11:49:34 +0900 From: Masao Uebayashi <uebay...@gmail.com>
You are right that "key" has to be volatile. But otherwise, you are making things too complex, or too generic, which is not what I'm expecting as a TAILQ-specialized-for-pserialize(9). No, volatile is unrelated to multiprocessor memory ordering issues. It doesn't prevent the CPU from reordering loads from RAM. It only forces the C compiler to compile x = e->key; f(e->key); into code that instructs the CPU to read e->key twice. The CPU may nevertheless reorder loads from RAM, or read it from its cache instead of loading from RAM. That is, when the CPU executes the instructions movq 16(%rax),%rbx ; x = e->key movq 16(%rax),%rdi ; arg0 = e->key callq f then (a) the CPU may load from RAM first into %rdi and next into %rbx, or (b) the CPU may load once from RAM at %rax + 16 into its cache and then execute both movq instructions from the cache, or (c) the CPU may already have %rax + 16 cached and may not issue any load requests to RAM at all. In the case of pserialized queues, for (e = eq; e != NULL; e = e->next) { if (e->key == 12345) ... } the x86 code will look something more like this: movq eq(%rip),%rax ; Load eq into %rax (e). testq %rax,%rax ; Test for null. je out ; Branch if null. movq 8(%rax),%rdi ; Load e->key into %rdi. The CPU may have a cached value for the location %rax + 8 in RAM. In that case, even if it executes the instructions in order, and even if it issues a load request to RAM for eq, it may not issue a new load request to RAM for %rax + 8 because it already happens to have that location cached[*]. Volatile does not help here: the C compiler generated instructions in the right order. It's the CPU that requires a memory barrier to guarantee that it won't issue loads to RAM out of the order we need: for (e = eq; e != NULL; e = e->next) { membar_datadep_consumer(); if (e->key == 12345) ... } movq eq(%rip),%rax ; Load eq into %rax (e). testq %rax,%rax ; Test for null. je out ; Branch if null. lfence ; Force CPU to load e->key from RAM. movq 8(%rax),%rdi ; Load e->key into %rdi. [*] The x86 architecture happens to guarantee that if whoever inserted the entry issues a store barrier (membar_producer) after initializing e->key and before setting eq = e, this situation won't happen. But that is not guaranteed on every architecture, and if the code had a control-dependent load instead of a data-dependent load, say int ok[128]; int value[128]; for (i = 0; i < 128; i++) { if (ok[i]) return value[i]; }, then membar_consumer/lfence between ok[i] and value[i] would be necessary on x86 in practice -- even if we qualified ok and value with volatile. It is users' choice whether using fast-changing values as a key or not, but if you decide to change values visible by pserialize(9) readers, you'll end up with ensuring memory order. As I said, this is not about changing e->key while e is in the queue. It is about making sure readers read the correct value of e->key after a writer puts e in the queue the first time.