On 22 Nov, 2014, at 00:03, Masao Uebayashi <uebay...@gmail.com> wrote:
> On Thu, Nov 20, 2014 at 4:45 PM, Dennis Ferguson
> <dennis.c.fergu...@gmail.com> wrote:
>
> OK, I overlooked the obvious "new elm's next" -> "prev elm's next" ordering.
>
>> The conclusion is that while the current TAILQ_REMOVE() macro will work
>> unchanged to maintain a list with concurrent readers (given no requirement
>> to run on an MP DEC Alpha) the TAILQ_INSERT_*() macros will not, even
>> though they look correct.  The mutex and/or the use of pserialize() does
>> nothing to change this.  If you want to insert things into a TAILQ list
>> without blocking concurrent readers you must use alternative macros that
>> have membar_producer() calls in the right spots.
>
> I completely disagree with your conclusion.
>
> Your basic idea seems to be that ``membar_{producer,consumer}() are cheaper
> than membar_{enter,exit}(), let's use the cheaper ones''.  That way you
> end up concluding that you need a memory barrier in every iteration.  That
> is contradictory to what you previously said.

You need to fix the store ordering in TAILQ_INSERT_*(); a sketch of what
that might look like follows at the end of this message.  If you want to
use a different barrier to do it that's fine with me.

I was happy with no barrier in the reader when I thought that supporting
the DEC Alpha architecture's unique cache quirk wasn't necessary.  When
it turned out that support for the Alpha was necessary, the reader needed
to have a barrier inserted which looks like it gets executed after every
next-pointer read.  Note, however, that this is a barrier which does
nothing at all on any machine other than an Alpha.  Nothing has changed,
nothing has been contradicted, and no cost has been added to readers.
Taylor might like to see barrier operations paired up in C code, but
there is no pairing at the hardware instruction level: the reader barrier
has no cost on any machine other than the Alpha, the only machine with an
architectural requirement to generate an instruction there.

I understand why you are unhappy with the barrier.  It seems stupid to
have to execute a not-inexpensive hardware instruction over and over
again in the reader to deal with something a writer is almost never going
to do, but since you have to do this on an Alpha (and only an Alpha) to
make this code correct, why can't the conclusion be that it sucks to be a
DEC Alpha, and just get on with it?  This is probably a perfect example
of why no processor other than the Alpha has had this quirk in its cache
architecture, nor is one likely to appear in future: the quirk has an
annoying cost.  Indeed, judging from the fact that people think NetBSD
runs okay on an MP Alpha without this barrier, it may be that even later
DEC Alpha implementations omitted this misfeature.

> point to update E_new.next in the "update" section.  If E_new.next has
> been updated *long enough before* E_p.next is updated, that ordering
> is not a problem at all.

"Long enough" is not a specific length of time.  If you could find a spec
for the amount of time which is "long enough" you could just wait that
long instead.  Calling pserialize_perform() when inserting entries (where
no pserialize_perform() is required), on the theory that
pserialize_perform() is a fantastically expensive function which takes a
long time, doesn't accomplish anything if you don't know how long "long
enough" is, not to mention that you'd be executing that extra-expensive
DELAY() on all architectures to eliminate a cost elsewhere that is unique
to the DEC Alpha.

This is only a problem for the Alpha, and the barrier fixes it for the
Alpha without costing the other architectures anything.
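To make the store ordering concrete, here is a rough sketch (untested,
for illustration only, with made-up names) of an insert-at-head variant
with the membar_producer() in the spot I mean, plus the matching reader.
membar_datadep_consumer() here stands for whatever we end up calling the
dependent-load barrier, i.e. the one that only has to generate an
instruction on the Alpha:

#include <sys/queue.h>
#include <sys/atomic.h>
#include <sys/pserialize.h>

struct frob {
	int f_val;
	TAILQ_ENTRY(frob) f_entry;
};

TAILQ_HEAD(, frob) frob_list = TAILQ_HEAD_INITIALIZER(frob_list);

/*
 * Like TAILQ_INSERT_HEAD(), but the new element is made fully
 * consistent, and those stores are ordered with membar_producer(),
 * before the one store which publishes it to concurrent readers.  The
 * caller still holds the writer lock to serialize against other
 * writers.
 */
#define	TAILQ_INSERT_HEAD_MP(head, elm, field) do {			\
	if (((elm)->field.tqe_next = (head)->tqh_first) != NULL)	\
		(head)->tqh_first->field.tqe_prev =			\
		    &(elm)->field.tqe_next;				\
	else								\
		(head)->tqh_last = &(elm)->field.tqe_next;		\
	(elm)->field.tqe_prev = &(head)->tqh_first;			\
	membar_producer();	/* order the stores above... */		\
	(head)->tqh_first = (elm);	/* ...before the publish */	\
} while (/*CONSTCOND*/0)

/*
 * Reader side.  The barrier is a no-op on everything but the Alpha,
 * where it orders each next-pointer load against the dependent loads
 * made through it.
 */
void
frob_lookup(int val)
{
	struct frob *f;
	int s;

	s = pserialize_read_enter();
	TAILQ_FOREACH(f, &frob_list, f_entry) {
		membar_datadep_consumer();
		if (f->f_val == val) {
			/* ... read-only use of f ... */
			break;
		}
	}
	pserialize_read_exit(s);
}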
That seems like the right thing to do.

Dennis Ferguson