On 8/31/15 11:11 PM, David Holmes wrote:
Hi Dan,

On 1/09/2015 2:58 AM, Daniel D. Daugherty wrote:
Hi David,

Replies embedded below...

Likewise ... but with some trimming (I'm getting RSI hitting page-down).

Yup. I use an Apple MagicMouse and I'm getting tired of
the scrolling drag... :-)



On 8/30/15 7:30 PM, David Holmes wrote:
On 29/08/2015 2:27 AM, Daniel D. Daugherty wrote:
On 8/28/15 8:45 AM, Tom Benson wrote:
For my shutdown use, we are transitioning from '1' -> '0' and
we need that to be seen proactively so:

Nit: OrderAccess "barriers" enforce ordering constraints but don't in
general provide any guarantees about visibility - i.e., they are not
necessarily "flushes". So while it may be true on some platforms by
virtue of the underlying barrier mechanism, in general they don't
change when a write becomes visible and so there is nothing
"proactive" about them.

My apologies for being sloppy with my wording.

What I'm trying to do is put up a barrier so that once the
deleting thread has changed the flag from '1' -> '0' another
thread trying to read the flag won't see the '1' when it
should see the '0'. In other words, I don't want the other
thread's read of the flag to float above the setting of
the flag to '0'.

That still suggests to me a notion that the barrier somehow forces visibility, and that after the barrier_store returns all other threads are now guaranteed to see the value '1' - which is not the case. All the barriers here do is ensure a global observability ordering: if another thread sees an action that happened after the barrier, then it must also see the action that happened before the barrier.
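
To make that concrete, here is a minimal standalone sketch of that observability ordering, using C++11 std::atomic as a stand-in for the OrderAccess primitives; the names are purely illustrative, not the HotSpot code:

#include <atomic>
#include <cassert>

std::atomic<int> flag(0);   // stand-in for a _has_PerfData-style flag
int payload = 0;            // stand-in for the data the flag guards

void writer() {
  payload = 42;                              // the action before the barrier
  flag.store(1, std::memory_order_release);  // the store the barrier is attached to
  // Nothing here forces other threads to observe flag == 1 right now;
  // the release only orders the payload store before the flag store.
}

void reader() {
  // We may see flag == 0 for an arbitrarily long time - the barrier is not
  // a flush. But if we do see flag == 1 (the action after the barrier),
  // the acquire ordering on this side means we must also see payload == 42
  // (the action before the barrier).
  if (flag.load(std::memory_order_acquire) == 1) {
    assert(payload == 42);
  }
}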

I'm not trying to say that visibility is forced. I was shooting
for something like "happens before" and "happens after", but I
didn't use those terms...



More below ...


OrderAccess::release_store(&_has_PerfData, 0);
OrderAccess::storeload();

I agree with Tom that the release is unnecessary - though harmless.

I tried to switch the code to:

OrderAccess::store(&_has_PerfData, 0);
OrderAccess::storeload();

but that wouldn't compile on Solaris X64:

Error: static OrderAccess::store(volatile int*, int) is not accessible
from static PerfDataManager::destroy().

Strange - either a bug or an issue with including the .inline.hpp file.

Right. Not worth chasing since I don't really need it.



But then it occurred to me that I was trying too hard
to use OrderAccess functions...

The real ordering constraint here is that we preserve:

_has_PerfData = 0;
<do actual deallocation>

I think I can switch to a straight "_has_PerfData = 0;" here and
not use either OrderAccess::release_store() or OrderAccess::store().

Yes. Particularly as OrderAccess::store is just a plain store for 32-bit types.

I also reread OrderAccess.hpp and convinced myself that
an "OrderAccess::fence()" call is the only way to be sure here.
I'm pretty sure that achieves 'storeload|storestore' barrier
semantics... Yes, I have a headache again... :-)

Yes, it does, as well as loadload and loadstore.
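
Informally, a tiny sketch of that, using C++11 std::atomic_thread_fence as a rough stand-in for OrderAccess::fence(); the variables are illustrative only:

#include <atomic>

std::atomic<int> a(0), b(0);

void full_fence_sketch() {
  int r = a.load(std::memory_order_relaxed);             // earlier load
  b.store(1, std::memory_order_relaxed);                 // earlier store
  std::atomic_thread_fence(std::memory_order_seq_cst);   // ~ OrderAccess::fence()
  // Neither access above may sink below the fence and neither access below
  // may hoist above it: loadload, loadstore, storeload and storestore
  // ordering all at once.
  int s = a.load(std::memory_order_relaxed);             // later load
  b.store(r + s, std::memory_order_relaxed);             // later store
}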


for which a storeload|storestore barrier after the write seems most
appropriate. Though with the insertion of the sleep after the write
there won't be any reordering anyway so explicit barriers seem redundant.

I didn't think that os::naked_short_sleep() would do anything
to keep another thread's read from floating above the
"_has_PerfData = 0;". What did I miss?

This is somewhat of a side-discussion but ...

First, your notion of floating reads in other threads is somewhat troubling to me. I kind of get what you mean, but ...

Now for the sleep. As always with ordering constraints there are two sides: the writer and the reader. Here we are dealing with the writer. The writer is executing:

_has_PerfData = 0;
for (int index = 0; index < _all->length(); index++) {
  PerfData* p = _all->at(index);
  delete p;
}

For the sake of discussion, let's assume there's only one PerfData object and that deleting it amounts to setting an internal field to NULL. So our code becomes:

_has_PerfData = 0;
data->ref = NULL;

The ordering constraint is that the above two statements happen in the order written - so we don't delete the data while the data is claimed to be valid. That requires that neither the compiler nor the hardware reorders them. If we did:

_has_PerfData = 0;
membar storeload|storestore;
data->ref = NULL;

then we achieve that goal. Now consider if we instead have:

_has_PerfData = 0;
os::naked_short_sleep(1);
data->ref = NULL;

The sleep call leads to a library call and a syscall, and ultimately, most likely, a context switch as the current thread gets descheduled for 1ms. In terms of machine instructions, the distance between the initial write of _has_PerfData and the load of 'data' (and so the store to data->ref) is huge. Can those stores be reordered by the hardware? Theoretically yes, but in practice I'd say not at all likely: the separation is too large and there are likely additional barriers already within the code that performs the sleep. Can the stores be reordered by the compiler? Again theoretically yes, but in practice, for a compiler to reorder across the sleep it would need intimate knowledge of everything that happens within the sleep - no compiler we currently use would even attempt to figure that out. So the sleep, while not a guarantee, is most likely to act to prevent the undesired reordering.

Is that lack of guarantee an issue? No - because we know we already have a race condition. No matter what we do with this section of code, in terms of barriers, we can't avoid the race.

All the way to this point, I was expecting you to say leave the
barrier in...


Hence my view that adding explicit barriers together with the sleep serves no practical purpose here. It would be different if a barrier forced the write to become visible to all other threads - but that is not guaranteed by OrderAccess.

Not the conclusion I was expecting based on the last couple of
paragraphs. :-)

My preference is to leave the barrier in for these reasons (a sketch of the resulting code shape follows the list):

- communicates intent to any future maintainer; we could also say that
  this might communicate my silliness at continuing with a lock-free
  algorithm when everyone knows that I hate them.
- does not make assumptions about os::naked_short_sleep(1); it might
  be turned into an inline function someday that the compiler can
  digest and optimize around...
- does no harm; I think that's a paraphrase of what you said in the
  round 0 code review :-)
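
For reference, here is a minimal standalone sketch of that shape - plain store, full fence, grace-period sleep, then the deallocation. It uses std::atomic_thread_fence and std::this_thread::sleep_for as rough stand-ins for OrderAccess::fence() and os::naked_short_sleep(1), and the types and names are illustrative only, not the actual PerfDataManager code:

#include <atomic>
#include <chrono>
#include <thread>
#include <vector>

struct PerfDataLike { int value; };   // illustrative stand-in for PerfData

struct PerfDataManagerLike {
  static volatile int _has_PerfData;          // plain int flag, as planned
  static std::vector<PerfDataLike*> _all;     // stand-in for the counter list
  static void destroy();
};

volatile int PerfDataManagerLike::_has_PerfData = 1;
std::vector<PerfDataLike*> PerfDataManagerLike::_all;

void PerfDataManagerLike::destroy() {
  _has_PerfData = 0;                                    // plain store, no release_store()
  std::atomic_thread_fence(std::memory_order_seq_cst);  // ~ OrderAccess::fence()
  std::this_thread::sleep_for(std::chrono::milliseconds(1));  // ~ os::naked_short_sleep(1)

  // Racy by design: a reader that saw _has_PerfData == 1 just before the
  // store above may still be touching a counter; the sleep is the grace
  // period that we hope lets it finish before we free the memory.
  for (PerfDataLike* p : _all) {
    delete p;
  }
  _all.clear();
}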



Now let's return to the reader side of this. Somewhere we have code that is effectively doing something like:

if (_has_PerfData == 1)
   cur_data = *data->ref;

This is where we see the race: the value of the flag can change the instant after we read it and so we can access data->ref after it is NULL. Going back to the sleep, the purpose of the sleep is to give threads that have seen _has_PerfData==1 a chance to get past the data access before the PerfData object is deleted.

In the monitor subsystem, the check is a little different:

  if (data != NULL && _has_PerfData == 1) {
    cur_data = *data->ref;
  }

so the read of data->ref can't float above the if-statement. It looks
like most (if not all) of the PerfData usage relies on NULL checks of
the PerfData object as the trigger to determine whether to call into
the PerfData object.


In terms of ordering constraints we don't want the read of data->ref to be moved/float ahead of the read of _has_PerfData, so technically we need a loadload barrier to prevent that:

if (_has_PerfData == 1) {
  membar loadload;
  cur_data = *data->ref;
}

From a hardware perspective I don't think a speculative load can cause a problem - if we read data->ref early and see NULL then we "must" see _has_PerfData==0 (due to the ordering at the writer) and so the code is skipped. And I think similarly for compiler reorderings (though my head is hurting now too :)). But in either case it doesn't really matter if this reasoning is wrong, because even if the code is kept in the required order the race still exists and we can't tell exactly how things came about. So adding the loadload would achieve little, if anything.

Agreed that a loadload would achieve little, if anything here.



Now feel free to ignore all that, go get a coffee and move on to my reply to your next webrev. :)

Will do. Needed a coffee for this one also...

Dan



Cheers,
David

My current plan:

- switch from "OrderAccess::release_store(&_has_PerfData, 0);"
   to "_has_PerfData = 0;"
- keep "OrderAccess::fence();"
- retest and send out for another round of review

Dan




Cheers,
David
-----

which is modeled after the _owner field transitions from non-zero
-> NULL in ObjectMonitor.cpp



Also, I think the comment above that release_store() could be
clarified.  It is fine as is if you're familiar with this bug report
and discussion, but...  I think it should explicitly say there is
still a very small window for the lack of true synchronization to
cause a failure.  And perhaps that the release_store() (or
store/release())  is not half of an acquire/release pair.

Here's the existing comment:

 286    // Clear the flag before we free the PerfData counters. Thus begins
 287    // the race between this thread and another thread that has just
 288    // queried PerfDataManager::has_PerfData() and gotten back 'true'.
 289    // The hope is that the other thread will finish its PerfData
 290    // manipulation before we free the memory. The two alternatives
 291    // are 1) leak the PerfData memory or 2) do some form of ordered
 292    // access before every PerfData operation.

I think it pretty clearly states that there is still a race here.
And I think that option 2 covers the fact that we're not doing completely
safe ordered access. I'm not sure how to make this comment more
clear, but if you have specific suggestions...

Dan



Tom


