Hi David,
On 3/18/20 8:10 PM, David Holmes wrote:
Hi Patricio,
On 19/03/2020 6:44 am, Patricio Chilano wrote:
Hi David,
On 3/18/20 4:27 AM, David Holmes wrote:
Hi Patricio,
On 18/03/2020 6:14 am, Patricio Chilano wrote:
Hi all,
Please review the following patch:
Bug: https://bugs.openjdk.java.net/browse/JDK-8240902
Webrev: http://cr.openjdk.java.net/~pchilanomate/8240902/v1/webrev/
Calling closeConnection() on an already created/opened connection
includes calls to CloseHandle() on objects that can still be used
by other threads. This can lead to either undefined behavior or, as
detailed in the bug comments, changes of state of unrelated objects.
This was a really great find!
Thanks! : )
This issue was found while debugging some jshell test failures seen
after pushing 8230594. Less important, but there are also calls to
closeStream() from createStream()/openStream() that, when failing to
create/open a stream, return after executing
"CHECK_ERROR(enterMutex(stream, NULL));" without closing the intended
resources. Calling closeConnection() could then assert if the reason
for the previous failure was that the stream's mutex failed to be
created/opened. This patch aims to address these issues too.
Patch looks good in general. The internal reference count guards
deletion of the internal resources, and is itself safe because we
never actually delete the connection. Thanks for adding the comment
about this aspect.
A few items:
Please update copyright year before pushing.
Done.
Please align ENTER_CONNECTION/LEAVE_CONNECTION macros the same way
as STREAM_INVARIANT.
Done.
170 unsigned int refcount;
171 jint state;
I'm unclear about the use of stream->state and connection->state as
guards - unless accessed under a mutex these would seem to at least
need acquire/release semantics.
Additionally, the reads of refcount would also seem to need some form
of memory synchronization - though the Windows docs for the
Interlocked* API do not show how to simply read such a variable!
Though I note that the RtlFirstEntrySList method for the
"Interlocked Singly Linked Lists" API does state "Access to the list
is synchronized on a multiprocessor system." which suggests a read
of such a variable does require some form of memory synchronization!
In the case of the stream struct, the state field is protected by the
mutex field. It is set to STATE_CLOSED while holding the mutex, and
threads that read it must acquire the mutex first through
sysIPMutexEnter(). For the cases where sysIPMutexEnter() didn't
acquire the mutex, we will return something other than SYS_OK and
the call will exit anyway. All of this behaves as before; I didn't
change it.
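Roughly, the existing stream-side pattern looks like this (an
illustrative sketch using the names quoted above; leaveMutex is the
assumed counterpart of enterMutex, not necessarily the exact code):

    CHECK_ERROR(enterMutex(stream, NULL));  /* returns !SYS_OK if the mutex is unusable */
    if (stream->state == STATE_CLOSED) {    /* state is only read while holding the mutex */
        (void)leaveMutex(stream);
        return SYS_ERR;
    }
    /* ... use the stream, then leaveMutex(stream) ... */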
Thanks for clarifying.
The refcount and state that I added to the SharedMemoryConnection
struct work together. For a thread closing the connection, setting
the connection state to STATE_CLOSED has to happen before reading the
refcount (more on the atomicity of that read later). That's why I
added the MemoryBarrier() call, which I now see is better placed
right after setting the connection state to closed. For the threads
accessing the connection, incrementing the refcount has to happen
before reading the connection state. That's already provided by
InterlockedIncrement(), which uses a full memory barrier. In this
way, if the thread closing the connection reads a refcount of 0, then
we know it's safe to release the resources, since other threads
accessing the connection will see that the state is closed after
incrementing the refcount. If the read of refcount is not 0, then a
thread may or may not be accessing the connection (it could have read
a connection state of STATE_CLOSED after incrementing the refcount);
we don't know, so we can't release anything. Similarly, if the thread
accessing the connection reads that the state is not closed, then we
know it's safe to access the stream, since anybody closing the
connection will still have to read the refcount, which will be at
least 1.
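In code form, the protocol described above is roughly (a minimal
sketch with illustrative field names; the actual macros are
ENTER_CONNECTION/LEAVE_CONNECTION in the webrev):

    /* Accessor threads (what ENTER_CONNECTION/LEAVE_CONNECTION roughly do): */
    InterlockedIncrement(&connection->refcount);    /* full memory barrier */
    if (connection->state == STATE_CLOSED) {
        InterlockedDecrement(&connection->refcount);
        return SYS_ERR;                             /* connection already closed */
    }
    /* ... use the connection ... */
    InterlockedDecrement(&connection->refcount);

    /* Closing thread: */
    connection->state = STATE_CLOSED;
    MemoryBarrier();               /* order the state store before the refcount read */
    if (connection->refcount == 0) {
        /* no accessor can still be inside: safe to CloseHandle() the resources */
    }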
As for the atomicity of the read of refcount, from
https://docs.microsoft.com/en-us/windows/win32/sync/interlocked-variable-access,
it states that "simple reads and writes to properly-aligned 32-bit
variables are atomic operations". Maybe I should declare refcount
explicitly as DWORD32?
It isn't the atomicity of the naked read that's in question but the
visibility. Any latency in the visibility of the store done by the
Interlocked*() function should be handled by the retry loop, but what
is to stop the C++ compiler from hoisting the read of refcount out of
the loop? It isn't even volatile (which has a stronger meaning in VS
than in regular C++).
I see what you mean now; I was thinking about atomicity and order of
operations but didn't consider the visibility of that read. Yes, if the
compiler decides to be smart and hoist the read out of the loop, we
might never notice that it is safe to release those resources, and we
would leak them for no reason. I see from the Windows docs
(https://docs.microsoft.com/en-us/cpp/c-language/type-qualifiers)
that declaring it volatile, as you pointed out, should be enough to
prevent that.
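That is, something like the following declaration (assuming the field
stays a plain struct member as in v1):

    volatile DWORD32 refcount;  /* 'volatile' forces a fresh read on every loop iteration */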
Instead of having a refcount, we could have done something similar to
the stream struct and protected access to the connection through a
mutex. To avoid serializing all threads we could have used SRW locks,
where only the one closing the connection would do
AcquireSRWLockExclusive(). It would change the state of the
connection to STATE_CLOSED, close all handles, and then release the
lock. ENTER_CONNECTION() and LEAVE_CONNECTION() would acquire and
release the lock in shared mode, as sketched below. But other than
maybe being easier to read, I don't think the change would be any
smaller.
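Here is roughly what that alternative would look like (a sketch only;
it is not what the patch does):

    SRWLOCK lock;  /* InitializeSRWLock(&connection->lock) when the connection is created */

    /* ENTER_CONNECTION / LEAVE_CONNECTION equivalents: */
    AcquireSRWLockShared(&connection->lock);
    if (connection->state == STATE_CLOSED) {
        ReleaseSRWLockShared(&connection->lock);
        return SYS_ERR;
    }
    /* ... use the connection ... */
    ReleaseSRWLockShared(&connection->lock);

    /* closeConnection: */
    AcquireSRWLockExclusive(&connection->lock);
    connection->state = STATE_CLOSED;
    /* CloseHandle() the events and mutex here */
    ReleaseSRWLockExclusive(&connection->lock);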
413 while (attempts>0) {
spaces around >
Done.
If the loop at 413 never encounters a zero refcount then it doesn't
close the events or the mutex but still returns SYS_OK. That seems
wrong, but I'm not sure what the right behaviour is here.
I can change the return value to be SYS_ERR, but I don't think there
is much we can do about it unless we want to wait forever until we
can release those resources.
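Something like this (the attempt count and delay are illustrative;
see the webrev for the actual values):

    int attempts = 10;                 /* illustrative bound */
    while (attempts > 0) {
        if (connection->refcount == 0) {
            /* safe: CloseHandle() the events and mutex */
            return SYS_OK;
        }
        Sleep(20);                     /* give accessors a chance to leave and decrement */
        attempts--;
    }
    return SYS_ERR;                    /* timed out: leak the handles rather than race */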
SYS_ERR would look better, but I see now that the return value is
completely ignored anyway. So we're just going to leak resources if
the loop "times out". I guess this is the best we can do.
Here is v2 with the corrections:
Full: http://cr.openjdk.java.net/~pchilanomate/8240902/v2/webrev/
Inc: http://cr.openjdk.java.net/~pchilanomate/8240902/v2/inc/webrev/
(not sure why the indent fixes are not highlighted as changes, but the
Frames view does show they changed)
I'll give it a run on mach5 adding tier5 as Serguei suggested.
Thanks,
Patricio
Thanks,
David
And please wait for serviceability folk to review this.
Sounds good.
Thanks for looking at this David! I will move the MemoryBarrier() and
change the refcount to be DWORD32 if you are okay with that.
Thanks,
Patricio
Thanks,
David
-----
Tested in mach5 with the current baseline: tiers 1-3 and several
runs of open/test/langtools/:tier1, which includes the jshell tests
where this connector is used. I also applied patch
http://cr.openjdk.java.net/~pchilanomate/8240902/triggerbug/webrev,
mentioned in the comments of the bug, on top of the baseline and
ran the langtools tests with and without this fix. Without the fix,
around 30 repetitions already show failures in tests
jdk/jshell/FailOverExecutionControlTest.java and
jdk/jshell/FailOverExecutionControlHangingLaunchTest.java. With the
fix, I ran several hundred repetitions and saw no failures. Let me
know if there is any additional testing I should do.
As a side note, I see there are a couple of open issues related
to jshell failures (8209848) that could be related to this bug
and therefore might be fixed by this patch.
Thanks,
Patricio