Hiroshi,

I'm leaning toward making the two fields volatile and adding the
appropriate OrderAccess::XXX calls. I vaguely remember an existing
bug for events posting late. I'm going to see if I can find it.

Dan


Hiroshi Yamauchi wrote:
Hi Dan,

On Wed, Feb 11, 2009 at 12:43 PM, Daniel D. Daugherty <[email protected] <mailto:[email protected]>> wrote:

    I'll chime in on parts of this thread below.




    Tim Bell wrote:

        I don't know the precise answer to this question, so I am
        forwarding it to the Serviceability list to get a wider audience.

        -------- Original Message --------
        Subject:     Data visibility between threads in Hotspot
        Date:     Tue, 10 Feb 2009 11:14:24 -0800

        Hi Tim,

        I have a Hotspot question. Chuck Rasbold pointed me to you.
        Feel free to
        forward this message to other experts at Sun, if needed.

        If one Hotspot engineer wants to guarantee the immediate
        visibility of a
        write to a static variable to reads in other threads (assuming
        the reads
        and the writes are properly synchronized via a mutex), what
        does s/he do?

        Does the use of MutexLocker guarantee the visibility? It
        probably does not.


    I believe that MutexLocker does guarantee the visibility:

    src/share/vm/runtime/mutexLocker.hpp:

     133  // See orderAccess.hpp.  We assume throughout the VM that
    MutexLocker's
     134  // and friends constructors do a fence, a lock and an
    acquire *in that
     135  // order*.  And that their destructors do a release and
    unlock, in *that*
     136  // order.  If their implementations change such that these
    assumptions
     137  // are violated, a whole lot of code will break.


OK. That's handy.


    See src/share/vm/runtime/orderAccess.hpp for the gory details.




        There are several volatile variables in Hotpot, but it may not
        really
        work because C++ compilers do not guarantee the visibility or
        limit the
        instruction reordering.

    See the "C++ Volatility" section in
    src/share/vm/runtime/orderAccess.hpp.


It's an interesting header file to read.



        For example, there is the static variable JvmtiEnvBase:_phase
        which
        indicates the JVMTI phase in which the JVM is currently in.
        But AFAICT,
        there is no synchronization used by the callers of get_phase() and
        set_phase() and apparently they can be called by multiple
        threads. Also,
        the JvmtiEventEnabled::_enabled_bits which is a jlong variable
        that
        indicates which events are enabled for a single jvmtiEnv. The
        writes and
        reads to it are not synchronized, either. These are race
        conditions,
        which implies that some JVMTI event can go off in the dead
        phase when no
        events are supposed to go off. I'm actually looking into a bug
        in a
        JVMTI agent that I suspect is due to this.


    It certainly looks like JvmtiEnvBase::_phase should minimally be a
    volatile; it may also require some sort of synchronization to force
    the updated values to be visible...

    JvmtiEventEnabled::_enabled_bits also looks problematic. I'm going
    to have to mull on both of these and talk to some other folks.


The bug that I ran across was a memory corruption crash because a CompiledMethodLoad event happened to be sent (due to this race condition) after Agent_OnUnload call where I had already deallocated the agent's data objects. This happens very rarely (once in every more than 100k runs of a more-or-less javac invocation) but often enough to fix.

I have a patch attached, which I used to suppress the bug by avoiding the events to be sent during the dead phase by adding synchronization. Basically I make the _phase variable volatile and I use a mutex to guard event callbacks in the all the post_xxx functions and the phase transition to the dead phase in post_vm_death(). Probably we don't need the volatile as long as the mutex guarantees the visibility. The other phase transitions (onload->primodial->start->live) may not need the same degree of synchronization because events that can be sent in a earlier phase can be sent in the next phase as well.

The _enabled_bits and the callback function table will probably need a similar synchronization as well. But the attached patch does not implement it and only ensures that no events are sent during the dead phase. I'd like to get the fix into openjdk in one form or another.

Thanks,
Hiroshi



    Dan



        Thanks,
        Hiroshi


Reply via email to