Hi Keith,

I've looked at the new webrev - thanks.

Keith McGuigan said the following on 02/01/11 00:32:
On Jan 30, 2011, at 6:58 PM, David Holmes wrote:

Since Service_lock is a 'special' lock (see mutex.hpp), no locks can be acquired when holding the service-lock. I believe the locking code asserts this and I was careful to not acquire new locks when holding the Service_lock. So deadlock should not be an issue but let me know if you see somewhere I screwed this up.

Okay that makes me feel somewhat better.
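Just to spell out the constraint we're relying on, for anyone following along - this is only a sketch of the pattern, not code from the webrev (MutexLockerEx and the no-safepoint-check flag are the existing mechanisms):

   // Because Service_lock has the "special" rank, the debug-build lock
   // rank checks should assert if any other Mutex/Monitor is acquired
   // while it is held.
   {
     MutexLockerEx ml(Service_lock, Mutex::_no_safepoint_check_flag);
     // ... touch only state guarded by Service_lock here ...
     // MutexLocker other(SomeOther_lock);  // would trip the rank assert
   }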

I'm trying to come up with some tests for this now. The risk here ought to be low, as the behavior of low-memory detection will be completely unchanged unless JVMTI compiled-method events are enabled. When they are enabled, the low-memory detection actions may be delayed by the JVMTI event native code. I don't suspect that this is a problem, but I'm investigating to make sure.

My concern is that we may introduce a previously impossible cycle between these various threads. Event processing can be delayed by low-memory processing and vice-versa, and I didn't check through to see if it was impossible to somehow get a cycle between the two.

I'm also a little concerned that changing the LowMemoryDetector (LMD) thread into the Service thread is somewhat disruptive when doing stack analysis on running apps, core dumps, or crash logs, because people have been used to seeing the LMD thread for many years and now it has changed its identity.

The thread "is_hidden_from_external_view", (which I know doesn't hide it completely), so it's visibility should be pretty limited.

But it is completely visible in the contexts I listed: stack dumps for live processes and core files. Keeping the name for JDK6 is a good compromise, though we may want to give a heads-up to all JDK developers. :)

Looking in jvmtiImpl.cpp

I'd prefer to see the locking in all the JvmtiDeferredEventQueue methods rather than having the call-sites do the locking, as it is less error-prone. But I see that the main processing loop in the service thread makes this impossible due to the dual use of the "service lock".

It is possible to massage the service thread loop to make that work. I did have it coded up that way at one point but decided to change it back and put the onus on the caller to acquire the lock instead. I think this is better since it makes it more obvious in the calling code that the lock is being acquired. Putting the lock acquisition inside the calls instead would sort of "hide" the lock and I think it's safer to make it obvious.

That "hiding" is called encapsulation and is generally considered a "good thing" :) "client-side locking" is generally considered error prone. That said, inside the VM where deadlocks are a big concern, more visibility of locks isn't necessarily a bad thing.

Where we have:

961 void JvmtiDeferredEventQueue::enqueue(const JvmtiDeferredEvent& event) {
...
967   QueueNode* node = new QueueNode(event);

and

1007 void JvmtiDeferredEventQueue::add_pending_event(
1008     const JvmtiDeferredEvent& event) {
1009
1010   QueueNode* node = new QueueNode(event);

Will an allocation failure here terminate the VM? I suspect it will but I don't think it should. Can we not add a link field to the event objects rather than have to create Nodes to hold them?

I can add a NULL check to those spots. There's no need for us to segv if we can't allocate memory for this. We do need a C-heap allocated object since we're passing these events from thread-to-thread. Adding the link in the event object itself wouldn't help - it would just mean we have to allocate the event object in C-heap instead.

As per your other email, a NULL check is pointless, as "new" will never return null but will instead call vm_exit_out_of_memory to abort the VM - which is exactly my point. I don't believe that a failure to allocate in this code should be considered a fatal error for the VM.
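If the wrapper nodes stay, then something along these lines would at least make the failure non-fatal (sketch only - it assumes QueueNode can be allocated through a nothrow path, via <new>, rather than an aborting C-heap operator new):

   QueueNode* node = new (std::nothrow) QueueNode(event);
   if (node == NULL) {
     // Out of C-heap: drop this deferred event (or degrade in some other
     // way) rather than calling vm_exit_out_of_memory and aborting the VM.
     return;
   }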

I don't understand why we would have to allocate the event objects in C-heap, all we want is a link field so that they can form their own queue without the need for wrapper Nodes.

 447 class JvmtiDeferredEvent VALUE_OBJ_CLASS_SPEC {
 448   friend class JvmtiDeferredEventQueue;
 449  private:
       JvmtiDeferredEvent* _next; // for chaining events together
      public:
       JvmtiDeferredEvent* next() { return _next; }
       void set_next(JvmtiDeferredEvent* e) {
        assert(_next == NULL, "overwriting linked event!");
        _next = e;
       }

Further, this would allow you to dequeue all the events at once instead of the service thread pulling them off one at a time.
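e.g. something like this for the dequeue side (sketch only; dequeue_all and the pointer-based head/tail fields are my invention, building on the _next field above):

   // Drain the whole chain in one critical section; the caller then walks
   // it via next() without holding Service_lock.
   JvmtiDeferredEvent* JvmtiDeferredEventQueue::dequeue_all() {
     assert(Service_lock->owned_by_self(), "Service_lock not held");
     JvmtiDeferredEvent* head = _queue_head;
     _queue_head = NULL;
     _queue_tail = NULL;
     return head;
   }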

In dequeue, this:

988   if (_queue_head == NULL) {
989 // Just in case this happens in product; it shouldn't be let's not crash
990     return JvmtiDeferredEvent();
991   }

doesn't look right. Are we returning a stack-allocated instance here? (There's also a typo in the comment.)

Yes, we're returning a stack-allocated instance, which returns to the caller as a temp and is copied into a local variable there. I'll fix the comment.

It isn't "legal" to return a stack-allocated object from a method. I expect we'll get away with it but still ...

Couldn't you just allocate a static instance to use in this case?
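i.e. something like this (sketch only; it assumes the default-constructed event is a usable "no event" value):

   // One shared instance for the "queue unexpectedly empty" case.
   static JvmtiDeferredEvent _no_event;

   JvmtiDeferredEvent JvmtiDeferredEventQueue::dequeue() {
     assert(Service_lock->owned_by_self(), "Service_lock not held");
     if (_queue_head == NULL) {
       // Shouldn't happen, but let's not crash in product builds.
       return _no_event;
     }
     // ... normal dequeue path ...
   }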

In jvmtiExport.cpp, I don't understand why here:

1832 void JvmtiExport::post_dynamic_code_generated(const char *name, const void *code_begin, const void *code_end) {
...
1841
1842   JvmtiDeferredEventQueue::wait_for_empty_queue();
1843

we have to wait for an empty queue, particularly as the queue may become non-empty at any moment. Normally such a wait must be atomic with respect to an action that requires the queue to be empty, but that doesn't seem to be the case here.

<snip>
that's perfectly fine. It's more of a 'flush_queue' call and I suppose I could call it that explicitly (and maybe add a 'flush marker' in the queue so that this won't wait for events that were generated while this was waiting). Actually if I do that I can get rid of the 'notify' in dequeue and instead use a special flush event. Hmmm... that might work better.
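If I follow, the flush-marker version would be something like this (my sketch of what you're describing; the helper names are made up):

   void JvmtiDeferredEventQueue::flush_queue() {
     MutexLockerEx ml(Service_lock, Mutex::_no_safepoint_check_flag);
     // A pseudo-event whose only job is to notify waiters when the
     // service thread reaches it in the queue (hypothetical factory).
     JvmtiDeferredEvent flush = JvmtiDeferredEvent::flush_event();
     enqueue_locked(flush);                  // hypothetical: lock already held
     while (!flush_was_posted(flush)) {      // hypothetical predicate
       Service_lock->wait(Mutex::_no_safepoint_check_flag);
     }
   }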

OK, so in general I like the form of the new architecture, but I'm a bit unclear on the details. As I understand it, non-service threads will only call:

- enqueue() to add an event
- add_pending_event() to add an event in the special case of a safepoint being active
- flush_queue() to wait for earlier events to get posted

The service thread will:

- loop in wait() until there is an event (enqueue will do the notifyAll)
- dequeue() to pull off that event

Now it seems that flush_queue is also called by the service thread, but I can't see where from?

I'm a little concerned about all the locking/unlocking that seems to happen, and about the potential for a lot of spurious notifications. This is mitigated if we only expect a couple of threads to be involved, but there are ways to restructure things so that event processing can be batched, with a single notifyAll once all flush events are complete.
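For example, something along these lines (sketch only; is_empty/dequeue_all/post are placeholders for whatever the real operations end up being) would let the service thread drain the queue in one critical section and issue a single notify once it has caught up:

   while (true) {
     JvmtiDeferredEvent* head = NULL;
     {
       MutexLockerEx ml(Service_lock, Mutex::_no_safepoint_check_flag);
       while (JvmtiDeferredEventQueue::is_empty()) {
         Service_lock->wait(Mutex::_no_safepoint_check_flag);
       }
       head = JvmtiDeferredEventQueue::dequeue_all();  // see sketch above
     }
     // Post the whole batch without holding the lock.
     for (JvmtiDeferredEvent* e = head; e != NULL; e = e->next()) {
       e->post();
     }
     {
       MutexLockerEx ml(Service_lock, Mutex::_no_safepoint_check_flag);
       Service_lock->notify_all();  // single wake-up for any flush waiters
     }
   }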

Cheers,
David
