Hi Keith,
I've looked at the new webrev - thanks.
Keith McGuigan said the following on 02/01/11 00:32:
On Jan 30, 2011, at 6:58 PM, David Holmes wrote:
Since Service_lock is a 'special' lock (see mutex.hpp), no locks can be
acquired when holding the service-lock. I believe the locking code
asserts this and I was careful to not acquire new locks when holding the
Service_lock. So deadlock should not be an issue but let me know if you
see somewhere I screwed this up.
Okay that makes me feel somewhat better.
I'm trying to come up with some tests for this now. This risk here
ought to be low as the behavior of low-memory detection will be
completely unchanged unless JVMTI compiled-method events are enabled.
When they are enabled, the low-memory detection actions may be delayed
by the JVMTI event native code. I don't suspect that this is a problem
but I'm investigating to to make sure.
My concern is that we may introduce a previously impossible cycle
between these various threads. Event processing can be delayed by
low-memory processing and vice-versa, and I didn't check through to see
if it was impossible to somehow get a cycle between the two.
I'm also a little concerned that changing the LowMemoryDetector (LMD)
thread in to the Service thread is somewhat disruptive when doing
stack analysis on running apps or core dumps or crash logs, because
people have been used to seeing the LMD thread for many years and now
it has changed its identity.
The thread "is_hidden_from_external_view", (which I know doesn't hide it
completely), so it's visibility should be pretty limited.
But it is completely visible in the contexts I listed: stack dumps for
live processes and core files. Good compromise keeping the name for
JDK6. May want to give a heads-up to all JDK developers though. :)
Looking in jvmtiImpl.cpp
I'd prefer to see the locking in all the JvmtiDeferredEventQueue
methods rather than having the call-sites do the locking, as it is
less error-prone. But I see that the main processing loop in the
service thread makes this impossible due to the dual use of the
"service lock".
It is possible to massage the service thread loop to make that work. I
did have it coded up that way at one point but decided to change it back
and put the onus on the caller to acquire the lock instead. I think
this is better since it makes it more obvious in the calling code that
the lock is being acquired. Putting the lock acquisition inside the
calls instead would sort of "hide" the lock and I think it's safer to
make it obvious.
That "hiding" is called encapsulation and is generally considered a
"good thing" :) "client-side locking" is generally considered error
prone. That said, inside the VM where deadlocks are a big concern, more
visibility of locks isn't necessarily a bad thing.
Where we have:
961 void JvmtiDeferredEventQueue::enqueue(const JvmtiDeferredEvent&
event) {
...
967 QueueNode* node = new QueueNode(event);
and
1007 void JvmtiDeferredEventQueue::add_pending_event(
1008 const JvmtiDeferredEvent& event) {
1009
1010 QueueNode* node = new QueueNode(event);
Will an allocation failure here terminate the VM? I suspect it will
but I don't think it should. Can we not add a link field to the event
objects rather than have to create Nodes to hold them?
I can add a NULL check to those spots. There's no need for us to segv
if we can't allocate memory for this. We do need a C-heap allocated
object since we're passing these events from thread-to-thread. Adding
the link in the event object itself wouldn't help - it would just mean
we have to allocate the event object in C-heap instead.
As per your other email NULL check is pointless as "new" will never
return null but will instead call vm_exit_out_of_memory to abort the VM
- which is exactly my point. I don't believe that a failure to allocate
in this code should be considered a fatal error for the VM.
I don't understand why we would have to allocate the event objects in
C-heap, all we want is a link field so that they can form their own
queue without the need for wrapper Nodes.
447 class JvmtiDeferredEvent VALUE_OBJ_CLASS_SPEC {
448 friend class JvmtiDeferredEventQueue;
449 private:
JvmtiDeferredEvent _next; // for chaining events together
public:
JvmtiDeferredEvent next() { return _next; }
void set_next(JvmtiDeferredEvent e) {
assert(_next == NULL, "overwriting linked event!");
_next = e;
}
Further this would allow you to dequeue all the events at once instead
of the service thread going through pulling them off one at a time.
In dequeue, this:
988 if (_queue_head == NULL) {
989 // Just in case this happens in product; it shouldn't be let's
not crash
990 return JvmtiDeferredEvent();
991 }
doesn't look right. Are we returning a stack allocated instance here?
(There's also a typo in the comment.)
Yes, we're returning a stack-allocated instance, which returns to the
caller as a temp and is copied into a local variable there. I'll fix
the comment.
It isn't "legal" to return a stack-allocated object from a method. I
expect we'll get away with it but still ...
Couldn't you just allocate a static instance to use in this case?
In jvmtiExport.cpp, I don't understand why here:
832 void JvmtiExport::post_dynamic_code_generated(const char *name,
const void *code_begin, const void *code_end) {
...
1841
1842 JvmtiDeferredEventQueue::wait_for_empty_queue();
1843
we have to wait for an empty queue, particularly as the queue may
become non-empty at any moment. Normally such a wait must be atomic
with respect to an action that requires the queue to be empty, but
that doesn't seem to be the case here.
<snip>
that's perfectly fine. It's more of a 'flush_queue' call and I suppose
I could call it that explicitly (and maybe add a 'flush marker' in the
queue so that this won't wait for events that were generated while this
was waiting). Actually if I do that I can get rid of the 'notify' in
dequeue and instead use a special flush event. Hmmm... that might work
better.
Ok so in general I like the form of the new architecture but I'm a bit
unclear on the details. In general non-service threads will only call:
- enqueue() to add an event
- add_pending_event() to add an event in the special case of a safepoint
being active
- flush_queue() to wait for earlier events to get posted
The service thread will:
- loop in wait() until there is an event (enqueue will do the notifyAll)
- dequeue() to pull off that event
Now it seems that flush_queue is also called by the service thread, but
I can't see where from?
I'm a little concerned about all the locking/unlocking that seems to
happen; and the fact that there are potentially a lot of spurious
notifications. Though this is mitigated if we only expect a couple of
threads to be involved, there are ways to restructure things so that
event processing can be batched, including a single notifyAll once all
flush events are complete.
Cheers,
David