Thanks JC,
New patch applies cleanly. Compiles and runs (simple test programs) on
aarch64.
- Derek
*From:* JC Beyler [mailto:jcbey...@google.com]
*Sent:* Monday, April 02, 2018 1:17 PM
*To:* White, Derek <derek.wh...@cavium.com>
*Cc:* Erik Österlund <erik.osterl...@oracle.com>;
serviceability-dev@openjdk.java.net; hotspot-compiler-dev
<hotspot-compiler-...@openjdk.java.net>
*Subject:* Re: JDK-8171119: Low-Overhead Heap Profiling
Hi Derek,
I know a few changes went in that provoked a merge
conflict. I worked on it and got the patch up to date. Sadly, my lack
of knowledge made it a full rebase instead of one that keeps all the history.
However, with a newly cloned jdk/hs you should now be able to use:
http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.10/
The change you are referring to was done along with the others, so perhaps
you were unlucky and I left it out of one webrev and fixed it in another?
I don't know, but it has been there; I checked, and it is here:
http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.10/src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp.udiff.html
I double checked that tlab_end_offset no longer appears in any
architecture (as far as I can tell :)).
Thanks for testing and let me know if you run into any other issues!
Jc
On Fri, Mar 30, 2018 at 4:24 PM White, Derek <derek.wh...@cavium.com> wrote:
Hi Jc,
I’ve been having trouble getting your patch to apply correctly. I
may have based it on the wrong version.
In any case, I think there’s a missing update to
macroAssembler_aarch64.cpp, in MacroAssembler::tlab_allocate(),
where “JavaThread::tlab_end_offset()” should become
“JavaThread::tlab_current_end_offset()”.
This should correspond to the other port’s changes in
templateTable_<cpu>.cpp files.
Thanks!
- Derek
*From:* hotspot-compiler-dev
[mailto:hotspot-compiler-dev-boun...@openjdk.java.net] *On Behalf
Of *JC Beyler
*Sent:* Wednesday, March 28, 2018 11:43 AM
*To:* Erik Österlund <erik.osterl...@oracle.com>
*Cc:* serviceability-dev@openjdk.java.net; hotspot-compiler-dev
<hotspot-compiler-...@openjdk.java.net>
*Subject:* Re: JDK-8171119: Low-Overhead Heap Profiling
Hi all,
I've been working on deflaking the tests mostly and the wording in
the JVMTI spec.
Here are the two incremental webrevs:
http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.5_6/
http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.06_07/
Here is the total webrev:
http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.07/
Here are the notes for this change:
- Currently the tests pass 100 times in a row; I am working on
checking whether they pass 1000 times in a row.
- The default sampling rate is set to 512k; this is what we use
internally, and having a default means that to enable sampling
with the default, the user only has to do an enable event/disable
event via JVMTI (instead of enable + set sample rate).
- I deprecated the code that was handling the fast-path TLAB
refill, since FastTLABRefill itself is now deprecated
- Though I saw that Graal is still using it, so I have to see
what exactly needs to be done there
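As a concrete illustration of the default-interval note above, here is a minimal sketch (all names here are hypothetical, this is not the actual JVMTI API): a sampler whose interval defaults to 512 KiB, so an agent only needs to enable or disable the event.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical model of the described behavior: the interval defaults to
// 512 KiB, so enable()/disable() alone is enough to use the sampler.
class HeapSamplerModel {
  bool   _enabled;
  size_t _interval;            // sampling interval in bytes
  size_t _bytes_until_sample;  // countdown to the next sample
public:
  HeapSamplerModel()
      : _enabled(false), _interval(512 * 1024), _bytes_until_sample(512 * 1024) {}

  void enable()  { _enabled = true; }
  void disable() { _enabled = false; }
  void set_interval(size_t bytes) { _interval = bytes; _bytes_until_sample = bytes; }

  // Called on each allocation; returns true when this one should be sampled.
  bool should_sample(size_t alloc_bytes) {
    if (!_enabled) return false;
    if (alloc_bytes >= _bytes_until_sample) {
      _bytes_until_sample = _interval;  // rearm for the next sample
      return true;
    }
    _bytes_until_sample -= alloc_bytes;
    return false;
  }
};
```

With this shape, "enable + set sample rate" collapses to just "enable" for users happy with the default.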
Finally, using the DaCapo benchmark suite, I noted a 1% overhead
when the event system is turned on and the callback to the native
agent is empty. I got a 3% overhead with a 512k sampling rate
with the code I put in the native side of my tests.
Thanks and comments are appreciated,
Jc
On Mon, Mar 19, 2018 at 2:06 PM JC Beyler <jcbey...@google.com> wrote:
Hi all,
The incremental webrev update is here:
http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event4_5/
The full webrev is here:
http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event5/
The major changes here are:
- I've removed the heapMonitoring.cpp code in favor of just
having the sampling events, as per Serguei's request; I still
have to do some overhead measurements, but the tests prove the
concept can work
- Most of the TLAB code is unchanged; the only major
difference is that things now get sent off to event collectors when
used and enabled.
- Added the interpreter collectors to handle interpreter
execution
- Updated the name from SetTlabHeapSampling to
SetHeapSampling to be more generic
- Added a mutex for the thread sampling so that we can
initialize an internal static array safely
- Ported the tests from the old system to this new one
I've also updated the JEP and CSR to reflect these changes:
https://bugs.openjdk.java.net/browse/JDK-8194905
https://bugs.openjdk.java.net/browse/JDK-8171119
In order to make some forward progress, I've removed
the heap sampling code entirely and now rely on the
event sampling system. The tests reflect this by using a
simplified implementation of what an agent could do:
http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event5/raw_files/new/test/hotspot/jtreg/serviceability/jvmti/HeapMonitor/libHeapMonitor.c
(Search for anything mentioning event_storage).
I have not taken the time to port the whole code we had
originally in heapMonitoring to this. I hesitate only because
that code was in C++; I'd have to port it to C, and this is for
tests, so perhaps what I have now is good enough?
As far as testing goes, I've ported all the relevant tests and
then added a few:
- Turning the system on/off
- Testing using various GCs
- Testing using the interpreter
- Testing the sampling rate
- Testing with objects and arrays
- Testing with various threads
Finally, as far as overhead goes, I compared the system turned
off against a clean build and measured 0% overhead, which is what we'd
want. This was using the DaCapo benchmarks. I am now preparing
to run a version with the events on using DaCapo and will report
back here.
Any comments are welcome :)
Jc
On Thu, Mar 8, 2018 at 4:00 PM JC Beyler <jcbey...@google.com> wrote:
Hi all,
I apologize for the delay: I wanted to add an event
system, which took a bit longer than expected, and I also
reworked the code to take into account the deprecation of
FastTLABRefill.
This update has four parts:
A) I moved the implementation from Thread into a
ThreadHeapSampler inside of Thread. Would you prefer it as a
pointer inside of Thread, or does it work for you like this? A
second question: would you rather have an association
outside of Thread altogether that keeps track of which
threads are live, so we would have something like:
ThreadHeapSampler::get_sampling_size(this_thread);
I worry about the overhead of that, but perhaps it is not
too bad?
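A rough sketch of option A, with illustrative names (not the actual HotSpot classes): the sampler state is encapsulated in its own class and embedded by value in Thread, so reading it is a direct member access with no external lookup.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical sketch: per-thread sampling state in its own class.
class ThreadHeapSampler {
  size_t _bytes_until_sample;
public:
  ThreadHeapSampler() : _bytes_until_sample(512 * 1024) {}
  size_t bytes_until_sample() const { return _bytes_until_sample; }
  void set_bytes_until_sample(size_t v) { _bytes_until_sample = v; }
};

// Stand-in for Thread: the sampler is a value object, no pointer indirection.
class ThreadModel {
  ThreadHeapSampler _heap_sampler;
public:
  ThreadHeapSampler& heap_sampler() { return _heap_sampler; }
};
```

By contrast, the external-association variant (ThreadHeapSampler::get_sampling_size(this_thread)) would pay a lookup plus live-thread bookkeeping on every sampled allocation, which is the overhead worry above.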
B) I have also been working on the Allocation event system
that sends out a notification at each sampled allocation. This
will be practical when wanting to do something at the
allocation point. I'm also looking at whether the whole
heapMonitoring code could reside in the agent code rather
than in the JDK. I'm not convinced, but I'm talking to Serguei
about it to see/assess :)
- Also added two tests for the new event subsystem
C) Removed the slow_path fields inside the TLAB code, since
FastTLABRefill is now deprecated
D) Updated the JVMTI documentation and specification for the
methods.
So the incremental webrev is here:
http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.09_10/
and the full webrev is here:
http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.10
I believe I have updated the various JIRA issues that track
this :)
Thanks for your input,
Jc
On Wed, Feb 14, 2018 at 10:34 PM, JC Beyler
<jcbey...@google.com> wrote:
Hi Erik,
I inlined my answers; the last one seems to answer
Robbin's concerns about the same thing (adding things to
Thread).
On Wed, Feb 14, 2018 at 2:51 AM, Erik Österlund
<erik.osterl...@oracle.com> wrote:
Hi JC,
Comments are inlined below.
On 2018-02-13 06:18, JC Beyler wrote:
Hi Erik,
Thanks for your answers, I've now inlined my own
answers/comments.
I've done a new webrev here:
http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.08/
The incremental is here:
http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.07_08/
Note to all:
- I've been integrating changes from
Erik's/Serguei's/David's comments, so this webrev
increment is a bit of an answer to all comments
in one. I apologize for that :)
On Mon, Feb 12, 2018 at 6:05 AM, Erik Österlund
<erik.osterl...@oracle.com> wrote:
Hi JC,
Sorry for the delayed reply.
Inlined answers:
On 2018-02-06 00:04, JC Beyler wrote:
Hi Erik,
(Renaming this to be folded into the
newly renamed thread :))
First off, thanks a lot for reviewing
the webrev! I appreciate it!
I updated the webrev to:
http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.05a/
And the incremental one is here:
http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.04_05a/
It contains:
- The change for since from 9 to 11 for
the jvmti.xml
- The use of the OrderAccess for initialized
- Clearing the oop
I also have inlined my answers to your
comments. The biggest question
concerns the multiple *_end
variables. A bit of the logic there
is due to handling the slow-path refill
vs the fast-path refill and
checking that the rug was not pulled out
from under the slow path. I
believe a previous comment said that
FastTLABRefill was going to
be deprecated.
If this is true, we could simplify this
code a bit and just do: if
FastTLABRefill is enabled, disable this,
and then deprecate that when
FastTLABRefill is deprecated.
This might simplify this webrev, and I
can work on a follow-up that
either removes FastTLABRefill, if Robbin
does not have the time to do
it, or adds the support on the assembly
side to handle this correctly.
What do you think?
I support removing TlabFastRefill, but I
think it is good to not depend on that
happening first.
I'm slowly pushing on the FastTLABRefill
(https://bugs.openjdk.java.net/browse/JDK-8194084),
I agree on keeping both separate for now though
so that we can think of both differently
Now, below, inlined are my answers:
On Fri, Feb 2, 2018 at 8:44 AM, Erik
Österlund
<erik.osterl...@oracle.com> wrote:
Hi JC,
Hope I am reviewing the right
version of your work. Here goes...
src/hotspot/share/gc/shared/collectedHeap.inline.hpp:
159
AllocTracer::send_allocation_outside_tlab(klass, result, size *
HeapWordSize, THREAD);
160
161
THREAD->tlab().handle_sample(THREAD, result, size);
162 return result;
163 }
Should not call tlab()->X without
checking if (UseTLAB) IMO.
Done!
More about this later.
src/hotspot/share/gc/shared/threadLocalAllocBuffer.cpp:
So first of all, there seem to be
quite a few ends. There is an "end",
a "hard
end", a "slow path end", and an
"actual end". Moreover, it seems
like the
"hard end" is actually further away
than the "actual end". So the "hard end"
seems like more of a "really
definitely actual end" or something.
I don't
know about you, but I think it looks
kind of messy. In particular, I don't
feel like the name "actual end"
reflects what it represents,
especially when
there is another end that is behind
the "actual end".
413 HeapWord*
ThreadLocalAllocBuffer::hard_end() {
414 // Did a fast TLAB refill
occur?
415 if (_slow_path_end != _end) {
416 // Fix up the actual end
to be now the end of this TLAB.
417 _slow_path_end = _end;
418 _actual_end = _end;
419 }
420
421 return _actual_end +
alignment_reserve();
422 }
I really do not like making getters
unexpectedly have these kind of side
effects. It is not expected that
when you ask for the "hard end", you
implicitly update the "slow path
end" and "actual end" to new values.
As I said, a lot of this is due to
FastTLABRefill. If I make this
not support FastTLABRefill, this goes
away. The reason the system
needs to update itself at the get is
that only at that get do you know if
things have shifted underneath the TLAB
slow path. I am not sure of
really better names (naming is hard!);
perhaps we could use these
names:
- current_tlab_end // Either the
allocated tlab end or a sampling point
- last_allocation_address // The end of
the tlab allocation
- last_slowpath_allocated_end // In
case a fast refill occurred the
end might have changed, this is to
remember slow vs fast past refills
the hard_end method can be renamed to
something like:
tlab_end_pointer() // The end of
the lab including a bit of
alignment reserved bytes
Those names sound better to me. Could you
please provide a mapping from the old names
to the new names so I understand which one
is which please?
This is my current guess of what you are
proposing:
end -> current_tlab_end
actual_end -> last_allocation_address
slow_path_end -> last_slowpath_allocated_end
hard_end -> tlab_end_pointer
Yes that is correct, that was what I was proposing.
I would prefer this naming:
end -> slow_path_end // the end for taking a
slow path; either due to sampling or refilling
actual_end -> allocation_end // the end for
allocations
slow_path_end -> last_slow_path_end // last
address for slow_path_end (as opposed to
allocation_end)
hard_end -> reserved_end // the end of the
reserved space of the TLAB
About setting things in the getter... that
still seems like a very unpleasant thing to
me. It would be better to inspect the call
hierarchy and explicitly update the ends
where they need updating, and assert in the
getter that they are in sync, rather than
implicitly setting various ends as a
surprising side effect in a getter. It looks
like the call hierarchy is very small. With
my new naming convention, reserved_end()
would presumably return _allocation_end +
alignment_reserve(), and have an assert
checking that _allocation_end ==
_last_slow_path_allocation_end, complaining
that this invariant must hold, and that a
caller to this function, such as
make_parsable(), must first explicitly
synchronize the ends as required, to honor
that invariant.
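For illustration, here is a minimal model (hypothetical names, not the actual HotSpot code) of the style suggested above: the getter asserts that the ends are already in sync, and an explicit synchronization method does the fix-up that the old getter performed as a side effect.

```cpp
#include <cassert>
#include <cstddef>

// Sketch only: a TLAB stand-in whose getter checks the invariant
// instead of silently repairing it.
class TlabModel {
  char* _allocation_end;      // real end of the TLAB allocation
  char* _last_slow_path_end;  // end recorded at the last slow-path call
  static const size_t alignment_reserve = 16;
public:
  explicit TlabModel(char* end)
      : _allocation_end(end), _last_slow_path_end(end) {}

  // Explicit synchronization point; callers (e.g. a make_parsable()
  // equivalent) must invoke this before reading reserved_end().
  void synchronize_ends(char* new_end) {
    _allocation_end = new_end;
    _last_slow_path_end = new_end;
  }

  // Getter has no side effects; it only asserts the invariant.
  char* reserved_end() const {
    assert(_allocation_end == _last_slow_path_end &&
           "ends must be synchronized before reading reserved_end()");
    return _allocation_end + alignment_reserve;
  }
};
```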
I've renamed the variables to how you preferred
it except for the _end one. I did:
current_end
last_allocation_address
tlab_end_ptr
The reason is that the architecture-dependent
code uses the thread.hpp API, and it already has
tlab included in the name, so it becomes
tlab_current_end (which is better than
tlab_current_tlab_end in my opinion).
I also moved the update into a separate method
with a TODO that says to remove it when
FastTLABRefill is deprecated
This looks a lot better now. Thanks.
Note that the following comment now needs updating
accordingly in threadLocalAllocBuffer.hpp:
41 // Heap sampling is performed via
the end/actual_end fields.
42 // actual_end contains the real end
of the tlab allocation,
43 // whereas end can be set to an
arbitrary spot in the tlab to
44 // trip the return and sample the
allocation.
45 // slow_path_end is used to track
if a fast tlab refill occurred
46 // between slowpath calls.
There might be other comments too, I have not looked
in detail.
This was the only spot that still had an actual_end; I
fixed it now. I'll do a sweep to double-check other
comments.
Not sure it's better but before updating
the webrev, I wanted to try
to get input/consensus :)
(Note hard_end was always further off
than end).
src/hotspot/share/prims/jvmti.xml:
10357 <capabilityfield
id="can_sample_heap" since="9">
10358 <description>
10359 Can sample the heap.
10360 If this capability
is enabled then the heap sampling
methods
can be called.
10361 </description>
10362 </capabilityfield>
Looks like this capability should
not be "since 9" if it gets integrated
now.
Updated now to 11, crossing my fingers :)
src/hotspot/share/runtime/heapMonitoring.cpp:
448 if
(is_alive->do_object_b(value)) {
449 // Update the oop to
point to the new object if it is still
alive.
450 f->do_oop(&(trace.obj));
451
452 // Copy the old
trace, if it is still live.
453
_allocated_traces->at_put(curr_pos++, trace);
454
455 // Store the live
trace in a cache, to be served up on
/heapz.
456
_traces_on_last_full_gc->append(trace);
457
458 count++;
459 } else {
460 // If the old trace
is no longer live, add it to the list of
461 // recently collected
garbage.
462
store_garbage_trace(trace);
463 }
In the case where the oop was not
live, I would like it to be explicitly
cleared.
Done, I think, the way you wanted it. Let me
know, because I'm not familiar
with the RootAccess API. I'm unclear whether
I'm doing this right or not, so
reviews of these parts are highly
appreciated. Robbin had talked of
perhaps later pushing this all into an
OopStorage; should I do this now,
do you think? Or can that wait for a second
webrev later down the road?
I think using handles can and should be done
later. You can use the Access API now.
I noticed that you are missing an #include
"oops/access.inline.hpp" in your
heapMonitoring.cpp file.
The missing header is there for me, so I don't
know; I made sure it is present in the latest
webrev. Sorry about that.
+ Did I clear it the way you wanted me
to or were you thinking of
something else?
That is precisely how I wanted it to be
cleared. Thanks.
+ Final question here: it seems like if I
wanted to avoid doing the
f->do_oop directly on trace.obj, I'd
need to do something like:
f->do_oop(&value);
...
trace->store_oop(value);
to update the oop internally. Is that
right, and is that one of the
advantages of going to OopStorage
sooner rather than later?
I think you really want to do the do_oop on
the root directly. Is there a particular
reason why you would not want to do that?
Otherwise, yes - the benefit with using the
handle approach is that you do not need to
call do_oop explicitly in your code.
There is no reason, except that now we have a
load_oop and a get_oop_addr; I was not sure what
you would think of that.
That's fine.
Also I see a lot of
concurrent-looking use of the
following field:
267 volatile bool _initialized;
Please note that the "volatile"
qualifier does not help with reordering
here. Reordering between volatile
and non-volatile fields is
completely free
for both compiler and hardware,
except for Windows with MSVC, where
volatile
semantics is defined to use
acquire/release semantics, and the
hardware is
TSO. But for the general case, I
would expect this field to be stored
with
OrderAccess::release_store and
loaded with OrderAccess::load_acquire.
Otherwise it is not thread safe.
Because everything is behind a mutex, I
wasn't really worried about
this. I have a test that has multiple
threads trying to hit this
corner case and it passes.
However, to be paranoid, I updated it to
using the OrderAccess API
now, thanks! Let me know what you think
there too!
If it is indeed always supposed to be read
and written under a mutex, then I would
strongly prefer to have it accessed as a
normal non-volatile member, and have an
assertion that given lock is held or we are
in a safepoint, as we do in many other
places. Something like this:
assert(HeapMonitorStorage_lock->owned_by_self()
|| (SafepointSynchronize::is_at_safepoint()
&& Thread::current()->is_VM_thread()), "this
should not be accessed concurrently");
It would be confusing to people reading the
code if there are uses of OrderAccess that
are actually always protected under a mutex.
Thank you for the exact example to put in the
code! I put it around each access/assignment of
the _initialized field and found one case where
yes, you can touch it without holding the lock. It
actually is "ok" because you don't act on the
storage until later and only when you really
want to modify the storage (see the
object_alloc_do_sample method which calls the
add_trace method).
But, because of this, I'm going to put the
OrderAccess here; I'll do some performance
numbers later, and if there are issues, I might
add an "unsafe" read and a "safe" one to make it
explicit to the reader. But I don't think it
will come to that.
Okay. This double return in heapMonitoring.cpp looks
wrong:
283 bool initialized() {
284 return
OrderAccess::load_acquire(&_initialized) != 0;
285 return _initialized;
286 }
Since you said object_alloc_do_sample() is the only
place where you do not hold the mutex while reading
initialized(), I had a closer look at that. It looks
like in its current shape, the lack of a mutex may
lead to a memory leak. In particular, it first
checks if (initialized()). Let's assume this is now
true. It then allocates a bunch of stuff, and checks
if the number of frames was over 0. If it was,
it calls StackTraceStorage::storage()->add_trace()
seemingly hoping that after grabbing the lock in
there, initialized() will still return true. But it
could now return false and skip doing anything, in
which case the allocated stuff will never be freed.
I fixed this now by making add_trace return a boolean
and checking for that. It will be in the next webrev.
Thanks, the truth is that in our implementation the
system is always on or off, so this never really occurs
:). In this version though, that is not true and it's
important to handle so thanks again!
So the analysis seems to be that _initialized is
only used outside of the mutex in one instance,
where it is used to perform double-checked locking
that actually causes a memory leak.
I am not proposing how to fix that, just raising the
issue. If you still want to perform this
double-checked locking somehow, then the use of
acquire/release still seems odd. Because the memory
ordering restrictions of it never comes into play in
this particular case. If it ever did, then the use
of destroy_stuff(); release_store(_initialized, 0)
would be broken anyway, as that would imply that
whatever concurrent reader there ever was could,
after reading _initialized with load_acquire(),
*never* read the data that is concurrently destroyed
anyway. I would be biased to think that
RawAccess<MO_RELAXED>::load/store looks like a more
appropriate solution, given that the memory leak
issue is resolved. I do not know how painful it
would be to not perform this double-checked locking.
So I agree with this entirely. I also looked a bit
more, and the difference in code really stems from our
internal version. In this version, however, there are
actually a lot of things going on that I had not gone
entirely through in my head, but this comment made me
ponder it a bit more.
Since every object_alloc_do_sample is protected by a
check of HeapMonitoring::enabled(), there is only a
small chance that the call happens when things have
been disabled. So there is no real need for a first
check of initialized; it is a rare occurrence that a
call reaches object_alloc_do_sample while the
initialized of the storage returns false.
(By the way, even if you did call object_alloc_do_sample
without looking at HeapMonitoring::enabled(), that would
be ok too. You would gather the stacktrace and get
nowhere at the add_trace call, which would return false;
so though not optimal performance-wise, nothing would
break).
Furthermore, add_trace is really the point of no
return, and there we have the mutex lock and then the
initialized check. So, in the end, I did two things: I
removed that first check, and then I removed the
OrderAccess for the storage's initialized flag. I think
I now have a better grasp of why it was done in
our code and why it is not needed here. Thanks for
pointing it out :). This still passes my JTREG
tests, especially the threaded one.
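A small standalone model of the fix discussed above (illustrative names, not the actual heapMonitoring.cpp code): add_trace() re-checks the initialized flag under the lock and reports failure, so a caller that allocated a trace outside the lock can free it instead of leaking it.

```cpp
#include <cassert>
#include <mutex>
#include <vector>

struct Trace { int id; };  // stand-in for a collected stack trace

class StorageModel {
  std::mutex _lock;
  bool _initialized = false;
  std::vector<Trace> _traces;
public:
  void initialize() {
    std::lock_guard<std::mutex> g(_lock);
    _initialized = true;
  }
  void shut_down() {
    std::lock_guard<std::mutex> g(_lock);
    _initialized = false;
    _traces.clear();
  }
  // Returns false when storage was torn down between the caller's
  // unlocked check and this call; the caller then cleans up its own
  // allocation rather than leaking it.
  bool add_trace(const Trace& t) {
    std::lock_guard<std::mutex> g(_lock);
    if (!_initialized) return false;
    _traces.push_back(t);
    return true;
  }
};
```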
As a kind of meta comment, I wonder
if it would make sense to add sampling
for non-TLAB allocations. Seems like
if someone is rapidly allocating a
whole bunch of 1 MB objects that
never fit in a TLAB, I might still be
interested in seeing that in my
traces, and not get surprised that the
allocation rate is very high yet not
showing up in any profiles.
That is handled by handle_sample,
where you wanted me to put a
UseTLAB check, because you hit that case if the
allocation is too big.
I see. It was not obvious to me that
non-TLAB sampling is done in the TLAB class.
That seems like an abstraction crime.
What I wanted in my previous comment was
that we do not call into the TLAB when we
are not using TLABs. If there is sampling
logic in the TLAB that is used for something
else than TLABs, then it seems like that
logic simply does not belong inside of the
TLAB. It should be moved out of the TLAB,
and instead have the TLAB call this common
abstraction that makes sense.
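A tiny sketch of what that common abstraction could look like (hypothetical names; a model, not the HotSpot implementation): the sampling logic sits in its own helper, and both the TLAB path and the outside-TLAB path delegate to it, so nothing TLAB-specific runs when TLABs are off.

```cpp
#include <cassert>
#include <cstddef>

// Common sampling abstraction, independent of TLABs.
class AllocationSampler {
  size_t _bytes_until_sample;
public:
  int samples_taken;
  AllocationSampler() : _bytes_until_sample(512 * 1024), samples_taken(0) {}

  void note_allocation(size_t bytes) {
    if (bytes >= _bytes_until_sample) {
      samples_taken++;
      _bytes_until_sample = 512 * 1024;  // rearm at the interval
    } else {
      _bytes_until_sample -= bytes;
    }
  }
};

// Both allocation paths call the same helper; the TLAB merely forwards.
void allocate_in_tlab(AllocationSampler& s, size_t bytes)      { s.note_allocation(bytes); }
void allocate_outside_tlab(AllocationSampler& s, size_t bytes) { s.note_allocation(bytes); }
```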
So in the incremental version:
http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.07_08/
this is still a "crime". The reason is that the
system has to keep bytes_until_sample on a
per-thread level, and it made "sense" to have it
with the TLAB implementation. Also, I was not
sure how people felt about adding something to
the thread instance instead.
Do you think it fits better at the Thread level?
I can see how difficult it is to make it happen
there and add the logic there. Let me know what
you think.
We have an unfortunate situation where everyone that
has some fields that are thread local tend to dump
them right into Thread, making the size and
complexity of Thread grow as it becomes tightly
coupled with various unrelated subsystems. It would
be desirable to have a separate class for this
instead that encapsulates the sampling logic. That
class could possibly reside in Thread though as a
value object of Thread.
I imagined that would be the case but was not sure. I
will look at the example that Robbin is talking about
(ThreadSMR) and will see how to refactor my code to use
that.
Thanks again for your help,
Jc
Hope I have answered your questions and that
my feedback makes sense to you.
You have and thank you for them, I think we are
getting to a cleaner implementation and things
are getting better and more readable :)
Yes it is getting better.
Thanks,
/Erik
Thanks for your help!
Jc
Thanks,
/Erik
I double-checked by changing the test
http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.05a/raw_files/new/test/hotspot/jtreg/serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorStatObjectCorrectnessTest.java
to use a smaller TLAB (2048) and a bigger object,
and it goes through that path and passes.
Thanks again for your review, and I look
forward to your pointers on
the questions I have now raised!
Jc
Thanks,
/Erik
On 2018-01-26 06:45, JC Beyler wrote:
Thanks Robbin for the reviews :)
The new full webrev is here:
http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.03/
The incremental webrev is here:
http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.02_03/
I inlined my answers:
On Thu, Jan 25, 2018 at 1:15 AM,
Robbin Ehn
<robbin....@oracle.com>
wrote:
Hi JC, great to see another
revision!
####
heapMonitoring.cpp
StackTraceData should not
contain the oop, for 'safety'
reasons.
When StackTraceData is moved
from _allocated_traces:
L452 store_garbage_trace(trace);
it contains a dead oop.
_allocated_traces could
instead be a tuple of oop
and StackTraceData, so that
dead oops are not kept.
Done. I used inheritance to make
the copier work regardless, but the
idea is the same.
You should use the new
Access API for loading the
oop, something like
this:
RootAccess<ON_PHANTOM_OOP_REF |
AS_NO_KEEPALIVE>::load(...)
I don't think you need to
use the Access API for clearing
the oop, but it
would
look nicer. And you
probably shouldn't be using:
Universe::heap()->is_in_reserved(value)
I am unfamiliar with this, but I
think I did it like you wanted me
to (all tests pass, so that's a
start). I'm not sure how to
clear the
oop exactly; is there somewhere
that does that which I can use
to do
the same?
I removed the is_in_reserved;
this came from our internal
version. I
don't know why it was there, but
my tests work without it, so I
removed it
:)
The lock:
L424 MutexLocker
mu(HeapMonitorStorage_lock);
Is not needed as far as I
can see.
weak_oops_do is called in a
safepoint; no TLAB
allocation can happen, and the
JVMTI thread can't access
these data structures. Is
there something more
to
this lock that I'm missing?
Since a thread can call the
JVMTI getLiveTraces (or any of
the other
ones), it can get to the point
of trying to copy the
_allocated_traces. I imagine it
is possible that this happens
during a GC, or that it can be
started and a GC happens afterwards.
Therefore, it seems to me that
you want this protected, no?
####
You have 6 files without any
changes in them (any more):
g1CollectedHeap.cpp
psMarkSweep.cpp
psParallelCompact.cpp
genCollectedHeap.cpp
referenceProcessor.cpp
thread.hpp
Done.
####
I have not looked closely,
but is it possible to hide
heap sampling in
AllocTracer? (with some
minor changes to the
AllocTracer API)
I imagine you are
saying to move the code that
does the
sampling (changing the TLAB
end, doing the call to HeapMonitoring,
etc.) into the AllocTracer code
itself? I think that is right,
and I'll
look at whether that is possible and
prepare a webrev to show what
would be
needed to make that happen.
####
Minor nit: when declaring
pointers there is a little
mix of having the
'*' adjacent to the type
name or the variable name. (Most
HotSpot code puts it by the
type
name.)
E.g.
heapMonitoring.cpp:711
jvmtiStackTrace *trace = ....
heapMonitoring.cpp:733
Method* m = vfst.method();
(not just this file)
Done!
####
HeapMonitorThreadOnOffTest.java:77
I would make g_tmp volatile;
otherwise the assignment in
the loop may
theoretically be skipped.
Also done!
Thanks again!
Jc