Hi Derek, I know there were a few things that went in that provoked a merge conflict. I worked on it and got it up to date. Sadly, my lack of knowledge meant doing a full rebase instead of keeping all the history. However, with a newly cloned jdk/hs you should now be able to use: http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.10/
The change you are referring to was done with the others, so perhaps you were unlucky and I forgot it in one webrev and fixed it in another? I don't know, but it's been there and I checked; it is here: http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.10/src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp.udiff.html I double-checked that tlab_end_offset no longer appears in any architecture (as far as I can tell :)). Thanks for testing and let me know if you run into any other issues! Jc On Fri, Mar 30, 2018 at 4:24 PM White, Derek <derek.wh...@cavium.com> wrote: > Hi Jc, > > > > I’ve been having trouble getting your patch to apply correctly. I may have > based it on the wrong version. > > > > In any case, I think there’s a missing update to > macroAssembler_aarch64.cpp, in MacroAssembler::tlab_allocate(), where > “JavaThread::tlab_end_offset()” should become > “JavaThread::tlab_current_end_offset()”. > > > > This should correspond to the other ports’ changes in > templateTable_<cpu>.cpp files. > > > > Thanks! > - Derek > > > > *From:* hotspot-compiler-dev [mailto: > hotspot-compiler-dev-boun...@openjdk.java.net] *On Behalf Of *JC Beyler > *Sent:* Wednesday, March 28, 2018 11:43 AM > *To:* Erik Österlund <erik.osterl...@oracle.com> > *Cc:* serviceability-dev@openjdk.java.net; hotspot-compiler-dev < > hotspot-compiler-...@openjdk.java.net> > *Subject:* Re: JDK-8171119: Low-Overhead Heap Profiling > > > > Hi all, > > > > I've mostly been working on deflaking the tests and on the wording in the > JVMTI spec. > > > > Here are the two incremental webrevs: > > http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.5_6/ > > http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.06_07/ > > > > Here is the total webrev: > > http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.07/ > > > > Here are the notes of this change: > > - Currently the tests pass 100 times in a row; I am working on checking > whether they pass 1000 times in a row.
> > - The default sampling rate is set to 512k; this is what we use > internally, and having a default means that to enable the sampling with the > default, the user only has to do an enable event/disable event via JVMTI > (instead of enable + set sample rate). > > - I deprecated the code that was handling the fast path tlab refill, > since FastTLABRefill itself is now deprecated > > - Though I saw that Graal is still using it, so I have to see what > needs to be done there exactly > > > > Finally, using the Dacapo benchmark suite, I noted a 1% overhead when > the event system is turned on and the callback to the native agent is just > empty. I got a 3% overhead with a 512k sampling rate with the code I put in > the native side of my tests. > > > > Thanks and comments are appreciated, > > Jc > > > > > > On Mon, Mar 19, 2018 at 2:06 PM JC Beyler <jcbey...@google.com> wrote: > > Hi all, > > > > The incremental webrev update is here: > > http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event4_5/ > > > > The full webrev is here: > > http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event5/ > > > > The major changes here are: > > - I've removed the heapMonitoring.cpp code in favor of just having the > sampling events, as per Serguei's request; I still have to do some overhead > measurements, but the tests prove the concept can work > > - Most of the tlab code is unchanged; the only major change is that > things now get sent off to event collectors when used and enabled.
> > - Added the interpreter collectors to handle interpreter execution > > - Updated the name from SetTlabHeapSampling to SetHeapSampling to be > more generic > > - Added a mutex for the thread sampling so that we can initialize an > internal static array safely > > - Ported the tests from the old system to this new one > > > > I've also updated the JEP and CSR to reflect these changes: > > https://bugs.openjdk.java.net/browse/JDK-8194905 > > https://bugs.openjdk.java.net/browse/JDK-8171119 > > > > To make some forward progress, I've removed the heap > sampling code and now rely entirely on the event sampling system. > The tests reflect this by using a simplified implementation of what an > agent could do: > > > http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event5/raw_files/new/test/hotspot/jtreg/serviceability/jvmti/HeapMonitor/libHeapMonitor.c > > (Search for anything mentioning event_storage). > > > > I have not taken the time to port the whole code we had originally in > heapMonitoring to this. I hesitate only because that code was in C++; I'd > have to port it to C, and since this is for tests, perhaps what I have now is > good enough? > > > > As far as testing goes, I've ported all the relevant tests and then added > a few: > > - Turning the system on/off > > - Testing using various GCs > > - Testing using the interpreter > > - Testing the sampling rate > > - Testing with objects and arrays > > - Testing with various threads > > > > Finally, as far as overhead goes, I compared the system turned off against a clean > build and measured 0% overhead, which is what we'd want. This was using the > Dacapo benchmarks. I am now preparing to run a version with the events on > using Dacapo and will report back here.
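For readers following along, the per-thread byte-counting scheme behind a sampling interval (such as the 512k default mentioned earlier in the thread) can be sketched as follows; ByteSampler and its members are illustrative names only, not the actual HotSpot code:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical per-thread sampling state: sample roughly once every
// `interval` bytes of allocation by spending down a byte budget.
struct ByteSampler {
  size_t interval;            // e.g. 512 * 1024, the default discussed above
  size_t bytes_until_sample;  // budget left before the next sample fires

  explicit ByteSampler(size_t ivl) : interval(ivl), bytes_until_sample(ivl) {}

  // Returns true when this allocation should be sampled.
  bool record_allocation(size_t size) {
    if (size < bytes_until_sample) {
      bytes_until_sample -= size;  // spend part of the budget
      return false;
    }
    bytes_until_sample = interval; // budget exhausted: sample and reset
    return true;
  }
};
```

A real implementation would likely randomize the next sample point to avoid bias, but the budget-decrement shape is the same.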
> > > > Any comments are welcome :) > > Jc > > > > > > > > On Thu, Mar 8, 2018 at 4:00 PM JC Beyler <jcbey...@google.com> wrote: > > Hi all, > > > > I apologize for the delay; I wanted to add an event system, which > took a bit longer than expected, and I also reworked the code to take into > account the deprecation of FastTLABRefill. > > > > This update has four parts: > > > > A) I moved the implementation from Thread to ThreadHeapSampler inside of > Thread. Would you prefer it as a pointer inside of Thread, or does this > work for you? A second question would be: would you rather have an > association outside of Thread altogether that tries to remember when > threads are live, and then we would have something like: > > ThreadHeapSampler::get_sampling_size(this_thread); > > > > I worry about the overhead of this, but perhaps it is not too bad? > > > > B) I have also been working on the Allocation event system that sends out > a notification at each sampled event. This will be practical when wanting > to do something at the allocation point. I'm also looking at whether the whole > heapMonitoring code could reside in the agent code rather than in the JDK. > I'm not convinced, but I'm talking to Serguei about it to see/assess :) > > - Also added two tests for the new event subsystem > > > > C) Removed the slow_path fields inside the TLAB code since > FastTLABRefill is now deprecated > > > > D) Updated the JVMTI documentation and specification for the methods.
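To make part A concrete, here is a minimal standalone sketch of the "value object inside Thread" layout under discussion; ThreadModel, the body of ThreadHeapSampler, and the default interval are all hypothetical and only illustrate the shape, not the actual HotSpot classes:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical sketch: the sampling state lives in its own class, and the
// owning thread embeds it by value, so Thread itself stays uncluttered.
class ThreadHeapSampler {
  size_t _bytes_until_sample;
 public:
  ThreadHeapSampler() : _bytes_until_sample(512 * 1024) {}
  size_t bytes_until_sample() const { return _bytes_until_sample; }
  void set_bytes_until_sample(size_t v) { _bytes_until_sample = v; }
};

// Stand-in for HotSpot's Thread; only shows where the sampler would sit.
class ThreadModel {
  ThreadHeapSampler _heap_sampler;  // embedded value object, no extra allocation
 public:
  ThreadHeapSampler& heap_sampler() { return _heap_sampler; }
};
```

Compared with a static association like the ThreadHeapSampler::get_sampling_size(this_thread) idea floated above, the embedded member costs one field per thread but avoids any lookup overhead on the allocation path.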
> > > > So the incremental webrev is here: > > http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.09_10/ > > > > and the full webrev is here: > > http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.10 > > > > I believe I have updated the various JIRA issues that track this :) > > > > Thanks for your input, > > Jc > > > > > > On Wed, Feb 14, 2018 at 10:34 PM, JC Beyler <jcbey...@google.com> wrote: > > Hi Erik, > > > > I inlined my answers; the last one seems to also answer Robbin's concerns > about the same thing (adding things to Thread). > > > > On Wed, Feb 14, 2018 at 2:51 AM, Erik Österlund <erik.osterl...@oracle.com> > wrote: > > Hi JC, > > Comments are inlined below. > > > > On 2018-02-13 06:18, JC Beyler wrote: > > Hi Erik, > > > > Thanks for your answers, I've now inlined my own answers/comments. > > > > I've done a new webrev here: > > http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.08/ > > > > The incremental is here: > > http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.07_08/ > > > > Note to all: > > - I've been integrating changes from Erik/Serguei/David's comments, so this > incremental webrev is a bit of an answer to all comments in one. I apologize > for that :) > > > > > > On Mon, Feb 12, 2018 at 6:05 AM, Erik Österlund <erik.osterl...@oracle.com> > wrote: > > Hi JC, > > Sorry for the delayed reply. > > Inlined answers: > > > > On 2018-02-06 00:04, JC Beyler wrote: > > Hi Erik, > > (Renaming this to be folded into the newly renamed thread :)) > > First off, thanks a lot for reviewing the webrev! I appreciate it! > > I updated the webrev to: > http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.05a/ > > And the incremental one is here: > http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.04_05a/ > > It contains: > - The change of "since" from 9 to 11 in jvmti.xml > - The use of OrderAccess for initialized > - Clearing the oop > > I have also inlined my answers to your comments.
The biggest question > will come from the multiple *_end variables. A bit of the logic there > is due to handling the slow path refill vs fast path refill and > checking that the rug was not pulled out from underneath the slow path. I > believe that a previous comment was that TlabFastRefill was going to > be deprecated. > > If this is true, we could revert this code a bit and just do: if > TlabFastRefill is enabled, disable this. And then deprecate that when > TlabFastRefill is deprecated. > > This might simplify this webrev, and I can work on a follow-up that > either removes TlabFastRefill (if Robbin does not have the time to do > it) or adds the support to the assembly side to handle this correctly. > What do you think? > > > > I support removing TlabFastRefill, but I think it is good to not depend on > that happening first. > > > > > I'm slowly pushing on the FastTLABRefill > (https://bugs.openjdk.java.net/browse/JDK-8194084), > I agree on keeping both separate for now though so that we can think of > both differently > > > > > > Now, below, inlined are my answers: > > On Fri, Feb 2, 2018 at 8:44 AM, Erik Österlund > <erik.osterl...@oracle.com> wrote: > > Hi JC, > > Hope I am reviewing the right version of your work. Here goes... > > src/hotspot/share/gc/shared/collectedHeap.inline.hpp: > > 159 AllocTracer::send_allocation_outside_tlab(klass, result, size * > HeapWordSize, THREAD); > 160 > 161 THREAD->tlab().handle_sample(THREAD, result, size); > 162 return result; > 163 } > > Should not call tlab()->X without checking if (UseTLAB) IMO. > > Done! > > > More about this later. > > > > > > src/hotspot/share/gc/shared/threadLocalAllocBuffer.cpp: > > So first of all, there seem to be quite a few ends. There is an "end", a > "hard > end", a "slow path end", and an "actual end". Moreover, it seems like the > "hard end" is actually further away than the "actual end". So the "hard > end" > seems like more of a "really definitely actual end" or something.
I don't > know about you, but I think it looks kind of messy. In particular, I don't > feel like the name "actual end" reflects what it represents, especially > when > there is another end that is behind the "actual end". > > 413 HeapWord* ThreadLocalAllocBuffer::hard_end() { > 414 // Did a fast TLAB refill occur? > 415 if (_slow_path_end != _end) { > 416 // Fix up the actual end to be now the end of this TLAB. > 417 _slow_path_end = _end; > 418 _actual_end = _end; > 419 } > 420 > 421 return _actual_end + alignment_reserve(); > 422 } > > I really do not like making getters unexpectedly have these kinds of side > effects. It is not expected that when you ask for the "hard end", you > implicitly update the "slow path end" and "actual end" to new values. > > As I said, a lot of this is due to the FastTlabRefill. If I make this > not support FastTlabRefill, this goes away. The reason the system > needs to update itself at the get is that you only know at that get if > things have shifted underneath the tlab slow path. I am not sure of > really better names (naming is hard!), but perhaps we could do these > names: > > - current_tlab_end // Either the allocated tlab end or a sampling > point > - last_allocation_address // The end of the tlab allocation > - last_slowpath_allocated_end // In case a fast refill occurred the > end might have changed, this is to remember slow vs fast path refills > > The hard_end method can be renamed to something like: > tlab_end_pointer() // The end of the tlab including a bit of > alignment reserved bytes > > > > Those names sound better to me. Could you please provide a mapping from > the old names to the new names so I understand which one is which? > > This is my current guess of what you are proposing: > > end -> current_tlab_end > actual_end -> last_allocation_address > slow_path_end -> last_slowpath_allocated_end > hard_end -> tlab_end_pointer > > > > Yes, that is correct; that is what I was proposing.
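To illustrate the complaint about side-effecting getters using the names just mapped out, here is a standalone toy class where the fix-up is an explicit method and the getter merely checks the invariant; this is a sketch of the idea, not HotSpot's ThreadLocalAllocBuffer, and the 16-byte alignment reserve is an arbitrary placeholder:

```cpp
#include <cassert>
#include <cstddef>

// Toy model of the TLAB-end bookkeeping under the proposed renaming.
class TinyTlab {
  char* _current_tlab_end;            // allocated end, or a sampling point
  char* _last_allocation_address;     // real end of the TLAB allocation
  char* _last_slowpath_allocated_end; // detects a fast refill in between
  static size_t alignment_reserve() { return 16; }  // placeholder value
 public:
  explicit TinyTlab(char* end)
      : _current_tlab_end(end),
        _last_allocation_address(end),
        _last_slowpath_allocated_end(end) {}

  // Explicit synchronization step instead of a side effect in a getter.
  void sync_ends() {
    _last_slowpath_allocated_end = _current_tlab_end;
    _last_allocation_address = _current_tlab_end;
  }

  // Side-effect-free getter: asserts the invariant instead of fixing it up.
  char* tlab_end_pointer() const {
    assert(_last_slowpath_allocated_end == _current_tlab_end &&
           "caller must sync the ends first");
    return _last_allocation_address + alignment_reserve();
  }

  void set_current_end(char* p) { _current_tlab_end = p; }
};
```

Callers that can observe a refill would invoke sync_ends() first and only then read tlab_end_pointer(), which keeps the getter pure.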
> > > > I would prefer this naming: > > end -> slow_path_end // the end for taking a slow path; either due to > sampling or refilling > actual_end -> allocation_end // the end for allocations > slow_path_end -> last_slow_path_end // last address for slow_path_end (as > opposed to allocation_end) > hard_end -> reserved_end // the end of the reserved space of the TLAB > > About setting things in the getter... that still seems like a very > unpleasant thing to me. It would be better to inspect the call hierarchy > and explicitly update the ends where they need updating, and assert in the > getter that they are in sync, rather than implicitly setting various ends > as a surprising side effect in a getter. It looks like the call hierarchy > is very small. With my new naming convention, reserved_end() would > presumably return _allocation_end + alignment_reserve(), and have an assert > checking that _allocation_end == _last_slow_path_allocation_end, > complaining that this invariant must hold, and that a caller to this > function, such as make_parsable(), must first explicitly synchronize the > ends as required, to honor that invariant. > > > > > > > I've renamed the variables as you preferred, except for the _end > one. I did: > > current_end > > last_allocation_address > > tlab_end_ptr > > > > The reason is that the architecture-dependent code uses the thread.hpp API > and it already has tlab included in the name, so it becomes > tlab_current_end (which is better than tlab_current_tlab_end in my opinion). > > > > I also moved the update into a separate method with a TODO that says to > remove it when FastTLABRefill is deprecated > > > > This looks a lot better now. Thanks. > > Note that the following comment now needs updating accordingly in > threadLocalAllocBuffer.hpp: > > 41 // Heap sampling is performed via the end/actual_end fields.
> > 42 // actual_end contains the real end of the tlab allocation, > > 43 // whereas end can be set to an arbitrary spot in the tlab to > > 44 // trip the return and sample the allocation. > > 45 // slow_path_end is used to track if a fast tlab refill > occurred > > 46 // between slowpath calls. > > There might be other comments too, I have not looked in detail. > > > > This was the only spot that still had an actual_end, I fixed it now. I'll > do a sweep to double-check other comments. > > > > > > > > > > > Not sure it's better, but before updating the webrev, I wanted to try > to get input/consensus :) > > (Note hard_end was always further off than end). > > src/hotspot/share/prims/jvmti.xml: > > 10357 <capabilityfield id="can_sample_heap" since="9"> > 10358 <description> > 10359 Can sample the heap. > 10360 If this capability is enabled then the heap sampling > methods > can be called. > 10361 </description> > 10362 </capabilityfield> > > Looks like this capability should not be "since 9" if it gets integrated > now. > > Updated now to 11, crossing my fingers :) > > src/hotspot/share/runtime/heapMonitoring.cpp: > > 448 if (is_alive->do_object_b(value)) { > 449 // Update the oop to point to the new object if it is still > alive. > 450 f->do_oop(&(trace.obj)); > 451 > 452 // Copy the old trace, if it is still live. > 453 _allocated_traces->at_put(curr_pos++, trace); > 454 > 455 // Store the live trace in a cache, to be served up on > /heapz. > 456 _traces_on_last_full_gc->append(trace); > 457 > 458 count++; > 459 } else { > 460 // If the old trace is no longer live, add it to the list of > 461 // recently collected garbage. > 462 store_garbage_trace(trace); > 463 } > > In the case where the oop was not live, I would like it to be explicitly > cleared. > > Done, I think, the way you wanted it. Let me know, because I'm not familiar > with the RootAccess API. I'm unclear if I'm doing this right or not, so > reviews of these parts are highly appreciated.
Robbin had talked of > perhaps later pushing this all into an OopStorage; should I do this now, > do you think? Or can that wait for a second webrev later down the road? > > > > I think using handles can and should be done later. You can use the Access > API now. > I noticed that you are missing an #include "oops/access.inline.hpp" in > your heapMonitoring.cpp file. > > > > The missing header is there for me, so I don't know; I made sure it is > present in the latest webrev. Sorry about that. > > > > + Did I clear it the way you wanted me to, or were you thinking of > something else? > > > That is precisely how I wanted it to be cleared. Thanks. > > + Final question here: it seems like if I wanted to not do the > f->do_oop directly on the trace.obj, I'd need to do something like: > > f->do_oop(&value); > ... > trace->store_oop(value); > > to update the oop internally. Is that right, and is that one of the > advantages of going to the OopStorage sooner rather than later? > > > I think you really want to do the do_oop on the root directly. Is there a > particular reason why you would not want to do that? > Otherwise, yes - the benefit of using the handle approach is that you do > not need to call do_oop explicitly in your code. > > > > There is no reason except that now we have a load_oop and a get_oop_addr; > I was not sure what you would think of that. > > > > > > That's fine. > > > > > > > Also I see a lot of concurrent-looking use of the following field: > 267 volatile bool _initialized; > > Please note that the "volatile" qualifier does not help with reordering > here. Reordering between volatile and non-volatile fields is completely > free > for both compiler and hardware, except for Windows with MSVC, where > volatile > semantics is defined to use acquire/release semantics, and the hardware is > TSO. But for the general case, I would expect this field to be stored with > OrderAccess::release_store and loaded with OrderAccess::load_acquire. > Otherwise it is not thread safe.
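Erik's point about volatile vs ordered accesses can be illustrated with standard C++ atomics, with memory_order_release/memory_order_acquire playing the role of HotSpot's OrderAccess::release_store and OrderAccess::load_acquire; the struct and its names are hypothetical:

```cpp
#include <atomic>
#include <cassert>

// Publication pattern: the plain _data store must be ordered before the
// flag's release store, and the reader's acquire load orders the flag
// read before the data read. A plain `volatile bool` guarantees neither
// ordering (outside of MSVC's stronger volatile semantics).
struct PublishedState {
  int _data = 0;
  std::atomic<bool> _initialized{false};

  void initialize(int value) {
    _data = value;                                       // plain store
    _initialized.store(true, std::memory_order_release); // ~release_store
  }

  // Returns true (and fills *out) only if initialization is visible.
  bool try_read(int* out) const {
    if (!_initialized.load(std::memory_order_acquire))   // ~load_acquire
      return false;
    *out = _data;  // safe: ordered after the acquire load
    return true;
  }
};
```

With the acquire load in place, a reader that sees _initialized == true is guaranteed to also see the _data value written before the release store, which is exactly the property the volatile qualifier alone does not provide.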
> > Because everything is behind a mutex, I wasn't really worried about > this. I have a test that has multiple threads trying to hit this > corner case and it passes. > > However, to be paranoid, I updated it to use the OrderAccess API > now, thanks! Let me know what you think there too! > > > If it is indeed always supposed to be read and written under a mutex, then > I would strongly prefer to have it accessed as a normal non-volatile > member, and have an assertion that the given lock is held or we are in a > safepoint, as we do in many other places. Something like this: > > assert(HeapMonitorStorage_lock->owned_by_self() || > (SafepointSynchronize::is_at_safepoint() && > Thread::current()->is_VM_thread()), "this should not be accessed > concurrently"); > > It would be confusing to people reading the code if there are uses of > OrderAccess that are actually always protected under a mutex. > > > > Thank you for the exact example to be put in the code! I put it around > each access/assignment of _initialized and found one case where > yes, you can touch it and not have the lock. It actually is "ok" because you > don't act on the storage until later and only when you really want to > modify the storage (see the object_alloc_do_sample method, which calls the > add_trace method). > > > > But, because of this, I'm going to put the OrderAccess here; I'll do some > performance numbers later, and if there are issues, I might add an "unsafe" > read and a "safe" one to make it explicit to the reader. But I don't think > it will come to that. > > > Okay. This double return in heapMonitoring.cpp looks wrong: > > 283 bool initialized() { > 284 return OrderAccess::load_acquire(&_initialized) != 0; > 285 return _initialized; > 286 } > > Since you said object_alloc_do_sample() is the only place where you do not > hold the mutex while reading initialized(), I had a closer look at that. It > looks like in its current shape, the lack of a mutex may lead to a memory > leak.
In particular, it first checks if (initialized()). Let's assume this > is now true. It then allocates a bunch of stuff, and checks if the number > of frames was over 0. If it was, it calls > StackTraceStorage::storage()->add_trace(), seemingly hoping that after > grabbing the lock in there, initialized() will still return true. But it > could now return false and skip doing anything, in which case the allocated > stuff will never be freed. > > > > I fixed this now by making add_trace return a boolean and checking for > that. It will be in the next webrev. Thanks; the truth is that in our > implementation the system is always on or off, so this never really occurs > :). In this version though, that is not true and it's important to handle, > so thanks again! > > > > > > > So the analysis seems to be that _initialized is only used outside of the > mutex in one instance, where it is used to perform double-checked locking > that actually causes a memory leak. > > I am not proposing how to fix that, just raising the issue. If you still > want to perform this double-checked locking somehow, then the use of > acquire/release still seems odd, because the memory ordering restrictions > never come into play in this particular case. If they ever did, then > the use of destroy_stuff(); release_store(_initialized, 0) would be broken > anyway, as that would imply that whatever concurrent reader there ever was, > after reading _initialized with load_acquire(), could *never* read the > data that is concurrently destroyed anyway. I would be biased to think that > RawAccess<MO_RELAXED>::load/store looks like a more appropriate solution, > given that the memory leak issue is resolved. I do not know how painful it > would be to not perform this double-checked locking. > > > > So I agree with this entirely. I also looked a bit more, and the difference > in code really stems from our internal version.
In this version however, > there are actually a lot of things going on that I did not entirely go > through in my head, but this comment made me ponder it a bit more. > > > > Since every object_alloc_do_sample is protected by a check of > HeapMonitoring::enabled(), there is only a small chance that the call is > happening when things have been disabled. So there is no real need to do a > first check on initialized; it is a rare occurrence that a call happens > to object_alloc_do_sample and the storage's initialized returns false. > > > > (By the way, even if you did call object_alloc_do_sample without looking > at HeapMonitoring::enabled(), that would be ok too. You would gather the > stacktrace and get nowhere at the add_trace call, which would return false; > so though not optimal performance-wise, nothing would break). > > > > Furthermore, the add_trace is really the moment of no return, and we have > the mutex lock and then the initialized check. So, in the end, I did two > things: I removed that first check and then I removed the OrderAccess for > the storage initialized. I think I now have a better grasp and > understanding of why it was done in our code and why it is not needed here. > Thanks for pointing it out :). This still passes my JTREG tests, > especially the threaded one. > > > > > > > > > > > > > > > > > > As a kind of meta comment, I wonder if it would make sense to add sampling > for non-TLAB allocations. It seems like if someone is rapidly allocating a > whole bunch of 1 MB objects that never fit in a TLAB, I might still be > interested in seeing that in my traces, and not get surprised that the > allocation rate is very high yet not showing up in any profiles. > > That is handled by the handle_sample where you wanted me to put a > UseTlab check, because you hit that case if the allocation is too big. > > > I see. It was not obvious to me that non-TLAB sampling is done in the TLAB > class. That seems like an abstraction crime.
> What I wanted in my previous comment was that we do not call into the TLAB > when we are not using TLABs. If there is sampling logic in the TLAB that is > used for something other than TLABs, then it seems like that logic simply > does not belong inside of the TLAB. It should be moved out of the TLAB, and > instead have the TLAB call this common abstraction that makes sense. > > > > So in the incremental version: > > http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.07_08/, this is still > a "crime". The reason is that the system has to have the bytes_until_sample > on a per-thread level and it made "sense" to have it with the TLAB > implementation. Also, I was not sure how people felt about adding something > to the thread instance instead. > > > > Do you think it fits better at the Thread level? I can see how difficult > it is to make it happen there and add some logic there. Let me know what > you think. > > > We have an unfortunate situation where everyone that has some fields that > are thread-local tends to dump them right into Thread, making the size and > complexity of Thread grow as it becomes tightly coupled with various > unrelated subsystems. It would be desirable to have a separate class for > this instead that encapsulates the sampling logic. That class could > possibly reside in Thread though, as a value object of Thread. > > > > I imagined that would be the case but was not sure. I will look at the > example that Robbin is talking about (ThreadSMR) and will see how to > refactor my code to use that. > > > > Thanks again for your help, > > Jc > > > > > > > > > > > Hope I have answered your questions and that my feedback makes sense to > you. > > > > You have, and thank you for them; I think we are getting to a cleaner > implementation and things are getting better and more readable :) > > > Yes, it is getting better. > > Thanks, > /Erik > > > > > Thanks for your help!
> > Jc > > > > > > Thanks, > /Erik > > > > I double-checked by changing the test > > http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.05a/raw_files/new/test/hotspot/jtreg/serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorStatObjectCorrectnessTest.java > > to use a smaller TLAB (2048) and a bigger object, and it goes > through that path and passes. > > Thanks again for your review, and I look forward to your pointers for > the questions I have now raised! > Jc > > > > > > > > > Thanks, > /Erik > > > On 2018-01-26 06:45, JC Beyler wrote: > > Thanks Robbin for the reviews :) > > The new full webrev is here: > http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.03/ > The incremental webrev is here: > http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.02_03/ > > I inlined my answers: > > On Thu, Jan 25, 2018 at 1:15 AM, Robbin Ehn <robbin....@oracle.com> wrote: > > Hi JC, great to see another revision! > > #### > heapMonitoring.cpp > > StackTraceData should not contain the oop for 'safety' reasons. > When StackTraceData is moved from _allocated_traces: > L452 store_garbage_trace(trace); > it contains a dead oop. > _allocated_traces could instead be a tuple of oop and StackTraceData, so that > dead oops are not kept. > > Done; I used inheritance to make the copier work regardless, but the > idea is the same. > > You should use the new Access API for loading the oop, something like > this: > RootAccess<ON_PHANTOM_OOP_REF | AS_NO_KEEPALIVE>::load(...) > I don't think you need to use the Access API for clearing the oop, but it > would > look nicer. And you probably shouldn't be using: > Universe::heap()->is_in_reserved(value) > > I am unfamiliar with this, but I think I did do it like you wanted me > to (all tests pass, so that's a start). I'm not sure how to clear the > oop exactly; is there somewhere that does that, which I can use to do > the same?
> > I removed the is_in_reserved; this came from our internal version, and I > don't know why it was there, but my tests work without it, so I removed it > :) > > The lock: > L424 MutexLocker mu(HeapMonitorStorage_lock); > Is not needed as far as I can see. > weak_oops_do is called in a safepoint, no TLAB allocation can happen, and the > JVMTI thread can't access these data structures. Is there something more > to > this lock that I'm missing? > > Since a thread can call the JVMTI getLiveTraces (or any of the other > ones), it can get to the point of trying to copy the > _allocated_traces. I imagine it is possible that this is happening > during a GC, or that it can be started and a GC happens afterwards. > Therefore, it seems to me that you want this protected, no? > > #### > You have 6 files without any changes in them (any more): > g1CollectedHeap.cpp > psMarkSweep.cpp > psParallelCompact.cpp > genCollectedHeap.cpp > referenceProcessor.cpp > thread.hpp > > Done. > > #### > I have not looked closely, but is it possible to hide heap sampling in > AllocTracer ? (with some minor changes to the AllocTracer API) > > I imagine you are saying to move the code that does the > sampling (change the tlab end, do the call to HeapMonitoring, > etc.) into the AllocTracer code itself? I think that is right, and I'll > look at whether that is possible and prepare a webrev to show what would be > needed to make that happen. > > #### > Minor nit: when declaring a pointer there is a little mix of having the > asterisk adjacent to the type name or the data name. (Most hotspot code puts it by > the type > name) > E.g. > heapMonitoring.cpp:711 jvmtiStackTrace *trace = .... > heapMonitoring.cpp:733 Method* m = vfst.method(); > (not just this file) > > Done! > > #### > HeapMonitorThreadOnOffTest.java:77 > I would make g_tmp volatile, otherwise the assignment in the loop may > theoretically be skipped. > > Also done! > > Thanks again! > Jc