Hi Derek, I know there were a few things that went in that provoked a merge conflict. I worked on it and got it up to date. Sadly, my lack of knowledge meant doing a full rebase instead of keeping all the history. However, with a newly cloned jdk/hs you should now be able to use: http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.10/
The change you are referring to was done with the others, so perhaps you were unlucky and I forgot it in one webrev and fixed it in another? I don't know, but it's been there and I checked; it is here: http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.10/src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp.udiff.html I double-checked that tlab_end_offset no longer appears in any architecture (as far as I can tell :)). Thanks for testing and let me know if you run into any other issues! Jc On Fri, Mar 30, 2018 at 4:24 PM White, Derek <derek.wh...@cavium.com> wrote: > Hi Jc, > > > > I’ve been having trouble getting your patch to apply correctly. I may have > based it on the wrong version. > > > > In any case, I think there’s a missing update to > macroAssembler_aarch64.cpp, in MacroAssembler::tlab_allocate(), where > “JavaThread::tlab_end_offset()” should become > “JavaThread::tlab_current_end_offset()”. > > > > This should correspond to the other ports’ changes in > templateTable_<cpu>.cpp files. > > > > Thanks! > - Derek > > > > *From:* hotspot-compiler-dev [mailto: > hotspot-compiler-dev-boun...@openjdk.java.net] *On Behalf Of *JC Beyler > *Sent:* Wednesday, March 28, 2018 11:43 AM > *To:* Erik Österlund <erik.osterl...@oracle.com> > *Cc:* serviceability-dev@openjdk.java.net; hotspot-compiler-dev < > hotspot-compiler-...@openjdk.java.net> > *Subject:* Re: JDK-8171119: Low-Overhead Heap Profiling > > > > Hi all, > > > > I've mostly been working on deflaking the tests and on the wording in the > JVMTI spec. > > > > Here are the two incremental webrevs: > > http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.5_6/ > > http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.06_07/ > > > > Here is the total webrev: > > http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event.07/ > > > > Here are the notes of this change: > > - Currently the tests pass 100 times in a row; I am working on checking > whether they pass 1000 times in a row.
> > - The default sampling rate is set to 512k; this is what we use > internally, and having a default means that to enable the sampling with the > default, the user only has to do an enable event/disable event via JVMTI > (instead of enable + set sample rate). > > - I deprecated the code that was handling the fast path tlab refill, > since FastTLABRefill itself is now deprecated > > - Though I saw that Graal is still using it, so I have to see what > needs to be done there exactly > > > > Finally, using the Dacapo benchmark suite, I noted a 1% overhead when > the event system is turned on and the callback to the native agent is just > empty. I got a 3% overhead with a 512k sampling rate with the code I put in > the native side of my tests. > > > > Thanks and comments are appreciated, > > Jc > > > > > > On Mon, Mar 19, 2018 at 2:06 PM JC Beyler <jcbey...@google.com> wrote: > > Hi all, > > > > The incremental webrev update is here: > > http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event4_5/ > > > > The full webrev is here: > > http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event5/ > > > > The major changes here are: > > - I've removed the heapMonitoring.cpp code in favor of just having the > sampling events, as per Serguei's request; I still have to do some overhead > measurements, but the tests prove the concept can work > > - Most of the tlab code is unchanged; the only major change is that > things now get sent off to event collectors when used and enabled.
> > - Added the interpreter collectors to handle interpreter execution > > - Updated the name from SetTlabHeapSampling to SetHeapSampling to be > more generic > > - Added a mutex for the thread sampling so that we can initialize an > internal static array safely > > - Ported the tests from the old system to this new one > > > > I've also updated the JEP and CSR to reflect these changes: > > https://bugs.openjdk.java.net/browse/JDK-8194905 > > https://bugs.openjdk.java.net/browse/JDK-8171119 > > > > To make some forward progress, I've removed the heap > sampling code and now rely entirely on the event sampling system. > The tests reflect this by using a simplified implementation of what an > agent could do: > > > http://cr.openjdk.java.net/~jcbeyler/8171119/heap_event5/raw_files/new/test/hotspot/jtreg/serviceability/jvmti/HeapMonitor/libHeapMonitor.c > > (Search for anything mentioning event_storage). > > > > I have not taken the time to port the whole code we had originally in > heapMonitoring to this. I hesitate only because that code was in C++; I'd > have to port it to C, and since this is for tests, perhaps what I have now is > good enough? > > > > As far as testing goes, I've ported all the relevant tests and then added > a few: > > - Turning the system on/off > > - Testing using various GCs > > - Testing using the interpreter > > - Testing the sampling rate > > - Testing with objects and arrays > > - Testing with various threads > > > > Finally, as far as overhead goes, I compared the system turned off against a clean > build and measured 0% overhead, which is what we'd want. This was using the > Dacapo benchmarks. I am now preparing to run a version with the events on > using Dacapo and will report back here.
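For readers following along, the per-thread byte-counting scheme behind a sampling interval (such as the 512k default mentioned earlier in the thread) can be sketched as follows; ByteSampler and its members are illustrative names only, not the actual HotSpot code:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical per-thread sampling state: sample roughly once every
// `interval` bytes of allocation by spending down a byte budget.
struct ByteSampler {
  size_t interval;            // e.g. 512 * 1024, the default discussed above
  size_t bytes_until_sample;  // budget left before the next sample fires

  explicit ByteSampler(size_t ivl) : interval(ivl), bytes_until_sample(ivl) {}

  // Returns true when this allocation should be sampled.
  bool record_allocation(size_t size) {
    if (size < bytes_until_sample) {
      bytes_until_sample -= size;  // spend part of the budget
      return false;
    }
    bytes_until_sample = interval; // budget exhausted: sample and reset
    return true;
  }
};
```

A real implementation would likely randomize the next sample point to avoid bias, but the budget-decrement shape is the same.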
> > > > Any comments are welcome :) > > Jc > > > > > > > > On Thu, Mar 8, 2018 at 4:00 PM JC Beyler <jcbey...@google.com> wrote: > > Hi all, > > > > I apologize for the delay; I wanted to add an event system, which > took a bit longer than expected, and I also reworked the code to take into > account the deprecation of FastTLABRefill. > > > > This update has four parts: > > > > A) I moved the implementation from Thread to ThreadHeapSampler inside of > Thread. Would you prefer it as a pointer inside of Thread, or does this > work for you? A second question would be: would you rather have an > association outside of Thread altogether that tries to remember when > threads are live, and then we would have something like: > > ThreadHeapSampler::get_sampling_size(this_thread); > > > > I worry about the overhead of this, but perhaps it is not too bad? > > > > B) I have also been working on the Allocation event system that sends out > a notification at each sampled event. This will be practical when wanting > to do something at the allocation point. I'm also looking at whether the whole > heapMonitoring code could reside in the agent code rather than in the JDK. > I'm not convinced, but I'm talking to Serguei about it to see/assess :) > > - Also added two tests for the new event subsystem > > > > C) Removed the slow_path fields inside the TLAB code since > FastTLABRefill is now deprecated > > > > D) Updated the JVMTI documentation and specification for the methods.
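To make part A concrete, here is a minimal standalone sketch of the "value object inside Thread" layout under discussion; ThreadModel, the body of ThreadHeapSampler, and the default interval are all hypothetical and only illustrate the shape, not the actual HotSpot classes:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical sketch: the sampling state lives in its own class, and the
// owning thread embeds it by value, so Thread itself stays uncluttered.
class ThreadHeapSampler {
  size_t _bytes_until_sample;
 public:
  ThreadHeapSampler() : _bytes_until_sample(512 * 1024) {}
  size_t bytes_until_sample() const { return _bytes_until_sample; }
  void set_bytes_until_sample(size_t v) { _bytes_until_sample = v; }
};

// Stand-in for HotSpot's Thread; only shows where the sampler would sit.
class ThreadModel {
  ThreadHeapSampler _heap_sampler;  // embedded value object, no extra allocation
 public:
  ThreadHeapSampler& heap_sampler() { return _heap_sampler; }
};
```

Compared with a static association like the ThreadHeapSampler::get_sampling_size(this_thread) idea floated above, the embedded member costs one field per thread but avoids any lookup overhead on the allocation path.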
> > > > So the incremental webrev is here: > > http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.09_10/ > > > > and the full webrev is here: > > http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.10 > > > > I believe I have updated the various JIRA issues that track this :) > > > > Thanks for your input, > > Jc > > > > > > On Wed, Feb 14, 2018 at 10:34 PM, JC Beyler <jcbey...@google.com> wrote: > > Hi Erik, > > > > I inlined my answers; the last one seems to also answer Robbin's concerns > about the same thing (adding things to Thread). > > > > On Wed, Feb 14, 2018 at 2:51 AM, Erik Österlund <erik.osterl...@oracle.com> > wrote: > > Hi JC, > > Comments are inlined below. > > > > On 2018-02-13 06:18, JC Beyler wrote: > > Hi Erik, > > > > Thanks for your answers, I've now inlined my own answers/comments. > > > > I've done a new webrev here: > > http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.08/ > > > > The incremental is here: > > http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.07_08/ > > > > Note to all: > > - I've been integrating changes from Erik/Serguei/David's comments, so this > incremental webrev is a bit of an answer to all comments in one. I apologize > for that :) > > > > > > On Mon, Feb 12, 2018 at 6:05 AM, Erik Österlund <erik.osterl...@oracle.com> > wrote: > > Hi JC, > > Sorry for the delayed reply. > > Inlined answers: > > > > On 2018-02-06 00:04, JC Beyler wrote: > > Hi Erik, > > (Renaming this to be folded into the newly renamed thread :)) > > First off, thanks a lot for reviewing the webrev! I appreciate it! > > I updated the webrev to: > http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.05a/ > > And the incremental one is here: > http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.04_05a/ > > It contains: > - The change of "since" from 9 to 11 in jvmti.xml > - The use of OrderAccess for initialized > - Clearing the oop > > I have also inlined my answers to your comments.
The biggest question > will come from the multiple *_end variables. A bit of the logic there > is due to handling the slow path refill vs fast path refill and > checking that the rug was not pulled out from underneath the slow path. I > believe that a previous comment was that TlabFastRefill was going to > be deprecated. > > If this is true, we could revert this code a bit and just do: if > TlabFastRefill is enabled, disable this. And then deprecate that when > TlabFastRefill is deprecated. > > This might simplify this webrev, and I can work on a follow-up that > either removes TlabFastRefill (if Robbin does not have the time to do > it) or adds the support to the assembly side to handle this correctly. > What do you think? > > > > I support removing TlabFastRefill, but I think it is good to not depend on > that happening first. > > > > > I'm slowly pushing on the FastTLABRefill > (https://bugs.openjdk.java.net/browse/JDK-8194084), > I agree on keeping both separate for now though so that we can think of > both differently > > > > > > Now, below, inlined are my answers: > > On Fri, Feb 2, 2018 at 8:44 AM, Erik Österlund > <erik.osterl...@oracle.com> wrote: > > Hi JC, > > Hope I am reviewing the right version of your work. Here goes... > > src/hotspot/share/gc/shared/collectedHeap.inline.hpp: > > 159 AllocTracer::send_allocation_outside_tlab(klass, result, size * > HeapWordSize, THREAD); > 160 > 161 THREAD->tlab().handle_sample(THREAD, result, size); > 162 return result; > 163 } > > Should not call tlab()->X without checking if (UseTLAB) IMO. > > Done! > > > More about this later. > > > > > > src/hotspot/share/gc/shared/threadLocalAllocBuffer.cpp: > > So first of all, there seem to be quite a few ends. There is an "end", a > "hard > end", a "slow path end", and an "actual end". Moreover, it seems like the > "hard end" is actually further away than the "actual end". So the "hard > end" > seems like more of a "really definitely actual end" or something.
I don't > know about you, but I think it looks kind of messy. In particular, I don't > feel like the name "actual end" reflects what it represents, especially > when > there is another end that is behind the "actual end". > > 413 HeapWord* ThreadLocalAllocBuffer::hard_end() { > 414 // Did a fast TLAB refill occur? > 415 if (_slow_path_end != _end) { > 416 // Fix up the actual end to be now the end of this TLAB. > 417 _slow_path_end = _end; > 418 _actual_end = _end; > 419 } > 420 > 421 return _actual_end + alignment_reserve(); > 422 } > > I really do not like making getters unexpectedly have these kinds of side > effects. It is not expected that when you ask for the "hard end", you > implicitly update the "slow path end" and "actual end" to new values. > > As I said, a lot of this is due to the FastTlabRefill. If I make this > not support FastTlabRefill, this goes away. The reason the system > needs to update itself at the get is that you only know at that get if > things have shifted underneath the tlab slow path. I am not sure of > really better names (naming is hard!), but perhaps we could do these > names: > > - current_tlab_end // Either the allocated tlab end or a sampling > point > - last_allocation_address // The end of the tlab allocation > - last_slowpath_allocated_end // In case a fast refill occurred the > end might have changed, this is to remember slow vs fast path refills > > The hard_end method can be renamed to something like: > tlab_end_pointer() // The end of the tlab including a bit of > alignment reserved bytes > > > > Those names sound better to me. Could you please provide a mapping from > the old names to the new names so I understand which one is which? > > This is my current guess of what you are proposing: > > end -> current_tlab_end > actual_end -> last_allocation_address > slow_path_end -> last_slowpath_allocated_end > hard_end -> tlab_end_pointer > > > > Yes, that is correct; that is what I was proposing.
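To illustrate the complaint about side-effecting getters using the names just mapped out, here is a standalone toy class where the fix-up is an explicit method and the getter merely checks the invariant; this is a sketch of the idea, not HotSpot's ThreadLocalAllocBuffer, and the 16-byte alignment reserve is an arbitrary placeholder:

```cpp
#include <cassert>
#include <cstddef>

// Toy model of the TLAB-end bookkeeping under the proposed renaming.
class TinyTlab {
  char* _current_tlab_end;            // allocated end, or a sampling point
  char* _last_allocation_address;     // real end of the TLAB allocation
  char* _last_slowpath_allocated_end; // detects a fast refill in between
  static size_t alignment_reserve() { return 16; }  // placeholder value
 public:
  explicit TinyTlab(char* end)
      : _current_tlab_end(end),
        _last_allocation_address(end),
        _last_slowpath_allocated_end(end) {}

  // Explicit synchronization step instead of a side effect in a getter.
  void sync_ends() {
    _last_slowpath_allocated_end = _current_tlab_end;
    _last_allocation_address = _current_tlab_end;
  }

  // Side-effect-free getter: asserts the invariant instead of fixing it up.
  char* tlab_end_pointer() const {
    assert(_last_slowpath_allocated_end == _current_tlab_end &&
           "caller must sync the ends first");
    return _last_allocation_address + alignment_reserve();
  }

  void set_current_end(char* p) { _current_tlab_end = p; }
};
```

Callers that can observe a refill would invoke sync_ends() first and only then read tlab_end_pointer(), which keeps the getter pure.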
> > > > I would prefer this naming: > > end -> slow_path_end // the end for taking a slow path; either due to > sampling or refilling > actual_end -> allocation_end // the end for allocations > slow_path_end -> last_slow_path_end // last address for slow_path_end (as > opposed to allocation_end) > hard_end -> reserved_end // the end of the reserved space of the TLAB > > About setting things in the getter... that still seems like a very > unpleasant thing to me. It would be better to inspect the call hierarchy > and explicitly update the ends where they need updating, and assert in the > getter that they are in sync, rather than implicitly setting various ends > as a surprising side effect in a getter. It looks like the call hierarchy > is very small. With my new naming convention, reserved_end() would > presumably return _allocation_end + alignment_reserve(), and have an assert > checking that _allocation_end == _last_slow_path_allocation_end, > complaining that this invariant must hold, and that a caller to this > function, such as make_parsable(), must first explicitly synchronize the > ends as required, to honor that invariant. > > > > > > > I've renamed the variables as you preferred, except for the _end > one. I did: > > current_end > > last_allocation_address > > tlab_end_ptr > > > > The reason is that the architecture-dependent code uses the thread.hpp API > and it already has tlab included in the name, so it becomes > tlab_current_end (which is better than tlab_current_tlab_end in my opinion). > > > > I also moved the update into a separate method with a TODO that says to > remove it when FastTLABRefill is deprecated > > > > This looks a lot better now. Thanks. > > Note that the following comment now needs updating accordingly in > threadLocalAllocBuffer.hpp: > > 41 // Heap sampling is performed via the end/actual_end fields.
> > 42 // actual_end contains the real end of the tlab allocation, > > 43 // whereas end can be set to an arbitrary spot in the tlab to > > 44 // trip the return and sample the allocation. > > 45 // slow_path_end is used to track if a fast tlab refill > occurred > > 46 // between slowpath calls. > > There might be other comments too, I have not looked in detail. > > > > This was the only spot that still had an actual_end, I fixed it now. I'll > do a sweep to double-check other comments. > > > > > > > > > > > Not sure it's better, but before updating the webrev, I wanted to try > to get input/consensus :) > > (Note hard_end was always further off than end). > > src/hotspot/share/prims/jvmti.xml: > > 10357 <capabilityfield id="can_sample_heap" since="9"> > 10358 <description> > 10359 Can sample the heap. > 10360 If this capability is enabled then the heap sampling > methods > can be called. > 10361 </description> > 10362 </capabilityfield> > > Looks like this capability should not be "since 9" if it gets integrated > now. > > Updated now to 11, crossing my fingers :) > > src/hotspot/share/runtime/heapMonitoring.cpp: > > 448 if (is_alive->do_object_b(value)) { > 449 // Update the oop to point to the new object if it is still > alive. > 450 f->do_oop(&(trace.obj)); > 451 > 452 // Copy the old trace, if it is still live. > 453 _allocated_traces->at_put(curr_pos++, trace); > 454 > 455 // Store the live trace in a cache, to be served up on > /heapz. > 456 _traces_on_last_full_gc->append(trace); > 457 > 458 count++; > 459 } else { > 460 // If the old trace is no longer live, add it to the list of > 461 // recently collected garbage. > 462 store_garbage_trace(trace); > 463 } > > In the case where the oop was not live, I would like it to be explicitly > cleared. > > Done, I think, the way you wanted it. Let me know, because I'm not familiar > with the RootAccess API. I'm unclear if I'm doing this right or not, so > reviews of these parts are highly appreciated.
Robbin had talked of > perhaps later pushing this all into an OopStorage; should I do this now, > do you think? Or can that wait for a second webrev later down the road? > > > > I think using handles can and should be done later. You can use the Access > API now. > I noticed that you are missing an #include "oops/access.inline.hpp" in > your heapMonitoring.cpp file. > > > > The missing header is there for me, so I don't know; I made sure it is > present in the latest webrev. Sorry about that. > > > > + Did I clear it the way you wanted me to, or were you thinking of > something else? > > > That is precisely how I wanted it to be cleared. Thanks. > > + Final question here: it seems like if I wanted to not do the > f->do_oop directly on the trace.obj, I'd need to do something like: > > f->do_oop(&value); > ... > trace->store_oop(value); > > to update the oop internally. Is that right, and is that one of the > advantages of going to the OopStorage sooner rather than later? > > > I think you really want to do the do_oop on the root directly. Is there a > particular reason why you would not want to do that? > Otherwise, yes - the benefit of using the handle approach is that you do > not need to call do_oop explicitly in your code. > > > > There is no reason except that now we have a load_oop and a get_oop_addr; > I was not sure what you would think of that. > > > > > > That's fine. > > > > > > > Also I see a lot of concurrent-looking use of the following field: > 267 volatile bool _initialized; > > Please note that the "volatile" qualifier does not help with reordering > here. Reordering between volatile and non-volatile fields is completely > free > for both compiler and hardware, except for Windows with MSVC, where > volatile > semantics is defined to use acquire/release semantics, and the hardware is > TSO. But for the general case, I would expect this field to be stored with > OrderAccess::release_store and loaded with OrderAccess::load_acquire. > Otherwise it is not thread safe.
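Erik's point about volatile vs ordered accesses can be illustrated with standard C++ atomics, with memory_order_release/memory_order_acquire playing the role of HotSpot's OrderAccess::release_store and OrderAccess::load_acquire; the struct and its names are hypothetical:

```cpp
#include <atomic>
#include <cassert>

// Publication pattern: the plain _data store must be ordered before the
// flag's release store, and the reader's acquire load orders the flag
// read before the data read. A plain `volatile bool` guarantees neither
// ordering (outside of MSVC's stronger volatile semantics).
struct PublishedState {
  int _data = 0;
  std::atomic<bool> _initialized{false};

  void initialize(int value) {
    _data = value;                                       // plain store
    _initialized.store(true, std::memory_order_release); // ~release_store
  }

  // Returns true (and fills *out) only if initialization is visible.
  bool try_read(int* out) const {
    if (!_initialized.load(std::memory_order_acquire))   // ~load_acquire
      return false;
    *out = _data;  // safe: ordered after the acquire load
    return true;
  }
};
```

With the acquire load in place, a reader that sees _initialized == true is guaranteed to also see the _data value written before the release store, which is exactly the property the volatile qualifier alone does not provide.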
> > Because everything is behind a mutex, I wasn't really worried about > this. I have a test that has multiple threads trying to hit this > corner case and it passes. > > However, to be paranoid, I updated it to use the OrderAccess API > now, thanks! Let me know what you think there too! > > > If it is indeed always supposed to be read and written under a mutex, then > I would strongly prefer to have it accessed as a normal non-volatile > member, and have an assertion that the given lock is held or we are in a > safepoint, as we do in many other places. Something like this: > > assert(HeapMonitorStorage_lock->owned_by_self() || > (SafepointSynchronize::is_at_safepoint() && > Thread::current()->is_VM_thread()), "this should not be accessed > concurrently"); > > It would be confusing to people reading the code if there are uses of > OrderAccess that are actually always protected under a mutex. > > > > Thank you for the exact example to be put in the code! I put it around > each access/assignment of _initialized and found one case where > yes, you can touch it and not have the lock. It actually is "ok" because you > don't act on the storage until later and only when you really want to > modify the storage (see the object_alloc_do_sample method, which calls the > add_trace method). > > > > But, because of this, I'm going to put the OrderAccess here; I'll do some > performance numbers later, and if there are issues, I might add an "unsafe" > read and a "safe" one to make it explicit to the reader. But I don't think > it will come to that. > > > Okay. This double return in heapMonitoring.cpp looks wrong: > > 283 bool initialized() { > 284 return OrderAccess::load_acquire(&_initialized) != 0; > 285 return _initialized; > 286 } > > Since you said object_alloc_do_sample() is the only place where you do not > hold the mutex while reading initialized(), I had a closer look at that. It > looks like in its current shape, the lack of a mutex may lead to a memory > leak.
In particular, it first checks if (initialized()). Let's assume this > is now true. It then allocates a bunch of stuff, and checks if the number > of frames was over 0. If it was, it calls > StackTraceStorage::storage()->add_trace(), seemingly hoping that after > grabbing the lock in there, initialized() will still return true. But it > could now return false and skip doing anything, in which case the allocated > stuff will never be freed. > > > > I fixed this now by making add_trace return a boolean and checking for > that. It will be in the next webrev. Thanks; the truth is that in our > implementation the system is always on or off, so this never really occurs > :). In this version though, that is not true and it's important to handle, > so thanks again! > > > > > > > So the analysis seems to be that _initialized is only used outside of the > mutex in one instance, where it is used to perform double-checked locking > that actually causes a memory leak. > > I am not proposing how to fix that, just raising the issue. If you still > want to perform this double-checked locking somehow, then the use of > acquire/release still seems odd, because the memory ordering restrictions > never come into play in this particular case. If they ever did, then > the use of destroy_stuff(); release_store(_initialized, 0) would be broken > anyway, as that would imply that whatever concurrent reader there ever was, > after reading _initialized with load_acquire(), could *never* read the > data that is concurrently destroyed anyway. I would be biased to think that > RawAccess<MO_RELAXED>::load/store looks like a more appropriate solution, > given that the memory leak issue is resolved. I do not know how painful it > would be to not perform this double-checked locking. > > > > So I agree with this entirely. I also looked a bit more, and the difference > in code really stems from our internal version.
In this version however, > there are actually a lot of things going on that I did not entirely go > through in my head, but this comment made me ponder it a bit more. > > > > Since every object_alloc_do_sample is protected by a check of > HeapMonitoring::enabled(), there is only a small chance that the call is > happening when things have been disabled. So there is no real need to do a > first check on initialized; it is a rare occurrence that a call happens > to object_alloc_do_sample and the storage's initialized returns false. > > > > (By the way, even if you did call object_alloc_do_sample without looking > at HeapMonitoring::enabled(), that would be ok too. You would gather the > stacktrace and get nowhere at the add_trace call, which would return false; > so though not optimal performance-wise, nothing would break). > > > > Furthermore, the add_trace is really the moment of no return, and we have > the mutex lock and then the initialized check. So, in the end, I did two > things: I removed that first check and then I removed the OrderAccess for > the storage initialized. I think I now have a better grasp and > understanding of why it was done in our code and why it is not needed here. > Thanks for pointing it out :). This still passes my JTREG tests, > especially the threaded one. > > > > > > > > > > > > > > > > > > As a kind of meta comment, I wonder if it would make sense to add sampling > for non-TLAB allocations. It seems like if someone is rapidly allocating a > whole bunch of 1 MB objects that never fit in a TLAB, I might still be > interested in seeing that in my traces, and not get surprised that the > allocation rate is very high yet not showing up in any profiles. > > That is handled by the handle_sample where you wanted me to put a > UseTlab check, because you hit that case if the allocation is too big. > > > I see. It was not obvious to me that non-TLAB sampling is done in the TLAB > class. That seems like an abstraction crime.
> What I wanted in my previous comment was that we do not call into the TLAB > when we are not using TLABs. If there is sampling logic in the TLAB that is > used for something other than TLABs, then it seems like that logic simply > does not belong inside of the TLAB. It should be moved out of the TLAB, and > instead have the TLAB call this common abstraction that makes sense. > > > > So in the incremental version: > > http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.07_08/, this is still > a "crime". The reason is that the system has to have the bytes_until_sample > on a per-thread level and it made "sense" to have it with the TLAB > implementation. Also, I was not sure how people felt about adding something > to the thread instance instead. > > > > Do you think it fits better at the Thread level? I can see how difficult > it is to make it happen there and add some logic there. Let me know what > you think. > > > We have an unfortunate situation where everyone that has some fields that > are thread-local tends to dump them right into Thread, making the size and > complexity of Thread grow as it becomes tightly coupled with various > unrelated subsystems. It would be desirable to have a separate class for > this instead that encapsulates the sampling logic. That class could > possibly reside in Thread though, as a value object of Thread. > > > > I imagined that would be the case but was not sure. I will look at the > example that Robbin is talking about (ThreadSMR) and will see how to > refactor my code to use that. > > > > Thanks again for your help, > > Jc > > > > > > > > > > > Hope I have answered your questions and that my feedback makes sense to > you. > > > > You have, and thank you for them; I think we are getting to a cleaner > implementation and things are getting better and more readable :) > > > Yes, it is getting better. > > Thanks, > /Erik > > > > > Thanks for your help!
> > Jc > > > > > > Thanks, > /Erik > > > > I double-checked by changing the test > > http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.05a/raw_files/new/test/hotspot/jtreg/serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorStatObjectCorrectnessTest.java > > to use a smaller TLAB (2048) and a bigger object, and it goes > through that path and passes. > > Thanks again for your review, and I look forward to your pointers for > the questions I have now raised! > Jc > > > > > > > > > Thanks, > /Erik > > > On 2018-01-26 06:45, JC Beyler wrote: > > Thanks Robbin for the reviews :) > > The new full webrev is here: > http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.03/ > The incremental webrev is here: > http://cr.openjdk.java.net/~jcbeyler/8171119/webrev.02_03/ > > I inlined my answers: > > On Thu, Jan 25, 2018 at 1:15 AM, Robbin Ehn <robbin....@oracle.com> wrote: > > Hi JC, great to see another revision! > > #### > heapMonitoring.cpp > > StackTraceData should not contain the oop for 'safety' reasons. > When StackTraceData is moved from _allocated_traces: > L452 store_garbage_trace(trace); > it contains a dead oop. > _allocated_traces could instead be a tuple of oop and StackTraceData, so that > dead oops are not kept. > > Done; I used inheritance to make the copier work regardless, but the > idea is the same. > > You should use the new Access API for loading the oop, something like > this: > RootAccess<ON_PHANTOM_OOP_REF | AS_NO_KEEPALIVE>::load(...) > I don't think you need to use the Access API for clearing the oop, but it > would > look nicer. And you probably shouldn't be using: > Universe::heap()->is_in_reserved(value) > > I am unfamiliar with this, but I think I did do it like you wanted me > to (all tests pass, so that's a start). I'm not sure how to clear the > oop exactly; is there somewhere that does that, which I can use to do > the same?
> > I removed the is_in_reserved; this came from our internal version, and I > don't know why it was there, but my tests work without it, so I removed it > :) > > The lock: > L424 MutexLocker mu(HeapMonitorStorage_lock); > Is not needed as far as I can see. > weak_oops_do is called in a safepoint, no TLAB allocation can happen, and the > JVMTI thread can't access these data structures. Is there something more > to > this lock that I'm missing? > > Since a thread can call the JVMTI getLiveTraces (or any of the other > ones), it can get to the point of trying to copy the > _allocated_traces. I imagine it is possible that this is happening > during a GC, or that it can be started and a GC happens afterwards. > Therefore, it seems to me that you want this protected, no? > > #### > You have 6 files without any changes in them (any more): > g1CollectedHeap.cpp > psMarkSweep.cpp > psParallelCompact.cpp > genCollectedHeap.cpp > referenceProcessor.cpp > thread.hpp > > Done. > > #### > I have not looked closely, but is it possible to hide heap sampling in > AllocTracer ? (with some minor changes to the AllocTracer API) > > I imagine you are saying to move the code that does the > sampling (change the tlab end, do the call to HeapMonitoring, > etc.) into the AllocTracer code itself? I think that is right, and I'll > look at whether that is possible and prepare a webrev to show what would be > needed to make that happen. > > #### > Minor nit: when declaring a pointer there is a little mix of having the > asterisk adjacent to the type name or the data name. (Most hotspot code puts it by > the type > name) > E.g. > heapMonitoring.cpp:711 jvmtiStackTrace *trace = .... > heapMonitoring.cpp:733 Method* m = vfst.method(); > (not just this file) > > Done! > > #### > HeapMonitorThreadOnOffTest.java:77 > I would make g_tmp volatile, otherwise the assignment in the loop may > theoretically be skipped. > > Also done! > > Thanks again! > Jc