Upload a new webrev at
http://cr.openjdk.java.net/~lzang/jmap-8214535/8215624/webrev_07/
It fix a potential issue that unexpected number of threads maybe calculated
for "parallel" option of jmap -histo in container.
As shown at
http://cr.openjdk.java.net/~lzang/jmap-8214535/8215624/webrev_07-delta/src/hotspot/share/services/attachListener.cpp.udiff.html
############### attachListener.cpp ####################
@@ -252,11 +252,11 @@
static jint heap_inspection(AttachOperation* op, outputStream* out) {
bool live_objects_only = true; // default is true to retain the behavior
before this change is made
outputStream* os = out; // if path not specified or path is NULL, use out
fileStream* fs = NULL;
const char* arg0 = op->arg(0);
- uint parallel_thread_num = MAX(1, os::processor_count() * 3 / 8); // default
is less than half of processors.
+ uint parallel_thread_num = MAX(1, os::initial_active_processor_count() * 3 /
8); // default is less than half of processors.
if (arg0 != NULL && (strlen(arg0) > 0)) {
if (strcmp(arg0, "-all") != 0 && strcmp(arg0, "-live") != 0) {
out->print_cr("Invalid argument to inspectheap operation: %s", arg0);
return JNI_ERR;
}
###################################################
Thanks.
BRs,
Lin
On 2020/7/9, 3:22 PM, "linzang(臧琳)" <[email protected]> wrote:
Hi Paul,
Thanks for reviewing!
>>
>> I'd move all the argument parsing code to JMap.java and just
pass the results to Hotspot. Both histo() in JMap.java and code in
attachListener.* parse the command line arguments, though the code in histo()
doesn't parse the argument to "parallel". I'd upgrade the code in histo() to do
a complete parse and pass the option values to executeCommandForPid as before:
there would just be more of them now. That would allow you to eliminate all the
parsing code in attachListener.cpp as well as the change to arguments.hpp.
>>
The reason I made the change in Jmap.java that compose all arguments as
1 string , instead of passing 3 argments, is to avoid the compatibility issue,
as we discussed in
http://mail.openjdk.java.net/pipermail/serviceability-dev/2019-February/thread.html#27240.
The root cause of the compatibility issue is because max argument count in
HotspotVirtualMachineImpl.java and attachlistener.cpp need to be enlarged
(changes like http://hg.openjdk.java.net/jdk/jdk/rev/e7cf035682e3#l2.1) when
jmap has more than 3 arguments. But if user use an old jcmd/jmap tool, it may
stuck at socket read(), because the "max argument count" don't match.
I re-checked this change, the argument count of jmap histo is equal to
3 (live, file, parallel), so it can work normally even without the change of
passing argument. But I think we have to face the problem if more arguments is
added in jcmd alike tools later, not sure whether it should be sloved (or a
workaround) in this changeset.
And here are the lastest webrev and delta:
http://cr.openjdk.java.net/~lzang/jmap-8214535/8215624/webrev_06/
http://cr.openjdk.java.net/~lzang/jmap-8214535/8215624/webrev_06-delta/
Cheers,
Lin
On 2020/7/7, 5:57 AM, "Hohensee, Paul" <[email protected]> wrote:
I'd like to see this feature added. :)
The CSR looks good, as does the basic parallel inspection algorithm.
Stefan's done the GC part, so I'll stick to the non-GC part (fwiw, the GC part
lgtm).
I'd move all the argument parsing code to JMap.java and just pass the
results to Hotspot. Both histo() in JMap.java and code in attachListener.*
parse the command line arguments, though the code in histo() doesn't parse the
argument to "parallel". I'd upgrade the code in histo() to do a complete parse
and pass the option values to executeCommandForPid as before: there would just
be more of them now. That would allow you to eliminate all the parsing code in
attachListener.cpp as well as the change to arguments.hpp.
heapInspection.hpp:
_shared_miss_count (s/b _missed_count, see below) isn't a size, so it
should be a uint instead of a size_t. Same with the new parallel_thread_num
argument to heap_inspection() and populate_table().
Comment copy-edit:
+// Parallel heap inspection task. Parallel inspection can fail due to
+// a native OOM when allocating memory for TL-KlassInfoTable.
+// _success will be set false on an OOM, and serial inspection tried.
_shared_miss_count should be _missed_count to match the missed_count()
getter, or rename missed_count() to be shared_miss_count(). Whichever way you
go, the field type should match the getter result type: uint is reasonable.
heapInspection.cpp:
You might use ResourceMark twice in populate_table, separately for the
parallel attempt and the serial code. If the parallel attempt fails and
available memory is low, it would be good to clean up the memory used by the
parallel attempt before doing the serial code.
Style nit in KlassInfoTable::merge_entry(). I'd line up the definitions
of k and elt, so "k" is even with "elt". And, because it's two lines shorter,
I'd replace
+ } else {
+ return false;
+ }
with
+ return false;
KlassInfoTableMergeClosure.is_success() should be just success() (i.e.,
no "is_" prefix) because it's a getter.
I'd reorganize the code in populate_table() to make it more clear, vis
(I changed _shared_missed_count to _missed_count)
+ if (cit.allocation_failed()) {
+ // fail to allocate memory, stop parallel mode
+ Atomic::store(&_success, false);
+ return;
+ }
+ RecordInstanceClosure ric(&cit, _filter);
+ _poi->object_iterate(&ric, worker_id);
+ missed_count = ric.missed_count();
+ {
+ MutexLocker x(&_mutex);
+ merge_success = _shared_cit->merge(&cit);
+ }
+ if (merge_success) {
+ Atomic::add(&_missed_count, missed_count);
+ else {
+ Atomic::store(&_success, false);
+ }
Thanks,
Paul
On 6/29/20, 7:20 PM, "linzang(臧琳)" <[email protected]> wrote:
Dear All,
Sorry to bother again, I just want to make sure that is
this change worth to be continue to work on? If decision is made to not. I
think I can drop this work and stop asking for help reviewing...
Thanks for all your help about reviewing this previously.
BRs,
Lin
On 2020/5/9, 3:47 PM, "linzang(臧琳)" <[email protected]> wrote:
Dear All,
May I ask your help again for review the latest change?
Thanks!
BRs,
Lin
On 2020/4/28, 1:54 PM, "linzang(臧琳)" <[email protected]>
wrote:
Hi Stefan,
>> - Adding Atomic::load/store.
>> - Removing the time measurement in the run_task. I
renamed G1's function
>> to run_task_timed. If we need this outside of G1, we
can rethink the API
>> at that point.
>> - ZGC style cleanups
Thanks for revising the patch, they are all good to me,
and I have made a tiny change based on it:
http://cr.openjdk.java.net/~lzang/jmap-8214535/8215624/webrev_04/
http://cr.openjdk.java.net/~lzang/jmap-8214535/8215624/webrev_04-delta/
it reduce the scope of mutex in ParHeapInspectTask, and
delete unnecessary comments.
BRs,
Lin
On 2020/4/27, 4:34 PM, "Stefan Karlsson"
<[email protected]> wrote:
Hi Lin,
On 2020-04-26 05:10, linzang(臧琳) wrote:
> Hi Stefan and Paul,
> I have made a new patch based on your comments
and Stefan's Poc code:
> Webrev:
http://cr.openjdk.java.net/~lzang/jmap-8214535/8215624/webrev_03/
> Delta(based on Stefan's change:) :
http://cr.openjdk.java.net/~lzang/jmap-8214535/8215624/webrev_03-delta/webrev_03-delta/
Thanks for providing a delta patch. It makes it much
easier to look at,
and more likely for reviewers to continue reviewing.
I'm going to continue focusing on the GC parts, and
leave the rest to
others to review.
>
> And Here are main changed I made and want to
discuss with you:
> 1. changed"parallelThreadNum=" to "parallel="
for jmap -histo options.
> 2. Add logic to test where
parallelHeapInspection is fail, in heapInspection.cpp
> This is because the parHeapInspectTask
create thread local KlassInfoTable in it's work() method, and this may fail
because of native OOM, in this case, the parallel should fail and serial heap
inspection can be tried.
> One more thing I want discuss with you is
about the member "_success" of parHeapInspectTask, when native OOM happenes, it
is set to false. And since this "set" operation can be conducted in multiple
threads, should it be atomic ops? IMO, this is not necessary because
"_success" can only be set to false, and there is no way to change it from back
to true after the ParHeapInspectTask instance is created, so it is save to be
non-atomic, do you agree with that?
In these situations you should be using the
Atomic::load/store
primitives. We're moving toward a later C++ standard
were data races are
considered undefined behavior.
> 3. make CollectedHeap::run_task() be an abstract
virtual func, so that every subclass of collectedHeap should support it, so
later implementation of new collectedHeap will not miss the "parallel" features.
> The problem I want to discuss with you is
about epsilonHeap and SerialHeap, as they may not need parallel heap iteration,
so I only make task->work(0), in case the run_task() is invoked someway in
future. Another way is to left run_task() unimplemented, which one do you
think is better?
I don't have a strong opinion about this.
And also please help take a look at the zHeap, as
there is a class
zTask that wrap the abstractGangTask, and the
collectedHeap::run_task()
only accept AbstraceGangTask* as argument, so I made a
delegate class
to adapt it , please see
src/hotspot/share/gc/z/zHeap.cpp.
>
> There maybe other better ways to sovle the
above problems, welcome for any comments, Thanks!
I've created a few cleanups and changes on top of your
latest patch:
https://cr.openjdk.java.net/~stefank/8215624/webrev.02.delta
https://cr.openjdk.java.net/~stefank/8215624/webrev.02
- Adding Atomic::load/store.
- Removing the time measurement in the run_task. I
renamed G1's function
to run_task_timed. If we need this outside of G1, we
can rethink the API
at that point.
- ZGC style cleanups
Thanks,
StefanK
>
> BRs,
> Lin
>
> On 2020/4/23, 11:08 AM, "linzang(臧琳)"
<[email protected]> wrote:
>
> Thanks Paul! I agree with using "parallel", will
make the update in next patch, Thanks for help update the CSR.
>
> BRs,
> Lin
>
> On 2020/4/23, 4:42 AM, "Hohensee, Paul"
<[email protected]> wrote:
>
> For the interface, I'd use "parallel"
instead of "parallelThreadNum". All the other options are lower case, and it's
a lot easier to type "parallel". I took the liberty of updating the CSR. If
you're ok with it, you might want to change variable names and such, plus of
course JMap.usage.
>
> Thanks,
> Paul
>
> On 4/22/20, 2:29 AM, "serviceability-dev on
behalf of linzang(臧琳)" <[email protected] on behalf
of [email protected]> wrote:
>
> Dear Stefan,
>
> Thanks a lot! I agree with you
to decouple the heap inspection code with GC's.
> I will start from your POC
code, may discuss with you later.
>
>
> BRs,
> Lin
>
> On 2020/4/22, 5:14 PM, "Stefan Karlsson"
<[email protected]> wrote:
>
> Hi Lin,
>
> I took a look at this earlier and
saw that the heap inspection code is
> strongly coupled with the
CollectedHeap and G1CollectedHeap. I'd prefer
> if we'd abstract this away, so that
the GCs only provide a "parallel
> object iteration" interface, and the
heap inspection code is kept elsewhere.
>
> I started experimenting with doing
that, but other higher-priority (to
> me) tasks have had to take
precedence.
>
> I've uploaded my work-in-progress /
proof-of-concept:
>
https://cr.openjdk.java.net/~stefank/8215624/webrev.01.delta/
>
https://cr.openjdk.java.net/~stefank/8215624/webrev.01/
>
> The current code doesn't handle the
lifecycle (deletion) of the
> ParallelObjectIterators. There's
also code left unimplemented in around
> CollectedHeap::run_task. However, I
think this could work as a basis to
> pull out the heap inspection code
out of the GCs.
>
> Thanks,
> StefanK
>
> On 2020-04-22 02:21, linzang(臧琳)
wrote:
> > Dear all,
> > May I ask you help to
review? This RFR has been there for quite a while.
> > Thanks!
> >
> > BRs,
> > Lin
> >
> > > On 2020/3/16, 5:18 PM,
"linzang(臧琳)" <[email protected]> wrote:>
> >
> >> Just update a new path, my
preliminary measure show about 3.5x speedup of jmap histo on a nearly full 4GB
G1 heap (8-core platform with parallel thread number set to 4).
> >> webrev:
http://cr.openjdk.java.net/~lzang/jmap-8214535/8215624/webrev_02/
> >> bug:
https://bugs.openjdk.java.net/browse/JDK-8215624
> >> CSR:
https://bugs.openjdk.java.net/browse/JDK-8239290
> >> BRs,
> >> Lin
> >> > On 2020/3/2, 9:56 PM,
"linzang(臧琳)" <[email protected]> wrote:
> >> >
> >> > Dear all,
> >> > Let me try to
ease the reviewing work by some explanation :P
> >> > The patch's
target is to speed up jmap -histo for heap iteration, from my experience it is
necessary for large heap investigation. E.g in bigData scenario I have tried to
conduct jmap -histo against 180GB heap, it does take quite a while.
> >> > And if my
understanding is corrent, even the jmap -histo without "live" option does heap
inspection with heap lock acquired. so it is very likely to block mutator
thread in allocation-sensitive scenario. I would say the faster the heap
inspection does, the shorter the mutator be blocked. This is parallel iteration
for jmap is necessary.
> >> > I think the
parallel heap inspection should be applied to all kind of heap. However,
consider the heap layout are different for GCs, much time is required to
understand all kinds of the heap layout to make the whole change. IMO, It is
not wise to have a huge patch for the whole solution at once, and it is even
harder to review it. So I plan to implement it incrementally, the first patch
(this one) is going to confirm the implemention detail of how jmap accept the
new option, passes it to attachListener of the jvm process and then how to make
the parallel inspection closure be generic enough to make it easy to extend to
different heap layout. And also how to implement the heap inspection in
specific gc's heap. This patch use G1's heap as the begining.
> >> > This patch
actually do several things:
> >> > 1. Add an option
"parallelThreadNum=<N>" to jmap -histo, the default behavior is to set N to 0,
means let's JVM decide how many threads to use for heap inspection. Set this
option to 1 will disable parallel heap inspection. (more details in CSR:
https://bugs.openjdk.java.net/browse/JDK-8239290)
> >> > 2. Make a change
in how Jmap passing arguments, changes in
http://cr.openjdk.java.net/~lzang/jmap-8214535/8215624/webrev_01/src/jdk.jcmd/share/classes/sun/tools/jmap/JMap.java.udiff.html,
originally it pass options as separate arguments to attachListener, this patch
change to that all options be compose to a single string. So the arg_count_max
in attachListener.hpp do not need to be changed, and hence avoid the
compatibility issue, as disscussed at
https://mail.openjdk.java.net/pipermail/serviceability-dev/2019-March/027334.html
> >> > 3. Add an
abstract class ParHeapInspectTask in heapInspection.hpp / heapInspection.cpp,
It's work(uint worker_id) method prepares the data structure (KlassInfoTable)
need for every parallel worker thread, and then call
do_object_iterate_parallel() which is heap specific implementation. I also
added some machenism in KlassInfoTable to support parallel iteration, such as
merge().
> >> > 4. In specific
heap (G1 in this patch), create a subclass of ParHeapInspectTask, implement the
do_object_iterate_parallel() for parallel heap inspection. For G1, it simply
invoke g1CollectedHeap's object_iterate_parallel().
> >> > 5. Add related
test.
> >> > 6. it may be easy
to extend this patch for other kinds of heap by creating subclass of
ParHeapInspectTask and implement the do_object_iterate_parallel().
> >> >
> >> > Hope these info could
help on code review and initate the discussion :-)
> >> > Thanks!
> >> >
> >> > BRs,
> >> > Lin
> >> > >On 2020/2/19, 9:40
AM, "linzang(臧琳)" <[email protected]> wrote:.
> >> > >
> >> > > Re-post this RFR
with correct enhancement number to make it trackable.
> >> > > please ignore the
previous wrong post. sorry for troubles.
> >> > >
> >> > > webrev:
http://cr.openjdk.java.net/~lzang/jmap-8214535/8215624/webrev_01/
> >> > > Hi bug:
https://bugs.openjdk.java.net/browse/JDK-8215624
> >> > > CSR:
https://bugs.openjdk.java.net/browse/JDK-8239290
> >> > > --------------
> >> > > Lin
> >> > > >Hi Lin,
> > > > > >
> >> > > >Could you,
please, re-post your RFR with the right enhancement number in
> >> > > >the message
subject?
> >> > > >It will be more
trackable this way.
> >> > > >
> >> > > >Thanks,
> >> > > >Serguei
> >> > > >
> >> > > >
> >> > > >On 2/17/20 10:29
PM, linzang(臧琳) wrote:
> >> > > >> Dear David,
> >> > > >> Thanks
a lot!
> >> > > >> I have
updated the refined code to
http://cr.openjdk.java.net/~lzang/jmap-8214535/8215264/webrev_01/.
> >> > > >> IMHO
the parallel heap inspection can be extended to all kinds of heap as long as
the heap layout can support parallel iteration.
> >> > > >> Maybe
we can firstly use this webrev to discuss how to implement it, because I am not
sure my current implementation is an appropriate way to communicate with
collectedHeap, then we can extend the solution to other kinds of heap.
> >> > > >>
> >> > > >> Thanks,
> >> > > >> --------------
> >> > > >> Lin
> >> > > >>> Hi Lin,
> >> > > >>>
> >> > > >>> Adding in
hotspot-gc-dev as they need to see how this interacts with GC
> >> > > >>> worker
threads, and whether it needs to be extended beyond G1.
> >> > > >>>
> >> > > >>> I happened to
spot one nit when browsing:
> >> > > >>>
> >> > > >>>
src/hotspot/share/gc/shared/collectedHeap.hpp
> >> > > >>>
> >> > > >>> + virtual
bool run_par_heap_inspect_task(KlassInfoTable* cit,
> >> > > >>> +
BoolObjectClosure* filter,
> >> > > >>> +
size_t* missed_count,
> >> > > >>> +
size_t thread_num) {
> >> > > >>> + return
NULL;
> >> > > >>>
> >> > > >>> s/NULL/false/
> >> > > >>>
> >> > > >>> Cheers,
> >> > > >>> David
> > > > > >>>
> >> > > >>> On 18/02/2020
2:15 pm, linzang(臧琳) wrote:
> >> > > >>>> Dear All,
> >> > > >>>> May
I ask your help to review the follow changes:
> >> > > >>>>
webrev:
> >> > > >>>>
http://cr.openjdk.java.net/~lzang/jmap-8214535/8215264/webrev_00/
> >> > > >>>> bug:
https://bugs.openjdk.java.net/browse/JDK-8215624
> >> > > >>>> related
CSR: https://bugs.openjdk.java.net/browse/JDK-8239290
> >> > > >>>> This
patch enable parallel heap inspection of G1 for jmap histo.
> >> > > >>>> my
simple test shown it can speed up 2x of jmap -histo with
> >> > > >>>>
parallelThreadNum set to 2 for heap at ~500M on 4-core platform.
> >> > > >>>>
> >> > > >>>>
------------------------------------------------------------------------
> >> > > >>>> BRs,
> >> > > >>>> Lin
> >> > > >> >
> >> > > >
> >
> >
> >
>
>
>
>
>
>