-----Original Message-----
From: hotspot-runtime-dev <hotspot-runtime-dev-
boun...@openjdk.java.net> On Behalf Of Schmelter, Ralf
Sent: Donnerstag, 20. Februar 2020 14:21
To: Yasumasa Suenaga <suen...@oss.nttdata.com>; Ioi Lam
<ioi....@oracle.com>; serguei.spit...@oracle.com; hotspot-runtime-
d...@openjdk.java.net runtime <hotspot-runtime-...@openjdk.java.net>
Cc: serviceability-dev@openjdk.java.net
Subject: RE: RFR(L) 8237354: Add option to jcmd to write a gzipped heap dump
Hi Yasumasa,
I think it would be great if we could redirect larger chunks of data to jcmd.
But you have to differentiate between binary data (for the heap dump) and text data (e.g. for the codelist).
Currently jcmd assumes all bytes to be UTF-8 encoded, converts them to
Unicode and then uses the platform encoding to write characters.
This is not
suitable for binary data.
And of course you cannot use the bufferedStream to get the output to
jcmd.
You would have to implement an outputStream which can directly write to
the AttachListener connection.
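To illustrate the problem with a minimal sketch (not jcmd's actual code): decoding arbitrary binary bytes as UTF-8 and re-encoding them is lossy, because invalid byte sequences get replaced, so a heap dump routed through such a text path would get corrupted:

    import java.nio.charset.StandardCharsets;
    import java.util.Arrays;

    public class BinaryOverTextDemo {
        public static void main(String[] args) {
            byte[] binary = { (byte) 0x1f, (byte) 0x8b, (byte) 0xff, 0x00, 0x42 };
            // Treat the bytes as UTF-8 text, as a text-only transport would.
            String decoded = new String(binary, StandardCharsets.UTF_8);
            byte[] roundTrip = decoded.getBytes(StandardCharsets.UTF_8);
            // Prints false: 0x8b and 0xff are not valid UTF-8 and were replaced.
            System.out.println(Arrays.equals(binary, roundTrip));
        }
    }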
But even with this change, I would still like the gzip compression
to be done
in the VM. Let me try to list all the advantages I see for doing this:
1. It is by far the easiest to use. You just have to specify -gz for the jcmd.
While your command line (jcmd .... | gzip -c > file) is easy enough, it assumes you have gzip (not there by default on Windows) and it would be painfully slow (~10x and more), since it is not parallel. You could use pigz, but it is not as ubiquitous as gzip. I know it is sometimes hard to imagine this could be a problem for anyone, but it is.
It is easy to tell a customer to execute jcmd <pid> GC.heap_dump -gz test.hprof.gz. Add additional requirements, especially external programs, and your chances of success diminish fast.
2. The -XX:HeapDumpOnOutOfMemoryError, -XX:HeapDumpBeforeFullGC and -XX:HeapDumpAfterFullGC options can easily create gzipped heap dumps directly when the compression is in the VM. And especially if you create more than one dump (with the before/after GC flags), compression is very useful. The same goes if you want to support compressed heap dumps in the HotSpotDiagnosticMXBean: just add a flag and/or compression level.
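For illustration, here is a minimal sketch (not part of the webrev) of how an application triggers a heap dump through the existing com.sun.management.HotSpotDiagnosticMXBean today; the commented-out overload with a compression level is purely hypothetical and only shows the kind of extension meant above:

    import java.lang.management.ManagementFactory;
    import com.sun.management.HotSpotDiagnosticMXBean;

    public class DumpExample {
        public static void main(String[] args) throws Exception {
            HotSpotDiagnosticMXBean bean = ManagementFactory.getPlatformMXBean(
                    HotSpotDiagnosticMXBean.class);
            // Existing API: dump only live objects to an uncompressed hprof file.
            bean.dumpHeap("heap.hprof", true);
            // Hypothetical extension (does not exist): a gzip compression level.
            // bean.dumpHeap("heap.hprof.gz", true, /* gzip level */ 1);
        }
    }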
3. The created gz-file is not the plain gz-file you would get from simply using gzip.
It is created in a way that makes it possible to treat it like a random access file without decompressing it.
Currently, for example, the Eclipse Memory Analyzer (MAT) has the option to directly open a gzipped hprof file and use it without decompressing it. And for the initial parsing, it can just read the file sequentially, so this is not too slow.
But when accessing the values of objects or arrays, it has to seek to specific positions in the gzipped hprof file. This is currently implemented by having a Java implementation of an InflaterInputStream which is capable of completely copying its state. This copy is then used to start decompressing at the specific offset for which it was created. As you can imagine, the state of the inflater is not small (MAT assumes about 64 kB; at least 32 kB is needed for the dictionary alone), so it limits the number of starting positions you can use for large files. But it works for all kinds of gzip compressed streams.
The gzip implementation used to write the heap dump in the VM creates many small gzip compressed chunks. At the start of each chunk you can create a fresh GZIPInputStream without having to store any internal state. You only need to remember the physical offset and the logical offset (so 2 long values) for each chunk. If you then want to read data at a specific logical offset, you binary search for the nearest preceding chunk and create a GZIPInputStream reading from the physical offset of that chunk. So on average you have to decompress about half a chunk to get to the data you need.
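In code it looks roughly like this (a minimal sketch only, not the code from the webrev; the chunkLogical/chunkPhysical index arrays are assumed to have been collected beforehand while scanning the chunk starts):

    import java.io.IOException;
    import java.io.InputStream;
    import java.nio.channels.Channels;
    import java.nio.channels.FileChannel;
    import java.nio.file.Path;
    import java.nio.file.StandardOpenOption;
    import java.util.zip.GZIPInputStream;

    class ChunkedGzipReader {
        private final FileChannel file;
        private final long[] chunkLogical;   // uncompressed offset of each chunk start
        private final long[] chunkPhysical;  // file offset of each chunk start

        ChunkedGzipReader(Path p, long[] logical, long[] physical) throws IOException {
            this.file = FileChannel.open(p, StandardOpenOption.READ);
            this.chunkLogical = logical;
            this.chunkPhysical = physical;
        }

        // Read uncompressed bytes starting at logical offset 'pos'.
        int read(long pos, byte[] buf, int len) throws IOException {
            // Binary search for the last chunk starting at or before 'pos'.
            int lo = 0, hi = chunkLogical.length - 1, idx = 0;
            while (lo <= hi) {
                int mid = (lo + hi) >>> 1;
                if (chunkLogical[mid] <= pos) { idx = mid; lo = mid + 1; } else { hi = mid - 1; }
            }
            // Each chunk is a complete gzip stream, so a fresh GZIPInputStream
            // can start at its physical offset; no saved inflater state is needed.
            file.position(chunkPhysical[idx]);
            InputStream in = new GZIPInputStream(Channels.newInputStream(file));
            long toSkip = pos - chunkLogical[idx];       // on average about half a chunk
            while (toSkip > 0) {
                long skipped = in.skip(toSkip);
                if (skipped <= 0) break;
                toSkip -= skipped;
            }
            return in.read(buf, 0, len);
        }
    }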
If you look in the webrev, you can see http://cr.openjdk.java.net/~rschmelter/webrevs/8237354/webrev.0/test/lib/jdk/test/lib/hprof/parser/GzipRandomAccess.java.html. This implements the needed logic to treat the gzipped hprof file as a random access file. I have used it to add support for gzipped files in the jhat library (which is only used in tests). In jhat, for example, the resolution of references is done via random access. And the file also contains all the functionality MAT would need.
You can generate a more or less equivalent file if you use pigz with the --independent option. But to make it easier to detect that the gzip file is chunked (without decompressing it first), I've added a comment marking it as an hprof file with a given chunk size. This would be missing from the pigz file, but pigz instead adds 9 bytes when --independent is specified (00 00 ff ff 00 00 00 ff ff), so you could detect that too.
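A tool can check for that comment cheaply by just parsing the gzip header of the first member (a minimal sketch under the assumption that the marker is carried in the standard FCOMMENT field; the exact comment text is not shown here):

    import java.io.IOException;
    import java.io.InputStream;

    class GzipCommentProbe {
        // Returns the header comment of the first gzip member, or null if there is none.
        static String readComment(InputStream in) throws IOException {
            int id1 = in.read(), id2 = in.read(), cm = in.read();
            if (id1 != 0x1f || id2 != 0x8b || cm != 8) return null;  // not a gzip/deflate stream
            int flg = in.read();
            in.skip(6);                                   // MTIME, XFL, OS
            if ((flg & 4) != 0) {                         // FEXTRA: skip the extra field
                int xlen = in.read() | (in.read() << 8);
                in.skip(xlen);
            }
            if ((flg & 8) != 0) while (in.read() > 0) {}  // FNAME: zero-terminated name
            if ((flg & 16) == 0) return null;             // no FCOMMENT present
            StringBuilder sb = new StringBuilder();
            for (int c = in.read(); c > 0; c = in.read()) sb.append((char) c);
            return sb.toString();
        }
    }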
To summarize, the gzipped hprof file created by the VM makes it much easier for tools to access it efficiently at random positions. You can do something equivalent with pigz, but not with gzip.
And getting the heap dump tools to support this type of gzipped hprof file will be much easier if this is the format OpenJDK produces, since it will then be widespread.
Best regards,
Ralf
-----Original Message-----
From: Yasumasa Suenaga <suen...@oss.nttdata.com>
Sent: Donnerstag, 20. Februar 2020 00:59
To: Ioi Lam <ioi....@oracle.com>; Schmelter, Ralf
<ralf.schmel...@sap.com>; serguei.spit...@oracle.com; hotspot-runtime-
d...@openjdk.java.net runtime <hotspot-runtime-...@openjdk.java.net>
Cc: serviceability-dev@openjdk.java.net
Subject: Re: RFR(L) 8237354: Add option to jcmd to write a gzipped heap
dump
Hi,
Generally I agree with Ioi, but I think it is not a problem only for the gzipped heap dump.
For example, Compiler.codelist and Compiler.CodeHeap_Analytics might produce large text output.
In addition, some users want to redirect the result from jcmd to another command or a log collector.
So I think it would be better if jcmd provided a stdout redirect option for all subcommands. E.g.
$ jcmd <PID> GC.heap_dump -stdout | gzip -c - > heapdump.hprof.gz
Thanks,
Yasumasa