RE: RFR(L) 8237354: Add option to jcmd to write a gzipped heap dump

Langer, Christoph Fri, 21 Feb 2020 08:37:14 -0800

Hi all,

let me share my thoughts after going through this mail thread and interrogating 
Ralf quite a bit about the feature 😉.


First of all, I very much value the discussion and the points brought up here. 
When deciding about the introduction of an enhancement or a new feature, it's 
always wise to discuss it thoroughly and weigh the benefits against the 
maintenance cost incurred. However, in this case I'm at a point where I would 
really like to see this going in. Let me elaborate on this.

In the mail cited below, I think Ralf enumerates all the benefits quite 
comprehensively. With the gzip feature built into the heap dumper, we'll get 
the option to easily have the VM dump its heap in a space-saving format in the 
same time as (or even a bit quicker than) it currently takes to write a fully 
exploded hprof. There's no need for additional configuration steps and 
arrangements, just a simple additional option in the existing jcmd. And with 
the slightly updated dump format, tool builders will get options to improve 
handling of compressed heap dumps.

Speaking as somebody who has to do customer support once in a while, I can't 
tell you how valuable it is to be able to give the customer simple instructions 
that just work when it comes to directing them to provide diagnostic data. And 
that's clearly a point here. Also, given the loads of different deployment 
scenarios of JVM applications, e.g. cloud, containers, monolith servers... it's 
really good to have simple options.

On the other hand, it's true that the change introduces a bit of additional 
complexity. But, without having looked into the new code in full detail, I 
think the amount is acceptable. Most of the code only touches a distinct module 
for dumping the heap (heapdumper.cpp): some 600 additional lines of code (the 
file already had 2000 before). And the code does not mess too deeply with 
hotspot internals, so it should be quite maintainable. The rest of the 
code is a few lines enhancing the dcmd and some additional access points 
into zlib. Furthermore, it brings a bit of testing code, but that is a good 
thing. So, this should really be acceptable - given that Ralf is around to 
support this once it's checked in and there's also the rest of the SAP team 
which will be able to help out here.

The ideas collected in this thread that go beyond this change, e.g. the 
possibility to dump the heap out to the network, the option to get heap dumps 
out to the jcmd and also the potential enhancements to 
-XX:HeapDumpBeforeFullGC, -XX:HeapDumpAfterFullGC and 
-XX:HeapDumpOnOutOfMemoryError, are partly orthogonal and are probably worth 
pursuing on their own.

So I really think we should allow this enhancement in and start focusing on a 
good code review 😊.

Best regards
Christoph
 
> -----Original Message-----
> From: hotspot-runtime-dev <hotspot-runtime-dev-
> boun...@openjdk.java.net> On Behalf Of Schmelter, Ralf
> Sent: Thursday, 20 February 2020 14:21
> To: Yasumasa Suenaga <suen...@oss.nttdata.com>; Ioi Lam
> <ioi....@oracle.com>; serguei.spit...@oracle.com; hotspot-runtime-
> d...@openjdk.java.net runtime <hotspot-runtime-...@openjdk.java.net>
> Cc: serviceability-dev@openjdk.java.net
> Subject: [CAUTION] RE: RFR(L) 8237354: Add option to jcmd to write a
> gzipped heap dump
> 
> Hi Yasumasa,
> 
> I think it would be great if we could redirect larger chunks of data to jcmd.
> 
> But you have to differentiate between binary data (for the heap dump) and
> text data (e.g. for the codelist).
> 
> Currently jcmd assumes all bytes to be UTF-8 encoded, converts them to
> Unicode and then uses the platform encoding to write characters. This is not
> suitable for binary data.
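
The effect described above can be reproduced with a minimal Java sketch. It is not jcmd's actual code path: it only decodes bytes as UTF-8 and encodes them again, which is the most favourable case (no platform encoding involved). The class and method names are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class BinaryThroughTextDemo {
    // Decodes the bytes as UTF-8 and encodes the resulting text again,
    // roughly what a text-oriented pipeline does to its input.
    static byte[] roundTripAsUtf8(byte[] input) {
        String text = new String(input, StandardCharsets.UTF_8);
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] text = "JAVA PROFILE 1.0.2".getBytes(StandardCharsets.US_ASCII);
        // 0x8b, 0xff and 0xfe are not valid UTF-8 on their own, so the decoder
        // replaces each of them with U+FFFD.
        byte[] binary = {0x1f, (byte) 0x8b, 0x08, 0x00, (byte) 0xff, (byte) 0xfe};

        System.out.println("text survives:   "
                + Arrays.equals(text, roundTripAsUtf8(text)));
        System.out.println("binary survives: "
                + Arrays.equals(binary, roundTripAsUtf8(binary)));
    }
}
```

Plain ASCII text passes through unchanged, while the binary sample comes back longer and different, because every invalid byte is replaced during decoding.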
> 
> And of course you cannot use the bufferedStream to get the output to jcmd.
> You would have to implement an outputStream which can directly write to
> the AttachListener connection.
> 
> 
> But even with this change, I would still like the gzip compression to be done
> in the VM. Let me try to list all the advantages I see for doing this:
> 
> 1. It is by far the easiest to use. You just have to specify -gz for the jcmd.
> While your command line (jcmd .... | gzip -c > file) is easy enough, it
> assumes you have gzip (not there by default on Windows) and it would be
> painfully slow (~10x or more), since it is not parallel. You could use pigz,
> but it is not as ubiquitous as gzip. I know it is sometimes hard to imagine
> this could be a problem for anyone, but it is.
> 
> It is easy to tell a customer to execute jcmd <pid> GC.heap_dump -gz
> test.hprof.gz. Add additional requirements, especially external programs,
> and your chances of success diminish fast.
> 
> 
> 2. The -XX:HeapDumpOnOutOfMemoryError, -XX:HeapDumpBeforeFullGC
> and -XX:HeapDumpAfterFullGC options can easily create gzipped heap
> dumps directly when the compression is in the VM. And especially if you
> create more than one dump (with the before/after gc flags), compression is
> very useful. The same goes if you want to support compressed heap dumps in
> the HotSpotDiagnosticMXBean: just add a flag and/or compression level.
> 
> 
> 3. The created gz-file is not a simple gz-file you would get when simply using
> gzip.
> 
> It is created in a way that makes it possible to treat it like a random
> access file without decompressing it.
> 
> Currently for example the Eclipse Memory Analyzer (MAT) has the option to
> directly open a gzipped hprof file and use it without decompression. And for
> the initial parsing, they can just read the file sequentially, so this is not
> too slow.
> 
> But when accessing the values of objects or arrays, they have to seek to
> specific positions in the gzipped hprof file. This is currently implemented by
> having a Java implementation of an InflaterInputStream which is capable of
> completely copying its state. This copy is then used to start decompressing at
> the specific offset for which it was created. As you can imagine, the state of
> the inflater is not small (MAT assumes about 64 kB; at least 32 kB is needed
> for the dictionary), so it limits the number of starting positions you can use
> for large files. But it works for all kinds of gzip compressed streams.
> 
> The gzip implementation used to write the heap dump in the VM creates
> many small gzip compressed chunks. At the start of each chunk you can
> create a fresh GZIPInputStream without having to store any internal state.
> You only need to remember the physical offset and the logical offset (so 2
> long values) for each chunk. If you then want to read data at a specific
> logical offset, you binary search the nearest preceding chunk and create a
> GZIPInputStream reading from the physical offset of that chunk. So on
> average you have to decompress about half a chunk to get to the data you
> need.
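
The chunk index and lookup described above can be sketched in Java as follows. This is not the code from the webrev: the class name, the in-memory byte array standing in for the file, and the fixed chunk size are all assumptions made to keep the example self-contained.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

final class ChunkedGzipSketch {
    private final byte[] compressed; // concatenated, independent gzip members
    private final long[] physical;   // where each member starts in `compressed`
    private final long[] logical;    // uncompressed offset at which it starts

    private ChunkedGzipSketch(byte[] compressed, long[] physical, long[] logical) {
        this.compressed = compressed;
        this.physical = physical;
        this.logical = logical;
    }

    int chunkCount() { return logical.length; }

    // Compresses `data` in chunks of `chunkSize` uncompressed bytes and
    // records the two offsets per chunk.
    static ChunkedGzipSketch write(byte[] data, int chunkSize) {
        int chunks = (data.length + chunkSize - 1) / chunkSize;
        long[] physical = new long[chunks];
        long[] logical = new long[chunks];
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < chunks; i++) {
            int start = i * chunkSize;
            physical[i] = out.size();
            logical[i] = start;
            // A fresh GZIPOutputStream per chunk yields a self-contained member.
            // Closing it is harmless: ByteArrayOutputStream.close() does nothing.
            try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
                gz.write(data, start, Math.min(chunkSize, data.length - start));
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return new ChunkedGzipSketch(out.toByteArray(), physical, logical);
    }

    // Reads `length` bytes starting at the uncompressed offset `offset`.
    byte[] read(long offset, int length) {
        if (offset < 0 || logical.length == 0) {
            throw new IllegalArgumentException("offset out of range");
        }
        // Binary search for the last chunk whose logical start is <= offset.
        int idx = Arrays.binarySearch(logical, offset);
        if (idx < 0) idx = -idx - 2; // insertion point minus one
        int start = (int) physical[idx];
        byte[] result = new byte[length];
        try (GZIPInputStream in = new GZIPInputStream(
                new ByteArrayInputStream(compressed, start, compressed.length - start))) {
            long toSkip = offset - logical[idx]; // decompress and discard
            while (toSkip > 0) {
                long n = in.skip(toSkip);
                if (n <= 0) throw new EOFException("offset beyond end of data");
                toSkip -= n;
            }
            int filled = 0; // GZIPInputStream continues into following members
            while (filled < length) {
                int n = in.read(result, filled, length - filled);
                if (n < 0) throw new EOFException("read beyond end of data");
                filled += n;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        byte[] data = new byte[5000];
        for (int i = 0; i < data.length; i++) data[i] = (byte) i;
        ChunkedGzipSketch dump = write(data, 1024);
        System.out.println(dump.chunkCount() + " chunks, bytes at offset 3000: "
                + Arrays.toString(dump.read(3000, 4)));
    }
}
```

A reader for a real file would keep the same two arrays but seek in the file instead of slicing a byte array; nothing else about the lookup changes.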
> 
> If you look in the webrev, you can see
> http://cr.openjdk.java.net/~rschmelter/webrevs/8237354/webrev.0/test/lib/jdk/test/lib/hprof/parser/GzipRandomAccess.java.html.
> This implements the needed logic to treat the gzipped hprof file as a random
> access file. I have used it to add support for gzipped files in the jhat
> library (which is only used in tests). In jhat, for example, the resolution of
> references is done via random access. And the file also contains all the
> functionality MAT would need.
> 
> You can generate a more or less equivalent file if you use pigz with the
> --independent option. But to make it easier to detect that the gzip file is
> chunked (without decompressing it first), I've added a comment marking it as
> an hprof file with a given chunk size. This would be missing from the pigz
> file, but pigz instead adds 9 bytes when --independent is specified (00 00 ff
> ff 00 00 00 ff ff), so you could detect it too.
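
Detecting such a comment without decompressing anything only requires parsing the gzip member header, whose layout is defined in RFC 1952. The Java sketch below shows one way to do that. The class name is made up, and the comment text in the example is a placeholder, not the format the webrev actually writes.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Optional;

final class GzipCommentSketch {
    // Header flag bits from RFC 1952.
    private static final int FEXTRA = 0x04, FNAME = 0x08, FCOMMENT = 0x10;

    // Returns the header comment of the first gzip member, if it has one.
    static Optional<String> headerComment(byte[] gz) {
        if (gz.length < 10 || (gz[0] & 0xff) != 0x1f || (gz[1] & 0xff) != 0x8b
                || gz[2] != 8) {
            throw new IllegalArgumentException("not a gzip stream");
        }
        int flags = gz[3] & 0xff;
        // Fixed part: magic(2) method(1) flags(1) mtime(4) xfl(1) os(1).
        int pos = 10;
        if ((flags & FEXTRA) != 0) {
            if (pos + 2 > gz.length) {
                throw new IllegalArgumentException("truncated gzip header");
            }
            // Extra field: 2-byte little-endian length, then that many bytes.
            int xlen = (gz[pos] & 0xff) | ((gz[pos + 1] & 0xff) << 8);
            pos += 2 + xlen;
        }
        if ((flags & FNAME) != 0) {
            pos = indexOfZero(gz, pos) + 1; // skip zero-terminated file name
        }
        if ((flags & FCOMMENT) == 0) {
            return Optional.empty();
        }
        int end = indexOfZero(gz, pos);
        // Name and comment are ISO 8859-1 according to the RFC.
        return Optional.of(new String(gz, pos, end - pos, StandardCharsets.ISO_8859_1));
    }

    private static int indexOfZero(byte[] b, int from) {
        for (int i = from; i < b.length; i++) {
            if (b[i] == 0) return i;
        }
        throw new IllegalArgumentException("truncated gzip header");
    }

    public static void main(String[] args) {
        // Hand-built header with only FCOMMENT set; the text is a placeholder.
        ByteArrayOutputStream header = new ByteArrayOutputStream();
        header.write(new byte[] {0x1f, (byte) 0x8b, 8, FCOMMENT, 0, 0, 0, 0, 0, (byte) 0xff}, 0, 10);
        byte[] comment = "example chunk-size=1048576".getBytes(StandardCharsets.ISO_8859_1);
        header.write(comment, 0, comment.length);
        header.write(0);
        System.out.println(headerComment(header.toByteArray()));
    }
}
```

A tool could call this on the first few hundred bytes of a file and fall back to its generic gzip handling when no comment is present.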
> 
> To summarize, the gzipped hprof file created by the VM makes it much
> easier for tools to access them efficiently at random positions. You can do
> something equivalent with pigz, but not with gzip.
> 
> And getting support for this type of gzipped hprof file in the heap dump
> tools will be much easier if this is the format OpenJDK produces, since it
> will then be widespread.
> 
> Best regards,
> Ralf
> 
> -----Original Message-----
> From: Yasumasa Suenaga <suen...@oss.nttdata.com>
> Sent: Thursday, 20 February 2020 00:59
> To: Ioi Lam <ioi....@oracle.com>; Schmelter, Ralf
> <ralf.schmel...@sap.com>; serguei.spit...@oracle.com; hotspot-runtime-
> d...@openjdk.java.net runtime <hotspot-runtime-...@openjdk.java.net>
> Cc: serviceability-dev@openjdk.java.net
> Subject: Re: RFR(L) 8237354: Add option to jcmd to write a gzipped heap
> dump
> 
> Hi,
> 
> Generally I agree with Ioi, but I think it is not a problem only for gzipped
> heap dumps.
> 
> For example, Compiler.codelist and Compiler.CodeHeap_Analytics might produce
> large text output.
> In addition, some users want to redirect the result from jcmd to another
> command or a log collector.
> 
> So I think it would be better if jcmd provided a stdout redirect option for
> all subcommands. E.g.
> 
>    $ jcmd <PID> GC.heap_dump -stdout | gzip -c - > heapdump.hprof.gz
> 
> 
> Thanks,
> 
> Yasumasa
