-----Original Message-----
From: hotspot-runtime-dev <hotspot-runtime-dev-
boun...@openjdk.java.net> On Behalf Of Schmelter, Ralf
Sent: Donnerstag, 20. Februar 2020 14:21
To: Yasumasa Suenaga <suen...@oss.nttdata.com>; Ioi Lam
<ioi....@oracle.com>; serguei.spit...@oracle.com; hotspot-runtime-
d...@openjdk.java.net runtime <hotspot-runtime-...@openjdk.java.net>
Cc: serviceability-dev@openjdk.java.net
Subject: RE: RFR(L) 8237354: Add option to jcmd to write a gzipped heap dump
Hi Yasumasa,
I think it would be great if we could redirect larger chunks of data to jcmd.
But you have to differentiate between binary data (for the heap dump) and text data (e.g. for the codelist).
Currently jcmd assumes all bytes to be UTF-8 encoded, converts them to
Unicode and then uses the platform encoding to write characters.
This is not
suitable for binary data.
And of course you cannot use the bufferedStream to get the output to
jcmd.
You would have to implement an outputStream which can directly write to
the AttachListener connection.
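To illustrate the problem with a minimal sketch (not jcmd's actual code): decoding arbitrary binary bytes as UTF-8 and re-encoding them is lossy, because invalid byte sequences get replaced, so a heap dump routed through such a text path would get corrupted:

    import java.nio.charset.StandardCharsets;
    import java.util.Arrays;

    public class BinaryOverTextDemo {
        public static void main(String[] args) {
            byte[] binary = { (byte) 0x1f, (byte) 0x8b, (byte) 0xff, 0x00, 0x42 };
            // Treat the bytes as UTF-8 text, as a text-only transport would.
            String decoded = new String(binary, StandardCharsets.UTF_8);
            byte[] roundTrip = decoded.getBytes(StandardCharsets.UTF_8);
            // Prints false: 0x8b and 0xff are not valid UTF-8 and were replaced.
            System.out.println(Arrays.equals(binary, roundTrip));
        }
    }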
But even with this change, I would still like the gzip compression
to be done
in the VM. Let me try to list all the advantages I see for doing this:
1. It is by far the easiest to use. You just have to specify -gz for the jcmd.
While your command line (jcmd .... | gzip -c > file) is easy enough, it assumes you have gzip (not there by default on Windows) and it would be painfully slow (~10x and more), since it is not parallel. You could use pigz, but it is not as ubiquitous as gzip. I know it is sometimes hard to imagine this could be a problem for anyone, but it is.
It is easy to tell a customer to execute jcmd <pid> GC.heap_dump -gz test.hprof.gz. Add additional requirements, especially external programs, and your chances of success diminish fast.
2. The -XX:HeapDumpOnOutOfMemoryError, -XX:HeapDumpBeforeFullGC and -XX:HeapDumpAfterFullGC options can easily create gzipped heap dumps directly when the compression is in the VM. And especially if you create more than one dump (with the before/after GC flags), compression is very useful. The same goes if you want to support compressed heap dumps in the HotSpotDiagnosticMXBean: just add a flag and/or compression level.
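For illustration, here is a minimal sketch (not part of the webrev) of how an application triggers a heap dump through the existing com.sun.management.HotSpotDiagnosticMXBean today; the commented-out overload with a compression level is purely hypothetical and only shows the kind of extension meant above:

    import java.lang.management.ManagementFactory;
    import com.sun.management.HotSpotDiagnosticMXBean;

    public class DumpExample {
        public static void main(String[] args) throws Exception {
            HotSpotDiagnosticMXBean bean = ManagementFactory.getPlatformMXBean(
                    HotSpotDiagnosticMXBean.class);
            // Existing API: dump only live objects to an uncompressed hprof file.
            bean.dumpHeap("heap.hprof", true);
            // Hypothetical extension (does not exist): a gzip compression level.
            // bean.dumpHeap("heap.hprof.gz", true, /* gzip level */ 1);
        }
    }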
3. The created gz-file is not the plain gz-file you would get from simply using gzip.
It is created in a way that makes it possible to treat it like a random access file without decompressing it.
Currently, for example, the Eclipse Memory Analyzer (MAT) has the option to directly open a gzipped hprof file and use it without decompressing it. And for the initial parsing, it can just read the file sequentially, so this is not too slow.
But when accessing the values of objects or arrays, it has to seek to specific positions in the gzipped hprof file. This is currently implemented by having a Java implementation of an InflaterInputStream which is capable of completely copying its state. This copy is then used to start decompressing at the specific offset for which it was created. As you can imagine, the state of the inflater is not small (MAT assumes about 64 kB; at least 32 kB is needed for the dictionary alone), so it limits the number of starting positions you can use for large files. But it works for all kinds of gzip compressed streams.
The gzip implementation used to write the heap dump in the VM creates many small gzip compressed chunks. At the start of each chunk you can create a fresh GZIPInputStream without having to store any internal state. You only need to remember the physical offset and the logical offset (so 2 long values) for each chunk. If you then want to read data at a specific logical offset, you binary search for the nearest preceding chunk and create a GZIPInputStream reading from the physical offset of that chunk. So on average you have to decompress about half a chunk to get to the data you need.
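In code it looks roughly like this (a minimal sketch only, not the code from the webrev; the chunkLogical/chunkPhysical index arrays are assumed to have been collected beforehand while scanning the chunk starts):

    import java.io.IOException;
    import java.io.InputStream;
    import java.nio.channels.Channels;
    import java.nio.channels.FileChannel;
    import java.nio.file.Path;
    import java.nio.file.StandardOpenOption;
    import java.util.zip.GZIPInputStream;

    class ChunkedGzipReader {
        private final FileChannel file;
        private final long[] chunkLogical;   // uncompressed offset of each chunk start
        private final long[] chunkPhysical;  // file offset of each chunk start

        ChunkedGzipReader(Path p, long[] logical, long[] physical) throws IOException {
            this.file = FileChannel.open(p, StandardOpenOption.READ);
            this.chunkLogical = logical;
            this.chunkPhysical = physical;
        }

        // Read uncompressed bytes starting at logical offset 'pos'.
        int read(long pos, byte[] buf, int len) throws IOException {
            // Binary search for the last chunk starting at or before 'pos'.
            int lo = 0, hi = chunkLogical.length - 1, idx = 0;
            while (lo <= hi) {
                int mid = (lo + hi) >>> 1;
                if (chunkLogical[mid] <= pos) { idx = mid; lo = mid + 1; } else { hi = mid - 1; }
            }
            // Each chunk is a complete gzip stream, so a fresh GZIPInputStream
            // can start at its physical offset; no saved inflater state is needed.
            file.position(chunkPhysical[idx]);
            InputStream in = new GZIPInputStream(Channels.newInputStream(file));
            long toSkip = pos - chunkLogical[idx];       // on average about half a chunk
            while (toSkip > 0) {
                long skipped = in.skip(toSkip);
                if (skipped <= 0) break;
                toSkip -= skipped;
            }
            return in.read(buf, 0, len);
        }
    }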
If you look in the webrev, you can see http://cr.openjdk.java.net/~rschmelter/webrevs/8237354/webrev.0/test/lib/jdk/test/lib/hprof/parser/GzipRandomAccess.java.html. This implements the needed logic to treat the gzipped hprof file as a random access file. I have used it to add support for gzipped files in the jhat library (which is only used in tests). In jhat, for example, the resolution of references is done via random access. And the file also contains all the functionality MAT would need.
You can generate a more or less equivalent file if you use pigz with the --independent option. But to make it easier to detect that the gzip file is chunked (without decompressing it first), I've added a comment marking it as an hprof file with a given chunk size. This would be missing from the pigz file, but pigz instead adds 9 bytes when --independent is specified (00 00 ff ff 00 00 00 ff ff), so you could detect that too.
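A tool can check for that comment cheaply by just parsing the gzip header of the first member (a minimal sketch under the assumption that the marker is carried in the standard FCOMMENT field; the exact comment text is not shown here):

    import java.io.IOException;
    import java.io.InputStream;

    class GzipCommentProbe {
        // Returns the header comment of the first gzip member, or null if there is none.
        static String readComment(InputStream in) throws IOException {
            int id1 = in.read(), id2 = in.read(), cm = in.read();
            if (id1 != 0x1f || id2 != 0x8b || cm != 8) return null;  // not a gzip/deflate stream
            int flg = in.read();
            in.skip(6);                                   // MTIME, XFL, OS
            if ((flg & 4) != 0) {                         // FEXTRA: skip the extra field
                int xlen = in.read() | (in.read() << 8);
                in.skip(xlen);
            }
            if ((flg & 8) != 0) while (in.read() > 0) {}  // FNAME: zero-terminated name
            if ((flg & 16) == 0) return null;             // no FCOMMENT present
            StringBuilder sb = new StringBuilder();
            for (int c = in.read(); c > 0; c = in.read()) sb.append((char) c);
            return sb.toString();
        }
    }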
To summarize, the gzipped hprof file created by the VM makes it much easier for tools to access it efficiently at random positions. You can do something equivalent with pigz, but not with gzip.
And getting the heap dump tools to support this type of gzipped hprof file will be much easier if this is the format OpenJDK produces, since it will then be widespread.
Best regards,
Ralf
-----Original Message-----
From: Yasumasa Suenaga <suen...@oss.nttdata.com>
Sent: Donnerstag, 20. Februar 2020 00:59
To: Ioi Lam <ioi....@oracle.com>; Schmelter, Ralf
<ralf.schmel...@sap.com>; serguei.spit...@oracle.com; hotspot-runtime-
d...@openjdk.java.net runtime <hotspot-runtime-...@openjdk.java.net>
Cc: serviceability-dev@openjdk.java.net
Subject: Re: RFR(L) 8237354: Add option to jcmd to write a gzipped heap
dump
Hi,
Generally I agree with Ioi, but I think it is not a problem only for the gzipped heap dump.
For example, Compiler.codelist and Compiler.CodeHeap_Analytics might produce large text output.
In addition, some users want to redirect the result from jcmd to another command or a log collector.
So I think it would be better if jcmd provided a stdout redirect option for all subcommands. E.g.
$ jcmd <PID> GC.heap_dump -stdout | gzip -c - > heapdump.hprof.gz
Thanks,
Yasumasa