Dear Lin,
Before I go into the details below, could you please explain why
we need a separate file for the incremental dump?
Shouldn't we just record the full dump into file=<file>
incrementally?
Recording into the full dump file should always be
incremental.
It can be done in chunks if necessary, so I'm open to considering
a new chunksize option.
Am I missing anything here?
Thanks,
Serguei
On 5/17/19 03:18, 臧琳 wrote:
Dear Serguei,
On 5/13/19 23:46, 臧琳 wrote:
Dear Serguei,
Thanks for your comments.
> > - incremental[:<file_name>], enable the incremental dump of heap; dumped
> > data will be saved to <file_name>, by default "IncrementalHisto.dump"
>
> Q1: Should the <file_name> be full path or short name?
> Is there any default path? What is the path of the
> "IncrementalHisto.dump" file?
The original design doesn't have the <file_name> option, so the file name is hardcoded as "IncrementalHisto.dump" and the file is saved to the same path as the one specified by "file=", or the data is printed to the current output stream if "file=" is not set.
> The file option is described as:
> file=<file> dump data to <file>
> It does not tell anything about the path.
Yes. Do you agree that we add a note to the help text like:
file=<file> dump data to <file>; <file> can be
specified with a full path.
With the new design, I suggest first parsing <file_name>: if the value contains a folder path, use the specified path; if not, use the same path as the "file=" value; and if "file=" is not set, use the output stream. (The reason I prefer the same path as "file=" is that I assume users prefer to save all data files under the same folder.)
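A minimal sketch of the path rule proposed above, in Java since jmap itself is written in Java. IncrementalPathResolver and resolve are hypothetical names, not existing jmap code, and a null result stands for "print to the output stream".

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper illustrating the proposed rule; not jmap code.
final class IncrementalPathResolver {
    static final String DEFAULT_NAME = "IncrementalHisto.dump";

    /** Returns the incremental file path, or null meaning "use the output stream". */
    static Path resolve(String incFileName, String fullDumpFile) {
        // No value after "incremental": fall back to the default name.
        String name = (incFileName == null || incFileName.isEmpty())
                ? DEFAULT_NAME : incFileName;
        Path inc = Paths.get(name);
        if (inc.getParent() != null) {
            // The value already contains a folder path: use it as given.
            return inc;
        }
        if (fullDumpFile != null) {
            // Bare name: put it in the same folder as the file= output.
            return Paths.get(fullDumpFile).resolveSibling(inc);
        }
        // Bare name and no file=: print to the output stream.
        return null;
    }
}
```

Path.resolveSibling returns its argument unchanged when the file= value has no parent, so a relative file=histo.dump also works with this sketch.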
> It needs to be clearly specified.
> What statements do you suggest?
>
> One idea of simplification is to get rid of the default <file_name>
> and to require it to be always specified (non-optional).
>
> Then we could replace this:
> file=<file> dump data to <file>
> incremental dump support:
> incremental[:<file_name>] enable incremental dump, data will be dumped
> to <file_name> (default is "IncrementalHisto.dump")
>
> with this:
> file=<file> dump data to <file>
> incremental=<inc_file> dump incremental data to <inc_file>
I think having a default IncrementalHisto.dump file saved
at the same path of the <file> is a way to make
incremental easy to use.
IMHO, when a user runs jmap -histo with "file=<file>"
and wants to enable incremental histo, the easiest way is just
to add the "incremental" option, and all data files will be saved
under the same folder as <file>. They don't have to
choose a separate file name for the incremental data.
This is the reason I set the default value to
IncrementalHisto.dump.
But I also want the user to have the freedom to use a different
file name and path for incremental results, so I made
the incremental <file_name> optional.
If we make it non-optional, does it mean that the user may
have to run the following command:
jmap -histo:file=<absolute_path/a/b/c/histo.dump>,incremental:<absolute_path/a/b/c/incrementalHisto.dump> pid
It seems a little bit complicated to me. What do you
think?
> > - chunksize=<N>, size of objects (in KB) will be dumped in one chunk.
>
> Q2: Should it be chunk of dump, not chunk of objects?
The purpose of "chunksize" is to decide how much object information is dumped at once. For example, with "chunksize=1" on a JVM running with "-Xmx1m", there will be at most 1MB/1KB = 1024 chunks, which means the file will be written up to 1024 times during "jmap -histo".
> I hardly understand the point of knowing the max number of objects
> that can be dumped at once.
> It is more important to know how much memory in the
> file it is going to take.
> How much dump memory will one object take?
> Does it vary (does it depend on object types)?
Yes, the dump memory for one object varies with its size and type.
The "chunksize" option lets the user control the proportion
of the heap that the incremental dump processes at a time. IMO
the use scenario is as follows:
when the JVM has a 180GB max heap size and jmap -histo is
used with chunksize=1g, the incremental dump happens
each time another 1GB of heap has been scanned. This keeps the number of
incremental dumps low, which matters because each incremental dump takes time and may
make jmap -histo slower.
PS: I think we should support "g", "m" and "k" suffixes instead of
using "KB". Do you agree?
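To make the suffix idea and the chunk arithmetic concrete, here is a hypothetical parser (SizeOption is an illustrative name, not jmap code). It assumes binary units (1k = 1024 bytes) and treats a bare number as bytes; both choices would need to be spelled out in the help text.

```java
import java.util.Locale;

// Hypothetical parser for a size option such as chunksize=1g.
final class SizeOption {
    /** Parses "<N>[k|m|g]" (case-insensitive) into bytes; a bare number is taken as bytes. */
    static long parseSize(String s) {
        String v = (s == null) ? "" : s.trim().toLowerCase(Locale.ROOT);
        if (v.isEmpty()) {
            throw new IllegalArgumentException("empty size");
        }
        long multiplier;
        switch (v.charAt(v.length() - 1)) {
            case 'k': multiplier = 1L << 10; break;
            case 'm': multiplier = 1L << 20; break;
            case 'g': multiplier = 1L << 30; break;
            default:  multiplier = 1L;       break;
        }
        String digits = (multiplier == 1L) ? v : v.substring(0, v.length() - 1);
        long n = Long.parseLong(digits); // throws NumberFormatException on bad input
        if (n <= 0) {
            throw new IllegalArgumentException("size must be positive: " + s);
        }
        return Math.multiplyExact(n, multiplier); // throws ArithmeticException on overflow
    }

    /** Upper bound on the number of incremental dumps when the whole heap is scanned. */
    static long maxChunks(long heapBytes, long chunkBytes) {
        if (chunkBytes <= 0) {
            throw new IllegalArgumentException("chunk size must be positive");
        }
        return (heapBytes + chunkBytes - 1) / chunkBytes; // ceiling division
    }
}
```

Under these assumptions, a 180g heap scanned with chunksize=1g gives at most 180 incremental dumps, and a 1m heap with chunksize=1k gives at most 1024.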
> > - maxfilesize=<N>, size of the incremental data dump file (in KB), when data size
> > is larger than maxfilesize, the file is erased and latest data will be written.
> Q3: What is a relation and limitations between chunksize and maxfilesize?
> Should the maxfilesize be multiple of the chunksize?
> The question Q3 above was not answered.
> But never mind. Please, see the suggestion below.
If the chunksize is large and there are many objects
of different classes in the heap, the actual file size can be
larger than the maxfilesize.
But I believe this rarely happens, because
one class's histo info only takes one short line in the final
result. Also, given the implementation of jmap,
it can find all loaded classes before doing the heap
iteration, so the results only differ in
object quantity.
> Q4: The sentence "the file is erased and latest data will be written"
> is not clear enough.
> Why does the whole file need to be erased?
> Should the incremental file behave like a cyclic buffer?
> If so, then only the next chunk needs to be erased.
> Then the chunks need to be numbered in order, so the earliest one can be found.
The "maxfilesize" option keeps the file from growing too large, so when the dumped data is larger than "maxfilesize", the file is erased and the latest data is written. The reason I erase the whole file is that chunk data is cumulative, so the latest data includes the previous statistics. This also makes the file easier to read.
I agree that we can add sequence numbers to the chunks; I think it helps the user understand how objects are distributed in the heap.
I think it may be reasonable to have the incremental file behave like the GC log: when maxfilesize is reached, the file is renamed with a numbered suffix and a new file is created, so there can be IncrementalHisto.dump.0, IncrementalHisto.dump.1, etc. for a large heap.
What do you think?
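A rough sketch of the rotation idea described above. RotatingIncrementalFile is a hypothetical class, and the policy (rotate before a write that would exceed maxfilesize, with .0 holding the earliest data) is an assumption for discussion; it does not reproduce the exact GC log naming scheme.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

// Hypothetical writer; names and rotation policy are illustrative assumptions.
final class RotatingIncrementalFile {
    private final Path base;      // e.g. .../IncrementalHisto.dump
    private final long maxBytes;  // the proposed maxfilesize, in bytes
    private int nextSuffix = 0;   // numbered suffix for the next rotated file

    RotatingIncrementalFile(Path base, long maxBytes) {
        this.base = base;
        this.maxBytes = maxBytes;
    }

    /** Appends one chunk, rotating the current file first if it would exceed maxBytes. */
    void append(String chunk) {
        byte[] data = chunk.getBytes(StandardCharsets.UTF_8);
        try {
            if (Files.exists(base) && Files.size(base) + data.length > maxBytes) {
                // Keep the full file under a numbered name instead of erasing it.
                Path rotated = base.resolveSibling(base.getFileName() + "." + nextSuffix++);
                Files.move(base, rotated, StandardCopyOption.REPLACE_EXISTING);
            }
            Files.write(base, data, StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```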
> I think it is not a bad idea.
> In general, the new incremental feature design does not
> look simple and clear enough.
> It feels like another step of simplification is needed.
> What about getting rid of the maxfilesize option?
> Then each chunk can be recorded to a separate file
> IncrementalHisto.dump.<chunk_number>.
> A couple of questions to clarify:
> - Do we want all chunks or just the latest chunk
> to be saved?
I think it is usually
not necessary to save all chunks. The chunks are cumulative, so
the newest one contains all the info that the older ones have.
But having old chunks may help the user know how
objects are distributed: because one chunk covers a fixed
proportion of the heap, the difference between a chunk and
its predecessor shows the object distribution of the
newly scanned portion of the heap.
The question is: do you think this info is
necessary? If not, I agree we can get rid of the maxfilesize.
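Since each chunk is cumulative, the per-class numbers for the newly scanned portion are just the difference between a chunk and its predecessor. A sketch, assuming a chunk is represented as a map from class name to instance count (ChunkDiff is a hypothetical name; the same subtraction would apply to byte totals):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: per-class delta between two cumulative chunks.
final class ChunkDiff {
    /** Returns counts for the portion of the heap scanned between `previous` and `current`. */
    static Map<String, Long> delta(Map<String, Long> previous, Map<String, Long> current) {
        Map<String, Long> result = new HashMap<>();
        for (Map.Entry<String, Long> e : current.entrySet()) {
            long before = previous.getOrDefault(e.getKey(), 0L);
            long diff = e.getValue() - before;
            if (diff != 0) {
                // Only classes that gained instances in the new portion are reported.
                result.put(e.getKey(), diff);
            }
        }
        return result;
    }
}
```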
> - If we save all chunks, then what
> is the point of having the full dump recorded as well?
IMO, the incremental histo solves two problems:
<1> jmap -histo may get stuck if the heap is large, so it is
useful if we can get an intermediate result.
<2> Incremental info may help the user learn the object distribution of
some portion of the heap.
And I agree that if the full dump is successfully
obtained, the chunks become less useful.
> The advantage of this approach is that there is no
> need to describe:
> - the relationship between chunksize and
> maxfilesize
> - the recording behavior for multiple chunks in the
> incremental file
> - what chunks have been recorded into the
> incremental file
I agree that maxfilesize may not be useful, because the
histo data of the chunks is usually not large. And it sounds
good to me to save chunk data in separate files named
IncrementalHisto.dump.<chunk_number>.
So the question is how many chunks do you think we need to
save? I think the latest chunk is a must, and maybe the
previous 3-5 ones?
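If chunks go to separate files, keeping only the latest few is a small sliding window over chunk numbers. A sketch, assuming files named <baseName>.<chunk_number> in one folder and a configurable window size (ChunkFileWriter and keep are hypothetical names):

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch: one file per chunk, keeping only the newest `keep` files.
final class ChunkFileWriter {
    private final Path dir;
    private final String baseName;  // e.g. "IncrementalHisto.dump"
    private final int keep;         // latest chunk plus a few predecessors
    private int next = 0;           // chunk number of the next write

    ChunkFileWriter(Path dir, String baseName, int keep) {
        if (keep < 1) {
            throw new IllegalArgumentException("keep must be >= 1");
        }
        this.dir = dir;
        this.baseName = baseName;
        this.keep = keep;
    }

    Path chunkPath(int n) {
        return dir.resolve(baseName + "." + n);
    }

    /** Writes the next chunk file and deletes the one that fell out of the window. */
    Path write(String chunk) {
        Path target = chunkPath(next);
        try {
            Files.write(target, chunk.getBytes(StandardCharsets.UTF_8));
            int expired = next - keep;
            if (expired >= 0) {
                Files.deleteIfExists(chunkPath(expired));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        next++;
        return target;
    }
}
```

With keep=3, after five chunks only IncrementalHisto.dump.2, .3 and .4 remain.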
> But again, this still needs to be clearly
> specified.
> It would be nice to reach a consensus on a design
> first.
Totally agree :)
Again, thanks for your comments.
BRs,
Lin
Thanks,
Lin
________________________________________
From: serguei.spit...@oracle.com <serguei.spit...@oracle.com>
Sent: Saturday, May 11, 2019 2:17:41 AM
To: 臧琳; Hohensee, Paul; JC Beyler
Cc: serviceability-dev@openjdk.java.net
Subject: Re: [RFR]8215623: Add incremental dump for jmap histo
Dear Lin,
Sorry for the late reply.
I've edited the CSR a little bit to fix some incorrect spots.
Now, a couple of spots are not clear to me.
> - incremental[:<file_name>], enable the incremental dump of heap; dumped
> data will be saved to <file_name>, by default "IncrementalHisto.dump"
Q1: Should the <file_name> be full path or short name?
Is there any default path? What is the path of the
"IncrementalHisto.dump" file?
> - chunksize=<N>, size of objects (in KB) will be dumped in one chunk.
Q2: Should it be chunk of dump, not chunk of objects?
> - maxfilesize=<N>, size of the incremental data dump file (in KB), when data size
> is larger than maxfilesize, the file is erased and latest data will be written.
Q3: What is a relation and limitations between chunksize and maxfilesize?
Should the maxfilesize be multiple of the chunksize?
Q4: The sentence "the file is erased and latest data will be written"
is not clear enough.
Why does the whole file need to be erased?
Should the incremental file behave like a cyclic buffer?
If so, then only next chunk needs to be erased.
Then the chunks need to be numbered in order, so the earliest one
can be found.
(I do not want you to accept my suggestions right away. It is just
a discussion point.
You need to prove that your approach is good and clean enough.)
If we resolve the questions (or get into agreement) then I'll update the
CSR as needed.
Thanks,
Serguei
On 5/5/19 00:34, 臧琳 wrote:
Dear All,
I have updated the CSR at https://bugs.openjdk.java.net/browse/JDK-8222319
May I ask your help to review it?
When it is finalized, I will refine the webrev.
BRs,
Lin
Dear Serguei,
Thanks a lot for your reviewing.
System.err.println(" incremental dump support:");
+ System.err.println(" chunkcount=<N> object number counted (in Kilo) to trigger incremental dump");
+ System.err.println(" maxfilesize=<N> size limit of incremental dump file (in KB)");
From this description it is not clear at all what chunkcount means.
Is it to define how many heap objects are dumped in one chunk?
If so, would it be better to name it chunksize instead, where chunksize is measured in heap objects?
Then would it be better to use the same units to define maxfilesize as well?
(I'm not insisting on this, just asking.)
The original meaning of "chunkcount" is how many objects are dumped in one chunk, and "maxfilesize" is the size limit of the dump file.
For example, "chunkcount=1, maxfilesize=10" means that intermediate data will be written to the dump file for every 1000 objects, and
when the dump file is larger than 10 KB, the file is erased and rewritten with the latest dumped data.
The reason I didn't use object count to control the dump file size is that there can be humongous objects, which may make the file too large.
Do you think using object size instead of chunkcount is a good option? Then the two options would use the same units.
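For comparison with chunkcount, a size-based trigger could look like the sketch below. ChunkTrigger is a hypothetical class, and it assumes the heap walker reports each visited object's size in bytes.

```java
// Hypothetical sketch of a size-based trigger for incremental dumps.
final class ChunkTrigger {
    private final long chunkBytes;     // chunksize, in bytes of scanned heap
    private long scannedSinceDump = 0; // bytes visited since the last dump

    ChunkTrigger(long chunkBytes) {
        if (chunkBytes <= 0) {
            throw new IllegalArgumentException("chunkBytes must be positive");
        }
        this.chunkBytes = chunkBytes;
    }

    /** Called once per visited object; returns true when an incremental dump is due. */
    boolean onObject(long objectBytes) {
        scannedSinceDump += objectBytes;
        if (scannedSinceDump >= chunkBytes) {
            scannedSinceDump = 0; // start counting the next chunk
            return true;
        }
        return false;
    }
}
```

Resetting the counter to zero (rather than subtracting chunkBytes) means a single humongous object triggers one dump instead of a burst of back-to-back dumps, whereas with an object count it would count the same as any small object.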
BRs,
Lin