Re: [RFR]8215623: Add incremental dump for jmap histo

臧琳 Fri, 17 May 2019 03:20:27 -0700

Dear Serguei,

On May 16, 2019, at 1:39 PM, [email protected] wrote:

On 5/13/19 23:46, 臧琳 wrote:

Dear Serguei,
     Thanks for your comments.

 > > - incremental[:<file_name>], enable the incremental dump of heap, dumped
 > >   data will be saved to, by default it is "IncrementalHisto.dump"
 >
 >  Q1: Should the <file_name> be full path or short name?
 >      Is there any default path? What is the path of the
 > "IncrementalHisto.dump" file?

The original design has no <file_name> option, so the file name is hardcoded
as "IncrementalHisto.dump" and the file is saved to the same path that "file="
specifies. If "file=" is not set, the data is printed to whatever the output
stream is.

> The file option is described as: file=<file> dump data to <file>
> It does not tell anything about the path.

Yes, do you agree that we add a comment in the help info like:
file=<file> dump data to <file>, file can be specified with full path.

With the new design, I suggest parsing <file_name> first: if the value contains
a folder path, use the specified path; if not, use the same path as the "file="
value; and if "file=" is not set, use the output stream. (I prefer the same
path as "file=" because I assume users want to keep all data files under the
same folder.)
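
The resolution order proposed above can be sketched in Java as follows. This is
only an illustration, not the patch itself: the class IncrementalPathSketch and
its resolve method are hypothetical names, and returning an empty Optional to
mean "write to the output stream" is an assumption.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

// Hypothetical helper, not part of jmap: decides where incremental data goes.
final class IncrementalPathSketch {
    static final String DEFAULT_NAME = "IncrementalHisto.dump";

    // fileOpt: value of "file=", or null if not set.
    // incName: value of <file_name>, or null to use the default name.
    // An empty result means "use the output stream" (assumption).
    static Optional<Path> resolve(String fileOpt, String incName) {
        String name = (incName == null || incName.isEmpty()) ? DEFAULT_NAME : incName;
        Path inc = Paths.get(name);
        if (inc.getParent() != null) {
            // <file_name> carries its own folder path: use it as given.
            return Optional.of(inc);
        }
        if (fileOpt == null || fileOpt.isEmpty()) {
            // No "file=" value to borrow a folder from.
            return Optional.empty();
        }
        // Otherwise place the incremental file next to the "file=" target.
        Path dir = Paths.get(fileOpt).getParent();
        return Optional.of(dir == null ? inc : dir.resolve(inc));
    }
}
```

For example, resolve("/a/b/c/histo.dump", null) yields
/a/b/c/IncrementalHisto.dump, so the short form stays usable without typing
the folder twice.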

> It needs to be clearly specified.
> What statements do you suggest?
>
> One idea of simplification is to get rid of the default <file_name>
> and to require it to be always specified (non-optional).
>
> Then we could replace this:

>   file=<file> dump data to <file>
>   incremental dump support:
>     incremental[:<file_name>]  enable incremental dump, data will be dumped
>                                to <file_name> (default is
>                                "IncrementalHisto.dump")

>
> with this:

>   file=<file> dump data to <file>
>   incremental=<inc_file>  dump incremental data to <inc_file>

I think having a default IncrementalHisto.dump file saved at the same path as
<file> makes incremental easy to use.
IMHO, when users run jmap -histo with "file=<file>" and want to enable
incremental histo, the easiest way is to just use the "-incremental" flag, and
all data files will be saved under the same folder as <file>. They don't have
to think about a specific filename for the incremental data. This is the
reason I set the default value to IncrementalHisto.dump.

But I also want users to have the freedom to use a different filename and path
for incremental results, so I made the incremental file_name optional.

If we make it non-optional, does that mean users may need a command like the
following:
       jmap -histo,file=<absolute_path/a/b/c/histo.dump>,incremental:<absolute_path/a/b/c/incrementalHisto.dump> pid
It seems a little bit complicated to me. What do you think?

 > > - chunksize=<N>, size of objects (in KB) will be dumped in one chunk.
 >
 > Q2: Should it be chunk of dump, not chunk of objects?

The purpose of "chunksize" is to decide how much object info is dumped at
once. For example, with "chunksize=1" and "-Xmx1m" there will be at most
1MB/1KB = 1024 chunks (about 1000), which means up to about 1000 file writes
during "jmap -histo".

> I hardly understand the point to know max of objects that can be dumped at 
> once.
> It is more important to know how much memory in the file it is going to take.
> How much of dump memory will take one object?
> Does it vary (does it depend on object types)?

Yes, the dump memory for one object varies with its size and type.
The "chunksize" option lets the user control the portion of the heap that the
incremental dump processes at a time. IMO the usage scenario is as follows:

   when the JVM has a 180GB max heap size and jmap histo is used with
chunksize=1g, an incremental dump happens each time 1GB of heap has been
scanned, so there are not too many incremental dumps. This matters because
each incremental dump takes time and may make jmap -histo slower.

PS: I think we should support the suffixes "g", "m" and "k" instead of using
"KB". Do you agree?
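
To make the suffix proposal and the chunk arithmetic concrete, here is a
minimal Java sketch. It is not the jmap implementation: SizeOptionSketch,
parseSize and chunkCount are hypothetical names, and treating a bare number as
bytes is an assumption (the CSR draft currently uses KB).

```java
// Hypothetical helper, not part of jmap: parses sizes such as "1g" or "512k".
final class SizeOptionSketch {
    // Returns the size in bytes. A value without a suffix is taken as bytes
    // (assumption; the current draft of the option uses KB instead).
    static long parseSize(String value) {
        if (value == null || value.isEmpty()) {
            throw new IllegalArgumentException("empty size");
        }
        char last = Character.toLowerCase(value.charAt(value.length() - 1));
        long multiplier;
        switch (last) {
            case 'k': multiplier = 1L << 10; break;
            case 'm': multiplier = 1L << 20; break;
            case 'g': multiplier = 1L << 30; break;
            default:  multiplier = 1L;
        }
        String digits = (multiplier == 1L) ? value : value.substring(0, value.length() - 1);
        long n = Long.parseLong(digits);  // throws NumberFormatException on bad input
        if (n <= 0) {
            throw new IllegalArgumentException("size must be positive: " + value);
        }
        return Math.multiplyExact(n, multiplier);
    }

    // Upper bound on the number of incremental dumps for a given max heap size.
    static long chunkCount(long heapBytes, long chunkBytes) {
        return (heapBytes + chunkBytes - 1) / chunkBytes;  // round up
    }
}
```

With these definitions, chunkCount(parseSize("180g"), parseSize("1g")) is 180,
matching the 180GB scenario above, and a 1MB heap with 1KB chunks gives 1024.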

 > > - maxfilesize=<N>, size of the incremental data dump file (in KB), when 
 > > data size
 > >   is larger than maxfilesize, the file is erased and latest data will be 
 > > written.

 > Q3: What is a relation and limitations between chunksize and maxfilesize?
 >    Should the maxfilesize be multiple of the chunksize?

> The question Q3 above was not answered.
> But never mind. Please, see the suggestion below.

If the chunksize is large and the heap holds objects of many different
classes, the actual file size can exceed maxfilesize.
But I believe this rarely happens, because one class's histo info takes only
several bytes in the final result. Also, in the jmap implementation all loaded
classes can be found before the heap iteration starts, so the results differ
only in object quantity.

 > Q4: The sentence "the file is erased and latest data will be written"
 >    is not clear enough.
 >    Why the whole file needs to be erased
 >    Should the incremental file behave like a cyclic buffer?
 >    If so, then only next chunk needs to be erased.
 >    Then the chunks need to be numbered in order, so the earliest one can be 
 > found.

The "maxfilesize" option keeps the file from growing too large: when the
dumped data is larger than "maxfilesize", the file is erased and the latest
data is written. The reason I erase the whole file is that chunk data is
accumulative, so the latest data includes the earlier statistics. This may
also make the file easier to read.

I agree that we can add ordered numbers to the chunks. I think it would more
or less help users understand how objects are distributed in the heap.

I think maybe it is reasonable to have the incremental file behave like gclog:
when maxfilesize is reached, the file is renamed with a numbered suffix and a
new file is created for further writes. So there can be
IncrementalHisto.dump.0, IncrementalHisto.dump.1, etc. for a large heap.

What do you think?

> I think, it is not a bad idea.
> In general, new incremental feature design does not look simple and clear 
> enough.
> It feels like another step of simplification is needed.

> What about to get rid of the maxfilesize option?
> Then each chunk can be recorded to a separate file 
> IncrementalHisto.dump.<chunk_number>.
> A couple of questions to clarify:
      > - Do want all chunks or just the latest chunk to be saved?

I think usually it is not required to save all chunks. The chunk is
incremental, so the new one contains all the info that the old ones have.
        But having old chunks may help users know how objects are distributed,
because one chunk is a fixed portion of the heap, so the difference between a
chunk and its predecessor shows the object distribution of the newly scanned
portion of the heap.
        The question is whether you think this info is necessary. If not, I
agree we can get rid of maxfilesize.
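
Because each chunk is cumulative, the contribution of the newly scanned
portion can be recovered by subtracting the previous chunk's counts. A minimal
Java sketch, assuming each chunk file has already been parsed into a map from
class name to instance count (ChunkDiffSketch and delta are hypothetical
names, and the parsing step is not shown):

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of jmap: compares two cumulative chunks.
final class ChunkDiffSketch {
    // Returns, per class, how many instances were added between the previous
    // cumulative chunk and the current one. Classes with no change are omitted.
    static Map<String, Long> delta(Map<String, Long> previous, Map<String, Long> current) {
        Map<String, Long> result = new TreeMap<>();
        for (Map.Entry<String, Long> e : current.entrySet()) {
            long before = previous.getOrDefault(e.getKey(), 0L);
            long diff = e.getValue() - before;
            if (diff != 0) {
                result.put(e.getKey(), diff);
            }
        }
        return result;
    }
}
```

This only works if at least the immediate predecessor of the latest chunk is
kept, which is one argument for retaining more than a single chunk.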

      > - If we save all chunks then what is the point to have the full dump
      >   recorded as well?

         IMO, the incremental histo solves two problems: <1> jmap histo may
get stuck if the heap is large, so it is useful to be able to get an
intermediate result. <2> Incremental info may help users know the object
distribution of some portion of the heap.
         And I agree that if the full dump is obtained successfully, the
chunks become less useful.

> The advantages of this approach is that there is no need to describe:
      > - relationship between chunksize and maxfilesize
      > - recording behavior for multiple chunks in the incremental file
      > - what chunks have been recorded into the incremental

I agree that maxfilesize may not be useful because the histo data of chunks is
usually not large. And it sounds good to me to save chunk data in separate
files named IncrementalHisto.dump.<chunk_number>.
So the question is how many chunks you think we need to save. I think the
latest chunk is a must, and maybe the previous 3-5 ones?
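
One way to picture the "latest chunk plus a few predecessors" idea is a simple
retention window over the numbered files. The Java sketch below only computes
file names and does no file I/O; ChunkRetentionSketch, chunkFileName and
retained are hypothetical names, and 0-based chunk numbering is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of jmap: names of the chunk files to keep.
final class ChunkRetentionSketch {
    static String chunkFileName(String base, int chunkNumber) {
        return base + "." + chunkNumber;
    }

    // After chunk number `latest` (0-based, assumption) has been written,
    // keep at most `keep` files: the latest one and its predecessors.
    static List<String> retained(String base, int latest, int keep) {
        if (latest < 0 || keep < 1) {
            throw new IllegalArgumentException("latest must be >= 0 and keep >= 1");
        }
        List<String> names = new ArrayList<>();
        int first = Math.max(0, latest - keep + 1);
        for (int i = first; i <= latest; i++) {
            names.add(chunkFileName(base, i));
        }
        return names;
    }
}
```

With keep=4, after chunk 5 is written the files IncrementalHisto.dump.2
through IncrementalHisto.dump.5 remain and older ones could be deleted.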

> But again, this still needs to be clearly specified.
> It would be nice to reach a consensus on a design first.

Totally agree :)

> Thanks,
> Serguei

Again, thanks for your comments.

BRs,
Lin

Thanks,
Lin
________________________________________
From: [email protected]
Sent: Saturday, May 11, 2019 2:17:41 AM
To: 臧琳; Hohensee, Paul; JC Beyler
Cc: [email protected]
Subject: Re: [RFR]8215623: Add incremental dump for jmap histo

Dear Lin,

Sorry for the late reply.
I've edited the CSR a little bit to fix some incorrect spots.
Now, a couple of spots are not clear to me.

 > - incremental[:<file_name>], enable the incremental dump of heap, dumped
 >   data will be saved to, by default it is "IncrementalHisto.dump"

  Q1: Should the <file_name> be full path or short name?
      Is there any default path? What is the path of the
"IncrementalHisto.dump" file?

 > - chunksize=<N>, size of objects (in KB) will be dumped in one chunk.

  Q2: Should it be chunk of dump, not chunk of objects?

 > - maxfilesize=<N>, size of the incremental data dump file (in KB), when
 >   data size is larger than maxfilesize, the file is erased and latest
 >   data will be written.

  Q3: What is a relation and limitations between chunksize and maxfilesize?
      Should the maxfilesize be multiple of the chunksize?

  Q4: The sentence "the file is erased and latest data will be written"
      is not clear enough.
      Why the whole file needs to be erased
      Should the incremental file behave like a cyclic buffer?
      If so, then only next chunk needs to be erased.
      Then the chunks need to be numbered in order, so the earliest one
      can be found.
      (I do not want you to accept my suggestions right away. It is just
       a discussion point.
       You need to prove that your approach is good and clean enough.)

If we resolve the questions (or get into agreement) then I'll update the
CSR as needed.

Thanks,
Serguei

On 5/5/19 00:34, 臧琳 wrote:

Dear All,
      I have updated the CSR at https://bugs.openjdk.java.net/browse/JDK-8222319
      May I ask your help to review it?
      When it is finalized, I will refine the webrev.

BRs,
Lin

Dear Serguei,
          Thanks a lot for your review.

   System.err.println("      incremental dump support:");
+        System.err.println("        chunkcount=<N>    object number counted (in Kilo) to trigger incremental dump");
+        System.err.println("        maxfilesize=<N>   size limit of incremental dump file (in KB)");

 From this description it is not clear at all what chunkcount means.
Is it meant to define how many heap objects are dumped in one chunk?
If so, would it be better to name it chunksize instead, where chunksize is
measured in heap objects?
Then would it be better to use the same units to define maxfilesize as well?
(I'm not insisting on this, just asking.)

The original meaning of "chunkcount" is how many objects are dumped in one
chunk, and "maxfilesize" is the size limit of the dump file.
For example, "chunkcount=1, maxfilesize=10" means that intermediate data will
be written to the dump file for every 1000 objects, and when the dump file
grows larger than 10KB, the file is erased and rewritten with the latest
dumped data.

The reason I didn't use object count to control the dump file size is that
there can be humongous objects, which may make the file too large.
Do you think using object size instead of chunkcount is a good option? Then
the two options could use the same units.
BRs,
Lin
