On Fri, 19 Mar 2021 20:01:01 GMT, Chris Plummer <cjplum...@openjdk.org> wrote:

>> Dear Ralf and Chris,
>> 
>>> to be honest I would prefer if we would not add the parallel option but 
>>> instead do it automatically depending on the number of GC threads 
>>> available. I doubt that the user has a good idea about number of threads to 
>>> use, so I think it is better to just use a sensible default derived from 
>>> the number of threads available and if compression is enabled or not.
>> 
>> Thanks Ralf for your comments!
>> 
>> @plummercj and I have discussed the `parallel=` option several rounds, also 
>> together with Serguei @sspitsyn  and Ham-Lin @Hamlin-Li . And I recently 
>> also re-considered this option for dumping. here I list my opinion and hope 
>> we could made the final decision this time :-)
>> 
>> - The `parallel=<N>` option for `jmap -histo` has been discussed and 
>> finalized, I think we could leave it as it is. Because user may get general 
>> info about their system, such as core number and multi-process workload, 
>> these could help them tuning.  And also because the `parallel=<N>` option 
>> only affects how many threads would be used for object iteration (this is 
>> different with `jmap -dump`, I will explain in following comments), it is 
>> easy to observe the output or time and then adjust the number.
>> 
>> - The `parallel=<N>` option for `jmap -dump` is complicated, it has to 
>> consider the file writer threads and the disk writing speed. let me try to 
>> explain the implementation briefly:
>> 
>> In current implementation, the heap dump actaully has two kind of worker 
>> threads - dumper and writter. The dumper iterates over the heap and send 
>> data to the writer; the writer does compression if required and writes data 
>> to the file. There is buffer queue between dumper and writer: dumper grab 
>> empty buffer and fill data into the queue , and the writer threads grab the 
>> filled buffer from the queue, write data to file and then return empty 
>> buffer for futurn use. 
>> In summary it looks like following graph:
>> 
>> ` Dumper  --> BufferQueue (size == #writers) --> writer(multiple) ==> 
>> disk(file)`
>> 
>> Before this patch, there is only one dumper (the VMThread) and all other 
>> worker threads are writer, that is, all GC threads work as writer. The only 
>> dumper iterate heap and the multiple writers write data to file. So the 
>> bottomneck is more likely to be heap iteration (dumper side).
>> 
>> With this patch,  there are multiple dumpers and multiple writers,  they all 
>> divided the gc thread number (plus 1 vmthread). And since there are multiple 
>> dumpers, the buffer queue is more likely to be filled.  So the bottomneck 
>> move from heap iteration to two aspect (writer side):
>> - drain of buffer queue 
>> - speed of disk writing
>> 
>> Since the buffer queue size is determined by number of writers, and the disk 
>> writing speed determines how fast the buffer can be empty again, it is 
>> important to keep the writer number as much as it can be to satisfy the 
>> dumper's requirement.
>> 
>> The `parallel=<N>` option only control the dumper thread number, and it 
>> indirectly affects the writer's thread number (as they all divide the total 
>> thread number). it is really hard for user to understand these internal 
>> mechanism and do tuning.  
>> 
>> And by default, the number of dumpers and writers are half : half of GC 
>> threads +1(vmThread), my testing shown this configuration is likly to be 
>> optimal. ( But maybe more evalution is required to confirm)
>> 
>> In summary, I think it is very hard for user to tune the parallel thread 
>> number for heap dump. So I suggest to remove it for dump and enable the 
>> parallel by default. 
>> 
>> Again, For `jmap -histo`, I suggest to keep this since it is more easy to be 
>> understood.
>> 
>> IMO, to make final decision, we need to consider following:
>> - Is it acceptable that `jmap -dump` and `jmap -histo` have different 
>> behavior about the option of `parallel`, that is, `heapdump` does not use it 
>> and `heaphisto` does.
>> - If we decide to not introduce the `parallel` option, we don't need to 
>> introduce the `heapdumpext` attach api command, so the 2 CSRs related to 
>> this PR could be closed. But this may leave the argument count problem to 
>> future, when more argumebts is introduced, is this acceptable? 
>> - if we decide to introduce `parallel=<N>` option, do we need to add 
>> explaination in the document? or docs like tuning guide? 
>> 
>> Welcome for any of your comments!
>> 
>> BRs,
>> Lin
>
>> * Is it acceptable that `jmap -dump` and `jmap -histo` have different 
>> behavior about the option of `parallel`, that is, `heapdump` does not use it 
>> and `heaphisto` does.
> 
> And just to further clarify, the parallel support for `jmap -histo` is 
> already in JDK 16, which is one reason we were swayed towards including it 
> with `jmap -dump`. If we didn't we were thinking in order to be consistent we 
> would need to pull the parallel support for `jmap -histo`, which would be 
> somewhat problematic since it will already have shipped in 16. However, given 
> that these are really two different commands, and we have some good arguments 
> for not including parallel support for `jmap -dump`, I can see leaving it out.
> 
>> * If we decide to not introduce the `parallel` option, we don't need to 
>> introduce the `heapdumpext` attach api command, so the 2 CSRs related to 
>> this PR could be closed. But this may leave the argument count problem to 
>> future, when more argumebts is introduced, is this acceptable?
> 
> Yes, and in fact I would argue that we should not be doing the `heapdumpext` 
> solution now if we no longer have a need for it.
> 
>> * if we decide to introduce `parallel=<N>` option, do we need to add 
>> explaination in the document? or docs like tuning guide?
> 
> Which document were you thinking? Is the turning guide a hotspot tuning 
> guide? If so, I don't think that's the right place for it. The focus should 
> be in any jmap documentation.
> 
> And to add a 4th bullet item to your list, we also have the `GC.heap_dump` 
> jcmd. I think there as a CR and CSR to add parallel support to it (but 
> honestly I've lost track). If we choose not to include parallel support for 
> `jmap -dump`, I think we should exclude it from `GC.heap_dump` also.

Hi Chris, 
Thanks for your comments!

So I think we agree on keep `parallel=N` option for hitogram to keep it as 
JDK16 and not use it for heapdump. I will close related CSR and revert related 
codes.

> Yes, and in fact I would argue that we should not be doing the heapdumpext 
> solution now if we no longer have a need for it.

Agree, at least I don't have any idea for adding new features that require new 
options

> Which document were you thinking? Is the turning guide a hotspot tuning 
> guide? If so, I don't think that's the right place for it. The focus should 
> be in any jmap documentation.

I was thinking of the tuning guide. but it seems even there is explaination of 
the parallel implementation of heap dump, it is hard to do tuning...

> And to add a 4th bullet item to your list, we also have the `GC.heap_dump` 
> jcmd. I think there as a CR and CSR to add parallel support to it (but 
> honestly I've lost track). If we choose not to include parallel support for 
> `jmap -dump`, I think we should exclude it from `GC.heap_dump` also.

There are CSRs created for the parallel option for GC.class_histogram, the one 
for GC.heap_dump is not created yet (or I cannot find one)

Thanks,
Lin

-------------

PR: https://git.openjdk.java.net/jdk/pull/2261

Reply via email to