Re: RFR: 8252842: Extend jmap to support parallel heap dump [v10]

Lin Zang Thu, 18 Mar 2021 07:44:32 -0700

On Thu, 18 Mar 2021 12:20:25 GMT, Ralf Schmelter <rschmel...@openjdk.org> wrote:


>> Dear Chris and Ralf,
>> May I ask your help to review the related CSR first? both JDK-8260424  and 
>> JDK-8261943? 
>> https://bugs.openjdk.java.net/browse/JDK-8260424 is for adding 
>> `parallel=<N>` option, which IMO also block the merge of #2379. 
>> https://bugs.openjdk.java.net/browse/JDK-8261943 is for adding new attach 
>> api comment `dumpheapext`
>> 
>> Thanks
>> Lin
>
> Hi Lin,
> 
> to be honest I would prefer if we would not add the parallel option but 
> instead do it automatically depending on the number of GC threads available. 
> I doubt that the user has a good idea about number of threads to use, so I 
> think it is better to just use a sensible default derived from the number of 
> threads available and if compression is enabled or not.

Dear Ralf and Chris,

> to be honest I would prefer if we would not add the parallel option but 
> instead do it automatically depending on the number of GC threads available. 
> I doubt that the user has a good idea about number of threads to use, so I 
> think it is better to just use a sensible default derived from the number of 
> threads available and if compression is enabled or not.

Thanks Ralf for your comments!

@plummercj and I have discussed the `parallel=` option several rounds, also 
together with Serguei @sspitsyn  and Ham-Lin @Hamlin-Li . And I recently also 
re-considered this option for dumping. here I list my opinion and hope we could 
made the final decision this time :-)

- The `parallel=<N>` option for `jmap -histo` has been discussed and finalized, 
I think we could leave it as it is. Because user may get general info about 
their system, such as core number and multi-process workload, these could help 
them tuning.  And also because the `parallel=<N>` option only affects how many 
threads would be used for object iteration (this is different with `jmap 
-dump`, I will explain in following comments), it is easy to observe the output 
or time and then adjust the number.

- The `parallel=<N>` option for `jmap -dump` is complicated, it has to consider 
the file writer threads and the disk writing speed. let me try to explain the 
implementation briefly:

In current implementation, the heap dump actaully has two kind of worker 
threads - dumper and writter. The dumper iterates over the heap and send data 
to the writer; the writer does compression if required and writes data to the 
file. There is buffer queue between dumper and writer: dumper grab empty buffer 
and fill data into the queue , and the writer threads grab the filled buffer 
from the queue, write data to file and then return empty buffer for futurn use. 
In summary it looks like following graph:

` Dumper  --> BufferQueue (size == #writers) --> writer(multiple) ==> 
disk(file)`

Before this patch, there is only one dumper (the VMThread) and all other worker 
threads are writer, that is, all GC threads work as writer. The only dumper 
iterate heap and the multiple writers write data to file. So the bottomneck is 
more likely to be heap iteration (dumper side).

With this patch,  there are multiple dumpers and multiple writers,  they all 
divided the gc thread number (plus 1 vmthread). And since there are multiple 
dumpers, the buffer queue is more likely to be filled.  So the bottomneck move 
from heap iteration to two aspect (writer side):
- drain of buffer queue 
- speed of disk writing

Since the buffer queue size is determined by number of writers, and the disk 
writing speed determines how fast the buffer can be empty again, it is 
important to keep the writer number as much as it can be to satisfy the 
dumper's requirement.

The `parallel=<N>` option only control the dumper thread number, and it 
indirectly affects the writer's thread number (as they all divide the total 
thread number). it is really hard for user to understand these internal 
mechanism and do tuning.  

And by default, the number of dumpers and writers are half : half of GC threads 
+1(vmThread), my testing shown this configuration is likly to be optimal. ( But 
maybe more evalution is required to confirm)

In summary, I think it is very hard for user to tune the parallel thread number 
for heap dump. So I suggest to remove it for dump and enable the parallel by 
default. 

Again, For `jmap -histo`, I suggest to keep this since it is more easy to be 
understood.

IMO, to make final decision, we need to consider following:
- Is it acceptable that `jmap -dump` and `jmap -histo` have different behavior 
about the option of `parallel`, that is, `heapdump` does not use it and 
`heaphisto` does.
- If we decide to not introduce the `parallel` option, we don't need to 
introduce the `heapdumpext` attach api command, so the 2 CSRs related to this 
PR could be closed. But this may leave the argument count problem to future, 
when more argumebts is introduced, is this acceptable? 
- if we decide to introduce `parallel=<N>` option, do we need to add 
explaination in the document? or docs like tuning guide? 

Welcome for any of your comments!

BRs,
Lin

-------------

PR: https://git.openjdk.java.net/jdk/pull/2261

Re: RFR: 8252842: Extend jmap to support parallel heap dump [v10]

Reply via email to