On Fri, 19 Mar 2021 20:01:01 GMT, Chris Plummer <cjplum...@openjdk.org> wrote:
>> Dear Ralf and Chris, >> >>> to be honest I would prefer if we would not add the parallel option but >>> instead do it automatically depending on the number of GC threads >>> available. I doubt that the user has a good idea about number of threads to >>> use, so I think it is better to just use a sensible default derived from >>> the number of threads available and if compression is enabled or not. >> >> Thanks Ralf for your comments! >> >> @plummercj and I have discussed the `parallel=` option several rounds, also >> together with Serguei @sspitsyn and Ham-Lin @Hamlin-Li . And I recently >> also re-considered this option for dumping. here I list my opinion and hope >> we could made the final decision this time :-) >> >> - The `parallel=<N>` option for `jmap -histo` has been discussed and >> finalized, I think we could leave it as it is. Because user may get general >> info about their system, such as core number and multi-process workload, >> these could help them tuning. And also because the `parallel=<N>` option >> only affects how many threads would be used for object iteration (this is >> different with `jmap -dump`, I will explain in following comments), it is >> easy to observe the output or time and then adjust the number. >> >> - The `parallel=<N>` option for `jmap -dump` is complicated, it has to >> consider the file writer threads and the disk writing speed. let me try to >> explain the implementation briefly: >> >> In current implementation, the heap dump actaully has two kind of worker >> threads - dumper and writter. The dumper iterates over the heap and send >> data to the writer; the writer does compression if required and writes data >> to the file. There is buffer queue between dumper and writer: dumper grab >> empty buffer and fill data into the queue , and the writer threads grab the >> filled buffer from the queue, write data to file and then return empty >> buffer for futurn use. >> In summary it looks like following graph: >> >> ` Dumper --> BufferQueue (size == #writers) --> writer(multiple) ==> >> disk(file)` >> >> Before this patch, there is only one dumper (the VMThread) and all other >> worker threads are writer, that is, all GC threads work as writer. The only >> dumper iterate heap and the multiple writers write data to file. So the >> bottomneck is more likely to be heap iteration (dumper side). >> >> With this patch, there are multiple dumpers and multiple writers, they all >> divided the gc thread number (plus 1 vmthread). And since there are multiple >> dumpers, the buffer queue is more likely to be filled. So the bottomneck >> move from heap iteration to two aspect (writer side): >> - drain of buffer queue >> - speed of disk writing >> >> Since the buffer queue size is determined by number of writers, and the disk >> writing speed determines how fast the buffer can be empty again, it is >> important to keep the writer number as much as it can be to satisfy the >> dumper's requirement. >> >> The `parallel=<N>` option only control the dumper thread number, and it >> indirectly affects the writer's thread number (as they all divide the total >> thread number). it is really hard for user to understand these internal >> mechanism and do tuning. >> >> And by default, the number of dumpers and writers are half : half of GC >> threads +1(vmThread), my testing shown this configuration is likly to be >> optimal. ( But maybe more evalution is required to confirm) >> >> In summary, I think it is very hard for user to tune the parallel thread >> number for heap dump. So I suggest to remove it for dump and enable the >> parallel by default. >> >> Again, For `jmap -histo`, I suggest to keep this since it is more easy to be >> understood. >> >> IMO, to make final decision, we need to consider following: >> - Is it acceptable that `jmap -dump` and `jmap -histo` have different >> behavior about the option of `parallel`, that is, `heapdump` does not use it >> and `heaphisto` does. >> - If we decide to not introduce the `parallel` option, we don't need to >> introduce the `heapdumpext` attach api command, so the 2 CSRs related to >> this PR could be closed. But this may leave the argument count problem to >> future, when more argumebts is introduced, is this acceptable? >> - if we decide to introduce `parallel=<N>` option, do we need to add >> explaination in the document? or docs like tuning guide? >> >> Welcome for any of your comments! >> >> BRs, >> Lin > >> * Is it acceptable that `jmap -dump` and `jmap -histo` have different >> behavior about the option of `parallel`, that is, `heapdump` does not use it >> and `heaphisto` does. > > And just to further clarify, the parallel support for `jmap -histo` is > already in JDK 16, which is one reason we were swayed towards including it > with `jmap -dump`. If we didn't we were thinking in order to be consistent we > would need to pull the parallel support for `jmap -histo`, which would be > somewhat problematic since it will already have shipped in 16. However, given > that these are really two different commands, and we have some good arguments > for not including parallel support for `jmap -dump`, I can see leaving it out. > >> * If we decide to not introduce the `parallel` option, we don't need to >> introduce the `heapdumpext` attach api command, so the 2 CSRs related to >> this PR could be closed. But this may leave the argument count problem to >> future, when more argumebts is introduced, is this acceptable? > > Yes, and in fact I would argue that we should not be doing the `heapdumpext` > solution now if we no longer have a need for it. > >> * if we decide to introduce `parallel=<N>` option, do we need to add >> explaination in the document? or docs like tuning guide? > > Which document were you thinking? Is the turning guide a hotspot tuning > guide? If so, I don't think that's the right place for it. The focus should > be in any jmap documentation. > > And to add a 4th bullet item to your list, we also have the `GC.heap_dump` > jcmd. I think there as a CR and CSR to add parallel support to it (but > honestly I've lost track). If we choose not to include parallel support for > `jmap -dump`, I think we should exclude it from `GC.heap_dump` also. Hi Chris, Thanks for your comments! So I think we agree on keep `parallel=N` option for hitogram to keep it as JDK16 and not use it for heapdump. I will close related CSR and revert related codes. > Yes, and in fact I would argue that we should not be doing the heapdumpext > solution now if we no longer have a need for it. Agree, at least I don't have any idea for adding new features that require new options > Which document were you thinking? Is the turning guide a hotspot tuning > guide? If so, I don't think that's the right place for it. The focus should > be in any jmap documentation. I was thinking of the tuning guide. but it seems even there is explaination of the parallel implementation of heap dump, it is hard to do tuning... > And to add a 4th bullet item to your list, we also have the `GC.heap_dump` > jcmd. I think there as a CR and CSR to add parallel support to it (but > honestly I've lost track). If we choose not to include parallel support for > `jmap -dump`, I think we should exclude it from `GC.heap_dump` also. There are CSRs created for the parallel option for GC.class_histogram, the one for GC.heap_dump is not created yet (or I cannot find one) Thanks, Lin ------------- PR: https://git.openjdk.java.net/jdk/pull/2261