On Tue, 6 Jun 2023 02:35:30 GMT, Yi Yang <[email protected]> wrote:
>> ### Motivation and proposal
>> Hi, heap dump brings about pauses for application's execution(STW), this is
>> a well-known pain. JDK-8252842 have added parallel support to heapdump in an
>> attempt to alleviate this issue. However, all concurrent threads
>> competitively write heap data to the same file, and more memory is required
>> to maintain the concurrent buffer queue. In experiments, we did not feel a
>> significant performance improvement from that.
>>
>> The minor-pause solution, which is presented in this PR, is a two-stage
>> segmented heap dump:
>>
>> 1. Stage One(STW): Concurrent threads directly write data to multiple heap
>> files.
>> 2. Stage Two(Non-STW): Merge multiple heap files into one complete heap dump
>> file.
>>
>> Now concurrent worker threads are not required to maintain a buffer queue,
>> which would result in more memory overhead, nor do they need to compete for
>> locks.
>>
>> ### Performance evaluation
>> | memory | numOfThread | STW | Total | Compression |
>> | --- | --------- | -------------- | ------------ | -------- |
>> | 8g | 1 thread | 15.612 secs | 15.612 secs | N |
>> | 8g | 32 thread | 2.5617250 secs | 14.498 secs | N |
>> | 8g | 32 thread | 2.3084878 secs | 3.198 secs | Compress1 |
>> | 8g | 32 thread | 10.9355128 secs | 21.882 secs | Compress2 |
>> | 8g | 96 thread | 2.6790452 secs | 14.012 secs | N |
>> | 8g | 96 thread | 2.3044796 secs | 3.589 secs | Compress1 |
>> | 8g | 96 thread | 9.7585151 secs | 20.219 secs| Compress2 |
>> | 16g | 1 thread | 26.278 secs | 26.278 secs | N |
>> | 16g | 32 thread | 5.2313740 secs | 26.417 secs | N |
>> | 16g | 32 thread | 5.6946983 secs | 6.538 secs | Compress1 |
>> | 16g | 32 thread | 21.8211105 secs | 41.133 secs | Compress2 |
>> | 16g | 96 thread | 6.2445556 secs | 27.141 secs | N |
>> | 16g | 96 thread | 4.6007096 secs | 6.259 secs | Compress1 |
>> | 16g | 96 thread | 19.2965783 secs | 39.007 secs | Compress2 |
>> | 32g | 1 thread | 48.149 secs | 48.149 secs | N |
>> | 32g | 32 thread | 10.7734677 secs | 61.643 secs | N |
>> | 32g | 32 thread | 10.1642097 secs | 10.903 secs | Compress1 |
>> | 32g | 32 thread | 43.8407607 secs | 88.152 secs | Compress2 |
>> | 32g | 96 thread | 13.1522042 secs | 61.432 secs | N |
>> | 32g | 96 thread | 9.0954641 secs | 9.885 secs | Compress1 |
>> | 32g | 96 thread | 38.9900931 secs | 80.574 secs | Compress2 |
>> | 64g | 1 thread | 100.583 secs | 100.583 secs | N |
>> | 64g | 32 thread | 20.9233744 secs | 134.701 secs | N |
>> | 64g | 32 thread | 18.5023784 secs | 19.358 secs | Compre...
>
> Yi Yang has updated the pull request incrementally with one additional commit
> since the last revision:
>
> undo refactor -- done
It looks good in general. I added some change requests.
src/hotspot/share/services/heapDumper.cpp line 577:
> 575: _in_dump_segment = true;
> 576: _is_huge_sub_record = len > buffer_size() - dump_segment_header_size;
> 577: ResourceMark rm;
Looks like redundant leftover from development
src/hotspot/share/services/heapDumper.cpp line 1892:
> 1890: }
> 1891:
> 1892: static int volatile dump_seq = 0;
Could you please convert this global variable to field of VM_HeapDumper.
The value can be passed to VM_HeapDumpMerge as ctor argument
src/hotspot/share/services/heapDumper.cpp line 1894:
> 1892: static int volatile dump_seq = 0;
> 1893:
> 1894: void VM_HeapDumpMerge::merge_done() {
Implementation of VM_HeapDumpMerge methods is in the middle of VM_HeapDumper
methods.
Please put them after VM_HeapDumpMerge declaration (as we have for all other
classes)
-------------
PR Review: https://git.openjdk.org/jdk/pull/13667#pullrequestreview-1478097941
PR Review Comment: https://git.openjdk.org/jdk/pull/13667#discussion_r1228735182
PR Review Comment: https://git.openjdk.org/jdk/pull/13667#discussion_r1228728069
PR Review Comment: https://git.openjdk.org/jdk/pull/13667#discussion_r1228731666