On Wed, 14 May 2025 12:46:41 GMT, Radim Vansa <rva...@openjdk.org> wrote:

>> This optimization is a followup to https://github.com/openjdk/jdk/pull/24290 
>> trying to reduce the performance regression in some scenarios introduced in 
>> https://bugs.openjdk.org/browse/JDK-8292818 . Based both on performance and 
>> memory consumption it is a (better) alternative to 
>> https://github.com/openjdk/jdk/pull/24713 .
>> 
>> This PR optimizes local field lookup in classes with more than 16 fields; 
>> rather than sequentially iterating through all fields during lookup we sort 
>> the fields based on the field name. The stream includes extra table after 
>> the field information: for field at position 16, 32 ... we record the 
>> (variable-length-encoded) offset of the field info in this stream. On field 
>> lookup, rather than iterating through all fields, we iterate through this 
>> table, resolve names for given fields and continue field-by-field iteration 
>> only after the last record (hence at most 16 fields).
>> 
>> In classes with <= 16 fields this PR reduces the memory consumption by 1 
>> byte that was left with value 0 at the end of stream. In classes with > 16 
>> fields we add extra 4 bytes with offset of the table, and the table contains 
>> one varint for each 16 fields. The terminal byte is not used either.
>> 
>> My measurements on the attached reproducer
>> 
>> hyperfine -w 50 -r 100 '/path/to/jdk-17/bin/java -cp /tmp CCC'
>> Benchmark 1: /path/to/jdk-17/bin/java -cp /tmp CCC
>>   Time (mean ± σ):      51.3 ms ±   2.8 ms    [User: 44.7 ms, System: 13.7 
>> ms]
>>   Range (min … max):    45.1 ms …  53.9 ms    100 runs
>> 
>> hyperfine -w 50 -r 100 '/path/to/jdk25-master/bin/java -cp /tmp CCC'
>> Benchmark 1: /path/to/jdk25-master/bin/java -cp /tmp CCC
>>   Time (mean ± σ):      78.2 ms ±   1.0 ms    [User: 74.6 ms, System: 17.3 
>> ms]
>>   Range (min … max):    73.8 ms …  79.7 ms    100 runs
>> 
>> (the jdk25-master above already contains JDK-8353175)
>> 
>> hyperfine -w 50 -r 100 '/path/to/jdk25-this-pr/bin/java -cp /tmp CCC'
>> Benchmark 1: /path/to/jdk25-this-pr/jdk/bin/java -cp /tmp CCC
>>   Time (mean ± σ):      38.5 ms ±   0.5 ms    [User: 34.4 ms, System: 17.3 
>> ms]
>>   Range (min … max):    37.7 ms …  42.1 ms    100 runs
>> 
>> While https://github.com/openjdk/jdk/pull/24713 returned the performance to 
>> previous levels, this PR improves it by 25% compared to JDK 17 (which does 
>> not contain the regression)! This time, the undisclosed production-grade 
>> reproducer shows even higher improvement:
>> 
>> JDK 17: 1.6 s
>> JDK 21 (no patches): 22 s
>> JDK25-master: 12.3 s
>> JDK25-this-pr: 0.5 s
>
> Radim Vansa has updated the pull request incrementally with five additional 
> commits since the last revision:
> 
>  - Revert change in array.hpp
>  - Revert changes in VerifyRawIndexesTest
>  - Improve FieldInfo::print
>  - Load constant for SORTED_FIELD_TABLE_THRESHOLD
>  - Replace JumpTable with SortedFieldTable

Please move comparison logic to `Symbol::fast_compare_cds_safe`, where it will 
be more maintainable.

Please do not store ad hoc non-compressed data in a CompressedStream.  
Compressed streams used to have such junk in them, and we worked hard to clean 
them up.  Compression is easier to maintain and enhance when it is consistently 
used.  That would also have been my objection to the control bytes in PR 24713.

I confirm that the main field-info stream should *not* be reordered.  That is 
likely to harm users which are sensitive to encounter order.

Instead, a side array should binary-searched.

Please use a separate Array<u1> to store the side array.

(More comments on the format of that side array are likely, but it makes sense 
to have each entry consist of a secondary key .)

This design will be more refactorable, for when we need the same thing for PC 
range and/or reloc info and/or other streamy data that may need indexing.

One thing that is hard to live with (maintain, enhance) is a single array that 
contains two or more sub-segments.  It is that way with the bootstrap method 
indices in the constant pool, but the ownership costs are high, and we are 
working to get free of that sort of local optimization.  This is why I am 
confident of asking for a second array, rather than clever repacking of the 
first array.

If the 8 bytes of the second array really are important (they are not, 
presently), we can make that (and/or other) pointers in InstanceKlass be 
optional, as some fields in Method are optional today.

-------------

Changes requested by jrose (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/24847#pullrequestreview-2855741807

Reply via email to