Re: RFR: 8324573: HashMap::putAll add notes for conservative resizing [v5]

Volker Simonis Thu, 29 Feb 2024 00:37:07 -0800

On Fri, 16 Feb 2024 23:30:22 GMT, Joshua Cao <d...@openjdk.org> wrote:


>> Add notes for `HashMap::putAll()` conservative resizing.
>> 
>> Note: everything below this line is from the original change. After 
>> discussion, we decided to keep the conservative resizing, but we should add 
>> an `@implNote` for the decision.
>> 
>> ---
>> 
>> This change mirrors what we did for ConcurrentHashMap in 
>> https://github.com/openjdk/jdk/pull/17116. When we add all entries from one 
>> map to anther, we should resize that map to the size of the sum of both maps.
>> 
>> I used the command below to run the benchmarks. I set a high heap to reduce 
>> garbage collection noise.
>> 
>> java -Xms25G -jar benchmarks.jar -p size=100000 -p addSize=100000 -gc true 
>> org.openjdk.bench.java.util.HashMapBench
>> 
>> 
>> Before change
>> 
>> 
>> Benchmark            (addSize)        (mapType)  (size)  Mode  Cnt   Score   
>> Error  Units
>> HashMapBench.putAll     100000         HASH_MAP  100000  avgt    4  22.927 ± 
>> 3.170  ms/op
>> HashMapBench.putAll     100000  LINKED_HASH_MAP  100000  avgt    4  25.198 ± 
>> 2.189  ms/op
>> 
>> 
>> After change
>> 
>> 
>> Benchmark            (addSize)        (mapType)  (size)  Mode  Cnt   Score   
>> Error  Units
>> HashMapBench.putAll     100000         HASH_MAP  100000  avgt    4  16.780 ± 
>> 0.526  ms/op
>> HashMapBench.putAll     100000  LINKED_HASH_MAP  100000  avgt    4  19.721 ± 
>> 0.349  ms/op
>> 
>> 
>> We see about average time improvements of 26% in HashMap and 20% in 
>> LinkedHashMap.
>> 
>> ---
>> 
>> In the worse case, we may have two maps with identical keys. In this case, 
>> we would aggressively resize when we do not need to. I'm also adding an 
>> additional `putAllSameKeys` benchmark.
>> 
>> Before change:
>> 
>> 
>> Benchmark                                       (addSize)        (mapType)  
>> (size)  Mode  Cnt        Score   Error   Units
>> HashMapBench.putAllSameKeys                        100000         HASH_MAP  
>> 100000  avgt             6.956           ms/op
>> HashMapBench.putAllSameKeys:gc.alloc.rate          100000         HASH_MAP  
>> 100000  avgt          1091.383          MB/sec
>> HashMapBench.putAllSameKeys:gc.alloc.rate.norm     100000         HASH_MAP  
>> 100000  avgt       7968871.917            B/op
>> HashMapBench.putAllSameKeys:gc.count               100000         HASH_MAP  
>> 100000  avgt               ≈ 0          counts
>> HashMapBench.putAllSameKeys                        100000  LINKED_HASH_MAP  
>> 100000  avgt             8.417           ms/op
>> HashMapBench.putAllSameKeys:gc.alloc.rate          100000  LINKED_HASH_MAP  
>> 100000  avgt           992.543          MB/sec
>> HashM...
>
> Joshua Cao has updated the pull request with a new target base due to a merge 
> or a rebase. The incremental webrev excludes the unrelated changes brought in 
> by the merge/rebase. The pull request contains six additional commits since 
> the last revision:
> 
>  - Add note about conservative resizing
>  - Merge branch 'master' into hashmap
>  - extract msize variable
>  - Use max of both sizes and other maps size in case of overflow
>  - Add benchmark with all duplicate keys
>  - 8324573: HashMap::putAll should resize to sum of both map sizes

I don't think we need to document what this method is *not* doing or what it 
*could* do. We should document what the current implementation *is* doing and 
give advice how the  impact of the current implementation can be mitigated. 
Something like:

> {@code HashMap}'s resize policy is intentionally conservative to avoid an 
> unnecessarily large capacity if {@code m} contains many duplicate keys. This 
> can lead to a potentially expensive, extra resize operation. In order to 
> avoid such an additional resize operation callers of {@code putAll} can 
> either use the {@code HashMap(int initialCapacity)} constructor to ensure 
> that this map has a large enough capacity or they can call {@code 
> newHashMap(int numMappings)} before calling {@code putAll()} to ensure that 
> the map is only resized and copied once.

Please fix the tags accordingly, I'm not fluent in JavaDoc :)

-------------

Changes requested by simonis (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/17544#pullrequestreview-1908264266

Re: RFR: 8324573: HashMap::putAll add notes for conservative resizing [v5]

Reply via email to