On Thu, 7 Jan 2021 14:45:03 GMT, Claes Redestad <redes...@openjdk.org> wrote:

>> I've identified a number of optimizations to the plumbing behind 
>> `MessageDigest.getDigest(..)` over in #1933 that remove 80-90% of the 
>> throughput overhead and all the allocation overhead compared to the 
>> `clone()` approach prototyped here. The remaining 20ns/op overhead might not 
>> be enough of a concern to do a point fix in `UUID::nameUUIDFromBytes`.
>
> Removing the UUID clone cache and running the microbenchmark along with the 
> changes in #1933:
> 
> Benchmark                                                  (size)   Mode  Cnt    Score    Error   Units
> UUIDBench.fromType3Bytes                                    20000  thrpt   12    2.182 ±  0.090  ops/us
> UUIDBench.fromType3Bytes:·gc.alloc.rate                     20000  thrpt   12  439.020 ± 18.241  MB/sec
> UUIDBench.fromType3Bytes:·gc.alloc.rate.norm                20000  thrpt   12  264.022 ±  0.003    B/op
> 
> The goal now is to simplify the digest code and compare alternatives.
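For context on the `clone()` approach mentioned in the quoted thread, here is a minimal sketch, assuming a cached MD5 `MessageDigest` prototype that each call clones instead of going through `MessageDigest.getInstance("MD5")`. The class and field names are hypothetical and this is not the code from the PR; the version/variant bit manipulation follows the RFC 4122 type 3 layout, so the result should match `UUID.nameUUIDFromBytes`.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.UUID;

final class ClonedMd5Uuid {
    // Hypothetical cached prototype: it is never updated, only cloned,
    // so each call skips the provider lookup done by getInstance("MD5").
    private static final MessageDigest MD5_PROTOTYPE;

    static {
        try {
            MD5_PROTOTYPE = MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            throw new InternalError("MD5 not supported", e);
        }
    }

    static UUID nameUUIDFromBytes(byte[] name) {
        MessageDigest md;
        try {
            md = (MessageDigest) MD5_PROTOTYPE.clone();
        } catch (CloneNotSupportedException e) {
            throw new InternalError("MD5 digest not cloneable", e);
        }
        byte[] md5 = md.digest(name);
        md5[6] &= 0x0f; // clear version nibble
        md5[6] |= 0x30; // set version 3 (name-based, MD5)
        md5[8] &= 0x3f; // clear variant bits
        md5[8] |= 0x80; // set IETF (RFC 4122) variant
        ByteBuffer buf = ByteBuffer.wrap(md5); // big-endian by default
        return new UUID(buf.getLong(), buf.getLong()); // msb, then lsb
    }

    public static void main(String[] args) {
        byte[] name = "example".getBytes(StandardCharsets.UTF_8);
        System.out.println(nameUUIDFromBytes(name).equals(UUID.nameUUIDFromBytes(name)));
    }
}
```

The trade-off under discussion is that the cached prototype saves the lookup cost but adds a per-class cache, which is why shrinking the `getInstance` overhead in #1933 may make such a point fix unnecessary.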

I've run various tests and concluded that the `VarHandle`ized code matches 
or improves upon the `Unsafe`-riddled code in `ByteArrayAccess`. I then went 
ahead and consolidated `ByteArrayAccess` to use a similar code pattern for 
consistency, which amounts to a good cleanup.
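To illustrate the kind of `VarHandle` pattern that can replace `Unsafe` reads, here is a minimal sketch using `MethodHandles.byteArrayViewVarHandle` to decode little-endian ints from a byte array, as MD5 does for each 64-byte block. The class and method names are hypothetical and this is not the actual `ByteArrayAccess` code; it assumes `len` is a multiple of 4.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;

final class LittleEndianInts {
    // Views a byte[] as little-endian ints addressed by byte offset.
    private static final VarHandle LE_INT =
            MethodHandles.byteArrayViewVarHandle(int[].class, ByteOrder.LITTLE_ENDIAN);

    // Hypothetical helper: decodes len bytes of in (starting at inOfs)
    // into len / 4 ints of out (starting at outOfs). Assumes len % 4 == 0.
    static void bytesToIntsLittle(byte[] in, int inOfs, int[] out, int outOfs, int len) {
        for (int i = 0; i < len; i += 4) {
            // Plain get is permitted at unaligned offsets; bounds are checked.
            out[outOfs + i / 4] = (int) LE_INT.get(in, inOfs + i);
        }
    }

    public static void main(String[] args) {
        byte[] block = {0x01, 0x02, 0x03, 0x04, (byte) 0xff, 0x00, 0x00, 0x00};
        int[] words = new int[2];
        bytesToIntsLittle(block, 0, words, 0, block.length);
        System.out.printf("%08x %08x%n", words[0], words[1]); // prints "04030201 000000ff"
    }
}
```

Unlike `Unsafe`, the view `VarHandle` performs array bounds checks, so an out-of-range offset fails with an exception instead of reading arbitrary memory.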

With MD5 intrinsics disabled, I get this baseline:

Benchmark                                                  (size)   Mode  Cnt    Score    Error   Units
UUIDBench.fromType3Bytes                                    20000  thrpt   12    1.245 ±  0.077  ops/us
UUIDBench.fromType3Bytes:·gc.alloc.rate.norm                20000  thrpt   12  488.042 ±  0.004    B/op

With the current patch here (not including #1933):

Benchmark                                                  (size)   Mode  Cnt    Score    Error   Units
UUIDBench.fromType3Bytes                                    20000  thrpt   12    1.431 ±  0.106  ops/us
UUIDBench.fromType3Bytes:·gc.alloc.rate.norm                20000  thrpt   12  408.035 ±  0.006    B/op

If I isolate the `ByteArrayAccess` changes, I get performance-neutral or 
slightly better numbers compared to the baseline for these tests:

Benchmark                                                  (size)   Mode  Cnt    Score    Error   Units
UUIDBench.fromType3Bytes                                    20000  thrpt   12    1.317 ±  0.092  ops/us
UUIDBench.fromType3Bytes:·gc.alloc.rate.norm                20000  thrpt   12  488.042 ±  0.004    B/op
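A quick reading of the three runs above, computed only from the posted scores (no additional measurements), relative to the intrinsics-disabled baseline:

```latex
\begin{aligned}
\frac{1.431}{1.245} &\approx 1.149 && \text{full patch: about 15\% more ops/us} \\
\frac{1.317}{1.245} &\approx 1.058 && \texttt{ByteArrayAccess}\ \text{only: about 6\% more ops/us} \\
488.042 - 408.035 &\approx 80.0\ \text{B/op} && \text{allocation saved by the full patch} \\
1.245 \pm 0.077 &\rightarrow [1.168,\ 1.322] && \text{baseline} \\
1.317 \pm 0.092 &\rightarrow [1.225,\ 1.409] && \texttt{ByteArrayAccess}\ \text{only}
\end{aligned}
```

The last two intervals overlap, which is consistent with describing the isolated `ByteArrayAccess` change as neutral or slightly better rather than a clear win.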

-------------

PR: https://git.openjdk.java.net/jdk/pull/1855
