On Fri, 23 Jan 2026 12:10:09 GMT, xinyangwu <[email protected]> wrote:

> ### Summary
> This PR introduces a parallel intrinsic for AES/ECB operations to replace the 
> current per-block processing approach, reducing native call overhead and 
> improving throughput for multi-block operations.
> ### Problem
> Except on hardware that supports AVX512, the existing AES/ECB implementation 
> suffers from three major performance issues:
> 1. Excessive stub call overhead: Each 16-byte block requires a separate 
> intrinsic call, resulting in high invocation frequency
> 
> 2. Inefficient instruction-level parallelism: The serialized block processing 
> fails to fully utilize instruction-level parallelism
> 
> 3. Redundant setup/teardown: Repeated initialization of encryption state for 
> each block
> ### Changes
> Added parallel AES intrinsic implementation
> ### Testing
> JMH benchmarks
> 
> The change yields a **37.43%** throughput improvement: average time per 
> operation drops from 11518.846 ns/op to 8381.499 ns/op, about 27% lower.
> 
> On an Intel(R) Core(TM) i9-14900HX CPU machine with the original implementation:
> 
> 
> Benchmark     Mode  Cnt      Score    Error  Units
> AesTest.test  avgt    5  11518.846 ± 68.621  ns/op
> 
> 
> On the same machine with the optimized implementation:
> 
> 
> Benchmark     Mode  Cnt     Score    Error  Units
> AesTest.test  avgt    5  8381.499 ± 57.751  ns/op
> 
> 
> All Tier-1 tests pass on linux-x64. This modification does not involve 
> changing the encryption or decryption logic.
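
For readers unfamiliar with why ECB lends itself to a full-message stub: each 16-byte block is encrypted independently of the others, so one call over the whole buffer must produce the same bytes as one call per block. The following is a minimal sketch at the `javax.crypto` API level, not the HotSpot intrinsic from this PR. The class name `EcbBlockIndependence` and both helper methods are hypothetical, and the all-zero key is for illustration only.

```java
import javax.crypto.Cipher;
import javax.crypto.spec.SecretKeySpec;
import java.util.Arrays;

public class EcbBlockIndependence {
    static final int BLOCK = 16; // AES block size in bytes

    // Encrypt the whole message with a single doFinal call.
    static byte[] encryptWhole(byte[] key, byte[] msg) throws Exception {
        Cipher c = Cipher.getInstance("AES/ECB/NoPadding");
        c.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"));
        return c.doFinal(msg);
    }

    // Encrypt the message one 16-byte block per call, mimicking
    // per-block processing. msg.length must be a multiple of BLOCK.
    static byte[] encryptPerBlock(byte[] key, byte[] msg) throws Exception {
        Cipher c = Cipher.getInstance("AES/ECB/NoPadding");
        c.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"));
        byte[] out = new byte[msg.length];
        for (int off = 0; off < msg.length; off += BLOCK) {
            // doFinal resets the cipher to its post-init state, so it can be reused.
            c.doFinal(msg, off, BLOCK, out, off);
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        byte[] key = new byte[16];  // illustrative all-zero AES-128 key
        byte[] msg = new byte[64];  // four blocks
        for (int i = 0; i < msg.length; i++) {
            msg[i] = (byte) i;
        }
        boolean same = Arrays.equals(encryptWhole(key, msg), encryptPerBlock(key, msg));
        System.out.println("per-block output matches whole-message output: " + same);
    }
}
```

Because the two paths agree byte for byte, an implementation is free to process several blocks at once inside a single call, which is consistent with the PR's statement that the encryption and decryption logic itself is unchanged.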

This pull request has now been integrated.

Changeset: 3e9fc5d4
Author:    wuxinyang <[email protected]>
Committer: SendaoYan <[email protected]>
URL:       https://git.openjdk.org/jdk/commit/3e9fc5d49e52d79bcd2bb75068ff7efb31f768fd
Stats:     212 lines in 2 files changed: 209 ins; 0 del; 3 mod

8376164: Optimize AES/ECB implementation using full-message intrinsic stub and 
parallel RoundKey addition

Reviewed-by: sviswanathan, semery

-------------

PR: https://git.openjdk.org/jdk/pull/29385
