On Sat, 29 Nov 2025 04:55:44 GMT, Jiangli Zhou <[email protected]> wrote:

>> Oh man, the `pos`/`len` modifications in current code are confusing. I 
>> scratched my head for quite a while trying to comprehend why does `__ 
>> bind(MESG_BELOW_32_BLKS)` split the `pos += 16` and `len -= 16`? On a 
>> surface, that just looks like a bug.
>> 
>> But looks that way because we do `initial_blocks_16_avx512` twice, do `pos 
>> += 16` twice, but only do the `len += 32` after the second call. Which does 
>> not help if we shortcut after the first call. In fact, I am not at all sure 
>> that checking `len < 32` _without_ modifying `len` beforehand does not break 
>> the assumptions downstream:
>> 
>> 
>>   initial_blocks_16_avx512(in, out, ct, pos, key, avx512_subkeyHtbl, 
>> CTR_CHECK, rounds, CTR_BLOCKx, AAD_HASHx,  ADDBE_4x4, ADDBE_1234, ADD_1234, 
>> SHUF_MASK, stack_offset);
>>   __ addl(pos, 16 * 16);
>>   __ cmpl(len, 32 * 16);
>>   __ jcc(Assembler::below, MESG_BELOW_32_BLKS);
>> 
>> 
>> Really, in these kind of intrinsics, _you want_ to make sure `pos` and `len` 
>> updates are tightly bound together, otherwise these kinds of mistakes would 
>> keep happening. You will lose on code density a bit, but would have more 
>> readable and robust code.
>> 
>> Shouldn't it be like this?
>> 
>> 
>> diff --git a/src/hotspot/cpu/x86/stubGenerator_x86_64_aes.cpp 
>> b/src/hotspot/cpu/x86/stubGenerator_x86_64_aes.cpp
>> index 1e728ffa279..a16e25b075d 100644
>> --- a/src/hotspot/cpu/x86/stubGenerator_x86_64_aes.cpp
>> +++ b/src/hotspot/cpu/x86/stubGenerator_x86_64_aes.cpp
>> @@ -3475,12 +3475,14 @@ void StubGenerator::aesgcm_avx512(Register in, 
>> Register len, Register ct, Regist
>>  
>>    initial_blocks_16_avx512(in, out, ct, pos, key, avx512_subkeyHtbl, 
>> CTR_CHECK, rounds, CTR_BLOCKx, AAD_HASHx,  ADDBE_4x4, ADDBE_1234, ADD_1234, 
>> SHUF_MASK, stack_offset);
>>    __ addl(pos, 16 * 16);
>> +  __ subl(len, 16 * 16);
>> +
>>    __ cmpl(len, 32 * 16);
>>    __ jcc(Assembler::below, MESG_BELOW_32_BLKS);
>>  
>>    initial_blocks_16_avx512(in, out, ct, pos, key, avx512_subkeyHtbl, 
>> CTR_CHECK, rounds, CTR_BLOCKx, AAD_HASHx, ADDBE_4x4, ADDBE_1234, ADD_1234, 
>> SHUF_MASK, stack_offset + 16);
>>    __ addl(pos, 16 * 16);
>> -  __ subl(len, 32 * 16);
>> +  __ subl(len, 16 * 16);
>>  
>>    __ cmpl(len, 32 * 16);
>>    __ jcc(Assembler::below, NO_BIG_BLKS);
>> @@ -3491,24 +3493,27 @@ void StubGenerator::aesgcm_avx512(Register in, 
>> Register len, Register ct, Regist
>>    ghash16_encrypt_parallel16_avx512(in, out, ct, pos, avx512_subkeyHtbl, 
>> CTR_CHECK, rounds, key, CTR_BLOCKx, AAD_HASHx, ADDBE_4x4, ADDBE_1234, 
>> ADD_1234, SHUF_MASK,
>>                      ...
>
>> Oh man, the `pos`/`len` modifications in current code are confusing. I 
>> scratched my head for quite a while trying to comprehend why does `__ 
>> bind(MESG_BELOW_32_BLKS)` split the `pos += 16` and `len -= 16`? On a 
>> surface, that just looks like a bug.
> 
> The combination of handling for the fall through from `ENCRYPT_16_BLKS` and 
> conditional entry to `MESG_BELOW_32_BLKS` cases are subtle.
> 
> I had also missed the fall through case in my initial proposed fix (with 
> comp/jcc) until @sviswa7 pointed it out and suggested the current fix. The 
> fix for `StubGenerator::aesgcm_avx512` with moving `__ addl(pos, 16 * 16)` to 
> be before `__ bind(MESG_BELOW_32_BLKS)` works correctly for both the fall 
> through and conditional jump cases now.
> 
>> 
>> But looks that way because we do `initial_blocks_16_avx512` twice, do `pos 
>> += 16` twice, but only do the `len += 32` after the second call. Which does 
>> not help if we shortcut after the first call. In fact, I am not at all sure 
>> that checking `len < 32` _without_ modifying `len` beforehand does not break 
>> the assumptions downstream:
>> 
>> ```
>>   initial_blocks_16_avx512(in, out, ct, pos, key, avx512_subkeyHtbl, 
>> CTR_CHECK, rounds, CTR_BLOCKx, AAD_HASHx,  ADDBE_4x4, ADDBE_1234, ADD_1234, 
>> SHUF_MASK, stack_offset);
>>   __ addl(pos, 16 * 16);
>>   __ cmpl(len, 32 * 16);
>>   __ jcc(Assembler::below, MESG_BELOW_32_BLKS);
>> ```
>> 
>> Really, in these kind of intrinsics, _you want_ to make sure `pos` and `len` 
>> updates are tightly bound together, otherwise these kinds of mistakes would 
>> keep happening. You will lose on code density a bit, but would have more 
>> readable and robust code.
>> 
>> Shouldn't it be like this?
>> 
>> ```
>> diff --git a/src/hotspot/cpu/x86/stubGenerator_x86_64_aes.cpp 
>> b/src/hotspot/cpu/x86/stubGenerator_x86_64_aes.cpp
>> index 1e728ffa279..a16e25b075d 100644
>> --- a/src/hotspot/cpu/x86/stubGenerator_x86_64_aes.cpp
>> +++ b/src/hotspot/cpu/x86/stubGenerator_x86_64_aes.cpp
>> @@ -3475,12 +3475,14 @@ void StubGenerator::aesgcm_avx512(Register in, 
>> Register len, Register ct, Regist
>>  
>>    initial_blocks_16_avx512(in, out, ct, pos, key, avx512_subkeyHtbl, 
>> CTR_CHECK, rounds, CTR_BLOCKx, AAD_HASHx,  ADDBE_4x4, ADDBE_1234, ADD_1234, 
>> SHUF_MASK, stack_offset);
>>    __ addl(pos, 16 * 16);
>> +  __ subl(len, 16 * 16);
>> +
>>    __ cmpl(len, 32 * 16);
>>    __ jcc(Assembler::below, MESG_BELOW_32_BLKS);
>>  
>>    initial_blocks_16_avx512(in, out, ct, pos, key, avx512_subkeyHtbl, 
>> CTR_CHECK, rounds, CTR_BLOCKx, AAD_HASHx, ADDBE_4x4, ADDBE_1234, ADD_1234, 
>> ...

> @jianglizhou thank you for the AVX2 related output from the unit test 
> pre-fix. From this I was able to trace the point of failure and see that your 
> proposed changes are good for approval. Thank you for your work on these 
> issues!

@smemery Thanks for carefully testing the changes!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/28363#issuecomment-3590993560

Reply via email to