On Tue, 15 Nov 2022 17:42:08 GMT, Volodymyr Paprotski <[email protected]> wrote:
>> src/hotspot/cpu/x86/stubGenerator_x86_64_poly.cpp line 384:
>>
>>> 382: void StubGenerator::poly1305_limbs(const Register limbs, const
>>> Register a0, const Register a1, const Register a2, bool only128)
>>> 383: {
>>> 384: const Register t1 = r13;
>>
>> Please, make the temps explicit and lift them into arguments. Otherwise,
>> it's hard to see what registers are clobbered when helper methods are called.
>
> Thanks for pointing this out.. I spent quite a bit of time and went back and
> forth on 'register allocation'... it does make sense to pass all the temps
> needed, when the number of temps is small. This is the case for the three
> `*_limbs_*` functions. Maybe I should indeed do that...
>
> On other hand, there are functions like `poly1305_multiply8_avx512` and
> `poly1305_process_blocks_avx512` that use a _lot_ of temp registers. I think
> it makes sense to keep those as 'function-header declarations'.
>
> Then there are functions like `poly1305_multiply_scalar` that could go either
> way, has some temps and 'implicitly clobbered' registers, but probably should
> stay 'as is'..
>
> I ended up being 'pedantic' and making _all_ temps into 'header variables'. I
> also tried to comment, but those probably mean more to me then anyone else in
> hindsight?
>
>
> // Register Map:
> // GPRs:
> // input = rdi
> // length = rbx
> // accumulator = rcx
> // R = r8
> // a0 = rsi
> // a1 = r9
> // a2 = r10
> // r0 = r11
> // r1 = r12
> // c1 = r8;
> // t1 = r13
> // t2 = r14
> // t3 = r15
> // t0 = r14
> // rscratch = r13
> // stack(rsp, rbp)
> // imul(rax, rdx)
> // ZMMs:
> // T: xmm0-6
> // C: xmm7-9
> // A: xmm13-18
> // B: xmm19-24
> // R: xmm25-29
> ...
> // Register Map:
> // reserved: rsp, rbp, rcx
> // PARAMs: rdi, rbx, rsi, r8-r12
> // poly1305_multiply_scalar clobbers: r13-r15, rax, rdx
> const Register t0 = r14;
> const Register t1 = r13;
> const Register rscratch = r13;
>
> // poly1305_limbs_avx512 clobbers: xmm0, xmm1
> // poly1305_multiply8_avx512 clobbers: xmm0-xmm6
> const XMMRegister T0 = xmm2;
> ...
>
>
> I think I am ok changing the `*limbs*` functions (even started, before I
> remembered my train of thought from months back..) but let me know if you
> agree with the rest of the reasoning?
Changed just the three `*limbs*` functions.
-------------
PR: https://git.openjdk.org/jdk/pull/10582