On Thu, 23 Oct 2025 01:39:02 GMT, Ben Perez <[email protected]> wrote:

> An aarch64 implementation of the `MontgomeryIntegerPolynomial256.mult()` 
> method

On 29/10/2025 20:18, Ben Perez wrote:
> Is there by any chance documentation for `RegSet` that I can reference while 
> making these changes?

No, but it's all there in the source.

-- 
Andrew Haley  (he/him)
Java Platform Lead Engineer
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 7144:

> 7142: 
> 7143:   address generate_intpoly_montgomeryMult_P256() {
> 7144: 

As a general point, it would help everyone if you provided pseudocode for the 
whole thing.

src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 7160:

> 7158:     };
> 7159: 
> 7160:     Register c_ptr = r9;

`rscratch1` and `rscratch2` are used freely by macros, so aliasing them is 
always rather sketchy. As far as I can tell the arg registers aren't used here, 
so it makes sense to use `r3`...

src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 7169:

> 7167:     Register mul_tmp = r14;
> 7168:     Register n = r15;
> 7169: 

Here, you could do something like


    RegSet scratch = RegSet::range(r3, r28) - rscratch1 - rscratch2;

    {
      auto r_it = scratch.begin();
      Register
        c_ptr = *r_it++,
        a_i = *r_it++,
        c_idx = *r_it++, //c_idx is not used at the same time as a_i
        limb_mask_scalar = *r_it++,
        b_j = *r_it++,
        mod_j = *r_it++,
        mod_ptr = *r_it++,
        mul_tmp = *r_it++,
        n = *r_it++;
       ...
    }



Note that a RegSet iterator doesn't affect the RegSet it was created from, so 
once this block has ended you can allocate again from the set of scratch 
registers.

src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 7201:

> 7199:     __ mov(limb_mask_scalar, 1);
> 7200:     __ neg(limb_mask_scalar, limb_mask_scalar);
> 7201:     __ lsr(limb_mask_scalar, limb_mask_scalar, 12);

Suggestion:

    __ mov(limb_mask_scalar, -UCONST64(1) >> (64 - BITS_PER_LIMB));

src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 7229:

> 7227:       __ shl(high_01, __ T2D, high_01, shift1);
> 7228:       __ ushr(tmp, __ T2D, low_01, shift2);
> 7229:       __ orr(high_01, __ T2D, high_01, tmp);

Suggestion:

      __ orr(high_01, __ T16B, high_01, tmp);

everywhere.

src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 7403:

> 7401:     // c3 &= LIMB_MASK;
> 7402: 
> 7403:     __ ldr(mod_j, __ post(mod_ptr, 8));

Best not to use post-increment if you can avoid it.

src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 7440:

> 7438: 
> 7439:     Register res_0 = r9;
> 7440:     Register res_1 = r10;

Aliasing the same register with different names is very dangerous, and has 
cause hard-to-find failures in production code in the past. You can confine the 
`Register` instances to block scope. You can also suffix or prefix the local 
names with canonical register names.

Best of all is to get rid of the manual register allocation altogether, by  
creating a `RegSet`, then adding and removing registers that you need, as you 
go along. That way the need to manually check register usage goes away 
altogether.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27946#issuecomment-3466766156
PR Review Comment: https://git.openjdk.org/jdk/pull/27946#discussion_r2460515105
PR Review Comment: https://git.openjdk.org/jdk/pull/27946#discussion_r2460510697
PR Review Comment: https://git.openjdk.org/jdk/pull/27946#discussion_r2461123587
PR Review Comment: https://git.openjdk.org/jdk/pull/27946#discussion_r2460582673
PR Review Comment: https://git.openjdk.org/jdk/pull/27946#discussion_r2461125143
PR Review Comment: https://git.openjdk.org/jdk/pull/27946#discussion_r2460621762
PR Review Comment: https://git.openjdk.org/jdk/pull/27946#discussion_r2460612964

Reply via email to