On Fri, 2 Sep 2022 16:52:02 GMT, Jamil Nimeh <jni...@openjdk.org> wrote:
>> src/hotspot/cpu/aarch64/assembler_aarch64.hpp line 2521:
>>
>>> 2519: #undef INSN3
>>> 2520: #undef INSN4
>>> 2521:
>>
>> This code to handle the AdvSIMD load/store single structure and AdvSIMD
>> load/store single structure (post-indexed) is excessive.
>>
>> Every one of these instructions has the format,
>>
>> `0|Q|0011010|L|R|00000|opcode|S|size|Rn|Rt`
>>
>> or
>>
>> `0|Q|0011011|L|R|   Rm|opcode|S|size|Rn|Rt`
>>
>> Perhaps consider using a `RegSet regs` for the registers. Then the
>> instruction encoding to use (1, 2, 3, or 4 consecutive registers) can be picked
>> up from `regs.size()`. There only needs to be a single routine for all of
>> the `ld_st` variants.
>
> Thanks for the suggestion. I will look into this. I can see how
> `regs.size()` could simplify these macros.

Another approach may be better than a `RegSet`. If you use a C++11
template parameter pack, you can do something like this:

    template<typename R, typename... Rx>
      void foo(R first_register, Rx... more_registers) {
        const R regs[] = { first_register, more_registers... }; // An array that contains all the regs
        const int count = sizeof...(more_registers); // The count of the additional regs
        ...
      }

And then you can use the same logic, regardless of the number of registers.

> What I don't know is if one approach is better than the other for other
> reasons like performance or memory consumption. Do you have any feelings one
> way or the other?

`ADR` is smaller and faster at runtime; `lea(reg, ExternalAddress((address) foo))`
with `const uint64_t foo[] = { ... }` will be slightly faster at start-up time.

It makes no sense to emit the table with `emit_data64()` and then take the
address of the table you've just emitted with `lea`. That's worse for startup
time _and_ for runtime.

So I don't much mind emitting the table at runtime, but if you do, get its
address with `ADR`.

-------------

PR: https://git.openjdk.org/jdk/pull/7702
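
As a rough illustration of the parameter-pack suggestion above, here is a minimal, self-contained sketch. It is not HotSpot code: `VReg` is a hypothetical stand-in for the assembler's SIMD register type, and `consecutive_reg_count` is an invented helper name. It only shows how a single routine could recover the register count (1 to 4) and check that the registers are consecutive, wrapping modulo 32.

```cpp
#include <cassert>

// Hypothetical stand-in for a SIMD register (assumption: not HotSpot's real type).
struct VReg { int encoding; };

// One routine for any number of registers from 1 to 4.
// Returns how many registers were passed and asserts that they are
// consecutive, wrapping around after register 31.
template<typename R, typename... Rx>
int consecutive_reg_count(R first_register, Rx... more_registers) {
  static_assert(sizeof...(more_registers) <= 3, "at most four registers");

  // All registers, first one included.
  const R regs[] = { first_register, more_registers... };
  // Total count: the first register plus the rest of the pack.
  const int count = 1 + static_cast<int>(sizeof...(more_registers));

  for (int i = 1; i < count; i++) {
    assert(regs[i].encoding == (regs[0].encoding + i) % 32);
  }
  return count;
}
```

With this shape, a call such as `consecutive_reg_count(VReg{0}, VReg{1})` yields 2, and the caller could select the instruction encoding from that value instead of going through a separate macro per register count.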