Re: [RFC v4 2/2] target/riscv: rvv: improve performance of RISC-V vector loads and stores on large amounts of data.

2024-10-30 Thread Paolo Savini
Thanks for the review, Richard. On 10/30/24 11:40, Richard Henderson wrote: On 10/29/24 19:43, Paolo Savini wrote: This patch optimizes the emulation of unit-stride load/store RVV instructions when the data being loaded/stored per iteration amounts to 16 bytes or more. The optimization

[RFC v4 0/2] target/riscv: add wrapper for target specific macros in atomicity check.

2024-10-29 Thread Paolo Savini
Henrique Barboza Cc: Liu Zhiwei Cc: Helene Chelin Cc: Nathan Egge Cc: Max Chou Helene CHELIN (1): target/riscv: rvv: reduce the overhead for simple RISC-V vector unit-stride loads and stores Paolo Savini (1): target/riscv: rvv: improve performance of RISC-V vector loads and stores on

[RFC v4 2/2] target/riscv: rvv: improve performance of RISC-V vector loads and stores on large amounts of data.

2024-10-29 Thread Paolo Savini
and the destination memory address and vice versa. This is done only if we have direct access to the RAM of the host machine, if the host is little endian and if it supports atomic 128-bit memory operations. Signed-off-by: Paolo Savini --- target/riscv/vector_helper.c | 17
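The three conditions in the 2/2 patch (direct host RAM access, little-endian host, 128-bit support) can be pictured as a gate in front of a 16-byte copy loop. This is a minimal sketch under stated assumptions, not the patch's code: `copy_vreg_mem` and its `have_host_ptr` flag are hypothetical stand-ins for QEMU's own host-pointer lookup, the endianness check is a simple runtime probe, and GCC/Clang's `unsigned __int128` is assumed as the 128-bit type.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Runtime endianness probe: the first byte of the value 1 is 1 on a little-endian host. */
static bool host_is_little_endian(void)
{
    const uint16_t probe = 1;
    uint8_t first;
    memcpy(&first, &probe, 1);
    return first == 1;
}

/*
 * Hypothetical sketch: copy len bytes between a vector register buffer and
 * guest memory. have_host_ptr stands in for "the page is directly accessible
 * host RAM". The 16-byte path is compiled only when the compiler provides a
 * 128-bit integer type and is taken only on little-endian hosts.
 * NOTE: memcpy through unsigned __int128 models the access width only; it is
 * NOT guaranteed to be a single atomic access.
 * Returns true if the 16-byte path was used.
 */
static bool copy_vreg_mem(uint8_t *dst, const uint8_t *src, size_t len,
                          bool have_host_ptr)
{
    size_t i = 0;
    bool used_wide = false;

#ifdef __SIZEOF_INT128__
    if (have_host_ptr && len >= 16 && host_is_little_endian()) {
        for (; i + 16 <= len; i += 16) {
            unsigned __int128 v;
            memcpy(&v, src + i, sizeof v);   /* one 16-byte load */
            memcpy(dst + i, &v, sizeof v);   /* one 16-byte store */
        }
        used_wide = true;
    }
#endif

    /* Tail bytes, or the whole buffer when the gate is closed. */
    for (; i < len; i++) {
        dst[i] = src[i];
    }
    assert(i == len);
    return used_wide;
}
```

Only whole 16-byte chunks go through the wide path; any remainder falls through to the byte loop. Real atomicity would need dedicated 128-bit atomic helpers rather than a plain `memcpy`.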

[RFC v4 1/2] target/riscv: rvv: reduce the overhead for simple RISC-V vector unit-stride loads and stores

2024-10-29 Thread Paolo Savini
sters (LMUL=1). The optimization consists of avoiding the overhead of probing the RAM of the host machine and doing a loop load/store on the input data grouped in chunks of as many bytes as possible (8,4,2,1 bytes). Co-authored-by: Helene CHELIN Co-authored-by: Paolo Savini Signed-off-by: Helene C
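The chunked loop described in the 1/2 patch ("chunks of as many bytes as possible (8,4,2,1 bytes)") can be sketched roughly as follows. This is a minimal illustration, not the code from the patch: the name `copy_chunked` is hypothetical, it copies plain byte buffers rather than vector registers and guest memory, and it ignores element size, masking and `vstart` handling.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical sketch: copy len bytes from src to dst using the widest
 * chunk (8, 4, 2 or 1 bytes) that still fits in the remaining data.
 * memcpy through a fixed-width temporary avoids unaligned-access UB;
 * compilers typically lower it to a single load and a single store.
 * Returns the number of chunk copies performed.
 */
static size_t copy_chunked(uint8_t *dst, const uint8_t *src, size_t len)
{
    size_t i = 0;
    size_t ops = 0;

    while (i < len) {
        size_t rem = len - i;

        if (rem >= 8) {
            uint64_t v;
            memcpy(&v, src + i, sizeof v);
            memcpy(dst + i, &v, sizeof v);
            i += sizeof v;
        } else if (rem >= 4) {
            uint32_t v;
            memcpy(&v, src + i, sizeof v);
            memcpy(dst + i, &v, sizeof v);
            i += sizeof v;
        } else if (rem >= 2) {
            uint16_t v;
            memcpy(&v, src + i, sizeof v);
            memcpy(dst + i, &v, sizeof v);
            i += sizeof v;
        } else {
            dst[i] = src[i];
            i += 1;
        }
        ops++;
    }
    assert(i == len);
    return ops;
}
```

For a 15-byte run the loop issues one 8-, one 4-, one 2- and one 1-byte copy, i.e. four operations instead of fifteen single-byte ones.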

[RFC v3 1/2] target/riscv: rvv: reduce the overhead for simple RISC-V vector unit-stride loads and stores

2024-10-14 Thread Paolo Savini
f the vector registers (LMUL=1). The optimization consists of avoiding the overhead of probing the RAM of the host machine and doing a loop load/store on the input data grouped in chunks of as many bytes as possible (8,4,2,1 bytes). Co-authored-by: Helene CHELIN Co-authored-by: Paolo Savini S

[RFC v3 2/2] target/riscv: rvv: improve performance of RISC-V vector loads and stores on large amounts of data.

2024-10-14 Thread Paolo Savini
register and the destination memory address and vice versa. This is done only if we have direct access to the RAM of the host machine, if the host is little endian and if it supports atomic 128-bit memory operations. Signed-off-by: Paolo Savini --- target/riscv/vector_helper.c | 14 +- 1

[RFC v3 0/2] target/riscv: add endianness checks and atomicity guarantees.

2024-10-14 Thread Paolo Savini
Cc: Palmer Dabbelt Cc: Alistair Francis Cc: Bin Meng Cc: Weiwei Li Cc: Daniel Henrique Barboza Cc: Liu Zhiwei Cc: Helene Chelin Cc: Nathan Egge Cc: Max Chou Helene CHELIN (1): target/riscv: rvv: reduce the overhead for simple RISC-V vector unit-stride loads and stores Paolo Savini

[RFC v2 2/2] target/riscv: use a simplified loop to emulate rvv loads/stores only in user mode.

2024-10-02 Thread Paolo Savini
The simplified emulation of vector loads and stores that bypasses the memory probing in the vext_ldst_us helper function seems to benefit only user mode. We therefore limit this approach to the user-mode configuration. Signed-off-by: Paolo Savini --- target/riscv/vector_helper.c | 3 ++- 1
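Restricting the simplified loop to user mode amounts to a compile-time switch. A minimal sketch, assuming QEMU's `CONFIG_USER_ONLY` macro (defined for user-mode emulator builds) is that switch; `choose_ldst_path`, the `ldst_path` enum and the `simple_case` flag are hypothetical names standing in for the helper's real preconditions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical: which path a unit-stride load/store would take. */
enum ldst_path { LDST_SIMPLE_LOOP, LDST_PROBED };

/* Assumption: CONFIG_USER_ONLY is defined only for user-mode builds. */
#ifdef CONFIG_USER_ONLY
#define SIMPLIFIED_LDST_ENABLED true
#else
#define SIMPLIFIED_LDST_ENABLED false
#endif

/*
 * simple_case stands in for the patch's own preconditions (e.g. LMUL=1).
 * In a system-mode build the constant is false, so the compiler can drop
 * the simplified path entirely and every access goes through probing.
 */
static enum ldst_path choose_ldst_path(bool simple_case)
{
    if (SIMPLIFIED_LDST_ENABLED && simple_case) {
        return LDST_SIMPLE_LOOP;
    }
    return LDST_PROBED;
}
```

Gating at compile time rather than at run time means system-mode builds pay no cost for checking the condition.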

[RFC v2 1/2] target/riscv: rvv: reduce the overhead for simple RISC-V vector unit-stride loads and stores

2024-10-02 Thread Paolo Savini
f the vector registers (LMUL=1). The optimization consists of avoiding the overhead of probing the RAM of the host machine and doing a loop load/store on the input data grouped in chunks of as many bytes as possible (8,4,2,1 bytes). Co-authored-by: Helene CHELIN Co-authored-by: Paolo Savini S

[RFC v2 0/2] target/riscv: use a simplified loop to emulate rvv loads/stores only in user mode.

2024-10-02 Thread Paolo Savini
Meng Cc: Weiwei Li Cc: Daniel Henrique Barboza Cc: Liu Zhiwei Cc: Helene Chelin Cc: Nathan Egge Cc: Max Chou Helene CHELIN (1): target/riscv: rvv: reduce the overhead for simple RISC-V vector unit-stride loads and stores Paolo Savini (1): target/riscv: use a simplified loop to emulate

[RFC 1/1] target/riscv: use a simplified loop to emulate rvv loads/stores only in user mode.

2024-09-25 Thread Paolo Savini
The simplified emulation of vector loads and stores that bypasses the memory probing in the vext_ldst_us helper function seems to benefit only user mode. We therefore limit this approach to the user-mode configuration. Signed-off-by: Paolo Savini --- target/riscv/vector_helper.c | 3 ++- 1

[RFC 0/1] target/riscv: use a simplified loop to emulate rvv loads/stores only in user mode.

2024-09-25 Thread Paolo Savini
load/store loop for small vector and data sizes when QEMU is in system mode. Cc: Richard Henderson Cc: Palmer Dabbelt Cc: Alistair Francis Cc: Bin Meng Cc: Weiwei Li Cc: Daniel Henrique Barboza Cc: Liu Zhiwei Cc: Helene Chelin Cc: Nathan Egge Cc: Max Chou Paolo Savini (1): target/riscv

Re: [RFC 2/2] target/riscv: rvv: improve performance of RISC-V vector loads and stores on large amounts of data.

2024-09-10 Thread Paolo Savini
Thanks for the feedback, Richard. I'm working on the endianness. Could you please give me more details about the atomicity issues you are referring to? Best wishes, Paolo On 7/27/24 08:15, Richard Henderson wrote: On 7/18/24 01:30, Paolo Savini wrote: This patch optimizes the emulati

[RFC 1/2] target/riscv: rvv: reduce the overhead for simple RISC-V vector unit-stride loads and stores

2024-07-17 Thread Paolo Savini
f the vector registers (LMUL=1). The optimization consists of avoiding the overhead of probing the RAM of the host machine and doing a loop load/store on the input data grouped in chunks of as many bytes as possible (8,4,2,1 bytes). Co-authored-by: Helene CHELIN Co-authored-by: Paolo Savini S

[RFC 0/2] Improve the performance of unit-stride RVV ld/st on

2024-07-17 Thread Paolo Savini
erhead for simple RISC-V vector unit-stride loads and stores Paolo Savini (1): target/riscv: rvv: improve performance of RISC-V vector loads and stores on large amounts of data. target/riscv/vector_helper.c | 63 +++- 1 file changed, 62 insertions(+),

[RFC 2/2] target/riscv: rvv: improve performance of RISC-V vector loads and stores on large amounts of data.

2024-07-17 Thread Paolo Savini
register and the destination memory address and vice versa. This is done only if we have direct access to the RAM of the host machine. Signed-off-by: Paolo Savini --- target/riscv/vector_helper.c | 17 - 1 file changed, 16 insertions(+), 1 deletion(-) diff --git a/target/riscv