Re: RFR: 8320448: Accelerate IndexOf using AVX2 [v33]

2024-05-25 Daniel Jeliński
On Fri, 24 May 2024 18:37:13 GMT, Vladimir Kozlov wrote:

>> Changed to `lea` with `InternalAddress()`. Generates the exact same code,
>> but makes more sense. I looked at `movdqu` and see no code that generates
>> RIP-relative loads. It merely checks `reachable()` and adds an intermediate
>

Re: RFR: 8320448: Accelerate IndexOf using AVX2 [v33]

2024-05-24 Vladimir Kozlov
On Fri, 24 May 2024 15:33:46 GMT, Scott Gibbons wrote:

>> Thanks for checking. Well I know that the
>> `MacroAssembler::movdqu(XMMRegister dst, AddressLiteral src, Register
>> rscratch)` method actually generates rip-relative addresses. Maybe we could
>> copy some of that code.
>
> Changed to

Re: RFR: 8320448: Accelerate IndexOf using AVX2 [v33]

2024-05-24 Scott Gibbons
On Fri, 24 May 2024 14:49:05 GMT, Daniel Jeliński wrote:

>> Just did the experiment and it turns out that `mov64(r15,
>> (int64_t)small_jump_table)` and `lea(r15,
>> ExternalAddress(small_jump_table))` produce exactly the same code:
>>
>> `0x7fffe463d68b: 49 bf a0 d5 63 e4 ff 7f 00 00 m
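For context on the size question in this exchange, here is a minimal standalone sketch of the two raw x86-64 encodings being compared: the 10-byte `mov r64, imm64` and the 7-byte RIP-relative `lea`. It illustrates the instruction formats only. The thread reports that HotSpot's `lea(r15, ExternalAddress(...))` emitted the same 10-byte sequence as `mov64`, so this is not a description of what `MacroAssembler` generates. Both helper names are hypothetical, and returning an empty vector for an unreachable target is a choice made for this sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical helpers for illustration; these are not HotSpot APIs.
// `reg` is the hardware register number (0 = rax, 15 = r15).

// mov reg, imm64: REX.W (plus REX.B for r8..r15), opcode B8+rd, 8-byte immediate.
std::vector<std::uint8_t> encode_mov_imm64(unsigned reg, std::uint64_t imm) {
    assert(reg < 16);
    std::vector<std::uint8_t> bytes;
    bytes.push_back(static_cast<std::uint8_t>(0x48u | (reg >> 3)));
    bytes.push_back(static_cast<std::uint8_t>(0xB8u | (reg & 7)));
    for (int i = 0; i < 8; ++i) {
        bytes.push_back(static_cast<std::uint8_t>(imm >> (8 * i)));  // little-endian
    }
    return bytes;
}

// lea reg, [rip + disp32]: REX.W (plus REX.R for r8..r15), opcode 8D,
// ModRM with mod = 00 and rm = 101, then a 4-byte displacement measured
// from the end of this 7-byte instruction.
// Returns an empty vector when the target is outside the signed 32-bit range.
std::vector<std::uint8_t> encode_lea_rip(unsigned reg,
                                         std::uint64_t insn_addr,
                                         std::uint64_t target) {
    assert(reg < 16);
    const std::int64_t disp = static_cast<std::int64_t>(target - (insn_addr + 7));
    if (disp < std::numeric_limits<std::int32_t>::min() ||
        disp > std::numeric_limits<std::int32_t>::max()) {
        return {};
    }
    const std::uint32_t d = static_cast<std::uint32_t>(disp);
    std::vector<std::uint8_t> bytes;
    bytes.push_back(static_cast<std::uint8_t>(0x48u | ((reg >> 3) << 2)));
    bytes.push_back(0x8D);
    bytes.push_back(static_cast<std::uint8_t>(0x05u | ((reg & 7) << 3)));
    for (int i = 0; i < 4; ++i) {
        bytes.push_back(static_cast<std::uint8_t>(d >> (8 * i)));  // little-endian
    }
    return bytes;
}
```

With the values from the dump quoted above (instruction at `0x7fffe463d68b`, immediate `0x7fffe463d5a0`, register `r15`), `encode_mov_imm64` reproduces the ten bytes `49 bf a0 d5 63 e4 ff 7f 00 00`, and `encode_lea_rip` gives the seven bytes `4c 8d 3d 0e ff ff ff`. The shorter form is only available when the target lies within about 2 GiB of the instruction.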

Re: RFR: 8320448: Accelerate IndexOf using AVX2 [v33]

2024-05-24 Vladimir Kozlov
On Fri, 24 May 2024 14:49:05 GMT, Daniel Jeliński wrote:

>> Just did the experiment and it turns out that `mov64(r15,
>> (int64_t)small_jump_table)` and `lea(r15,
>> ExternalAddress(small_jump_table))` produce exactly the same code:
>>
>> `0x7fffe463d68b: 49 bf a0 d5 63 e4 ff 7f 00 00 m

Re: RFR: 8320448: Accelerate IndexOf using AVX2 [v33]

2024-05-24 Daniel Jeliński
On Fri, 24 May 2024 14:19:13 GMT, Scott Gibbons wrote:

>> the RIP-relative lea should have a shorter encoding. I think something like
>> `lea(r15, ExternalAddress(small_jump_table))` should produce it (untested)
>
> Just did the experiment and it turns out that `mov64(r15,
> (int64_t)small_jump

Re: RFR: 8320448: Accelerate IndexOf using AVX2 [v33]

2024-05-24 Scott Gibbons
On Fri, 24 May 2024 06:31:40 GMT, Daniel Jeliński wrote:

>> It may, but I believe the movq is shorter (although maybe not to r15). I'll
>> do some experimentation.
>
> the RIP-relative lea should have a shorter encoding. I think something like
> `lea(r15, ExternalAddress(small_jump_table))` sh

Re: RFR: 8320448: Accelerate IndexOf using AVX2 [v33]

2024-05-24 Scott Gibbons
On Fri, 24 May 2024 06:31:36 GMT, Daniel Jeliński wrote:

>> Thanks for finding this. It was ignorance on my part as I thought the xorq
>> would have logic to not emit the REX prefix if not necessary, but it
>> doesn't. Fixed.
>
> Right, it seems to surprise people. There's a lot of preexistin
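As background to the `xorq` and REX prefix remark: on x86-64, writing a 32-bit register zero-extends into the full 64-bit register, so the 32-bit `xor eax, eax` clears `rax` without needing a REX.W prefix. The sketch below hand-encodes both forms to show the one-byte difference. It is a standalone illustration under that assumption; `encode_xor_self` is a hypothetical helper, not part of HotSpot's assembler.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper for illustration; this is not a HotSpot API.
// Encodes `xor reg, reg` using opcode 0x31 /r. `reg` is the hardware
// register number (0 = rax/eax, 15 = r15/r15d); `wide` selects the
// 64-bit operand size.
std::vector<std::uint8_t> encode_xor_self(unsigned reg, bool wide) {
    assert(reg < 16);
    const unsigned high = reg >> 3;  // 1 for r8..r15, which need REX.R and REX.B
    const unsigned low = reg & 7;
    // REX layout is 0100WRXB; W selects 64-bit operand size.
    const unsigned rex = 0x40u | (wide ? 0x08u : 0u) | (high << 2) | high;
    std::vector<std::uint8_t> bytes;
    if (rex != 0x40u) {
        bytes.push_back(static_cast<std::uint8_t>(rex));  // skip a REX with no bits set
    }
    bytes.push_back(0x31);
    // ModRM with mod = 11 (register direct), reg and rm both set to `low`.
    bytes.push_back(static_cast<std::uint8_t>(0xC0u | (low << 3) | low));
    return bytes;
}
```

Here `encode_xor_self(0, false)` yields `31 c0` (`xor eax, eax`) and `encode_xor_self(0, true)` yields `48 31 c0` (`xor rax, rax`). For `r8` through `r15` a REX prefix is needed in either form, so the two encodings have the same length.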

Re: RFR: 8320448: Accelerate IndexOf using AVX2 [v33]

2024-05-23 Daniel Jeliński
On Thu, 23 May 2024 19:26:10 GMT, Scott Gibbons wrote:

>> src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 268:
>>
>>> 266: __ cmpq(needle_len_p, 0);
>>> 267: __ jg_b(L_nextCheck);
>>> 268: __ xorq(rax, rax);
>>
>> out of curiosity, is there any advantage to using `xorq` instead o

Re: RFR: 8320448: Accelerate IndexOf using AVX2 [v33]

2024-05-23 Scott Gibbons
On Thu, 23 May 2024 19:02:05 GMT, Daniel Jeliński wrote:

>> Scott Gibbons has updated the pull request incrementally with one additional
>> commit since the last revision:
>>
>> Fix for IndexOf.java on mac
>
> src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 268:
>
>> 266: __ cmpq(

Re: RFR: 8320448: Accelerate IndexOf using AVX2 [v33]

2024-05-23 Daniel Jeliński
On Thu, 23 May 2024 17:25:34 GMT, Scott Gibbons wrote:

>> Re-write the IndexOf code without the use of the pcmpestri instruction, only
>> using AVX2 instructions. This change accelerates String.IndexOf on average
>> 1.3x for AVX2. The benchmark numbers:
>>
>>
>> Benchmark

Re: RFR: 8320448: Accelerate IndexOf using AVX2 [v33]

2024-05-23 Scott Gibbons
> Re-write the IndexOf code without the use of the pcmpestri instruction, only
> using AVX2 instructions. This change accelerates String.IndexOf on average
> 1.3x for AVX2. The benchmark numbers:
>
>
> BenchmarkScore
> Latest