Re: RFR: 8320448: Accelerate IndexOf using AVX2 [v43]

2024-05-28 Thread Scott Gibbons
On Tue, 28 May 2024 21:17:07 GMT, Scott Gibbons wrote: >> src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 488: >> >>> 486: __ cmpq(r11, nMinusK); >>> 487: __ ja_b(L_return); >>> 488: __ movq(rax, r11); >> >> At places where we know that return value in r11 is correct, we dont

Re: RFR: 8320448: Accelerate IndexOf using AVX2 [v43]

2024-05-28 Thread Scott Gibbons
On Tue, 28 May 2024 16:37:23 GMT, Sandhya Viswanathan wrote: >> Scott Gibbons has updated the pull request incrementally with one additional >> commit since the last revision: >> >> Fix tests > > src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 488: > >> 486: __ cmpq(r11, nMinu

Re: RFR: 8320448: Accelerate IndexOf using AVX2 [v43]

2024-05-28 Thread Scott Gibbons
On Tue, 28 May 2024 20:56:42 GMT, Scott Gibbons wrote: >> We can also do full reads for (n-k) == 31, as we also compare the kth byte. > > I'll change and test. Passes tests, so I'll change. - PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1617886613

Re: RFR: 8320448: Accelerate IndexOf using AVX2 [v43]

2024-05-28 Thread Scott Gibbons
On Tue, 28 May 2024 20:37:43 GMT, Sandhya Viswanathan wrote: >> I listed all registers for clarity. This ensures that we know what can be >> used as values or as scratch registers with no ambiguity. > > Sounds good. We could keep only comment out of the two as it is the same for > both small

Re: RFR: 8320448: Accelerate IndexOf using AVX2 [v43]

2024-05-28 Thread Volodymyr Paprotski
On Tue, 28 May 2024 17:36:03 GMT, Scott Gibbons wrote: >> src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 488: >> >>> 486: __ cmpq(r11, nMinusK); >>> 487: __ ja_b(L_return); >>> 488: __ movq(rax, r11); >> >> At places where we know that return value in r11 is correct, we dont

Re: RFR: 8320448: Accelerate IndexOf using AVX2 [v43]

2024-05-28 Thread Scott Gibbons
On Tue, 28 May 2024 20:35:26 GMT, Sandhya Viswanathan wrote: >> No. For (n-k)==32 we can do full reads. I'll clarify by changing the label >> name. > > We can also do full reads for (n-k) == 31, as we also compare the kth byte. I'll change and test. - PR Review Comment: https://

Re: RFR: 8320448: Accelerate IndexOf using AVX2 [v43]

2024-05-28 Thread Scott Gibbons
On Tue, 28 May 2024 20:29:38 GMT, Sandhya Viswanathan wrote: >> No. This is checking for a zero length haystack. The following compare >> checks for needle length longer than haystack, regardless of the value in >> each. The comparison is signed, so a haystack length of 0 with a needle >>

Re: RFR: 8320448: Accelerate IndexOf using AVX2 [v43]

2024-05-28 Thread Sandhya Viswanathan
On Tue, 28 May 2024 18:11:13 GMT, Scott Gibbons wrote: >> src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 1333: >> >>> 1331: >>> 1332: __ cmpq(nMinusK, 32); >>> 1333: __ jae_b(L_greaterThan32); >> >> Should this check be (n-k+1) >= 32? And so accordingly (n-k) >= 31 >> __ cmpq

Re: RFR: 8320448: Accelerate IndexOf using AVX2 [v43]

2024-05-28 Thread Sandhya Viswanathan
On Tue, 28 May 2024 17:30:24 GMT, Scott Gibbons wrote: >> src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 278: >> >>> 276: __ bind(L_nextCheck); >>> 277: __ testq(haystack_len_p, haystack_len_p); >>> 278: __ je(L_zeroCheckFailed); >> >> This check could be removed as the next

Re: RFR: 8320448: Accelerate IndexOf using AVX2 [v43]

2024-05-28 Thread Sandhya Viswanathan
On Tue, 28 May 2024 17:59:49 GMT, Scott Gibbons wrote: >> src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 578: >> >>> 576: // helper jumps to L_checkRangeAndReturn with a (-1) return value. >>> 577: big_case_loop_helper(false, 0, L_checkRangeAndReturn, L_loopTop, >>> mask, h

Re: RFR: 8320448: Accelerate IndexOf using AVX2 [v43]

2024-05-28 Thread Scott Gibbons
On Tue, 28 May 2024 12:48:19 GMT, Sandhya Viswanathan wrote: >> Scott Gibbons has updated the pull request incrementally with one additional >> commit since the last revision: >> >> Fix tests > > src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 239: > >> 237: // the needle siz

Re: RFR: 8320448: Accelerate IndexOf using AVX2 [v43]

2024-05-28 Thread Sandhya Viswanathan
On Sat, 25 May 2024 22:19:41 GMT, Scott Gibbons wrote: >> Re-write the IndexOf code without the use of the pcmpestri instruction, only >> using AVX2 instructions. This change accelerates String.IndexOf on average >> 1.3x for AVX2. The benchmark numbers: >> >> >> Benchmark

Re: RFR: 8320448: Accelerate IndexOf using AVX2 [v43]

2024-05-25 Thread Scott Gibbons
> Re-write the IndexOf code without the use of the pcmpestri instruction, only > using AVX2 instructions. This change accelerates String.IndexOf on average > 1.3x for AVX2. The benchmark numbers: > > > BenchmarkScore > Latest