Integrated: 8256245: AArch64: Implement Base64 decoding intrinsic

2021-04-08 Thread Dong Bo
On Sat, 27 Mar 2021 08:58:03 GMT, Dong Bo wrote: > In JDK-8248188, an IntrinsicCandidate and API were added for Base64 decoding. > Base64 decoding can be improved on aarch64 with ld4/tbl/tbx/st3; the basic idea > is described at > http://0x80.pl/articles/base64-simd-neon.html#encodi

Re: RFR: 8256245: AArch64: Implement Base64 decoding intrinsic [v7]

2021-04-08 Thread Dong Bo
On Thu, 8 Apr 2021 08:28:53 GMT, Andrew Haley wrote: >> Dong Bo has updated the pull request incrementally with one additional >> commit since the last revision: >> >> reduce unnecessary memory write traffic in non-SIMD code > > Marked as reviewed by aph (Revi

Re: RFR: 8256245: AArch64: Implement Base64 decoding intrinsic [v6]

2021-04-07 Thread Dong Bo
On Wed, 7 Apr 2021 09:53:36 GMT, Andrew Haley wrote: >> src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 5829: >> >>> 5827: __ strb(r14, __ post(dst, 1)); >>> 5828: __ strb(r15, __ post(dst, 1)); >>> 5829: __ strb(r13, __ post(dst, 1)); >> >> I think this sequence should be 4

Re: RFR: 8256245: AArch64: Implement Base64 decoding intrinsic [v7]

2021-04-07 Thread Dong Bo
0 > 923.749 ± 0.389 ns/op > Base64Decode.testBase64MIMEDecode 4512 avgt 10 > 4000.725 ± 2.519 ns/op > Base64Decode.testBase64MIMEDecode 4 1000 avgt 10 > 7674.994 ± 9.281 ns/op > Base64Decode.testBase64MIMEDecode 4

Re: RFR: 8256245: AArch64: Implement Base64 decoding intrinsic [v5]

2021-04-06 Thread Dong Bo
On Tue, 6 Apr 2021 14:04:07 GMT, Andrew Haley wrote: >> Dong Bo has updated the pull request with a new target base due to a merge >> or a rebase. The pull request now contains 10 commits: >> >> - conflicts resolved >> - Merge branch 'master'

Re: RFR: 8256245: AArch64: Implement Base64 decoding intrinsic [v6]

2021-04-06 Thread Dong Bo
0 > 923.749 ± 0.389 ns/op > Base64Decode.testBase64MIMEDecode 4512 avgt 10 > 4000.725 ± 2.519 ns/op > Base64Decode.testBase64MIMEDecode 4 1000 avgt 10 > 7674.994 ± 9.281 ns/op > Base64Decode.testBase64MIMEDecode 4 2 avgt

Re: RFR: 8256245: AArch64: Implement Base64 decoding intrinsic

2021-04-06 Thread Dong Bo
On Tue, 6 Apr 2021 09:44:28 GMT, Andrew Haley wrote: > > There would be no benefit in using SIMD for this case, so the stub uses non-SIMD > > instructions for MIME encoded data now. > > What is the reasoning here? Sure, there can be illegal characters at the > start, but what if there are not? The g

Re: RFR: 8256245: AArch64: Implement Base64 decoding intrinsic

2021-04-06 Thread Dong Bo
On Fri, 2 Apr 2021 10:17:57 GMT, Andrew Haley wrote: >> PING... Any suggestions on the updated commit? > >> PING... Any suggestions on the updated commit? > > Once you reply to the comments, sure. > > Are there any existing test cases for failing inputs? > I added one, the error character is in

Re: RFR: 8256245: AArch64: Implement Base64 decoding intrinsic [v5]

2021-04-06 Thread Dong Bo
0 > 923.749 ± 0.389 ns/op > Base64Decode.testBase64MIMEDecode 4512 avgt 10 > 4000.725 ± 2.519 ns/op > Base64Decode.testBase64MIMEDecode 4 1000 avgt 10 > 7674.994 ± 9.281 ns/op > Base64Decode.testBase64MIMEDecode 4 2

Re: RFR: 8256245: AArch64: Implement Base64 decoding intrinsic [v4]

2021-04-05 Thread Dong Bo
0 > 923.749 ± 0.389 ns/op > Base64Decode.testBase64MIMEDecode 4512 avgt 10 > 4000.725 ± 2.519 ns/op > Base64Decode.testBase64MIMEDecode 4 1000 avgt 10 > 7674.994 ± 9.281 ns/op > Base64Decode.testBase64MIMEDecode 4 2 avgt

Re: RFR: 8256245: AArch64: Implement Base64 decoding intrinsic

2021-04-02 Thread Dong Bo
On Tue, 30 Mar 2021 03:24:16 GMT, Dong Bo wrote: >>> I think I can rewrite this part as loops. >>> With an initial implementation, we can have almost half of the code size >>> reduced (1312B -> 748B). Sounds OK to you? >> >> Sounds great, but I'm

Re: RFR: 8256245: AArch64: Implement Base64 decoding intrinsic [v3]

2021-04-01 Thread Dong Bo
0 > 923.749 ± 0.389 ns/op > Base64Decode.testBase64MIMEDecode 4512 avgt 10 > 4000.725 ± 2.519 ns/op > Base64Decode.testBase64MIMEDecode 4 1000 avgt 10 > 7674.994 ± 9.281 ns/op > Base64Decode.testBase64MIMEDecode 4 2 avgt

Re: RFR: 8256245: AArch64: Implement Base64 decoding intrinsic

2021-03-29 Thread Dong Bo
On Mon, 29 Mar 2021 08:38:59 GMT, Andrew Haley wrote: > > With an initial implementation, we can have almost half of the code size > > reduced (1312B -> 748B). Sounds OK to you? > > Sounds great, but I'm still somewhat concerned that the non-SIMD case only > offers 3-12% performance gain. Make it

Re: RFR: 8256245: AArch64: Implement Base64 decoding intrinsic [v2]

2021-03-29 Thread Dong Bo
0 > 923.749 ± 0.389 ns/op > Base64Decode.testBase64MIMEDecode 4512 avgt 10 > 4000.725 ± 2.519 ns/op > Base64Decode.testBase64MIMEDecode 4 1000 avgt 10 > 7674.994 ± 9.281 ns/op > Base64Decode.testBase64MIMEDecode 4 2 avgt

Re: RFR: 8256245: AArch64: Implement Base64 decoding intrinsic

2021-03-28 Thread Dong Bo
On Mon, 29 Mar 2021 03:15:57 GMT, Nick Gasson wrote: >> Firstly, I wonder how important this is for most applications. I don't >> actually know, but let's put that to one side. >> >> There's a lot of unrolling, particularly in the non-SIMD case. Please >> consider taking out some of the unrol

RFR: 8256245: AArch64: Implement Base64 decoding intrinsic

2021-03-27 Thread Dong Bo
In JDK-8248188, an IntrinsicCandidate and API were added for Base64 decoding. Base64 decoding can be improved on aarch64 with ld4/tbl/tbx/st3; the basic idea is described at http://0x80.pl/articles/base64-simd-neon.html#encoding-quadwords. The patch passed jtreg tier1-3 tests with linux-aarch64-server-fast
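For orientation, the scalar step that a SIMD decoder parallelizes can be sketched in Java as below. This is not the stub's code: the class name `Base64QuadSketch` and the helper `decodeQuad` are hypothetical, and the sketch only models how four Base64 characters map to three bytes. Roughly speaking, the alphabet lookup plays the role that table-lookup instructions such as tbl/tbx play in the vector version, and the three packed bytes correspond to what an interleaving store such as st3 writes out.

```java
import java.util.Arrays;
import java.util.Base64;

public class Base64QuadSketch {
    // Standard Base64 alphabet (RFC 4648, no URL-safe variant).
    private static final String ALPHABET =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    // Hypothetical helper: decode exactly four unpadded Base64 characters
    // into three bytes. Illegal characters are rejected.
    static byte[] decodeQuad(String quad) {
        if (quad.length() != 4) {
            throw new IllegalArgumentException("need exactly 4 characters");
        }
        int[] v = new int[4];
        for (int i = 0; i < 4; i++) {
            v[i] = ALPHABET.indexOf(quad.charAt(i)); // 6-bit value, or -1
            if (v[i] < 0) {
                throw new IllegalArgumentException("illegal Base64 character at index " + i);
            }
        }
        // Pack four 6-bit values into three 8-bit bytes.
        return new byte[] {
            (byte) ((v[0] << 2) | (v[1] >> 4)),
            (byte) ((v[1] << 4) | (v[2] >> 2)),
            (byte) ((v[2] << 6) | v[3])
        };
    }

    public static void main(String[] args) {
        byte[] sketch = decodeQuad("TWFu");
        byte[] library = Base64.getDecoder().decode("TWFu");
        // Compare the sketch against the JDK decoder.
        System.out.println(Arrays.equals(sketch, library));
    }
}
```

The sketch checks itself against `java.util.Base64`, which is the public API whose decoding path the intrinsic accelerates.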

RFR: 8256820: AArch64: Optimize vector rotate (immediate) with shift and insert instructions

2020-12-13 Thread Dong Bo
This patch optimizes vector rotate (immediate) on aarch64 with shift and insert instructions, i.e. SLI and SRI. The patch passed jtreg tier1-3 tests with a linux-aarch64-server-fastdebug build. The tests under `test/hotspot/jtreg/compiler/c2/cr6340864/` were also run specifically to check correctness, and passed.
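The idea of building a rotate from a shift followed by a shift-and-insert can be modelled on a single 32-bit lane in Java. This is a sketch under assumptions: the class `RotateSketch` and its methods are hypothetical, `sli` is a scalar model of the SLI instruction's per-lane behaviour (shift the source left and keep the low bits of the destination), and the two-step sequence shown is one plausible way to form a rotate, not necessarily the exact sequence the patch emits.

```java
public class RotateSketch {
    // Scalar model of SLI on one 32-bit lane: shift src left by n and insert
    // it into dst, preserving the low n bits of dst. Valid for 1 <= n <= 31.
    static int sli(int dst, int src, int n) {
        int keepMask = (1 << n) - 1;
        return (dst & keepMask) | (src << n);
    }

    // Rotate left by an immediate using a logical right shift plus SLI.
    static int rotlViaShiftInsert(int x, int n) {
        int tmp = x >>> (32 - n); // high n bits of x moved to the bottom
        return sli(tmp, x, n);    // x << n inserted above them
    }

    public static void main(String[] args) {
        int x = 0x80000001;
        for (int n = 1; n <= 31; n++) {
            if (rotlViaShiftInsert(x, n) != Integer.rotateLeft(x, n)) {
                throw new AssertionError("mismatch at n=" + n);
            }
        }
        System.out.println("matches Integer.rotateLeft for n in 1..31");
    }
}
```

The point of the insert form is that it replaces a separate OR: a plain rotate idiom needs two shifts and an OR, while shift plus shift-and-insert needs only two instructions per vector.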

RFR: 8256318: AArch64: Add support for floating-point absolute difference

2020-11-13 Thread Dong Bo
This patch adds support for floating-point absolute difference instructions, i.e. FABD scalar/vector. Verified with linux-aarch64-server-release, tier1-3. Added a JMH micro `test/micro/org/openjdk/bench/vm/compiler/FloatingScalarVectorAbsDiff.java` for performance testing. The FABD (scalar), the performan
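At the Java level, an absolute difference is typically written as `Math.abs(a - b)`. The sketch below shows that shape in scalar and loop form; it is an assumption that this is the pattern the compiler matches to FABD, and the class and method names are hypothetical rather than taken from the patch or its benchmark.

```java
public class AbsDiffSketch {
    // Scalar form: a subtract followed by an absolute value.
    static float absDiff(float a, float b) {
        return Math.abs(a - b);
    }

    // Loop form: a candidate for auto-vectorization, where each lane
    // computes the same subtract-then-abs pair.
    static void absDiff(float[] a, float[] b, float[] out) {
        for (int i = 0; i < out.length; i++) {
            out[i] = Math.abs(a[i] - b[i]);
        }
    }

    public static void main(String[] args) {
        float[] a = {1.0f, -2.5f, 4.0f};
        float[] b = {3.0f, 2.5f, 4.0f};
        float[] out = new float[3];
        absDiff(a, b, out);
        System.out.println(java.util.Arrays.toString(out));
    }
}
```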

RFR: 8255625: AArch64: Implement Base64.encodeBlock accelerator/intrinsic

2020-11-01 Thread Dong Bo
Base64.encodeBlock stub is implemented for x86_64. We can also do the same thing for aarch64 with SIMD LD3/ST4/TBL. A basic idea can be found here: http://0x80.pl/articles/base64-simd-neon.html#encoding-quadwords. Patch passed jtreg tier1-3 tests with linux-aarch64-server-release build. Tests in

Integrated: 8255246: AArch64: Implement BigInteger shiftRight and shiftLeft accelerator/intrinsic

2020-10-28 Thread Dong Bo
On Mon, 26 Oct 2020 09:19:45 GMT, Dong Bo wrote: > BigInteger.shiftRightImplWorker and BigInteger.shiftLeftImplWorker are not > intrinsified on aarch64, although this has been done on x86_64. > We can implement them via the USHL NEON instruction (register), which handles > four integers one

Re: RFR: 8255246: AArch64: Implement BigInteger shiftRight and shiftLeft accelerator/intrinsic [v2]

2020-10-28 Thread Dong Bo
On Tue, 27 Oct 2020 16:45:42 GMT, Andrew Haley wrote: >> Dong Bo has updated the pull request incrementally with one additional >> commit since the last revision: >> >> minor improvements for small BigIntegers > > Marked as reviewed by aph (Reviewer). @th

Re: RFR: 8255246: AArch64: Implement BigInteger shiftRight and shiftLeft accelerator/intrinsic [v2]

2020-10-27 Thread Dong Bo
On Mon, 26 Oct 2020 10:40:27 GMT, Andrew Haley wrote: >> Dong Bo has updated the pull request incrementally with one additional >> commit since the last revision: >> >> minor improvements for small BigIntegers > > src/hotspot/cpu/aarch64/stubGenerator_aar

Re: RFR: 8255246: AArch64: Implement BigInteger shiftRight and shiftLeft accelerator/intrinsic

2020-10-26 Thread Dong Bo
On Mon, 26 Oct 2020 09:19:45 GMT, Dong Bo wrote: > BigInteger.shiftRightImplWorker and BigInteger.shiftLeftImplWorker are not > intrinsified on aarch64, although this has been done on x86_64. > We can implement them via the USHL NEON instruction (register), which handles > four integers one
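The per-word work in a multi-word left shift can be sketched in plain Java as follows. This is a simplified model, not the JDK's `shiftLeftImplWorker`: the class `WordShiftSketch` and the method `shiftLeftWords` are hypothetical, the magnitude is assumed to be big-endian (most significant word first), and the shift count is assumed to be between 1 and 31. Each output word depends only on two adjacent input words, which is why several words can be processed per vector instruction.

```java
public class WordShiftSketch {
    // Shift a big-endian int[] magnitude left by n bits (1 <= n <= 31).
    // The result has one extra leading word to hold the carried-out bits.
    static int[] shiftLeftWords(int[] mag, int n) {
        int[] out = new int[mag.length + 1];
        int right = 32 - n;
        out[0] = mag[0] >>> right; // bits carried out of the top word
        for (int i = 0; i < mag.length - 1; i++) {
            // High part comes from this word, low part from the next one.
            out[i + 1] = (mag[i] << n) | (mag[i + 1] >>> right);
        }
        out[mag.length] = mag[mag.length - 1] << n;
        return out;
    }

    public static void main(String[] args) {
        int[] shifted = shiftLeftWords(new int[] {0x80000001, 0xF0000000}, 4);
        for (int w : shifted) {
            System.out.println(Integer.toHexString(w));
        }
    }
}
```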

Re: RFR: 8255246: AArch64: Implement BigInteger shiftRight and shiftLeft accelerator/intrinsic [v2]

2020-10-26 Thread Dong Bo
± 32.170 ns/op > BigIntegers.testLargeToString avgt 25 124.635 ± 2.157 ns/op > **BigIntegers.testLeftShift avgt 25 551.710 ± 7.836 ns/op** > BigIntegers.testMultiply avgt 25 5869.401 ± 54.803 ns/op > **BigIntegers.testRightShift avgt 25 186.896 ± 6.378 ns/