On Sat, 27 Mar 2021 08:58:03 GMT, Dong Bo wrote:
> In JDK-8248188, an IntrinsicCandidate and API were added for Base64 decoding.
> Base64 decoding can be improved on aarch64 with ld4/tbl/tbx/st3; the basic idea
> can be found at
> http://0x80.pl/articles/base64-simd-neon.html#encodi
On Thu, 8 Apr 2021 08:28:53 GMT, Andrew Haley wrote:
>> Dong Bo has updated the pull request incrementally with one additional
>> commit since the last revision:
>>
>> reduce unnecessary memory write traffic in non-SIMD code
>
> Marked as reviewed by aph (Revi
On Wed, 7 Apr 2021 09:53:36 GMT, Andrew Haley wrote:
>> src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 5829:
>>
>>> 5827: __ strb(r14, __ post(dst, 1));
>>> 5828: __ strb(r15, __ post(dst, 1));
>>> 5829: __ strb(r13, __ post(dst, 1));
>>
>> I think this sequence should be 4
0
> 923.749 ± 0.389 ns/op
> Base64Decode.testBase64MIMEDecode 4512 avgt 10 4000.725 ± 2.519 ns/op
> Base64Decode.testBase64MIMEDecode 4 1000 avgt 10 7674.994 ± 9.281 ns/op
> Base64Decode.testBase64MIMEDecode 4
On Tue, 6 Apr 2021 14:04:07 GMT, Andrew Haley wrote:
>> Dong Bo has updated the pull request with a new target base due to a merge
>> or a rebase. The pull request now contains 10 commits:
>>
>> - conflicts resolved
>> - Merge branch 'master'
On Tue, 6 Apr 2021 09:44:28 GMT, Andrew Haley wrote:
> > There would be no benefit in using SIMD for this case, so the stub now uses
> > non-SIMD instructions for MIME-encoded data.
>
> What is the reasoning here? Sure, there can be illegal characters at the
> start, but what if there are not? The g
On Fri, 2 Apr 2021 10:17:57 GMT, Andrew Haley wrote:
>> PING... Any suggestions on the updated commit?
>
> Once you reply to the comments, sure.
>
> Are there any existing test cases for failing inputs?
>
I added one, the error character is in
On Tue, 30 Mar 2021 03:24:16 GMT, Dong Bo wrote:
>>> I think I can rewrite this part as loops.
>>> With an initial implementation, we can reduce the code size by almost
>>> half (1312B -> 748B). Sounds OK to you?
>>
>> Sounds great, but I'm
On Mon, 29 Mar 2021 08:38:59 GMT, Andrew Haley wrote:
> > With an initial implementation, we can reduce the code size by almost
> > half (1312B -> 748B). Sounds OK to you?
>
> Sounds great, but I'm still somewhat concerned that the non-SIMD case only
> offers 3-12% performance gain. Make it
On Mon, 29 Mar 2021 03:15:57 GMT, Nick Gasson wrote:
>> Firstly, I wonder how important this is for most applications. I don't
>> actually know, but let's put that to one side.
>>
>> There's a lot of unrolling, particularly in the non-SIMD case. Please
>> consider taking out some of the unrol
In JDK-8248188, an IntrinsicCandidate and API were added for Base64 decoding.
Base64 decoding can be improved on aarch64 with ld4/tbl/tbx/st3; the basic idea
can be found at
http://0x80.pl/articles/base64-simd-neon.html#encoding-quadwords.
Patch passed jtreg tier1-3 tests with linux-aarch64-server-fast
This patch optimizes vector rotate (immediate) on aarch64 with shift and
insert instructions, i.e. SLI and SRI.
Patch passed jtreg tier1-3 tests with linux-aarch64-server-fastdebug build.
Tests under `test/hotspot/jtreg/compiler/c2/cr6340864/` were also run
specifically to check correctness, and they passed.
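
As a rough illustration of the shift-and-insert idea, here is a minimal scalar
Java sketch that models a single 32-bit lane. It assumes a rotate left by an
immediate is formed from a logical shift right followed by SLI; the class and
helper names (`RotateViaShiftInsert`, `ushr`, `sli`) are hypothetical, and this
is not the code the patch adds to C2.

```java
public class RotateViaShiftInsert {
    // Models a logical shift right by an immediate on one 32-bit lane.
    static int ushr(int src, int shift) {
        return src >>> shift;
    }

    // Models SLI on one 32-bit lane: shift src left by `shift` and insert it
    // into dst, preserving only the low `shift` bits of dst.
    static int sli(int dst, int src, int shift) {
        int keepMask = (1 << shift) - 1;
        return (src << shift) | (dst & keepMask);
    }

    // Rotate left by an immediate in the range 1..31 using the two steps above.
    static int rotateLeft(int x, int shift) {
        int tmp = ushr(x, 32 - shift); // high bits move down to the low end
        return sli(tmp, x, shift);     // remaining bits are shifted in above them
    }

    public static void main(String[] args) {
        int x = 0x80000001;
        // Compare against the JDK's own rotate for every legal immediate.
        for (int s = 1; s < 32; s++) {
            if (rotateLeft(x, s) != Integer.rotateLeft(x, s)) {
                throw new AssertionError("mismatch at shift " + s);
            }
        }
        System.out.println("all shifts match Integer.rotateLeft");
    }
}
```

A rotate right could be modelled the same way with SRI, which inserts from the
other end and preserves the high bits of the destination instead.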
This adds support for the floating-point absolute difference instructions,
i.e. FABD scalar/vector.
Verified with linux-aarch64-server-release, tier1-3.
Added a JMH micro
`test/micro/org/openjdk/bench/vm/compiler/FloatingScalarVectorAbsDiff.java` for
performance testing.
The FABD (scalar), the performan
The Base64.encodeBlock stub is implemented for x86_64.
We can do the same for aarch64 with SIMD LD3/ST4/TBL.
A basic idea can be found here:
http://0x80.pl/articles/base64-simd-neon.html#encoding-quadwords.
Patch passed jtreg tier1-3 tests with linux-aarch64-server-release build.
Tests in
On Mon, 26 Oct 2020 09:19:45 GMT, Dong Bo wrote:
> BigInteger.shiftRightImplWorker and BigInteger.shiftLeftImplWorker are not
> intrinsified on aarch64, although this has been done on x86_64.
> We can implement them via USHL NEON instruction (register), which handles
> four integers one
On Tue, 27 Oct 2020 16:45:42 GMT, Andrew Haley wrote:
>> Dong Bo has updated the pull request incrementally with one additional
>> commit since the last revision:
>>
>> minor improvements for small BigIntegers
>
> Marked as reviewed by aph (Reviewer).
@th
On Mon, 26 Oct 2020 10:40:27 GMT, Andrew Haley wrote:
>> Dong Bo has updated the pull request incrementally with one additional
>> commit since the last revision:
>>
>> minor improvements for small BigIntegers
>
> src/hotspot/cpu/aarch64/stubGenerator_aar
± 32.170 ns/op
> BigIntegers.testLargeToString avgt 25 124.635 ± 2.157 ns/op
> **BigIntegers.testLeftShift avgt 25 551.710 ± 7.836 ns/op**
> BigIntegers.testMultiply avgt 25 5869.401 ± 54.803 ns/op
> **BigIntegers.testRightShift avgt 25 186.896 ± 6.378 ns/