Thanks again for the review.
As per your and Andrew Haley's comments, I have updated the webrev:
- used NOINLINE
- used julong
- deleted the block of unused code.
Please let me know what you think.
updated webrev:
http://cr.openjdk.java.net/~stooke/webrevs/jdk-8243114-jdk/01/01/
Thanks,
-Simon
On 2020-06-05 12:35 a.m., David Holmes wrote:
Hi Simon,
On 4/06/2020 11:35 pm, Simon Tooke wrote:
Hello, David, and thanks for the review!
I've responded to your comments below, and intend to post a new patch
for review today or tomorrow.
Thanks again,
-Simon
On 2020-06-03 2:06 a.m., David Holmes wrote:
Hi Simon,
On 23/05/2020 12:04 am, Sean Mullan wrote:
Cross-posting to hotspot-dev for additional review since the code
changes are in hotspot.
--Sean
On 5/21/20 1:24 PM, Simon Tooke wrote:
Hello,
I'd like to request a review for:
JBS: https://bugs.openjdk.java.net/browse/JDK-8243114
Webrev:
http://cr.openjdk.java.net/~stooke/webrevs/jdk-8243114-jdk/00/00/
This change implements the given intrinsics on Windows.
The Windows toolchain has no 128 bit integer types, and no inline
asm (on x64 and Arm). In addition, 'long' is 4 bytes, not 8, as
it is with gcc. This patch had to change some of the linux
implementation to account for these limitations.
I can't really comment on the intrinsics themselves but a couple of
suggestions:
- use explicitly sized types e.g. uint64_t instead of unsigned long
long
Yes, this hurt to type. A previous review suggested using julong, is
that acceptable to you?
j* types should only be used when dealing with values that come from
or go to Java. Obviously julong is a misnomer as Java doesn't have an
unsigned type, but in general this is something we are trying to fix
up in the codebase. If these arrays are extracted from Java arrays
then using a j* type would be appropriate, but then I would expect
jlong, unless the algorithm explicitly requires unsigned types? If so
julong would be acceptable.
(an aside: I'm now wondering if there is other code in the JDK that
assumes long is 64bits - which is not true on all platforms...)
There has been such code, but hopefully there is no remaining actively
used code with that bug. There are using of "long" that should be
eradicated (there's an open JBS issue for that as I recall) but the
remaining uses don't seem to require the long be 64-bit.
- use the existing NOINLINE macro for the _declspec(noinline)
Thanks, will do.
The conditional compilation in this code has me quite confused.
Looking at the existing code we have:
3680 #ifndef _WINDOWS
3681
3682 #define ASM_SUBTRACT
3683
3684 #ifdef ASM_SUBTRACT
...
3702 #else // ASM_SUBTRACT
...
3719 #endif // ! ASM_SUBTRACT
So all the above code is only built on not-Windows, and we
unconditionally define ASM_SUBTRACT, so lines 3702 to 3719 appear to
be dead code! I'm not at all sure whether that is actually a bug and
the windows ifndef should have had an endif after line 3682; or
whether we can just simplify the code.
The dead code existed prior to this patch, so I wasn't proposing
removing it. I'm happy to do so if that's for the best. As far as I
can tell, it's for implementing these intrinsics on gcc platforms
without assembler.
AFAICS Andrew originally had the ASM_SUBTRACT parts, with the
unconditional #define, but there was no explanation in the review
thread as to why the unused code remained present. Then before
integration all the code was wrapped by the ifndef Windows to exclude
it from Windows.
I think it needs to be fixed as you are making changes to the Windows
part and it is very hard to establish how the dead code should look in
that case.
Thanks,
David
Thanks,
David
My gratitude for Andrew Haley for doing the heavy lifting at the
core of this patch.
The patch, if accepted, will be offered to 11u as a potential
backport. The changes apply cleanly modulo some line number changes.
As for the speedup, this test case:
BigInteger base = BigInteger.ONE.shiftLeft(1024);
long count = LongStream.rangeClosed(2, 100_000)
.mapToObj(n -> BigInteger.valueOf(n).add(base))
.filter(i -> i.isProbablePrime(50))
.count();
goes from 1 minute 20 seconds down to about 35 seconds om my VM,
over multiple runs. As is the case on other platforms where the
intrinsics are supported, they will be enabled by default on Windows.
Thank you for your time,
-Simon Tooke
Principal Software Engineer,
Red Hat Canada