On Thu, 23 Jan 2025 17:36:11 GMT, Matthias Ernst wrote:
>> Certain signatures for foreign function calls (e.g. HVA return by value)
>> require allocation of an intermediate buffer to adapt the FFM's to the
>> native stub's calling convention. In the current implem
On Wed, 15 Jan 2025 21:39:05 GMT, Matthias Ernst wrote:
> Certain signatures for foreign function calls (e.g. HVA return by value)
> require allocation of an intermediate buffer to adapt the FFM's to the native
> stub's calling convention. In the current implementat
10 33.892 ± 0.034 ns/op
>
> After:
> Benchmark                    Mode  Cnt  Score   Error  Units
> CallOverheadByValue.byPtr    avgt   30  3.311 ± 0.034  ns/op
> CallOverheadByValue.byValue  avgt   30  6.143 ± 0.053  ns/op
>
>
> `-prof gc` also shows that the new call path is fully scalar
On Thu, 23 Jan 2025 17:14:56 GMT, Jorn Vernee wrote:
> test is timing out
I think it's the stress test: the starting thread sleeps and never gets
rescheduled because the other threads starve it. I can reproduce it. I need to
either place yields strategically or interrupt the competing threads.
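A minimal sketch of the "yield or interrupt the competitors" fix, assuming the stress test runs its workers as virtual threads (which the JDK scheduler does not time-slice, so a virtual thread that spins without blocking or yielding can monopolize its carrier). `StressHarness` and `runStress` are hypothetical names, not code from TestBufferStack.java: each worker yields on every iteration, and the starting thread interrupts and joins the workers once its sleep ends.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical harness; not the actual TestBufferStack.java code.
final class StressHarness {

    /** Runs `workers` spinning virtual threads for about `millis` ms; returns total iterations. */
    static long runStress(int workers, long millis) throws InterruptedException {
        AtomicLong iterations = new AtomicLong();
        List<Thread> competitors = new ArrayList<>();
        for (int i = 0; i < workers; i++) {
            competitors.add(Thread.ofVirtual().start(() -> {
                // Spin until the starting thread asks us to stop.
                while (!Thread.currentThread().isInterrupted()) {
                    iterations.incrementAndGet(); // stand-in for the work under test
                    Thread.yield();               // let other threads (e.g. the starter) get a carrier
                }
            }));
        }
        Thread.sleep(millis);                     // the starting thread sleeps while workers compete
        for (Thread t : competitors) {
            t.interrupt();                        // stop competitors explicitly, not via scheduling luck
        }
        for (Thread t : competitors) {
            t.join();
        }
        return iterations.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("iterations: " + runStress(4, 100));
    }
}
```

Interrupting is the more robust option of the two. The yield only improves fairness, while the interrupt guarantees that the workers terminate no matter how they are scheduled.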
On Thu, 23 Jan 2025 15:14:30 GMT, Maurizio Cimadamore
wrote:
>> A simpler (maybe interim) solution: if the requesting thread is not virtual,
>> use the cache (and use a confined arena). Otherwise use a brand new confined
>> arena.
>
> Yet another option would be to use a confined or shared are
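A minimal sketch of the quoted interim policy, assuming JDK 22+ (final `java.lang.foreign` API) and public API only. `ReturnBufferSource`, `withBuffer`, the `CACHE` thread-local and the 64-byte size are all hypothetical. Platform threads reuse a buffer from a per-thread cache backed by a confined arena, while virtual threads (and oversized requests) get a brand-new confined arena for each call.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.util.function.Function;

// Hypothetical helper illustrating "platform thread: cache; virtual thread: fresh arena".
final class ReturnBufferSource {
    static final long CACHED_SIZE = 64; // hypothetical cache size

    // One buffer per platform thread. Its confined arena is owned by that thread and is
    // never closed here; the JDK would need something like a TerminatingThreadLocal for that.
    private static final ThreadLocal<MemorySegment> CACHE =
            ThreadLocal.withInitial(() -> Arena.ofConfined().allocate(CACHED_SIZE, 16));

    static <R> R withBuffer(long size, Function<MemorySegment, R> action) {
        if (!Thread.currentThread().isVirtual() && size <= CACHED_SIZE) {
            // Platform thread: reuse the cached buffer, so no malloc/free per call.
            return action.apply(CACHE.get().asSlice(0, size));
        }
        // Virtual thread or oversized request: a fresh confined arena, freed on return.
        try (Arena arena = Arena.ofConfined()) {
            return action.apply(arena.allocate(size, 16));
        }
    }
}
```

This sketch is not reentrant: a nested `withBuffer` call on the same platform thread would receive the same cached memory, which is the problem the later frame and stack discussion in this thread addresses.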
On Thu, 23 Jan 2025 14:26:23 GMT, Maurizio Cimadamore
wrote:
>> Ah no, the allocate fails of course:
>>
>> java.lang.WrongThreadException: Attempted access outside owning thread
>>     at java.base/jdk.internal.foreign.MemorySessionImpl.wrongThread(MemorySessionImpl.java:322)
>>     at
>
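To illustrate why the allocation fails, here is a minimal sketch assuming JDK 22+ and public API only. A confined arena can be used only by its owner thread, so calling `allocate` from any other thread throws `WrongThreadException`, the exception shown in the trace above. `WrongThreadDemo` is a hypothetical name.

```java
import java.lang.foreign.Arena;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical demo class.
final class WrongThreadDemo {

    /** Returns what a non-owner thread gets when allocating from a confined arena (null if nothing). */
    static Throwable allocateFromOtherThread() throws InterruptedException {
        AtomicReference<Throwable> failure = new AtomicReference<>();
        try (Arena arena = Arena.ofConfined()) {      // owned by the current thread
            Thread other = new Thread(() -> {
                try {
                    arena.allocate(16);               // not the owner: expected to throw
                } catch (Throwable t) {
                    failure.set(t);
                }
            });
            other.start();
            other.join();
        }
        return failure.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(allocateFromOtherThread());
    }
}
```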
On Thu, 23 Jan 2025 12:37:16 GMT, Maurizio Cimadamore
wrote:
>> Matthias Ernst has updated the pull request incrementally with four
>> additional commits since the last revision:
>>
>> - test deep linker stack
>> - Merge remote-tracking branch 'origin/mer
On Thu, 23 Jan 2025 12:59:53 GMT, Matthias Ernst wrote:
>>> the shared memory segment is confined on the carrier thread
>>
>> But is it? When the CarrierThreadLocal is initialized, we may be executing
>> in a VT and Arena.ofConfined will confine to it.
On Thu, 23 Jan 2025 12:50:41 GMT, Matthias Ernst wrote:
>> So:
>> * the shared memory segment is confined on the carrier thread
>> * allocation requests need to reinterpret segment slices to the arena (which
>> is associated with the requesting thread, not the carrier)
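A minimal sketch of the second bullet, assuming JDK 22+. A slice of a long-lived backing segment is re-associated with the requesting thread's arena through the restricted method `MemorySegment.reinterpret(Arena, Consumer)`. The resulting view is confined to that arena's owner and becomes inaccessible when the arena closes, while the backing memory stays allocated. `SliceLender`, `BACKING` and `lend` are hypothetical names. Because `reinterpret` is restricted, the JDK prints a warning unless `--enable-native-access` is given.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;

// Hypothetical lender of slices from a shared backing segment.
final class SliceLender {
    // Long-lived backing memory; the global arena is never closed.
    static final MemorySegment BACKING = Arena.global().allocate(256, 16);

    /** Returns a view of BACKING whose lifetime and confinement follow `requester`. */
    static MemorySegment lend(Arena requester, long offset, long size) {
        // Restricted method. A null cleanup action means nothing is freed when
        // `requester` closes: BACKING still owns the memory.
        return BACKING.asSlice(offset, size).reinterpret(requester, null);
    }

    public static void main(String[] args) {
        MemorySegment view;
        try (Arena arena = Arena.ofConfined()) {
            view = lend(arena, 0, 16);
            System.out.println(view.scope().isAlive());    // true while the arena is open
        }
        System.out.println(view.scope().isAlive());        // false once the arena is closed
        System.out.println(BACKING.scope().isAlive());     // true: the backing memory survives
    }
}
```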
On Thu, 23 Jan 2025 12:27:53 GMT, Maurizio Cimadamore
wrote:
> the shared memory segment is confined on the carrier thread
But is it? When the CarrierThreadLocal is initialized, we may be executing in a
VT and Arena.ofConfined will confine to it. We'd need something like an
Arena.ofCarrierCon
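To make the concern concrete, here is a minimal sketch assuming JDK 22+. When `Arena.ofConfined()` is called from a virtual thread, the arena is owned by that virtual thread, not by the carrier running it. A different virtual thread is therefore rejected even if it later runs on the same carrier. `VirtualOwnerDemo` is a hypothetical name, and the public check used is `MemorySegment.isAccessibleBy(Thread)`.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical demo class.
final class VirtualOwnerDemo {

    /** Returns {accessible by the creating virtual thread, accessible by a second virtual thread}. */
    static boolean[] check() throws InterruptedException {
        AtomicReference<MemorySegment> segment = new AtomicReference<>();
        AtomicReference<Thread> creator = new AtomicReference<>();
        Thread first = Thread.ofVirtual().start(() -> {
            creator.set(Thread.currentThread());
            // The owner is this virtual thread, not its carrier. The arena is left
            // open on purpose so the segment stays alive for the checks below.
            segment.set(Arena.ofConfined().allocate(16));
        });
        first.join();

        AtomicBoolean secondCanAccess = new AtomicBoolean(true);
        Thread second = Thread.ofVirtual().start(
                () -> secondCanAccess.set(segment.get().isAccessibleBy(Thread.currentThread())));
        second.join();

        return new boolean[] { segment.get().isAccessibleBy(creator.get()), secondCanAccess.get() };
    }

    public static void main(String[] args) throws InterruptedException {
        boolean[] r = check();
        System.out.println("creator: " + r[0] + ", other virtual thread: " + r[1]);
    }
}
```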
On Wed, 22 Jan 2025 20:05:25 GMT, Jorn Vernee wrote:
>> Matthias Ernst has updated the pull request incrementally with one
>> additional commit since the last revision:
>>
>> (c)
>
> test/jdk/java/foreign/TestBufferStack.java line 122:
>
>> 120:
On Wed, 22 Jan 2025 17:06:22 GMT, Jorn Vernee wrote:
>> Matthias Ernst has updated the pull request incrementally with one
>> additional commit since the last revision:
>>
>> an attempt at a stress test
>
> test/jdk/java/foreign/TestBufferStack.java line 9:
>
0.152 ns/op
> CallOverheadByValue.byValue  avgt   10  33.892 ± 0.034  ns/op
>
> After:
> Benchmark                    Mode  Cnt  Score   Error  Units
> CallOverheadByValue.byPtr    avgt   10  3.291 ± 0.031  ns/op
> CallOverheadByValue.byValue  avgt   10  5.464 ± 0.007  ns/op
>
>
>
On Wed, 22 Jan 2025 12:36:17 GMT, Matthias Ernst wrote:
>> On another note: in principle if a Frame is not the latest returned in a
>> given thread, it is not safe to allow its allocation method (and probably
>> close too) to succeed. Consider this case:
>>
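A minimal sketch of the check described above, assuming a bump-pointer stack in which each frame remembers the offset where it started. `FrameStack` and `Frame` are hypothetical names and do not reproduce the PR's `BufferStack`. Plain offsets stand in for memory to keep the LIFO logic visible. A frame may allocate or close only while it is the most recently pushed open frame.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical stack allocator that enforces LIFO use of frames.
final class FrameStack {
    private final long capacity;
    private long top;                                  // next free offset
    private final Deque<Frame> open = new ArrayDeque<>();

    FrameStack(long capacity) { this.capacity = capacity; }

    Frame pushFrame() {
        Frame f = new Frame(top);
        open.push(f);
        return f;
    }

    final class Frame implements AutoCloseable {
        private final long start;                      // offset restored on close

        private Frame(long start) { this.start = start; }

        private void checkTop() {
            if (open.peek() != this) {
                throw new IllegalStateException("frame is not the latest open frame");
            }
        }

        /** Returns the offset of a fresh block of `size` bytes. */
        long allocate(long size) {
            checkTop();
            if (top + size > capacity) throw new IllegalStateException("stack exhausted");
            long offset = top;
            top += size;
            return offset;
        }

        @Override
        public void close() {
            checkTop();
            open.pop();
            top = start;                               // releases everything this frame allocated
        }
    }
}
```

Without `checkTop`, an outer frame could allocate at offset 24 while an inner frame is still open. When the inner frame closed, it would reset `top` to 16, and the next allocation would overlap the outer frame's block.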
On Wed, 22 Jan 2025 11:05:14 GMT, Maurizio Cimadamore
wrote:
>> src/java.base/share/classes/jdk/internal/foreign/abi/BufferStack.java line 38:
>>
>>> 36: @SuppressWarnings("restricted")
>>> 37: public MemorySegment allocate(long byteSize, long byteAlignment) {
>>> 38:
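The quoted signature takes a byte alignment. The following minimal sketch (JDK 22+) shows one way an `allocate(byteSize, byteAlignment)` can carve aligned slices out of a single pre-allocated native segment. It assumes power-of-two alignments, and `BumpAllocator` is a hypothetical name. It rounds the absolute address up with the usual `(x + a - 1) & -a` trick, and `asSlice(offset, size, alignment)` re-checks the result. Implementing `SegmentAllocator` provides the typed `allocate` helpers for free.

```java
import java.lang.foreign.MemorySegment;
import java.lang.foreign.SegmentAllocator;

// Hypothetical bump allocator over one native segment.
final class BumpAllocator implements SegmentAllocator {
    private final MemorySegment block;
    private long offset;                               // first free byte, relative to block

    BumpAllocator(MemorySegment block) { this.block = block; }

    @Override
    public MemorySegment allocate(long byteSize, long byteAlignment) {
        if (byteAlignment <= 0 || Long.bitCount(byteAlignment) != 1) {
            throw new IllegalArgumentException("alignment must be a power of two");
        }
        long start = block.address() + offset;
        long aligned = (start + byteAlignment - 1) & -byteAlignment; // round the address up
        long relative = aligned - block.address();
        if (relative + byteSize > block.byteSize()) {
            throw new IndexOutOfBoundsException("out of space");
        }
        MemorySegment slice = block.asSlice(relative, byteSize, byteAlignment);
        offset = relative + byteSize;
        return slice;
    }
}
```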
On Wed, 22 Jan 2025 10:40:07 GMT, Maurizio Cimadamore
wrote:
>> Matthias Ernst has updated the pull request incrementally with one
>> additional commit since the last revision:
>>
>> --unnecessary annotations
>
> src/java.base/share/classes/jdk/internal/foreig
On Wed, 22 Jan 2025 11:50:10 GMT, Jorn Vernee wrote:
>> I'm told that TerminatingThreadLocal runs the "terminate" action for an
>> object T from the same thread T refers to. So, in principle, using a
>> TerminatingThreadLocal + confined arena should be ok.
>>
>> If that works, I'd suggest to c
On Wed, 22 Jan 2025 09:57:15 GMT, Matthias Ernst wrote:
>> Certain signatures for foreign function calls (e.g. HVA return by value)
>> require allocation of an intermediate buffer to adapt the FFM's to the
>> native stub's calling convention. In the current implem
On Mon, 20 Jan 2025 18:39:06 GMT, Jorn Vernee wrote:
>> Matthias Ernst has updated the pull request incrementally with one
>> additional commit since the last revision:
>>
>> whitespace :scream:
>
> test/jdk/java/foreign/CallBufferCacheTest.java line 95
On Mon, 20 Jan 2025 18:33:55 GMT, Jorn Vernee wrote:
>> test/micro/org/openjdk/bench/java/lang/foreign/CallOverheadByValue.java line 54:
>>
>>> 52: @State(org.openjdk.jmh.annotations.Scope.Thread)
>>> 53: @OutputTimeUnit(TimeUnit.NANOSECONDS)
>>> 54: @Fork(value = 1, jvmArgs = {"--enable-nat
On Mon, 20 Jan 2025 18:15:11 GMT, Jorn Vernee wrote:
>> That was my original version, but this proved to be faster (albeit very
>> little, O(.5ns)). I can't really explain why, that's above my paygrade, but
one thing that comes to mind when storing references is that there might
>> be a G
ns/op <=
> ###
> PointsAlloc.jni_ByteBuffer_alloc  avgt  30  211.161 ± 23.284  ns/op
> PointsAlloc.jni_long_alloc        avgt  30   24.885 ±  2.461  ns/op
> PointsAlloc.panama_alloc          avgt  30   26.905 ±  1.935  ns/op
>
>
> `-prof gc` also shows that the ne
On Mon, 20 Jan 2025 17:27:40 GMT, Jorn Vernee wrote:
>> Matthias Ernst has updated the pull request incrementally with three
>> additional commits since the last revision:
>>
>> - shift api boundary
>> - move bench
>> - revert formatting
>
> src/ja
On Mon, 20 Jan 2025 17:34:45 GMT, Maurizio Cimadamore
wrote:
>> src/java.base/share/classes/jdk/internal/foreign/abi/CallBufferCache.java line 112:
>>
>>> 110:
>>> 111: @SuppressWarnings("restricted")
>>> 112: public static MemorySegment acquireOrAllocate(long requestedSize) {
>>
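A minimal sketch of a single-slot, per-thread "acquire or allocate" cache, assuming JDK 22+ and public API only. It does not reproduce the PR's `CallBufferCache`. `CallBufferCacheSketch`, `CACHE_SIZE`, `Slot`, the `acquireOrAllocate(long, Arena)` overload and `release` are hypothetical. A request that fits takes the cached buffer and marks it in use, so a nested call falls back to a fresh allocation instead of sharing memory. `release` frees the slot only for the cached buffer.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;

// Hypothetical single-slot cache; not the PR's CallBufferCache.
final class CallBufferCacheSketch {
    static final long CACHE_SIZE = 256; // hypothetical

    private static final class Slot {
        // GC-managed memory, freed once the thread's Slot becomes unreachable.
        final MemorySegment buffer = Arena.ofAuto().allocate(CACHE_SIZE, 16);
        boolean inUse;
    }

    private static final ThreadLocal<Slot> SLOT = ThreadLocal.withInitial(Slot::new);

    /** Takes this thread's cached buffer if it is free and large enough, else allocates from `fallback`. */
    static MemorySegment acquireOrAllocate(long requestedSize, Arena fallback) {
        Slot slot = SLOT.get();
        if (!slot.inUse && requestedSize <= CACHE_SIZE) {
            slot.inUse = true;
            return slot.buffer.asSlice(0, requestedSize);
        }
        return fallback.allocate(requestedSize, 16);
    }

    /** Returns the cached buffer to its slot; fallback segments simply die with their arena. */
    static void release(MemorySegment segment) {
        Slot slot = SLOT.get();
        if (segment.address() == slot.buffer.address()) {
            slot.inUse = false;
        }
    }
}
```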
On Mon, 20 Jan 2025 17:22:09 GMT, Jorn Vernee wrote:
>> Matthias Ernst has updated the pull request incrementally with three
>> additional commits since the last revision:
>>
>> - shift api boundary
>> - move bench
>> - revert formatting
>
> test
On Mon, 20 Jan 2025 15:03:37 GMT, Maurizio Cimadamore
wrote:
>> Matthias Ernst has updated the pull request incrementally with one
>> additional commit since the last revision:
>>
>> Implementation notes.
>
> src/java.base/share/classes/jdk/internal/foreign/ab
On Mon, 20 Jan 2025 15:00:55 GMT, Maurizio Cimadamore
wrote:
>> Matthias Ernst has updated the pull request incrementally with one
>> additional commit since the last revision:
>>
>> Implementation notes.
>
> src/java.base/share/classes/jdk/internal/foreign/ab
On Mon, 20 Jan 2025 14:06:33 GMT, Jorn Vernee wrote:
>> Matthias Ernst has updated the pull request incrementally with one
>> additional commit since the last revision:
>>
>> Implementation notes.
>
> test/micro/org/openjdk/bench/java/lang/foreign/points/PointsA
On Mon, 20 Jan 2025 15:00:14 GMT, Maurizio Cimadamore
wrote:
>> src/java.base/share/classes/jdk/internal/foreign/abi/SharedUtils.java line 396:
>>
>>> 394: long address = fromCache != 0 ? fromCache : CallBufferCache.allocate(bufferSize);
>>> 395: return new Bounded
try {
> nativeStub.invoke(tmp); // leaves v0, v1 in tmp
> MemorySegment result = a.allocate(16);
> result.set(JAVA_DOUBLE, 0, tmp.get(JAVA_DOUBLE, 0));
> result.set(JAVA_DOUBLE, 8, tmp.get(JAVA_DOUBLE, 16));
> return result;
>...
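A runnable sketch of the copy step in the fragment above, assuming JDK 22+ and assuming (as the fragment's offsets suggest) that the stub spills each vector register into a 16-byte slot of `tmp`. Under that assumption, v0's double is at offset 0 and v1's at offset 16. `HvaReturnSketch`, `fakeNativeStub` and `unpack` are hypothetical. The fake stub only writes the two values the real stub would leave behind.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.SegmentAllocator;
import static java.lang.foreign.ValueLayout.JAVA_DOUBLE;

// Hypothetical illustration of unpacking an HVA return from register slots.
final class HvaReturnSketch {
    static final long REGISTER_SLOT = 16; // assumed size of one spilled vector register

    /** Stand-in for nativeStub.invoke(tmp): leaves v0 and v1 in tmp. */
    static void fakeNativeStub(MemorySegment tmp, double x, double y) {
        tmp.set(JAVA_DOUBLE, 0, x);                 // v0
        tmp.set(JAVA_DOUBLE, REGISTER_SLOT, y);     // v1
    }

    /** Copies the two doubles out of their register slots into a packed 16-byte struct. */
    static MemorySegment unpack(MemorySegment tmp, SegmentAllocator a) {
        MemorySegment result = a.allocate(16, 8);
        result.set(JAVA_DOUBLE, 0, tmp.get(JAVA_DOUBLE, 0));
        result.set(JAVA_DOUBLE, 8, tmp.get(JAVA_DOUBLE, REGISTER_SLOT));
        return result;
    }

    public static void main(String[] args) {
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment tmp = arena.allocate(2 * REGISTER_SLOT, 16); // could be reused across calls
            fakeNativeStub(tmp, 1.5, -2.0);
            MemorySegment point = unpack(tmp, arena);
            System.out.println(point.get(JAVA_DOUBLE, 0) + ", " + point.get(JAVA_DOUBLE, 8)); // 1.5, -2.0
        }
    }
}
```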
Matthias Ernst has updated the pull request i
On Fri, 17 Jan 2025 14:58:37 GMT, Jorn Vernee wrote:
> Could you add the benchmark you're using to the PR as well?
Done. I slotted it into the "points" benchmark suite, although I had to define
another "DoublePoint" struct, since the existing int/int pair gets packed into
a long.
Full disclosur
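To illustrate the sizing point, here is a sketch assuming JDK 22+ that builds both struct layouts with `MemoryLayout.structLayout`. An int/int pair is 8 bytes and fits in one 64-bit register. A double/double pair is 16 bytes of floating-point members, the kind of homogeneous aggregate (HFA/HVA on AArch64) this thread is concerned with. `PointLayouts` and the layout names are illustrative; the benchmark's actual definitions are not shown here.

```java
import java.lang.foreign.MemoryLayout;
import java.lang.foreign.StructLayout;
import static java.lang.foreign.ValueLayout.JAVA_DOUBLE;
import static java.lang.foreign.ValueLayout.JAVA_INT;

// Illustrative layouts; not the benchmark's own definitions.
final class PointLayouts {
    // int/int pair: 8 bytes, small enough to be packed into a single 64-bit value.
    static final StructLayout INT_POINT = MemoryLayout.structLayout(
            JAVA_INT.withName("x"), JAVA_INT.withName("y")).withName("IntPoint");

    // double/double pair: 16 bytes, two floating-point members.
    static final StructLayout DOUBLE_POINT = MemoryLayout.structLayout(
            JAVA_DOUBLE.withName("x"), JAVA_DOUBLE.withName("y")).withName("DoublePoint");

    public static void main(String[] args) {
        System.out.println(INT_POINT + " -> " + INT_POINT.byteSize() + " bytes");
        System.out.println(DOUBLE_POINT + " -> " + DOUBLE_POINT.byteSize() + " bytes");
    }
}
```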
Certain signatures for foreign function calls require allocation of an
intermediate buffer to adapt the FFM API's view of the call to the native
stub's calling convention ("needsReturnBuffer"). In the current implementation,
this buffer is malloc'ed and freed on every FFM invocation, which is a
non-negligible overhead.
Sampl
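A minimal sketch of the per-call pattern described above, assuming JDK 22+. A confined arena is opened and closed around each call, and `invokeStub` is a hypothetical placeholder for the native stub. Each invocation pays for one native allocation and one free, which is the cost the change aims to remove. `PerCallReturnBuffer` is also a hypothetical name.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import static java.lang.foreign.ValueLayout.JAVA_DOUBLE;

// Hypothetical illustration of the per-invocation allocate/free pattern.
final class PerCallReturnBuffer {

    /** Hypothetical placeholder for the native stub writing its return registers. */
    static void invokeStub(MemorySegment returnBuffer) {
        returnBuffer.set(JAVA_DOUBLE, 0, 42.0);
    }

    /** One call: allocate the return buffer, use it, and free it when the arena closes. */
    static double callOnce(long returnBufferSize) {
        try (Arena arena = Arena.ofConfined()) {          // native memory freed on close
            MemorySegment returnBuffer = arena.allocate(returnBufferSize, 16);
            invokeStub(returnBuffer);
            return returnBuffer.get(JAVA_DOUBLE, 0);
        }
    }

    public static void main(String[] args) {
        System.out.println(callOnce(32));
    }
}
```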
On Thu, 16 Jan 2025 14:16:15 GMT, Matthias Ernst wrote:
>> Per Minborg has updated the pull request incrementally with one additional
>> commit since the last revision:
>>
>> Remove unused class
>
> src/java.base/share/classes/jdk/internal/util/SingleElemen
On Thu, 16 Jan 2025 11:58:20 GMT, Per Minborg wrote:
>> Going forward, converting older JDK code to use the relatively new FFM API
>> requires system calls that can provide `errno` and the likes to explicitly
>> allocate a MemorySegment to capture potential error states. This can lead to
>> ne
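A minimal sketch of the errno pattern being described, assuming JDK 22+ on Linux or macOS. It uses the public `Linker.Option.captureCallState("errno")` option, which adds a leading `MemorySegment` parameter to the downcall handle. The caller must allocate a segment of `Linker.Option.captureStateLayout()` for each call just to read errno. `close(-1)` is used only because it reliably fails (with EBADF). `ErrnoCapture` and `closeWithErrno` are hypothetical names. Linking is a restricted operation, so the JDK may print a native-access warning.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.FunctionDescriptor;
import java.lang.foreign.Linker;
import java.lang.foreign.MemoryLayout;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.StructLayout;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.VarHandle;
import static java.lang.foreign.ValueLayout.JAVA_INT;

// Hypothetical example of capturing errno around a libc call.
final class ErrnoCapture {
    private static final Linker LINKER = Linker.nativeLinker();
    private static final StructLayout CAPTURE_LAYOUT = Linker.Option.captureStateLayout();
    // Layout var handles take (segment, base offset) coordinates.
    private static final VarHandle ERRNO =
            CAPTURE_LAYOUT.varHandle(MemoryLayout.PathElement.groupElement("errno"));
    // int close(int fd); with captureCallState the first parameter is the capture segment.
    private static final MethodHandle CLOSE = LINKER.downcallHandle(
            LINKER.defaultLookup().find("close").orElseThrow(),
            FunctionDescriptor.of(JAVA_INT, JAVA_INT),
            Linker.Option.captureCallState("errno"));

    /** Calls close(fd) and returns {result, errno}. */
    static int[] closeWithErrno(int fd) throws Throwable {
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment state = arena.allocate(CAPTURE_LAYOUT); // the extra per-call allocation
            int result = (int) CLOSE.invokeExact(state, fd);
            int errno = (int) ERRNO.get(state, 0L);
            return new int[] { result, errno };
        }
    }

    public static void main(String[] args) throws Throwable {
        int[] r = closeWithErrno(-1);
        System.out.println("result=" + r[0] + " errno=" + r[1]);
    }
}
```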