Re: RFR: 8287788: Implement a better allocator for downcalls [v18]

2025-02-01 Thread Matthias Ernst
On Thu, 23 Jan 2025 17:36:11 GMT, Matthias Ernst wrote: >> Certain signatures for foreign function calls (e.g. HVA return by value) >> require allocation of an intermediate buffer to adapt the FFM's to the >> native stub's calling convention. In the current implem

Re: RFR: 8287788: Implement a better allocator for downcalls [v18]

2025-01-31 Thread Matthias Ernst
On Thu, 23 Jan 2025 17:36:11 GMT, Matthias Ernst wrote: >> Certain signatures for foreign function calls (e.g. HVA return by value) >> require allocation of an intermediate buffer to adapt the FFM's to the >> native stub's calling convention. In the current implem

Integrated: 8287788: Implement a better allocator for downcalls

2025-01-27 Thread Matthias Ernst
On Wed, 15 Jan 2025 21:39:05 GMT, Matthias Ernst wrote: > Certain signatures for foreign function calls (e.g. HVA return by value) > require allocation of an intermediate buffer to adapt the FFM's to the native > stub's calling convention. In the current implementat

Re: RFR: 8287788: Implement a better allocator for downcalls [v18]

2025-01-27 Thread Matthias Ernst
On Thu, 23 Jan 2025 17:36:11 GMT, Matthias Ernst wrote: >> Certain signatures for foreign function calls (e.g. HVA return by value) >> require allocation of an intermediate buffer to adapt the FFM's to the >> native stub's calling convention. In the current implem

Re: RFR: 8287788: Implement a better allocator for downcalls [v18]

2025-01-23 Thread Matthias Ernst
10 33.892 ± 0.034 ns/op > > After: > Benchmark Mode Cnt Score Error Units > CallOverheadByValue.byPtr avgt 30 3.311 ± 0.034 ns/op > CallOverheadByValue.byValue avgt 30 6.143 ± 0.053 ns/op > > > `-prof gc` also shows that the new call path is fully scalar

Re: RFR: 8287788: Implement a better allocator for downcalls [v17]

2025-01-23 Thread Matthias Ernst
On Thu, 23 Jan 2025 17:14:56 GMT, Jorn Vernee wrote: > test is timing out I think it's the stress test, the starting Thread sleeps and never gets rescheduled because it's starved out by the others. I can repro. Need to strategically place yields or interrupt the competitors. - PR

Re: RFR: 8287788: Implement a better allocator for downcalls [v17]

2025-01-23 Thread Matthias Ernst
10 33.892 ± 0.034 ns/op > > After: > Benchmark Mode Cnt Score Error Units > CallOverheadByValue.byPtr avgt 30 3.311 ± 0.034 ns/op > CallOverheadByValue.byValue avgt 30 6.143 ± 0.053 ns/op > > > `-prof gc` also shows that the new call path is fully scalar

Re: RFR: 8287788: Implement a better allocator for downcalls [v8]

2025-01-23 Thread Matthias Ernst
On Thu, 23 Jan 2025 15:14:30 GMT, Maurizio Cimadamore wrote: >> A simpler (maybe interim) solution: if the requesting thread is not virtual, >> use the cache (and use a confined arena). Otherwise use a brand new confined >> arena. > > Yet another option would be to use a confined or shared are

Re: RFR: 8287788: Implement a better allocator for downcalls [v8]

2025-01-23 Thread Matthias Ernst
On Thu, 23 Jan 2025 14:26:23 GMT, Maurizio Cimadamore wrote: >> Ah no, the allocate fails of course: >> >> java.lang.WrongThreadException: Attempted access outside owning thread >> at >> java.base/jdk.internal.foreign.MemorySessionImpl.wrongThread(MemorySessionImpl.java:322) >> at >

Re: RFR: 8287788: Implement a better allocator for downcalls [v16]

2025-01-23 Thread Matthias Ernst
On Thu, 23 Jan 2025 12:37:16 GMT, Maurizio Cimadamore wrote: >> Matthias Ernst has updated the pull request incrementally with four >> additional commits since the last revision: >> >> - test deep linker stack >> - Merge remote-tracking branch 'origin/mer

Re: RFR: 8287788: Implement a better allocator for downcalls [v8]

2025-01-23 Thread Matthias Ernst
On Thu, 23 Jan 2025 12:59:53 GMT, Matthias Ernst wrote: >>> the shared memory segment is confined on the carrier thread >> >> But is it? When the CarrierThreadLocal is initialized, we may be executing >> in a VT and Arena.ofConfined will confine to it.

Re: RFR: 8287788: Implement a better allocator for downcalls [v8]

2025-01-23 Thread Matthias Ernst
On Thu, 23 Jan 2025 12:50:41 GMT, Matthias Ernst wrote: >> So: >> * the shared memory segment is confined on the carrier thread >> * allocation requests need to reinterpret segment slices to the arena (which >> is associated with the requesting thread, not the carrier)

Re: RFR: 8287788: Implement a better allocator for downcalls [v8]

2025-01-23 Thread Matthias Ernst
On Thu, 23 Jan 2025 12:27:53 GMT, Maurizio Cimadamore wrote: > the shared memory segment is confined on the carrier thread But is it? When the CarrierThreadLocal is initialized, we may be executing in a VT and Arena.ofConfined will confine to it. We'd need something like an Arena.ofCarrierCon

Re: RFR: 8287788: Implement a better allocator for downcalls [v14]

2025-01-23 Thread Matthias Ernst
On Wed, 22 Jan 2025 20:05:25 GMT, Jorn Vernee wrote: >> Matthias Ernst has updated the pull request incrementally with one >> additional commit since the last revision: >> >> (c) > > test/jdk/java/foreign/TestBufferStack.java line 122: > >> 120:

Re: RFR: 8287788: Implement a better allocator for downcalls [v13]

2025-01-23 Thread Matthias Ernst
On Wed, 22 Jan 2025 17:06:22 GMT, Jorn Vernee wrote: >> Matthias Ernst has updated the pull request incrementally with one >> additional commit since the last revision: >> >> an attempt at a stress test > > test/jdk/java/foreign/TestBufferStack.java line 9: >

Re: RFR: 8287788: Implement a better allocator for downcalls [v16]

2025-01-23 Thread Matthias Ernst
10 33.892 ± 0.034 ns/op > > After: > Benchmark Mode Cnt Score Error Units > CallOverheadByValue.byPtr avgt 30 3.311 ± 0.034 ns/op > CallOverheadByValue.byValue avgt 30 6.143 ± 0.053 ns/op > > > `-prof gc` also shows that the new call path is fully

Re: RFR: 8287788: Implement a better allocator for downcalls [v15]

2025-01-22 Thread Matthias Ernst
10 33.892 ± 0.034 ns/op > > After: > Benchmark Mode Cnt Score Error Units > CallOverheadByValue.byPtr avgt 30 3.311 ± 0.034 ns/op > CallOverheadByValue.byValue avgt 30 6.143 ± 0.053 ns/op > > > `-prof gc` also shows that the new call path is fully scalar

Re: RFR: 8287788: Implement a better allocator for downcalls [v14]

2025-01-22 Thread Matthias Ernst
10 33.892 ± 0.034 ns/op > > After: > Benchmark Mode Cnt Score Error Units > CallOverheadByValue.byPtr avgt 30 3.311 ± 0.034 ns/op > CallOverheadByValue.byValue avgt 30 6.143 ± 0.053 ns/op > > > `-prof gc` also shows that the new call path is fully scalar

Re: RFR: 8287788: reuse intermediate segments allocated during FFM stub invocations [v12]

2025-01-22 Thread Matthias Ernst
0.152 ns/op > CallOverheadByValue.byValue avgt 10 33.892 ± 0.034 ns/op > > After: > Benchmark Mode Cnt Score Error Units > CallOverheadByValue.byPtr avgt 10 3.291 ± 0.031 ns/op > CallOverheadByValue.byValue avgt 10 5.464 ± 0.007 ns/op > > >

Re: RFR: 8287788: reuse intermediate segments allocated during FFM stub invocations [v11]

2025-01-22 Thread Matthias Ernst
0.152 ns/op > CallOverheadByValue.byValue avgt 10 33.892 ± 0.034 ns/op > > After: > Benchmark Mode Cnt Score Error Units > CallOverheadByValue.byPtr avgt 10 3.291 ± 0.031 ns/op > CallOverheadByValue.byValue avgt 10 5.464 ± 0.007 ns/op > > >

Re: RFR: 8287788: reuse intermediate segments allocated during FFM stub invocations [v9]

2025-01-22 Thread Matthias Ernst
On Wed, 22 Jan 2025 12:36:17 GMT, Matthias Ernst wrote: >> On another note: in principle if a Frame is not the latest returned in a >> given thread, it is not safe to allow its allocation method (and probably >> close too) to succeed. Consider this case: >>
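
The stack discipline raised in this message (only the most recently returned frame may allocate or close) can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's BufferStack: `FrameStackSketch`, `push`, and the offset-returning `allocate` are hypothetical names, and plain offsets into an imaginary buffer stand in for real memory segments.

```java
/** Hypothetical sketch of a frame stack with a "topmost frame only" rule. */
public class FrameStackSketch {
    private final long capacity; // size of the imaginary backing buffer
    private long top;            // current bump offset
    private int depth;           // number of open frames

    public FrameStackSketch(long capacity) {
        this.capacity = capacity;
    }

    /** Opens a new frame on top of the stack. */
    public Frame push() {
        return new Frame(++depth, top);
    }

    public final class Frame implements AutoCloseable {
        private final int level;  // position of this frame in the stack
        private final long base;  // offset to restore when the frame closes
        private boolean closed;

        Frame(int level, long base) {
            this.level = level;
            this.base = base;
        }

        // Rejects use of a frame that is closed or is not the topmost one.
        private void checkTop() {
            if (closed || level != depth) {
                throw new IllegalStateException("not the topmost open frame");
            }
        }

        /** Returns the offset of the allocated slice; byteAlignment must be a power of two. */
        public long allocate(long byteSize, long byteAlignment) {
            checkTop();
            long start = (top + byteAlignment - 1) & -byteAlignment;
            if (start + byteSize > capacity) {
                // Assumption: the sketch simply fails when the buffer is exhausted.
                throw new IllegalArgumentException("buffer exhausted");
            }
            top = start + byteSize;
            return start;
        }

        /** Releases everything allocated by this frame. */
        @Override
        public void close() {
            checkTop();
            top = base;
            depth--;
            closed = true;
        }
    }
}
```

With this rule, an outer frame cannot allocate while an inner frame is still open, so it cannot hand out memory that the inner frame's close would later reclaim.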

Re: RFR: 8287788: reuse intermediate segments allocated during FFM stub invocations [v9]

2025-01-22 Thread Matthias Ernst
On Wed, 22 Jan 2025 11:05:14 GMT, Maurizio Cimadamore wrote: >> src/java.base/share/classes/jdk/internal/foreign/abi/BufferStack.java line >> 38: >> >>> 36: @SuppressWarnings("restricted") >>> 37: public MemorySegment allocate(long byteSize, long >>> byteAlignment) { >>> 38:

Re: RFR: 8287788: reuse intermediate segments allocated during FFM stub invocations [v9]

2025-01-22 Thread Matthias Ernst
On Wed, 22 Jan 2025 10:40:07 GMT, Maurizio Cimadamore wrote: >> Matthias Ernst has updated the pull request incrementally with one >> additional commit since the last revision: >> >> --unnecessary annotations > > src/java.base/share/classes/jdk/internal/foreig

Re: RFR: 8287788: reuse intermediate segments allocated during FFM stub invocations [v8]

2025-01-22 Thread Matthias Ernst
On Wed, 22 Jan 2025 11:50:10 GMT, Jorn Vernee wrote: >> I'm told that TerminatingThreadLocal runs the "terminate" action for an >> object T from the same thread T refers to. So, in principle, using a >> TerminatingThreadLocal + confined arena should be ok. >> >> If that works, I'd suggest to c

Re: RFR: 8287788: reuse intermediate segments allocated during FFM stub invocations [v10]

2025-01-22 Thread Matthias Ernst
0.152 ns/op > CallOverheadByValue.byValue avgt 10 33.892 ± 0.034 ns/op > > After: > Benchmark Mode Cnt Score Error Units > CallOverheadByValue.byPtr avgt 10 3.291 ± 0.031 ns/op > CallOverheadByValue.byValue avgt 10 5.464 ± 0.007 ns/op > > >

Re: RFR: 8287788: reuse intermediate segments allocated during FFM stub invocations [v9]

2025-01-22 Thread Matthias Ernst
0.152 ns/op > CallOverheadByValue.byValue avgt 10 33.892 ± 0.034 ns/op > > After: > Benchmark Mode Cnt Score Error Units > CallOverheadByValue.byPtr avgt 10 3.291 ± 0.031 ns/op > CallOverheadByValue.byValue avgt 10 5.464 ± 0.007 ns/op > > >

Re: RFR: 8287788: reuse intermediate segments allocated during FFM stub invocations [v8]

2025-01-22 Thread Matthias Ernst
On Wed, 22 Jan 2025 09:57:15 GMT, Matthias Ernst wrote: >> Certain signatures for foreign function calls (e.g. HVA return by value) >> require allocation of an intermediate buffer to adapt the FFM's to the >> native stub's calling convention. In the current implem

Re: RFR: 8287788: reuse intermediate segments allocated during FFM stub invocations [v8]

2025-01-22 Thread Matthias Ernst
On Wed, 22 Jan 2025 09:57:15 GMT, Matthias Ernst wrote: >> Certain signatures for foreign function calls (e.g. HVA return by value) >> require allocation of an intermediate buffer to adapt the FFM's to the >> native stub's calling convention. In the current implem

Re: RFR: 8287788: reuse intermediate segments allocated during FFM stub invocations [v8]

2025-01-22 Thread Matthias Ernst
0.152 ns/op > CallOverheadByValue.byValue avgt 10 33.892 ± 0.034 ns/op > > After: > Benchmark Mode Cnt Score Error Units > CallOverheadByValue.byPtr avgt 10 3.291 ± 0.031 ns/op > CallOverheadByValue.byValue avgt 10 5.464 ± 0.007 ns/op > > >

Re: RFR: 8287788: reuse intermediate segments allocated during FFM stub invocations [v6]

2025-01-20 Thread Matthias Ernst
On Mon, 20 Jan 2025 18:39:06 GMT, Jorn Vernee wrote: >> Matthias Ernst has updated the pull request incrementally with one >> additional commit since the last revision: >> >> whitespace :scream: > > test/jdk/java/foreign/CallBufferCacheTest.java line 95

Re: RFR: 8287788: reuse intermediate segments allocated during FFM stub invocations [v7]

2025-01-20 Thread Matthias Ernst
0.152 ns/op > CallOverheadByValue.byValue avgt 10 33.892 ± 0.034 ns/op > > After: > Benchmark Mode Cnt Score Error Units > CallOverheadByValue.byPtr avgt 10 3.291 ± 0.031 ns/op > CallOverheadByValue.byValue avgt 10 5.464 ± 0.007 ns/op > > >

Re: RFR: 8287788: reuse intermediate segments allocated during FFM stub invocations [v6]

2025-01-20 Thread Matthias Ernst
On Mon, 20 Jan 2025 18:33:55 GMT, Jorn Vernee wrote: >> test/micro/org/openjdk/bench/java/lang/foreign/CallOverheadByValue.java line >> 54: >> >>> 52: @State(org.openjdk.jmh.annotations.Scope.Thread) >>> 53: @OutputTimeUnit(TimeUnit.NANOSECONDS) >>> 54: @Fork(value = 1, jvmArgs = {"--enable-nat

Re: RFR: 8287788: reuse intermediate segments allocated during FFM stub invocations [v3]

2025-01-20 Thread Matthias Ernst
On Mon, 20 Jan 2025 18:15:11 GMT, Jorn Vernee wrote: >> That was my original version, but this proved to be faster (albeit very >> little, O(.5ns)). I can't really explain why, that's above my paygrade, but >> one thing that comes to mind when storing references is that there might >> be a G

Re: RFR: 8287788: reuse intermediate segments allocated during FFM stub invocations [v6]

2025-01-20 Thread Matthias Ernst
ns/op <= > ### > PointsAlloc.jni_ByteBuffer_alloc avgt 30 211.161 ± 23.284 ns/op > PointsAlloc.jni_long_alloc avgt 30 24.885 ± 2.461 ns/op > PointsAlloc.panama_alloc avgt 30 26.905 ± 1.935 ns/op > > > `-prof gc` also shows that the ne

Re: RFR: 8287788: reuse intermediate segments allocated during FFM stub invocations [v3]

2025-01-20 Thread Matthias Ernst
On Mon, 20 Jan 2025 17:27:40 GMT, Jorn Vernee wrote: >> Matthias Ernst has updated the pull request incrementally with three >> additional commits since the last revision: >> >> - shift api boundary >> - move bench >> - revert formatting > > src/ja

Re: RFR: 8287788: reuse intermediate segments allocated during FFM stub invocations [v4]

2025-01-20 Thread Matthias Ernst
On Mon, 20 Jan 2025 17:34:45 GMT, Maurizio Cimadamore wrote: >> src/java.base/share/classes/jdk/internal/foreign/abi/CallBufferCache.java >> line 112: >> >>> 110: >>> 111: @SuppressWarnings("restricted") >>> 112: public static MemorySegment acquireOrAllocate(long requestedSize) { >>

Re: RFR: 8287788: reuse intermediate segments allocated during FFM stub invocations [v3]

2025-01-20 Thread Matthias Ernst
On Mon, 20 Jan 2025 17:22:09 GMT, Jorn Vernee wrote: >> Matthias Ernst has updated the pull request incrementally with three >> additional commits since the last revision: >> >> - shift api boundary >> - move bench >> - revert formatting > > test

Re: RFR: 8287788: reuse intermediate segments allocated during FFM stub invocations [v5]

2025-01-20 Thread Matthias Ernst
ns/op <= > ### > PointsAlloc.jni_ByteBuffer_alloc avgt 30 211.161 ± 23.284 ns/op > PointsAlloc.jni_long_alloc avgt 30 24.885 ± 2.461 ns/op > PointsAlloc.panama_alloc avgt 30 26.905 ± 1.935 ns/op > > > `-prof gc` also shows that the ne

Re: RFR: 8287788: reuse intermediate segments allocated during FFM stub invocations [v2]

2025-01-20 Thread Matthias Ernst
On Mon, 20 Jan 2025 15:03:37 GMT, Maurizio Cimadamore wrote: >> Matthias Ernst has updated the pull request incrementally with one >> additional commit since the last revision: >> >> Implementation notes. > > src/java.base/share/classes/jdk/internal/foreign/ab

Re: RFR: 8287788: reuse intermediate segments allocated during FFM stub invocations [v4]

2025-01-20 Thread Matthias Ernst
ns/op <= > ### > PointsAlloc.jni_ByteBuffer_alloc avgt 30 211.161 ± 23.284 ns/op > PointsAlloc.jni_long_alloc avgt 30 24.885 ± 2.461 ns/op > PointsAlloc.panama_alloc avgt 30 26.905 ± 1.935 ns/op > > > `-prof gc` also shows that the ne

Re: RFR: 8287788: reuse intermediate segments allocated during FFM stub invocations [v2]

2025-01-20 Thread Matthias Ernst
On Mon, 20 Jan 2025 15:00:55 GMT, Maurizio Cimadamore wrote: >> Matthias Ernst has updated the pull request incrementally with one >> additional commit since the last revision: >> >> Implementation notes. > > src/java.base/share/classes/jdk/internal/foreign/ab

Re: RFR: 8287788: reuse intermediate segments allocated during FFM stub invocations [v2]

2025-01-20 Thread Matthias Ernst
On Mon, 20 Jan 2025 14:06:33 GMT, Jorn Vernee wrote: >> Matthias Ernst has updated the pull request incrementally with one >> additional commit since the last revision: >> >> Implementation notes. > > test/micro/org/openjdk/bench/java/lang/foreign/points/PointsA

Re: RFR: 8287788: reuse intermediate segments allocated during FFM stub invocations [v2]

2025-01-20 Thread Matthias Ernst
On Mon, 20 Jan 2025 15:00:14 GMT, Maurizio Cimadamore wrote: >> src/java.base/share/classes/jdk/internal/foreign/abi/SharedUtils.java line >> 396: >> >>> 394: long address = fromCache != 0 ? fromCache : >>> CallBufferCache.allocate(bufferSize); >>> 395: return new >>> Bounded

Re: RFR: 8287788: reuse intermediate segments allocated during FFM stub invocations [v3]

2025-01-20 Thread Matthias Ernst
ns/op <= > ### > PointsAlloc.jni_ByteBuffer_alloc avgt 30 211.161 ± 23.284 ns/op > PointsAlloc.jni_long_alloc avgt 30 24.885 ± 2.461 ns/op > PointsAlloc.panama_alloc avgt 30 26.905 ± 1.935 ns/op > > > `-prof gc` also shows that the

Re: RFR: 8287788: reuse intermediate segments allocated during FFM stub invocations [v2]

2025-01-19 Thread Matthias Ernst
try { > nativeStub.invoke(tmp); // leaves v0, v1 in tmp > MemorySegment result = a.allocate(16); > result.setDouble(0, tmp.getDouble(0)); > result.setDouble(8, tmp.getDouble(16)); > return result; >... Matthias Ernst has updated the pull request i

Re: RFR: 8287788: reuse intermediate segments allocated during FFM stub invocations

2025-01-19 Thread Matthias Ernst
On Fri, 17 Jan 2025 14:58:37 GMT, Jorn Vernee wrote: > Could you add the benchmark you're using to the PR as well? Done. I slotted it into the "points" BM suite, alas I had to define another "DoublePoint" struct, though, since the existing int/int pair gets packed into a long. Full disclosur

RFR: 8287788: reuse intermediate segments allocated during FFM stub invocations

2025-01-19 Thread Matthias Ernst
Certain signatures for foreign function calls require allocation of an intermediate buffer to adapt the FFM's to the native stub's calling convention ("needsReturnBuffer"). In the current implementation, this buffer is malloced and freed on every FFM invocation, a non-negligible overhead. Sampl
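
The reuse idea named in this RFR's title can be sketched roughly as below. This is a hedged illustration, not the PR's code: a direct `ByteBuffer` stands in for the malloc'ed intermediate segment, the class name `CallBufferCacheSketch` and the 256-byte `CACHED_SIZE` are assumptions, and `acquireOrAllocate` only echoes a method name quoted elsewhere in the review thread, with its signature and behavior here made up for the example.

```java
import java.nio.ByteBuffer;

/** Hypothetical sketch: a one-slot, per-thread cache for call buffers. */
public class CallBufferCacheSketch {
    // Assumed size of the cached buffer; not taken from the PR.
    public static final int CACHED_SIZE = 256;

    // One slot per thread; the slot holds null while its buffer is lent out.
    private static final ThreadLocal<ByteBuffer[]> SLOT =
            ThreadLocal.withInitial(
                    () -> new ByteBuffer[] { ByteBuffer.allocateDirect(CACHED_SIZE) });

    /** Returns the cached buffer if it is free and large enough, else a fresh one. */
    public static ByteBuffer acquireOrAllocate(int requestedSize) {
        ByteBuffer[] slot = SLOT.get();
        ByteBuffer cached = slot[0];
        if (cached != null && requestedSize <= cached.capacity()) {
            slot[0] = null;   // mark as lent out so nested calls take the slow path
            cached.clear();   // reset position and limit for the new user
            return cached;
        }
        // Slow path: allocate per call, as the pre-change implementation always did.
        return ByteBuffer.allocateDirect(requestedSize);
    }

    /** Returns a buffer to the slot if the slot is empty and the buffer has the cached size. */
    public static void release(ByteBuffer buffer) {
        ByteBuffer[] slot = SLOT.get();
        if (slot[0] == null && buffer.capacity() == CACHED_SIZE) {
            slot[0] = buffer;
        }
    }
}
```

A caller would pair `acquireOrAllocate` with `release` in a try/finally around the native call, so the common non-nested case pays for no allocation at all.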

Re: RFR: 8347408: Create an internal method handle adapter for system calls with errno [v16]

2025-01-16 Thread Matthias Ernst
On Thu, 16 Jan 2025 14:16:15 GMT, Matthias Ernst wrote: >> Per Minborg has updated the pull request incrementally with one additional >> commit since the last revision: >> >> Remove unused class > > src/java.base/share/classes/jdk/internal/util/SingleElemen

Re: RFR: 8347408: Create an internal method handle adapter for system calls with errno [v16]

2025-01-16 Thread Matthias Ernst
On Thu, 16 Jan 2025 11:58:20 GMT, Per Minborg wrote: >> Going forward, converting older JDK code to use the relatively new FFM API >> requires system calls that can provide `errno` and the likes to explicitly >> allocate a MemorySegment to capture potential error states. This can lead to >> ne