This patch contains the changes associated with the first incubation round of 
the foreign linker access API incubation
(see JEP 389 [1]). This work is meant to sit on top of the foreign memory 
access support (see JEP 393 [2] and
associated pull request [3]).

The main goal of this API is to provide a way to call native functions from 
Java code without the need of intermediate
JNI glue code. In order to do this, native calls are modeled through the 
MethodHandle API. I suggest reading the
writeup [4] I put together few weeks ago, which illustrates what the foreign 
linker support is, and how it should be
used by clients.

Disclaimer: the pull request mechanism isn't great at managing *dependent* 
reviews. For this reasons, I'm attaching a
webrev which contains only the differences between this PR and the memory 
access PR. I will be periodically uploading
new webrevs, as new iterations come out, to try and make the life of reviewers 
as simple as possible.

A big thank to Jorn Vernee and Vladimir Ivanov - they are the main architects 
of all the hotspot changes you see here,
and without their help, the foreign linker support wouldn't be what it is 
today. As usual, a big thank to Paul Sandoz,
who provided many insights (often by trying the bits first hand).

Thanks
Maurizio

Webrev:
http://cr.openjdk.java.net/~mcimadamore/8254231_v1/webrev

Javadoc:

http://cr.openjdk.java.net/~mcimadamore/8254231_v1/javadoc/jdk/incubator/foreign/package-summary.html

Specdiff (relative to [3]):

http://cr.openjdk.java.net/~mcimadamore/8254231_v1/specdiff_delta/overview-summary.html

CSR:

https://bugs.openjdk.java.net/browse/JDK-8254232



### API Changes

The API changes are actually rather slim:

* `LibraryLookup`
  * This class allows clients to lookup symbols in native libraries; the 
interface is fairly simple; you can load a library
    by name, or absolute path, and then lookup symbols on that library.
* `FunctionDescriptor`
  * This is an abstraction that is very similar, in spirit, to `MethodType`; it 
is, at its core, an aggregate of memory
    layouts for the function arguments/return type. A function descriptor is 
used to describe the signature of a native
    function.
* `CLinker`
  * This is the real star of the show. A `CLinker` has two main methods: 
`downcallHandle` and `upcallStub`; the first takes
    a native symbol (as obtained from `LibraryLookup`), a `MethodType` and a 
`FunctionDescriptor` and returns a
    `MethodHandle` instance which can be used to call the target native symbol. 
The second takes an existing method handle,
    and a `FunctionDescriptor` and returns a new `MemorySegment` corresponding 
to a code stub allocated by the VM which
    acts as a trampoline from native code to the user-provided method handle. 
This is very useful for implementing upcalls.
   * This class also contains the various layout constants that should be used 
by clients when describing native signatures
     (e.g. `C_LONG` and friends); these layouts contain additional ABI 
classfication information (in the form of layout
     attributes) which is used by the runtime to *infer* how Java arguments 
should be shuffled for the native call to take
     place.
  * Finally, this class provides some helper functions e.g. so that clients can 
convert Java strings into C strings and
    back.
* `NativeScope`
  * This is an helper class which allows clients to group together logically 
related allocations; that is, rather than
    allocating separate memory segments using separate *try-with-resource* 
constructs, a `NativeScope` allows clients to
    use a _single_ block, and allocate all the required segments there. This is 
not only an usability boost, but also a
    performance boost, since not all allocation requests will be turned into 
`malloc` calls.
* `MemorySegment`
  * Only one method added here - namely `handoff(NativeScope)` which allows a 
segment to be transferred onto an existing
    native scope.

### Safety

The foreign linker API is intrinsically unsafe; many things can go wrong when 
requesting a native method handle. For
instance, the description of the native signature might be wrong (e.g. have too 
many arguments) - and the runtime has,
in the general case, no way to detect such mismatches. For these reasons, 
obtaining a `CLinker` instance is
a *restricted* operation, which can be enabled by specifying the usual JDK 
property `-Dforeign.restricted=permit` (as
it's the case for other restricted method in the foreign memory API).

### Implementation changes

The Java changes associated with `LibraryLookup` are relative straightforward; 
the only interesting thing to note here
is that library loading does _not_ depend on class loaders, so `LibraryLookup` 
is not subject to the same restrictions
which apply to JNI library loading (e.g. same library cannot be loaded by 
different classloaders).

As for `NativeScope` the changes are again relatively straightforward; it is an 
API which sits neatly on top of the
foreign meory access API, providing some kind of allocation service which 
shares the same underlying memory segment(s),
and turns an allocation request into a segment slice, which is a much less 
expensive operation. `NativeScope` comes in
two variants: there are native scopes for which the allocation size is known a 
priori, and native scopes which can
grow - these two schemes are implemented by two separate subclasses of 
`AbstractNativeScopeImpl`.

Of course the bulk of the changes are to support the `CLinker` downcall/upcall 
routines. These changes cut pretty deep
into the JVM; I'll briefly summarize the goal of some of this changes - for 
further details, Jorn has put together a
detailed writeup which explains the rationale behind the VM support, with some 
references to the code [5].

The main idea behind foreign linker is to infer, given a Java method type 
(expressed as a `MethodType` instance) and
the description of the signature of a native function (expressed as a 
`FunctionDescriptor` instance) a _recipe_ that
can be used to turn a Java call into the corresponding native call targeting 
the requested native function.

This inference scheme can be defined in a pretty straightforward fashion by 
looking at the various ABI specifications
(for instance, see [6] for the SysV ABI, which is the one used on Linux/Mac). 
The various `CallArranger` classes, of
which we have a flavor for each supported platform, do exactly that kind of 
inference.

For the inference process to work, we need to attach extra information to 
memory layouts; it is no longer sufficient to
know e.g. that a layout is 32/64 bits - we need to know whether it is meant to 
represent a floating point value, or an
integral value; this knowledge is required because floating points are passed 
in different registers by most ABIs. For
this reason, `CLinker` offers a set of pre-baked, platform-dependent layout 
constants which contain the required
classification attributes (e.g. a `Clinker.TypeKind` enum value). The runtime 
extracts this attribute, and performs
classification accordingly.

A native call is decomposed into a sequence of basic, primitive operations, 
called `Binding` (see the great javadoc on
the `Binding.java` class for more info). There are many such bindings - for 
instance the `Move` binding is used to move
a value into a specific machine register/stack slot. So, the main job of the 
various `CallingArranger` classes is to
determine, given a Java `MethodType` and `FunctionDescriptor` what is the set 
of bindings associated with the
downcall/upcall.

At the heart of the foreign linker support is the `ProgrammableInvoker` class. 
This class effectively generates a
`MethodHandle` which follows the steps described by the various bindings 
obtained by `CallArranger`. There are actually
various strategies to interpret these bindings - listed below:

* basic intepreted mode; in this mode, all bindings are interpreted using a 
stack-based machine written in Java (see
  `BindingInterpreter`), except for the `Move` bindings. For these bindings, 
the move is implemented by allocating
  a *buffer* (whose size is ABI specific) and by moving all the lowered values 
into positions within this buffer. The
  buffer is then passed to a piece of assembly code inside the VM which takes 
values from the buffer and moves them in
  their expected registers/stack slots (note that each position in the buffer 
corresponds to a different register). This
  is the most general invocation mode, the more "customizable" one, but also 
the slowest - since for every call there is
  some extra allocation which takes place.

* specialized interpreted mode; same as before, but instead of interpreting the 
bindings with a stack-based interpreter,
  we generate a method handle chain which effectively interprets all the 
bindings (again, except `Move` ones).

* intrinsified mode; this is typically used in combination with the specialized 
interpreted mode described above
  (although it can also be used with the Java-based binding interpreter). The 
goal here is to remove the buffer
  allocation and copy by introducing an additional JVM intrinsic. If a native 
call recipe is constant (e.g. the set of
  bindings is constant, which is probably the case if the native method handle 
is stored in a `static`, `final` field),
  then the VM can generate specialized assembly code which interprets the 
`Move` binding without the need to go for an
  intermediate buffer. This gives us back performances that are on par with JNI.

For upcalls, the support is not (yet) as advanced, and only the basic 
interpreted mode is available there. We plan to
add support for intrinsified modes there as well, which should considerably 
boost perfomances (probably well beyond
what JNI can offer at the moment, since the upcall support in JNI is not very 
well optimized).

Again, for more readings on the internals of the foreign linker support, please 
refer to [5].

#### Test changes

Many new tests have been added to validate the foreign linker support; we have 
high level tests (see `StdLibTest`)
which aim at testing the linker from the perspective of code that clients could 
write. But we also have deeper
combinatorial tests (see `TestUpcall` and `TestDowncall`) which are meant to 
stress every corner of the ABI
implementation. There are also some great tests (see the `callarranger` folder) 
which test the various `CallArranger`s
for all the possible platforms; these tests adopt more of a white-box approach 
- that is, instead of treating the
linker machinery as a black box and verify that the support works by checking 
that the native call returned the results
we expected, these tests aims at checking that the set of bindings generated by 
the call arranger is correct. This also
mean that we can test the classification logic for Windows, Mac and Linux 
regardless of the platform we're executing on.

Some additional microbenchmarks have been added to compare the performances of 
downcall/upcall with JNI.

[1] - https://openjdk.java.net/jeps/389
[2] - https://openjdk.java.net/jeps/393
[3] - https://git.openjdk.java.net/jdk/pull/548
[4] - 
https://github.com/openjdk/panama-foreign/blob/foreign-jextract/doc/panama_ffi.md
[5] - 
http://cr.openjdk.java.net/~jvernee/docs/Foreign-abi%20downcall%20intrinsics%20technical%20description.html

-------------

Commit messages:
 - Fix more whitespaces
 - Fix whitespaces
 - Remove rejected file
 - More updates
 - Add new files
 - Merge with master
 - Merge branch 'master' into 8254162
 - Remove spurious check on MemoryScope::confineTo
 - Merge branch 'master' into 8254162
 - Simplify example in the toplevel javadoc
 - ... and 8 more: https://git.openjdk.java.net/jdk/compare/5d6a6255...7cf0ef09

Changes: https://git.openjdk.java.net/jdk/pull/634/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=634&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8254231
  Stats: 75468 lines in 264 files changed: 72636 ins; 1596 del; 1236 mod
  Patch: https://git.openjdk.java.net/jdk/pull/634.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/634/head:pull/634

PR: https://git.openjdk.java.net/jdk/pull/634

Reply via email to