Updated VM-bridges document

Brian Goetz Thu, 04 Apr 2019 05:36:17 -0700

At the BUR meeting, we discussed reshuffling the dependency graph to do 
forwarding+reversing bridges earlier, which has the effect of taking some 
pressure off of the descriptor language.  Here’s an updated doc on 
forwarding-reversing bridges in the VM.


I’ve dropped, for the time being, any discussion of replacing existing generic 
bridges with this mechanism; we can revisit that later if it makes sense.  
Instead, I’ve focused solely on the migration aspects.  I’ve also dropped any 
mention of implementation strategy, and instead appealed to “as if” behavior.  


## From Bridges to Forwarders

In the Java 1.0 days, `javac` was little more than an "assembler" for
the classfile format, translating source code to bytecode in a mostly
1:1 manner.  And, we liked it that way; the more predictable the
translation scheme, the more effective the runtime optimizations.
Even the major upgrade of Java 5 didn't significantly affect the
transparency of the resulting classfiles.

Over time, we've seen small divergences between the language model and
the classfile model, and each of these is a source of sharp edges.  In
Java 1.1 the addition of inner classes, and the mismatch between the
accessibility model in the language and the JVM (the language treated
a nest as a single entity; the JVM treats nest members as separate
classes) required _access bridges_ (`access$000` methods), which have
been the source of various issues over the years.  Twenty years later,
these methods were obviated by [_Nest-based Access Control_][jep181]
-- which represents the choice to align the VM model to the language
model, so these adaptation artifacts are no longer required.

In Java 5, while we were able to keep the translation largely stable
and transparent through the use of erasure, there was one point of
misalignment; several situations (covariant overrides, instantiated
generic supertypes) could give rise to the situation where two or more
method descriptors -- which the JVM treats as distinct methods -- are
treated by the language as if they correspond to the same method.  To
fool the VM, the compiler emits _bridge methods_ which forward
invocations from one signature to another.  And, as often happens when
we try to fool the VM, it ultimately has its revenge.

#### Example: covariant overrides

Java 5 introduced the ability to override a method but to provide a
more specific return type.  (Java 8 later extended this to bridges in
interfaces as well.)  For example:

```{.java}
class Parent {
    Object m() { ... }
}

class Child extends Parent {
    @Override
    String m() { ... }
}
```

`Parent` declares a method whose descriptor is `()Object`, and `Child`
declares a method with the same name whose descriptor is `()String`.
If we compiled this class in the obvious way, the method in `Child`
would not override the method in `Parent`, and anyone calling
`Parent.m()` would find themselves executing the wrong implementation.

The compiler addresses this by providing an additional implementation
of `m()`, whose descriptor is `()Object` (an actual override), marked
with `ACC_SYNTHETIC` and `ACC_BRIDGE`, whose body invokes `m()String`
(with `invokevirtual`), redirecting calls to the right implementation.
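
The bridge can be observed with core reflection.  The following is a minimal sketch, not part of the original proposal; the class name `CovariantBridgeDemo` and its helper methods are made up for illustration, and the method bodies are filled in with placeholder strings.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

public class CovariantBridgeDemo {
    static class Parent {
        Object m() { return "parent"; }
    }

    static class Child extends Parent {
        @Override
        String m() { return "child"; }
    }

    // Collects the return types of the compiler-generated bridges named m in Child.
    static List<String> bridgeReturnTypes() {
        List<String> found = new ArrayList<>();
        for (Method method : Child.class.getDeclaredMethods()) {
            if (method.getName().equals("m") && method.isBridge() && method.isSynthetic()) {
                found.add(method.getReturnType().getName());
            }
        }
        return found;
    }

    // Calls m() through the Parent descriptor ()Object; the bridge
    // redirects the call to Child's m()String.
    static Object callThroughParent() {
        Parent p = new Child();
        return p.m();
    }

    public static void main(String[] args) {
        System.out.println(bridgeReturnTypes());
        System.out.println(callThroughParent());
    }
}
```

`Child` declares two methods named `m` at the classfile level: the one written in source, and a synthetic bridge returning `Object`.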

#### Example: generic substitution

A similar situation arises when we have a generic substitution with a
superclass.  For example:

```{.java}
interface Parent<T> {
    void m(T x);
}

class Child implements Parent<String> {
    @Override
    public void m(String x) { ... }
}
```

At the language level, it is clear that `Child::m` intends to override
`Parent::m`.  But the descriptor of `Parent::m` is `(Object)V`, and
the descriptor of `Child::m` is `(String)V`, so again a bridge is
needed.

Because the two signatures -- `m(Object)V` and `m(String)V` -- have
been "merged" in this manner, the compiler will prevent subclasses
from overriding the bridge signature, in order to maintain the
integrity of the bridging scheme.  (The first time you encounter an
error message informing you of an illegal override in this situation,
it can be extremely confusing!)
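
As an illustration of the generic case, here is a small sketch (the class `GenericBridgeDemo`, the `last` field, and the helper names are hypothetical).  It checks that a bridge with the erased descriptor exists, and shows that the bridge's cast is where a badly typed raw call fails.

```java
import java.lang.reflect.Method;

public class GenericBridgeDemo {
    interface Parent<T> {
        void m(T x);
    }

    static class Child implements Parent<String> {
        String last;

        @Override
        public void m(String x) { last = x; }
    }

    // True if Child declares a bridge with the erased descriptor (Object)V.
    static boolean hasObjectBridge() {
        try {
            Method bridge = Child.class.getDeclaredMethod("m", Object.class);
            return bridge.isBridge();
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    // A raw call reaches the bridge, whose cast to String rejects an Integer.
    @SuppressWarnings({"unchecked", "rawtypes"})
    static boolean rawCallWithIntegerFails() {
        Parent raw = new Child();
        try {
            raw.m(Integer.valueOf(42));
            return false;
        } catch (ClassCastException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(hasObjectBridge());
        System.out.println(rawCallWithIntegerFails());
    }
}
```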

#### Anatomy of a bridge method

The bridge methods that are generated by the compiler today operate by
_forwarding_.  That is, a bridge method `m(X)` is always defined
relative to some other method `m(Y)`.  The body of a bridge method
pushes its arguments on the stack, adapting them (widening, casting,
boxing, etc.) from X to Y; invokes `m(Y)` with `invokevirtual`;
adapts the return value from Y to X; and returns it.
Because the bridge uses `invokevirtual`, it need only
be generated once, and invocations of the bridge may select a method
in a subclass.  (The bridge is generated at the "highest" place in the
inheritance hierarchy where the need for a bridge is identified,
which may be a class or an interface.)
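
The shape of a bridge body can be approximated in source code.  The sketch below is a hand-written stand-in, not what `javac` emits: the names `HandWrittenBridge`, `Base`, and `Sub` are invented, and the two methods are ordinary overloads rather than a real `ACC_BRIDGE` method.  It shows the argument cast, the virtual call, and the fact that a subclass override of the "real" method is selected even though the forwarding method is declared only once.

```java
public class HandWrittenBridge {
    static class Base {
        // The "real" method m(Y), with Y = String.
        String m(String s) { return "Base:" + s; }

        // Hand-written stand-in for a bridge m(X), with X = Object:
        // adapt the argument, make a virtual call, return the result.
        Object m(Object o) { return m((String) o); }
    }

    static class Sub extends Base {
        @Override
        String m(String s) { return "Sub:" + s; }
    }

    // Calls the Object-typed entry point; the virtual call inside it
    // selects whichever m(String) the receiver's class provides.
    static Object viaBridge(Base receiver, Object arg) {
        return receiver.m(arg);
    }

    public static void main(String[] args) {
        System.out.println(viaBridge(new Base(), "x"));
        System.out.println(viaBridge(new Sub(), "x"));
    }
}
```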

#### Bridges are brittle

Bridges can be brittle under separate compilation (and, there was a
nontrivial bug tail initially.)  Separate compilation can move bridges
from where already-compiled code expects them to be to places it does
not expect them.  This can cause the wrong method body to be invoked,
or can cause "bridge loops" (resulting in `StackOverflowError`).
(These anomalies disappear if the entire hierarchy is consistently
recompiled; they are solely an artifact of inconsistent separate
compilation.)

The basic problem with bridge methods is that the language views the
two method descriptors as two faces of the same actual method, whereas
the JVM sees them as distinct methods.  (And, reflection also has to
participate in the charade.)

#### Limits of bridge methods

Bridge methods have worked well enough for the uses to which we've put
them, but there are a number of desirable scenarios where bridge
methods ultimately run out of gas.  These scenarios stem from various
forms of _migration_, and the desire to make these migrations
binary-compatible.

The problem of migration arises both from language evolution (Valhalla
aims to enable compatible migration from value-based classes to value
types, and from erased generics to specialized generics), as well as from the
ordinary evolution of libraries.

An example of the "ordinary migration" problem is the replacement of
the old `Date` classes with `LocalDateTime` and friends.  We can
easily add the new classes to the JDK, along with conversions to and
from the old types, but there are existing APIs that still deal in
`Date` -- and if we ever want to be able to deprecate the old
versions, we have to find a way to compatibly migrate APIs that deal
in `Date` to the new types.  (The extreme form of this is the
"Collections 2.0" problem; we could surely write a new Collections
library, but when nearly every API deals in `List`, unless we can
migrate these away, what would be the point?)

Migration scenarios like these pose two problems that bridge methods
cannot solve:

  - **Fields.**  While we can often reroute method invocations with
    bridges, we have no similar mechanism for fields.  If a field
    signature changes (whether due to changes in the translation
    strategy, or changes in the API), there is no way to make this
    binary-compatible.
  - **Overrides.**  Bridges allow us to reroute _invocations_ of
    methods, but not _overrides_ of methods.  If a method descriptor
    in a non-final class changes, but has subclasses in a separate
    maintenance domain that continue to use the old descriptor, what
    is intended to be an override may accidentally become an overload,
    or might override the bridge instead of the actual method.
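
The override problem can be simulated in a single source file.  This is only a sketch under stated assumptions: in the real scenario `LegacySub` would live in a separately compiled classfile, and the names `AccidentalOverloadDemo`, `Lib`, and `LegacySub` are hypothetical.

```java
public class AccidentalOverloadDemo {
    // Library class after migration: m(Object) has become m(String).
    static class Lib {
        String m(String s) { return "Lib.m(String)"; }
    }

    // Client subclass written against the old API; it still declares m(Object).
    static class LegacySub extends Lib {
        // No @Override here: against the migrated Lib this is an overload,
        // even though its author intended an override.
        String m(Object o) { return "LegacySub.m(Object)"; }
    }

    // A caller compiled against the new descriptor never reaches the subclass body.
    static String callNewDescriptor() {
        Lib receiver = new LegacySub();
        return receiver.m("hello");
    }

    public static void main(String[] args) {
        System.out.println(callNewDescriptor());
    }
}
```

The subclass author's code is silently skipped, which is the failure mode a forwarding bridge alone cannot repair.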

#### Wildcards and polymorphic fields

A non-migration application for bridges that comes out of Valhalla is
_wildcards_.  For a class `C<T>` with a method `m(T)`, the wildcard
`C<?>` (the class type) has an abstract method `m(Object)`, which
needs to be implemented by each species type.  This is, effectively, a
bridge; the method `m(Object)` generated for the species adapts the
arguments and forwards to the "real" (`m(T)`) method.  While this
could be implemented using straightforward code generation in the
static compiler, it may be preferable to treat this as a bridge as
well.

More importantly, the same is true for fields; if `C<T>` has a field
of type `T`, then the wildcard `C<?>` will expose this field as if it
were of type `Object`.  This cannot be implemented using
straightforward code generation in the static compiler (without
undermining the promise of migration compatibility.)

## Forwarding

In this document, we attempt to learn from the history of bridges, and
create a new mechanism -- _forwarders_ -- that works with the JVM
instead of against it.  This raises the level of expressivity of
classfiles and opens the possibility of greater laziness.  It is
possible that traditional bridging scenarios can eventually be handled
by forwarders too, but for purposes of this document, we will focus
exclusively on the migration scenarios.

A _forwarder_ is a non-abstract method that, instead of a `Code`
attribute, has a `Forwarding` attribute:

```
Forwarding {
    u2 name;
    u4 length;
    u2 forwardeeDescriptor;
}
```

Let's assume that forwarders have the `ACC_FORWARDER` and
`ACC_SYNTHETIC` bits (in reality we will likely overload
`ACC_BRIDGE`).
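
To make the proposed layout concrete, here is a hedged sketch of how a tool might decode it.  It assumes that `name` and `forwardeeDescriptor` are constant pool indexes and that the fields use the classfile's usual big-endian encoding; the document does not spell either out, and the class `ForwardingAttributeReader` is purely illustrative.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

public class ForwardingAttributeReader {
    // Hypothetical in-memory form of the proposed attribute.
    static final class Forwarding {
        final int nameIndex;                 // u2 name (assumed: constant pool index)
        final long length;                   // u4 length
        final int forwardeeDescriptorIndex;  // u2 forwardeeDescriptor (assumed: constant pool index)

        Forwarding(int nameIndex, long length, int forwardeeDescriptorIndex) {
            this.nameIndex = nameIndex;
            this.length = length;
            this.forwardeeDescriptorIndex = forwardeeDescriptorIndex;
        }
    }

    // Decodes the three fields in declaration order; DataInputStream is big-endian.
    static Forwarding read(byte[] bytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            int nameIndex = in.readUnsignedShort();
            long length = in.readInt() & 0xFFFFFFFFL;   // treat u4 as unsigned
            int descriptorIndex = in.readUnsignedShort();
            return new Forwarding(nameIndex, length, descriptorIndex);
        } catch (IOException e) {
            throw new IllegalArgumentException("truncated Forwarding attribute", e);
        }
    }

    public static void main(String[] args) {
        Forwarding f = read(new byte[] {0, 7, 0, 0, 0, 2, 0, 9});
        System.out.println(f.nameIndex + " " + f.length + " " + f.forwardeeDescriptorIndex);
    }
}
```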

When compiling a method (concrete or abstract) that has been migrated
from an old descriptor to a new descriptor (such as migrating
`m(Object)V` to `m(String)V`), the compiler would generate an ordinary
method with the new descriptor, and a forwarder with the old
descriptor which forwards to the new descriptor.  This captures the
statement that there used to be a method called `m` with the old
descriptor, but it migrated to the new descriptor -- so that the JVM
can transparently adjust the behavior of clients and overriders that
were not aware of the migration.

#### Invocation of forwarders

Given a forwarder in a class with name `N` and descriptor `D` that
forwards to descriptor `E`, define `M` by:

    MethodHandle M = MethodHandles.lookup()
                                  .findVirtual(thisClass, N, E);

If the forwarder is _selected_ as the target of an `invokevirtual`,
the behavior should be _as if_ the caller invoked `M.asType(D)`, where
the arguments of `D` are adapted to their counterparts in `E`, and the
return type in `E` is adapted back to the return type in `D`.  (We may
wish to reduce the set of built-in adaptations to a smaller set than
those implemented by `MethodHandle::asType`, for simplicity, based on
requirements.)
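
The "as if" behavior can be tried out today with the `java.lang.invoke` API.  The sketch below is an illustration only, not the VM mechanism: `Target` stands in for a class whose `m(Object)Object` was migrated to `m(String)String`, and all class and helper names are hypothetical.  Note that a handle obtained from `findVirtual` takes the receiver as a leading parameter, so the receiver type is prepended to `D` when calling `asType`.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

public class ForwarderInvocationSketch {
    // Stand-in for a migrated class: only the new descriptor has a body.
    static class Target {
        String m(String s) { return "new:" + s; }
    }

    // Builds M.asType(D) for N = "m", E = (String)String, D = (Object)Object.
    static MethodHandle forwarderBehavior() {
        try {
            MethodType e = MethodType.methodType(String.class, String.class);
            MethodHandle m = MethodHandles.lookup().findVirtual(Target.class, "m", e);
            MethodType d = MethodType.methodType(Object.class, Target.class, Object.class);
            return m.asType(d);
        } catch (ReflectiveOperationException ex) {
            throw new IllegalStateException(ex);
        }
    }

    // Simulates a legacy caller that still uses the old descriptor.
    static Object invokeOldDescriptor(Object arg) {
        try {
            return forwarderBehavior().invoke(new Target(), arg);
        } catch (RuntimeException | Error ex) {
            throw ex;
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    // The Object-to-String adaptation is a cast, so a non-String argument fails.
    static boolean rejects(Object arg) {
        try {
            invokeOldDescriptor(arg);
            return false;
        } catch (ClassCastException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(invokeOldDescriptor("x"));
        System.out.println(rejects(42));
    }
}
```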

Because forwarders exist for migration, we hope that over time,
callers will migrate from the old descriptor to the new, rendering
forwarders vestigial.  As a result, we may wish to defer as much of
the bridge generation logic as possible to first-selection time.

#### Forwarders for fields

The forwarding strategy can be applied to fields as well.  In this
case, the forwardee descriptor is that of a field descriptor, and the
behavior has the same semantics as adapting a target field accessor
method handle to the type of the forwarder descriptor.  (If the forwarder
field is static, then the forwardee field should be static too.)
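
The field case can be sketched the same way, using field accessor method handles.  This is an illustration under assumptions, not the proposed VM behavior: `Holder` and its `name` field are hypothetical, with `String` playing the role of the forwardee descriptor and `Object` the forwarder descriptor.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

public class FieldForwarderSketch {
    static class Holder {
        String name = "initial";   // forwardee field, of type String
    }

    // Getter handle (Holder)String viewed through the forwarder type (Holder)Object.
    static MethodHandle getterAsObject() {
        try {
            return MethodHandles.lookup()
                    .findGetter(Holder.class, "name", String.class)
                    .asType(MethodType.methodType(Object.class, Holder.class));
        } catch (ReflectiveOperationException ex) {
            throw new IllegalStateException(ex);
        }
    }

    // Setter handle (Holder,String)void viewed as (Holder,Object)void.
    static MethodHandle setterAsObject() {
        try {
            return MethodHandles.lookup()
                    .findSetter(Holder.class, "name", String.class)
                    .asType(MethodType.methodType(void.class, Holder.class, Object.class));
        } catch (ReflectiveOperationException ex) {
            throw new IllegalStateException(ex);
        }
    }

    // Writes and then reads the field purely through the Object-typed view.
    static Object roundTrip(Object value) {
        try {
            Holder h = new Holder();
            setterAsObject().invoke(h, value);
            return getterAsObject().invoke(h);
        } catch (RuntimeException | Error ex) {
            throw ex;
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("updated"));
    }
}
```

As with methods, writes through the wider view are checked by a cast, so storing a non-`String` value would throw `ClassCastException`.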

#### Overriding of forwarders

Capturing forwarding information declaratively enables us to detect
when a class overrides a forwarder descriptor with a non-forwarder
(which indicates that the subclass is out of date with its supertypes)
and redirect the override to the actual method (with arguments and
return values adapted.)

Given a forwarder in a class `A` with name `N` and descriptor `D` that
forwards to descriptor `E`, suppose a subclass `B` overrides the
forwarder with `N(D)`.  Let `M` be the method handle that corresponds
to the `Code` attribute of `B.N(D)`.  We would like it to behave as if
`B` had instead specified a method `N(E)`, whose `Code` attribute
corresponded to `M.asType(E)`.
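
The reverse adaptation can also be approximated with method handles.  In this sketch, `B` is a hypothetical out-of-date subclass whose body was written against `D = (Object)Object`, and `findVirtual` is used as a stand-in for "the method handle that corresponds to the `Code` attribute"; the result is that body viewed through `E = (String)String`.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

public class OverrideRedirectSketch {
    // Out-of-date subclass body, still using the old descriptor D.
    static class B {
        Object m(Object o) { return "legacy saw " + o; }
    }

    // M.asType(E): the legacy body viewed through the new descriptor.
    // The receiver type is prepended because the handle is virtual.
    static MethodHandle asNewDescriptor() {
        try {
            MethodType d = MethodType.methodType(Object.class, Object.class);
            MethodHandle m = MethodHandles.lookup().findVirtual(B.class, "m", d);
            return m.asType(MethodType.methodType(String.class, B.class, String.class));
        } catch (ReflectiveOperationException ex) {
            throw new IllegalStateException(ex);
        }
    }

    // Simulates a caller of the new descriptor reaching the legacy body.
    static String invokeNewDescriptor(String arg) {
        try {
            return (String) asNewDescriptor().invoke(new B(), arg);
        } catch (RuntimeException | Error ex) {
            throw ex;
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(invokeNewDescriptor("x"));
    }
}
```

Here the argument adaptation (`String` to `Object`) always succeeds, while the return adaptation (`Object` to `String`) is a cast that would fail if the legacy body returned something other than a `String`.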

#### Additional adaptations

The uses we anticipate for L100 all can be done with `asType()`
adaptations (in fact, with a subset of `asType()` adaptations).
However, if we wish to support user-provided migrations (such as
migrating libraries that use `Date` to `LocalDateTime`) or migrate
complex JDK APIs such as `Stream`, we may need to provide additional
adaptation logic in the `Forwarding` attribute.  Let's extend it
with an `adapter` field:

```
Forwarding {
    u2 name;
    u4 length;
    u2 forwardeeDescriptor;
    u2 adapter;
}
```

where `adapter` is the constant pool index of a method handle whose
type is `(MethodHandle;MethodType;)MethodHandle;` (note that the
method handle for `MethodHandle::asType` has this shape).  If
`adapter` is zero, we use the built-in adaptations; if it is nonzero,
we use the referred-to method handle to adapt between the forwarder
and forwardee descriptors (in both directions).  
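
The remark about the shape of `MethodHandle::asType` can be checked directly.  The following sketch looks up `asType` as a method handle, confirms that its type takes a `MethodHandle` and a `MethodType` and returns a `MethodHandle`, and then uses it as an adapter.  The class `AdapterShapeSketch` and the `greet` method are hypothetical; how the VM would actually invoke an adapter is not specified here.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

public class AdapterShapeSketch {
    // Stand-in forwardee with descriptor (String)String.
    static String greet(String s) { return "hi " + s; }

    // A handle for MethodHandle::asType; the receiver becomes the first parameter.
    static MethodHandle asTypeAdapter() {
        try {
            return MethodHandles.lookup().findVirtual(MethodHandle.class, "asType",
                    MethodType.methodType(MethodHandle.class, MethodType.class));
        } catch (ReflectiveOperationException ex) {
            throw new IllegalStateException(ex);
        }
    }

    // True if h has the adapter shape (MethodHandle, MethodType) -> MethodHandle.
    static boolean hasAdapterShape(MethodHandle h) {
        return h.type().equals(MethodType.methodType(
                MethodHandle.class, MethodHandle.class, MethodType.class));
    }

    // Applies the adapter to greet, then calls the result with the forwarder type.
    static Object adaptAndCall(Object arg) {
        try {
            MethodHandle target = MethodHandles.lookup().findStatic(
                    AdapterShapeSketch.class, "greet",
                    MethodType.methodType(String.class, String.class));
            MethodType forwarderType = MethodType.methodType(Object.class, Object.class);
            MethodHandle adapted =
                    (MethodHandle) asTypeAdapter().invokeExact(target, forwarderType);
            return adapted.invoke(arg);
        } catch (RuntimeException | Error ex) {
            throw ex;
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(hasAdapterShape(asTypeAdapter()));
        System.out.println(adaptAndCall("x"));
    }
}
```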

#### Adaptation failures and limitations

Whatever adaptations we are prepared to do between forwarder and
forwardee, we need to be prepared to do them in both directions; if a
method `m(int)` is migrated to `m(long)`, invocation arguments will be
adapted `int` to `long`, but if overridden, we'll do the reverse
adaptation on the (out of date) overrider `m(int)`.  Given that most
adaptations are not between isomorphic domains, there will be cases in
one direction or the other that cannot be represented (`long` to
`int` is lossy; `Integer` to `int` can NPE; `Object` to `String` can
CCE.)
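
The asymmetry shows up in the existing method handle API.  `MethodHandle::asType` applies only widening primitive conversions, so it refuses to view an `(int)int` handle as `(long)long`, whereas `MethodHandles.explicitCastArguments` permits the narrowing and silently discards the high-order bits.  The sketch below is illustrative; `legacy` is a hypothetical out-of-date overrider, and nothing here implies which adaptation set forwarders would use.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.WrongMethodTypeException;

public class LossyReverseAdaptation {
    // Out-of-date overrider, still written against m(int).
    static int legacy(int x) { return x + 1; }

    static MethodHandle legacyHandle() {
        try {
            return MethodHandles.lookup().findStatic(LossyReverseAdaptation.class,
                    "legacy", MethodType.methodType(int.class, int.class));
        } catch (ReflectiveOperationException ex) {
            throw new IllegalStateException(ex);
        }
    }

    // asType would need a long-to-int conversion for the argument, which it rejects.
    static boolean asTypeRejectsNarrowing() {
        try {
            legacyHandle().asType(MethodType.methodType(long.class, long.class));
            return false;
        } catch (WrongMethodTypeException expected) {
            return true;
        }
    }

    // explicitCastArguments allows the narrowing cast, which is lossy.
    static long callWithCast(long arg) {
        try {
            MethodHandle adapted = MethodHandles.explicitCastArguments(
                    legacyHandle(), MethodType.methodType(long.class, long.class));
            return (long) adapted.invokeExact(arg);
        } catch (RuntimeException | Error ex) {
            throw ex;
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(asTypeRejectsNarrowing());
        // The argument below does not fit in an int; only its low 32 bits survive.
        System.out.println(callWithCast((1L << 32) + 5));
    }
}
```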

Our guidance is that adaptations should form a projection/embedding
pair; this gives us the nice property that we can repeat adaptations
with impunity (if the first adaptation doesn't fail, adapting back and
back again is guaranteed to be an identity.)  Even within this,
though, there are often multiple ways to implement the adaptation; an
embedding can throw on an out-of-range value, or it could pick an
in-range target and map to that.  So, for example, if we migrated
`Collection::size` to return `long`, for `int`-desiring clients, we
could clamp values greater than `Integer.MAX_VALUE` to `Integer.MAX_VALUE`, rather
than throwing -- and this would likely be a better outcome for most
clients.  The choice of adaptation should ultimately be left to
metadata present at the declaration of the migrated method.
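
The two policies for the narrowing direction can be written as small adapter functions.  This is a sketch with hypothetical names (`clampToInt`, `strictToInt`, `bigSize`); `bigSize` stands in for a migrated `size()` that returns `long`, and `MethodHandles.filterReturnValue` is used only to show how such an adapter could be attached to a return value.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

public class ClampingAdapterSketch {
    // Policy 1: saturate out-of-range values instead of failing.
    static int clampToInt(long value) {
        if (value > Integer.MAX_VALUE) return Integer.MAX_VALUE;
        if (value < Integer.MIN_VALUE) return Integer.MIN_VALUE;
        return (int) value;
    }

    // Policy 2: throw ArithmeticException on out-of-range values.
    static int strictToInt(long value) {
        return Math.toIntExact(value);
    }

    // Stand-in for a migrated size() that now returns long.
    static long bigSize() { return 5_000_000_000L; }

    // What an int-desiring client would observe under the clamping policy.
    static int sizeForIntClient() {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            MethodHandle size = lookup.findStatic(ClampingAdapterSketch.class,
                    "bigSize", MethodType.methodType(long.class));
            MethodHandle clamp = lookup.findStatic(ClampingAdapterSketch.class,
                    "clampToInt", MethodType.methodType(int.class, long.class));
            MethodHandle adapted = MethodHandles.filterReturnValue(size, clamp);
            return (int) adapted.invokeExact();
        } catch (RuntimeException | Error ex) {
            throw ex;
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(sizeForIntClient() == Integer.MAX_VALUE);
        System.out.println(clampToInt(7L));
    }
}
```

For in-range values both policies agree and round-trip exactly through `long`, which is the repeatability property described above; they differ only in how they treat values that have no `int` counterpart.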

#### Type checking and corner cases

A forwarder should always forward to a non-forwarder method (concrete
or abstract) _in the same class_.  (Because they are in the same
class, there is no chance that separate compilation can cause a
forwarder to point to another forwarder.)

In general, we expect that forwarders are only ever overridden by
non-forwarder methods (and then, only in out-of-date classfiles).
(This means that invocations that resolve to the forwarder will
generally select the forwarder.)

  - If a forwarder method is overridden by another forwarder method,
    this is probably a result of a migration happening in a subclass
    and then later the same migration happens in a superclass.  We can
    let the override proceed.
  - If a forwarder is overridden by a legacy bridge, we have a few bad
    choices.  We could accept the bridge (which would interfere with
    forwarding), or discard the bridge (which could cause other
    anomalies.)  If we leave existing bridge generation alone, this
    case is unlikely and accepting the bridge is probably a reasonable
    answer; if we migrate bridges to use forwarding, we'd probably
    want to err in the other direction.
  - If a forwarder has a forwardee descriptor that is exactly the
    same as the forwarder, the forwarder should be discarded.  (These
    can arise from specialization situations.)




[jep181]: https://openjdk.java.net/jeps/181
