At the BUR meeting, we discussed reshuffling the dependency graph to do forwarding+reversing bridges earlier, which has the effect of taking some pressure off of the descriptor language. Here’s an updated doc on forwarding-reversing bridges in the VM.
I’ve dropped, for the time being, any discussion of replacing existing generic bridges with this mechanism; we can revisit that later if it makes sense. Instead, I’ve focused solely on the migration aspects. I’ve also dropped any mention of implementation strategy, and instead appealed to “as if” behavior.

## From Bridges to Forwarders

In the Java 1.0 days, `javac` was little more than an "assembler" for the classfile format, translating source code to bytecode in a mostly 1:1 manner. And, we liked it that way; the more predictable the translation scheme, the more effective the runtime optimizations. Even the major upgrade of Java 5 didn't significantly affect the transparency of the resulting classfiles.

Over time, we've seen small divergences between the language model and the classfile model, and each of these is a source of sharp edges. In Java 1.1, the addition of inner classes, and the mismatch between the accessibility model in the language and the JVM (the language treated a nest as a single entity; the JVM treats nest members as separate classes), required _access bridges_ (`access$000` methods), which have been the source of various issues over the years. Twenty years later, these methods were obviated by [_Nest-based Access Control_][jep181] -- which represents the choice to align the VM model to the language model, so these adaptation artifacts are no longer required.

In Java 5, while we were able to keep the translation largely stable and transparent through the use of erasure, there was one point of misalignment: several situations (covariant overrides, instantiated generic supertypes) could give rise to two or more method descriptors -- which the JVM treats as distinct methods -- being treated by the language as if they correspond to the same method. To fool the VM, the compiler emits _bridge methods_ which forward invocations from one signature to another.
And, as often happens when we try to fool the VM, it ultimately has its revenge.

#### Example: covariant overrides

Java 5 introduced the ability to override a method but to provide a more specific return type. (Java 8 later extended this to bridges in interfaces as well.) For example:

```{.java}
class Parent {
    Object m() { ... }
}

class Child extends Parent {
    @Override
    String m() { ... }
}
```

`Parent` declares a method whose descriptor is `()Object`, and `Child` declares a method with the same name whose descriptor is `()String`. If we compiled this class in the obvious way, the method in `Child` would not override the method in `Parent`, and anyone calling `Parent.m()` would find themselves executing the wrong implementation.

The compiler addresses this by providing an additional implementation of `m()`, whose descriptor is `()Object` (an actual override), marked with `ACC_SYNTHETIC` and `ACC_BRIDGE`, whose body invokes `m()String` (with `invokevirtual`), redirecting calls to the right implementation.

#### Example: generic substitution

A similar situation arises when we have a generic substitution with a supertype. For example:

```{.java}
interface Parent<T> {
    void m(T x);
}

class Child implements Parent<String> {
    @Override
    public void m(String x) { ... }
}
```

At the language level, it is clear that `Child::m` intends to override `Parent::m`. But the descriptor of `Parent::m` is `(Object)V`, and the descriptor of `Child::m` is `(String)V`, so again a bridge is needed.

Because the two signatures -- `m(Object)V` and `m(String)V` -- have been "merged" in this manner, the compiler will prevent subclasses from overriding the bridge signature, in order to maintain the integrity of the bridging scheme. (The first time you encounter an error message informing you of an illegal override in this situation, it can be extremely confusing!)

#### Anatomy of a bridge method

The bridge methods that are generated by the compiler today operate by _forwarding_.
That is, a bridge method `m(X)` is always defined relative to some other method `m(Y)`, and the body of a bridge method pushes its arguments on the stack, adapts them (widening, casting, boxing, etc.) from X to Y, invokes `m(Y)` with `invokevirtual`, adapts the return value from Y to X, and returns it. Because the bridge uses `invokevirtual`, it need only be generated once, and invocations of the bridge may select a method in a subclass. (The bridge is generated at the "highest" place in the inheritance hierarchy where the need for a bridge is identified, which may be a class or an interface.)

#### Bridges are brittle

Bridges can be brittle under separate compilation (and there was a nontrivial bug tail initially). Separate compilation can move bridges from where already-compiled code expects them to be to places it does not expect them. This can cause the wrong method body to be invoked, or can cause "bridge loops" (resulting in `StackOverflowError`). (These anomalies disappear if the entire hierarchy is consistently recompiled; they are solely an artifact of inconsistent separate compilation.)

The basic problem with bridge methods is that the language views the two method descriptors as two faces of the same actual method, whereas the JVM sees them as distinct methods. (And, reflection also has to participate in the charade.)

#### Limits of bridge methods

Bridge methods have worked well enough for the uses to which we've put them, but there are a number of desirable scenarios where bridge methods ultimately run out of gas. These scenarios stem from various forms of _migration_, and the desire to make these migrations binary-compatible.

The problem of migration arises both from language evolution (Valhalla aims to enable compatible migration from value-based classes to value types, and from erased generics to specialized generics), as well as from the ordinary evolution of libraries.

An example of the "ordinary migration" problem is the replacement of the old `Date` classes with `LocalDateTime` and friends. We can easily add the new classes to the JDK, along with conversions to and from the old types, but there are existing APIs that still deal in `Date` -- and if we ever want to be able to deprecate the old versions, we have to find a way to compatibly migrate APIs that deal in `Date` to the new types. (The extreme form of this is the "Collections 2.0" problem; we could surely write a new Collections library, but when nearly every API deals in `List`, unless we can migrate these away, what would be the point?)

Migration scenarios like these pose two problems that bridge methods cannot solve:

- **Fields.** While we can often reroute method invocations with bridges, we have no similar mechanism for fields. If a field signature changes (whether due to changes in the translation strategy, or changes in the API), there is no way to make this binary-compatible.
- **Overrides.** Bridges allow us to reroute _invocations_ of methods, but not _overrides_ of methods. If a method descriptor in a non-final class changes, but has subclasses in a separate maintenance domain that continue to use the old descriptor, what is intended to be an override may accidentally become an overload, or might override the bridge instead of the actual method.

#### Wildcards and polymorphic fields

A non-migration application for bridges that comes out of Valhalla is _wildcards_. For a class `C<T>` with a method `m(T)`, the wildcard `C<?>` (the class type) has an abstract method `m(Object)`, which needs to be implemented by each species type. This is, effectively, a bridge; the method `m(Object)` generated for the species adapts the arguments and forwards to the "real" (`m(T)`) method. While this could be implemented using straightforward code generation in the static compiler, it may be preferable to treat this as a bridge as well.

More importantly, the same is true for fields; if `C<T>` has a field of type `T`, then the wildcard `C<?>` will expose this field as if it were of type `Object`. This cannot be implemented using straightforward code generation in the static compiler (without undermining the promise of migration compatibility).

## Forwarding

In this document, we attempt to learn from the history of bridges, and create a new mechanism -- _forwarders_ -- that works with the JVM instead of against it. This raises the level of expressivity of classfiles and opens the possibility of greater laziness. It is possible that traditional bridging scenarios can eventually be handled by forwarders too, but for purposes of this document, we will focus exclusively on the migration scenarios.

A _forwarder_ is a non-abstract method that, instead of a `Code` attribute, has a `Forwarding` attribute:

```
Forwarding {
    u2 name;
    u4 length;
    u2 forwardeeDescriptor;
}
```

Let's assume that forwarders have the `ACC_FORWARDER` and `ACC_SYNTHETIC` bits (in reality we will likely overload `ACC_BRIDGE`).

When compiling a method (concrete or abstract) that has been migrated from an old descriptor to a new descriptor (such as migrating `m(Object)V` to `m(String)V`), the compiler would generate an ordinary method with the new descriptor, and a forwarder with the old descriptor which forwards to the new descriptor. This captures the statement that there used to be a method called `m` with the old descriptor, but it migrated to the new descriptor -- so that the JVM can transparently adjust the behavior of clients and overriders that were not aware of the migration.

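
As a rough source-level picture of the `m(Object)V` to `m(String)V` migration just described, the sketch below uses a hypothetical class `MigratedApi`. The hand-written `m(Object)` is only a stand-in: an actual forwarder would carry a `Forwarding` attribute rather than a `Code` attribute, so it cannot be expressed in Java source, and the cast shown is just one of the adaptations a forwarder might perform.

```java
// Hypothetical illustration only: a real forwarder has a Forwarding attribute
// and no Code attribute, so this hand-written method merely mimics its effect.
class MigratedApi {
    // Records which method body actually ran, for demonstration purposes.
    final StringBuilder calls = new StringBuilder();

    // The method as it exists after migration: descriptor m(String)V.
    void m(String x) {
        calls.append("m(String):").append(x).append(';');
    }

    // Stand-in for the forwarder kept at the old descriptor m(Object)V.
    void m(Object x) {
        // Adapt the argument from the old type to the new one and forward.
        // Throws ClassCastException if x is not a String.
        m((String) x);
    }

    public static void main(String[] args) {
        MigratedApi api = new MigratedApi();
        Object legacyArg = "hello";
        api.m(legacyArg);   // legacy call site, compiled against m(Object)V
        api.m("world");     // migrated call site, compiled against m(String)V
        System.out.println(api.calls);
    }
}
```

Both call sites end up running the `m(String)` body; the legacy one simply takes a detour through the old descriptor.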
#### Invocation of forwarders

Given a forwarder in a class with name `N` and descriptor `D` that forwards to descriptor `E`, define `M` by:

    MethodHandle M = MethodHandles.lookup()
                                  .findVirtual(thisClass, N, E);

If the forwarder is _selected_ as the target of an `invokevirtual`, the behavior should be _as if_ the caller invoked `M.asType(D)`, where the arguments of `D` are adapted to their counterparts in `E`, and the return type in `E` is adapted back to the return type in `D`. (We may wish to reduce the set of built-in adaptations to a smaller set than those implemented by `MethodHandle::asType`, for simplicity, based on requirements.)

Because forwarders exist for migration, we hope that over time, callers will migrate from the old descriptor to the new, rendering forwarders vestigial. As a result, we may wish to defer as much of the bridge generation logic as possible to first-selection time.

#### Forwarders for fields

The forwarding strategy can be applied to fields as well. In this case, the forwardee descriptor is a field descriptor, and the behavior has the same semantics as adapting a target field accessor method handle to the type of the forwarder descriptor. (If the forwarder field is static, then the forwardee field should be static too.)

#### Overriding of forwarders

Capturing forwarding information declaratively enables us to detect when a class overrides a forwarder descriptor with a non-forwarder (which indicates that the subclass is out of date with its supertypes) and redirect the override to the actual method (with arguments and return values adapted).

Given a forwarder in a class `A` with name `N` and descriptor `D` that forwards to descriptor `E`, suppose a subclass `B` overrides the forwarder with `N(D)`. Let `M` be the method handle that corresponds to the `Code` attribute of `B.N(D)`. We would like it to behave as if `B` had instead specified a method `N(E)`, whose `Code` attribute corresponded to `M.asType(E)`.

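
The two "as if" rules (forwarding an invocation through `M.asType(D)`, and reversing an out-of-date override through `M.asType(E)`) can be tried out with ordinary method handles. This is a minimal sketch, not a description of how the VM would implement forwarders: the classes `A` and `B`, the method name `m`, and the choice of `D = (Object)Object` and `E = (String)String` are all assumptions made for illustration. Because `findVirtual` prepends the receiver type to the handle's type, the sketch inserts the receiver into the target type before calling `asType`.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

class ForwarderAsTypeSketch {
    // Hypothetical descriptors: D is the old one, E is the new one.
    static final MethodType D = MethodType.methodType(Object.class, Object.class);
    static final MethodType E = MethodType.methodType(String.class, String.class);

    static class A {
        // The real method, declared with the new descriptor E.
        String m(String s) { return "A.m(String) got " + s; }
    }

    static class B extends A {
        // An out-of-date "override", still written against the old descriptor D.
        Object m(Object o) { return "B.m(Object) got " + o; }
    }

    // Invocation: a call through the old descriptor behaves like M.asType(D).
    static Object invokeViaForwarder(A receiver, Object arg) {
        try {
            MethodHandle m = MethodHandles.lookup().findVirtual(A.class, "m", E);
            MethodHandle asD = m.asType(D.insertParameterTypes(0, A.class));
            return asD.invoke(receiver, arg);
        } catch (RuntimeException | Error ex) {
            throw ex;   // e.g. ClassCastException from a failed adaptation
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    // Overriding: the legacy body of B.m(D) is exposed as if it were m(E).
    static String invokeLegacyOverride(B receiver, String arg) {
        try {
            MethodHandle m = MethodHandles.lookup().findVirtual(B.class, "m", D);
            MethodHandle asE = m.asType(E.insertParameterTypes(0, B.class));
            return (String) asE.invoke(receiver, arg);
        } catch (RuntimeException | Error ex) {
            throw ex;
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(invokeViaForwarder(new A(), "x"));
        System.out.println(invokeLegacyOverride(new B(), "y"));
    }
}
```

Passing a non-`String` argument to `invokeViaForwarder` fails with a `ClassCastException`, which is the kind of one-directional adaptation failure that any forwarding scheme has to account for.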
#### Additional adaptations

The uses we anticipate for L100 can all be done with `asType()` adaptations (in fact, with a subset of `asType()` adaptations). However, if we wish to support user-provided migrations (such as migrating libraries that use `Date` to `LocalDateTime`) or migrate complex JDK APIs such as `Stream`, we may need to provide additional adaptation logic in the `Forwarding` attribute. Let's extend the `Forwarding` attribute:

```
Forwarding {
    u2 name;
    u4 length;
    u2 forwardeeDescriptor;
    u2 adapter;
}
```

where `adapter` is the constant pool index of a method handle whose type is `(MethodHandle;MethodType;)MethodHandle;` (note that the method handle for `MethodHandle::asType` has this shape). If `adapter` is zero, we use the built-in adaptations; if it is nonzero, we use the referred-to method handle to adapt between the forwarder and forwardee descriptors (in both directions).

#### Adaptation failures and limitations

Whatever adaptations we are prepared to do between forwarder and forwardee, we need to be prepared to do them in both directions; if a method `m(int)` is migrated to `m(long)`, invocation arguments will be adapted `int` to `long`, but if overridden, we'll do the reverse adaptation on the (out of date) overrider `m(int)`. Given that most adaptations are not between isomorphic domains, there will be cases in one direction or the other that cannot be represented (`long` to `int` is lossy; `Integer` to `int` can NPE; `Object` to `String` can CCE).

Our guidance is that adaptations should form a projection/embedding pair; this gives us the nice property that we can repeat adaptations with impunity (if the first adaptation doesn't fail, adapting back and back again is guaranteed to be an identity). Even within this, though, there are often multiple ways to implement the adaptation; an embedding can throw on an out-of-range value, or it could pick an in-range target and map to that.
So, for example, if we migrated `Collection::size` to return `long`, for `int`-desiring clients, we could clamp values greater than `Integer.MAX_VALUE` to `Integer.MAX_VALUE`, rather than throwing -- and this would likely be a better outcome for most clients. The choice of adaptation should ultimately be left to metadata present at the declaration of the migrated method.

#### Type checking and corner cases

A forwarder should always forward to a non-forwarder method (concrete or abstract) _in the same class_. (Because they are in the same class, there is no chance that separate compilation can cause a forwarder to point to another forwarder.)

In general, we expect that forwarders are only ever overridden by non-forwarder methods (and then, only in out-of-date classfiles). (This means that invocations that resolve to the forwarder will generally select the forwarder.)

- If a forwarder method is overridden by another forwarder method, this is probably a result of a migration happening in a subclass and then later the same migration happening in a superclass. We can let the override proceed.
- If a forwarder is overridden by a legacy bridge, we have a few bad choices. We could accept the bridge (which would interfere with forwarding), or discard the bridge (which could cause other anomalies). If we leave existing bridge generation alone, this case is unlikely and accepting the bridge is probably a reasonable answer; if we migrate bridges to use forwarding, we'd probably want to err in the other direction.
- If a forwarder has a forwardee descriptor that is exactly the same as the forwarder, the forwarder should be discarded. (These can arise from specialization situations.)

[jep181]: https://openjdk.java.net/jeps/181
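
Returning to the `Collection::size` example from the discussion of adaptation failures: a user-provided adapter with the same `(MethodHandle, MethodType) -> MethodHandle` shape as `MethodHandle::asType` could implement the clamping behavior. The sketch below is an assumption-laden illustration, not a proposed API: `BigBag`, `clampToInt`, `adapt`, and `legacySize` are hypothetical names, and only the `long`-to-`int` return direction gets special treatment, with everything else delegated to `asType`. A custom step is needed here because `asType` on its own does not perform the narrowing `long` to `int` conversion.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

class ClampingAdapterSketch {
    // Hypothetical migrated class whose size() now returns long.
    static class BigBag {
        final long n;
        BigBag(long n) { this.n = n; }
        long size() { return n; }
    }

    // Projection from long to int that clamps instead of throwing.
    static int clampToInt(long v) {
        if (v > Integer.MAX_VALUE) return Integer.MAX_VALUE;
        if (v < Integer.MIN_VALUE) return Integer.MIN_VALUE;
        return (int) v;
    }

    // Adapter with the same shape as MethodHandle::asType.
    static MethodHandle adapt(MethodHandle target, MethodType newType) {
        try {
            if (target.type().returnType() == long.class
                    && newType.returnType() == int.class) {
                MethodHandle clamp = MethodHandles.lookup().findStatic(
                        ClampingAdapterSketch.class, "clampToInt",
                        MethodType.methodType(int.class, long.class));
                // Post-process the long result with the clamping projection.
                target = MethodHandles.filterReturnValue(target, clamp);
            }
            // Everything else falls back to the built-in adaptations.
            return target.asType(newType);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // What an int-desiring legacy client would observe.
    static int legacySize(BigBag bag) {
        try {
            MethodHandle size = MethodHandles.lookup().findVirtual(
                    BigBag.class, "size", MethodType.methodType(long.class));
            MethodHandle asInt =
                    adapt(size, MethodType.methodType(int.class, BigBag.class));
            return (int) asInt.invoke(bag);
        } catch (RuntimeException | Error ex) {
            throw ex;
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(legacySize(new BigBag(5L)));
        System.out.println(legacySize(new BigBag(1L << 40)));
    }
}
```

Small sizes pass through unchanged, while sizes beyond the `int` range are reported as `Integer.MAX_VALUE`; swapping `clampToInt` for a method that throws would give the other policy mentioned above.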