[LW100] Specialized generics -- translation and binary compatibility issues

Brian Goetz Wed, 17 Oct 2018 12:39:28 -0700

Number 2 of 100 in a series of “What we learned in Phase I of Project Valhalla.” This one focuses on the challenges of evolving a class to be any-generic, while interacting with existing erased code. No solutions here, just recaps of problems and challenges.


Let’s imagine a class today:

|interface Boxy<T> {
    T get();
    void set(T t);
}

class Foo<T> implements Boxy<T> {
    public T t;
    public T[] tArray;

    public Foo(T t) { set(t); }

    public static <T> Foo<T> of(T t) { return new Foo<>(t); }

    public T get() { return t; }

    public void set(T t) {
        this.t = t;
        this.tArray = (T[]) new Object[] { t };
    }
}
|


and client code

|Foo<String> fs = new Foo<>("boo");
println(fs.t);
println(fs.tArray);
println(fs.get());

Foo<?> wc = fs;
if (wc instanceof Foo) { ... }
|


 * Foo extends Bar
 * instanceof/checkcast Foo
 * new Foo
 * anewarray Foo[]
 * getfield Foo.t:Object
 * invokevirtual Foo.get():Object
 * Method descriptors of |Foo::of|

We translate raw |Foo|, |Foo<String>|, and |Foo<?>| all the same way today: as |LFoo|.
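
The shared translation can be observed from plain Java today. A minimal sketch, using a cut-down |Foo| (the class |ErasureDemo| and its helper methods are made up for illustration): all three static types see one runtime class, and the field |t| is reflected with its erased type.

```java
import java.lang.reflect.Field;

final class ErasureDemo {
    // Cut-down version of the Foo class from the example above.
    static class Foo<T> {
        public T t;
        Foo(T t) { this.t = t; }
    }

    // True if parameterized, wildcard, and raw references all see the same runtime class.
    @SuppressWarnings("rawtypes")
    static boolean sameRuntimeClass() {
        Foo<String> fs = new Foo<>("boo");
        Foo<?> wc = fs;
        Foo raw = fs;
        return fs.getClass() == wc.getClass() && wc.getClass() == raw.getClass();
    }

    // The type of field t as recorded in the classfile, i.e. the erasure of T.
    static String erasedFieldType() {
        try {
            Field f = Foo.class.getDeclaredField("t");
            return f.getType().getName();
        } catch (NoSuchFieldException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(sameRuntimeClass());
        System.out.println(erasedFieldType());
    }
}
```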



       Tentative simplification: reference instantiations are always erased

The specialization transform takes a template class and a set of type parameters and produces a specialized class. This can cause member (and supertype) signatures to change; for example, if we have


|T get() |

which erases to

|Object get() |

when we specialize with T=int, we’ll have

|int get() |

In theory, there’s nothing to stop us from specializing List with T=String. However, in the earlier exploration, we settled on the tentative simplification of always erasing reference instantiations, and only specializing value instantiations. This is a tradeoff; we’re still throwing away potentially useful type information (erasure haters will be disappointed), in exchange for much greater sharing, and avoiding some compatibility issues (existing generic code is rife with tricks like “casting through wildcards” to coerce a |Foo<A>| to |Foo<B>|, which only works as long as we erase; dirty tricks like this are often necessary as there are some things that are hard to express in the generic type system, even though the programmer knows them.)
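
For readers who have not met the trick, here is a minimal sketch of “casting through wildcards” in today’s Java. The class |WildcardCastDemo| and the cut-down |Foo| are illustrative only. The cast is accepted (with an unchecked warning) and succeeds at run time because both types erase to the same class; if |Foo<String>| and |Foo<Object>| were distinct runtime types, the same cast would presumably have to fail.

```java
final class WildcardCastDemo {
    static class Foo<T> {
        T t;
        Foo(T t) { this.t = t; }
    }

    // A direct cast (Foo<Object>) fs is rejected by javac as inconvertible types.
    // Going through Foo<?> is accepted with an unchecked warning, and the runtime
    // check is only against the erased class Foo.
    @SuppressWarnings("unchecked")
    static Foo<Object> coerce(Foo<String> fs) {
        return (Foo<Object>) (Foo<?>) fs;
    }

    // True if the coerced reference is the very same object as the original.
    static boolean sameInstanceAfterCoercion() {
        Foo<String> fs = new Foo<>("boo");
        Foo<Object> fo = coerce(fs);
        return (Object) fo == (Object) fs;
    }

    public static void main(String[] args) {
        System.out.println(sameInstanceAfterCoercion());
    }
}
```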

Ignoring multiple type parameters for the moment, when |Foo| becomes specializable, our model is that it will have an /erased/ species; call it |Foo<erased>|. (If you ask it what its type parameters are, it will say “erased”. That is, we reify the fact that it is erased…) While migrating from erased to specialized generics requires source changes and recompilation at the generic class declaration, it should not require any changes or recompilation for clients. That means that legacy client classfiles that talk about |Foo| must be considered to be talking about |Foo<erased>|. (Hierarchies can be specialized from the top down, so it is OK to specialize |Bar| before |Foo|, but not the other way around.)

While the generic specialization machinery will have no problem with specializing to L-types, I think it’s a simplification we should hold on to: that we treat all L type parameters as “erased” for purposes of specialization.



       Additional simplification: let’s not worry about primitives

In Burlington, we concluded that as long as there’s a Pox class for each primitive, we can convert primitives to/from poxes through source compiler transforms, and not worry about specializing over primitives. Instead, when the user wants to specialize List for int, we specialize for int’s pox. Except for those pesky arrays … more on that later.
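
As a rough picture of what such a source-level transform might look like, here is a sketch in today’s Java. |IntPox| is a hypothetical stand-in, written as an ordinary final class because value classes cannot be declared in current Java, and the comments describe an assumed rewrite rather than anything javac actually does.

```java
import java.util.ArrayList;
import java.util.List;

final class PoxDemo {
    // Hypothetical pox for int: a box that the source compiler converts to and from int.
    static final class IntPox {
        private final int value;
        private IntPox(int value) { this.value = value; }
        static IntPox of(int value) { return new IntPox(value); }
        int intValue() { return value; }
    }

    // What the user would like to write (not valid Java today):
    //     List<int> xs = ...; xs.add(in); return xs.get(0);
    // What the compiler could emit instead, specializing over the pox:
    static int roundTrip(int in) {
        List<IntPox> xs = new ArrayList<>();
        xs.add(IntPox.of(in));          // int -> pox, inserted by the compiler
        return xs.get(0).intValue();    // pox -> int, inserted by the compiler
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(3));
    }
}
```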



       Assumption: wild means wild

On the other hand, one of the non-simplifying assumptions we want to make is that a wildcard type, |Foo<?>|, should describe any instantiation of |Foo|, even when the wildcard-using code doesn’t know about specialization. (Same with raw usages of |Foo|.) For example, if the user has written a method:


|takeFoo(Foo<?> anyFoo) { anyFoo.m(); } |

in legacy (erased) code, we should be able to call |takeFoo()| with both erased and specialized instances of |Foo|. As we’ll see, this complicates member access, and really complicates arrays.


We will find utterances like

|invokevirtual Foo.get()Object
getfield Foo.m:Object
|

in legacy code; we want these to work against any specialization of |Foo|.

In the case where the instance is erased, things obviously have a decent chance of lining up properly, as the erased members will not have been specialized away. If our receiver is a specialized |Foo|, it gets harder, as the member signatures will have changed due to specialization.

Starting in Model 2, we handled this with bridge methods; for each specialized method, we also had an erased bridge. This is possible because there’s an easy coercion from |QPoint| to |LObject|. (There are other ways to get there besides bridges.)
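
The bridges javac already generates today give a feel for this. The sketch below is an analogy, not the Model 2 translation itself: |IntBoxy| is a hand-written stand-in for a specialized species whose |get()| has a narrower signature, and javac adds a synthetic bridge with the erased signature so that a caller that only knows |get()Object| still links.

```java
import java.lang.reflect.Method;

final class BridgeDemo {
    interface Boxy<T> { T get(); }

    // Stand-in for a specialized species: get() no longer returns Object.
    static final class IntBoxy implements Boxy<Integer> {
        private final int value;
        IntBoxy(int value) { this.value = value; }
        @Override public Integer get() { return value; }
    }

    // Name of the return type of the compiler-generated bridge for get(), or null if none.
    static String bridgeReturnType() {
        for (Method m : IntBoxy.class.getDeclaredMethods()) {
            if (m.getName().equals("get") && m.isBridge()) {
                return m.getReturnType().getName();
            }
        }
        return null;
    }

    // Legacy-style caller: compiled against the erased signature only.
    static Object legacyGet(Boxy<?> any) { return any.get(); }

    public static void main(String[] args) {
        System.out.println(bridgeReturnType());
        System.out.println(legacyGet(new IntBoxy(42)));
    }
}
```

The legacy caller dispatches to the bridge, which forwards to the narrower method and relies on the easy coercion back to |Object|.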



       Wildcards

One of the central challenges of pushing specialization into the VM is how we’re going to handle wildcards. Given a generic class |Foo|, the wildcard type |Foo<?>| is a supertype of any instantiation |Foo<X>| of |Foo|. The wildcard type also erases to |LFoo|.

In Model 2, we modeled wildcards as interfaces, with lots and lots of bridges, but this still fell short in a number of ways: no support for non-public methods or for fields, and we had to deal with fields by hoisting them into virtual bridges on the interface.

Note that the wildcard subtyping also matters to the verifier, in addition to handling bytecodes; the verifier must know that any specialization of |Foo| is a subtype of the wildcard |LFoo|.



       But what does |LFoo| mean?

Careful readers will notice that we’ve been playing fast and loose with the meaning of |Foo|; sometimes it means the class, sometimes the wildcard, and sometimes the erased species.


The best intuition we’ve been able to come up with is:

 * There are /classes/ and /crasses/.
 * A crass describes a single runtime type; it has a layout, methods,
   constructors, etc.
 * A (template) class describes a family of runtime types.
 * A (template) class is like an abstract type; it has members and
   subtypes, but can’t be instantiated directly.
 * All the crasses derived from a class are subtypes of the class.
 * For purposes of instantiation, we interpret |new Foo| as creating an
   instance of the erased species, and a similar game with |<init>|
   methods.
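
One way to picture the intuition above in ordinary Java is an abstract type with one concrete subclass per species. This is only an analogy under made-up names (|FooErased|, |FooInt|, |legacyNew|); in the model described here the species would be derived by the runtime from a single template, not written by hand.

```java
final class SpeciesDemo {
    // Plays the role of the template class Foo: it has members and subtypes,
    // but cannot be instantiated directly.
    static abstract class Foo {
        abstract Object get();            // erased view that every species supports
        abstract String typeArgument();   // what the species reports as its type parameter
    }

    // Plays the role of the erased species Foo<erased>.
    static final class FooErased extends Foo {
        private final Object t;
        FooErased(Object t) { this.t = t; }
        @Override Object get() { return t; }
        @Override String typeArgument() { return "erased"; }
    }

    // Plays the role of a specialized species with its own layout and signatures.
    static final class FooInt extends Foo {
        private final int t;
        FooInt(int t) { this.t = t; }
        int getInt() { return t; }                  // specialized signature
        @Override Object get() { return t; }        // erased view, boxes the int
        @Override String typeArgument() { return "int"; }
    }

    // A legacy "new Foo" is interpreted as instantiating the erased species.
    static Foo legacyNew(Object t) { return new FooErased(t); }

    public static void main(String[] args) {
        Foo a = legacyNew("boo");
        Foo b = new FooInt(7);
        System.out.println(a.typeArgument() + " " + a.get());
        System.out.println(b.typeArgument() + " " + b.get());
    }
}
```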


   Model 3 classfile extensions

In Model 3, we extended the constant pool with some new entries:

*TypeVar[n, erasure].* This is a use of a type variable, identified by its index /n/. (There was a table-of-contents attribute listing all the type variables declared in a generic class or method, including those declared in enclosing generic classes or methods.) Since the erasure of a type variable is not merely a property of the type variable, but in fact a property of how it is used, each use of a type variable carries around its own erasure. For a field whose type is |T|, the |NameAndType| points not to |Object|, but to |TypeVar[0, Object]|.

When specializing a type variable to |erased|, any uses of that type variable are replaced with the erasure in the |TypeVar| entry.

*MethodType[D,T…].* This is largely a syntactic mechanism, allowing us to represent method descriptors with holes (it also had the benefit of compressing the constant pool somewhat). The parameter |D| was a method type descriptor, except that in addition to the existing types, one could specify |#| to indicate a hole; the |T...| parameters are CP indexes to other types (which could be UTF8 strings, or |TypeVar|, or the other type CP entries listed below).


For example, a method

|int size(T t) |

would have a signature

|#1 = TypeVar[0, Object]
#2 = MethodType[(#)I, #1]
|

When specializing a |MethodType|, its parameters are recursively specialized, and then the resulting strings concatenated.


*ArrayType[T,rank].* This represents an array of given rank.
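
To make the specialization rules for these three entries concrete, here is a small sketch that models them as Java objects and specializes them down to descriptor strings. Everything in it is an assumption made for illustration: the class and method names, the use of a map from type-variable index to descriptor (with a missing entry meaning “erased”), and writing the erasure as the full descriptor |Ljava/lang/Object;| so that the concatenated result is a well-formed descriptor. It is not the M3 prototype’s code.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;

final class TemplateCpDemo {
    // A template constant-pool entry that can be specialized to a concrete descriptor.
    interface Entry {
        // bindings: type-variable index -> descriptor; a missing index means "erased".
        String specialize(Map<Integer, String> bindings);
    }

    // TypeVar[n, erasure]: each use carries its own erasure.
    static final class TypeVar implements Entry {
        final int index;
        final String erasure;
        TypeVar(int index, String erasure) { this.index = index; this.erasure = erasure; }
        @Override public String specialize(Map<Integer, String> bindings) {
            String bound = bindings.get(index);
            return bound != null ? bound : erasure;
        }
    }

    // MethodType[D, T...]: a descriptor with '#' holes, filled left to right.
    static final class MethodType implements Entry {
        final String template;
        final List<Entry> holes;
        MethodType(String template, Entry... holes) {
            this.template = template;
            this.holes = Arrays.asList(holes);
        }
        @Override public String specialize(Map<Integer, String> bindings) {
            StringBuilder sb = new StringBuilder();
            int next = 0;
            for (char c : template.toCharArray()) {
                if (c == '#') {
                    sb.append(holes.get(next++).specialize(bindings)); // recursive specialization
                } else {
                    sb.append(c);
                }
            }
            return sb.toString();
        }
    }

    // ArrayType[T, rank]: rank copies of '[' followed by the specialized component.
    static final class ArrayType implements Entry {
        final Entry component;
        final int rank;
        ArrayType(Entry component, int rank) { this.component = component; this.rank = rank; }
        @Override public String specialize(Map<Integer, String> bindings) {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < rank; i++) sb.append('[');
            return sb.append(component.specialize(bindings)).toString();
        }
    }

    static final Entry T_VAR = new TypeVar(0, "Ljava/lang/Object;");   // #1
    static final Entry SIZE = new MethodType("(#)I", T_VAR);           // #2, for int size(T t)
    static final Entry T_ARRAY = new ArrayType(T_VAR, 1);              // T[]

    static final Map<Integer, String> ERASED = Collections.emptyMap();
    static final Map<Integer, String> T_IS_INT = Collections.singletonMap(0, "I");

    public static void main(String[] args) {
        System.out.println(SIZE.specialize(ERASED));      // (Ljava/lang/Object;)I
        System.out.println(SIZE.specialize(T_IS_INT));    // (I)I
        System.out.println(T_ARRAY.specialize(T_IS_INT)); // [I
    }
}
```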

We found that as a template language, these types allowed exactly the sort of expressiveness needed, and specialized efficiently down to concrete descriptors (though in the M3 prototype, we had concrete descriptors of the form |List$0=I| to describe |List<int>|; obviously we don’t want that here). But these designs captured all the complexity we needed (especially that of erasure), and allowed a mechanical translation into Java 8 classfiles.
