But I agree that having all these Q-refs in the bytecode will not help the introduction of generics.

I think that the way to implement the reification of generics is:

- do the specialization of the data shape by asking a bootstrap method, with the restriction that the data shape has to be covariant with the generic description (so the specialization of an L-type can be a Q-type)
- pass the type arguments out of band, so a call site can call with or without the type arguments
- do the specialization of the bytecode at JIT time, because at that time you have the type arguments and their usages, and you avoid the bytecode explosion of the C++-like templating mechanism

So Brian, I agree with you that the way to describe a generic class is to use only L-type descriptors and aload/astore bytecodes, but it doesn't mean that at runtime, when we specialize a generic class, we cannot say that a field typed as an L-type is in fact a Q-type (or that an array of L-types is in fact an array of Q-types).
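
Just to sketch the idea (the names and the exact bootstrap shape below are hypothetical, not a worked-out API):

```
import java.lang.invoke.MethodHandles;

// Hypothetical bootstrap for data-shape specialization (illustration only).
// The type arguments arrive out of band; if the call site does not supply
// them, the erased (all L-type) shape is used as-is.
public class ShapeSpecialization {
    public static Class<?> specializeShape(MethodHandles.Lookup lookup,
                                           String name,
                                           Class<?> erasedShape,
                                           Class<?>... typeArguments) {
        if (typeArguments.length == 0) {
            return erasedShape;   // no type arguments: keep the erased shape
        }
        // Constraint: the specialized shape must stay covariant with the generic
        // description, i.e. an L-typed field may become a Q-type, never the reverse.
        return erasedShape;       // placeholder for the computed specialized shape
    }
}
```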

Rémi

----- Original Message -----
> From: "Brian Goetz" <[email protected]>
> To: "valhalla-spec-experts" <[email protected]>
> Sent: Monday, January 7, 2019 18:21:26
> Subject: Finding the spirit of L-World
>
> I've been processing the discussions at the Burlington meeting. While I think we made a lot of progress, I think we fell into a few wishful-thinking traps with regard to the object model that we are exposing to users. What follows is what I think is the natural conclusion of the L-World design -- which is a model I think users can love, but requires us to go a little farther in what the VM does to support it.
>
> # Finding the Spirit of L-world
>
> L-World is, at heart, an attempt to unify reference objects and values; they're unified under a common top type (`Object`), a common basic type descriptor (`L`), and a common set of bytecodes (`aload` et al.). The war cry for L-World should be, therefore, "Everything is an Object". And users will be thrilled to see such a unification -- assuming we can live up to the high expectations that such a promise sets.
>
> By unifying references and values under a common type descriptor and supertype, we gain significant benefits for _migration_ -- migrating a reference class to a value class does not break the ways existing code refers to it.
>
> By unifying under a common set of bytecodes, we gain significant benefits for _specialization_; the method body output by the compiler can apply equally to reference and value parameterizations, and all specialization can be applied on the constant pool only.
>
> If our war cry is "Everything is an Object", we need to ask ourselves what behaviors users should reasonably expect of all objects -- and ensure that values and references alike conform to those behaviors.
>
> ## Object model
>
> In Q-world, we struggled with the fact that there was no true top type, but most code was written as if `Object` were the top type. This was trying to square a circle; the options for introducing a new top type in Q-world were not good (an `Any` superclass provided the desired unification but a woefully confusing cost model; an `Objectible` interface shared between `Object` and values would set off a snail stampede to migrate libraries to use `Objectible` as the new fake top), but having multiple roots would have further exacerbated the pain of the existing bipartite type system.
>
> L-world offers us an out; it makes `Object` a true top type (save for primitives -- but see "Poxing", below), so existing code that deals with `Object` can immediately accept values (save for totality -- but see "Totality", below) without requiring disruptive migration.
>
> A sensible rationalization of the object model for L-World would be to have special subclasses of `Object` for references and values:
>
> ```
> class Object { ... }
> class RefObject extends Object { ... }
> class ValObject extends Object { ... }
> ```
>
> We would enforce that `RefObject` is only extended by classes that do not have the `ACC_VALUE` bit, that `ValObject` is only extended by classes that do have the `ACC_VALUE` bit, and that classes that claim to extend `Object` are implicitly reparented according to their `ACC_VALUE` bit. (Actually, in this scheme, we can ditch the `ACC_VALUE` bit entirely; at load time, we just look at the superclass, and if it's `ValObject`, it's a value class, otherwise it's a reference class.)
>
> Bringing ref-ness and val-ness into the type system in this way has many benefits:
>
> - It reinforces the user's understanding of the relationship between values and references.
> - It allows us to declare methods or fields that accept any object, reference objects only, or value objects only, using existing concepts.
> - It provides a place to declare ref-specific or val-specific methods, and ref-specific or val-specific implementations of `Object` methods. (For example, we could implement `Object::wait` as a final throwing method in `ValObject`, if that's the behavior we want).
> - It allows us to express ref-ness or val-ness as generic type bounds, as in `<T extends RefObject>`.
>
> We can pull the same move with nullability, by declaring an interface `Nullable`:
>
> ```
> interface Nullable { }
> ```
>
> which is implemented by `RefObject`, and, if we support value classes being declared as nullable, would be implemented by those value classes as well. Again, this allows us to use `Nullable` as a parameter type or field type, or as a type bound (`<T extends Nullable>`).
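>
> For example, these could be used directly as parameter types and bounds (just a sketch, assuming the `RefObject`, `ValObject`, and `Nullable` declarations above):
>
> ```
> class Examples {
>     // accepts any object, reference or value
>     static void acceptAny(Object o) { }
>
>     // accepts only identity-bearing (reference) objects
>     static void acceptRefOnly(RefObject o) { }
>
>     // a generic class whose elements must be value objects
>     static class FlatList<T extends ValObject> { }
>
>     // a type variable that may legally hold null
>     static <T extends Nullable> T firstOrNull(java.util.List<T> list) {
>         return list.isEmpty() ? null : list.get(0);
>     }
> }
> ```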
>
> ## Totality
>
> The biggest pain point in the LW1 model is that we're saying that everything is an `Object`, but we've had to distort the rules of `Object` operations in ways that users might find confusing. LW1 says that equality comparison, identity hash code, locking, and `Object::wait` are effectively partial, but existing code that deals in `Object` may be surprised to find this out. Additionally, arrays of reference objects are covariant with `Object`, but arrays of value objects are currently not.
>
> #### Equality
>
> The biggest and most important challenge is assigning sensible total semantics to equality on `Object`; the LW1 equality semantics are sound, but not intuitive. There's no way we can explain why, for values, you don't get `v == v` in a way that people will say "oh, that makes sense." If everything is an object, `==` should be a reasonable equality relation on objects. This leads us to a somewhat painful shift in the semantics of equality, but once we accept that pain, I think things look a lot better.
>
> Users will expect (100% reasonably) the following to work:
>
> ```
> Point p1, p2;
>
> p1 == p1 // true
>
> p2 = p1;
> p1 == p2 // true
>
> Object o1 = p1, o2 = p2;
>
> o1 == o1 // true
> o1 == o2 // true
> ```
>
> In LW1, if we map `==` to `ACMP`, they do not, and this will violate both user intuition and the spirit of "everything is an object". (If everything is an object, then when we assign `o1 = p1`, this is just a widening conversion, not a boxing conversion -- it's the same underlying object, just with a new static type, so it should behave the same.)
>
> The crux of the matter is that interfaces, and `Object` (which for purposes of this document should be considered an honorary interface) can hold either a reference or a value, but we've not yet upgraded our notion of interfaces to reflect this kind-polymorphism. This is what we have to put on a sounder footing in order to not have users fall into the chasm of anomalies. To start with:
>
> - A class is either a ref class or a value class.
> - `C implements I` means that instances of `C` are instances of `I`.
> - Interfaces are polymorphic over value and ref classes.
>
> Now we need to define equality. The terminology is messy, as so many of the terms we might want to use (object, value, instance) already have associations. For now, we'll describe a _substitutability_ predicate on two instances:
>
> - Two refs are substitutable if they refer to the same object identity.
> - Two primitives are substitutable if they are `==` (modulo special pleading for `NaN` -- see `Float::equals` and `Double::equals`).
> - Two values `a` and `b` are substitutable if they are of the same type, and for each of the fields `f` of that type, `a.f` and `b.f` are substitutable.
>
> We then say that for any two objects, `a == b` iff a and b are substitutable.
>
> This is an "everything is an object" story that users can love! Everything is an object, equality is total and intuitive on objects, interfaces play nicely -- and there are no pesky boxes (except for primitives, but see below.) The new concept here is that interfaces abstract over refs and values, and therefore operations that we want to be total on interfaces -- like equality -- have to take this seam into account.
>
> The costs come in two lumps. The first is that if we're comparing two objects, we first have to determine whether they are refs or values, and do something different for each. We already paid this cost in LW1, but here comes the bigger cost: if a value class has fields whose static types are interfaces, the comparison may have to recur on substitutability. This is horrifying for a VM engineer, but for users, this is just a day at the office -- `equals` comparisons routinely recur. (For values known to (recursively) have no interface fields and no floating-point fields, the VM can optimize comparison to a flat bitwise comparison.)
>
> This model eliminates the equality anomalies, and provides users with an intuitive and sound basis for "same instance".
>
> One might ask whether we really need to push this into `acmp`, or whether we can leave `acmp` alone and provide a new API point for substitutability, and have the compiler generate invocations of that. While the latter is OK for new code, doing so would cause old code to behave differently than new code when operating on values (or interfaces that may hold values), and may cause it to change its behavior on recompile. If we're changing what `Object` means, and what `aload` can operate on, we should update `acmp` accordingly.
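>
> To make the recursion concrete, here is a reflective sketch of the substitutability predicate (an illustration only -- the real check would live in `acmp` inside the VM, and `isValueClass` below merely stands in for consulting the `ACC_VALUE` bit / `ValObject` superclass):
>
> ```
> import java.lang.reflect.Field;
> import java.lang.reflect.Modifier;
>
> final class Substitutability {
>     static boolean isSubstitutable(Object a, Object b) throws IllegalAccessException {
>         if (a == b) return true;                        // same identity (or both null)
>         if (a == null || b == null) return false;
>         if (a.getClass() != b.getClass()) return false; // must be instances of the same type
>         if (!isValueClass(a.getClass())) return false;  // two refs: same identity only
>         for (Field f : a.getClass().getDeclaredFields()) {
>             if (Modifier.isStatic(f.getModifiers())) continue;
>             f.setAccessible(true);
>             if (f.getType().isPrimitive()) {
>                 // boxed comparison gives ==, with the Float/Double special pleading for NaN
>                 if (!f.get(a).equals(f.get(b))) return false;
>             } else if (!isSubstitutable(f.get(a), f.get(b))) {
>                 return false;                           // recur on reference- and value-typed fields
>             }
>         }
>         return true;
>     }
>
>     // stand-in for "does this class carry the ACC_VALUE bit / extend ValObject?"
>     static boolean isValueClass(Class<?> c) { return false; }
> }
> ```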
>
> #### `==` and `equals()`
>
> Code that knows what type it is dealing with generally uses either `==` or `equals()`, but not both; generic code (such as `HashMap`) generally uses the idiom `a == b || a.equals(b)`. Such code _could_ fall back to just using `equals()`; this idiom arose as an optimization to avoid the virtual method invocation, but the first part can be dropped with no semantic loss.
>
> As the cost of `==` gets higher, this optimization (as optimizations often do!) may begin to bite back; the `equals()` implementation often includes an `==` check as well. There are lots of things we can do here, but it is probably best to wait to see what the actual performance impact is before doing anything.
>
> #### Identity hash code
>
> Because values have no identity, in LW1 `System::identityHashCode` throws `UnsupportedOperationException`. However, this is unnecessarily harsh; for values, `identityHashCode` could simply return `hashCode`. This would enable classes like `IdentityHashMap` (used by serialization frameworks) to accept values without modification, with reasonable semantics -- two objects would be deemed the same if they are `==`. (For serialization, this means that equal values would be interned in the stream, which is probably what is wanted.)
>
> #### Locking
>
> Locking is a difficult one. On the one hand, it's bad form to lock on an object that hasn't explicitly invited you to participate in its locking protocol. On the other hand, there is likely code out there that does things like lock on client objects, which might expect at least exclusion with respect to other code that locks the same object, and a _happens-before_ edge between the release and the acquire. Having locking all of a sudden throw `IllegalMonitorStateException` would break such code; while we may secretly root for such code to be broken, the reality is that such code is likely at the heart of large legacy systems that are difficult to modify. So we may well be forced into totalizing locking in some way. (Totalizing locking also means totalizing the `Object` methods related to locking: `wait`, `notify`, and `notifyAll`.)
>
> There is a spectrum of interpretations for totalizing locking, each with different tradeoffs:
>
> - Treat locking on a value as an entirely local operation, providing no exclusion and no happens-before edge. Existing code will continue to run when provided with values, but may produce unexpected results.
> - Alternately, treat locking on a value as providing no exclusion, but with acquire and release semantics. Wait and notify would still throw.
> - Treat locking on a value as acquiring a fat lock (say, a global value lock, a per-type value lock, etc.). This gives us exclusion and visibility, with a small risk of deadlock in situations where multiple such locks are held, and a sensible semantics for wait and notify (single notify would have to be promoted to `notifyAll`).
> - Treat locking on a value as acquiring a proxy lock which is inflated by the runtime, which assigns a unique lock to each distinguishable value.
> - Put lock-related methods on `ValObject`, whose defaults do one of the above, and allow implementations to override them.
>
> While nearly all of these options are horrifying, the goal here is not to do something _good_, but merely to do something _good enough_ to avoid crushing legacy code.
>
> #### Array covariance
>
> Currently, for any class `C`, `C[] <: Object[]`. This makes `Object[]` the "top array type". If everything is an object, then an array of anything should also be an array of `Object`.
>
> There are two paths to delivering on this vision: extend traditional array covariance to value arrays (potentially making `aaload` sites megamorphic), or move in the direction of "Arrays 2.0" and define a specializable generic type `Array<T>`, where the legacy arrays implement `Array<T>`, and require clients to migrate from `T[]` to `Array<T>` before specializing their generic classes.
>
> ## Poxing
>
> The Model 3 specializer focused on specializing generics over primitives, not values (because we hadn't implemented values yet). Many of the complexities we ran into in that exploration stemmed from the accidental asymmetries between primitives and objects, including irregularities in the bytecode set (single vs. double slot, `if_icmpeq` vs. `dcmp` + `if`). Having unified references and values, it would be really nice to unify primitives as well.
>
> While we can't exactly do that easily, because of the intrusion into the bytecode set, we may be able to come close, using a modified boxing conversion. The problem with the existing boxing conversion is that `Integer` is a heavy box with identity -- which means boxing is expensive. There are two possible paths by which we could mitigate this pain:
>
> - Migrate `Integer` to be a value type;
> - Create an alternate box for `int`, which is a value class (`ValInt`).
>
> If we can box primitives to values, then we need not unify primitives with objects -- we just insert boxing conversions in the places we already do, and interpret specializations like `List<int>` to mean "List of int's box".
>
> Migrating `Integer` to be a value may seem the obvious move, but it is fraught with compatibility constraints -- there is tons of legacy code that does things like locking on `Integer` or depending on its strange accidental identity. Perhaps if we could totalize locking and remove the public box constructors, we could get there -- but this is not a slam-dunk.
>
> The alternative is creating a value box for primitives (a "pox") and adjusting the compiler's boxing behavior (when boxing to `Object` or an interface, prefer the pox to the box). This too has some compatibility concerns, such as code that deals in `Object` and assumes that primitives are always boxed to legacy boxes. We may be able to finesse this with a trick -- teaching `instanceof` and `checkcast` about the relationship between boxes and poxes, so that code like:
>
> ```
> if (o instanceof Integer) {
>     Integer i = (Integer) o;
>     // use i
> }
> ```
>
> would work on both `Integer` and `int`'s pox (by saying "yes" in `instanceof` and doing the conversion in `checkcast`). This move, while somewhat risky, could allow us to relegate the legacy boxes to legacy status, and eventually deprecate them. (We could then have methods and interfaces on the poxes, and lift them to the primitives via poxing, so that `int` could be seen to implement `Comparable<int>` and you could call `compareTo()` on ints.) While this would not be a true unification, it would come much closer than we are now.
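>
> For illustration, a pox for `int` might look something like this (written as ordinary Java for the sketch -- imagine the `ACC_VALUE` bit set on it, and the compiler preferring it to `Integer` when boxing to `Object` or an interface):
>
> ```
> final class ValInt implements Comparable<ValInt> {
>     private final int value;
>     ValInt(int value) { this.value = value; }
>     public int intValue() { return value; }
>     @Override public int compareTo(ValInt other) { return Integer.compare(value, other.value); }
>     @Override public boolean equals(Object o) { return o instanceof ValInt && ((ValInt) o).value == value; }
>     @Override public int hashCode() { return Integer.hashCode(value); }
> }
> ```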
>
> Clearly, both alternatives are risky and require more investigation -- but both have promising payoffs.
>
> ## Migration
>
> In both Q-world and L-world, we took care to ensure that for a value class `C`, the descriptor `LC;` describes a subtype of `Object`. This is a key part of the story for migrating reference types to values, since clients of `C` will describe it with `LC;` and we don't want to require a flag day on migration. In Q-world, `LC;` is the (nullable) box for `C`; in L-world, it is a nullable `C`.
>
> This is enough that we can migrate a value-based class to a value and _existing binary clients_ will not break, even if they stuff a null into an `LC;`. However, there are other migration compatibility concerns which we need to take up (which I'll do in a separate document.)
>
> ## Generics
>
> In Q-world, because values and references were so different, specializable generic classes had to be compiled with additional constraints. For a specializable type variable `T`, we enforced:
>
> - Cannot compare a `T` to `null`
> - Cannot assign `null` to a `T`
> - Cannot assign a `T` to `Object`
> - Cannot assign a `T[]` to `Object[]`
> - Cannot lock on a `T`
> - Cannot `==` on a `T`
>
> In L-world, the need for most of these can go away. Because everything is an object, we can assign values to `Object`, and `acmp_null` should work on all objects, so comparing with `null` is OK. If we have array covariance, the array assignment restriction goes away. If we totalize locking and equality, those restrictions go away. The only restriction that remains is the assignment of `null` to a `T`. But now the VM can express the difference between nullable values and non-nullable values, and we can express this in the source type system with `Nullable`. So all the Q-world restrictions go away, and they are replaced by an indication that a given type variable (or perhaps an entire generic class) is erased or reifiable, and we treat erased type variables as if they have an implicit `Nullable` bound. Then the compile-time null-assignment restriction reduces to "does `T` have a `Nullable` bound", and the restriction against instantiating an erased generic class with a Q-type reduces to a simple bounds violation.
>
> (There's lots more to cover on generics -- again, separate document.)
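>
> As a tiny illustration of that last rule (sketch only; `Nullable` is the interface proposed earlier, and an explicit bound stands in for whatever surface form the erased/implicit bound ends up taking):
>
> ```
> // assumes: interface Nullable { }  (as declared earlier)
> class ErasedBox<T extends Nullable> {
>     T value = null;       // OK: T has a Nullable bound
> }
>
> class ReifiableBox<T> {   // imagine T declared as reifiable/specializable
>     // T value = null;    // would be rejected: no Nullable bound on T
> }
> ```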
