> The key questions are around the mental model of what we're trying to
> accomplish and how to make it easy (easier?) for users to migrate to use
> value types or handle when their pre-value code is passed a valuetype.
> There's a cost for some group of users regardless of how we address each of
> these issues. Who pays these costs? Those migrating to use the new value
> types functionality? Those needing to address the performance costs of
> migrating to a values capable runtime (JDK-N?).
Indeed, this is the question. And, full disclosure, my thoughts have evolved
since we started this exercise.
We initially started with the idea that value types were this thing “off to the
side” — a special category of classes that only experts would ever use, and so
it was OK if they had sharp edges. But this is the sort of wishful thinking
one engages in when you are trying to do something that seems impossible; you
bargain with the problem.
When we did generics, there was a pervasive belief that the complexity of
generics could be contained to where only experts would have to deal with it,
and the rest of us could happily use our strongly typed collections without
having to understand wildcards and such. This turned out to be pure wishful
thinking; generics are part of the language, and in order to be an effective
Java programmer, you have to understand them. (And this only gets more true;
the typing of lambdas builds on generics.)
The first experiments (Q world) were along the lines of value types being off
to the side. While it was possible to build the VM that way, we ran into one
problem after another as we tried to use them in Java code. Value types would
be useless if you couldn’t put them in an ArrayList or HashMap, so we were going
to have to migrate our existing libraries to be value-aware. And with the myriad
distinctions between values and objects (different top types, different
bytecodes, different type signatures), it was a migration nightmare.
In the early EG meetings, Kevin frequently stood up and said things like “it’s
bad enough that we have a type system split in two; are you really trying to
sell me one split in three? You can’t do that to the users.” (Thank you,
Kevin.)
The problems of Q-world were in a sense the problems of erased generics — we
were trying to minimize the disruption to the VM (a worthy goal), but the cost
was that sharp edges were exposed to the users in ways they couldn’t avoid.
And the solution of L World is: push more of it into the VM. (Obviously
there’s a balance to be struck here.) And I believe that we are finally close
to a substrate on which we can build a strong, stable tower, where we can
compatibly migrate our existing billions of lines of code with minimal
intrusion. So this is encouraging.
The vision of being able to “flatten all the way down”, and having values
interact cleanly with all the other language features is hard to argue against.
But as you say, the question is, who pays.
> One concern writ large across our response is performance. I know we're
> looking at user model here but performance is part of that model. Java has a
> well understood performance model for array access, == (acmp), and it would
> be unfortunate if we damaged that model significantly when introducing value
> types.
I agree that this is an expensive place to be making tradeoffs. Surely if the
cost were that ACMP got .0000001% slower, it’s a slam dunk “who cares”, and if
ACMP got 100000x slower, it’s a slam-dunk the other way. The real numbers (for
which we’ll need data) will not be at either of these extremes, and so some
hard decisions are in our future.
> Is this a fair statement of the project's goals: to improve memory locality
> in Java by introducing flattenable data? The rest of where we've gotten to
> has been working all the threads of that key desire through the rest of the
> java platform. The L/Q world design has come about from starting from a VM
> perspective based on what's implementable in ways that allows the JVM to
> optimize the layout.
It’s a fair summary, but I would like to be more precise.
Value types offer the user the ability to trade away some programming
flexibility (mutability, subtyping) for flatter and denser memory layouts. And
we want value types to interact cleanly with the other features of the
platform, so that when you (say) put value types in an ArrayList, you still get
flat and dense representations. So I think a good way to think about it is
“enabling flattening all the way down”. (Flattenability also maps fairly
cleanly to scalarizability, so the same tradeoffs that give us flattenability
on the heap give us scalarization on the stack.)
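To make that trade concrete, here is a minimal sketch in today’s Java; the hypothetical value modifier is omitted, and Point is purely an illustrative stand-in:

```
// What a value class gives up, expressed with today's tools: no subclassing,
// no mutation after construction. (The actual value/inline modifier is
// omitted; Point is an illustrative stand-in.)
final class Point {
    final int x;
    final int y;
    Point(int x, int y) { this.x = x; this.y = y; }
}

class Layouts {
    static Point[] grid(int n) {
        // Today: an array of n*n references to n*n separately allocated
        // objects. With value semantics, the same code could yield one
        // contiguous block of x/y pairs -- flat and dense -- and an
        // ArrayList<Point> could sit on top of such a flattened array.
        Point[] ps = new Point[n * n];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                ps[i * n + j] = new Point(i, j);
        return ps;
    }
}
```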
Those are the performance goals. But there are also some “all the way up”
goals I’d like to state. Programming with value types should interact cleanly
with the rest of the platform; writing code that is generic over references and
values should only be slightly harder than writing code that is generic only
over erased references. Users should be able to reason about the properties of
Object, which means reasoning about the union of references and values.
Otherwise, we may gain performance, but we’ve turned Java into C++ (or worse),
and one of the core values of the platform will be gone.
Balancing these things is tricky, and I think we’re still spiraling in on the
right balance. Q World was way too far off in one
direction; it gave the experts what they needed but at the cost of making
everyone’s language far more complex and hard to code in, and creating
intractable migration problems. I think L World is much closer to where we
want to be, but I think we’re still a little too much focused on bottom-up
decision making, and we need to temper that with some top-down “what language
do we get, and is it the one we want” thinking. I am optimistic, but I’m not
declaring victory yet.
> One of the other driving factors has been the desire to have valuetypes work
> with existing collections classes. And a further goal of enabling generic
> specialization to allow those collections to get the benefits of the
> flattened data representations (ie: backed by flattened data arrays).
Yes. I think this is “table stakes” for this exercise. Not being able to use
HashMap with values, except via boxing, would be terrible; not being able to
generify over all the types would be equally terrible. And one of the biggest
assets of the Java ecosystem is the rich set of libraries; having to throw them
all out and rewrite them (and deal with the migration mess from OldList to
NewList) could well be the death sentence.
We don’t have to get there all at once; the intermediate target (L10) is
“erased generics over values”, which gives us reuse and reasonable calling
conventions but not yet flattening. But that has to lead to a sane generics
model where values are first-class type arguments, with flattening all the way
down.
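Today’s boxed primitives are a decent analogue of what erasure costs us and what specialization is meant to buy back. A small, runnable illustration (the class name and sizes here are just for illustration):

```
import java.util.ArrayList;
import java.util.List;

// Erased generics force indirection: a List<Integer> holds references to
// boxed Integer objects, while an int[] holds the values themselves. Values
// under erased generics (the L10 target) land in the first camp;
// specialization aims to give them the layout of the second.
class ErasureCost {
    public static void main(String[] args) {
        List<Integer> boxed = new ArrayList<>();
        for (int i = 0; i < 1_000; i++) {
            boxed.add(i);              // autoboxing: each element is a heap object
        }

        int[] flat = new int[1_000];   // the flat, dense layout we want for values too
        for (int i = 0; i < 1_000; i++) {
            flat[i] = i;
        }
    }
}
```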
> The other goal we discussed in Burlington was that pre-value code should be
> minimally penalized when values are introduced, especially for code that
> isn't using them. Otherwise, it will be a hard sell for users to take a new
> JDK release that regresses their existing code.
Yes, I think the question here is “what is minimal.” And the answer is going
to be hard to quantify, because there are slippery slopes and sharp cliffs
everywhere. If we have some old dusty code and just run it unchanged on a future
JVM, there probably won’t be many value types flying around, so speculation
might get us 99% of the way there. But once you start mixing that old legacy
code with some new code that uses values, it might be different.
Also, bear in mind that values might provide performance benefits to
non-value-using code. For example, say we rewrite HashMap using values as
entries. That makes for fewer indirections in everyone’s code, even if they
never see a value in the wild. Do we count that when we are counting the
“value penalty” for legacy code?
So, we have to balance the cost to existing code (that never asked for values)
with the benefits to future code that can do amazing new things with values.
> Does that accurately sum up the goals we've been aiming for?
With some caveats, it’s a good starting point :)
>
> A sensible rationalization of the object model for L-World would be to
> have special subclasses of `Object` for references and values:
>
> ```
> class Object { ... }
> class RefObject extends Object { ... }
> class ValObject extends Object { ... }
> ```
>
> Would the intention here be to retcon existing Object subclasses to instead
> subclass RefObject? While this is arguably the type hierarchy we'd have if
> creating Java today, it will require additional speculation from the JIT on
> all Object references in the bytecode to bias the code one way or the other.
> Some extra checks plus a potential performance cliff if the speculation is
> wrong and a single valuetype hits a previously RefObject-only call site.
That was what I was tossing out, yes. This is one of those nice-to-haves that
we might ultimately compromise on because of costs, but we should be aware what
the costs are. It has some obvious benefits (clear statement of reality,
brings value-ness into the type system.) And the fact that value-ness wasn’t
reflected in the type system in Q world was a real problem; it meant we had
modifiers on code and type variables like “val T” that might have been decent
prototyping moves, but were not the language we wanted to work with.
That said, if the costs are too high, we can revisit.
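For illustration, here is roughly why having value-ness in the type system is attractive. RefObject and ValObject below are stand-ins for the hypothetical classes in the quoted proposal, not real JDK types, and the Library methods are invented for the sketch:

```
// Hypothetical stand-ins for the proposed superclasses; not today's java.lang.
abstract class RefObject { }   // identity (reference) classes
abstract class ValObject { }   // value classes

class Library {
    // "Values only" becomes an ordinary generic bound, rather than a special
    // modifier like "val T" on the type variable.
    static <T extends ValObject> void storeFlat(java.util.List<T> values) {
        // ... could assume a flattenable element type here ...
    }

    // Likewise, code that relies on identity (e.g., locking) can say so in its
    // signature instead of discovering the problem at runtime.
    static <T extends RefObject> void lockAndUpdate(T target, Runnable update) {
        synchronized (target) {
            update.run();
        }
    }
}
```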
> ```
> interface Nullable { }
> ```
>
> which is implemented by `RefObject`, and, if we support value classes
> being declared as nullable, would be implemented by those value
> classes as well. Again, this allows us to use `Nullable` as a
> parameter type or field type, or as a type bound (`<T extends
> Nullable>`).
> I'm still unclear on the nullability story.
Me too :) Some recent discussions have brought us to a refined view of this
problem, which is: what’s missing from the object model right now is not
necessarily nullable values (we already have these with L-types!), but classes
which require initialization through their constructor in order to be valid.
This is more about “initialization safety” than nullability. Stay tuned for
some fresh ideas here.
>
>
> #### Equality
>
> The biggest and most important challenge is assigning sensible total
> semantics to equality on `Object`; the LW1 equality semantics are
> sound, but not intuitive. There's no way we can explain why for
> values, you don't get `v == v` in a way that people will say "oh, that
> makes sense." If everything is an object, `==` should be a reasonable
> equality relation on objects. This leads us to a somewhat painful
> shift in the semantics of equality, but once we accept that pain, I
> think things look a lot better.
>
> Users will expect (100% reasonably) the following to work:
>
> ```
> Point p1, p2;
>
> p1 == p1 // true
>
> p2 = p1
> p1 == p2 // true
>
> Object o1 = p1, o2 = p2;
>
> o1 == o1 // true
> o1 == o2 // true
> ```
> We ran into this problem with PackedObjects which allowed creating multiple
> "detached" object headers that could refer to the same data. While early
> users found this painful, it was usually a sign they had deeper problems in
> their code & understanding. One of the difficulties was that depending on
> how the PackedObjects code was written, == might be true in some cases. We
> found a consistent answer was better - and helped to define the user model.
I am deeply concerned that this is wishful thinking based on performance
concerns — and validated with a non-representative audience. I’d guess that
most of the Packed users were experts who were reaching for packed objects
because they had serious performance problems to solve. (What works in a pilot
school for gifted students with hand-picked teachers, doesn’t always scale up
to LA County Unified.)
I think that we muck with the intuitiveness of `==` at our peril. Of all the
concerns I have about totality, equality is bigger than all the rest put
together.
> In terms of values, is this really the model we want? Users are already
> used to needing to call .equals() on equivalent objects. By choosing the
> answer carefully here, we help to guide the right user mental model for some
> of the other proposals - locking being a key one.
I think this is probably wishful thinking too. A primary use case for values is
numerics. Are we going to tell people they can’t compare numerics with ==?
And if we base `==` on the static type, then we’ll get different semantics when
you convert to Object. But conversion to Object is not a boxing conversion —
it’s a widening conversion. I’m really worried about this.
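A small illustration of the worry. The boxed-Integer lines are runnable today; the Point lines assume a hypothetical value class whose `==` is state-based:

```
class WideningVsBoxing {
    public static void main(String[] args) {
        // Boxing already makes == on Object diverge from == on the static type:
        Object a = 1000, b = 1000;     // two distinct Integer boxes
        System.out.println(a == b);    // false -- identity comparison of the boxes

        // But value-to-Object is a widening conversion, not boxing, so no new
        // identity is created on the way up. If == were based on the static type:
        // Point p1 = new Point(1, 2), p2 = new Point(1, 2);
        // p1 == p2                    // true under state-based ==
        // (Object) p1 == (Object) p2  // would give a different answer -- which is
        //                             // exactly the inconsistency described above
    }
}
```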
>
> While the conceptual model may be clean, it's also, as you point out,
> horrifying. Trees and linked structures of values become very very expensive
> to acmp in ways users wouldn't expect.
I’m not sure about the “expect” part. We’re telling people that values are
“just” their state (even if that state is rich.) Wouldn’t you then expect
equality to be based on state?
>
> If we do this, users will build the mental model that values are interned and
> that they are merely fetching the same instances from some pool of values.
> This kind of model will lead them down rabbit holes - and seems to give
> values an identity. We've all seen abuses of String.intern() - do we want
> values to be subject to that kind of code?
That’s not the mental model that comes to mind immediately for me, so let’s
talk more about this.
>
> The costs here are likely quite large - all objects that might be values need
> to be checked, all interfaces that have ever had a value implement them, and
> of course, all value type fields plus whatever the Nullability model ends up
> being.
I would say that _in the worst case_ the costs could be large, but in the
common cases (e.g., Point), the costs are quite manageable — the cost of a
comparison is a bulk bit comparison. That’s more than a single word comparison,
but it’s not so bad.
I get that this is where the cost is — I said up front, this is the pill to
swallow. Let’s figure out what it really costs.
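As a ballpark, here is roughly what the check reduces to for a flat, primitive-only value. This is a sketch of the semantics, not the JVM’s actual mechanism, and Point with its two int fields is an illustrative assumption:

```
final class Point {
    final int x, y;
    Point(int x, int y) { this.x = x; this.y = y; }
}

class Acmp {
    // For a small, pointer-free value, substitutability is a fieldwise
    // comparison of the two payloads -- in practice, a bulk bit comparison.
    // It is the deep, recursive cases (values containing references or other
    // values) where the worst-case costs show up.
    static boolean substitutable(Point a, Point b) {
        return a.x == b.x && a.y == b.y;
    }
}
```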
>
>
> #### Identity hash code
>
> Because values have no identity, in LW1 `System::identityHashCode`
> throws `UnsupportedOperationException`. However, this is
> unnecessarily harsh; for values, `identityHashCode` could simply
> return `hashCode`. This would enable classes like `IdentityHashMap`
> (used by serialization frameworks) to accept values without
> modification, with reasonable semantics -- two objects would be deemed
> the same if they are `==`. (For serialization, this means that equal
> values would be interned in the stream, which is probably what is
> wanted.)
>
> By returning `hashCode`, do you mean calling a user-defined hashCode function?
> Would the VM enforce that all values must implement `hashCode()`? Is the
> intention they are stored (growing the size of the flattened values) or would
> calling the hashcode() method each time be sufficient?
I would prefer to call the “built-in” value hashCode — the one that is
deterministically derived from state. That way, we preserve the invariant that
== values have equal identity hash codes.
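A sketch of the invariant being preserved, assuming an illustrative Point value whose built-in hash is derived purely from state (the hash formula is just an example):

```
final class Point {
    final int x, y;
    Point(int x, int y) { this.x = x; this.y = y; }

    @Override
    public boolean equals(Object o) {
        return o instanceof Point && ((Point) o).x == x && ((Point) o).y == y;
    }

    @Override
    public int hashCode() {
        // Deterministically derived from state: any two == Points agree, so an
        // identityHashCode defined to return this value preserves the invariant
        // that == objects have equal identity hash codes.
        return 31 * x + y;
    }
}
```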
>
> The only consistent answer here is to throw on lock operations for values.
> Anything else hides incorrect code, makes it harder for users to debug
> issues, and leaves a mess for the VM. As values are immutable, the lock
> isn't protecting anything. Code locking on unknown objects is fundamentally
> broken - any semantics we give it comes at a cost and doesn't actually serve
> users.
I don’t disagree. The question is, what are we going to do when
Web{Logic,Sphere} turns out to be locking on user objects, and some user passes
in a value? Are we going to tell them “go back to Java 8 if you don’t like
it”? (Serious question.) If so, then great, sign me up!
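For reference, the pattern I have in mind is something like this (a generic sketch, not anyone’s actual code; the class and method names are invented):

```
class EventDispatcher {
    // A pre-values framework serializing callbacks by locking on the
    // caller-supplied object. Under "always throw", handing this a value
    // turns working code into a runtime failure -- even though this code
    // never asked for values.
    void dispatch(Object listener, Runnable callback) {
        synchronized (listener) {
            callback.run();
        }
    }
}
```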
>
To be continued…