> The key questions are around the mental model of what we're trying to 
> accomplish and how to make it easy (easier?) for users to migrate to use 
> value types or handle when their pre-value code is passed a valuetype.  
> There's a cost for some group of users regardless of how we address each of 
> these issues.  Who pays these costs?  Those migrating to use the new value 
> types functionality?  Those needing to address the performance costs of 
> migrating to a values capable runtime (JDK-N?).

Indeed, this is the question.  And, full disclosure, my thoughts have evolved 
since we started this exercise.  

We initially started with the idea that value types were this thing “off to the 
side” — a special category of classes that only experts would ever use, and so 
it was OK if they had sharp edges.  But this is the sort of wishful thinking 
one engages in when trying to do something that seems impossible; you 
bargain with the problem. 

When we did generics, there was a pervasive belief that the complexity of 
generics could be contained to where only experts would have to deal with it, 
and the rest of us could happily use our strongly typed collections without 
having to understand wildcards and such.  This turned out to be pure wishful 
thinking; generics are part of the language, and in order to be an effective 
Java programmer, you have to understand them.  (And this only gets more true; 
the typing of lambdas builds on generics.)  

The first experiments (Q world) were along the lines of value types being off 
to the side.  While it was possible to build the VM that way, we ran into one 
problem after another as we tried to use them in Java code.  Value types would 
be useless if you couldn’t put them in an ArrayList or HashMap, so we were going 
to have to migrate our existing libraries to be value-aware.  And with the myriad 
distinctions between values and objects (different top types, different 
bytecodes, different type signatures), it was a migration nightmare.  

In the early EG meetings, Kevin frequently stood up and said things like “it’s 
bad enough that we have a type system split in two; are you really trying to 
sell me one split in three?  You can’t do that to the users.”  (Thank you, 
Kevin.)  

The problems of Q-world were in a sense the problems of erased generics — we 
were trying to minimize the disruption to the VM (a worthy goal), but the cost 
was that sharp edges were exposed to the users in ways they couldn’t avoid.  
And the solution of L World is: push more of it into the VM.  (Obviously 
there’s a balance to be struck here.)  And I believe that we are finally close 
to a substrate on which we can build a strong, stable tower, where we can 
compatibly migrate our existing billions of lines of code with minimal 
intrusion.  So this is encouraging.  

The vision of being able to “flatten all the way down”, and of having values 
interact cleanly with all the other language features, is hard to argue against.  
But as you say, the question is: who pays?  

>  One concern writ large across our response is performance.  I know we're 
> looking at user model here but performance is part of that model.  Java has a 
> well understood performance model for array access, == (acmp), and it would 
> be unfortunate if we damaged that model significantly when introducing value 
> types.

I agree that this is an expensive place to be making tradeoffs.  Surely if the 
cost were that ACMP got 0.0000001% slower, it would be a slam dunk “who cares”, and 
if ACMP got 100000x slower, it would be a slam dunk the other way.  The real numbers (for 
which we’ll need data) will not be at either of these extremes, and so some 
hard decisions are in our future.  

>  Is this a fair statement of the project's goals: to improve memory locality 
> in Java by introducing flattenable data?  The rest of where we've gotten to 
> has been working all the threads of that key desire through the rest of the 
> Java platform.  The L/Q world design has come about from starting from a VM 
> perspective based on what's implementable in ways that allow the JVM to 
> optimize the layout.

It’s a fair summary, but I would like to be more precise.  

Value types offer the user the ability to trade away some programming 
flexibility (mutability, subtyping) for flatter and denser memory layouts.  And 
we want value types to interact cleanly with the other features of the 
platform, so that when you (say) put value types in an ArrayList, you still get 
flat and dense representations.  So I think a good way to think about it is 
“enabling flattening all the way down”.  (Flattenability also maps fairly 
cleanly to scalarizability, so the same tradeoffs that give us flattenability 
on the heap give us scalarization on the stack.)  
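
To make the trade concrete, here is a minimal sketch (the Point example is 
mine, and the comments describe intent rather than the behavior of any current 
build):

```
// Today: an immutable, "value-like" final class.  It still has identity, so a
// Point[] is an array of references to separately allocated objects.
final class Point {
    final int x, y;
    Point(int x, int y) { this.x = x; this.y = y; }
}

// Under Valhalla (declaration syntax still in flux), declaring Point as a
// value class gives up identity, mutability, and subclassing; in exchange a
// Point[] can be laid out flat as contiguous (x, y) pairs, and the same
// property enables scalarization on the stack.
```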

Those are the performance goals.  But there are also some “all the way up” 
goals I’d like to state.  Programming with value types should interact cleanly 
with the rest of the platform; writing code that is generic over references and 
values should only be slightly harder than writing code that is generic only 
over erased references.  Users should be able to reason about the properties of 
Object, which means reasoning about the union of references and values.  
Otherwise, we may gain performance, but we’ve turned Java into C++ (or worse), 
and one of the core values of the platform will be gone.  

Balancing these things is very tricky, and I think we’re still 
spiraling in on the right balance.  Q World was way too far off in one 
direction; it gave the experts what they needed but at the cost of making 
everyone’s language far more complex and hard to code in, and creating 
intractable migration problems.  I think L World is much closer to where we 
want to be, but I think we’re still a little too much focused on bottom-up 
decision making, and we need to temper that with some top-down “what language 
do we get, and is it the one we want” thinking.  I am optimistic, but I’m not 
declaring victory yet.  

>  One of the other driving factors has been the desire to have valuetypes work 
> with existing collections classes.  And a further goal of enabling generic 
> specialization to allow those collections to get the benefits of the 
> flattened data representations (ie: backed by flattened data arrays).

Yes.  I think this is “table stakes” for this exercise.  Not being able to use 
HashMap with values, except via boxing, would be terrible; not being able to 
generify over all the types would be equally terrible.  And one of the biggest 
assets of the Java ecosystem is the rich set of libraries; having to throw them 
all out and rewrite them (and deal with the migration mess from OldList to 
NewList) could well be the death sentence.  

We don’t have to get there all at once; the intermediate target (L10) is 
“erased generics over values”, which gives us reuse and reasonable calling 
conventions but not yet flattening.  But that has to lead to a sane generics 
model where values are first-class type arguments, with flattening all the way 
down.  
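
As a rough illustration of the two stages (Point as above; the layouts in the 
comments are the intent, not a guarantee):

```
// Erased generics over values (the L10-style intermediate target): values work
// as type arguments, and existing collections just work...
List<Point> points = new ArrayList<>();
points.add(new Point(1, 2));

// ...but the backing Object[] still holds references, so elements are not
// flattened.  The eventual goal is specialized generics, where ArrayList<Point>
// can be backed by a flat Point[] -- flattening all the way down -- without
// migrating to a different collection class.
```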

> The other goal we discussed in Burlington was that pre-value code should be 
> minimally penalized when values are introduced, especially for code that 
> isn't using them.  Otherwise, it will be a hard sell for users to take a new 
> JDK release that regresses their existing code.

Yes, I think the question here is “what is minimal.”  And the answer is going 
to be hard to quantify, because there are slippery slopes and sharp cliffs 
everywhere.  If we have some old dusty code and it just runs unchanged on a future 
JVM, there probably won’t be many value types flying around, so speculation 
might get us 99% of the way there.  But once you start mixing that old legacy 
code with some new code that uses values, it might be different.  

Also, bear in mind that values might provide performance benefits to 
non-value-using code.  For example, say we rewrite HashMap using values as 
entries.  That makes for fewer indirections in everyone’s code, even if they 
never see a value in the wild.  Do we count that when we are counting the 
“value penalty” for legacy code?  

So, we have to balance the cost to existing code (that never asked for values) 
with the benefits to future code that can do amazing new things with values.  

>  Does that accurately sum up the goals we've been aiming for?

With some caveats, it’s a good starting point :)

>  
> A sensible rationalization of the object model for L-World would be to
> have special subclasses of `Object` for references and values:
> 
> ```
> class Object { ... }
> class RefObject extends Object { ... }
> class ValObject extends Object { ... }
> ```
>  
> Would the intention here be to retcon existing Object subclasses to instead 
> subclass RefObject?  While this is arguably the type hierarchy we'd have if 
> creating Java today, it will require additional speculation from the JIT on 
> all Object references in the bytecode to bias the code one way or the other.  
> Some extra checks plus a potential performance cliff if the speculation is 
> wrong and a single valuetype hits a previously RefObject-only call site.

That was what I was tossing out, yes.  This is one of those nice-to-haves that 
we might ultimately compromise on because of costs, but we should be aware of what 
the costs are.  It has some obvious benefits (a clear statement of reality, and it 
brings value-ness into the type system).  And the fact that value-ness wasn’t 
reflected in the type system in Q world was a real problem; it meant we had 
modifiers on code and type variables like “val T” that might have been decent 
prototyping moves, but were not the language we wanted to work with.  

That said, if the costs are too high, we can revisit.  
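
For what it’s worth, here is the kind of thing the retconned hierarchy buys us, 
sketched with the RefObject/ValObject names from your message (whether those end 
up as classes or interfaces, and their exact names, are still open):

```
// Value-ness expressed as an ordinary type bound, rather than a special
// "val T" modifier on the type variable as in the Q-world prototypes:
static <T extends ValObject> T firstValue(List<T> values) {
    return values.get(0);
}

// And code that genuinely needs identity (say, to lock on its argument, or to
// use it as an IdentityHashMap key) can say so in its signature:
static <T extends RefObject> void requireIdentity(T ref) {
    synchronized (ref) { /* safe: T is known to be an identity class */ }
}
```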

>  ```
> interface Nullable { }
> ```
> 
> which is implemented by `RefObject`, and, if we support value classes
> being declared as nullable, would be implemented by those value
> classes as well.  Again, this allows us to use `Nullable` as a
> parameter type or field type, or as a type bound (`<T extends
> Nullable>`).  
> I'm still unclear on the nullability story.  

Me too :)  Some recent discussions have brought us to a refined view of this 
problem, which is: what’s missing from the object model right now is not 
necessarily nullable values (we already have these with L-types!), but classes 
which require initialization through their constructor in order to be valid.  
This is more about “initialization safety” than nullability.  Stay tuned for 
some fresh ideas here.  



> 
> 
> #### Equality
> 
> The biggest and most important challenge is assigning sensible total
> semantics to equality on `Object`; the LW1 equality semantics are
> sound, but not intuitive.  There's no way we can explain why for
> values, you don't get `v == v` in a way that people will say "oh, that
> makes sense."  If everything is an object, `==` should be a reasonable
> equality relation on objects.  This leads us to a somewhat painful
> shift in the semantics of equality, but once we accept that pain, I
> think things look a lot better.
> 
> Users will expect (100% reasonably) the following to work:
> 
> ```
> Point p1, p2;
> 
> p1 == p1  // true
> 
> p2 = p1
> p1 == p2  // true
> 
> Object o1 = p1, o2 = p2;
> 
> o1 == o1  // true
> o1 == o2  // true
> ```
> We ran into this problem with PackedObjects which allowed creating multiple 
> "detached" object headers that could refer to the same data.  While early 
> users found this painful, it was usually a sign they had deeper problems in 
> their code & understanding.  One of the difficulties was that depending on 
> how the PackedObjects code was written, == might be true in some cases.  We 
> found a consistent answer was better - and helped to define the user model.

I am deeply concerned that this is wishful thinking based on performance 
concerns — and validated with a non-representative audience.  I’d guess that 
most of the Packed users were experts who were reaching for packed objects 
because they had serious performance problems to solve.  (What works in a pilot 
school for gifted students with hand-picked teachers, doesn’t always scale up 
to LA County Unified.)  

I think that we muck with the intuitiveness of `==` at our peril.  Of all the 
concerns I have about totality, equality is bigger than all the rest put 
together.  

>  In terms of values, is this really the model we want?  Users are already 
> used to needing to call .equals() on equivalent objects.  By choosing the 
> answer carefully here, we help to guide the right user mental model for some 
> of the other proposals - locking being a key one. 

I think this is probably wishful thinking too. A primary use case for values is 
numerics.  Are we going to tell people they can’t compare numerics with ==?  
And if we base `==` on the static type, then we’ll get different semantics when 
you convert to Object.  But conversion to Object is not a boxing conversion — 
it’s a widening conversion.  I’m really worried about this.  
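
To make the worry concrete, here is a small sketch (Point as a value class; the 
commented results are what I think users will reasonably expect, not settled 
semantics):

```
Point p1 = new Point(1, 2);
Point p2 = new Point(1, 2);   // same state, constructed separately

p1 == p2                      // if values are "just their state", users expect true

Object o1 = p1, o2 = p2;      // widening to Object, not boxing -- no new identity

o1 == o2                      // has to agree with the comparison above, or the
                              // meaning of == silently changes on assignment
```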

>  
> While the conceptual model may be clean, it's also, as you point out, 
> horrifying.  Trees and linked structures of values become very very expensive 
> to acmp in ways users wouldn't expect.

I’m not sure about the “expect” part.  We’re telling people that values are 
“just” their state (even if that state is rich).  Wouldn’t you then expect 
equality to be based on state?  

>  
> If we do this, users will build the mental model that values are interned and 
> that they are merely fetching the same instances from some pool of values.  
> This kind of model will lead them down rabbit holes - and seems to give 
> values an identity.  We've all seen abuses of String.intern() - do we want 
> values to be subject to that kind of code?

That’s not the mental model that comes to mind immediately for me, so let’s 
talk more about this.  

>  
> The costs here are likely quite large - all objects that might be values need 
> to be checked, all interfaces that have ever had a value implement them, and 
> of course, all value type fields plus whatever the Nullability model ends up 
> being.

I would say that _in the worst case_ the costs could be large, but in the 
common cases (e.g., Point), the costs are quite manageable — the cost of a 
comparison is a bulk bit comparison.  That’s more than a single word comparison, 
but it’s not so bad.  

I get that this is where the cost is — I said up front, this is the pill to 
swallow.  Let’s figure out what it really costs. 
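
For the common case, here is roughly the shape I have in mind (a hand-written 
sketch of what a substitutability check for a small value class like Point 
amounts to; the VM would do the equivalent bit comparison, not call user code):

```
// Roughly what acmp reduces to for a flat value with two int fields: a
// field-by-field (bulk bit) comparison instead of a single pointer compare.
static boolean pointSubstitutable(Point a, Point b) {
    return a.x == b.x && a.y == b.y;
}

// Fields that are themselves values are compared recursively (which is where
// trees and linked structures of values get expensive); reference fields are
// still compared as ordinary pointers.
```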

> 
> 
> #### Identity hash code
> 
> Because values have no identity, in LW1 `System::identityHashCode`
> throws `UnsupportedOperationException`.  However, this is
> unnecessarily harsh; for values, `identityHashCode` could simply
> return `hashCode`.  This would enable classes like `IdentityHashMap`
> (used by serialization frameworks) to accept values without
> modification, with reasonable semantics -- two objects would be deemed
> the same if they are `==`.  (For serialization, this means that equal
> values would be interned in the stream, which is probably what is
> wanted.)
>  
> By returning `hashCode`, do you mean calling a user-defined hashCode function?  
> Would the VM enforce that all values must implement `hashCode()`?  Is the 
> intention they are stored (growing the size of the flattened values) or would 
> calling the hashcode() method each time be sufficient?

I would prefer to call the “built-in” value hashCode — the one that is 
deterministically derived from state.  That way, we preserve the invariant that 
== values have equal identity hash codes.  
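
Concretely, something along these lines (illustrative only; whether the hash is 
recomputed each time or cached, and its exact mixing function, are the open 
questions you raise):

```
// A hash derived deterministically from state, so that p1 == p2 implies
// identityHashCode(p1) == identityHashCode(p2).
static int valueHashCode(Point p) {
    int h = Integer.hashCode(p.x);
    return 31 * h + Integer.hashCode(p.y);
}
```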

>   
> The only consistent answer here is to throw on lock operations for values.  
> Anything else hides incorrect code, makes it harder for users to debug 
> issues, and leaves a mess for the VM.  As values are immutable, the lock 
> isn't protecting anything.  Code locking on unknown objects is fundamentally 
> broken - any semantics we give it comes at a cost and doesn't actually serve 
> users.

I don’t disagree.  The question is, what are we going to do when 
Web{Logic,Sphere} turns out to be locking on user objects, and some user passes 
in a value?  Are we going to tell them “go back to Java 8 if you don’t like 
it”?  (Serious question.)  If so, then great, sign me up!  



To be continued…


