Thanks Dan for putting the work in to provide a credible alternative.
Let me add some background for how we came up with these things. At
some point we asked ourselves, what if we had identity and value classes
from day 1? How would that affect the object model? And we concluded
at the time that we probably wouldn't want the identity-indeterminacy of
Object, but instead would want something like
    abstract class Object { }
    class IdentityObject extends Object { }
    abstract class ValueObject extends Object { }
So the {Identity,Value}Object interfaces seemed valuable pedagogically,
in that they make the object hierarchy reflect the language division.
At the time, we imagined there might be methods that apply to all value
objects, that could live in ValueObject.
A separate factor is that we were taking operations that were previously
total (locking, weak refs) and making them partial. This is scary! So
we wanted a way to make these expressible in the static type system.
Unfortunately, the interfaces do not really deliver on either goal,
because we can't turn back time. We still have to deal with
`new Object()`, so we can't (yet) make Object abstract. Many signatures will
not be changeable from "Object" to "IdentityObject" for reasons of
compatibility, unless we make IdentityObject erase to Object (which has
its own problems). If people use it at all for type bounds, we'll see
lots of uses of `Foo<? extends Bar&IdentityObject>`, which will put more
pressure on our weak support for intersection types. And dynamic errors
will still happen, because too much of the world was built using
signatures that don't express identity-ness. (Kevin will see a parallel
to introducing nullness annotations; it might be fine if you build the
world that way from scratch, but the transition is painful when you have
to interpret an unadorned type as "of unspecified identity-ness.")
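
To make the intersection-type concern concrete, here is a minimal sketch in
today's Java. `IdentityObject` below is a locally declared stand-in marker
interface (an assumption for illustration, not the proposed java.lang type),
and `Job`, `IdentityBoundSketch`, and `runLocked` are hypothetical names. Java
currently accepts an intersection bound only on a type variable, not on a
wildcard, so the `Foo<? extends Bar&IdentityObject>` shorthand above would in
practice have to be spelled roughly like this:

```java
// Stand-in marker interface; an assumption for illustration only.
interface IdentityObject { }

// Hypothetical task type that opts in to the marker.
final class Job implements Runnable, IdentityObject {
    int runs = 0;

    @Override
    public void run() {
        runs++;
    }
}

final class IdentityBoundSketch {
    // The intersection bound has to live on a type variable; a wildcard
    // such as `? extends Runnable & IdentityObject` is not legal Java.
    static <T extends Runnable & IdentityObject> T runLocked(T task) {
        synchronized (task) {   // the bound says this object has identity
            task.run();
        }
        return task;
    }

    public static void main(String[] args) {
        Job job = runLocked(new Job());
        System.out.println(job.runs); // prints 1
    }
}
```

Any caller that only has a plain `Runnable` in hand cannot call `runLocked`
without a cast, which is the transition cost described above.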
Several years on, we're still leaning on the same few motivating
examples -- capturing things like "I might lock this" in the type
system. That we haven't come up with more killer examples is notable.
And I grow increasingly skeptical of the value of the locking example,
both because this is not how concurrent code is written, and because we
*still* have to deal with the risk of dynamic errors because most of the
world's code has not been (and will not be) written to use
IdentityObject throughout.
As Dan points out, the main thing we give up by backing off from these
interfaces is the static typing; we don't get to use `IdentityObject` as
a parameter type, return type, or type bound. And the only reason we've
come up with so far to want that is a pretty lame one -- locking.
From a language design perspective, I find it pretty leaky that you
declare a class with `value class`, but express the subclassing
constraint with `extends IdentityObject`.
On 3/22/2022 7:56 PM, Dan Smith wrote:
In response to some encouragement from Remi, John, and others, I've decided to
take a closer look at how we might approach the categorization of value and
identity classes without relying on the IdentityObject and ValueObject
interfaces.
(For background, see the thread "The interfaces IdentityObject and ValueObject must
die" in January.)
These interfaces have found a number of different uses (enumerated below),
while mostly leaning on the existing functionality of interfaces, so there's a
pretty good complexity vs. benefit trade-off. But their use has some rough
edges, and inserting them everywhere has a nontrivial compatibility impact. Can
we do better?
Language proposal:
- A "value class" is any class whose instances are all value objects. An "identity class" is any
class whose instances are all identity objects. Abstract classes can be value classes or identity classes, or neither.
Interfaces can be "value interfaces" or "identity interfaces", or neither.
- A class/interface can be designated a value class with the 'value' modifier.
    value class Foo {}
    abstract value class Bar {}
    value interface Baz {}
    value record Rec(int x) {}
A class/interface can be designated an identity class with the 'identity'
modifier.
    identity class Foo {}
    abstract identity class Bar {}
    identity interface Baz {}
    identity record Rec(int x) {}
- Concrete classes with neither modifier are implicitly 'identity'; abstract
classes with neither modifier, but with certain identity-dependent features
(instance fields, initializers, synchronized methods, ...) are implicitly
'identity' (possibly with a warning). Other abstract classes and interfaces are
fine being neither (thus supporting both kinds of subclasses).
- The properties are inherited: if you extend a value class/interface, you are
a value class/interface. (Same for identity classes/interfaces.) It's an error
to be both.
- The usual restrictions apply to value classes, both concrete and abstract; and also to
"neither" abstract classes, if they haven't been implicitly made 'identity'.
- An API ('Object.isValueObject()'?) allows for dynamically distinguishing between value
objects and identity objects. The reflection API (in java.lang.Class) allows for
detection of value classes/interfaces, identity classes/interfaces, and
"neither" classes/interfaces.
- TBD whether/how we track these properties statically so that the type system
can catch mismatches between non-identity class types and uses that assume identity.
JVM proposal:
- Same conceptual framework.
- Classes can be ACC_VALUE, ACC_IDENTITY, or neither.
- Legacy-version classes are implicitly ACC_IDENTITY. Legacy interfaces are
not. Optionally, modern-version concrete classes are also implicitly
ACC_IDENTITY.
(Trying out this alternative approach to abstract classes: there's no more
ACC_PERMITS_VALUE; instead, legacy-version abstract classes are automatically
ACC_IDENTITY, and modern-version abstract classes permit value subclasses
unless they opt out with ACC_IDENTITY. It's the bytecode generator's
responsibility to set these flags appropriately. Conceptually cleaner, maybe
too risky...)
- At class load time, we inherit value/identity-ness and check for conflicts. It's okay
to have neither flag set but inherit the property from one of your supers. We also
enforce constraints on value classes and "neither" abstract classes.
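
The inherit-and-check step in the last item can be modeled in a few lines of
ordinary Java. This is only a sketch of the rule as described above, not JVM
code: `Kind`, `KindCheckSketch`, and `resolve` are hypothetical names, and
`IllegalStateException` stands in for whatever linkage error the JVM would
actually raise.

```java
import java.util.List;

final class KindCheckSketch {
    // The three states a class or interface can be in.
    enum Kind { VALUE, IDENTITY, NEITHER }

    // Combine the flag declared on a class with the kinds of its supers.
    static Kind resolve(Kind declared, List<Kind> supers) {
        Kind result = declared;
        for (Kind s : supers) {
            if (s == Kind.NEITHER) {
                continue;                 // this super imposes nothing
            }
            if (result == Kind.NEITHER) {
                result = s;               // no flag of our own: inherit it
            } else if (result != s) {
                // Stand-in for the load-time conflict error.
                throw new IllegalStateException("both value and identity");
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Neither flag set, but one super is a value interface.
        System.out.println(resolve(Kind.NEITHER, List.of(Kind.NEITHER, Kind.VALUE)));
        // prints VALUE
    }
}
```

The constraints on value classes and "neither" abstract classes (no instance
fields, and so on) are a separate check that this sketch leaves out.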
---
So how does this score as a replacement for the list of features enabled by the
interfaces?
- Dynamic detection: 'obj instanceof ValueObject' is quite straightforward; if
we can replace that with 'obj.isValueObject()', that feels about equally
useful. (I'd be more pessimistic about something like
'Objects.isValueObject(obj)'.)
- Subclass restriction: 'implements IdentityObject' has been replaced with the 'identity'
modifier. Complexity cost of special modifiers seems on par with the complexity of
special rules for inferring and checking the superinterfaces. I think it's a win that we
use the 'value' modifier and "value" terminology for all kinds of
classes/interfaces, not just concrete classes.
- Variable types: I don't see a good way to get the equivalent of an 'IdentityObject' type. It
would involve tracking the 'identity' property through the whole type system, which seems like a
huge burden for the occasional "I'm not sure you can lock on that" error message. So we'd
probably need to be okay letting that go. Fortunately, I'm not sure it's a great loss—lots of code
today seems happy using 'Object' when it means, informally, "object that I've created for the
sole purpose of locking".
- Type variable bounds: this one seems more achievable, by using the 'value' and
'identity' keywords to indicate a new kind of bounds check ('<identity T extends
Runnable>'). Again, it's added complexity, but it's more localized. We should
think more about the use cases, and decide if it passes the cost/benefit analysis. If
not, nothing else depends on this, so it could be dropped. (Or left to a future, more
general feature?)
- Documentation: we've lost the handy javadoc location to put some explanations
about identity & value objects in a place that curious programmers can easily
stumble on. Anything we want to say needs to go in JLS/JVMS (or perhaps the
java.lang.Object javadoc).
- Compatibility: pretty clear win here. No interface injection means tools that
depend on reflection results won't be broken. (We've found a significant number
of these problems in our own code/tests, FWIW.) No new static types means
inference results won't change. There's less risk of incompatibilities when
adding/removing the 'identity' and 'value' keywords (although there can still
be source, binary, and behavioral incompatibilities).
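
For reference, the locking idiom mentioned under "Variable types" looks like
this in today's Java. It is a minimal sketch with hypothetical names
(`LockIdiomSketch`, `increment`, `count`). The lock field is typed as plain
'Object', so nothing in the signature records that it must have identity; if a
value object ever ended up in such a variable, the problem would surface as a
dynamic error rather than at compile time.

```java
final class LockIdiomSketch {
    // Created for the sole purpose of locking, and typed as plain Object.
    private final Object lock = new Object();
    private int count = 0;

    void increment() {
        synchronized (lock) {
            count++;
        }
    }

    int count() {
        synchronized (lock) {
            return count;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        LockIdiomSketch c = new LockIdiomSketch();
        Runnable work = () -> {
            for (int i = 0; i < 1000; i++) {
                c.increment();
            }
        };
        Thread a = new Thread(work);
        Thread b = new Thread(work);
        a.start();
        b.start();
        a.join();
        b.join();
        System.out.println(c.count()); // prints 2000
    }
}
```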