On Aug 17, 2020, at 5:44 PM, Dan Smith <[email protected]> wrote:
Alternative story:
An inline class must not extend IdentityObject through any of its superclasses.
(That's it.)
A non-inline class implicitly implements IdentityObject if it 1) is concrete, 2)
declares an instance field, or 3) has a non-empty <init> method. Abstract
classes can also explicitly implement IdentityObject.
Changing a class so that it implements IdentityObject is a binary incompatible
change.
Now we have a highly-visible concept (IdentityObject) that programmers should
generally be aware of anyway, and they should readily understand a difference
between abstract classes that implement IdentityObject and those that don't.
Refining some of these details. I spent some time looking at abstract classes
in the JDK to figure out what the best strategy is for deciding which implement
IdentityObject.
Something I want to be careful about: adding or removing 'IdentityObject' is a
binary incompatible change. (Adding will break inline subclasses, removing will
break clients depending on subtyping—although, FWIW, the verifier doesn't check
superinterface types.) So we don't want to make it too easy to accidentally
change whether a class implements IdentityObject.
---
First, the low-hanging fruit: concrete non-inline classes (except Object)
always implement IdentityObject. An abstract class indirectly implements
IdentityObject when it extends a concrete class.
In java.base, there are 479 abstract classes, and 455 that don't extend a
concrete class.
In java.desktop (and dependencies other than java.base; built on macOS), there
are 528 abstract classes, and 491 that don't extend a concrete class.
In jdk.compiler (and dependencies other than java.base), there are 479 abstract
classes, and 459 that don't extend a concrete class.
---
Second: instance fields. An inline class can't have a superclass with instance
fields, which makes sense because there's no mechanism for mutating superclass
instance fields, during instance initialization or later in an instance method.
(It would be possible to design a factory-based initialization process,
compartmentalized across multiple superclasses, but we haven't done so and
don't plan to.)
Inner classes have an implicit "enclosing instance" field.
java.base:
455 candidates
125 extend another class with instance fields
17 more are inner classes
164 more declare their own instance fields
= 149 remaining
java.desktop:
491 candidates
118 extend another class with instance fields
16 more are inner classes
173 more declare their own instance fields
= 184 remaining
jdk.compiler:
459 candidates
210 extend another class with instance fields
21 more are inner classes
124 more declare their own instance fields
= 104 remaining
Most abstract classes—at least ⅔ in these code bases—declare or inherit
instance fields.
I think it makes sense that an abstract class that declares instance fields, or
an inner abstract class, automatically implements IdentityObject. We can infer
this both at compile time and class load time.
There's a little risk here that refactorings involving private fields could
cause surprises. Traditionally, factoring something out into a private field
(maybe for performance?) is an implementation detail. But I think we can get
programmers to buy into the mindset that declaring a field is an implicit
opt-out from inline subclasses. It's something new that requires some attention
as you're programming.
---
Third: constructors. There is no delegated instance initialization process for inline
classes, so a superclass of an inline class must not have any instance initialization
logic. Just where we draw the line on "any instance initialization logic" is
unclear.
java.base:
149 candidates
9 with a non-empty constructor (perform SecurityManager checks)
16 with a non-empty super constructor
1 with an empty constructor with a parameter (java.security.cert.CertStoreSpi)
1 with multiple constructors, all empty (java.security.SecureRandomSpi)
6 with an empty constructor with restricted access*
58 with an unrestricted empty no-arg constructor
58 with no constructor
java.desktop:
184 candidates
2 with a non-empty constructor
2 with an empty constructor with restricted access*
46 with an unrestricted empty no-arg constructor
134 with no constructor
jdk.compiler:
104 candidates
4 with a non-empty constructor
1 with an empty constructor with restricted access*
20 with an unrestricted empty no-arg constructor
79 with no constructor
(*Restricted access means a package-access or private constructor in a public
or protected class, and a private constructor in a package-access class.)
As a baseline, all the "no constructor" cases should be inline superclass
candidates, so should not implement IdentityObject.
However, it's quite common to redundantly declare an empty constructor, e.g.
for documentation purposes. (In fact, javac just added an optional lint warning
supporting a coding convention in which constructors should always be
explicit—see JDK-8071961.) Those classes probably shouldn't be IdentityObjects,
either.
Restricted access doesn't necessarily need to prevent inline subclasses, and
doesn't have much to do with identity. We could leave it to the compiler to
check super constructor access. But how to enforce that at run time? (There is
no 'invokespecial' in the bytecode....)
Constructor parameters are a borderline signal that you expect something to
happen on instance initialization, even if you don't actually do anything with
the parameter right now. If you have *multiple* constructors, maybe we only
care about the no-args one.
And then there are constructors with something more than 'super()' in their
body. (Instance initializers, too, although I found no examples of them in any
of the abstract classes I looked at.) These classes clearly need to implement
IdentityObject in order to guarantee the traditional superclass initialization
behavior.
An approach I like, though it may be too disruptive: it's a compile-time error
to declare a non-empty constructor in an abstract class unless that class
implements IdentityObject (explicitly, indirectly, or by virtue of having
fields). That would trigger ~10 compiler errors in java.base, 2 in
java.desktop, and 4 in jdk.compiler. The advantage is that there's no ambiguity
or surprises—for an abstract class without fields, you either opt in to
IdentityObject explicitly, or you implicitly assert that you don't intend to,
and then the compiler catches any problems with that assertion.
Alternatively, we can once again infer 'implements IdentityObject' if a
constructor has parameters or a body consisting of anything other than
'super()'. It's fairly easy to change the implicit 'implements' by doing
something like adding a 'println' statement—but in practice, those sorts of
changes may not be all that likely.
In the JVM:
- We've discussed a flag on <init> methods to indicate that they are "empty". It seems a shame to do that when
the compiler can also generate 'implements IdentityObject'. But this is a "wrong default" situation—abstract classes
need to be presumed IdentityObjects unless they explicitly choose not to be. (I worry, e.g., about bytecode generators that
didn't get the memo and continue to assume checks in their <init> methods will always be executed.) So I think we'll need
a rule that says you implicitly implement IdentityObject at class load time unless you have a special "empty"
<init> method.
- Encoding "empty": current proposal is that an empty <init> is ACC_ABSTRACT
and has no Code attribute. We could revisit if we like, maybe with a new flag, or a class-level
attribute, but this still seems pretty reasonable.
- How to handle access control? I do think it's worth supporting limited access somehow
(totally reasonable to allow inline subclasses in the current package, but reject subclasses
elsewhere). I don't love any of the options, but maybe the best approach is to add a
preparation- or initialization-time check that the superclass has a "<init>()V"
method with appropriate access? Or maybe we could have some kind of class-level attribute that
controls access for subclassing (this is sort of like sealing)?
Hmm, the more I think about this, the less enthusiastic I am about encoding important class
properties in an "<init>" method that's never going to be referenced...
---
Finally: synchronized methods. These strike me as a pretty subtle
implementation detail. But they're also *very* closely tied to the semantics of
IdentityObject. Among our remaining candidate classes, they're pretty rare:
java.base:
123 candidates
7 with synchronized methods
= 116 can safely allow inline subclasses (24% of all abstracts)
java.desktop:
182 candidates
7 with synchronized methods
= 175 can safely allow inline subclasses (33% of all abstracts)
jdk.compiler:
100 candidates
0 with synchronized methods
= 100 can safely allow inline subclasses (21% of all abstracts)
So synchronized methods on these field-free, initialization-free classes are
pretty rare, but not unheard of. (Examples: java.net.URLStreamHandler,
java.io.InputStream, java.lang.invoke.MethodHandleImpl.)
Again, I think we can go one of two ways:
- It's a compiler error to declare a 'synchronized' method if you don't already
implement IdentityObject.
or
- Classes with synchronized methods implicitly implement IdentityObject.
The first strategy would produce 7 new compiler errors in java.base and 7 more
in java.desktop... but fixing those errors might prompt some reflection about
the tradeoff between synchronizing on 'this' and allowing inline subclasses.
The second strategy is frictionless to adopt, but may lead to surprises when
code is refactored (note that 'synchronized' isn't documented by javadoc).
In the JVM: I'm much less concerned about surprising inline subclasses in this
case. The bytecode generator can explicitly implement IdentityObject if they
want, and if they didn't/forgot/weren't aware of it, invoking the method on an
inline class instance will throw an exception, just as it would if they'd used
'monitorenter' in the method body instead.