Re: IdentityObject & abstract superclasses

Brian Goetz Thu, 20 Aug 2020 15:26:25 -0700

One degree of freedom that you've omitted, which might help you wiggleout of some of the uncomfortable corners, is warning-now-error-later.

Your "I like, but might be disruptive" approach is to error on the caseswhere it is not super clear. But by starting with a warning, errorslike the ~10 in java.base would have some time to explicitly say`implements IdObject` without it seeming like a world-breaking change.




On 8/20/2020 6:07 PM, Dan Smith wrote:

On Aug 17, 2020, at 5:44 PM, Dan Smith <[email protected]> wrote:

Alternative story:

An inline class must not extend IdentityObject through any of its superclasses. 
(That's it.)

A non-inline class implicitly implements IdentityObject if it 1) is concrete, 2) 
declares an instance field, or 3) has a non-empty <init> method. Abstract 
classes can also explicitly implement IdentityObject.

Changing a class so that it implements IdentityObject is a binary incompatible 
change.

Now we have a highly-visible concept (IdentityObject) that programmers should 
generally be aware of anyway, and they should readily understand a difference 
between abstract classes that implement IdentityObject and those that don't.

Refining some of these details. I spent some time looking at abstract classes 
in the JDK to figure out what the best strategy is for deciding which implement 
IdentityObject.

Something I want to be careful about: adding or removing 'IdentityObject' is a 
binary incompatible change. (Adding will break inline subclasses, removing will 
break clients depending on subtyping—although, FWIW, the verifier doesn't check 
superinterface types.) So we don't want to make it too easy to accidentally 
change whether a class implements IdentityObject.

---

First, the low-hanging fruit: concrete non-inline classes (except Object) 
always implement IdentityObject. An abstract class indirectly implements 
IdentityObject when it extends a concrete class.

In java.base, there are 479 abstract classes, and 455 that don't extend a 
concrete class.

In java.desktop (and dependencies other than java.base; built on macOS), there 
are 528 abstract classes, and 491 that don't extend a concrete class.

In jdk.compiler (and dependencies other than java.base), there are 479 abstract 
classes, and 459 that don't extend a concrete class.

---

Second: instance fields. An inline class can't have a superclass with instance 
fields, which makes sense because there's no mechanism for mutating superclass 
instance fields, during instance initialization or later in an instance method. 
(It would be possible to design a factory-based initialization process, 
compartmentalized across multiple superclasses, but we haven't done so and 
don't plan to.)

Inner classes have an implicit "enclosing instance" field.

java.base:
455 candidates
125 extend another class with instance fields
17 more are inner classes
164 more declare their own instance fields
= 149 remaining

java.desktop:
491 candidates
118 extend another class with instance fields
16 more are inner classes
173 more declare their own instance fields
= 184 remaining

jdk.compiler:
459 candidates
210 extend another class with instance fields
21 more are inner classes
124 more declare their own instance fields
= 104 remaining

Most abstract classes—at least ⅔ in these code bases—declare or inherit 
instance fields.

I think it makes sense that an abstract class that declares instance fields, or 
an inner abstract class, automatically implements IdentityObject. We can infer 
this both at compile time and class load time.

There's a little risk here that refactorings involving private fields could 
cause surprises. Traditionally, factoring something out into a private field 
(maybe for performance?) is an implementation detail. But I think we can get 
programmers to buy into the mindset that declaring a field is an implicit 
opt-out from inline subclasses. It's something new that requires some attention 
as you're programming.

---

Third: constructors. There is no delegated instance initialization process for inline 
classes, so a superclass of an inline class must not have any instance initialization 
logic. Just where we draw the line on "any instance initialization logic" is 
unclear.

java.base:
149 candidates
9 with a non-empty constructor (perform SecurityManager checks)
16 with a non-empty super constructor
1 with an empty constructor with a parameter (java.security.cert.CertStoreSpi)
1 with multiple constructors, all empty (java.security.SecureRandomSpi)
6 with an empty constructor with restricted access*
58 with an unrestricted empty no-arg constructor
58 with no constructor

java.desktop:
184 candidates
2 with a non-empty constructor
2 with an empty constructor with restricted access*
46 with an unrestricted empty no-arg constructor
134 with no constructor

jdk.compiler:
104 candidates
4 with a non-empty constructor
1 with an empty constructor with restricted access*
20 with an unrestricted empty no-arg constructor
79 with no constructor

(*Restricted access means a package-access or private constructor in a public 
or protected class, and a private constructor in a package-access class.)

As a baseline, all the "no constructor" cases should be inline superclass 
candidates, so should not implement IdentityObject.

However, it's quite common to redundantly declare an empty constructor, e.g. 
for documentation purposes. (In fact, javac just added an optional lint warning 
supporting a coding convention in which constructors should always be 
explicit—see JDK-8071961.) Those classes probably shouldn't be IdentityObjects, 
either.

Restricted access doesn't necessarily need to prevent inline subclasses, and 
doesn't have much to do with identity. We could leave it to the compiler to 
check super constructor access. But how to enforce that at run time? (There is 
no 'invokespecial' in the bytecode....)

Constructor parameters are a borderline signal that you expect something to 
happen on instance initialization, even if you don't actually do anything with 
the parameter right now. If you have *multiple* constructors, maybe we only 
care about the no-args one.

And then there are constructors with something more than 'super()' in their 
body. (Instance initializers, too, although I found no examples of them in any 
of the abstract classes I looked at.) These classes clearly need to implement 
IdentityObject in order to guarantee the traditional superclass initialization 
behavior.

An approach I like, though it may be too disruptive: it's a compile-time error 
to declare a non-empty constructor in an abstract class unless that class 
implements IdentityObject (explicitly, indirectly, or by virtue of having 
fields). That would trigger ~10 compiler errors in java.base, 2 in 
java.desktop, and 4 in jdk.compiler. The advantage is that there's no ambiguity 
or surprises—for an abstract class without fields, you either opt in to 
IdentityObject explicitly, or you implicitly assert that you don't intend to, 
and then the compiler catches any problems with that assertion.

Alternatively, we can once again infer 'implements IdentityObject' if a 
constructor has parameters or a body consisting of anything other than 
'super()'. It's fairly easy to change the implicit 'implements' by doing 
something like adding a 'println' statement—but in practice, those sorts of 
changes may not be all that likely.

In the JVM:

- We've discussed a flag on <init> methods to indicate that they are "empty". It seems a shame to do that when 
the compiler can also generate 'implements IdentityObject'. But this is a "wrong default" situation—abstract classes 
need to be presumed IdentityObjects unless they explicitly choose not to be. (I worry, e.g., about bytecode generators that 
didn't get the memo and continue to assume checks in their <init> methods will always be executed.) So I think we'll need 
a rule that says you implicitly implement IdentityObject at class load time unless you have a special "empty" 
<init> method.

- Encoding "empty": current proposal is that an empty <init> is ACC_ABSTRACT 
and has no Code attribute. We could revisit if we like, maybe with a new flag, or a class-level 
attribute, but this still seems pretty reasonable.

- How to handle access control? I do think it's worth supporting limited access somehow 
(totally reasonable to allow inline subclasses in the current package, but reject subclasses 
elsewhere). I don't love any of the options, but maybe the best approach is to add a 
preparation- or initialization-time check that the superclass has a "<init>()V" 
method with appropriate access? Or maybe we could have some kind of class-level attribute that 
controls access for subclassing (this is sort of like sealing)?

Hmm, the more I think about this, the less enthusiastic I am about encoding important class 
properties in an "<init>" method that's never going to be referenced...

---

Finally: synchronized methods. These strike me as a pretty subtle 
implementation detail. But they're also *very* closely tied to the semantics of 
IdentityObject. Among our remaining candidate classes, they're pretty rare:

java.base:
123 candidates
7 with synchronized methods
= 116 can safely allow inline subclasses (24% of all abstracts)

java.desktop:
182 candidates
7 with synchronized methods
= 175 can safely allow inline subclasses (33% of all abstracts)

jdk.compiler:
100 candidates
0 with synchronized methods
= 100 can safely allow inline subclasses (21% of all abstracts)

So synchronized methods on these field-free, initialization-free classes are 
pretty rare, but not unheard of. (Examples: java.net.URLStreamHandler, 
java.io.InputStream, java.lang.invoke.MethodHandleImpl.)

Again, I think we can go one of two ways:

- It's a compiler error to declare a 'synchronized' method if you don't already 
implement IdentityObject.

or

- Classes with synchronized methods implicitly implement IdentityObject.

The first strategy would produce 7 new compiler errors in java.base and 7 more 
in java.desktop... but fixing those errors might prompt some reflection about 
the tradeoff between synchronizing on 'this' and allowing inline subclasses. 
The second strategy is frictionless to adopt, but may lead to surprises when 
code is refactored (note that 'synchronized' isn't documented by javadoc).

In the JVM: I'm much less concerned about surprising inline subclasses in this 
case. The bytecode generator can explicitly implement IdentityObject if they 
want, and if they didn't/forgot/weren't aware of it, invoking the method on an 
inline class instance will throw an exception, just as it would if they'd used 
'monitorenter' in the method body instead.

Re: IdentityObject & abstract superclasses

Reply via email to