Re: [External] : Re: Draft JVMS changes for Primitive Objects (JEP 401)

Dan Smith Tue, 07 Sep 2021 17:15:35 -0700

> On Aug 11, 2021, at 1:24 PM, Dan Heidinga <heidi...@redhat.com> wrote:
> 
> And continuing on the "long-overdue" theme, here's my long-overdue
> review of the spec changes.
> 
> A big thank you to you, Dan S., for the careful spec writeup efforts.
> I think this captures our discussions well.
> 
> --Dan
> 
> == Section 2.11.5 Object Creation and Manipulation
>> Create a new class instance: new, withfield.
> Should that also include "defaultvalue"? The semantics aren't quite
> the same because of the structural equality of primitive class types
> but it is conceptually very similar.  And in the instruction section,
> we state "The defaultvalue instruction is similar to the new
> instruction" which lends credence to including it in this list.


Hmm, that is a bit inconsistent. There are two conflicting perspectives:
- It's analogous to 'ldc' or 'aconst_null', putting a well-known constant on 
the stack (so belongs in 2.11.2)
- It's analogous to 'new', putting a fresh instance on the stack (so belongs in 
2.11.5)

I think I lean towards emphasizing the "load a constant" nature of the 
operation, but you're right that it's not a consistent view throughout the 
document.

> == Section 4.1 The ClassFile Structure
> The  `ACC_PRIM_SUPER` flag is introduced and restrictions on classes
> with the flag are called out in various sections such as:
> 4.5 Fields > ACC_PRIM_SUPER flag set, each field must have its
> ACC_STATIC flag set.
> 4.6 Methods > In a primitive class, and in an abstract class that has
> its ACC_PRIM_SUPER flag set, a method that has its ACC_SYNCHRONIZED
> flag set must also have its ACC_STATIC flag set.
> 5.3 Creation and Loading > implements PrimitiveObject if the opposite
> is true (ACC_PRIM_SUPER, no instance initialization method).
> 
> I didn't see static constraints called out to enforce these
> restrictions (should they be?).  Having the handling of the
> ACC_PRIM_SUPER in one place would make the VM's job of validating it
> easier.

Most of Chapter 4 provides the specification for format checking—including 4.5 
and 4.6. Compare the restrictions on fields and methods of interfaces appearing 
in 4.5 and 4.6.

There's a perspective shift here from what you're looking for—rather than 
saying "an ACC_PRIM_SUPER class file is validated as follows: ...", we treat 
ACC_PRIM_SUPER as a statement of fact, and then *other* things in the class 
file are validated with that fact in mind. This avoids forcing everything about 
the new feature into a sidebar, as if it's not part of the "core" JVM.

Anyway, I'd suggest implementing & thinking about validation in the same way 
you implement validation of ACC_INTERFACE-related constraints.

> == 4.6 Methods
>> Design discussion: this section requires that unnamed factory methods (named 
>> <new>) are static
>> and have a return type that matches their declaring class or interface. By 
>> restricting the
>> descriptor in this way, clients can rely on a predictable, useful return 
>> type.
>> 
>> Alternatively, we could allow a subtype or supertype as the return type, or 
>> impose no constraints
>> at all. One potential use case is a hidden class, which is incapable of 
>> naming its class type in
>> a descriptor.
> 
> Because these are static methods, I thought we had agreed they could
> name any superclass as the return value due to the hidden class
> requirements.  Even though this allows some strange behaviour (ie:
> after some bytecode manipulation) such as the following pseudo-code
> shows:
> ```
> primitive class Strange {
>    Strange() { //<new>()Ljava/lang/Object;
>        return new String();
>    }
> }
> ```
> The contract on `<new>` is more convention than requirement.  In cases
> where the return value needs to be used as a primitive value, it would
> need to go through a checkcast to validate it when a different return
> type is named.
> 
> While this doesn't give the hidden class full powers to be checked in
> the checkcast, it can still be checked against the PrimitiveObject
> interface or its ACC_PRIM_SUPER type.  Seems like a reasonable setup
> and avoids the VM having to check the name matches on the return type
> of the descriptor.

Something we need to investigate more is how factory methods get surfaced in 
reflection. I could imagine clients like reflection really wanting a guarantee 
that when you call Foo.<new>, you get a Foo. But it depends on how it is 
presented in the API.

So, still an open question, which is why I listed both alternatives.

Hidden classes did indeed push us to be more permissive, but in retrospect, 
that really shouldn't drive the choice: if you insist on putting a factory-like 
method in your hidden class, but can't follow the JVM's rules, you're fully 
capable of spinning your own static method using whatever name you want.

>> A method of a class or interface named <new> (2.9.4) must have its 
>> ACC_STATIC flag set.
> 
> Should interfaces be able to implement `<new>` or should we prevent
> that like we do for `<init>`?  Preventing it now gives us the most
> freedom later if we retcon this as a general object factory.
> 
>> If the name of the method is <new>, then the descriptor must denote a return 
>> type
>> that is a type of the current class or interface. For a primitive class, the 
>> return
>> type must be an inlinable reference type.
> 
> Same questions about requiring the return type to match (I don't think
> we should) and if we should prevent interfaces from implementing it (I
> think yes).

Yeah, more of the same question about what these things are really for, and how 
they are meant to be interpreted. I picked one point in the space ("a factory 
method creates an instance of this class or interface via ad hoc program 
logic"); there are others, ranging from "a factory method is an arbitrary 
method with a funny name" to "a factory method is, exclusively, the 
construction mechanism for a primitive class".

> == 4.7.31 The JavaFlags Attribute
>> We're having some good internal discussions about default values & null, and 
>> will send something out when that settles into something stable.
> I expect (hope?) this will change as the internal discussion
> solidifies.  I'm not a fan of this "kitchen sink" approach as it
> becomes an attractive nuisance to wedge other flags into.  The
> suggestion to use more focused attributes (`PrimitiveClassProperties`
> & `ReferenceDefaultPrimitiveClass`) matches the existing conventions
> for naming / using attributes.

Yes, worth revisiting when we get further along. The idea here is that javac 
just needs some space to stash its special metadata, and things have suffered 
because we don't have that space (see, e.g., ACC_ENUM). But as we tweak things, 
TBD how much stuff ends up being "javac special metadata".

> == 5.3.5 Deriving a Class from a class File Representation
>> Alternatively, we could more uniformly claim that the class is "considered 
>> to implement" the
>> expected interface, regardless of what it implements by inheritance. The 
>> difference in
>> behavior might be observable, say, via reflection.
> I think we need to honour programmer intent here.  If a class
> implements one of the interfaces by inheritance, then the programmer
> has specified their intent and we should go with it (or flag it as an
> error).

There would be no error. The scenario is whether:

class Foo implements IdentityObject
class Bar extends Foo

gets interpreted as

class Bar extends Foo [implements IdentityObject]

or just

class Bar extends Foo

In other words, how much do we "prune" redundant inferred supers? More work for 
you to prune them, but maybe the results of reflection are more intuitive with 
pruning.

>> An abstract class implements IdentityObject if it declares an instance 
>> initialization method
>> and does not have its ACC_PRIM_SUPER flag set; and implements 
>> PrimitiveObject if the opposite
>> is true (ACC_PRIM_SUPER, no instance initialization method). Instance 
>> initialization methods
>> and ACC_PRIM_SUPER represent two channels for subclass instance creation, 
>> and this analysis
>> determines whether only one channel is "open".
> 
> The rules expressed here don't cover the cases outlined in chapter 4 -
> namely that a class that has ACC_PRIM_SUPER must only have static
> fields and only its static methods can be synchronized.

Correct. The Chapter 4 rules are enforced at Step 2 of 5.3.5. The inference of 
supertypes (and related constraints) is left to Step 5.

>> Alternatively, we could ignore instance initialization methods and rely 
>> entirely on
>> ACC_PRIM_SUPER. In practice, abstract classes written in the Java 
>> programming language always
>> have instance initialization methods, so the difference in behavior is only 
>> relevant to classes
>> produced via other languages or tools.
> 
> My preference is for the VM to check the rules against the explicitly
> set ACC_PRIM_SUPER bit.  It means the language can own when to set it
> and the VM only has to do consistency checks on it.

Aesthetically, I like the idea that there are parallel mechanisms for inferring 
IdentityObject and inferring PrimitiveObject. But, yeah, our particular needs 
are met whether you can infer PrimitiveObject or not.

> == 5.4.3.1 Field Resolution
>> Thus, a field reference with a type like Qjava/lang/String; is permitted. 
>> Since it's impossible
>> to declare a field with such a type (see 5.4.2), resolution of the reference 
>> will fail anyway
>> with a NoSuchFieldError.
> 
> I'm a little confused by "resolution of the reference will fail with
> NoSuchFieldError" as my reading of 5.4.2 says we would reject any
> class that has a Q descriptor that doesn't name a Q type.  How would
> resolution of such a field reference ever occur?

The choices are:

1) Explicitly check descriptors of field references at resolution time for Q 
types in the descriptors that don't name valid primitive classes. If any are 
found, resolution fails with an ICCE.

2) Allow resolution to run its course, which will inevitably fail with a NSFE.

Just a question of which error happens (with associated rules for checking).

> == Bytecodes
> new > tolerable because the Identity class requires no initialization.
> Do we need changes to the JVMTI spec to indicate that Identity isn't
> passed to the ClassFileLoadHook and is not modifiable?  And maybe a
> rule in static constraints that says Identity has an abstract <init>
> method? (Or have we dropped that idea?)  We need a way to say, via the
> spec, that code in Identity.<init>()V will never run, no matter how a
> user adds it there.

At a minimum, the spec for 'new' describes what happens if you say 'new 
java/lang/Object', and if you try to hack the Identity class via JVMTI, you 
need to be aware of that spec.

Whether JVMTI wants to do require something more for sanity checking is a good 
question...

For constraining the standard library, eh, I'm pretty happy leaving that to the 
standard library. (And, no, we don't have abstract <init> methods anymore.) No 
need to make sure standard library classes are defined the way they're supposed 
to be defined. If somebody hacks those binaries, again, the spec for 'new' says 
what happens, the hacker shouldn't be surprised.

> withfield > use of the withfield instruction is restricted to
> nestmates of the field's declaring class.
> I'm glad to see this as discussion on this went around and around.
> Limiting to the nest still seems like the right choice to me.  So +1
> to this.

Cool. Perhaps someday we'll expand it with normal access controls, but it 
requires a separate mechanism from the (read-oriented) field access flags.

Re: [External] : Re: Draft JVMS changes for Primitive Objects (JEP 401)

Reply via email to