In the Java language, fields can be final or not and, independently, can be access controlled at one of four levels: public, protected, package, and private.
Final fields cannot be written to except under very narrow circumstances: (a) in an initialization block (static initializer or constructor body), and (b) only if the static compiler can prove there has been no previous write (based on the rules of the language).

We are adding inline classes, whose non-static fields are always final. (There are possible meanings for non-final fields of inline classes, but nothing I’m saying today interacts or interferes with any known such meanings.) Behaviorally, an inline class behaves like a class with all-final non-static fields, *and* it has its identity radically suppressed by the JVM.

In the language, a constructor for an inline class is approximately indistinguishable from a constructor for a regular class with all-final non-static fields. In particular, a constructor of any class (inline or regular identity) is empowered, by the rules of the language, to set each of its (final, non-static) fields exactly once along any path through the constructor. All of this hangs together nicely.

When we translate to the JVM, the reading of any non-static field always uses the getfield instruction, and the access checks built into the JVM enforce the language access rules for that field. This is true equally for inline and identity classes (the JVM doesn’t care).

However, we have to use distinct tactics for translating assignments to fields. The existing putfield instruction has no possible applicability to inline classes, because it assumes you can pass it an instance pointer, execute it, and the *same instance pointer* will refer to the updated instance. This cannot possibly work with inline classes (unless we add a whole new layer of “larval” states to inline classes, which would not be thrifty design).

Instead, setting the field of an inline class needs a new bytecode, a new sibling of getfield and putfield, which we call withfield.
Its output is a new instance of the same inline class whose field values are all identical to those in the old instance, except for the one field referred to by the withfield instruction. Thus:

* getfield consumes a reference and returns a value: (I) → (F)
* putfield consumes both and returns a side effect: (I F) & state → () & state′
* withfield consumes the same as putfield and produces a new instance: (I F) → (I′)

The access checking rules are fairly uniform for all of these instructions. If the field F of C has protection level P, then unless a client has access to level P of C, it cannot execute (cannot even resolve) the instruction that tries to access F. In the case of putfield or withfield, if F is final (and for withfield that is currently always the case, though that could change), then an additional check is made, to ensure that F is only being set in a legitimate context. More in a moment on what “legitimate” means for this “context”.

The getfield instruction only has to pass the access check, and then the client has full access to read the value of the field. This works pleasingly like the source-level expression which fetches the field value.

Currently, for a non-static final field, both putfield and withfield are generated only inside of constructors, which have rigid rules, in the source language, that ensure nothing too fishy can happen.

For an identity class C, it would be extremely fishy if the classfile of C were able to execute putfield instructions outside of one of C’s constructors. The reason for this is that a constructor of C would be able to produce a supposedly all-final instance of C, but then some other method of C would (in principle) be able to overwrite one of C’s supposedly final fields with some other value, by executing a putfield instruction in that other method.
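
The three instruction shapes described above can be modeled in plain Java. This is only a sketch of the data flow, not real bytecode: the classes IdentityPoint, InlinePoint, and WithfieldShapes are hypothetical names introduced for illustration, and an ordinary class with all-final fields stands in for an inline class.

```java
// Hypothetical stand-in for an identity class with a mutable field.
final class IdentityPoint {
    int x, y;
    IdentityPoint(int x, int y) { this.x = x; this.y = y; }
}

// Hypothetical stand-in for an inline class: all non-static fields are final.
final class InlinePoint {
    final int x, y;
    InlinePoint(int x, int y) { this.x = x; this.y = y; }
}

final class WithfieldShapes {
    // getfield shape: (I) -> (F). Consumes a reference, returns a value.
    static int getX(InlinePoint p) { return p.x; }

    // putfield shape: (I F) & state -> () & state'. Mutates in place;
    // the same reference afterwards refers to the updated instance.
    static void putX(IdentityPoint p, int newX) { p.x = newX; }

    // withfield shape: (I F) -> (I'). Returns a new instance that differs
    // from the old one in exactly one field; the old instance is untouched.
    static InlinePoint withX(InlinePoint p, int newX) {
        return new InlinePoint(newX, p.y);
    }

    public static void main(String[] args) {
        IdentityPoint a = new IdentityPoint(1, 2);
        putX(a, 10);                       // a itself now holds x = 10
        System.out.println(a.x);

        InlinePoint b = new InlinePoint(1, 2);
        InlinePoint c = withX(b, 10);      // b is unchanged, c is the new value
        System.out.println(getX(b) + " " + getX(c));
    }
}
```

The point of the contrast is that putX can change what an existing reference observes, while withX cannot.
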
Now, the JVM doesn’t fully trust final fields even today (because they change state at most once from default to some other value), but if maliciously spun classfiles were able to perform putfield at will on fully constructed objects, it might be possible to create paradoxes that could lead to unpredictable behavior. For this reason, not only doesn’t the JVM fully trust final fields, but it also forbids classes from executing putfield on their own final fields, except inside of constructors.

In essence, putfield on a final field is a special restricted operating mode of putfield which has unusually tight restrictions on its execution. In this note I’d like to call it out with a special name, putfield-on-a-final.

Note that the JVM does *not* fully enforce the Java source language rules for field initialization: At the JVM level, a constructor can run putfield-on-a-final, on some given field, zero, one, or many times, where the Java language requires at most one, and exactly one on normal exits. The JVM simply provides a reasonable backstop check, preventing certain failure modes due either to javac bugs or (what’s more sinister) intentionally broken class files.

The main responsibility for ensuring the integrity of some class C lies, and always will lie, with C’s compilation unit C.java, as faithfully compiled by javac into a nest of classes containing at least C.class and maybe other nestmates.

This is an important point to back up and take notice of: While the JVM can perform some basic checks to help some class C maintain its encapsulation boundary, the meaning of the encapsulation, and the restrictions and/or freedoms within that boundary, are the sole responsibility of the programmer of C.java.
If I, the author of C, am claiming that, of two fields, one is always non-null, then it is up to me to enforce that rule in all states of my class, including constructors (start states) and any methods which can create new states (whether constructors or regular methods).

A working hypothesis on our project so far has been that withfield is so much like putfield, and inline instance fields are so much like final identity instance fields, that parallel restrictions are appropriate for the two instructions. Penciling this out, we would get to a place where a class C can only issue putfield or withfield instructions inside its own constructors.

This is a consistent view, but I do not believe that it is the best view, and I’d like to decouple withfield from putfield-on-a-final to be more like plain old putfield, in some ways. My aim here is to keep withfield alive as a tool for likely future translation strategies (including of non-Java languages), which exposes, not the currently envisioned uses of withfield in Java constructors, but its natural set of capabilities in the JVM.

What is the natural set of capabilities of withfield? It is more basic and fundamental than putfield-on-a-final, and at the same time does *more* than putfield-on-a-final. Note that putfield-on-a-final is just one operation out of a suite of required operations in a constructor of a class (since you need a putfield-on-a-final for each of the class’s final fields, according to Java rules). Note on the other hand that withfield has the same effect as running a constructor which copies out all the old fields from the old instance and writes the new value into the selected field, then returns the new instance. Seen from this point of view, withfield is both simpler and more powerful than putfield-on-a-final, and does not fit at all into an easy analogy.
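
As a small illustration of the "one of two fields is always non-null" example, here is a sketch in today's Java. The class Contact, its fields, and its methods are hypothetical. Each wither does what the text attributes to withfield (copy the old fields, replace one, return the new instance), but routes through the constructor so that the author's own check runs on every new state. A raw withfield would perform only the copy-and-replace step; the check remains the author's job.

```java
// Hypothetical class whose author promises: email and phone are never both null.
final class Contact {
    private final String email;
    private final String phone;

    Contact(String email, String phone) {
        // The author, not the JVM, enforces the invariant on every new state.
        if (email == null && phone == null) {
            throw new IllegalArgumentException("email and phone cannot both be null");
        }
        this.email = email;
        this.phone = phone;
    }

    String email() { return email; }
    String phone() { return phone; }

    // Models withfield on 'email': copy the other field, replace this one.
    Contact withEmail(String newEmail) { return new Contact(newEmail, phone); }

    // Models withfield on 'phone': copy the other field, replace this one.
    Contact withPhone(String newPhone) { return new Contact(email, newPhone); }

    public static void main(String[] args) {
        Contact c = new Contact("a@example.com", null);
        Contact d = c.withPhone("555-0100");   // c is unchanged; d has both fields
        System.out.println(d.email() + " / " + d.phone());
        try {
            c.withEmail(null);                 // would leave both fields null
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```
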
The withfield instruction is also inherently more secure than putfield-on-a-final, because its design does not allow it to invalidate any pre-existing instance; it can only ever create a new instance. The set of security failure modes for withfield is completely different from that of putfield-on-a-final. This means that there is no particular reason to restrict withfield to execute only in constructors.

What about creating an *invalid* new instance? Well, that’s where the JVM says, “it’s not my responsibility”. As noted above, the sole responsibility for defining and enforcing the invariants of an encapsulation rests with the human author of the original source file. The JVM protects this encapsulation, not by reading the user’s mind, but by enforcing boundaries, primarily the boundary around the nest of classes that result from the compilation of C.java. Within the nest, any type can access any private member of any nestmate. Outside the nest, private members are strictly inaccessible. (This strict rule can be bent by special reflection modes, and by nestmate injection, but it can’t be broken.)

Under this theory, the withfield instruction is the elemental factory mechanism for creating new instances of inline classes. The coder of the source file defining the field has full control to create new instances with arbitrary field settings. In the current language, this still goes only through user-written constructors, but that could change. In any case, the JVM design needs to support the language and *also* the natural abilities of the JVM.

This leads me to what I think is the right design for withfield. The permission to execute withfield should be derived, not from its placement within a constructor, but rather from its placement in a nest. In effect, when you execute withfield, you should get access checked as if the field you were referring to is private, even if it has some other marking (public, protected, package). That other marking is good and useful, but it pertains only to getfield.
This doesn’t call for any change to today’s translation strategies, but it unlocks the JVM’s natural abilities for future strategies and features.

Why make the change? After all, restricting withfield like putfield-on-a-final doesn’t hurt anything today. Suppose some language feature in the future requires ad hoc field replacement. (I call one version of such a feature “reconstructors”, and another “with-expressions”.) In that case, javac can contrive synthetic constructors which isolate all required withfield instructions, so that the putfield-on-a-final constraints can be satisfied. But there’s a cost to this: Those synthetic constructors become extra noise in the classfile, and if they are opened outside the nest, they can be security hazards. Another cost is the loss of dynamicity: You can’t inject a hidden class to work on your inline class if the hidden class can only define its own constructors, right?

But I think we have learned some lessons about fancy compile-time adapters: They are complex, they obscure the code for the JIT, they can open up surprise encapsulation flaws, they cannot be assigned dynamically. The nestmate work improves all of these problems, by uniformly defining private access to apply equally to all members of a nest, not just to a single class. Although the nestmate access rules themselves are more complex than the original JVM rules for private access, the overall system is better because we can rip out the various synthetic bridges we used to require. The overall model for “what does private mean?” is simpler, not more complex: “private means all nestmates are equal”. On balance this helps security by simplifying the model, so that bridge methods can be dropped.

I want to keep the model simple, and not introduce (today) a new kind of access control just for the withfield instruction, nor do I want it to mimic the baroque and complex access control for putfield-on-a-final.
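
The nest-based rule can be approximated in current Java. This is a sketch of the idea, not the JVM's actual resolution logic. The types Complex, Complex.Ops, Outsider, and NestCheck are hypothetical; Class.isNestmateOf is the standard reflection method available since Java 11. The fields are public, so anyone can read them (the getfield side), while the wither is private, so only nestmates can produce modified copies (the proposed withfield side).

```java
// Hypothetical stand-in for an inline class with publicly readable fields.
final class Complex {
    public final double re, im;            // readable by anyone (getfield)

    private Complex(double re, double im) { this.re = re; this.im = im; }

    static Complex of(double re, double im) { return new Complex(re, im); }

    // Private wither: models "withfield is checked as if the field were private".
    private Complex withIm(double newIm) { return new Complex(re, newIm); }

    // A nestmate of Complex: may call the private wither directly,
    // with no synthetic bridge or adapter in between.
    static final class Ops {
        static Complex conjugate(Complex c) { return c.withIm(-c.im); }
    }
}

// A class outside the nest: it can read c.re and c.im but cannot call withIm.
final class Outsider { }

final class NestCheck {
    // Assumed model of the proposed rule: permission follows nest membership.
    static boolean mayWithfield(Class<?> caller, Class<?> fieldHolder) {
        return caller.isNestmateOf(fieldHolder);
    }

    public static void main(String[] args) {
        System.out.println(mayWithfield(Complex.Ops.class, Complex.class));
        System.out.println(mayWithfield(Outsider.class, Complex.class));
        Complex z = Complex.Ops.conjugate(Complex.of(1.0, 2.0));
        System.out.println(z.re + " " + z.im);
    }
}
```

Under this model the access declaration on the field governs reads only, and the question of who may create modified copies reduces to a single nest-membership test.
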
To summarize: The simplest rule for access checking a withfield instruction is to say, “pretend the field was declared private, and perform access checks”. That’s it; the rest follows from the rules we have already laid down.

Thus, the security analysis of a class can concentrate on the access declarations of its fields. There will be no pressure to generate adapter methods regardless of where the language goes. Other languages can use the natural semantics of withfield to create and enforce their own notions of encapsulation. And future versions of Java can use indy, condy, hidden classes, and whatever else to create flexible methods, on the fly, that work with inline classes.

There are two anchors to my argument here. One is that the access control of putfield-on-a-final is a bad model to replicate for a new instruction. The other is that we shouldn’t limit ourselves to the current uses of withfield (as a surrogate for putfield-on-a-final). Let’s design for the future, or at least for the natural capabilities of the JVM, not for the exact output of today’s translation strategies.

-- John
