> On Jul 20, 2020, at 10:27 AM, Brian Goetz <brian.go...@oracle.com> wrote:
> 
> That said, doing so in the language is potentially more viable.  It would 
> mean, for classes that opt into this treatment:
> 
>  - Ensuring that `C.default` evaluates to the right thing
>  - Preventing `this` from escaping the constructor (which might be a good 
> thing to enforce for inline classes anyway)
>  - Ensuring all fields are DA (which we do already), and that assignments to 
> fields in ctors are not their default value 
>  - Translating `new Foo[n]` (and reflective equivalent) with something that 
> initializes the array elements
> 
> The goal is to keep default instances from being observed.  If we lock down 
> `this` from constructors, the major cost here is instantiating arrays of 
> these things, but we already optimize array initialization loops like this 
> pretty well.  
> 
> Overall this doesn't seem terrible.  It means that the cost of this is borne 
> by the users of classes that opt into this treatment, and keeps the 
> complexity out of the VM.  It does mean that "attackers" can generate 
> bytecode to generate bad instances (a problem we have with multiple vectors 
> today.)  
> 
> Call this "L".  

More letters!

Expanding on ways to support Bucket #3 by ensuring initialization of 
fields/arrays:

---

Option L: Language requires field/array initialization

An inline class may be declared to have no default. Fields and arrays of that 
class's inline type must be provably initialized (via compiler analysis) before 
they are read or published.

Instance fields of the class's inline type must be initialized before a method 
call involving 'this' occurs. (It's already illegal to allow the constructor to 
return before initialization.)
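
To illustrate why the rule about 'this' is needed, here is a small sketch using 
an ordinary class rather than an inline class (the names Point and describe are 
made up for this example). Today's definite assignment rules accept it, yet the 
method call on 'this' observes the field's default value:

```java
// Ordinary class standing in for an inline class; all names are hypothetical.
class Point {
    final int x;
    final String seenDuringConstruction;

    Point(int x) {
        // Method call involving 'this' before x has been assigned:
        // describe() reads the default value of x (0).
        seenDuringConstruction = describe();
        this.x = x;
    }

    String describe() {
        return "x=" + x;
    }
}
```

Under Option L, a call like this would presumably be rejected whenever the 
unassigned field has a default-less inline type.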

Static fields seem hopeless, so maybe they must have a reference type (perhaps 
implicitly). Maybe we can do an analysis that permits some very simple cases, 
but once you allow method calls of almost any sort, you've lost. (We'd have to 
prove that no initialization of *other* classes triggered by <clinit> refers to 
the field before it has been initialized.)
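
A sketch of the problem, again with ordinary reference types and made-up names 
(Holder, Observer): initializing Holder triggers initialization of Observer, 
which reads Holder.VALUE before Holder's <clinit> has assigned it. This assumes 
Holder is the first of the two classes to be touched.

```java
// Hypothetical classes; a reference-typed field makes the default (null) visible.
class Observer {
    // Runs while Holder.<clinit> is still in progress on the same thread,
    // so Holder.VALUE has not been assigned yet.
    static final String SEEN = "saw " + Holder.VALUE;
}

class Holder {
    // Reading Observer.SEEN triggers Observer's initialization.
    static final String EARLY = Observer.SEEN;

    // Not a compile-time constant, so reads are not inlined by the compiler.
    static final String VALUE = new String("initialized");
}
```

If Holder is initialized first, Holder.EARLY ends up as "saw null". With a 
default-less inline type in place of String, that read would expose exactly 
the instance we are trying to hide.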

Arrays must be initialized at creation time, either with an array initializer 
("Address[] as = { x, y, z };") or via a trusted API ("Address[] as = 
Arrays.of(i -> x);"). We might introduce a language sugar for the trusted API 
("Address[] as = { i -> x };"). We *could* support two-stage initialization via 
things like 'Arrays.fill', but analysis to track uninitialized arrays from 
creation to filling doesn't seem worthwhile.
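
For concreteness, a trusted factory along the lines of "Arrays.of(i -> x)" 
might look roughly like the sketch below. No such method exists in 
java.util.Arrays today; InitializedArrays, its 'of' method, and the extra 
length and allocator parameters are assumptions made so the example runs on 
current Java. A null check stands in for "no default instance".

```java
import java.util.Objects;
import java.util.function.IntFunction;

// Hypothetical trusted API: the array is never published until every
// component has been assigned a non-default (here: non-null) value.
final class InitializedArrays {
    private InitializedArrays() {}

    static <T> T[] of(int length,
                      IntFunction<T[]> allocator,
                      IntFunction<? extends T> init) {
        T[] array = allocator.apply(length);
        for (int i = 0; i < length; i++) {
            // Reject the default so it can never be observed through the array.
            array[i] = Objects.requireNonNull(init.apply(i), "component " + i);
        }
        return array;
    }
}
```

Usage would be something like 
"String[] as = InitializedArrays.of(3, String[]::new, i -> "addr" + i);". 
Note that the initializer function itself never sees the array, which is what 
lets us skip any tracking of partially filled arrays.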

This is less expressive, obviously. In particular, many comfortable idioms for 
initializing an array won't work. As a case study: what happens in generic code 
like ArrayList? When it wants to allocate its array (we're in a specialized 
world where T has been specialized to 'QAddress;'), what value does it fill the 
array with? Nothing is available, because at this point the list is empty, and 
it's just allocating storage for later. I guess ArrayList (and similar data 
structures) has to have a special back door, and we're left to trust the author 
not to expose the uninitialized payload.
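
A stripped-down sketch of that pattern, using a made-up TinyList class with 
Object[] storage as a stand-in for a specialized array: the backing array is 
allocated before any element exists, so its unused slots necessarily hold the 
default, and only the bounds check in 'get' keeps them from being observed.

```java
import java.util.Arrays;
import java.util.Objects;

// Hypothetical minimal list; not the real java.util.ArrayList implementation.
final class TinyList<T> {
    // Allocated up front: every slot holds the default (null) until used.
    private Object[] storage = new Object[4];
    private int size;

    void add(T value) {
        if (size == storage.length) {
            // Growing creates yet more default-filled slots.
            storage = Arrays.copyOf(storage, size * 2);
        }
        storage[size++] = value;
    }

    @SuppressWarnings("unchecked")
    T get(int index) {
        // The only thing hiding the uninitialized slots is this check.
        Objects.checkIndex(index, size);
        return (T) storage[index];
    }

    int size() {
        return size;
    }
}
```

Nothing in the language can verify that 'get' (or any other method) respects 
'size'; that is the trust we would be placing in the author.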

As with all language features, there's also the question of what happens when a 
class file doesn't conform to the language's rules. Option L can't really stand 
alone—it needs to be backed up by some other option when the language's 
guarantees fail.

---

Option M: JVM requires field/array initialization

Inline class files can indicate that their default instance is invalid. Fields 
and arrays of that class's inline type must be provably initialized (via 
verification or related analysis) before they are read or published.

All the compile-time analysis of Option L applies here, because the language 
compiler needs to be sure its generated class files are valid.

We can use some new verification types to track the initialization status of 
'this', the way we do to require 'super' calls today. You don't have a fully 
formed 'Foo', capable of being passed to other methods, etc., until all fields 
are initialized. This would also apply to 'defaultvalue' for an inline class 
with a field of a default-less inline type.

Again, static fields are hopeless: it's an error to use the inline type as a 
static field type.

'anewarray' of the inline type is illegal, except within a trusted API. That 
API promises to initialize every array component before publishing the array. 
(We won't try to guarantee this with an analysis—the API is trusted because it 
has been vetted by humans.) In addition to some standard factory methods, we 
could decide that the inline class itself is always a trusted API.

(A related approach was discussed at our last EG meeting, but with much less 
expressiveness: inline-typed fields are always illegal, and arrays can only be 
allocated by the class author.)

This closes the backdoor of other bytecode not playing by the language's rules. 
The expressiveness problems of Option L remain—e.g., ArrayList's early 
allocation strategy is impossible.
