Sorry for quietness of late. Some new thoughts.
- Default behaviors of language features should be based *first* on bug-proof-ness; if a user has to opt into safety that means they were not safe.
- `null` and nullable types are a very good thing for safety; NPE protects us from more nasty bugs than we can imagine.
- A world where *all* user-defined primitive classes must be nullable (following Brian's outline) is completely *sane*, just not optimized.
- (We'd like to still be able to fashion a *non-nullable type* when the class itself allows nullability, but this is a general need we already have for ref types and shouldn't have much bearing here. Still working hard on jspecify.org...)
- It's awkward that `Instant` would have to add a `boolean valid = true` field, but it's not inappropriate. It has the misfortune that it both can't restrict its range of values *and* has no logical zero/default.
- A type that does have a restricted range of legal values, but where that range includes the `.default` value, might do some very ugly tricks to avoid adding that boolean field; not sure what to think about this.
- Among all the use cases for primitive classes, the ones where the default value is non-degenerate and expected are the special cases! We use `Complex` as a go-to example, but if most of what we did with complex numbers was divide them by each other then even this would be dubious. We'd be letting an invalid value masquerade as a valid one when we'd rather it just manifest as `null` and be subject to NPEs.
- If we don't do something like Brian describes here, then I suppose second-best is that we make a *lot* of these things ref-default (beginning with Instant and not stopping there!) and warn about the dangers of `.val`.

tl;dr nullable by default!

Would be glad to hear what I'm missing or not understanding right.

On Wed, Mar 17, 2021 at 8:14 AM Brian Goetz <brian.go...@oracle.com> wrote:

> Let me propose another strategy for Bucket 3.
> It could be implemented at
> either the VM or language level, but the latter probably needs some help
> from the VM anyway. The idea is that the default value is
> _indistinguishable from null_. Strawman:
>
> - Classes can be marked as default-hostile (e.g., `primitive class X
>   implements NoGoodDefault`);
> - Prior to dereferencing a default-hostile class, a check is made against
>   the default value, and an NPE is thrown if it is the default value;
> - When widening to a reference type, a check is made if it is the default
>   value, and if so, is converted to null;
> - When narrowing from a reference type, a check is made for null, and if
>   so, converted to the default value;
> - It is allowable to compare `x == null`, which is interpreted as "widen x
>   to X.ref, and compare";
> - (optional) the interface NoGoodDefault could have a method that
>   optimizes the check, such as by using a pivot field, or the language/VM
>   could try to automatically pick a pivot field.
>
> Classes which opt for NoGoodDefault will be slower than those that do not
> due to the check, but they will flatten. Essentially, this lets authors
> choose between "zero means default" and "zero means null", at some cost.
>
> A risk here is that ignorant users who don't understand the tradeoffs will
> say "oh, great, there's my nullable primitive types", overuse them, and
> then say "primitive types are slow, java sucks." The goal here would be to
> provide _safety_ for primitive types for which the default is dangerous.
>
>
> On 3/15/2021 11:52 AM, Brian Goetz wrote:
>
> Picking this issue up again. To summarize Dan's buckets:
>
> Bucket 1 -- the zero default is in the domain, and is a sensible default
> value. Zero for numerics, empty optionals.
>
> Bucket 2 -- there is a sensible default value, but all-zero-bits isn't
> it.
>
> Bucket 3 -- there simply is no sensible default value.
>
>
> Ultimately, though, this is not about defaults; it is about _uninitialized
> variables_.
> The default only comes into play when the user uses an
> uninitialized variable, which usually means (a) uninitialized fields or (b)
> uninitialized array elements. It is possible that the language could give
> us seat belts to dramatically narrow the chance of uninitialized fields,
> but uninitialized array elements are much harder to stamp out.
>
> It is an attractive distraction to get caught up in designing mechanisms
> for supplying an alternate default ("just let the user declare a no-arg
> constructor"), but this is focusing on the "writing code" part of the
> problem, not the "keeping code safe" part of the problem.
>
> In some sense, it is the existence (and size) of Bucket 1 that causes the
> problem; Bucket 1 is what gives us our sense that it is safe to use
> uninitialized variables. In the current language, uninitialized reference
> variables are also safe in that if you use them before they are
> initialized, you get an exception before anything bad can happen.
> Uninitialized primitives in today's language are more dangerous, because we
> may interpret the uninitialized value, but this has been a problem we've
> been able to live with because today's primitives are pretty limited and
> zero is usually a good-enough default in most domains. As we extend
> primitives to look more like objects, with behavior, this gets harder.
>
>
> Both buckets 2 and 3 can be remediated without help from the language or
> VM, perhaps inconveniently, by careful coding on the part of the author of
> the primitive class:
>
> - don't expose fields to users (a good practice anyway)
> - check for zero on entry to each method
>
> These are options A and E. The difference between Buckets 2 (A) and 3 (E)
> in this model is what do we do when we find a zero; for bucket 2, we
> substitute some pre-baked value and use that, and for bucket 3, we throw
> something (what we throw is a separate discussion.)
> The various
> remediation techniques Dan offers represent a menu which allows us to
> trade off reliability/cost/intrusiveness.
>
> I think we should lean on the model currently implemented by reference
> types, where _accessing_ an uninitialized field is OK, but _using_ the
> value in the field is not. If we have:
>
>     String s;
>
> All of the following are fine:
>
>     String t = s;
>     if (s == null) { ... }
>     if (s == t) { ... }
>
> The thing that is not fine is s-dot-something. These are the E/F/G
> options, not the H/I options.
>
> Secondarily, H/I, which attempt to hide the default, create another
> problem down the road: when we get to specialized generics, `T.default`
> would become partial.
>
> Some of the solutions for Bucket 3 generalize well enough to Bucket 2 that
> we might consider merging them (though there are still messy details).
> Option F, for example, injects code at the top of each method body:
>
>     int m() {
>         if (this == <zero-value>)
>             throw new NullPointerException();
>         /* body of m */
>     }
>
> into the top of each method; a corresponding feature for Bucket 2 might
> inject slightly different code:
>
>     int m() {
>         if (this == <zero-value>)
>             return <better-default>.m();
>         /* body of m */
>     }
>
>
> Another thing that has evolved since we started this discussion is
> recognizing the difference between .val and .ref projections. Imagine you
> could declare your membership in bucket 3:
>
>     __bucket_3 primitive class NGD { ... }
>
> If, in addition to some way of generating an NPE on dereference (F, G,
> etc), we mucked with the conversion of NGD.val to NGD.ref (which the
> compiler can inject code on), we could actually put a null on top of the
> stack. Then, code like:
>
>     if (ngd == null) { ... }
>
> would actually work, because to do the comparison, we'd first promote ngd
> to a reference type (null is already a reference), and we'd compare two
> nulls.
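
A rough sketch of the "guard on use, null on widening" idea above, emulated in plain Java because primitive-class syntax is not available in released JDKs. Everything here is hypothetical: the class `Ngd`, its `ZERO` stand-in for the all-zero default, and the `widen`/`narrow` helpers, which only imitate conversions that the compiler or VM would inject.

```java
// Hypothetical emulation: a plain final class stands in for a
// "default-hostile" primitive class. None of these names come from the JDK.
final class Ngd {
    // Stand-in for the all-zero default instance the VM would produce.
    static final Ngd ZERO = new Ngd(0L);

    private final long handle; // assumption: 0 is never a legal handle

    private Ngd(long handle) {
        this.handle = handle;
    }

    static Ngd of(long handle) {
        if (handle == 0L) {
            throw new IllegalArgumentException("handle must be nonzero");
        }
        return new Ngd(handle);
    }

    private boolean isZero() {
        return handle == 0L;
    }

    // Guard in the style of Option F: using the zero value fails like null would.
    long handle() {
        if (isZero()) {
            throw new NullPointerException("default Ngd dereferenced");
        }
        return handle;
    }

    // Imitates val-to-ref widening: the zero value surfaces as null.
    static Ngd widen(Ngd value) {
        return value.isZero() ? null : value;
    }

    // Imitates ref-to-val narrowing: null becomes the zero value again.
    static Ngd narrow(Ngd ref) {
        return ref == null ? ZERO : ref;
    }
}
```

In this emulation, copying or comparing `Ngd.ZERO` is harmless, `Ngd.widen(Ngd.ZERO) == null` holds, and only a member access such as `Ngd.ZERO.handle()` throws.
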
>
>
>
> On 7/10/2020 2:23 PM, Dan Smith wrote:
>
> Brian pointed out that my list of candidate inline classes in the Identity
> Warnings JEP (JDK-8249100) includes a number of classes that, despite being
> "value-based classes" and disavowing their identity, might not end up as
> inline classes. The problem? Default values.
>
> This might be a good time to revisit the open design issues surrounding
> default values and see if we can make some progress.
>
> Background/status quo: every inline class has a default instance, which
> provides the initial value of fields and array components that have the
> inline type (e.g., in 'new Point[10]'). It's also the prototype instance used
> to create all other instances (start with 'vdefault', then apply 'withfield'
> as needed). The default value is, by fiat, the class instance produced by
> setting all fields to *their* default values. Often, but not always, this
> means field/array initialization amounts to setting all the bits to 0.
> Importantly, no user code is involved in creating a default instance.
>
> Real code is always useful for grounding design discussions, so let's start
> there. Among the classes I listed as inline class candidates, we can put them
> in three buckets:
>
> Bucket #1: Have a reasonable default, as declared.
> - wrapper classes (the primitive zeros)
> - Optional & friends (empty)
> - From java.time: Instant (start of 1970-01-01), LocalTime (midnight),
>   Duration (0s), Period (0d), Year (1 BC, if that's acceptable)
>
> Bucket #2: Could have a reasonable default after re-interpreting fields.
> - From java.time: LocalDate, YearMonth, MonthDay, LocalDateTime,
>   ZonedDateTime, OffsetTime, OffsetDateTime, ZoneOffset, ZoneRegion,
>   MinguoDate, HijrahDate, JapaneseDate, ThaiBuddhistDate (months and days
>   should be nonzero; null Strings, ZoneIds, HijrahChronologies, and
>   JapaneseEras require special handling)
> - ListN, SetN, MapN (null array interpreted as empty)
>
> Bucket #3: No good default.
> - Runtime.Version (need a non-null List<Integer>)
> - ProcessHandleImpl (need a valid process ID)
> - List12, Set12, Map1 (need a non-null value)
> - All ConstantDesc implementations (need real class & method names, etc.)
>
> There's some subjectivity between the 2nd and 3rd buckets, but the idea
> behind the 2nd is that, with some translation layer between physical fields
> and interpretation of those fields, we can come up with an intuitive default
> (e.g., "0 means January"; "a null String means time zone 'UTC'"). In
> contrast, in the third bucket, any attempt to define a default value is going
> to be pretty unintuitive ("A null method name means 'toString'").
>
> The question here is how much work the JVM and language are willing to do, or
> how much work we're willing to ask clients to do, in order to support use
> cases that don't fall into Bucket #1.
>
> I don't think totally excluding Buckets #2 and #3 is a very good outcome. It
> means that, in many cases, inline classes need to be built up exclusively
> from primitives or other inline types, because if you use reference types,
> your default value will have a null field. (Sometimes, as in Optional, null
> fields have straightforward interpretations, but most of the time programs
> are designed to prevent them.)
>
> Whether we support Bucket #2 but not Bucket #3 is a harder question. It
> wouldn't be so bad if none of the examples above in Bucket #3 become inline
> classes—for the most part they're handled via interfaces, anyway.
> (Counterpoint: inline class instances that are immediately typed with
> interface types still potentially provide a performance boost.) But I'm also
> not sure this is representative. We've noted before that many use cases, like
> database records or data structure cursors, don't have meaningful defaults
> (what's a default mailing address?). The ConstantDesc classes really
> illustrate this, even though they happen to not be public.
>
> Another observation is that if we support Bucket #3 but not Bucket #2, that's
> probably not a big deal—I'm not sure anybody really *wants* to deal with the
> default instance; it's just the price you pay for being an inline class. If
> there's a way to opt out of that extra weirdness and move from Bucket #2 to
> Bucket #3, great.
>
> With that discussion in mind, here are some summaries of approaches we've
> considered, or that I think we ought to consider, for supporting buckets #2
> and #3. (This is as best as I recall. If there's something I've missed, add
> it to the list!)
>
> [Weighing in for myself: my current preference is to do one of F, G, or I.
> I'm not that interested in supporting Bucket #2, for reasons given above,
> although Option A works for programmers who really want it.]
>
>
>
> === Solutions to support Bucket #2 ===
>
> Two broad strategies here: re-interpreting fields (A, B), and re-interpreting
> the default instance (C, D).
>
> ---
>
> Option A: Encourage programmers to re-interpret fields
>
> Guidance to programmers: when you declare an inline class, identify any
> fields for which the default instance should hold something other than
> zero/null; define a mapping for your implementation from zero/null to the
> value you want.
>
> One way to do this is to define a (possibly private) getter for each field,
> and include logic like 'return month + 1' or 'return id == null ? "UTC" :
> id'. Or maybe you inline that logic, as long as you're careful to do so
> everywhere. Importantly, you also need to reverse the logic in your
> constructor—for the sake of '==', if somebody manually creates the default
> instance, you should set fields to zero/null.
>
> This doesn't work if you want public fields, but that's life as an OO
> programmer.
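
For concreteness, here is roughly what the Option A mapping might look like, written as an ordinary final class since inline-class syntax is not available in released Java. The class `MonthInZone`, its fields, and `hasAllZeroFields` are made up for illustration; `equals` compares the raw fields to approximate what '==' would do on an inline class.

```java
import java.util.Objects;

// Hypothetical example of Option A; a plain final class stands in for an
// inline class, and every name here is illustrative only.
final class MonthInZone {
    // Physical representation, chosen so all-zero/null means "January, UTC".
    private final int monthIndex; // 0 = January ... 11 = December
    private final String zoneId;  // null = "UTC"

    MonthInZone(int month, String zone) {
        if (month < 1 || month > 12) {
            throw new IllegalArgumentException("month out of range: " + month);
        }
        Objects.requireNonNull(zone, "zone");
        // Reverse the mapping so a hand-built "January, UTC" has the same
        // physical fields as the all-zero default instance would.
        this.monthIndex = month - 1;
        this.zoneId = zone.equals("UTC") ? null : zone;
    }

    // Getters apply the zero/null-to-logical-value mapping.
    int month() {
        return monthIndex + 1;
    }

    String zone() {
        return zoneId == null ? "UTC" : zoneId;
    }

    // Exposed only so the physical representation can be inspected here.
    boolean hasAllZeroFields() {
        return monthIndex == 0 && zoneId == null;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof MonthInZone)) {
            return false;
        }
        MonthInZone other = (MonthInZone) o;
        return monthIndex == other.monthIndex && Objects.equals(zoneId, other.zoneId);
    }

    @Override
    public int hashCode() {
        return Objects.hash(monthIndex, zoneId);
    }
}
```

Here `new MonthInZone(1, "UTC")` ends up with all-zero fields while still reporting month 1 and zone "UTC". Forgetting the mapping in any one accessor is exactly the error-prone part.
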
>
> In this approach, it would be important that inline classes be expected to
> document their default instance in Javadoc (perhaps with a new Javadoc
> tag)—the interpretation of the default instance is less apparent to users
> than "all zeros".
>
> Limitations:
>
> - It's a fairly error-prone approach. Programmers will absolutely forget to
> apply the mapping in one place, and everything will be fine until somebody
> tries to invoke a particular method on the default instance. Put that bug in
> a security-sensitive context, and maybe you have an exploit. (Something that
> could help some is choosing good names—call your field 'monthIndex', not
> plain 'month', to remind yourself that it's zero-based.)
>
> - Performance impact of an extra layer of computation on all field accesses.
> Probably not a big deal in general, but all those null checks, etc., could
> have a negative impact in certain contexts. And the *appearance* of extra
> cost might scare programmers away from doing the right thing ("eh, I probably
> won't use the default value anyway, I'll just ignore it to make my code
> faster").
>
> ---
>
> Option B: Language support for field re-interpretation
>
> The language allows inline classes to declare fields with mappings to/from an
> internal representation. Just like Option A, but with guarantees that the
> internal representation isn't inappropriately accessed directly.
>
> This pulls on a thread we explored a bit for Amber a while back, some form of
> "abstract fields" or "virtual fields". Maybe there's something there, but it
> seems like a general-purpose feature, and one we're not likely to reach a
> final solution on anytime soon.
>
> ---
>
> Option C: Language support for a designated default
>
> The language provides some way for programmers to declare the "logical"
> default instance (something like a special static field).
> The compiler
> inserts a test for the "physical" default on any field/array access, and
> replaces it with the logical default.
>
> That is:
>
>     Point p = points[3];
>
> compiles to
>
>     Point p$0 = points[3];
>     Point p = (p$0 == [vdefault Point]) ? Point.DEFAULT : p$0;
>
> This is much less bug-prone than Option A—the compiler does all the work—and
> much more achievable in the short/medium term than Option B.
>
> Compared to Option B, this pushes the computation overhead from inline class
> field accesses to reads of the inline type from fields/arrays. I don't know
> if that's good or bad—maybe a wash, heavily dependent on the use case.
>
> A few big problems:
>
> - The physical default still exists, and malicious bytecode can use it. If
> programmers want strong guarantees, they'll have to check and throw wherever
> an untrusted instance is provided. (Clients with access to the inline class's
> fields have to do so, too.)
>
> - Covariant arrays mean every read from any array type that might be
> flattened (Object[], Runnable[], ConstantDesc[], ...) has to go through
> translation logic.
>
> - There's an assumption here that the programmer doesn't intend to use the
> physical default as a valid non-default instance. That's hard for the
> compiler to enforce, and weird stuff happens in fields/arrays if the
> programmer doesn't prevent it. (Could be mitigated with extra implicit logic
> on field/array writes or in constructors.)
>
> ---
>
> Option D: JVM support for a designated default
>
> The VM allows inline classes to designate a logical default instance, and the
> field/array access instructions map from the physical default to the logical
> default. The 'vdefault' instruction produces the logical default instance;
> something else is used by the class's factories to build from the physical
> default.
>
> This addresses the first two problems with Option C—the VM gives strong
> guarantees, and can make the translation a virtual operation of certain
> arrays.
>
> To address the third problem, it seems like we'd need the more complex logic
> I hinted at: on writes, map the physical default to the logical default, and
> map the logical default to the physical default. Do the reverse on reads.
>
> The problem here is bytecode complexity/slowdowns. We've already added some
> complexity to 'aaload'/'aastore' (covariant flattened arrays), and anticipate
> similar changes to 'putfield'/'getfield' (specialized fields), so maybe that
> means we might as well do more. Or maybe it means we're already over budget.
> :-)
>
> From the users' perspective, if any performance reduction on reads/writes can
> be limited to the inline classes in Bucket #2, *all* the options have a
> similar cost, whether imposed by the programmer, language, or VM. So, to a
> first approximation, slower opcode execution is fine.
>
>
>
> === Solutions to support Bucket #3 ===
>
> Two broad strategies here: rejecting member accesses on the default instance
> (E, F, G), and preventing programs from ever seeing the default instance (H,
> I).
>
> ---
>
> Option E: Encourage programmers to guard against default instances
>
> Guidance to programmers: if you don't like your class's default instance,
> check for it in your methods and throw. Maybe Java SE defines a new
> RuntimeException to encourage this.
>
> The simple way to do this is with some boilerplate at the start of all your
> methods:
>
>     if (this == MyClass.default) throw new InvalidDefaultException();
>
> More permissive classes could just do some validation on the fields that are
> relevant to a particular operation. (E.g., 'getMonth' doesn't care if
> 'zoneId' is null.)
>
> This doesn't work if you want public fields, but that's life as an OO
> programmer.
>
> It's not ideal that an invalid instance can float around a program until
> somebody trips on one of these checks, rather than detecting the invalid
> value earlier—we're propagating the NPE problem.
> And it takes some getting
> used to that there are two null-like values in the reference type's domain.
>
> ---
>
> Option F: Language support for default instance guards
>
> An inline class declaration can indicate that the default instance is
> invalid. The compiler generates guards, as in Option E, at the start of all
> instance method bodies, and perhaps on all field accesses outside of those
> methods.
>
> Programmers give up finer-grained control, but get more safety. I'm sure most
> would be happy with that trade.
>
> Improper/separately-compiled bytecode can skip the field access checks, but
> that's a minor concern.
>
> Same issues as Option E regarding adding a "new NPE" to the platform.
>
> ---
>
> Option G: JVM support for default instance guards
>
> Inline class files can indicate that their default instance is invalid. All
> attempts to operate on that instance (via field/method accesses, other than
> 'withfield') result in an exception.
>
> This tightens up Option F, making it just as impossible to access members of
> the default instance as it is to access members of 'null'.
>
> Same issues as Option E regarding adding a "new NPE" to the platform.
>
> ---
>
> Option H: Language checks on field/array reads
>
> An inline class declaration can indicate that the default instance is
> invalid. Every field and array access that may involve an uninitialized
> field/array component of that inline type gets augmented with a check that
> rejects reads of the default value (treating it as "you forgot to initialize
> this variable").
>
> That is:
>
>     Point p = points[3];
>
> compiles to
>
>     Point p$0 = points[3];
>     if (p$0 == [vdefault Point]) throw new UninitializedVariableException();
>     Point p = p$0;
>
> This is much like Option C, and has roughly the same advantages/problems.
> There's not a strong guarantee that the default value won't pop up from
> untrusted bytecode (or unreliable inline class authors), and lots of array
> types need guards.
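
A minimal emulation of the Option H read check in today's Java, assuming a flattened array can be imitated by pre-filling a reference array with a default instance. `CheckedReads`, `Point`, `DEFAULT`, and `sameAs` are hypothetical, and `IllegalStateException` stands in for the `UninitializedVariableException` named above, which does not exist in the JDK.

```java
import java.util.Arrays;

// Hypothetical emulation of a compiler-inserted read check; all names are
// illustrative stand-ins, not part of any JDK or Valhalla API.
final class CheckedReads {
    static final class Point {
        final int x;
        final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }

        // Field-wise comparison, approximating '==' on an inline class.
        boolean sameAs(Point other) {
            return x == other.x && y == other.y;
        }
    }

    // Stand-in for [vdefault Point]: every field at its zero value.
    static final Point DEFAULT = new Point(0, 0);

    // Emulates a flattened array, whose components start as the default value.
    static Point[] newArray(int length) {
        Point[] points = new Point[length];
        Arrays.fill(points, DEFAULT);
        return points;
    }

    // The check the compiler would wrap around a read such as points[i].
    static Point read(Point[] points, int i) {
        Point p = points[i];
        if (p.sameAs(DEFAULT)) {
            throw new IllegalStateException("uninitialized component " + i);
        }
        return p;
    }
}
```

Note that in this emulation a deliberately stored `new Point(0, 0)` is rejected on read just like a never-written component, because the check only sees field values.
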
>
> ---
>
> Option I: JVM checks on field/array reads
>
> Inline class files can indicate that their default instance is invalid. When
> reading from a field/array component of the inline type
> ('getfield'/'getstatic'/'aaload'), an exception is thrown if the default
> value is found (treating it as "you forgot to initialize this variable"). The
> 'vdefault' instruction, like 'withfield', is illegal outside of the inline
> class's nest.
>
> Better than Option H in that it can be optimized to occur on only certain
> reads, and in that it provides strong guarantees—only the inline class can
> ever "see" the default instance.
>
> Well, unless the inline class chooses to share that instance with the world.
> Not sure how we prevent that. But maybe at that point, anything bad/weird
> that happens is the author's own fault. (E.g., putting the default value in
> an array will make that component effectively "uninitialized" again.)
>
> Like Option D, there's a question of whether we're willing to add this
> complexity to the 'getfield'/'getstatic'/'aaload' instructions. My sense is
> that at least it's less complexity than you have in Option D.
>
>

-- Kevin Bourrillion | Java Librarian | Google, Inc. | kev...@google.com