Re: Revisiting default values

Kevin Bourrillion Tue, 29 Jun 2021 11:01:30 -0700

Sorry for the quietness of late.

Some new thoughts.


   - Default behaviors of language features should be based *first* on
   bug-proof-ness; if a user has to opt into safety that means they were not
   safe.
   - `null` and nullable types are a very good thing for safety; NPE
   protects us from more nasty bugs than we can imagine.
   - A world where *all* user-defined primitive classes must be nullable
   (following Brian's outline) is completely *sane*, just not optimized.
   - (We'd like to still be able to fashion a *non-nullable type* when the
   class itself allows nullability, but this is a general need we already have
   for ref types and shouldn't have much bearing here. Still working hard on
   jspecify.org...)
   - It's awkward that `Instant` would have to add a `boolean valid = true`
   field, but it's not inappropriate. It has the misfortune that it both can't
   restrict its range of values *and* has no logical zero/default.
   - A type that does have a restricted range of legal values, but where
   that range includes the `.default` value, might do some very ugly tricks to
   avoid adding that boolean field; not sure what to think about this.
   - Among all the use cases for primitive classes, the ones where the
   default value is non-degenerate and expected are the special cases! We use
   `Complex` as a go-to example, but if most of what we did with complex
   numbers was divide them by each other then even this would be dubious. We'd
   be letting an invalid value masquerade as a valid one when we'd rather it
   just manifest as `null` and be subject to NPEs.
   - If we don't do something like Brian describes here, then I suppose
   second-best is that we make a *lot* of these things ref-default
   (beginning with Instant and not stopping there!) and warn about the dangers
   of `.val`.
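
A rough sketch of the `boolean valid` idea from the `Instant` bullet, written in today's Java. There are no primitive classes yet, so a final class stands in for one; `Timestamp`, `ZERO`, and the method names are made up for illustration and are not part of any proposal.

```java
// Sketch only: a final class stands in for a primitive class in today's Java.
// Timestamp, ZERO and the `valid` flag are illustrative names, not a real API.
final class Timestamp {
    // Stand-in for the all-zero instance a flattened field or array slot would start with.
    static final Timestamp ZERO = new Timestamp(0L, 0, false);

    private final long seconds;
    private final int nanos;
    private final boolean valid; // false only in the all-zero instance

    private Timestamp(long seconds, int nanos, boolean valid) {
        this.seconds = seconds;
        this.nanos = nanos;
        this.valid = valid;
    }

    // Every instance built through the factory is valid, including the epoch itself.
    static Timestamp ofEpochSecond(long seconds, int nanos) {
        return new Timestamp(seconds, nanos, true);
    }

    long getEpochSecond() {
        requireValid();
        return seconds;
    }

    int getNano() {
        requireValid();
        return nanos;
    }

    // Behaves like a null check: the uninitialized value fails fast
    // instead of posing as 1970-01-01.
    private void requireValid() {
        if (!valid) {
            throw new NullPointerException("uninitialized Timestamp");
        }
    }
}
```

With this shape, `Timestamp.ofEpochSecond(0, 0)` is a real epoch value, while `Timestamp.ZERO.getEpochSecond()` throws, which is the distinction the extra field buys.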

tl;dr nullable by default!

Would be glad to hear what I'm missing or not understanding right.


On Wed, Mar 17, 2021 at 8:14 AM Brian Goetz <brian.go...@oracle.com> wrote:

> Let me propose another strategy for Bucket 3.  It could be implemented at
> either the VM or language level, but the latter probably needs some help
> from the VM anyway.  The idea is that the default value is
> _indistinguishable from null_.  Strawman:
>
>  - Classes can be marked as default-hostile (e.g., `primitive class X
> implements NoGoodDefault`);
>  - Prior to dereferencing a default-hostile class, a check is made against
> the default value, and an NPE is thrown if it is the default value;
>  - When widening to a reference type, a check is made if it is the default
> value, and if so, is converted to null;
>  - When narrowing from a reference type, a check is made for null, and if
> so, converted to the default value;
>  - It is allowable to compare `x == null`, which is interpreted as "widen x
> to X.ref, and compare";
>  - (optional) the interface NoGoodDefault could have a method that
> optimizes the check, such as by using a pivot field, or the language/VM
> could try to automatically pick a pivot field.
>
> Classes which opt for NoGoodDefault will be slower than those that do not
> due to the check, but they will flatten.  Essentially, this lets authors
> choose between "zero means default" and "zero means null", at some cost.
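
The strawman's conversions can be modeled in today's Java roughly as below. This is only a sketch under assumptions: `Handle`, `DEFAULT`, `widen`, and `narrow` are hypothetical names, a final class stands in for the primitive class, and `isDefault()` plays the part of the optional pivot-field check.

```java
// Sketch only: models the X.val <-> X.ref conversions for a default-hostile class.
// Handle, DEFAULT, widen and narrow are illustrative names, not a real API.
final class Handle {
    // Stand-in for the all-zero instance.
    static final Handle DEFAULT = new Handle(0L);

    private final long pid;

    Handle(long pid) {
        this.pid = pid;
    }

    // Cheap "pivot field" test: pid == 0 identifies the default value.
    boolean isDefault() {
        return pid == 0L;
    }

    // Widening to a reference type: the default value is converted to null.
    static Handle widen(Handle val) {
        return val.isDefault() ? null : val;
    }

    // Narrowing from a reference type: null is converted to the default value.
    static Handle narrow(Handle ref) {
        return ref == null ? DEFAULT : ref;
    }

    // Dereference: check against the default first and throw NPE if it matches.
    long pid() {
        if (isDefault()) {
            throw new NullPointerException("default Handle");
        }
        return pid;
    }
}
```

Under this model `x == null` corresponds to `Handle.widen(x) == null`, and the per-dereference call to `isDefault()` is the cost the message refers to.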
>
> A risk here is that ignorant users who don't understand the tradeoffs will
> say "oh, great, there's my nullable primitive types", overuse them, and
> then say "primitive types are slow, java sucks."  The goal here would be to
> provide _safety_ for primitive types for which the default is dangerous.
>
>
> On 3/15/2021 11:52 AM, Brian Goetz wrote:
>
> Picking this issue up again.  To summarize Dan's buckets:
>
> Bucket 1 -- the zero default is in the domain, and is a sensible default
> value.  Zero for numerics, empty optionals.
>
> Bucket 2 -- there is a sensible default value, but all-zero-bits isn't
> it.
>
> Bucket 3 -- there simply is no sensible default value.
>
>
> Ultimately, though, this is not about defaults; it is about _uninitialized
> variables_.  The default only comes into play when the user uses an
> uninitialized variable, which usually means (a) uninitialized fields or (b)
> uninitialized array elements.  It is possible that the language could give
> us seat belts to dramatically narrow the chance of uninitialized fields,
> but uninitialized array elements are much harder to stamp out.
>
> It is an attractive distraction to get caught up in designing mechanisms
> for supplying an alternate default ("just let the user declare a no-arg
> constructor"), but this is focusing on the "writing code" part of the
> problem, not the "keeping code safe" part of the problem.
>
> In some sense, it is the existence (and size) of Bucket 1 that causes the
> problem; Bucket 1 is what gives us our sense that it is safe to use
> uninitialized variables.  In the current language, uninitialized reference
> variables are also safe in that if you use them before they are
> initialized, you get an exception before anything bad can happen.
> Uninitialized primitives in today's language are more dangerous, because we
> may interpret the uninitialized value, but this has been a problem we've
> been able to live with because today's primitives are pretty limited and
> zero is usually a good-enough default in most domains.  As we extend
> primitives to look more like objects, with behavior, this gets harder.
>
>
> Both buckets 2 and 3 can be remediated without help from the language or
> VM, perhaps inconveniently, by careful coding on the part of the author of
> the primitive class:
>
>  - don't expose fields to users (a good practice anyway)
>  - check for zero on entry to each method
>
> These are options A and E.  The difference between Buckets 2 (A) and 3 (E)
> in this model is what do we do when we find a zero; for bucket 2, we
> substitute some pre-baked value and use that, and for bucket 3, we throw
> something (what we throw is a separate discussion.)  The various
> remediation techniques Dan offers represent a menu which allows us to
> trade off reliability/cost/intrusiveness.
>
> I think we should lean on the model currently implemented by reference
> types, where _accessing_ an uninitialized field is OK, but _using_ the
> value in the field is not.  If we have:
>
>     String s;
>
> All of the following are fine:
>
>     String t = s;
>     if (s == null) { ... }
>     if (s == t) { ... }
>
> The thing that is not fine is s-dot-something.  These are the E/F/G
> options, not the H/I options.
>
> Secondarily, H/I, which attempt to hide the default, create another
> problem down the road: when we get to specialized generics, `T.default`
> would become partial.
>
> Some of the solutions for Bucket 3 generalize well enough to Bucket 2 that
> we might consider merging them (though there are still messy details).
> Option F, for example, injects code at the top of each method body:
>
>     int m() {
>         if (this == <zero-value>)
>             throw new NullPointerException();
>         /* body of m */
>     }
>
> A corresponding feature for Bucket 2 might inject slightly different code:
>
>     int m() {
>         if (this == <zero-value>)
>             return <better-default>.m();
>         /* body of m */
>     }
>
>
> Another thing that has evolved since we started this discussion is
> recognizing the difference between .val and .ref projections.  Imagine you
> could declare your membership in bucket 3:
>
>     __bucket_3 primitive class NGD { ... }
>
> If, in addition to some way of generating an NPE on dereference (F, G,
> etc), we mucked with the conversion of NGD.val to NGD.ref (which the
> compiler can inject code on), we could actually put a null on top of the
> stack.  Then, code like:
>
>     if (ngd == null) { ... }
>
> would actually work, because to do the comparison, we'd first promote ngd
> to a reference type (null is already a reference), and we'd compare two
> nulls.
>
>
>
> On 7/10/2020 2:23 PM, Dan Smith wrote:
>
> Brian pointed out that my list of candidate inline classes in the Identity 
> Warnings JEP (JDK-8249100) includes a number of classes that, despite being 
> "value-based classes" and disavowing their identity, might not end up as 
> inline classes. The problem? Default values.
>
> This might be a good time to revisit the open design issues surrounding 
> default values and see if we can make some progress.
>
> Background/status quo: every inline class has a default instance, which 
> provides the initial value of fields and array components that have the 
> inline type (e.g., in 'new Point[10]'). It's also the prototype instance used 
> to create all other instances (start with 'vdefault', then apply 'withfield' 
> as needed). The default value is, by fiat, the class instance produced by 
> setting all fields to *their* default values. Often, but not always, this 
> means field/array initialization amounts to setting all the bits to 0. 
> Importantly, no user code is involved in creating a default instance.
>
> Real code is always useful for grounding design discussions, so let's start 
> there. Among the classes I listed as inline class candidates, we can put them 
> in three buckets:
>
> Bucket #1: Have a reasonable default, as declared.
> - wrapper classes (the primitive zeros)
> - Optional & friends (empty)
> - From java.time: Instant (start of 1970-01-01), LocalTime (midnight), 
> Duration (0s), Period (0d), Year (1 BC, if that's acceptable)
>
> Bucket #2: Could have a reasonable default after re-interpreting fields.
> - From java.time: LocalDate, YearMonth, MonthDay, LocalDateTime, 
> ZonedDateTime, OffsetTime, OffsetDateTime, ZoneOffset, ZoneRegion, 
> MinguoDate, HijrahDate, JapaneseDate, ThaiBuddhistDate (months and days 
> should be nonzero; null Strings, ZoneIds, HijrahChronologies, and 
> JapaneseEras require special handling)
> - ListN, SetN, MapN (null array interpreted as empty)
>
> Bucket #3: No good default.
> - Runtime.Version (need a non-null List<Integer>)
> - ProcessHandleImpl (need a valid process ID)
> - List12, Set12, Map1 (need a non-null value)
> - All ConstantDesc implementations (need real class & method names, etc.)
>
> There's some subjectivity between the 2nd and 3rd buckets, but the idea 
> behind the 2nd is that, with some translation layer between physical fields 
> and interpretation of those fields, we can come up with an intuitive default 
> (e.g., "0 means January"; "a null String means time zone 'UTC'"). In 
> contrast, in the third bucket, any attempt to define a default value is going 
> to be pretty unintuitive ("A null method name means 'toString'").
>
> The question here is how much work the JVM and language are willing to do, or 
> how much work we're willing to ask clients to do, in order to support use 
> cases that don't fall into Bucket #1.
>
> I don't think totally excluding Buckets #2 and #3 is a very good outcome. It 
> means that, in many cases, inline classes need to be built up exclusively 
> from primitives or other inline types, because if you use reference types, 
> your default value will have a null field. (Sometimes, as in Optional, null 
> fields have straightforward interpretations, but most of the time programs 
> are designed to prevent them.)
>
> Whether we support Bucket #2 but not Bucket #3 is a harder question. It 
> wouldn't be so bad if none of the examples above in Bucket #3 become inline 
> classes—for the most part they're handled via interfaces, anyway. 
> (Counterpoint: inline class instances that are immediately typed with 
> interface types still potentially provide a performance boost.) But I'm also 
> not sure this is representative. We've noted before that many use cases, like 
> database records or data structure cursors, don't have meaningful defaults 
> (what's a default mailing address?). The ConstantDesc classes really 
> illustrate this, even though they happen to not be public.
>
> Another observation is that if we support Bucket #3 but not Bucket #2, that's 
> probably not a big deal—I'm not sure anybody really *wants* to deal with the 
> default instance; it's just the price you pay for being an inline class. If 
> there's a way to opt out of that extra weirdness and move from Bucket #2 to 
> Bucket #3, great.
>
> With that discussion in mind, here are some summaries of approaches we've 
> considered, or that I think we ought to consider, for supporting buckets #2 
> and #3. (This is as best as I recall. If there's something I've missed, add 
> it to the list!)
>
> [Weighing in for myself: my current preference is to do one of F, G, or I. 
> I'm not that interested in supporting Bucket #2, for reasons given above, 
> although Option A works for programmers who really want it.]
>
>
>
> === Solutions to support Bucket #2 ===
>
> Two broad strategies here: re-interpreting fields (A, B), and re-interpreting 
> the default instance (C, D).
>
> ---
>
> Option A: Encourage programmers to re-interpret fields
>
> Guidance to programmers: when you declare an inline class, identify any 
> fields for which the default instance should hold something other than 
> zero/null; define a mapping for your implementation from zero/null to the 
> value you want.
>
> One way to do this is to define a (possibly private) getter for each field, 
> and include logic like 'return month + 1' or 'return id == null ? "UTC" : 
> id'. Or maybe you inline that logic, as long as you're careful to do so 
> everywhere. Importantly, you also need to reverse the logic in your 
> constructor—for the sake of '==', if somebody manually creates the default 
> instance, you should set fields to zero/null.
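
A minimal sketch of that getter-plus-constructor mapping in today's Java, assuming a final class as a stand-in for an inline class. `ZonedMonth`, `PHYSICAL_DEFAULT`, and `sameFieldsAs` are hypothetical names; the private no-arg constructor only models the all-zero instance that no user code would normally build.

```java
import java.util.Objects;

// Sketch only: Option A emulated with a plain final class.
// ZonedMonth and its members are illustrative names, not a real API.
final class ZonedMonth {
    // Stand-in for the all-zero instance: monthIndex == 0, zoneId == null.
    static final ZonedMonth PHYSICAL_DEFAULT = new ZonedMonth();

    private final int monthIndex; // zero-based, so 0 reads as January
    private final String zoneId;  // null reads as "UTC"

    private ZonedMonth() {
        this.monthIndex = 0;
        this.zoneId = null;
    }

    ZonedMonth(int month, String zoneId) {
        if (month < 1 || month > 12) {
            throw new IllegalArgumentException("month must be 1..12");
        }
        // Reverse the mapping so a hand-built default has the same field
        // values as the physical default.
        this.monthIndex = month - 1;
        this.zoneId = "UTC".equals(zoneId) ? null : Objects.requireNonNull(zoneId);
    }

    // Getters apply the zero/null -> intended-value mapping.
    int month() {
        return monthIndex + 1;
    }

    String zoneId() {
        return zoneId == null ? "UTC" : zoneId;
    }

    // Field-wise comparison, standing in for '==' on inline class instances.
    boolean sameFieldsAs(ZonedMonth other) {
        return monthIndex == other.monthIndex && Objects.equals(zoneId, other.zoneId);
    }
}
```

Here `new ZonedMonth(1, "UTC")` ends up with the same field values as `PHYSICAL_DEFAULT`, which is the point of reversing the logic in the constructor.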
>
> This doesn't work if you want public fields, but that's life as an OO 
> programmer.
>
> In this approach, it would be important that inline classes be expected to 
> document their default instance in Javadoc (perhaps with a new Javadoc 
> tag)—the interpretation of the default instance is less apparent to users 
> than "all zeros".
>
> Limitations:
>
> - It's a fairly error-prone approach. Programmers will absolutely forget to 
> apply the mapping in one place, and everything will be fine until somebody 
> tries to invoke a particular method on the default instance. Put that bug in 
> a security-sensitive context, and maybe you have an exploit. (Something that 
> could help some is choosing good names—call your field 'monthIndex', not 
> plain 'month', to remind yourself that it's zero-based.)
>
> - Performance impact of an extra layer of computation on all field accesses. 
> Probably not a big deal in general, but all those null checks, etc., could 
> have a negative impact in certain contexts. And the *appearance* of extra 
> cost might scare programmers away from doing the right thing ("eh, I probably 
> won't use the default value anyway, I'll just ignore it to make my code 
> faster").
>
> ---
>
> Option B: Language support for field re-interpretation
>
> The language allows inline classes to declare fields with mappings to/from an 
> internal representation. Just like Option A, but with guarantees that the 
> internal representation isn't inappropriately accessed directly.
>
> This pulls on a thread we explored a bit for Amber a while back, some form of 
> "abstract fields" or "virtual fields". Maybe there's something there, but it 
> seems like a general-purpose feature, and one we're not likely to reach a 
> final solution on anytime soon.
>
> ---
>
> Option C: Language support for a designated default
>
> The language provides some way for programmers to declare the "logical" 
> default instance (something like a special static field). The compiler 
> inserts a test for the "physical" default on any field/array access, and 
> replaces it with the logical default.
>
> That is:
>
> Point p = points[3];
>
> compiles to
>
> Point p$0 = points[3];
> Point p = (p$0 == [vdefault Point]) ? Point.DEFAULT : p$0;
>
> This is much less bug-prone than Option A—the compiler does all the work—and 
> much more achievable in the short/medium term than Option B.
>
> Compared to Option B, this pushes the computation overhead from inline class 
> field accesses to reads of the inline type from fields/arrays. I don't know 
> if that's good or bad—maybe a wash, heavily dependent on the use case.
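
The read-side translation of Option C can be imitated in today's Java as follows. It is a sketch under assumptions: `Point2`, `PointReads`, and the choice of (1, 1) as the logical default are invented for illustration, and an array pre-filled with the all-zero instance stands in for a flattened array.

```java
import java.util.Arrays;

// Sketch only: imitates the compiler-inserted check on array reads.
// Point2, PointReads and the (1, 1) logical default are illustrative assumptions.
final class Point2 {
    // Stand-in for [vdefault Point].
    static final Point2 PHYSICAL_DEFAULT = new Point2(0, 0);
    // The "logical" default the class author designated.
    static final Point2 DEFAULT = new Point2(1, 1);

    final int x;
    final int y;

    Point2(int x, int y) {
        this.x = x;
        this.y = y;
    }

    boolean isPhysicalDefault() {
        return x == 0 && y == 0;
    }
}

final class PointReads {
    // Models 'new Point[n]' for a flattened array: every slot starts all-zero.
    static Point2[] newArray(int n) {
        Point2[] a = new Point2[n];
        Arrays.fill(a, Point2.PHYSICAL_DEFAULT);
        return a;
    }

    // Models the compiled form of 'Point p = points[i];'.
    static Point2 read(Point2[] points, int i) {
        Point2 p0 = points[i];
        return p0.isPhysicalDefault() ? Point2.DEFAULT : p0;
    }
}
```

Note that storing `new Point2(0, 0)` and reading it back through `PointReads.read` yields (1, 1), which is the surprising behavior that arises when the physical default is also used as an ordinary value.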
>
> A few big problems:
>
> - The physical default still exists, and malicious bytecode can use it. If 
> programmers want strong guarantees, they'll have to check and throw wherever 
> an untrusted instance is provided. (Clients with access to the inline class's 
> fields have to do so, too.)
>
> - Covariant arrays mean every read from any array type that might be 
> flattened (Object[], Runnable[], ConstantDesc[], ...) has to go through 
> translation logic.
>
> - There's an assumption here that the programmer doesn't intend to use the 
> physical default as a valid non-default instance. That's hard for the 
> compiler to enforce, and weird stuff happens in fields/arrays if the 
> programmer doesn't prevent it. (Could be mitigated with extra implicit logic 
> on field/array writes or in constructors.)
>
> ---
>
> Option D: JVM support for a designated default
>
> The VM allows inline classes to designate a logical default instance, and the 
> field/array access instructions map from the physical default to the logical 
> default. The 'vdefault' instruction produces the logical default instance; 
> something else is used by the class's factories to build from the physical 
> default.
>
> This addresses the first two problems with Option C—the VM gives strong 
> guarantees, and can make the translation a virtual operation of certain 
> arrays.
>
> To address the third problem, it seems like we'd need the more complex logic 
> I hinted at: on writes, map the physical default to the logical default, and 
> map the logical default to the physical default. Do the reverse on reads.
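
One way to read that write/read mapping is as a swap of the two special values on every store and every load. The sketch below models it with a wrapper class in today's Java; `Coord`, `SwappingArray`, and the (1, 1) logical default are hypothetical, and a real implementation would live in the array and field instructions rather than in library code.

```java
import java.util.Arrays;

// Sketch only: models swapping the physical and logical defaults on writes and reads.
// Coord, SwappingArray and the (1, 1) logical default are illustrative assumptions.
final class Coord {
    final int x;
    final int y;

    Coord(int x, int y) {
        this.x = x;
        this.y = y;
    }

    boolean sameAs(Coord other) {
        return x == other.x && y == other.y;
    }
}

final class SwappingArray {
    private static final Coord PHYSICAL = new Coord(0, 0);
    private static final Coord LOGICAL = new Coord(1, 1);

    private final Coord[] slots;

    SwappingArray(int n) {
        slots = new Coord[n];
        Arrays.fill(slots, PHYSICAL); // a fresh flattened array is all-zero
    }

    // Exchanges the two special values and leaves everything else alone.
    // Applying it twice gives back the original, so reads undo writes.
    private static Coord swap(Coord c) {
        if (c.sameAs(PHYSICAL)) {
            return LOGICAL;
        }
        if (c.sameAs(LOGICAL)) {
            return PHYSICAL;
        }
        return c;
    }

    void store(int i, Coord c) {
        slots[i] = swap(c);
    }

    Coord load(int i) {
        return swap(slots[i]);
    }
}
```

With the swap in place, an untouched slot loads as (1, 1), while an explicitly stored (0, 0) still loads as (0, 0), so the physical default remains usable as an ordinary value.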
>
> The problem here is bytecode complexity/slowdowns. We've already added some 
> complexity to 'aaload'/'aastore' (covariant flattened arrays), and anticipate 
> similar changes to 'putfield'/'getfield' (specialized fields), so maybe that 
> means we might as well do more. Or maybe it means we're already over budget. 
> :-)
>
> From the users' perspective, if any performance reduction on reads/writes can 
> be limited to the inline classes in Bucket #2, *all* the options have a 
> similar cost, whether imposed by the programmer, language, or VM. So, to a 
> first approximation, slower opcode execution is fine.
>
>
>
> === Solutions to support Bucket #3 ===
>
> Two broad strategies here: rejecting member accesses on the default instance 
> (E, F, G), and preventing programs from ever seeing the default instance (H, 
> I).
>
> ---
>
> Option E: Encourage programmers to guard against default instances
>
> Guidance to programmers: if you don't like your class's default instance, 
> check for it in your methods and throw. Maybe Java SE defines a new 
> RuntimeException to encourage this.
>
> The simple way to do this is with some boilerplate at the start of all your 
> methods:
>
> if (this == MyClass.default) throw new InvalidDefaultException();
>
> More permissive classes could just do some validation on the fields that are 
> relevant to a particular operation. (E.g., 'getMonth' doesn't care if 
> 'zoneId' is null.)
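
The difference between the strict guard and the more permissive per-field validation might look like this in today's Java. Everything named here is an assumption for illustration: `InvalidDefaultException` is the hypothetical new exception, `MonthInZone` is a final class standing in for an inline class, and the month is stored zero-based so that a zero field is a legitimate January.

```java
// Sketch only: strict versus permissive guards against the default instance.
// InvalidDefaultException and MonthInZone are illustrative names, not real APIs.
class InvalidDefaultException extends RuntimeException {
    InvalidDefaultException(String message) {
        super(message);
    }
}

final class MonthInZone {
    // Stand-in for the all-zero instance: monthIndex == 0, zoneId == null.
    static final MonthInZone DEFAULT = new MonthInZone(0, null);

    private final int monthIndex; // zero-based; 0 is a legitimate January
    private final String zoneId;  // null only in the default instance

    private MonthInZone(int monthIndex, String zoneId) {
        this.monthIndex = monthIndex;
        this.zoneId = zoneId;
    }

    static MonthInZone of(int month, String zoneId) {
        if (month < 1 || month > 12) {
            throw new IllegalArgumentException("month must be 1..12");
        }
        if (zoneId == null) {
            throw new NullPointerException("zoneId");
        }
        return new MonthInZone(month - 1, zoneId);
    }

    // Strict: reject the default instance outright, as in the boilerplate check.
    String describe() {
        if (monthIndex == 0 && zoneId == null) {
            throw new InvalidDefaultException("default MonthInZone");
        }
        return getMonth() + "@" + zoneId;
    }

    // Permissive: only touches monthIndex, so it does not care about zoneId.
    int getMonth() {
        return monthIndex + 1;
    }

    // Permissive: validates just the field this operation needs.
    String getZoneId() {
        if (zoneId == null) {
            throw new InvalidDefaultException("zoneId is not set");
        }
        return zoneId;
    }
}
```

In this sketch `MonthInZone.DEFAULT.getMonth()` returns 1 without complaint, while `getZoneId()` and `describe()` throw on the same instance.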
>
> This doesn't work if you want public fields, but that's life as an OO 
> programmer.
>
> It's not ideal that an invalid instance can float around a program until 
> somebody trips on one of these checks, rather than detecting the invalid 
> value earlier—we're propagating the NPE problem. And it takes some getting 
> used to that there are two null-like values in the reference type's domain.
>
> ---
>
> Option F: Language support for default instance guards
>
> An inline class declaration can indicate that the default instance is 
> invalid. The compiler generates guards, as in Option E, at the start of all 
> instance method bodies, and perhaps on all field accesses outside of those 
> methods.
>
> Programmers give up finer-grained control, but get more safety. I'm sure most 
> would be happy with that trade.
>
> Improper/separately-compiled bytecode can skip the field access checks, but 
> that's a minor concern.
>
> Same issues as Option E regarding adding a "new NPE" to the platform.
>
> ---
>
> Option G: JVM support for default instance guards
>
> Inline class files can indicate that their default instance is invalid. All 
> attempts to operate on that instance (via field/method accesses, other than 
> 'withfield') result in an exception.
>
> This tightens up Option F, making it just as impossible to access members of 
> the default instance as it is to access members of 'null'.
>
> Same issues as Option E regarding adding a "new NPE" to the platform.
>
> ---
>
> Option H: Language checks on field/array reads
>
> An inline class declaration can indicate that the default instance is 
> invalid. Every field and array access that may involve an uninitialized 
> field/array component of that inline type gets augmented with a check that 
> rejects reads of the default value (treating it as "you forgot to initialize 
> this variable").
>
> That is:
>
> Point p = points[3];
>
> compiles to
>
> Point p$0 = points[3];
> if (p$0 == [vdefault Point]) throw new UninitializedVariableException();
> Point p = p$0;
>
> This is much like Option C, and has roughly the same advantages/problems. 
> There's not a strong guarantee that the default value won't pop up from 
> untrusted bytecode (or unreliable inline class authors), and lots of array 
> types need guards.
>
> ---
>
> Option I: JVM checks on field/array reads
>
> Inline class files can indicate that their default instance is invalid. When 
> reading from a field/array component of the inline type 
> ('getfield'/'getstatic'/'aaload'), an exception is thrown if the default 
> value is found (treating it as "you forgot to initialize this variable"). The 
> 'vdefault' instruction, like 'withfield', is illegal outside of the inline 
> class's nest.
>
> Better than Option H in that it can be optimized to occur on only certain 
> reads, and in that it provides strong guarantees—only the inline class can 
> ever "see" the default instance.
>
> Well, unless the inline class chooses to share that instance with the world. 
> Not sure how we prevent that. But maybe at that point, anything bad/weird 
> that happens is the author's own fault. (E.g., putting the default value in 
> an array will make that component effectively "uninitialized" again.)
>
> Like Option D, there's a question of whether we're willing to add this 
> complexity to the 'getfield'/'getstatic'/'aaload' instructions. My sense is 
> that at least it's less complexity than you have in Option D.
>
>
>
>
>

-- 
Kevin Bourrillion | Java Librarian | Google, Inc. | kev...@google.com
