Assuming the stacking here is satisfactory, let's talk about .ref and .val.

Kevin made a strong argument for .ref as default, so let's pull on that string for a bit.

Universal generics need a way to express .ref at least for type variables, so if we're going to make .ref the default, we still need a way to denote it.  Calling the types Foo.ref and Foo.val, where Foo is an alias for Foo.ref, is one way to achieve this.
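For example (a sketch only, using a hypothetical value class Point and the speculative syntax under discussion):

    value class Point { int x; int y; }   // hypothetical example

    Point p;        // under ref-default, Point is just an alias for Point.ref
    Point.ref q;    // reference projection: nullable, null-default
    Point.val r;    // value projection: non-nullable, zero-default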

<wild-speculation>

Now, let's step out onto some controversial territory: how do we spell .ref and .val?  Specifically, how good a fit are `!` and `?` (henceforth, emotional types), now that the B3 election is _solely_ about the existence of a zero-default .val?  (Before, it was a poor fit, but now it might be credible.  Yet another reason I wanted to tease apart what "primitive" meant into independent axes.)

Pro: users think they really want emotional types.
Pro: to the extent we eventually acquire full emotional types, and to the extent these align cleanly with primitive type projections, it avoids weirdnesses like `Foo.val?`, where there are two ways to talk about nullity.
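Sketched with the hypothetical Point from above (again, speculative spelling, not a proposal):

    Point! p;        // would spell the val projection: non-nullable, zero-default
    Point? q;        // would spell the ref projection: nullable, null-default

    // versus the awkwardness alluded to above if both mechanisms coexisted:
    // Point.val? r;    // two notations competing to describe nullity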

Con: These will surely not initially be the full emotional types users think they want, and so may well be met with "you idiots, these are not the emotional types we want."
Con: To the extent that full emotional types do not align cleanly with primitive type projections, we might be painted into a corner, and it might be harder to do emotional types later.

Risk: the language treatment of emotional types is one thing, but the real cost of introducing them into the language is annotating the libraries.  Having them in the language but not annotating the libraries on a timely basis may well be a step backwards.


If we had full emotional types, some would have their non-nullity erased (`String!` erases to the same type descriptor as ordinary `String`) and some would have it reified (`Integer!` translates to a separate type, the `I` carrier.)  This means that migrating `String` to `String!` might be binary-compatible, but `Integer` to `Integer!` would not be.  (This is probably an acceptable asymmetry.)
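A rough illustration of that asymmetry (hypothetical signatures; only the descriptors matter):

    // String! erases, so this migration keeps the same descriptor and could be
    // binary-compatible for existing callers:
    //   void greet(String name)   ->  void greet(String! name)   // still (Ljava/lang/String;)V
    //
    // Integer! is reified as the I carrier, so the descriptor changes and the
    // migration breaks existing callers:
    //   void inc(Integer x)       ->  void inc(Integer! x)       // (Ljava/lang/Integer;)V becomes (I)V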

But a bigger question is whether an erased `String!` should be backed up by a synthetic null check at the boundary between checked and unchecked code, such as method entry points (just as unpacking a T from a generic is backed up by a synthetic cast at the boundary between generic and explicit code.)  This is reasonable (and cheap enough), but may be on a collision course with some interpretations of `String!`.
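Roughly (a sketch of what the compiler might inject for an erased `String!` parameter; `Objects.requireNonNull` here just stands in for whatever check would actually be emitted):

    void greet(String! name) {                     // speculative syntax
        // synthetic check at the checked/unchecked boundary, analogous to the
        // synthetic cast inserted when a T flows out of erased generic code
        java.util.Objects.requireNonNull(name);
        System.out.println("Hello, " + name);
    }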

Initially, we probably would restrict the use of `!` to val-projections of primitive classes, but the pressure to extend it would always be just around the corner (e.g., having them in type patterns would likely address many people's initial discomfort about null handling in patterns).
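For instance (speculative; today's type patterns already refuse to match null, but a `!` would let that requirement be stated explicitly, with handle/handleNull/handleOther as hypothetical helpers):

    static void dispatch(Object o) {
        switch (o) {
            case String! s -> handle(s);       // hypothetical: matches only non-null Strings
            case null      -> handleNull();
            default        -> handleOther();
        }
    }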

</wild-speculation>

My goal here is not to dive into the details of "let's design nullable types" (that would be a distraction at this point) so much as to gauge sentiment on whether this is worth exploring further, and to gather considerations I may have missed in this brief summary.


On 5/8/2022 12:32 PM, Brian Goetz wrote:
To track the progress of the spiral:

 - We originally came up with the B2/B3 division to carve off B2 as the "safe subset", where you get less flattening but nulls and more integrity.  This provided a safe migration target for existing VBCs, as well as a reasonable target for creating new VBCs that want to be mostly class-like but enjoy some additional optimization (and shed accidental identity for safety reasons.)

 - When we put all the flesh on the bones of B2/B3, there were some undesirable consequences, such as (a) tearing was too subtle, and (b) both the semantics and cost model differences between B2/B3 were going to be hard to explain (and in some cases, users have bad choices between semantics and performance.)

 - A few weeks ago, we decided to more seriously consider separating atomicity out as an explicit thing on its own. This had the benefit of putting semantics first, and offered a clearer cost model: you could give up identity but keep null-default and integrity (B2), further give up nulls to get some more density (B3.val), and further further give up atomicity to get more flatness (non-atomic B3.)  This was honest, but led people to complain "great, now there are four buckets."

 - We explored making non-atomicity a cross-cutting concern, so there are two new buckets (VBC and primitive-like), either of which can choose their atomicity constraints, and then within the primitive-like bucket, the .val and .ref projections differ only with respect to the consequences of nullity.  This felt cleaner (more orthogonal), but the notion of a non-atomic B2 itself is kind of weird.

So where this brings us is back to something that might feel like the four-bucket approach in the third bullet above, but with two big differences: atomicity is an explicit property of a class, rather than a property of reference-ness, and a B3.ref is not necessarily the same as a B2.  This recognizes that the main distinction between B2 and B3 is *whether a class can tolerate its zero value.*

More explicitly:

 - B1 remains unchanged

 - B2 is for "ordinary" value-based classes.  Always atomic, always nullable, always reference; the only difference with B1 is that it has shed its identity, enabling routine stack-based flattening, and perhaps some heap flattening depending on VM sophistication and heroics.  B2 is a good target for migrating many existing value-based classes.

 - B3 means that a class can tolerate its zero (uninitialized) value, and therefore gives rise to two types, which we'll call B3.ref and B3.val.  The former is a reference type and is therefore nullable and null-default; the latter is a direct/immediate/value type whose default is zero.

 - B3 classes can further be marked non-atomic; this unlocks greater flattening in the heap at the cost of tearing under race, and is suitable for classes without cross-field invariants.  Non-atomicity accrues equally to B3.ref and B3.val; a non-atomic B3.ref still tears (and therefore might expose its zero under race, as per Friday's discussions.)

Syntactically (reminder: NOT an invitation to discuss syntax at this point), this might look like:

    class B1 { }                // identity, reference, atomic

    value-based class B2 { }    // non-identity, reference, atomic

    value class B3 { }          // non-identity, .ref and .val, both atomic

    non-atomic value class B3 { }  // similar to B3, but both are non-atomic

So, two new (but related) class modifiers, of which one has an additional modifier.  (The spelling of all of these can be discussed after the user model is entirely nailed down.)
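To make the atomicity trade-off concrete (a sketch using the speculative modifiers above; RgbColor is a hypothetical class with no cross-field invariant):

    non-atomic value class RgbColor {
        byte r, g, b;
    }

    // Under race, a reader may observe a torn value (r from one write, g from
    // another), and a non-atomic RgbColor.ref may even expose the all-zeros value.
    // With no invariant tying the fields together, that is a value no thread wrote,
    // but not a broken one.  A class like Range(lo, hi) with a lo <= hi invariant
    // should stay atomic.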

So, there's a monotonic sequence of "give stuff up, get other stuff":

 - B2 gives up identity relative to B1, gains some flattening
 - B3 optionally gives up null-defaultness relative to B2, yielding two types, one of which sheds some footprint
 - non-atomic B3 gives up atomicity relative to B3, gaining more flatness, for both type projections






On 5/6/2022 10:04 AM, Brian Goetz wrote:
Thinking more about Dan's concerns here ...

On 5/5/2022 6:00 PM, Dan Smith wrote:
This is significant because the primary reason to declare a B2 rather than a B3 is to guarantee that the all-zeros value cannot be created.

This is a little bit of a circular argument; it takes a property that an atomic B2 has, but a non-atomic B2 lacks, and declares that to be "the whole point" of B2.  It may be that exposure of the zero is so bad we may eventually want to back away from the idea, but let's come up with a fair picture of what a non-atomic B2 means, and ask if that's sufficiently useful.

This leads me to conclude that if you're declaring a non-atomic B2, you might as well just declare a non-atomic B3.

Fair point, but let's pull on this string for a moment.  Suppose I want a null-default, flattenable value, and I'm willing to take the tearing to get there.  So you're saying "then declare a B3 and use B3.ref".  But B3.ref was supposed to have the same semantics as an equivalent B2!  (I realize I'm doing the same thing I just accused you of above -- taking an old invariant and positioning it as "the point".  Stay tuned.)  Which means either that we lose flattening, again, or we create yet another asymmetry between B3.ref and B2. Maybe you're saying that the combination of nullable and full-flat is just too much to ask, but I am not sure it is; in any case, let's convince ourselves of this before we rule it out.

Or maybe, what you're saying is that my claim that B3.ref and B2 are the same thing is the stale thing here, and we can let it go and get it back in another form.  In which case you're positing a model where:

 - B1 is unchanged
 - B2 is always atomic, reference, nullable
 - B3 really means "the zero is OK", comes with .ref and .val, and (non-atomic B3).ref is still tearable?

In this model, (non-atomic B3).ref takes the place of (non-atomic B2) in the stacking I've been discussing.  Is that what you're saying?

    class B1 { }  // ref, identity, atomic
    value-based class B2 { }  // ref, non-identity, atomic
    [ non-atomic ] value class B3 { }  // ref or val, zero is ok, both projections share atomicity

If we go with ref-default, then this is a small leap from yesterday's stacking, because "B3" and "B2" are both reference types, so if you want a tearable, non-atomic reference type, saying `non-atomic value class B3` and then just using B3 gets you that. Then:

 - B2 is like B1, minus identity
 - B3 means "uninitialized values are OK; you get two types, a zero-default and a null-default"
 - Non-atomicity is an extra property we can add to B3, to get more flattening in exchange for less integrity
 - The use cases for non-atomic B2 are served by non-atomic B3 (when .ref is the default)
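Concretely, under this stacking (speculative syntax, hypothetical class):

    non-atomic value class Cursor { long offset; long generation; }

    Cursor c;        // ref-default: nullable, flattenable, may tear under race
    Cursor.val v;    // zero-default, non-nullable, maximally flattenable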

I think this still has the properties I want; I can freely choose the reasonable subsets of { identity, has-zero, nullable, atomicity } that I want; the orthogonality of non-atomic across buckets becomes orthogonality of non-atomic with nullity, and the "B3.ref is just like B2" is shown to be the "false friend."


