Assuming the stacking here is satisfactory, let's talk about .ref and .val.

Kevin made a strong argument for .ref as default, so let's pull on that string for a bit.

Universal generics need a way to express .ref at least for type variables, so if we're going to make .ref the default, we still need a way to denote it.  Calling the types Foo.ref and Foo.val, where Foo is an alias for Foo.ref, is one way to achieve this.
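For example (a sketch only, using a hypothetical value class Point and the speculative syntax under discussion):

    value class Point { int x; int y; }   // hypothetical example

    Point p;        // under ref-default, Point is just an alias for Point.ref
    Point.ref q;    // reference projection: nullable, null-default
    Point.val r;    // value projection: non-nullable, zero-default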

<wild-speculation>

Now, let's step out onto some controversial territory: how do we spell .ref and .val?  Specifically, how good a fit are `!` and `?` (henceforth, emotional types), now that the B3 election is _solely_ about the existence of a zero-default .val?  (Before, it was a poor fit, but now it might be credible.  Yet another reason I wanted to tease apart what "primitive" meant into independent axes.)

Pro: users think they really want emotional types.
Pro: to the extent we eventually acquire full emotional types, and to the extent these align cleanly with primitive type projections, it avoids weirdnesses like `Foo.val?`, where there are two ways to talk about nullity.
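Sketched with the hypothetical Point from above (again, speculative spelling, not a proposal):

    Point! p;        // would spell the val projection: non-nullable, zero-default
    Point? q;        // would spell the ref projection: nullable, null-default

    // versus the awkwardness alluded to above if both mechanisms coexisted:
    // Point.val? r;    // two notations competing to describe nullity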

Con: These will surely not initially be the full emotional types users think they want, and so may well be met with "you idiots, these are not the emotional types we want."
Con: To the extent that full emotional types do not align cleanly with primitive type projections, we might be painted into a corner, and it might be harder to do emotional types later.

Risk: the language treatment of emotional types is one thing, but the real cost of introducing them into the language is annotating the libraries.  Having them in the language but not annotating the libraries on a timely basis may well be a step backwards.


If we had full emotional types, some would have their non-nullity erased (`String!` erases to the same type descriptor as ordinary `String`) and some would have it reified (`Integer!` translates to a separate type, the `I` carrier.)  This means that migrating `String` to `String!` might be binary-compatible, but `Integer` to `Integer!` would not be.  (This is probably an acceptable asymmetry.)
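A rough illustration of that asymmetry (hypothetical signatures; only the descriptors matter):

    // String! erases, so this migration keeps the same descriptor and could be
    // binary-compatible for existing callers:
    //   void greet(String name)   ->  void greet(String! name)   // still (Ljava/lang/String;)V
    //
    // Integer! is reified as the I carrier, so the descriptor changes and the
    // migration breaks existing callers:
    //   void inc(Integer x)       ->  void inc(Integer! x)       // (Ljava/lang/Integer;)V becomes (I)V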

But a bigger question is whether an erased `String!` should be backed up by a synthetic null check at the boundary between checked and unchecked code, such as method entry points (just as unpacking a T from a generic is backed up by a synthetic cast at the boundary between generic and explicit code.)  This is reasonable (and cheap enough), but may be on a collision course with some interpretations of `String!`.
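Roughly (a sketch of what the compiler might inject for an erased `String!` parameter; `Objects.requireNonNull` here just stands in for whatever check would actually be emitted):

    void greet(String! name) {                     // speculative syntax
        // synthetic check at the checked/unchecked boundary, analogous to the
        // synthetic cast inserted when a T flows out of erased generic code
        java.util.Objects.requireNonNull(name);
        System.out.println("Hello, " + name);
    }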

Initially, we probably would restrict the use of `!` to val-projections of primitive classes, but the pressure to extend it would always be just around the corner (e.g., having them in type patterns would likely address many people's initial discomfort about null handling in patterns).
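For instance (speculative; today's type patterns already refuse to match null, but a `!` would let that requirement be stated explicitly, with handle/handleNull/handleOther as hypothetical helpers):

    static void dispatch(Object o) {
        switch (o) {
            case String! s -> handle(s);       // hypothetical: matches only non-null Strings
            case null      -> handleNull();
            default        -> handleOther();
        }
    }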

</wild-speculation>

My goal here is not to dive into the details of "let's design nullable types" (that would be a distraction at this point) so much as to gauge sentiment on whether this is worth exploring further, and to gather considerations I may have missed in this brief summary.


On 5/8/2022 12:32 PM, Brian Goetz wrote:
To track the progress of the spiral:

 - We originally came up with the B2/B3 division to carve off B2 as the "safe subset", where you get less flattening but nulls and more integrity.  This provided a safe migration target for existing VBCs, as well as a reasonable target for creating new VBCs that want to be mostly class-like but enjoy some additional optimization (and shed accidental identity for safety reasons.)

 - When we put all the flesh on the bones of B2/B3, there were some undesirable consequences, such as (a) tearing was too subtle, and (b) both the semantics and cost model differences between B2/B3 were going to be hard to explain (and in some cases, users have bad choices between semantics and performance.)

 - A few weeks ago, we decided to more seriously consider separating atomicity out as an explicit thing on its own. This had the benefit of putting semantics first, and offered a clearer cost model: you could give up identity but keep null-default and integrity (B2), further give up nulls to get some more density (B3.val), and further further give up atomicity to get more flatness (non-atomic B3.)  This was honest, but led people to complain "great, now there are four buckets."

 - We explored making non-atomicity a cross-cutting concern, so there are two new buckets (VBC and primitive-like), either of which can choose their atomicity constraints, and then within the primitive-like bucket, the .val and .ref projections differ only with respect to the consequences of nullity.  This felt cleaner (more orthogonal), but the notion of a non-atomic B2 itself is kind of weird.

So where this brings us is back to something that might feel like the four-bucket approach in the third bullet above, but with two big differences: atomicity is an explicit property of a class, rather than a property of reference-ness, and a B3.ref is not necessarily the same as a B2.  This recognizes that the main distinction between B2 and B3 is *whether a class can tolerate its zero value.*

More explicitly:

 - B1 remains unchanged

 - B2 is for "ordinary" value-based classes.  Always atomic, always nullable, always reference; the only difference with B1 is that it has shed its identity, enabling routine stack-based flattening, and perhaps some heap flattening depending on VM sophistication and heroics.  B2 is a good target for migrating many existing value-based classes.

 - B3 means that a class can tolerate its zero (uninitialized) value, and therefore gives rise to two types, which we'll call B3.ref and B3.val.  The former is a reference type and is therefore nullable and null-default; the latter is a direct/immediate/value type whose default is zero.

 - B3 classes can further be marked non-atomic; this unlocks greater flattening in the heap at the cost of tearing under race, and is suitable for classes without cross-field invariants.  Non-atomicity accrues equally to B3.ref and B3.val; a non-atomic B3.ref still tears (and therefore might expose its zero under race, as per Friday's discussions.)

Syntactically (reminder: NOT an invitation to discuss syntax at this point), this might look like:

    class B1 { }                // identity, reference, atomic

    value-based class B2 { }    // non-identity, reference, atomic

    value class B3 { }          // non-identity, .ref and .val, both atomic

    non-atomic value class B3 { }  // similar to B3, but both are non-atomic

So, two new (but related) class modifiers, of which one has an additional modifier.  (The spelling of all of these can be discussed after the user model is entirely nailed down.)
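To make the atomicity trade-off concrete (a sketch using the speculative modifiers above; RgbColor is a hypothetical class with no cross-field invariant):

    non-atomic value class RgbColor {
        byte r, g, b;
    }

    // Under race, a reader may observe a torn value (r from one write, g from
    // another), and a non-atomic RgbColor.ref may even expose the all-zeros value.
    // With no invariant tying the fields together, that is a value no thread wrote,
    // but not a broken one.  A class like Range(lo, hi) with a lo <= hi invariant
    // should stay atomic.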

So, there's a monotonic sequence of "give stuff up, get other stuff":

 - B2 gives up identity relative to B1, gains some flattening
 - B3 optionally gives up null-defaultness relative to B2, yielding two types, one of which sheds some footprint
 - non-atomic B3 gives up atomicity relative to B3, gaining more flatness, for both type projections






On 5/6/2022 10:04 AM, Brian Goetz wrote:
Thinking more about Dan's concerns here ...

On 5/5/2022 6:00 PM, Dan Smith wrote:
This is significant because the primary reason to declare a B2 rather than a B3 is to guarantee that the all-zeros value cannot be created.

This is a little bit of a circular argument; it takes a property that an atomic B2 has, but a non-atomic B2 lacks, and declares that to be "the whole point" of B2.  It may be that exposure of the zero is so bad we may eventually want to back away from the idea, but let's come up with a fair picture of what a non-atomic B2 means, and ask if that's sufficiently useful.

This leads me to conclude that if you're declaring a non-atomic B2, you might as well just declare a non-atomic B3.

Fair point, but let's pull on this string for a moment.  Suppose I want a null-default, flattenable value, and I'm willing to take the tearing to get there.  So you're saying "then declare a B3 and use B3.ref".  But B3.ref was supposed to have the same semantics as an equivalent B2!  (I realize I'm doing the same thing I just accused you of above -- taking an old invariant and positioning it as "the point".  Stay tuned.)  Which means either that we lose flattening, again, or we create yet another asymmetry between B3.ref and B2. Maybe you're saying that the combination of nullable and full-flat is just too much to ask, but I am not sure it is; in any case, let's convince ourselves of this before we rule it out.

Or maybe, what you're saying is that my claim that B3.ref and B2 are the same thing is the stale thing here, and we can let it go and get it back in another form.  In which case you're positing a model where:

 - B1 is unchanged
 - B2 is always atomic, reference, nullable
 - B3 really means "the zero is OK", comes with .ref and .val, and (non-atomic B3).ref is still tearable?

In this model, (non-atomic B3).ref takes the place of (non-atomic B2) in the stacking I've been discussing.  Is that what you're saying?

    class B1 { }  // ref, identity, atomic
    value-based class B2 { }  // ref, non-identity, atomic
    [ non-atomic ] value class B3 { }  // ref or val, zero is ok, both projections share atomicity

If we go with ref-default, then this is a small leap from yesterday's stacking, because "B3" and "B2" are both reference types, so if you want a tearable, non-atomic reference type, saying `non-atomic value class B3` and then just using B3 gets you that. Then:

 - B2 is like B1, minus identity
 - B3 means "uninitialized values are OK; you get two types, a zero-default and a null-default"
 - Non-atomicity is an extra property we can add to B3, to get more flattening in exchange for less integrity
 - The use cases for non-atomic B2 are served by non-atomic B3 (when .ref is the default)
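Concretely, under this stacking (speculative syntax, hypothetical class):

    non-atomic value class Cursor { long offset; long generation; }

    Cursor c;        // ref-default: nullable, flattenable, may tear under race
    Cursor.val v;    // zero-default, non-nullable, maximally flattenable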

I think this still has the properties I want; I can freely choose the reasonable subsets of { identity, has-zero, nullable, atomicity } that I want; the orthogonality of non-atomic across buckets becomes orthogonality of non-atomic with nullity, and the "B3.ref is just like B2" is shown to be the "false friend."


