Re: User model stacking: current status

Brian Goetz Wed, 29 Jun 2022 10:28:22 -0700

I think you have done a good job describing the pro of that model but weirdly did not list the cons of that model.

I think we described the con pretty clearly: .val is ugly, and this puts it in people's face. This point was mentioned multiple times during the discussions. But the notable thing is: no one has raised other cons. The con is syntax.

All your points here are basically a dressed-up version of this same issue: at least in some cases, some users will be grumpy that the good name goes to the thing they don't want. And this is a point we are painfully aware of, so none of this is particularly new.

And we have explored all the positions on this (Point is ref, Point is val, let the user pick two names, let the declarer choose, etc), and they all have downsides. Specifically, we explored having `ref-default` and `val-default` as declaration-site options; this "gives the user more control" (developers love knobs!). But it also imposes a significant cognitive load on all developers: people no longer know what `Point` means. Is it nullable? Is it a reference? You have to look it up, or "carry around a mental database." If anyone has the choices, then everyone has more responsibility. And given that the performance differences between Point.ref and Point.val accrue pretty much exclusively in the heap, which is to say, apply only to implementation code and not API, sticking the implementation with this burden seems reasonable.

Honestly, I think this is entirely a syntax concern; .val is ugly. Open to better ideas here, though many attempts have already been made. (If we're at the "all we have left to complain about is syntax" point, then we're winning!)




On 6/29/2022 10:38 AM, Remi Forax wrote:



------------------------------------------------------------------------

    *From: *"Brian Goetz" <[email protected]>
    *To: *"Kevin Bourrillion" <[email protected]>
    *Cc: *"daniel smith" <[email protected]>,
    "valhalla-spec-experts" <[email protected]>
    *Sent: *Thursday, June 23, 2022 9:01:24 PM
    *Subject: *Re: User model stacking: current status


    On 6/15/2022 12:41 PM, Kevin Bourrillion wrote:

        All else being equal, the idea to use "inaccessible value
        type" over "value type doesn't exist" feels very good and
        simplifying, with the main problem that the syntax can't help
        but be gross.


    A few weeks in, and this latest stacking is still feeling pretty good:

     - There are no coarse buckets any more; there are just identity
    classes and value classes.
     - Value classes have ref and val companion types with the obvious
    properties.  (Notably, refs are always atomic.)
     - For `value class C`, C as a type is an alias for `C.ref`.
     - The bucket formerly known as B2 becomes "value class, whose
    .val type is private."  This is the default for a value class.
     - The bucket formerly known as B3a is denoted by explicitly
    making the val companion public, with a public modifier on a
    "member" of the class.
     - The bucket formerly known as B3n is denoted by explicitly
    making the val companion public and non-atomic, again using
    modifiers.

    I went and updated the State of the Values document to use the new
    terminology, test-driving some new syntax. (Usual rules: syntax
    comments are premature at this time.)  I was very pleased with the
    result, because almost all the changes were small changes in
    terminology (e.g., "value companion type"), and eliminating the
    clumsy distinction between value classes and primitive classes. 
    Overall the structure remains the same, but feels more compact and
    clean.  MD source is below, for review.

    Kevin's two questions remain, but I don't think they get in the
    way of refining the model in this way:

     - Have we made the right choices around == ?
     - Are we missing a big opportunity by not spelling Complex.val
    with a bang?

I think you have done a good job describing the pro of that model but weirdly did not list the cons of that model.

I see three reasons your proposed model, let's call it the companion class model, needs improvements. It fails our motto, the companion class model and the VM model are not aligned, and the performance model is a "sigil for performance" model.



It fails our motto (code like a class, works like an int):

If I say that an Image is an array of pixels, with each pixel having three colors,

the obvious translation is not the right one:

   class Image {
     Pixel[][] pixels;
   }
   value record Pixel(Color red, Color green, Color blue) {}
   value record Color(byte value) {}

because a value class is nullable and only its companion class is not nullable, the correct code is

   class Image {
     Pixel.val[][] pixels;
   }
   value record Pixel(Color.val red, Color.val green, Color.val blue) {}
   value record Color(byte value) {}

Color and byte do not work the same way; it's not "code like a class, works like an int" but "code like a class, works like an Integer".



The VM models and the Java model are not aligned:

For the VM model, L-types and Q-types are on equal footing, neither is more important than the other, but the companion class model you propose makes the value class a first-class citizen and the companion class a second-class citizen.

We know that when the Java model and the VM model are not aligned, bugs will lie in between. Those can be mild bugs, for example you can throw a checked exception from a method not declaring that exception, or painful bugs, as in the case of generics or serialization. I think we should list all the cases where the Java model and the VM model disagree to see the kind of bugs we will ask future generations to solve.

For example, having a value class with a default constructor and a public companion class looks a lot like a deserialization bug to me; in both cases you are able to produce an instance that bypasses the constructor.

The other problem is for languages other than Java. Will those languages have to define a companion class, or is a companion class purely a javac artifact, the same way an attribute like InnerClasses is?


The proposed performance model is a "sigil for performance" model.

There is a tradeoff between the safety of the reference and the performance of the flattened value type. In the proposed model, the choice is not made by the maintainer of the class but by the user of the class. This is not fully true: the maintainer of the class can make the companion class private, choosing safety, but cannot choose performance. The performance has to be chosen by the user of the class.

This is unlike everything we know in Java. This kind of model, where the user chooses performance, is usually called "sigil for performance": the user has to add some magical keyword or sigil to get performance. A good example of such a performance model is the keyword "register" in C. You have to opt in at the use site to get performance.

Moreover, unlike in C, in Java we also have to take care of the fact that adding .val is not a backward compatible change: if a value class is used in a public method, a user cannot change it to its companion class after the fact.

We know from the errors of the past that a "sigil for performance" model is a terrible model.

Overall, I don't think it's the wrong model, but it over-rotates on the notion of reference value class; it's refreshing because in the past we had the tendency to over-rotate on the notion of flattened value class.

I really think that this model can be improved by allowing a top-level value class to be declared either as reference or as value, and the companion class to be either a value class projection or a reference class projection, so the Java model and the VM model will be more in sync.


Rémi




    # State of Valhalla
    ## Part 2: The Language Model {.subtitle}

    #### Brian Goetz {.author}
    #### June 2022 {.date}

    > _This is the second of three documents describing the current
    State of
      Valhalla.  The first is [The Road to Valhalla](01-background); the
      third is [The JVM Model](03-vm-model)._

    This document describes the directions for the Java _language_
    charted by
    Project Valhalla.  (In this document, we use "currently" to
    describe the
    language as it stands today, without value classes.)

    Valhalla started with the goal of providing user-programmable
    classes which can
    be flat and dense in memory.  Numerics are one of the motivating
    use cases;
    adding new primitive types directly to the language has a very
    high barrier.  As
    we learned from [Growing a Language][growing] there are infinitely
    many numeric
    types we might want to add to Java, but the proper way to do that
    is via
    libraries, not as a language feature.

    ## Primitive and reference types in Java today

    Java currently has eight built-in primitive types. Primitives
    represent pure
    _values_; any `int` value of "3" is equivalent to, and
    indistinguishable from,
    any other `int` value of "3".  Primitives are monolithic (their
    bits cannot be
    addressed individually) and have no canonical location, and so are
    _freely
    copyable_. With the exception of the unusual treatment of exotic
    floating point
    values such as `NaN`, the `==` operator performs a
    _substitutability test_ -- it
    asks "are these two values the same value".

    Java also has _objects_, and each object has a unique _object
    identity_. Because
    of identity, objects are not freely copyable; each object lives in
    exactly one
    place at any given time, and to access its state we have to go to
    that place.
    But we mostly don't notice this because objects are not
    manipulated or accessed
    directly, but instead through _object references_. Object
    references are also a
    kind of value -- they encode the identity of the object to which
    they refer, and
    the `==` operator on object references asks "do these two
    references refer to
    the same object."  Accordingly, object _references_ (like other
    values) can be
    freely copied, but the objects they refer to cannot.

    Primitives and objects differ in almost every conceivable way:

    | Primitives                                 | Objects                            |
    | ------------------------------------------ | ---------------------------------- |
    | No identity (pure values)                  | Identity                           |
    | `==` compares values                       | `==` compares object identity      |
    | Built-in                                   | Declared in classes                |
    | No members (fields, methods, constructors) | Members (including mutable fields) |
    | No supertypes or subtypes                  | Class and interface inheritance    |
    | Accessed directly                          | Accessed via object references     |
    | Not nullable                               | Nullable                           |
    | Default value is zero                      | Default value is null              |
    | Arrays are monomorphic                     | Arrays are covariant               |
    | May tear under race                        | Initialization safety guarantees   |
    | Have reference companions (boxes)          | Don't need reference companions    |

    The design of primitives represents various tradeoffs aimed at
    maximizing
    performance and usability of the primitive types. Reference types
    default to
    `null`, meaning "referring to no object"; primitives default to a
    usable zero
    value (which for most primitives is the additive identity). 
    Reference types
    provide initialization safety guarantees against a certain
    category of data
    races; primitives allow tearing under race for larger-than-32-bit
    values.
    We could characterize the design principles behind these tradeoffs
    as "make
    objects safer, make primitives faster."

    The following figure illustrates the current universe of Java's
    types.  The
    upper left quadrant is the built-in primitives; the rest of the
    space is
    reference types.  In the upper-right, we have the abstract
    reference types --
    abstract classes, interfaces, and `Object` (which, though
    concrete, acts more
    like an interface than a concrete class).  The built-in primitives
    have wrappers
    or boxes, which are reference types.

    <figure>
      <a href="field-type-zoo.pdf" title="Click for PDF">
        <img src="field-type-zoo-old.png" alt="Current universe of
    Java field types"/>
      </a>
    </figure>

    Valhalla aims to unify primitives and objects in that they can both be
    declared with classes, but maintains the special runtime
    characteristics
    primitives have.  But while everyone likes the flatness and
    density that
    user-definable value types promise, in some cases we want them to
    be more like
    classical objects (nullable, non-tearable), and in other cases we
    want them to
    be more like classical primitives (trading some safety for
    performance).

    ## Value classes: separating references from identity

    Many of the impediments to optimization that Valhalla seeks to
    remove center
    around _unwanted object identity_.  The primitive wrapper classes
    have identity,
    but it is a purely accidental one.  Not only is it not directly
    useful, it can
    be a source of bugs.  For example, due to caching, `Integer` can
    be accidentally
    compared correctly with `==` just often enough that people keep
    doing it.
    Similarly, [value-based classes][valuebased] such as `Optional`
    have no need for
    identity, but pay the costs of having identity anyway.
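
    The `Integer` pitfall mentioned above can be reproduced in today's
    Java. The following is a minimal sketch; the class and method names
    (`BoxedIdentityDemo`, `sameReference`) are made up for illustration.
    Boxing of `int` values in the range -128..127 is guaranteed to reuse
    cached instances; outside that range the result of `==` depends on
    the JVM configuration, so no particular output is promised for it.

    ```java
    public class BoxedIdentityDemo {
        // Compares two boxed values by reference identity, which is
        // what == means for Integer today.
        static boolean sameReference(Integer a, Integer b) {
            return a == b;
        }

        public static void main(String[] args) {
            Integer small1 = 127, small2 = 127;   // inside the guaranteed cache range
            Integer large1 = 1000, large2 = 1000; // outside the guaranteed cache range

            System.out.println(sameReference(small1, small2)); // true
            // Typically false on a default JVM, but not guaranteed either way:
            System.out.println(sameReference(large1, large2));
            System.out.println(large1.equals(large2));         // true
        }
    }
    ```

    Code that happens to test only small values sees `==` "work", which
    is exactly the accidental behavior the text describes.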

    Our first step is allowing class declarations to explicitly
    disavow identity, by
    declaring themselves as _value classes_.  The instances of a value
    class are
    called _value objects_.

    ```
    value class ArrayCursor<T> {
        T[] array;
        int offset;

        public ArrayCursor(T[] array, int offset) {
            this.array = array;
            this.offset = offset;
        }

        public boolean hasNext() {
            return offset < array.length;
        }

        public T next() {
            return array[offset];
        }

        public ArrayCursor<T> advance() {
            return new ArrayCursor<>(array, offset+1);
        }
    }
    ```

    This says that an `ArrayCursor` is a class whose instances have no
    identity --
    that instead they have _value semantics_.  As a consequence, it
    must give up the
    things that depend on identity; the class and its fields are
    implicitly final.

    But, value classes are still classes, and can have most of the
    things classes
    can have -- fields, methods, constructors, type parameters,
    superclasses (with
    some restrictions), nested classes, class literals, interfaces,
    etc.  The
    classes they can extend are restricted: `Object` or abstract
    classes with no
    instance fields, empty no-arg constructor bodies, no other
    constructors, no instance
    initializers, no synchronized methods, and whose superclasses all
    meet this same
    set of conditions.  (`Number` meets these conditions.)

    Classes in Java give rise to types; the class `ArrayCursor` gives
    rise to a type
    `ArrayCursor` (actually a parametric family of instantiations
    `ArrayCursor<T>`.)
    `ArrayCursor` is still a reference type, just one whose references
    refer to
    value objects rather than identity objects. For the types in the
    upper-right
    quadrant of the diagram (interfaces, abstract classes, and
    `Object`), references
    to these types might refer to either an identity object or a value
    object.
    (Historically, JVMs were effectively forced to represent object
    references with
    pointers; for references to value objects, JVMs now have more
    flexibility.)

    Because `ArrayCursor` is a reference type, it is nullable (because
    references
    are nullable), its default value is null, and loads and stores of
    references are
    atomic with respect to each other even in the presence of data
    races, providing
    the initialization safety we are used to with classical objects.

    Because instances of `ArrayCursor` have value semantics, `==`
    compares by state
    rather than identity.  This means that value objects, like
    primitives, are
    _freely copyable_; we can explode them into their fields and
    re-aggregate them
    into another value object, and we cannot tell the difference. 
    (Because they
    have no identity, some identity-sensitive operations, such as
    synchronization,
    are disallowed.)

    So far we've addressed the first two lines of the table of
    differences above;
    rather than identity being a property of all object instances,
    classes can
    decide whether their instances have identity or not.  By allowing
    classes that
    don't need identity to exclude it, we free the runtime to make
    better layout and
    compilation decisions -- and avoid a whole category of bugs.

    In looking at the code for `ArrayCursor`, we might mistakenly
    assume it will be
    inefficient, as each loop iteration appears to allocate a new cursor:

    ```
    for (ArrayCursor<T> c = Arrays.cursor(array);
         c.hasNext();
         c = c.advance()) {
        // use c.next();
    }
    ```

    One should generally expect here that _no_ cursors are actually
    allocated.
    Because an `ArrayCursor` is just its two fields, these fields will
    routinely get
    scalarized and hoisted into registers, and the constructor call in
    `advance`
    will typically compile down to incrementing one of these registers.
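
    For readers who want to try the cursor idiom on a current JDK, here
    is a rough analog written as an ordinary record. This is only a
    sketch under stated assumptions: `CursorDemo` and `collect` are
    hypothetical names, the cursor is constructed directly because
    `Arrays.cursor` is not an existing API, and a record is still an
    identity class today, so whether the per-iteration allocation
    disappears is up to the JIT's escape analysis rather than something
    the language promises.

    ```java
    import java.util.ArrayList;
    import java.util.List;

    public class CursorDemo {
        // Immutable cursor: advancing yields a new cursor instead of mutating.
        record ArrayCursor<T>(T[] array, int offset) {
            boolean hasNext() { return offset < array.length; }
            T next() { return array[offset]; }
            ArrayCursor<T> advance() { return new ArrayCursor<>(array, offset + 1); }
        }

        // Walks the array with the cursor and collects the elements in order.
        static <T> List<T> collect(T[] array) {
            List<T> out = new ArrayList<>();
            for (ArrayCursor<T> c = new ArrayCursor<>(array, 0);
                 c.hasNext();
                 c = c.advance()) {
                out.add(c.next());
            }
            return out;
        }

        public static void main(String[] args) {
            System.out.println(collect(new String[] {"a", "b", "c"})); // [a, b, c]
        }
    }
    ```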

    ### Migration

    The JDK (as well as other libraries) has many [value-based
    classes][valuebased]
    such as `Optional` and `LocalDateTime`.  Value-based classes
    adhere to the
    semantic restrictions of value classes, but are still identity
    classes -- even
    though they don't want to be.  Value-based classes can be migrated
    to true value
    classes simply by redeclaring them as value classes, which is both
    source- and
    binary-compatible.

    We plan to migrate many value-based classes in the JDK to value
    classes.
    Additionally, the primitive wrappers can be migrated to value
    classes as well,
    making the conversion between `int` and `Integer` cheaper; see the
    section
    "Legacy Primitives" below.  (In some cases, this may be _behaviorally_
    incompatible for code that synchronizes on the primitive
    wrappers.  [JEP
    390][jep390] has supported both compile-time and runtime warnings for
    synchronizing on primitive wrappers since Java 16.)

    <figure>
      <a href="field-type-zoo.pdf" title="Click for PDF">
        <img src="field-type-zoo-mid.png" alt="Java field types adding
    value classes"/>
      </a>
    </figure>

    ### Equality

    Earlier we said that `==` compares value objects by state rather
    than by
    identity.  More precisely, two value objects are `==` if they are
    of the same
    type, and their fields are pairwise equal, where equality
    is given by
    `==` for primitives (except `float` and `double`, which are
    compared with
    `Float::equals` and `Double::equals` to avoid anomalies), `==` for
    references to
    identity objects, and recursively with `==` for references to
    value objects.  In
    no case is a value object ever `==` to a reference to an identity
    object.
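
    The "anomalies" that motivate using `Float::equals` for `float`
    fields can be seen in current Java. In this small sketch the names
    `FloatEqualityDemo` and `sameFloat` are illustrative only; the
    behavior shown is that of the existing `Float.equals` method.

    ```java
    public class FloatEqualityDemo {
        // Compares two floats the way Float::equals does (bitwise, with
        // NaN canonicalized), rather than with the == operator.
        static boolean sameFloat(float a, float b) {
            return Float.valueOf(a).equals(Float.valueOf(b));
        }

        public static void main(String[] args) {
            float nan = Float.NaN;
            System.out.println(nan == nan);             // false
            System.out.println(sameFloat(nan, nan));    // true
            System.out.println(0.0f == -0.0f);          // true
            System.out.println(sameFloat(0.0f, -0.0f)); // false
        }
    }
    ```

    With `==` on the fields, a value object holding `NaN` would not be
    equal to a copy of itself; with `Float::equals`, copies are always
    equal, and `0.0f` and `-0.0f` stay distinguishable.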

    ### Value records

    While records have a lot in common with value classes -- they are
    final and
    their fields are final -- they are still identity classes. 
    Records embody a
    tradeoff: give up on decoupling the API from the representation,
    and in return
    get various syntactic and semantic benefits.  Value classes embody
    another
    tradeoff: give up identity, and get various semantic and
    performance benefits.
    If we are willing to give up both, we can get both sets of benefits.

    ```
    value record NameAndScore(String name, int score) { }
    ```

    Value records combine the data-carrier idiom of records with the
    improved
    scalarization and flattening benefits of value classes.

    In theory, it would be possible to apply `value` to certain enums
    as well, but
    this is not currently possible because the `java.lang.Enum` base
    class that
    enums extend does not meet the requirements for superclasses of
    value classes (it
    has fields and non-empty constructors).

    ## Unboxing values for flatness and density

    Value classes shed object identity, gaining a host of performance and
    predictability benefits in the process.  They are an ideal
    replacement for many
    of today's value-based classes, fully preserving their semantics
    (except for the
    accidental identity these classes never wanted).  But
    identity-free reference
    types are only one point on a spectrum of tradeoffs between
    abstraction and
    performance, and other desired use cases -- such as numerics --
    may want a
    different set of tradeoffs.

    Reference types are nullable, and therefore must account for null
    somehow in
    their representation, which may involve additional footprint. 
    Similarly, they
    offer the initialization safety guarantees for final fields that
    we have come to
    expect from identity objects, which may entail limits on
    flatness.  For certain
    use cases, it may be desirable to additionally give up something else
    to make
    further flatness and footprint gains -- and that something else is
    reference-ness.

    The built-in primitives are best understood as _pairs_ of types: a
    primitive
    type (e.g., `int`) and its reference companion or box (`Integer`),
    with
    conversions between the two (boxing and unboxing.)  We have both
    types because
    the two have different characteristics.  Primitives are optimized
    for efficient
    storage and access: they are not nullable, they tolerate
    uninitialized (zero)
    values, and larger primitive types (`long`, `double`) may tear
    under racy
    access.  References err on the side of safety and flexibility;
    they support
    nullity, polymorphism, and offer initialization safety (freedom
    from tearing),
    but by comparison to primitives, they pay a footprint and
    indirection cost.

    For these reasons, value classes give rise to pairs of types as
    well: a
    reference type and a _value companion type_.  We've seen the
    reference type so
    far; for a value class `Point`, the reference type is called
    `Point`.  (The full
    name for the reference type is `Point.ref`; `Point` is an alias
    for that.)  The
    value companion type is called `Point.val`, and the two types have
    the same
    conversions between them as primitives do today with their boxes. 
    (If we are
    talking explicitly about the value companion type of a value
    class, we may
    sometimes describe the corresponding reference type as its _reference
    companion_.)

    ```
    value class Point implements Serializable {
        int x;
        int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }

        Point scale(int s) {
            return new Point(s*x, s*y);
        }
    }
    ```

    The default value of the value companion type is the one for which
    all fields
    take on their default value; the default value of the reference
    type is, like
    all reference types, null.

    In our diagram, these new types show up as another entity that
    straddles the
    line between primitives and identity-free references, alongside
    the legacy
    primitives:

    ** UPDATE DIAGRAM **

    <figure>
      <a href="field-type-zoo.pdf" title="Click for PDF">
        <img src="field-type-zoo-new.png" alt="Java field types with
    extended primitives"/>
      </a>
    </figure>

    ### Member access

    Both the reference and value companion types are seen to have the
    same instance
    members.  Unlike today's primitives, value companion types can be
    used as
    receivers to access fields and invoke methods, subject to
    accessibility
    constraints:

    ```
    Point.val p = new Point(1, 2);
    assert p.x == 1;

    p = p.scale(2);
    assert p.x == 2;
    ```

    ### Polymorphism

    When we declare a class today, we set up a subtyping (is-a)
    relationship between
    the declared class and its supertypes.  When we declare a value
    class, we set up
    a subtyping relationship between the _reference type_ and the declared
    supertypes. This means that if we declare:

    ```
    value class UnsignedShort extends Number
                              implements Comparable<UnsignedShort> {
       ...
    }
    ```

    then `UnsignedShort` is a subtype of `Number` and
    `Comparable<UnsignedShort>`,
    and we can ask questions about subtyping using `instanceof` or
    pattern matching.
    What happens if we ask such a question of the value companion type?

    ```
    UnsignedShort.val us = ...
    if (us instanceof Number) { ... }
    ```

    Since subtyping is defined only on reference types, the
    `instanceof` operator
    (and corresponding type patterns) will behave as if both sides
    were lifted to
    the appropriate reference type, and we can answer the question
    that way.  (This
    may trigger fears of expensive boxing conversions, but in reality
    no actual
    allocation will happen.)

    We introduce a new relationship based on `extends` / `implements`
    clauses, which
    we'll call "extends"; we define `A extends B` as meaning `A <: B`
    when A is a
    reference type, and `A.ref <: B` when A is a value companion
    type.  The
    `instanceof` relation, reflection, and pattern matching are
    updated to use
    "extends".

    ### Arrays

    Arrays of reference types are _covariant_; this means that if `A
    <: B`, then
    `A[] <: B[]`.  This allows `Object[]` to be the "top array type",
    at least for
    arrays of references.  But arrays of primitives are currently left
    out of this
    story.   We can unify the treatment of arrays by defining array
    covariance over
    the new "extends" relationship; if A extends B, then `A[] <:
    B[]`.  For a value
    class P, `P.val[] <: P.ref[] <: Object[]`, finally making
    `Object[]` the top
    type for all arrays.
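
    The current state of affairs that this section proposes to unify can
    be checked on any JDK. The sketch below (with the made-up names
    `ArrayCovarianceDemo` and `storeRejected`) shows covariance for
    reference arrays, the run-time store check that comes with it, and
    the fact that an `int[]` is not an `Object[]` today.

    ```java
    public class ArrayCovarianceDemo {
        // Returns true if storing value into target[0] is rejected at
        // run time with ArrayStoreException.
        static boolean storeRejected(Object[] target, Object value) {
            try {
                target[0] = value;
                return false;
            } catch (ArrayStoreException e) {
                return true;
            }
        }

        public static void main(String[] args) {
            Integer[] ints = {1, 2, 3};
            Number[] numbers = ints;    // Integer <: Number, so Integer[] <: Number[]
            Object[] objects = numbers; // Object[] is the top type for reference arrays

            System.out.println(storeRejected(objects, "not a number")); // true
            System.out.println(storeRejected(objects, 42));             // false

            Object primitives = new int[] {1, 2, 3};
            System.out.println(primitives instanceof Object[]);         // false
        }
    }
    ```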

    ### Equality

    Just as with `instanceof`, we define `==` on values by appealing
    to the
    reference companion (though no actual boxing need occur). 
    Evaluating `a == b`,
    where one or both operands are of a value companion type, can be
    defined as if
    the operands are first converted to their corresponding reference
    type, and then
    comparing the results.  This means that the following will succeed:

    ```
    Point.val p = new Point(3, 4);
    Point pr = p;
    assert p == pr;
    ```

    The base implementation of `Object::equals` delegates to `==`,
    which is a
    suitable default for both reference and value classes.

    ### Serialization

    If a value class implements `Serializable`, this is also really a
    statement
    about the reference type.  Just as with other aspects described here,
    serialization of value companions can be defined by converting to the
    corresponding reference type and serializing that, and reversing
    the process at
    deserialization time.

    Serialization currently uses object identity to preserve the
    topology of an
    object graph.  This generalizes cleanly to objects without
    identity, because
    `==` on value objects treats two identical copies of a value
    object as equal.
    So any observations we make about graph topology prior to
    serialization with
    `==` are consistent with those after deserialization.

    ### Identity-sensitive operations

    Certain operations are currently defined in terms of object
    identity.  As we've
    already seen, some of these, like equality, can be sensibly
    extended to cover
    all instances.  Others, like synchronization, will become partial.
    Identity-sensitive operations include:

      - **Equality.**  We extend `==` on references to include
    references to value
        objects.  Where it currently has a meaning, the new definition
    coincides
        with that meaning.

      - **System::identityHashCode.**  The main use of
    `identityHashCode` is in the
        implementation of data structures such as `IdentityHashMap`. 
    We can extend
        `identityHashCode` in the same way we extend equality --
    deriving a hash on
        value objects from the hash of all the fields.

      - **Synchronization.**  This becomes a partial operation.  If we can
        statically detect that a synchronization will fail at runtime
    (including
        declaring a `synchronized` method in a value class), we can
    issue a
        compilation error; if not, attempts to lock on a value object
    result in
        `IllegalMonitorStateException`.  This is justifiable because it is
        intrinsically imprudent to lock on an object for which you do
    not have a
        clear understanding of its locking protocol; locking on an
    arbitrary
        `Object` or interface instance is doing exactly that.

      - **Weak, soft, and phantom references.**  Capturing an exotic
    reference to a
        value object becomes a partial operation, as these are
    intrinsically tied to
        reachability (and hence to identity).  However, we will likely
    make
        enhancements to `WeakHashMap` to support mixed identity and
    value keys.
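
    To make the role of identity in these operations concrete, the
    following sketch uses today's `IdentityHashMap`, which compares keys
    with `==` and hashes them with `System.identityHashCode`. The names
    `IdentityOpsDemo` and `entriesFor` are illustrative. Two `String`
    objects with equal contents are distinct keys here because they are
    identity objects; under the extension described above, two value
    objects with the same state would presumably count as the same key,
    since `==` and the derived hash would depend only on their fields.

    ```java
    import java.util.HashMap;
    import java.util.IdentityHashMap;
    import java.util.Map;

    public class IdentityOpsDemo {
        // Inserts two equal-but-distinct String keys and reports how
        // many entries the map ends up with.
        static int entriesFor(Map<String, Integer> map) {
            String k1 = new String("key");
            String k2 = new String("key"); // equals(k1), but a different identity
            map.put(k1, 1);
            map.put(k2, 2);
            return map.size();
        }

        public static void main(String[] args) {
            System.out.println(entriesFor(new HashMap<>()));         // 1
            System.out.println(entriesFor(new IdentityHashMap<>())); // 2
        }
    }
    ```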

    ### What about Object?

    The root class `Object` poses an unusual problem, in that every
    class must
    extend it directly or indirectly, but it is also instantiable
    (non-abstract),
    and its instances have identity -- it is common to use `new
    Object()` as a way
    to obtain a new object identity for purposes of locking.

    ## Why two types?

    It is sensible to ask: why do we need companion types at all? 
    This is analogous
    to the need for boxes in 1995: we'd made one set of tradeoffs for
    primitives,
    favoring performance (non-nullable, zero-default, tolerant of
    non-initialization, tolerant of tearing under race, unrelated to
    `Object`), and
    another for references, favoring flexibility and safety.  Most of
    the time, we
    ignored the primitive wrapper classes, but sometimes we needed to
    temporarily
    suppress one of these properties, such as when interoperating with
    code that
    expects an `Object` or the ability to express "no value".  The
    reasons we needed
    boxes in 1995 still apply today: sometimes we need the affordances of
    references, and in those cases, we appeal to the reference companion.

    Reasons we might want to use the reference companion include:

     - **Interoperation with reference types.**  Value classes can
    implement
       interfaces and extend classes (including `Object` and some
    abstract classes),
       which means some class and interface types are going to be
    polymorphic over
       both identity and value objects.  This polymorphism is
    achieved through
       object references; a reference to `Object` may be a reference
    to an identity
       object, or a reference to a value object.

     - **Nullability.**  Nullability is an affordance of object
    _references_, not
       objects themselves.  Most of the time, it makes sense that
    primitive types
       are non-nullable (as the primitives are today), but there may
    be situations
       where null is a semantically important value.  Using the
    reference companion
       when nullability is required is semantically clear, and avoids
    the need to
       invent new sentinel values for "no value."

       This need comes up when migrating existing classes; the method
    `Map::get`
       uses `null` to signal that the requested key was not present in
    the map. But,
       if the `V` parameter to `Map` is a primitive class, `null` is
    not a valid
       value.  We can capture the "`V` or null" requirement by
    changing the
       descriptor of `Map::get` to:

       ```
       public V.ref get(K key);
       ```

       where, whatever type `V` is instantiated as, `Map::get` returns
    the reference
       companion. (For a type `V` that already is a reference type,
    this is just `V`
       itself.) This captures the notion that the return type of
    `Map::get` will
       either be a reference to a `V`, or the `null` reference. (This is a
       compatible change, since both erase to the same thing.)


     - **Self-referential types.**  Some types may want to directly or
    indirectly
       refer to themselves, such as the "next" field in the node type
    of a linked
       list:

       ```
       class Node<T> {
           T theValue;
           Node<T> nextNode;
       }
       ```

       We might want to represent this as a value class, but if the
    type of
       `nextNode` were `Node.val<T>`, the layout of `Node` would be
       self-referential, since we would be trying to flatten a `Node`
    into its own
       layout.

     - **Protection from tearing.**  For a value class with a
    non-atomic value
       companion type, we may want to use the reference companion in
    cases where we
       are concerned about tearing; because loads and stores of
    references are
       atomic, `P.ref` is immune to the tearing under race that
    `P.val` might be
       subject to.

     - **Compatibility with existing boxing.**  Autoboxing is
    convenient, in that it
       lets us pass a primitive where a reference is required.  But
    boxing affects
       far more than assignment conversion; it also affects method
    overload
       selection.  The rules are designed to prefer overloads that
    require no
       conversions to those requiring boxing (or varargs)
    conversions.  Having both
       a value and reference type for every value class means that
    these rules can
       be cleanly and intuitively extended to cover value classes.
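
    Two of the points above can already be observed with `int` and
    `Integer` in today's Java. The sketch below uses only current
    syntax, not the proposed `.val`/`.ref` notation; the class
    `BoxingToday` and its methods are hypothetical names chosen for
    illustration. It shows overload selection preferring a non-boxing
    conversion, and a map lookup relying on the box to say "no value".

    ```java
    import java.util.HashMap;
    import java.util.Map;

    final class BoxingToday {
        // Overload selection tries non-boxing conversions first, so an
        // int argument widens to long rather than boxing to Integer.
        static String pick(long x)    { return "long"; }
        static String pick(Integer x) { return "Integer"; }

        // With an exact match available, each argument picks its own type.
        static String describe(int x)     { return "int"; }
        static String describe(Integer x) { return "Integer"; }

        private static final Map<String, Integer> AGES = new HashMap<>();
        static {
            AGES.put("alice", 30);
        }

        // Returns the box, so an absent key can be reported as null.
        static Integer lookup(String name) {
            return AGES.get(name);
        }

        public static void main(String[] args) {
            System.out.println(pick(42));                      // long
            System.out.println(describe(42));                  // int
            System.out.println(describe(Integer.valueOf(42))); // Integer
            System.out.println(lookup("alice"));               // 30
            System.out.println(lookup("bob"));                 // null
        }
    }
    ```

    Assigning `lookup("bob")` to an `int` would throw a
    `NullPointerException` on unboxing, which is why the return type
    needs to be the reference type whenever `null` is a possible answer.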

    ## Refining the value companion

    Value classes have several options for refining the behavior of
    the value
    companion type and how they are exposed to clients.

    ### Classes with no good default value

    For a value class `C`, the default value of `C.ref` is the same as
    any other
    reference type: `null`.  For the value companion type `C.val`, the
    default value
    is the one where all of its fields are initialized to their
    default value.

    The built-in primitives reflect the design assumption that zero is
    a reasonable
    default.  The choice to use a zero default for uninitialized
    variables was one
    of the central tradeoffs in the design of the built-in
    primitives.  It gives us
    a usable initial value (most of the time), and requires less
    storage footprint
    than a representation that supports null (`int` uses all 2^32 of
    its bit
    patterns, so a nullable `int` would have to either make some 32
    bit signed
    integers unrepresentable, or use a 33rd bit).  This was a
    reasonable tradeoff
    for the built-in primitives, and is also a reasonable tradeoff for
    many (but not
    all) other potential value classes (such as complex numbers, 2D
    points,
    half-floats, etc).

    But for other potential value classes, such as `LocalDate`, there
    _is_ no
    reasonable default.  If we choose to represent a date as the
    number of days
    since some epoch, there will invariably be bugs that stem from
    uninitialized dates; we've all been mistakenly told by computers
    that something
    will happen on or near 1 January 1970.  Even if we could choose a
    default other
    than the zero representation, an uninitialized date is still
    likely to be an
    error -- there simply is no good default date value.
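
    The "1 January 1970" failure mode is easy to reproduce with the
    existing `java.time` API. In this sketch the class `Appointment` is
    hypothetical and stands in for any type that stores a date as a
    day count and forgets to initialize it.

    ```java
    import java.time.LocalDate;

    final class Appointment {
        // Never assigned, so it keeps the zero default for long fields.
        long epochDay;

        LocalDate date() {
            return LocalDate.ofEpochDay(epochDay);
        }

        public static void main(String[] args) {
            // The zero default silently becomes a plausible-looking date.
            System.out.println(new Appointment().date()); // 1970-01-01
        }
    }
    ```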

    For this reason, value classes have the choice of encapsulating or
    exposing
    their value companion type.  If the class is willing to tolerate an
    uninitialized (zero) value, it can freely share its `.val`
    companion with the
    world; if uninitialized values are dangerous (such as for
    `LocalDate`), it can
    be encapsulated to the class or package.

    Encapsulation is accomplished using ordinary access control.  By
    default, the
    value companion is `private`, and need not be declared explicitly;
    a class that
    wishes to share its value companion can make it public:

    ```
    public value record Complex(double real, double imag) {
        public value companion Complex.val;
    }
    ```

    ### Atomicity and tearing

    For the primitive types longer than 32 bits (long and double), it
    is not
    guaranteed that reads and writes from different threads (without
    suitable
    coordination) are atomic with respect to each other. The result is
    that, if
    accessed under data race, a long or double field or array element
    can be seen to
    "tear", and a read might see the low 32 bits of one write and the
    high 32 bits
    of another.  (Declaring the containing field `volatile` is
    sufficient to restore
    atomicity, as is properly coordinating with locks or other
    concurrency control,
    or not sharing across threads in the first place.)
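
    A real torn read depends on the JVM and hardware and cannot be
    reproduced on demand, so the following sketch only models the
    outcome with bit arithmetic. The class `TornLong` and the method
    `tear` are hypothetical; they show what a reader would observe if
    it saw the high half of one write and the low half of another.

    ```java
    final class TornLong {
        // Combines the high 32 bits of one value with the low 32 bits
        // of another, as a torn read of a long might.
        static long tear(long highSource, long lowSource) {
            return (highSource & 0xFFFF_FFFF_0000_0000L)
                 | (lowSource  & 0x0000_0000_FFFF_FFFFL);
        }

        public static void main(String[] args) {
            long before = 0x0000_0000_FFFF_FFFFL; // 4294967295
            long after  = 0x0000_0001_0000_0000L; // 4294967296

            // High half of the new value, low half of the old one:
            // a value that no thread ever wrote.
            System.out.println(tear(after, before)); // 8589934591

            // The opposite mix is also a value nobody wrote.
            System.out.println(tear(before, after)); // 0
        }
    }
    ```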

    This was a pragmatic tradeoff given the hardware of the time; the
    cost of 64-bit
    atomicity on 1995 hardware would have been prohibitive, and
    problems only arise
    when the program already has data races -- and most numeric code
    deals with
    thread-local data.  Just like with the tradeoff of nulls vs zeros,
    the design of
    the built-in primitives permits tearing as part of a tradeoff between
    performance and correctness, where primitives chose "as fast as
    possible" and
    reference types chose more safety.

    Today, most JVMs give us atomic loads and stores of 64-bit
    primitives, because
    the hardware makes them cheap enough.  But value classes bring us
    back to
    1995; atomic loads and stores of larger-than-64-bit values are
    still expensive
    on many CPUs, leaving us with a choice between making operations on
    primitives slower and permitting tearing when accessed under race.

    It would not be wise for the language to select a
    one-size-fits-all policy about
    tearing; choosing "no tearing" means that types like `Complex` are
    slower than
    they need to be, even in a single-threaded program; choosing
    "tearing" means
    that classes like `Range` can be seen to not exhibit invariants
    asserted by
    their constructor.  Class authors have to choose, with full
    knowledge of their
    domain, whether their types can tolerate tearing.  The default is
    no tearing
    (safe by default); a class can opt for greater flattening at the
    cost of
    potential tearing by declaring the value companion as `non-atomic`:

    ```
    public value record Complex(double real, double imag) {
        public non-atomic value companion Complex.val;
    }
    ```

    For classes like `Complex`, all of whose bit patterns are valid,
    this is very
    much like the choice around `long` in 1995.  Other classes, which
    might have nontrivial representational invariants, likely want to
    stick to the default of atomicity.
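
    The difference between `Complex` and `Range` can be modeled with
    ordinary records in current Java. This is a sketch under stated
    assumptions: the `Range` invariant (`lo <= hi`) and the helper
    methods are hypothetical, and the "torn" values are assembled by
    hand rather than produced by an actual data race.

    ```java
    final class TearingInvariants {
        // The constructor asserts a cross-field invariant.
        record Range(int lo, int hi) {
            Range {
                if (lo > hi) {
                    throw new IllegalArgumentException("lo must not exceed hi");
                }
            }
        }

        // Every combination of field values is a valid Complex.
        record Complex(double real, double imag) { }

        // Models a torn read: lo from one write, hi from another,
        // without going through the validating constructor.
        static boolean tornRangeViolatesInvariant(Range older, Range newer) {
            int tornLo = newer.lo();
            int tornHi = older.hi();
            return tornLo > tornHi;
        }

        // The same mix for Complex yields just another valid value.
        static Complex tornComplex(Complex older, Complex newer) {
            return new Complex(newer.real(), older.imag());
        }

        public static void main(String[] args) {
            Range older = new Range(0, 1);
            Range newer = new Range(10, 11);
            System.out.println(tornRangeViolatesInvariant(older, newer)); // true

            System.out.println(
                tornComplex(new Complex(1.0, 2.0), new Complex(3.0, 4.0)));
            // Complex[real=3.0, imag=2.0]
        }
    }
    ```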

    ## Migrating legacy primitives

    As part of generalizing primitives, we want to adjust the built-in
    primitives to
    behave as consistently with value classes as possible. While we
    can't change
    the fact that `int`'s reference companion is the oddly-named
    `Integer`, we can give them
    more uniform aliases (`int.ref` is an alias for `Integer`; `int`
    is an alias for
    `Integer.val`) -- so that we can use a consistent rule for naming
    companions.
    Similarly, we can extend member access to the legacy primitives,
    and allow
    `int[]` to be a subtype of `Integer[]` (and therefore of `Object[]`.)

    We will redeclare `Integer` as a value class with a public value
    companion:

    ```
    value class Integer {
        public value companion Integer.val;

        // existing methods
    }
    ```

    where the type name `int` is an alias for `Integer.val`.  The
    primitive array
    types will be retrofitted such that arrays of primitives are
    subtypes of arrays
    of their boxes (`int[] <: Integer[]`).
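
    For contrast, the sketch below shows how arrays behave in current
    Java, before any such retrofit: reference arrays are covariant, but
    an `int[]` is not an `Object[]`. The class `ArraysToday` and its
    helper are hypothetical names used only for this illustration.

    ```java
    final class ArraysToday {
        static boolean isObjectArray(Object candidate) {
            return candidate instanceof Object[];
        }

        public static void main(String[] args) {
            // Compiles today: Integer[] is a subtype of Object[].
            Object[] widened = new Integer[] {1, 2, 3};
            System.out.println(widened.length);                  // 3

            System.out.println(isObjectArray(new Integer[0]));   // true
            System.out.println(isObjectArray(new int[0]));       // false

            // Object[] rejected = new int[0]; // does not compile today
        }
    }
    ```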

    ## Unifying primitives with classes

    Earlier, we had a chart of the differences between primitive and
    reference
    types:

    | Primitives                                 | Objects                            |
    | ------------------------------------------ | ---------------------------------- |
    | No identity (pure values)                  | Identity                           |
    | `==` compares values                       | `==` compares object identity      |
    | Built-in                                   | Declared in classes                |
    | No members (fields, methods, constructors) | Members (including mutable fields) |
    | No supertypes or subtypes                  | Class and interface inheritance    |
    | Accessed directly                          | Accessed via object references     |
    | Not nullable                               | Nullable                           |
    | Default value is zero                      | Default value is null              |
    | Arrays are monomorphic                     | Arrays are covariant               |
    | May tear under race                        | Initialization safety guarantees   |
    | Have reference companions (boxes)          | Don't need reference companions    |

    The addition of value classes addresses many of these directly. 
    Rather than
    saying "classes have identity, primitives do not", we make
    identity an optional
    characteristic of classes (and derive equality semantics from
    that.)  Rather
    than primitives being built in, we derive all types, including
    primitives, from
    classes, and endow value companion types with the members and
    supertypes
    declared with the value class.  Rather than having primitive arrays be
    monomorphic, we make all arrays covariant under the `extends`
    relation.

    The remaining differences now become differences between reference
    types and
    value types:

    | Value types                                   | Reference types                  |
    | --------------------------------------------- | -------------------------------- |
    | Accessed directly                             | Accessed via object references   |
    | Not nullable                                  | Nullable                         |
    | Default value is zero                         | Default value is null            |
    | May tear under race, if declared `non-atomic` | Initialization safety guarantees |


    ### Choosing which to use

    How would we choose between declaring an identity class or a value
    class, and
    the various options on value companions?  Here are some quick
    rules of thumb:

     - If you need mutability, subclassing, or aliasing, choose an
    identity class.
     - If uninitialized (zero) values are unacceptable, choose a value
    class with
       the value companion encapsulated.
     - If you have no cross-field invariants and are willing to
    tolerate tearing to
       enable more flattening, choose a value class with a non-atomic
    value
       companion.

    ## Summary

    Valhalla unifies, to the extent possible, primitives and
    objects.   The
    following table summarizes the transition from the current world
    to Valhalla.

    | Current World                               | Valhalla                                                  |
    | ------------------------------------------- | --------------------------------------------------------- |
    | All objects have identity                   | Some objects have identity                                |
    | Fixed, built-in set of primitives           | Open-ended set of primitives, declared via classes        |
    | Primitives don't have methods or supertypes | Primitives are classes, with methods and supertypes       |
    | Primitives have ad-hoc boxes                | Primitives have regularized reference companions          |
    | Boxes have accidental identity              | Reference companions have no identity                     |
    | Boxing and unboxing conversions             | Primitive reference and value conversions, but same rules |
    | Primitive arrays are monomorphic            | All arrays are covariant                                  |


    [valuebased]: https://docs.oracle.com/javase/8/docs/api/java/lang/doc-files/ValueBased.html
    [growing]: https://dl.acm.org/doi/abs/10.1145/1176617.1176621
    [jep390]: https://openjdk.java.net/jeps/390
