On 15 Dec 2021, at 10:42, Kevin Bourrillion wrote:

The main problem I think we can't escape is that we'll still need some word that means only the eight predefined types. (For the sake of argument let's assume we can pick one and lean hard on it, whether that's "predefined",
"built-in", "elemental", "leaf type", or whatever.)

As others have said, we’ll pick a term for this. The idea of calling out a “leaf” in a data graph is compelling to me. As you say, people are going to wonder what is the foundation of the whole scheme. (No it’s not objects all the way down, at least that’s not what we are aiming for.)

(But—spoiler alert—the division between leaf/scalar/basic type and composite/class type is *less important in daily practice* than the ad hoc mental models programmers make about which types they choose to view as composite and which are indivisible. Typical example: Most programmers choose to regard `String` as a sort of nullable primitive. I’ll pick up that thread later.)

I like the term “basic type”, and (as we already discussed) I like “scalar” also, because “scalar” correctly suggests something about how it’s processed in hardware.

Here’s a point I think is also important and has not been discussed much yet: A concept like “basic type” (or “scalar type”) should include references as well as Java’s eight current primitive types. Like an `int` or other basic primitive, a reference is copied by value, processed efficiently (probably in a hardware register), and is a “leaf item” with respect to a single object layout or method type signature. Also, like `int`, a reference has its own special operators in the language and special bytecodes in the JVM. Like `int`, it has a default value `null` (instead of `0`).

The main difference of a reference from an `int` is the fact that it has a far end: You can often (not always) find other values by indirecting the reference and loading a field or calling a method or querying a super type. (Because it has a far end, it also has a nominal subtype to classify what might be at the far end. But I’m speaking here about references per se, apart from their subtypes.) Despite their “far end”, people treat some reference types, like `String`, as if they were leaves; you stop at the `String` and don’t bother thinking about its fields. Users don’t care that there’s an array somewhere on the other end, unless they are engineering the string class itself. So a reference has a far end, unlike an `int`, but, like an `int`, a reference *often* is treated like an unstructured value, in code.
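The two halves of that claim can be seen in today's Java. What follows is a minimal sketch, not anything from the Valhalla prototypes; the class and method names (`BasicTypeDemo`, `rebind`, `appendTo`) are made up for illustration. It shows that a reference, like an `int`, has a default value and is copied by value, and that only indirecting through it reaches the "far end".

```java
final class BasicTypeDemo {
    // Left uninitialized on purpose so the default values show through.
    static int defaultInt;     // defaults to 0
    static Object defaultRef;  // defaults to null

    // Reassigning the parameter changes only the callee's copy of the reference.
    static void rebind(StringBuilder sb) {
        sb = new StringBuilder("rebound");
    }

    // Indirecting through the reference reaches the shared object at the far end.
    static void appendTo(StringBuilder sb) {
        sb.append("!");
    }

    public static void main(String[] args) {
        System.out.println(defaultInt);  // prints 0
        System.out.println(defaultRef);  // prints null

        StringBuilder sb = new StringBuilder("hi");
        rebind(sb);
        System.out.println(sb);          // prints hi (caller's reference is unchanged)
        appendTo(sb);
        System.out.println(sb);          // prints hi! (far-end object was mutated)
    }
}
```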

Bottom line: There are a handful of built-in basic types. These are used to compose classes. They are the primitives and the references. When we consider a reference apart from its class (say, as `jl.Object`), it can be comfortably called a *basic type*, and then that handful of built-in basic types consists of the (basic) primitives and references.

OK, that’s enough on that. Whether “reference” is a basic type is less important than how we choose to extend (or not extend) the reach of the term “primitive”.

For historic reasons we use the word ~~fruit~~ *primitive* to mean a basic type other than a reference. Now that we have user-defined `int`-like things, we have to decide whether and how to connect the old word to the new things. Since user-defined `int`-like things are (we think) very like `int` in many ways, a term like “extended primitive” makes sense.

This is how I get to the terms “basic primitive” and “extended primitive”. Or “scalar primitive” and “extended primitive”.

As I read your messages, you would prefer to keep the term “primitive” narrow, because of the possible confusion of telling users “hey, what you think of as primitives are now the ~~heirloom~~ basic primitives.” Personally, I think users will say, to our unveiling “extended primitives”, something like this:

Well, that’s not exactly what the dictionary says primitive means, if you can make new composite ones. But I do know that Java has non-reference types and calls them “primitive”. And I also know it would be really cool to define new types that work like `int`, such as `UnsignedInt` or `HalfFloat` or the like. I get why they don’t want to build all such types into the language; in fact maybe I’d like to try my hand someday at defining my own. So, “extended primitive”. It’s on: The Java primitives are now an open-ended set just like the Java objects.

In other words, in saying “extended primitive” (and also “basic primitive”) we lean away from the dictionary definition of “primitive” and into the Java definition. That feels like a non-confusing choice to me.


Definitely, our trying to minimize their specialness is virtuous.

Yep.  We also call this “healing the rift”, sometimes.

So we have to attempt to shift users' understanding of "primitive" while at the same time injecting a new term to mean exactly what primitive used to mean. That's the old Indiana Jones switch and I don't have to tell you how
that turned out for him.

So, no, it’s not the Indy switch, at all. Users know what ~~fruit~~ primitives are in Java, and they will have no problem with adding new ~~imported exotic apples~~ extended primitives to the familiar set of primitive types. And in exchange for this infusion of wonderful new types, they will learn a new term for the old types, which is ~~pears~~ basic primitives (or scalar primitives).


It would be difficult to pull off in a world where we were just pushing some new server and the whole world gets the new model at once. But in this universe where every version of Java ever made all have to coexist, it's
looking to me like a guaranteed source of never-ending confusion.

I also think it robs us of our ability to smoothly portray the real changes of Valhalla. We want to be able to say "elements are still elements! now we
have molecules too".

There are two kinds of users w.r.t. the question of “what’s a primitive” and you can’t please both. You and I want to please different kinds. The user I want to please is one who thinks of “Java primitive” as a kind of non-nullable scalar number (or boolean or char). The user you want to please thinks of “Java primitive” as “all leaves in the Big Graph”. The latter user will be disappointed if we say “Java primitives” can be non-leaves. The former user will be delighted. The latter user sees a `String` and wants to crack out its underlying array, in a Gollum-like quest for the roots of the mountains. The former user treats a `String` as a primitive. There are more of the former than the latter; we should cater to them. It’s the former who I was channeling above, concluding with “The Java primitives are now an open-ended set just like the Java objects.”

Pedagogically that is always preferable to "elements
aren't really what you thought they were". Okay, the real comparison is a
little more nuanced than that, but I'll get to that now.

An alternative that seems to work fine, in my mental model at least, is:

- Primitive types are examples of value types, and have always been.
   - Java never supported any other kinds of value types before, so we
   didn't distinguish the terms before.
   - Everything you associate with primitive types remains true.
   - But most of those traits really come from their value-type-ness.

(I plan to make the above shifts to my model document already.)

The term “value” can be applied to composites in B3 alone, to composites in B2 alone, or to both. (Or neither.) All the basic types, including references, are values as well.

This is a big choice, where to “spend” the term “value”.

Our choice will be informed and supported by our account about what *we mean* by the term “value”.

If the word value means “a primitive thing that can be stored in a register”, then we can’t extend it. So that won’t fly.

For us the word value means something like that but adjusted, “a thing that is freely copyable and can be stored in one or more registers”.

But look how that affects B2 and B3:

B3 are values, obviously; there is no reference to confuse their free copying. (There is also no reference to help us adjoin `null` to the value set, and no reference to help us perform safe publication.)

B2 are references to… well, values as well. They might be on the heap, or they might be elsewhere; we don’t care because the freely copyable values are not also accompanied by object identity.

Both B1 and B2 *references* (per se) are, confusingly, also values, since basic types (and/or references) are freely copyable.

But a B2 reference is a value, which refers to another value. (Proof they are distinct values: One is possibly null, the other isn’t.) And like a user using `String`, the value-ness of a B2 reference can be treated as a single, simple, atomic thing, without further reference to substructure. In particular, because it’s not B1, there’s no possibility of state under the B2 reference; there’s just the value you care about.

I think, because the term value applies in so many places (including B1 references), it will be tricky to use it as a classification (like “pear”) instead of an assertion of use (like “fruit”).

But if we do use the term “value” to classify types, distinguishing them from B1 types, I think the correct choice is to apply the term to B2, as “value object” vs. “identity object”.

The value-ness of B3 (as loose aggregates) and B1 (as references) is going to add a bit of confusion. Dan did a round of naming where he used the term “pure object” as the opposite of “identity object”; now we are at “value object” vs. “identity object”, I think.


   - Now we have user-defined value types too.
   - The way we user-define a type is with a class, so a value type is
   defined by a "value class" (sorry B2).
   - The primitive types will now each get a value class.
      - These 8 classes will look as much like user-defined types as Object does.
      - They, like Object, will have a "cheat" in their source code that no one else gets to use. (Object's is that there is no implied `extends Object` or `super();`; these need no fields because the data they store is magically handled by the VM. These feel like similar cheats.)

I don’t disagree with any of the above, but I think the value classes live in B2 not in B3. The B3 types are derived from the B2 types, by “dumping out” the class fields. Note that every single B3 type (non-reference) has a unique companion B2 type (reference). The semantic difference between those types is like the semantic difference between `int` and `Integer`. Narrow but useful.
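For reference, the existing `int`/`Integer` difference that this analogy leans on can be sketched in current Java. This is only an illustration of today's behavior, not of B3/B2 semantics, and the class and member names (`IntVersusInteger`, `unboxingNullThrows`) are invented for the example.

```java
final class IntVersusInteger {
    static final int[] ints = new int[1];           // element defaults to 0
    static final Integer[] boxes = new Integer[1];  // element defaults to null

    // A null Integer has no int to hand back, so unboxing it throws.
    static boolean unboxingNullThrows() {
        Integer box = null;
        try {
            int n = box;   // auto-unboxing, equivalent to box.intValue()
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(ints[0]);                // prints 0
        System.out.println(boxes[0]);               // prints null
        System.out.println(unboxingNullThrows());   // prints true
        // Boxes compare by value through equals, regardless of how they were created.
        System.out.println(Integer.valueOf(1000).equals(1000)); // prints true
    }
}
```

The non-reference type has no room for `null`; the reference type adds exactly that one extra value, which is the narrow difference the paragraph above describes.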

Separate question: Does the declaring form for a B3/B2 type pair “look like” a B2-only declaration, but with an added mode switch? Or does it “look like” a B3-declaration, something that’s not a full-on class-that-defines-objects? We could go either way on that. Either way, one declaration will define two related types.

Suppose we have this B2-only class declaration syntax:

```
__ByValue class NamedInt { String name; int value; … }
```

Then a B2-tilted syntax for a B3/B2 pair might look like:

```
__ByValue __AlsoPrimitive class Point { double x, y; … }
```

And a B3-tilted syntax for the same pair might look like:

```
__ExtendedPrimitive Point { double x, y; … }
```

(F.D.: I think the B3-tilted syntax is less likely to succeed.)

Either way, you can draw out a B3 type from the first and a B2 type from the second.

As a sort of mental experiment, you can also imagine a “two headed” declaration syntax that would provide independent specification of the names of both types:

```
__PrimitiveType int &  /*int is B3*/
__PrimitiveBox class Integer /*int.ref=Integer is B2*/
    implements Comparable<Integer> {
  … one body with two heads …
}
```

Why do that? Well, it makes it clear that a one-headed declaration could in principle start with either the B3 or the B2 end of the stick. Also it helps us think, a little, about retrofitting the very odd legacy wrapper names.



Then mopping up the rest:

- Existing classes probably need a term like "reference classes" (in the model I'm going to circulate that doubles down on values-are-not-objects, then this wants to be "object classes", even though that feels weird at first).
   - I think the term for bucket 2 classes really ought to center on identitylessness, e.g. "noid", "noident", "idfree", or something. Anything else is getting away from the essential meaning of the bucket; plus, we want people to call bucket 1 classes "identity classes", don't we?

If we spend the good word “value” on B3, we must then find a word like “noid” for B2. But since I think “value-ness” is centered in B2 from the start, I’d rather find a one-off term for B3! (And that’s “primitive” as argued above.)

But let’s grant, for a moment, that we don’t want “value” for B2. What term characterizes B2 types? As you say, they are objects but they don’t have identity, so “noid”, etc. That’s a true description. But it’s not the main point of B2 types. The point of B2 types is not that we dislike object identity (we like it a lot in many cases!). The point of B2 types is they can be regarded as tidy bundles of field values, and/or tidy abstractions (like `String`) of simple values, without confounding state changes. After looking at this from many angles, I prefer to say that, while B2 has the *negative* characteristic of being identity-free, it has the *positive* characteristic of being *freely copyable*. The “freely” is so free that copying often happens outside of the JVM heap. In fact, a B2 type is a value.

Maybe there’s a different way of characterizing the *positive* nature of B2, but I think it comes down to, “B2 types are plain values”. Until I get an even better account for B2’s special power (one that doesn’t begin with the word “not” or “no” or “doesn’t”), I’m going to be very happy to declare B2 types as “value classes” and work with their instances as “value objects”.

So, while I see why you want to avoid the paradox of “extended primitives”, and your very correct identification of “values” in B3, I prefer to talk about B3 as primitives (primitive values) and B2 as value objects.

BTW, I agree that B3 values should not be objects; maybe we can call them instances, although instance/class/object are terms that usually appear together. Obviously both B1 and B2 contain instances/classes/objects.

BTW again, I updated my own Zoo of Field Types diagram here, and you might wish to give it a look, since it’s relevant to this discussion:

http://cr.openjdk.java.net/~jrose/values/type-kinds-venn.pdf

(that’s cr.openjdk.java.net/~jrose/values/type-kinds-venn.pdf if the URL police got the previous line)

Footnote: for a more concrete manifestation of this problem: I am sure we
cannot possibly get away with Class.isPrimitive() being true for these
classes. Right?

Yeah, `Class::isPrimitive` is a query on types, not classes. In other words, the `Class` mirror, for this call, is serving to reflect a type, for example one of `int.class` or `Integer.class`. If we apply the term “primitive” to classes, then we will need a not-so-good name, like `Class::isPrimitiveClass`. However, if we choose to make extended primitives reflect very similarly to basic primitives, then we can choose to have `Class::isPrimitive` return true *for their non-reference types*.

There is no reference type for which `Class::isPrimitive` is true. Despite my fondness for the concept of “basic types” there is no `Class::isBasicType`. There could be, in the future, though I don’t think it pulls its weight. We could also have `Class::isBasicPrimitive`. Or we could choose to break less code by keeping `Class::isPrimitive` true only for the nine existing mirrors (the eight primitive types plus `void`), and define `Class::isReferenceType` and/or `Class::isNonReferenceType` to provide the query for ~~fruit~~ basic or extended primitive types.
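As a baseline for this discussion, here is a small sketch of how `Class::isPrimitive` behaves in current Java. It uses only the existing reflection API; the class name `PrimitiveMirrors` and the helper `countPrimitive` are hypothetical names for the example, and none of the proposed methods (`isPrimitiveClass`, `isBasicType`, `isReferenceType`, and so on) appear because they do not exist today.

```java
final class PrimitiveMirrors {
    // The mirrors for which isPrimitive is true today: eight primitive types plus void.
    static final Class<?>[] NINE = {
        boolean.class, byte.class, short.class, char.class,
        int.class, long.class, float.class, double.class, void.class
    };

    // Counts how many of the given mirrors report themselves as primitive.
    static int countPrimitive(Class<?>... mirrors) {
        int count = 0;
        for (Class<?> mirror : mirrors) {
            if (mirror.isPrimitive()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countPrimitive(NINE));          // prints 9
        System.out.println(int.class.isPrimitive());       // prints true
        System.out.println(Integer.class.isPrimitive());   // prints false (reference type)
        System.out.println(String.class.isPrimitive());    // prints false
        System.out.println(int[].class.isPrimitive());     // prints false (arrays are references)
    }
}
```

Code that relies on the count staying at nine is the kind that would break if `Class::isPrimitive` started returning true for extended primitives.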
