Re: We have to talk about "primitive".

John Rose Wed, 15 Dec 2021 19:15:55 -0800

On 15 Dec 2021, at 10:42, Kevin Bourrillion wrote:

> …
> The main problem I think we can't escape is that we'll still need some word that means only the eight predefined types. (For the sake of argument let's assume we can pick one and lean hard on it, whether that's "predefined", "built-in", "elemental", "leaf type", or whatever.)

As others have said, we’ll pick a term for this. The idea of calling out a “leaf” in a data graph is compelling to me. As you say, people are going to wonder what is the foundation of the whole scheme. (No, it’s not objects all the way down; at least that’s not what we are aiming for.)

(But, spoiler alert, the division between leaf/scalar/basic type and composite/class type is *less important in daily practice* than the ad hoc mental models programmers make about which types they choose to view as composite and which are indivisible. Typical example: Most programmers choose to regard `String` as a sort of nullable primitive. I’ll pick up that thread later.)

I like the term “basic type”, and (as we already discussed) I like “scalar” also, because “scalar” correctly suggests something about how it’s processed in hardware.

Here’s a point I think is also important and has not been discussed much yet: A concept like “basic type” (or “scalar type”) should include references as well as Java’s eight current primitive types. Like an `int` or other basic primitive, a reference is copied by value, processed efficiently (probably in a hardware register), and is a “leaf item” with respect to a single object layout or method type signature. Also, like `int`, a reference has its own special operators in the language and special bytecodes in the JVM. Like `int`, it has a default value: `null` (instead of `0`).
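To make the comparison concrete, here is a small sketch in today's Java (no Valhalla features assumed; the class, field, and method names are illustrative only). It shows that an `int` field and a reference field both start at their default value, `0` and `null`, and that assigning a reference copies the reference value itself, so both copies reach the same object at the far end.

```java
public class ReferenceCopyDemo {
    // Fields are zero-initialized: 0 for int, null for any reference type.
    static int defaultInt;
    static Object defaultRef;

    // A hypothetical mutable holder, used only to make the far end visible.
    static final class Box { int contents; }

    // Copying an int copies the number; the original is untouched.
    static int[] copyInt() {
        int a = 42;
        int b = a;
        b++;
        return new int[] { a, b };
    }

    // Copying a reference copies the reference value, not the object:
    // both copies reach the same Box, so a write through one copy is
    // visible through the other.
    static int copyReference() {
        Box r1 = new Box();
        Box r2 = r1;
        r2.contents = 7;
        return r1.contents;
    }

    public static void main(String[] args) {
        System.out.println(defaultInt + " " + defaultRef); // 0 null
        int[] ints = copyInt();
        System.out.println(ints[0] + " " + ints[1]);       // 42 43
        System.out.println(copyReference());               // 7
    }
}
```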

The main difference between a reference and an `int` is that a reference has a far end: You can often (not always) find other values by indirecting the reference and loading a field or calling a method or querying a supertype. (Because it has a far end, it also has a nominal subtype to classify what might be at the far end. But I’m speaking here about references per se, apart from their subtypes.) Despite their “far end”, people treat some reference types, like `String`, as if they were leaves; you stop at the `String` and don’t bother thinking about its fields. Users don’t care that there’s an array somewhere on the other end, unless they are engineering the string class itself. So a reference has a far end, unlike an `int`, but, like an `int`, a reference *often* is treated like an unstructured value, in code.
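A minimal sketch of the `String` case, using only standard `java.lang.String` behavior (the helper names are hypothetical): two distinct `String` objects have different references, yet ordinary code compares them by contents and never looks at the array behind them.

```java
public class StringAsLeafDemo {
    // Each call creates a distinct String object with the same characters.
    static String first() { return new String("valhalla"); }

    // Compares the references themselves (where the far ends are).
    static boolean sameReference(String x, String y) { return x == y; }

    // Compares contents, treating each String as a single leaf value.
    static boolean sameValue(String x, String y) { return x.equals(y); }

    public static void main(String[] args) {
        String a = first();
        String b = first();
        System.out.println(sameReference(a, b)); // false: two far ends
        System.out.println(sameValue(a, b));     // true: same value
        System.out.println(a.length());          // 8
    }
}
```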

Bottom line: There are a handful of built-in basic types. These are used to compose classes. They are the primitives and the references. When we consider a reference apart from its class (say, as `jl.Object`), it can be comfortably called a *basic type*, and then that handful of built-in basic types consists of the (basic) primitives and references.
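One way to see the “handful of basic types” in today's Java is to reflect over a class and sort each field into primitive or reference. This is a sketch using standard `java.lang.reflect` calls; the `Reading` class is a hypothetical example. Since `getDeclaredFields` returns fields in no particular order, the result is collected into a sorted map.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Map;
import java.util.TreeMap;

public class BasicTypeFields {
    // Hypothetical class: its layout is composed entirely of basic types.
    static class Reading {
        String label;   // reference
        int count;      // primitive
        double value;   // primitive
        Object extra;   // reference
    }

    // Classify each declared instance field as "primitive" or "reference".
    static Map<String, String> classify(Class<?> c) {
        Map<String, String> kinds = new TreeMap<>(); // sorted by field name
        for (Field f : c.getDeclaredFields()) {
            if (Modifier.isStatic(f.getModifiers()) || f.isSynthetic()) {
                continue; // only instance fields contribute to the layout
            }
            kinds.put(f.getName(), f.getType().isPrimitive() ? "primitive" : "reference");
        }
        return kinds;
    }

    public static void main(String[] args) {
        System.out.println(classify(Reading.class));
        // {count=primitive, extra=reference, label=reference, value=primitive}
    }
}
```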

OK, that’s enough on that. Whether “reference” is a basic type is less important than how we choose to extend (or not extend) the reach of the term “primitive”.

For historic reasons we use the word ~~fruit~~ *primitive* to mean a basic type other than a reference. Now that we have user-defined `int`-like things, we have to decide whether and how to connect the old word to the new things. Since user-defined `int`-like things are (we think) very like `int` in many ways, a term like “extended primitive” makes sense.

This is how I get to the terms “basic primitive” and “extended primitive”. Or “scalar primitive” and “extended primitive”.

As I read your messages, you would prefer to keep the term “primitive” narrow, because of the possible confusion of telling users “hey, what you think of as primitives are now the ~~heirloom~~ basic primitives.” Personally, I think that, when we unveil “extended primitives”, users will say something like this:

Well, that’s not exactly what the dictionary says primitive means, if you can make new composite ones. But I do know that Java has non-reference types and calls them “primitive”. And I also know it would be really cool to define new types that work like `int`, such as `UnsignedInt` or `HalfFloat` or the like. I get why they don’t want to build all such types into the language; in fact maybe I’d like to try my hand someday at defining my own. So, “extended primitive”. It’s on: The Java primitives are now an open-ended set just like the Java objects.

In other words, in saying “extended primitive” (and also “basic primitive”) we lean away from the dictionary definition of “primitive” and into the Java definition. That feels like a non-confusing choice to me.


> Definitely, our trying to minimize their specialness is virtuous.


Yep.  We also call this “healing the rift”, sometimes.

> …
> So we have to attempt to shift users' understanding of "primitive" while at the same time injecting a new term to mean exactly what primitive used to mean. That's the old Indiana Jones switch and I don't have to tell you how that turned out for him.

So, no, it’s not the Indy switch, at all. Users know what ~~fruit~~ primitives are in Java, and they will have no problem with adding new ~~imported exotic apples~~ extended primitives to the familiar set of primitive types. And in exchange for this infusion of wonderful new types, they will learn a new term for the old types, which is ~~pears~~ basic primitives (or scalar primitives).

> It would be difficult to pull off in a world where we were just pushing some new server and the whole world gets the new model at once. But in this universe where every version of Java ever made all have to coexist, it's looking to me like a guaranteed source of never-ending confusion. I also think it robs us of our ability to smoothly portray the real changes of Valhalla. We want to be able to say "elements are still elements! now we have molecules too".

There are two kinds of users w.r.t. the question of “what’s a primitive”, and you can’t please both. You and I want to please different kinds. The user I want to please is one who thinks of “Java primitive” as a kind of non-nullable scalar number (or boolean or char). The user you want to please thinks of “Java primitive” as “all leaves in the Big Graph”. The latter user will be disappointed if we say “Java primitives” can be non-leaves. The former user will be delighted. The latter user sees a `String` and wants to crack out its underlying array, in a Gollum-like quest for the roots of the mountains. The former user treats a `String` as a primitive. There are more of the former than the latter; we should cater to them. It’s the former who I was channeling above, concluding with “The Java primitives are now an open-ended set just like the Java objects.”

> Pedagogically that is always preferable to "elements aren't really what you thought they were". Okay, the real comparison is a little more nuanced than that, but I'll get to that now.
>
> An alternative that seems to work fine, in my mental model at least, is:
>
> - Primitive types are examples of value types, and have always been.
>    - Java never supported any other kinds of value types before, so we didn't distinguish the terms before.
>    - Everything you associate with primitive types remains true.
>    - But most of those traits really come from their value-type-ness.
>
> (I plan to make the above shifts to my model document already.)

The term “value” can be applied to composites in B3 alone, to composites in B2 alone, or to both. (Or neither.) All the basic types, including references, are values as well.


This is a big choice, where to “spend” the term “value”.

Our choice will be informed and supported by our account of what *we mean* by the term “value”.

If the word value means “a primitive thing that can be stored in a register”, then we can’t extend it. So that won’t fly.

For us the word value means something like that but adjusted: “a thing that is freely copyable and can be stored in one or more registers”.


But look how that affects B2 and B3:

B3 are values, obviously; there is no reference to confuse their free copying. (There is also no reference to help us adjoin `null` to the value set, and no reference to help us perform safe publication.)

B2 are references to… well, values as well. They might be on the heap, or they might be elsewhere; we don’t care because the freely copyable values are not also accompanied by object identity.

Both B1 and B2 *references* (per se) are, confusingly, also values, since basic types (and/or references) are freely copyable.

But a B2 reference is a value, which refers to another value. (Proof they are distinct values: One is possibly null, the other isn’t.) And, just as users do with a `String`, the value-ness of a B2 reference can be treated as a single, simple, atomic thing, without further reference to substructure. In particular, because it’s not B1, there’s no possibility of state under the B2 reference; there’s just the value you care about.

I think, because the term value applies in so many places (including B1 references), it will be tricky to use it as a classification (like “pear”) instead of an assertion of use (like “fruit”).

But given the choice of which types to classify with the term “value”, distinguishing them from B1 types, I think the correct choice is to apply the term to B2, as “value object” vs. “identity object”.

The value-ness of B3 (as loose aggregates) and B1 (as references) is going to add a bit of confusion. Dan did a round of naming where he used the term “pure object” as the opposite of “identity object”; now we are at “value object” vs. “identity object”, I think.

>    - Now we have user-defined value types too.
>    - The way we user-define a type is with a class, so a value type is defined by a "value class" (sorry B2).
>    - The primitive types will now each get a value class.
>       - These 8 classes will look as much like user-defined types as Object does.
>       - They, like Object, will have a "cheat" in their source code that no one else gets to use. (Object's is that there is no implied `extends Object` or `super();`; these need no fields because the data they store is magically handled by the VM. These feel like similar cheats.)

I don’t disagree with any of the above, but I think the value classes live in B2, not in B3. The B3 types are derived from the B2 types by “dumping out” the class fields. Note that every single B3 type (non-reference) has a unique companion B2 type (reference). The semantic difference between those types is like the semantic difference between `int` and `Integer`. Narrow but useful.
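The narrow difference between `int` and `Integer` can be shown with standard auto-boxing in today's Java (method names here are hypothetical): the reference type admits `null`, and converting a `null` `Integer` back to `int` fails with a `NullPointerException`.

```java
public class IntVsInteger {
    // The reference type Integer can hold null; int cannot.
    static Integer maybeMissing(boolean present) {
        return present ? Integer.valueOf(5) : null;
    }

    // Auto-unboxing a null Integer throws: the int view has no null.
    static boolean unboxingNullThrows() {
        try {
            int x = maybeMissing(false); // throws NullPointerException here
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        int i = maybeMissing(true);               // auto-unboxing
        System.out.println(i);                    // 5
        System.out.println(maybeMissing(false));  // null
        System.out.println(unboxingNullThrows()); // true
    }
}
```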

Separate question: Does the declaring form for a B3/B2 type pair “look like” a B2-only declaration, but with an added mode switch? Or does it “look like” a B3 declaration, something that’s not a full-on class-that-defines-objects? We could go either way on that. Either way, one declaration will define two related types.


Suppose we have this B2-only class declaration syntax:

```
__ByValue class NamedInt { String name; int value; … }
```

Then a B2-tilted syntax for a B3/B2 pair might look like:

```
__ByValue __AlsoPrimitive class Point { double x, y; … }
```

And a B3-tilted syntax for the same pair might look like:

```
__ExtendedPrimitive Point { double x, y; … }
```

(F.D.: I think the B3-tilted syntax is less likely to succeed.)

Either way, you can draw out a B3 type from the first and a B2 type from the second.

As a sort of mental experiment, you can also imagine a “two-headed” declaration syntax that would provide independent specification of the names of both types:


```
__PrimitiveType int &  /*int is B3*/
__PrimitiveBox class Integer /*int.ref=Integer is B2*/
    implements Comparable<Integer> {
  … one body with two heads …
}
```

Why do that? Well, it makes it clear that a one-headed declaration could in principle start with either the B3 or the B2 end of the stick. Also it helps us think, a little, about retrofitting the very odd legacy wrapper names.
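For reference, today's JDK already links each basic primitive mirror to its legacy wrapper through the wrappers' `TYPE` fields (for example, `Integer.TYPE == int.class`). The sketch below (class and method names hypothetical) builds that pairing and flags the two wrappers, `Integer` and `Character`, whose names are not simply the capitalized primitive name.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class LegacyWrapperNames {
    // Pair each basic primitive mirror with its legacy wrapper class,
    // using the standard TYPE field each wrapper exposes.
    static Map<Class<?>, Class<?>> wrapperOf() {
        Map<Class<?>, Class<?>> m = new LinkedHashMap<>();
        m.put(Boolean.TYPE, Boolean.class);
        m.put(Character.TYPE, Character.class);
        m.put(Byte.TYPE, Byte.class);
        m.put(Short.TYPE, Short.class);
        m.put(Integer.TYPE, Integer.class);
        m.put(Long.TYPE, Long.class);
        m.put(Float.TYPE, Float.class);
        m.put(Double.TYPE, Double.class);
        return m;
    }

    public static void main(String[] args) {
        for (Map.Entry<Class<?>, Class<?>> e : wrapperOf().entrySet()) {
            String prim = e.getKey().getName();          // e.g. "int"
            String box = e.getValue().getSimpleName();   // e.g. "Integer"
            // Flag the wrappers whose name is not just the capitalized primitive name.
            String note = box.equalsIgnoreCase(prim) ? "" : "  <- odd name";
            System.out.println(prim + " -> " + box + note);
        }
    }
}
```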

> Then mopping up the rest:
>
> - Existing classes probably need a term like "reference classes" (in the model I'm going to circulate that doubles down on values-are-not-objects, then this wants to be "object classes", even though that feels weird at first).
>    - I think the term for bucket 2 classes really ought to center on identitylessness, e.g. "noid", "noident", "idfree", or something. Anything else is getting away from the essential meaning of the bucket; plus, we want people to call bucket 1 classes "identity classes", don't we?

If we spend the good word “value” on B3, we must then find a word like “noid” for B2. But since I think “value-ness” is centered in B2 from the start, I’d rather find a one-off term for B3! (And that’s “primitive”, as argued above.)

But let’s grant, for a moment, that we don’t want “value” for B2. What term characterizes B2 types? As you say, they are objects, but they don’t have identity, so “noid”, etc. That’s a true description. But it’s not the main point of B2 types. The point of B2 types is not that we dislike object identity (we like it a lot in many cases!). The point of B2 types is that they can be regarded as tidy bundles of field values, and/or tidy abstractions (like `String`) of simple values, without confounding state changes. After looking at this from many angles, I prefer to say that, while B2 has the *negative* characteristic of being identity-free, it has the *positive* characteristic of being *freely copyable*. The “freely” is so free that copying often happens outside of the JVM heap. In fact, a B2 type is a value.
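Today's closest approximation to a “tidy bundle of field values” is probably a record. The sketch below is an illustration under that assumption, not Valhalla code: it uses an ordinary Java 16+ record, which still carries object identity. Equality of state behaves as one would want for a value, but `==` still observes identity, which is exactly the characteristic a B2 type gives up.

```java
public class RecordApproximation {
    // An ordinary record: immutable components, state-based equals(),
    // but each instance still has its own object identity.
    record Point(double x, double y) { }

    // Compares the field values (the "tidy bundle").
    static boolean sameState(Point a, Point b) { return a.equals(b); }

    // Compares object identity, the thing a B2 type does not have.
    static boolean sameIdentity(Point a, Point b) { return a == b; }

    public static void main(String[] args) {
        Point p = new Point(1.0, 2.0);
        Point q = new Point(1.0, 2.0);
        System.out.println(sameState(p, q));    // true: same field values
        System.out.println(sameIdentity(p, q)); // false: identity still observable
    }
}
```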

Maybe there’s a different way of characterizing the *positive* nature of B2, but I think it comes down to “B2 types are plain values”. Until I get an even better account of B2’s special power (one that doesn’t begin with the word “not” or “no” or “doesn’t”), I’m going to be very happy to declare B2 types as “value classes” and work with their instances as “value objects”.

So, while I see why you want to avoid the paradox of “extended primitives”, and your very correct identification of “values” in B3, I prefer to talk about B3 as primitives (primitive values) and B2 as value objects.

BTW, I agree that B3 values should not be objects; maybe we can call them instances, although instance/class/object are terms that usually appear together. Obviously both B1 and B2 contain instances/classes/objects.

BTW again, I updated my own Zoo of Field Types diagram here, and you might wish to give it a look, since it’s relevant to this discussion:


http://cr.openjdk.java.net/~jrose/values/type-kinds-venn.pdf

(that’s cr.openjdk.java.net/~jrose/values/type-kinds-venn.pdf if the URL police got the previous line)

> Footnote: for a more concrete manifestation of this problem: I am sure we cannot possibly get away with Class.isPrimitive() being true for these classes. Right?

Yeah, `Class::isPrimitive` is a query on types, not classes. In other words, the `Class` mirror, for this call, is serving to reflect a type, for example one of `int.class` or `Integer.class`. If we apply the term “primitive” to classes, then we will need a not-so-good name, like `Class::isPrimitiveClass`. However, if we choose to make extended primitives reflect very similarly to basic primitives, then we can choose to have `Class::isPrimitive` return true *for their non-reference types*.

There is no reference type for which `Class::isPrimitive` is true. Despite my fondness for the concept of “basic types”, there is no `Class::isBasicType`. There could be, in the future, though I don’t think it pulls its weight. We could also have `Class::isBasicPrimitive`. Or we could choose to break less code by keeping `Class::isPrimitive` true only for the nine mirrors it covers today (the eight basic primitives plus `void`), and define `Class::isReferenceType` and/or `Class::isNonReferenceType` to provide the query for ~~fruit~~ basic or extended primitive types.
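A sketch of how `Class::isPrimitive` behaves today, for comparison with the options above. The nine mirrors are the eight basic primitives plus `void`. The `isReferenceType` helper is hypothetical (it is not a method on `java.lang.Class`); it only shows that, under today's rules, “not primitive” and “reference type” coincide.

```java
import java.util.List;

public class IsPrimitiveToday {
    // The nine Class mirrors for which isPrimitive() returns true today.
    static final List<Class<?>> NINE = List.of(
        boolean.class, char.class, byte.class, short.class,
        int.class, long.class, float.class, double.class, void.class);

    // Hypothetical helper, NOT a real java.lang.Class method: today every
    // non-primitive mirror (class, interface, array) reflects a reference type.
    static boolean isReferenceType(Class<?> c) {
        return !c.isPrimitive();
    }

    static boolean allPrimitive() {
        return NINE.stream().allMatch(Class::isPrimitive);
    }

    public static void main(String[] args) {
        System.out.println(allPrimitive());                // true
        System.out.println(Integer.class.isPrimitive());   // false
        System.out.println(int[].class.isPrimitive());     // false
        System.out.println(isReferenceType(String.class)); // true
    }
}
```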
