minimal value types proposal

John Rose Fri, 26 Aug 2016 22:33:07 -0700

Brian and I have been working on finding a minimal subset
of value-type functionality that will allow current experiments
to move forward.  Here is what we have come up with.


Please let us know what you think.

— John

Link:  http://cr.openjdk.java.net/~jrose/values/shady-values.html

#### MARKDOWN SOURCE ####

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" 
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"><!--*-markdown-*-->
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>Minimal Value Types (Shady Edition)</title>
<style type="text/css">
  body          { font-family: Times New Roman; font-size: 16px;
                  line-height: 125%; width: 36em; margin: 2em; }
  code, pre     { font-family: Courier New; }
  blockquote    { font-size: 14px; line-height: 130%; }
  pre           { font-size: 14px; line-height: 120%; }
  h1            { font-size: 24px; }
  h3            { font: inherit; font-weight:bold; }
  pre           { padding: 1ex; background: #eee; width: 40em; }
  h3            { margin: 1.5em 0 0; }
  ol, ul, pre   { margin: 1em 1ex; }
  ul ul, ol ol  { margin: 0; }
  blockquote    { margin: 1em 4ex; }
  p             { margin: .5em 0 .5em 0; }
  h4            { margin: 0; }
  a             { text-decoration: none; }
  div.smaller   { font-size: 85%; }
  span.smaller  { font-size: 95%; }
</style>
</head>
<body>

<!-- This document is in Markdown format:
     http://daringfireball.net/projects/markdown/
     $ pandoc --smart shady-values.md -o shady-values.html
     H/T practicaltypography.com
 -->

# Minimal Value Types

#### August 2016: Shady Edition
<!-- recent changes in this edition:
  (none to speak of yet)
-->

#### John Rose, Brian Goetz

_"What we do in the shadows reveals our true values."_

## Background

In the two years since the [first public proposal [values]][values]
there have been [vigorous discussions [valhalla-dev]][valhalla-dev] of
how to get there, and what specific changes to make to the JVM and its
classfile format, in order to unify primitives, references, and values
in a common platform that supports efficient generic, object-oriented
programming.

Much of the discussion has [concentrated on generic specialization
[goetz-jvmls15]][goetz-jvmls15], as a way of implementing full
parametric polymorphism in Java and the JVM.  This concentration has
been intentional and fruitful, since it exposes all the ways in which
primitives fail to align sufficiently with references, and forces us
to expand the bytecode model.  After solving for `List<int>`, it will
be simpler to manage `List<Complex<int>>`.

Other discussions have concentrated on [details of value semantics
[valsem-0411]][valsem-0411] and specific tactics [implementing
[simms-vbcs]][simms-vbcs] new bytecodes which work with values.  A few
experiments have employed value-like APIs to perform useful tasks like
[vectorizing loops [graves-jvmls16]][graves-jvmls16].

Most recently, at the JVM Language Summit (2016), and at the Valhalla
EG meeting that week, we got repeated calls for an early-access
version of value types that would be suitable for vector, Panama, and
GPU experiments.  This document outlines a subset of experimental
value type support in the JVM (and to a smaller degree, language and
libraries), that would be suitable for early adopters.

Looking back, it is reasonable to estimate that there have been many
thousands of engineer-hours devoted to mapping out this complex
future.  Now is the time to take this vision and choose a first
version, a sort of "hello world" system for value types.

The present document proposes a minimized but viable subset of
value-type functionality with the following goals:

  * simple to implement in the HotSpot JVM (the reference implementation)
  * does not constrain future developments for the Java language or VM
  * usable by power-users for early experimentation and prototyping
  * minimum changes to the JVM classfile format
  * use of such changes can be firewalled in experimental areas only

Our non-goals are complementary to the goals:

  * does not support all known-good language constructs for value types
  * does not commit to a Java language syntax or even a bytecode design
  * does not enable Java programmers to write value types in Java code
  * does not propose a final bytecode format
  * will not be deployed for general use (not initially, and likely never)

In other words, before releasing our values to the full light of day,
we will prototype with them in the shady area between armchair
speculation and public specification.  Such a prototype, though
limited, is far from useless.  It will allow us to experiment with
various approaches to the design and implementation of value types.
We can also discard approaches as needed!  We can also begin to make
better estimates of performance and usability, as power-users (most of
whom will work closely with the designers and implementors) exercise
various early use cases.

## Features

The specific features of our minimum (but viable) support for value
types can be summarized as follows:

  * A few **value-capable classes** (`Int128`, etc.) from which the VM
    may derive value types.
  * **Descriptor** syntax ("`Q`-types") for describing new value types
    in class-files.
  * Enhanced **constants** in the constant pool, to interoperate with
    these descriptors.
  * **Bytecode instructions** (`vload`, etc.) for moving value types
    between JVM locals and stack.
  * **Reflection** for value types (similar to `int.class`).
  * **Boxing** and unboxing, to represent values (like primitives) in
    terms of Java's universal `Object` type.
  * Method handle **factories** to provide access to value operations
    (member access, etc.)

Standard Java source code, including generic classes and methods, will
be able to refer to values only in their boxed form.  However, both
method handles and specially-generated bytecodes will be able to work
with values in their native, unboxed form.

This work relates to the JVM, not to the language.  Therefore
non-goals include:

  * Syntax for defining or using value types directly from Java code.
  * Specialized generics in Java code which can store or process
    unboxed values (or primitives).
  * Library value types or evolved versions of value-based classes
    like `java.util.Optional`.
  * Access to value types from arbitrary modules.  (Typically,
    value-capable classes will not be exported.)

Given the slogan _"codes like a class, works like an int,"_ which
captures the overall vision for value types, this minimal set will
deliver something more like _"works like an int, if you can catch
one"_.

By limiting the scope of this work, we believe useful experimentation
can be enabled in a production JVM much earlier than if the entire
value-type stack were delivered all at once.

The rest of this document goes into the proposed features in detail.

## Value-capable classes

A class may be marked with a special annotation `@DeriveValueType` (or
perhaps an attribute).  A class with this marking is called
_value-capable_, meaning it has been endowed with a value type, beyond
the class type itself.

> <span class="smaller">
(The details of how this marking is restricted are TBD, but will be
similar to the restrictions on internal annotations like `@Contended`
or `@PolymorphicSignature`.)

Example:

    @jvm.internal.value.DeriveValueType
    public final class DoubleComplex {
      public final double re, im;
      private DoubleComplex(double re, double im) {
        this.re = re; this.im = im;
      }
      ... // toString/equals/hashCode, accessors, math functions, etc.
    }

The semantics of the marked class will be the same as if the annotation
were not present.  But, the annotation will enable the JVM, _in addition_,
to consider the marked class as a source for an _associated value type_.

As with the full value type proposal, the value-capable class may
define fields and methods and implement interfaces.  The fields and
methods will be available directly on both boxed and unboxed values.

> <span class="smaller">
(Until there are bytecodes for directly accessing members of unboxed
values, method handles will be available for the purpose, as they are for
members of regular objects.  See below.)

The super-class of a value-capable class must be `Object`, and even
that should be omitted in the source code of the class.

A class marked as value-capable must qualify as [value-based][],
because its instances will serve as boxes for values of the associated
value type.  In particular, the class, and all its fields, must be
marked `final`, and constructors must be private.
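
The structural part of these constraints can be sketched with core
reflection.  The helper below is hypothetical (it is not part of this
proposal) and checks only what is stated above: a final class whose
super-class is `Object`, final instance fields, and private
constructors.  The class `DoubleComplexShape` is a stand-in for the
earlier example, without the annotation.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

// Hypothetical helper, not part of the proposal: checks the structural
// constraints described above for a candidate value-capable class.
final class ValueCapableShape {
    private ValueCapableShape() {}

    static boolean hasValueCapableShape(Class<?> c) {
        // The class itself must be final and extend Object directly.
        if (!Modifier.isFinal(c.getModifiers())) return false;
        if (c.getSuperclass() != Object.class) return false;
        // Every instance field must be final.
        for (Field f : c.getDeclaredFields()) {
            int m = f.getModifiers();
            if (!Modifier.isStatic(m) && !Modifier.isFinal(m)) return false;
        }
        // Every constructor must be private.
        for (Constructor<?> k : c.getDeclaredConstructors()) {
            if (!Modifier.isPrivate(k.getModifiers())) return false;
        }
        return true;
    }
}

// Stand-in for the DoubleComplex example (assumed name, no annotation).
final class DoubleComplexShape {
    public final double re, im;
    private DoubleComplexShape(double re, double im) {
        this.re = re; this.im = im;
    }
    static DoubleComplexShape of(double re, double im) {
        return new DoubleComplexShape(re, im);
    }
}
```

A check like this says nothing about the behavioral side of being
value-based (no identity-sensitive operations); it covers only the
declared shape of the class.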

A class marked as value-capable must not use any of the methods
provided on `Object` on any instance of itself, since that would
produce indeterminate results on a boxed version of a value.  The
`equals`, `hashCode`, and `toString` methods must be replaced
completely, with no call via `super` to `Object` methods.

As an exception, the `getClass` method may be used freely; it behaves
as if it were replaced in the value-capable class by a
constant-returning method.

The other object methods (`clone`, `finalize`, `wait`, `notify`,
and `notifyAll`) may not be used, and will not be visible on the
forthcoming value type derived from the value-capable class.

Here is a larger example for a "super-long":

    final class Int128 implements Comparable<Int128> {
      private final long x0, x1;
      private Int128(long x0, long x1) { ... }
      public static Int128 zero() { ... }
      public static Int128 from(int x) { ... }
      public static Int128 from(long x) { ... }
      public static Int128 from(long hi, long lo) { ... }
      public static long high(Int128 i) { ... }
      public static long low(Int128 i) { ... }
      // possibly array input/output methods
      public static boolean equals(Int128 a, Int128 b) { ... }
      public static int hashCode(Int128 a) { ... }
      public static String toString(Int128 a) { ... }
      public static Int128 plus(Int128 a, Int128 b) { ... }
      public static Int128 minus(Int128 a, Int128 b) { ... }
      // more arithmetic ops, bit-shift ops
      public int compareTo(Int128 i) { ... }
      public boolean equals(Int128 i) { ... }
      public int hashCode() { ... }
      public boolean equals(Object x) { ... }
      public String toString() { ... }
    }

[Similar types [Long2.java]][Long2.java] have been used in a loop
vectorization prototype.  This example has been defined in a prototype
version of the `java.lang` package.  But value-capable types defined
as part of this minimal proposal will _not_ appear in any standard
API.  Instead, at first, they will be segregated somewhere hard to
reach, in a package like `jdk.experimental.value`.

> <span class="smaller">
Initial value-capable classes are likely to be extensions of numeric
types like `long`.  As such they should have a standard and consistent
set of arithmetic and bitwise operations.  There is no such set
codified at present, and creating one is beyond the scope of the
minimal set.  Eventually we will need to create a set of interfaces
that captures the common operation structure between numeric
primitives and numeric values.

### Scoping of these features

A crucial part of being able to provide an experimental release is the
ability to mark features as experimental and subject to change.  While
the ideas expressed in this document are reasonably well baked, it is
entirely foreseeable that they might change between an experimental
release and a full Valhalla release.

Within a single version of the JVM, the experimental features are
further restricted to classes loaded into the JVM's initial module
layer, or into a module selected by a command line option, and are
otherwise ignored.  These modules are called _value-capable modules_.

In addition, the class-file format features may be enabled only
in class files of a given major and minor version, such as 53.1.
In that case, the JVM class loader would ensure that classes of
that version were loaded only into value-capable modules, and
then consult only the version number when validating and loading
the experimental extended features proposed here.  It is possible
that some minor versions will be used _only_ for experimental
features, and _never_ appear in production specifications.
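
As a rough illustration of the version gate, the sketch below reads
the version from the first eight bytes of a class file (magic, then
minor version, then major version) and compares it with an
experimental version such as 53.1.  The class and method names are
hypothetical; only the class-file header layout is standard.

```java
// Hypothetical sketch: read the class-file version from the header and
// compare it against an experimental version such as 53.1.
final class ClassFileVersion {
    final int major;
    final int minor;

    private ClassFileVersion(int major, int minor) {
        this.major = major;
        this.minor = minor;
    }

    // Header layout: u4 magic (0xCAFEBABE), u2 minor_version, u2 major_version.
    static ClassFileVersion read(byte[] classFile) {
        if (classFile.length < 8) {
            throw new IllegalArgumentException("truncated class file");
        }
        int magic = ((classFile[0] & 0xff) << 24) | ((classFile[1] & 0xff) << 16)
                  | ((classFile[2] & 0xff) << 8)  |  (classFile[3] & 0xff);
        if (magic != 0xCAFEBABE) {
            throw new IllegalArgumentException("bad magic number");
        }
        int minor = ((classFile[4] & 0xff) << 8) | (classFile[5] & 0xff);
        int major = ((classFile[6] & 0xff) << 8) | (classFile[7] & 0xff);
        return new ClassFileVersion(major, minor);
    }

    // True only for an exact match; other versions get no experimental features.
    boolean isExperimental(int expMajor, int expMinor) {
        return major == expMajor && minor == expMinor;
    }
}
```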

Any use of any part of any feature of this prototype must originate
from a class in a value-capable module.  The JVM is free to detect and
reject attempts from non-value-capable modules.  Annotations like
`@DeriveValueType` may be silently ignored.

However, a prototype implementation of this specification may omit
checks for such usage, and seem to work (or at least, fail to throw a
suitable error).  Any such non-rejection would be a bug, not an
invitation.

## Value descriptors

In value-capable modules, the class-file descriptor language is
extended to include so-called _Q-types_, which directly denote unboxed
value types.  The descriptor syntax is "`Q`_InternalName_`;`", where
_InternalName_ is the internal form of a class name.  (The internal
form substitutes slashes for dots.)  The class name must be that of a
value-capable class.

By comparison, a standard reference type descriptor may be called an
_L-type_.  For a value-capable class _C_, we may speak of both the
Q-type and the L-type of _C_.  Note that usage of L-types is not
correlated in any way with usage of Q-types.  For example, they can
appear together in method types, in arbitrary mixtures.
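
A small sketch of the two descriptor forms, assuming nothing beyond
the syntax just described.  The helper class `Descriptors` is
hypothetical and exists only to show the string shapes.

```java
// Hypothetical helper showing the string shapes of L-types and Q-types.
final class Descriptors {
    private Descriptors() {}

    // The internal form of a class name substitutes slashes for dots.
    static String internalName(String className) {
        return className.replace('.', '/');
    }

    // Standard reference descriptor: "L" InternalName ";"
    static String lType(String className) {
        return "L" + internalName(className) + ";";
    }

    // Proposed value descriptor: "Q" InternalName ";"
    static String qType(String className) {
        return "Q" + internalName(className) + ";";
    }

    // A method descriptor may mix both forms freely.
    static String methodDescriptor(String returnType, String... parameterTypes) {
        return "(" + String.join("", parameterTypes) + ")" + returnType;
    }
}
```

For instance, a method taking an unboxed `Int128` and a boxed
`Int128` and returning an unboxed one would have a descriptor of the
form `(Qpkg/Int128;Lpkg/Int128;)Qpkg/Int128;`.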

A Q-type descriptor may appear as the type of a class field defined in
a value-capable module.  But the same descriptor may not appear in a
field reference (`CONSTANT_Fieldref`) for that field (even in a
value-capable module).  Thus, the `getfield` family of instructions
does not enter into the implementation of this proposal.

> <span class="smaller">
(Method handle factories, described below, will support field loads
and updates.)

A Q-type descriptor may appear as an array element type in a class of
a value-capable module.  (Again, this is only in a value-capable
module, and probably in a specific experimental class-file version.
Let's stop repeating this, since the limitation has already
been set down as a blanket statement.)  There are no bytecodes for
creating, reading, or writing such arrays, but the prototype makes
method handles available for these functions.

A field or array of a Q-type is initialized to the _default value_ of
that value type, rather than null.  This default value is defined
(at least for now) as a value all of whose fields are themselves of
default value.  Such a default may be obtained from a suitable method
handle, such as the `MethodHandles.empty` combinator.
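
The behavior of `MethodHandles.empty` can be observed on existing
types (Java 9 and later), where it returns zero for primitives and
`null` for references.  The sketch below shows only that existing
behavior; applying the same combinator to a Q-type pseudo-class is
what this proposal assumes, and is not something the code
demonstrates.  The wrapper class `DefaultValues` is hypothetical.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

// Hypothetical wrapper: obtains the default value of a type by way of
// the MethodHandles.empty combinator (boxed for inspection).
final class DefaultValues {
    private DefaultValues() {}

    static Object defaultOf(Class<?> type) {
        // A handle of type ()type that ignores everything and returns
        // the default (zero or null) value of its return type.
        MethodHandle mh = MethodHandles.empty(MethodType.methodType(type));
        try {
            return mh.invoke();   // result is boxed to Object by invoke
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }
}
```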

A Q-type descriptor may appear as the parameter or return type of a
method defined in a class file.  As described below, the verifier
enforces the corresponding stacked value for such a parameter or
return value to match the Q-type (not the corresponding L-type or any
other type).

Any method reference (a constant tagged `CONSTANT_Methodref` or
`CONSTANT_InterfaceMethodref`) may mention Q-types in its descriptor.
After resolution of such a constant, the definition
of such a method may not be native, and must use new bytecodes to
work directly with the Q-typed values.

Likewise, a `CONSTANT_Fieldref` constant may mention a Q-type in its
descriptor.

Note that the Java language does not provide any direct way to mention
Q-types in class files.  However, bytecode generators may mention such
types and work with them.  It is also likely that work in the Valhalla
project will create experimental language features to allow source
code to work with Q-types.

## Enhanced constants

Since our value types will have names and members like reference
types, but are distinct from all reference types, it is necessary to
extend some constant pool structures to interoperate with Q-types.

Naturally, as a result of extending descriptor syntax, method and
field descriptors can mention Q-types.  Doing this requires no
additional format changes in the constant pool.

However, some occurrences of types in the constant pool mention "raw"
class names, without the normal descriptor envelope characters (`L`
before and `;` after).  Specifically, a `CONSTANT_Class` constant
refers to such a raw class name, and is defined to produce (at
present) an L-type with no provision for requesting the corresponding
Q-type.  What is a class-file to do if it needs to mention the Q-type?

There is a simple answer: Pick a character which is illegal as a
prefix to class names, and use it as an escape prefix within the UTF8
string argument to a `CONSTANT_Class` constant.  If the escape prefix
is present, the rest of the UTF8 string is a descriptor, not a class
name.

In order to preserve normalization of names, UTF8 strings for
`CONSTANT_Class` constants may not begin with "`;L`" or "`;[`".

> <span class="smaller">
(To avoid confusion between current forms of class names and these
additional forms, there will only be one way to express any particular
type as a `CONSTANT_Class` string.  Therefore, the descriptor itself
may not begin with `L` or `[`, since type names that begin with those
descriptors are already expressible, today, as "raw" class names in a
`CONSTANT_Class` constant.  Otherwise, `Class[";[Lfoo;"]` and
`Class["[Lfoo;"]` would mean the same thing, which is surely
confusing.)

This minimal prototype adopts this answer, using semicolon `;` (ASCII
decimal code 59) as the escape character.  Thus, the types `int.class`
and `void.class` may now be obtained from class-file constants with the
UTF8 strings "`;I`" and "`;V`".  The choice of semicolon is natural here,
since a class name cannot contain a semicolon (unless it is an array
type), and descriptor syntax is often found following semicolons in
class files.

> <span class="smaller">
(Alternatively, we could repurpose `CONSTANT_MethodType` to resolve to
a `Class` value, since it already takes a descriptor argument, albeit
a method descriptor.  But this seems more disruptive than extending
`CONSTANT_Class`.)

The L-type and Q-type for the example `Int128` can now be expressed as
twin `CONSTANT_Class` constants, with UTF8 strings like "`pkg/Int128`"
and "`;Qpkg/Int128;`" (where `pkg` is something like
`jdk/experimental/value`).
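
The escape rule can be sketched as a small classifier over the UTF8
string of a `CONSTANT_Class` constant.  This is an illustration of
the rule as stated here, not a description of any JVM's parser; the
class `ClassConstantNames` and its methods are hypothetical.

```java
// Hypothetical classifier for the UTF8 string of a CONSTANT_Class.
final class ClassConstantNames {
    enum Kind { RAW_CLASS_NAME, ESCAPED_DESCRIPTOR }

    private ClassConstantNames() {}

    static Kind classify(String utf8) {
        if (utf8.isEmpty()) {
            throw new IllegalArgumentException("empty CONSTANT_Class string");
        }
        // No escape prefix: an ordinary "raw" class name (or array type).
        if (utf8.charAt(0) != ';') {
            return Kind.RAW_CLASS_NAME;
        }
        // Escaped L-types and array types are rejected, since they are
        // already expressible as raw names.
        if (utf8.startsWith(";L") || utf8.startsWith(";[")) {
            throw new IllegalArgumentException("not normalized: " + utf8);
        }
        return Kind.ESCAPED_DESCRIPTOR;
    }

    // Returns the descriptor that follows the escape character.
    static String descriptor(String utf8) {
        if (classify(utf8) != Kind.ESCAPED_DESCRIPTOR) {
            throw new IllegalArgumentException("raw class name: " + utf8);
        }
        return utf8.substring(1);
    }
}
```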

When used with the `ldc` or `ldc_w` bytecodes, or as a bootstrap
method static argument, a `CONSTANT_Class` beginning with an escaped
descriptor resolves to the `Class` object for the given type (which,
apart from a Q-type, must be a primitive type or `void`).  The
resolution process is similar to that applied to the descriptor
components of a `CONSTANT_MethodType` constant.

When used as the class component of a `CONSTANT_Methodref` or
`CONSTANT_Fieldref` constant, a `CONSTANT_Class` for a Q-type implies
that the receiver will be a Q-type instead of the normal L-type.
Eventually there may be bytecodes which use such member references
directly.  (These may be some `vinvoke`, `vgetfield`, or just an
overloading on `invokespecial` and `getfield`.)  For now, as noted
below, such member references are limited to the specification of
`CONSTANT_MethodHandle` constants.

### Resolution of method constants

When resolving a `CONSTANT_Methodref` against a Q-type, none of the
methods of `java.lang.Object` may appear; the JVM or method handle
runtime may require special filtering logic to enforce this.

As an exception, the `Object.getClass` method may be permitted, but it
must return the corresponding L-type, as a constant.

> <span class="smaller">
There does not yet appear to be any advantage to customizing the
`getClass` method on a Q-type to return the Q-type itself, and
the dangers of confusion are significant.

In the full value-type design, a Q-type must inherit `default` methods
from its interface supertypes.  This is a key form of
interoperability between values and generic algorithms and data
structures (like sorting and `TreeMap`).  Making this work in the
minimal version requires boxing the value and running the default
method on the box.  Further steps are necessary but not part of this
minimal design: The execution of default methods must be optimized to
each specific value type.  Also, there must be a framework for ensuring
that the interface methods themselves are appropriate to value-based
types (no nulls or synchronization, limited `==`, etc.).

### JVM changes to support Q-types

Q-types, like other type descriptor types, can be mentioned in many
places.  The basic list is:

  * method and field definitions (UTF8 references in the `method_info`
    and `field_info` structures)
  * method and field symbolic references (a UTF8 component of
    `CONSTANT_NameAndType`)
  * type names (UTF8 references in `CONSTANT_Class` constants)
  * array component types (after left bracket `[`) in any descriptor
  * types in verifier stack maps (via `CONSTANT_Class` references)
  * an operand (a `CONSTANT_Class`) of some bytecodes (described below)

The JVM might use invisible boxing of Q-types to simplify the
prototyping of many execution paths.  This of course works against a
key value proposition of values, the flattening of data in the heap.
In fact, the minimal model requires special processing of Q-types in
array elements and object (or value) fields, at least enough special
processing to initialize such fields to the default value of the
Q-type, which is not (and cannot be) the default `null` of an L-type.

So when the class loader loads an object whose fields are Q-types, it
must resolve (and perhaps load) the classes of those Q-types, and
inquire enough information about the Q-type definition to lay out the
new class which contains the Q-type field.  This information includes
at least the size of the type, and may eventually include alignment and
a map of managed references contained in the Q-type.

> <span class="smaller">
(The minimal model will probably not support putting references in
value-types, in order to simplify connections to the GC.  But object
references stored in values are just as necessary to the final design
as values in objects.)

Array types must be created whose component type is a Q-type.  They
will differ from arrays of corresponding L-types just as
`Integer[].class` differs from `int[].class`.  Likewise, the
super-type of a value-bearing array will (like an `int[]`) be `Object`
only, and not a different array type.  Such arrays will not convert
to any other array type, and must be manipulated by explicitly obtained
method handles.

> <span class="smaller">
(In the minimal model, we will not attempt to make value-bearing
arrays inherit from interfaces implemented by the value types.
Although it seems desirable, further work on JVM type structure is
needed to make this happen.  Interface types are firmly in the L-type
camp, at present, and interface arrays are arrays of references.)

## Value bytecodes

The following new bytecode instructions are added:

  * `vload` pushes a value (a Q-type) from a local onto the stack.
  * `vstore` pops a value (a Q-type) from the stack into a local.
  * `vreturn` pops a value (a Q-type) from the stack and returns it
    from the current method.

> <span class="smaller">
(N.B. These are macro-instructions, encoded with a prefix.  Read on.)

Values are stored in single locals, not groups of locals as with
`long` and `double` (which consume pairs of locals).  (The slot
pairing convention for `long` and `double` is likely to go away by the
time specialized generics are introduced.)

The syntax of these instructions uses a _bytecode type prefix_ syntax,
with a bytecode called `typed` analogous to the `wide` bytecode, but
taking a constant pool reference as a parameter.  The type prefix must
be followed by one of the standard bytecodes `aload`, `astore`, or
`areturn`, to compose `vload`, `vstore`, or `vreturn` bytecodes.

> <span class="smaller">
(Although it is most intriguing to think of other uses for bytecode
type prefixes, this proposal defines only these three specific usages.
In addition, more code points may be allocated, either to represent
`vload`, etc., more directly, or to perform other operations.
In addition, if a bytecode instruction can incorporate a type
prefix, it has considerably more use cases than just Q-types.
Such "universal instructions" may be thought of in terms of names
like `uload` instead of `vload` or legacy codes like `iload` or `aload`.
It may be possible to retire or repurpose the existing data movement
bytecodes with a more general type model.  More experiments are inevitable!)

The code point for `typed` is decimal 212 (hex 0xd4), just as the code
point for `wide` is decimal 196 (hex 0xc4).  Every `typed` bytecode is
followed immediately by a two-byte reference into the constant pool.

The referenced constant must be of type `CONSTANT_Class` (not
`CONSTANT_Utf8` as for "naked" descriptors).  The class constant must
be for a Q-type (other types may be allowed in the future).
Thus, its UTF8 string _must_ be of the form "`;Q`_InternalName_`;`".
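
One possible byte layout for the three macro-instructions is sketched
below.  The code point for `typed` and the two-byte constant pool
index come from the text above; the opcodes for `aload`, `astore`,
and `areturn` are the standard ones.  That the prefixed instruction
keeps its usual operand (a one-byte local index) directly after its
opcode is an assumption of this sketch, as are the class and method
names.

```java
// Hypothetical encoder for the typed-prefix macro-instructions.
final class TypedPrefix {
    static final int TYPED   = 0xd4;  // proposed code point (decimal 212)
    static final int ALOAD   = 0x19;  // standard JVM opcodes
    static final int ASTORE  = 0x3a;
    static final int ARETURN = 0xb0;

    private TypedPrefix() {}

    // Layout (assumed): typed, cpIndex (u2, big-endian), opcode, operands.
    private static byte[] prefixed(int cpIndex, int opcode, int... operands) {
        if (cpIndex < 1 || cpIndex > 0xffff) {
            throw new IllegalArgumentException("bad constant pool index");
        }
        byte[] code = new byte[4 + operands.length];
        code[0] = (byte) TYPED;
        code[1] = (byte) (cpIndex >>> 8);
        code[2] = (byte) cpIndex;
        code[3] = (byte) opcode;
        for (int i = 0; i < operands.length; i++) {
            code[4 + i] = (byte) operands[i];
        }
        return code;
    }

    private static int checkLocal(int local) {
        if (local < 0 || local > 0xff) {
            throw new IllegalArgumentException("bad local index");
        }
        return local;
    }

    static byte[] vload(int cpIndex, int local) {
        return prefixed(cpIndex, ALOAD, checkLocal(local));
    }

    static byte[] vstore(int cpIndex, int local) {
        return prefixed(cpIndex, ASTORE, checkLocal(local));
    }

    static byte[] vreturn(int cpIndex) {
        return prefixed(cpIndex, ARETURN);
    }
}
```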

The first use of such a prefix resolves the given class constant to
the corresponding Q-type.  This process ensures that the underlying
class is in fact value-capable.  As usual, a `LinkageError` is
thrown if this resolution process fails.

> <span class="smaller">
(A resolution step is not appropriate for `CONSTANT_Utf8` constants in
some JVM implementations such as HotSpot, which is why the prefix
cannot refer directly to a UTF8 constant.  If there were a
`CONSTANT_Descriptor` constant we would use that, but `CONSTANT_Class`
is close enough.  This encoding requires that `CONSTANT_Class`
constants be enhanced to resolve to types other than L-types, which is
a separate part of this proposal.)

The JVM may use Q-type resolution to acquire information about the
Q-type's size and alignment requirements, so as to properly "pack" it
into the interpreter stack frame.  Or the JVM may simply use boxed
representations (L-types) internally and ignore sizing information.

Initially, the only valid use of a Q-type as the class component of a
`CONSTANT_Methodref` is as a `CONSTANT_MethodHandle` constant.

In the minimal prototype, the receiver of an `invokevirtual` or
`invokeinterface` instruction may _not_ be a Q-type, even though the
constant pool structure can express this (by referring to a Q-type as
the class component of a `CONSTANT_Methodref`).  Method handles and
`invokedynamic` will always allow bytecode to invoke methods on
Q-types, and this is sufficient for a start.  Such a method handle may
in fact internally box up the Q-type and run the corresponding L-type
method, but this is a tactic that can be improved and optimized in
Java support libraries, without pervasive cuts to the interpreter.

### Verifier interactions

When setting up the entry state for a method, if a Q-type appears in
the method's argument descriptors, the verifier notes that the Q-type
(not the L-type!) is present in the corresponding local at entry.

When returning from a method, if the method return type is a Q-type,
the same Q-type must be present at the top of the stack.

When performing an invocation (in any mode), the stack must contain
matching Q-types at the positions corresponding to any Q-types in the
argument descriptors of the method reference.  After the invocation,
if the return type descriptor was a Q-type, the stack will be found
to contain that Q-type at the top.

As with the primitive types `int` and `float`, a Q-type will not
convert to any other verification type than itself, or the
verification super-types `oneWord` or `top`.  This affects matching of
values at method calls, and also at control flow merge points.
Q-types do not convert to L-types, not even their boxes or the
supertypes (`Object`, interfaces) of their L-types.
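
The conversion rule can be written down as a tiny predicate.  In this
sketch verification types are modeled as plain strings (a Q-type by
its descriptor, the verification super-types by the names `oneWord`
and `top`); that modeling, and the class name `QVerification`, are
assumptions made only for illustration.

```java
// Hypothetical model of the verifier's conversion rule for Q-types.
final class QVerification {
    private QVerification() {}

    // True if a value of verification type 'from' (a Q-type descriptor)
    // may be used where verification type 'to' is expected.
    static boolean isAssignable(String from, String to) {
        if (!from.startsWith("Q") || !from.endsWith(";")) {
            throw new IllegalArgumentException("not a Q-type: " + from);
        }
        // A Q-type converts only to itself, oneWord, or top.
        return to.equals(from) || to.equals("oneWord") || to.equals("top");
    }
}
```

In particular, the predicate is false for the box of the same class
and for `Object`, which matches the statement that Q-types do not
convert to L-types.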

### Q-types and bytecodes

Bytecodes which interact with Q-types are only these:

  * `typed` (operand is a class which _must_ be a Q-type)
  * any bytecode validly prefixed by `typed` (`areturn`, `aload`,
    `astore`, and slot-specific variants)
  * all invocation bytecodes: any argument or return value may be a
    Q-type; the receiver (class component of `Methodref`) may not,
    not even for static members
  * `ldc` and `ldc_w` (of a Q-type, or perhaps a dynamically generated
    constant)

Many existing bytecodes take operands which are constant pool
references, any of which might directly or indirectly refer to a
Q-type.  Unless specified otherwise, these bytecodes will reject
occurrences of Q-types.  They include:

  * `getfield` and its variants (use accessor method handles instead)
  * `aaload` and its variants (use accessor method handles instead)
  * `new`, `anewarray`, `multianewarray` (use factory method handles instead)
  * `checkcast`, `instanceof` (Q-types like primitives do not exhibit
    polymorphism)

In a fuller implementation of value types, some of these (but not all)
are candidates for interoperation with Q-types.

## Value type reflection

The public, all-static class `jdk.experimental.value.ValueTypeSupport`
(in an internal module) will contain all methods of the runtime
support for values in this initial prototype.

`ValueTypeSupport` will contain the following public member class with
public methods for reflecting Q-types:

    static class ValueType<T> {
      static boolean classHasValueType(Class<?> x);
      static <T> ValueType<T> forClass(Class<T> x);
      Class<T> valueClass();
      Class<T> boxClass();
    }

The predicate `classHasValueType` is true if the argument represents
either a Q-type or (the L-type of) a value-capable class.  The factory
`forClass` returns the Q-type for the L-type of a value-capable class.
(If given a Q-type class, it returns it directly.  If given any other
type, it throws `IllegalArgumentException`; users might want to test
with `classHasValueType` first to avoid the exception.)

The two accessors `valueClass` and `boxClass` return distinct
`java.lang.Class` objects for the Q-type and the original
(value-capable) L-type, respectively.

> <span class="smaller">
(Note that the original value-capable class does not have special
status with respect to this API; from the point of view of someone
working with value types, it is merely the box class for the value.
Eventually, value types will be directly defined by class files, and
the box type will be derived indirectly.)

The legacy lookup method `Class.forName` will continue to return the
L-type, for reasons of compatibility.  This condition is likely to
persist.  (In the future, the source language construct `T.class` is
likely to produce something more natural to the source code type
assigned to `T`, under the slogan "works like an int".)

The pseudo-class returned from `valueClass` is distinct from (unequal
to) the class returned from `boxClass`, or perhaps originally passed
to `forClass` (e.g., from code which has no other access to Q-types).
This pseudo-class directly reflects the Q-type just as a pseudo-class
like `int.class` or `void.class` directly reflects a primitive type
(or even `void`).

> <span class="smaller">
(Note: The use of pseudo-classes has precedent, with the primitive
pseudo-classes like `int.class`.  But it is not yet clear whether
pseudo-classes for Q-types will be a permanent part of the design.
For now, they are necessary to enable use of existing reflection
mechanisms, such as `MethodType` objects to encode Q-types for
the lookup of method handles.)

The members reflected by a Q-class are identical to those reflected by
the corresponding L-class, except their "declaring class" properties
(e.g., `Method.getDeclaringClass`) refer back to the Q-class instead
of the L-class.

As is normal with reflection, invoking the methods of a Q-class must
work exclusively with boxed forms of the receiver, arguments, return
values, and field values.

Classes for Q-types may appear in reflective APIs wherever primitive
pseudo-types (like `int.class`) can appear.  These APIs include both
core reflection (`Class` and the types in `java.lang.reflect`) and
also the newer APIs in `java.lang.invoke`, such as `MethodType` and
`MethodHandles.Lookup`.  Constant pool constants that work with these
types can refer to Q-types as well as L-types, and the distinctions
are surfaced, reflectively, as suitable choices of `Class` objects
(either box or value).

It is undefined (in this proposal) how or whether legacy wrapper types
(`java.lang.Integer`) or primitive pseudo-types (`int.class`) interact
with the methods of `ValueType`.

> <span class="smaller">
(When pseudo-classes need to be distinguished from normal
`java.lang.Class` objects, we can use the shorthand term "crass",
where the "r" sound suggests that the thing exists only to reify a
distinction necessary at runtime.  The main class is the thing
returned by `Class.forName`, and which represents a class file in 1-1
correspondence; a "crass" is anything else typed as `java.lang.Class`.
A [more principled approach to reflection
[cimadamore-refman]][cimadamore-refman] uses "type mirrors" of a
suitably refined interface type hierarchy.)

You can use the reflective APIs to create and manipulate arrays, load
and store fields, invoke methods, and obtain method handles.

Method handle transforms which change types (such as `asType`) will
support value-type boxing and unboxing just as they can express
primitive boxing and unboxing.  Thus, the following code creates a
method handle which will box a `DoubleComplex` value into an object:

    Class<DoubleComplex> lt = DoubleComplex.class;
    Class<DoubleComplex> qt = ValueType.forClass(lt).valueClass();
    MethodHandle mh = identity(qt).asType(methodType(Object.class, qt));

Of course, the type-converting method `MethodHandle.invoke` will allow
users to work with method handles over Q-types, either in terms of
L-types as supported by the current Java language, or (in suitable
bytecodes) more directly in terms of Q-types.
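
For comparison, here is the primitive analogue of the `DoubleComplex` example, which runs on current JDKs. It is only a sketch of the existing `asType` behavior that the proposal extends; the class and method names are illustrative, and nothing here touches Q-types.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

final class PrimitiveBoxingDemo {
    // (int)Object: identity on int, retyped so that the result is boxed.
    static final MethodHandle BOX = MethodHandles.identity(int.class)
            .asType(MethodType.methodType(Object.class, int.class));

    // (Integer)int: identity on int, retyped so that the argument is unboxed.
    static final MethodHandle UNBOX = MethodHandles.identity(int.class)
            .asType(MethodType.methodType(int.class, Integer.class));

    static Object box(int x) {
        try {
            return (Object) BOX.invokeExact(x);
        } catch (Throwable t) {
            throw new AssertionError(t);
        }
    }

    static int unbox(Integer x) {
        try {
            return (int) UNBOX.invokeExact(x);
        } catch (Throwable t) {
            throw new AssertionError(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(box(42).getClass().getName());
        System.out.println(unbox(7));
    }
}
```

Under the proposal, replacing `int.class` with a Q-type pseudo-class and `Integer.class` with the value-capable class would be expected to give the same shape of conversion.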

## Boxed values

As noted before, instances of a value-capable class (which is an
L-type) serve as boxes for values of the corresponding Q-type.  The
various reflective APIs work directly with these boxes.  The method
handle APIs also allow conversion operators to be surfaced as method
handles or applied implicitly for argument conversions.

Since value-capable classes are value-based, it is inappropriate to
synchronize on their instances, make distinctions among them by means
of reference equality comparisons, attempt to mutate their fields, or
attempt to treat a `null` reference as a point in the domain of the
boxed type.

A future JVM _may_ assist in detecting (or even suppressing) some of
these errors, and it may provide additional optimizations in the presence
of such boxes (which do not require a full escape analysis).

However, such assistance or optimization appears to be unnecessary in
this minimal version of the design.  Code which works with Q-types
will, by its very nature, be immune to such bugs, since Q-types are
non-synchronizable, non-mutable, non-nullable, and identity-agnostic.
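
As a rough illustration of these rules for code that still handles boxes, the sketch below uses `java.time.LocalDate`, an existing class documented as value-based, as a stand-in for a value-capable class. The helper name `sameValue` is hypothetical.

```java
import java.time.LocalDate;
import java.util.Objects;

final class ValueBasedBoxDemo {
    /**
     * Compares two boxes by their state only.  It never uses ==, never
     * synchronizes on the boxes, and refuses null rather than treating
     * it as a value.
     */
    static boolean sameValue(Object a, Object b) {
        Objects.requireNonNull(a, "a");
        Objects.requireNonNull(b, "b");
        return a.equals(b);
    }

    public static void main(String[] args) {
        LocalDate first = LocalDate.of(2016, 8, 26);
        LocalDate second = LocalDate.of(2016, 8, 26);
        // Whether first == second holds is unspecified for value-based classes.
        System.out.println(sameValue(first, second));
    }
}
```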

## Value operator factories

Given the ability to invoke method handles that work with Q-types, all
other semantic features of value types can (temporarily) be accessed
solely through method handles.  These include:

  * Conversion routines (like box/unbox).
  * Obtaining default values of Q-types.
  * Constructing values of Q-types.
  * Comparing values of Q-types.
  * Calling methods defined on Q-types.
  * Reading fields defined in Q-types.
  * Updating fields defined in Q-types.
  * Reading or writing fields (or array elements) whose types are Q-types.
  * Constructing, reading, and writing arrays of Q-types.

The `MethodHandles.Lookup` and `MethodHandles` APIs will work on
Q-types (represented as `Class` objects), and surface methods which
can perform nearly all of these functions.

Pre-existing method handle API points will be adjusted as follows:

  * `MethodType` factory methods will accept `Class` objects
    representing Q-types, just as they accept primitive types today.
  * `invoke`, `asType`, and `explicitCastArguments` will treat
    Q-type/L-type pairs just as they treat primitive/wrapper pairs.
  * `Lookup.in` will allow free conversion (without loss of privilege
    modes) between Q-type/L-type pairs.
  * Non-static lookups in Q-types will produce method handles which
    take leading receiver parameters that are Q-types, not L-types.
  * The `findVirtual` method of `Lookup` will expose all accessible
    non-static methods on a Q-type, if the lookup class is a Q-type.
  * The `findConstructor` method of `Lookup` will expose all accessible
    constructors of the original value-capable class, for both the Q-type
    and the legacy L-type.  The return type of a method handle produced
    by `findConstructor` will be identical with the lookup class, even
    if it is a Q-type.
  * The `identity` method handle factory method will accept Q-types.
  * The `empty` method handle factory method will accept Q-types,
    producing a method handle that returns the default value of the type.
  * The array-processing method handle factories will accept Q-types,
    producing methods for building, reading, and writing Q-type arrays.
    (These include `arrayConstructor`, `arrayLength`, `arrayElementGetter`,
    and `arrayElementSetter`, plus eventually the var-handle variants.)
  * All method handle transforms will accept method handles that work
    with Q-types, just as they accept primitive types today.
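
Several of these factories already exist for primitive types, which is the behavior the list proposes to extend. The sketch below exercises them with `int` on a current JDK (`arrayConstructor`, `arrayLength`, and `empty` require Java 9 or later). The class and helper names are illustrative, and no Q-type is involved.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

final class PrimitiveFactoryDemo {
    static final MethodHandle NEW_ARRAY = MethodHandles.arrayConstructor(int[].class); // (int)int[]
    static final MethodHandle LENGTH = MethodHandles.arrayLength(int[].class);         // (int[])int
    static final MethodHandle GET = MethodHandles.arrayElementGetter(int[].class);     // (int[],int)int
    static final MethodHandle SET = MethodHandles.arrayElementSetter(int[].class);     // (int[],int,int)void
    static final MethodHandle EMPTY_INT =
            MethodHandles.empty(MethodType.methodType(int.class));                     // ()int

    /** Builds an int[] of the given size, stores one element, and reads it back. */
    static int storeThenLoad(int size, int index, int value) {
        try {
            int[] array = (int[]) NEW_ARRAY.invokeExact(size);
            SET.invokeExact(array, index, value);
            return (int) GET.invokeExact(array, index);
        } catch (Throwable t) {
            throw new AssertionError(t);
        }
    }

    static int lengthOfNewArray(int size) {
        try {
            int[] array = (int[]) NEW_ARRAY.invokeExact(size);
            return (int) LENGTH.invokeExact(array);
        } catch (Throwable t) {
            throw new AssertionError(t);
        }
    }

    /** The empty handle returns the default value of its return type. */
    static int defaultInt() {
        try {
            return (int) EMPTY_INT.invokeExact();
        } catch (Throwable t) {
            throw new AssertionError(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(storeThenLoad(3, 1, 7));
        System.out.println(lengthOfNewArray(5));
        System.out.println(defaultInt());
    }
}
```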

> <span class="smaller">
(Yes, a value type method is obtained with `findVirtual`, despite the
fact that virtuality is not present on a `final` class.  The poorer
alternatives are to co-opt `findSpecial`, or make a new API point
`findDirect` to carry the nice, fine distinction.  Since Java is
already comfortable with the notion of "final virtual" methods, we
will continue with what we have.)
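
The "final virtual" precedent is easy to see with today's API. In this sketch `String`, a `final` class, stands in for a value type; the handle produced by `findVirtual` takes the receiver as its leading parameter. The class name `FinalVirtualDemo` is illustrative.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

final class FinalVirtualDemo {
    // (String)int: the receiver becomes the leading parameter.
    static final MethodHandle LENGTH;

    static {
        try {
            LENGTH = MethodHandles.lookup()
                    .findVirtual(String.class, "length", MethodType.methodType(int.class));
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    static int length(String s) {
        try {
            return (int) LENGTH.invokeExact(s);
        } catch (Throwable t) {
            throw new AssertionError(t);
        }
    }

    static Class<?> receiverType() {
        return LENGTH.type().parameterType(0);
    }

    public static void main(String[] args) {
        System.out.println(LENGTH.type());
        System.out.println(length("shady"));
    }
}
```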

Similarly, core reflection API points will be adjusted:

  * The reflected member objects `java.lang.reflect.Method`, `Field`,
    and `Constructor` may have self-types (`getDeclaringClass`) that
    are Q-types.  Such members are derived from the `Class` objects
    representing Q-types.
  * Any method handles unreflected from these member objects will
    retain the Q-type/L-type distinction on the receiver (except
    of course for static methods or fields), so that the leading
    method handle parameter will correspond to the declaring class.
  * The methods of `java.lang.reflect.Method` will work with Q-types,
    as discussed earlier.  Reflected method types will correctly
    report the distinction between Q-types and their boxes (L-types).
    The invocation method will accept boxed L-types where Q-types
    are required.
  * Likewise, the methods of `java.lang.reflect.Field` will work with
    Q-types.  However, fields of boxed Q-types may only be read,
    not written.
  * Likewise, the methods of `java.lang.reflect.Constructor` will work
    with Q-types.  The `newInstance` method of a Q-type constructor
    will be reinterpreted as a factory method; the boxed value
    returned will _not_ be guaranteed to be a fresh object.
    (This reinterpretation may be extended later to the L-type
    constructor, since the class is value-based.)
  * The methods of `java.lang.reflect.Array` will accept Q-types as
    component types.
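
Core reflection already behaves this way for primitives: reflected signatures report the unboxed type, while invocation traffics in boxes. The following sketch shows that existing behavior with `int`, on the assumption that Q-types would follow the same pattern; the class and helper names are illustrative.

```java
import java.lang.reflect.Array;
import java.lang.reflect.Method;

final class ReflectionBoxingDemo {
    static Method maxMethod() {
        try {
            return Math.class.getMethod("max", int.class, int.class);
        } catch (NoSuchMethodException e) {
            throw new AssertionError(e);
        }
    }

    /** The reflected signature reports the primitive type, not its box. */
    static Class<?> reportedParameterType() {
        return maxMethod().getParameterTypes()[0];
    }

    /** Invocation accepts boxed arguments and returns a boxed result. */
    static Object invokeMax(int a, int b) {
        try {
            return maxMethod().invoke(null, a, b);
        } catch (ReflectiveOperationException e) {
            throw new AssertionError(e);
        }
    }

    /** java.lang.reflect.Array accepts a pseudo-class as the component type. */
    static Object firstElementOfNewIntArray() {
        Object array = Array.newInstance(int.class, 2);
        return Array.get(array, 0);
    }

    public static void main(String[] args) {
        System.out.println(reportedParameterType());
        System.out.println(invokeMax(3, 5));
        System.out.println(firstElementOfNewIntArray());
    }
}
```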

Some care must be taken in the reflection APIs to ensure that Q-types
are not accidentally tied into the subtype/supertype relations of
their corresponding L-types.  No Q-type is a sub-type or super-type of
any other Q-type or any other L-type.  No Q-type is a subtype of
`Object`, and Q-types declare only their own methods (which therefore
never use virtual-dispatch polymorphism).  As an exception, default
methods from interfaces are inherited into Q-types.

As value-based classes, value-capable classes are required to override
all relevant methods from `Object`.  The derived Q-types do _not_ inherit
or respond to the standard methods of `Object`.

The following additional functions do not (_as yet_) fit in the
`MethodHandle` API, and so are placed in the runtime support class
`jdk.experimental.value.ValueTypeSupport`.

`ValueTypeSupport` will contain the following static methods:

    static MethodHandle defaultValueConstant(Class<?> type);
    static MethodHandle substitutabilityTest(Class<?> type);
    static MethodHandle substitutabilityHashCode(Class<?> type);
    static MethodHandle findWither(Lookup lookup, Class<?> refc, String name,
                                   Class<?> type);

The `defaultValueConstant` method returns a method handle which takes
no arguments and returns a default value of the given type.  It is
equivalent to (but is probably more efficient than) creating a
one-element array of that value type and loading the result.  This
method may be useful for implementing `MethodHandles.empty` and similar
combinators.  (The method may support non-Q-types.  If it does, an
L-type will result in a method handle that returns `null`, not
a box containing the default value.)
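
The one-element-array equivalence can be sketched for the types a current JVM already knows. This is not the proposed `ValueTypeSupport` implementation; the class `DefaultValueSketch` and its methods are hypothetical stand-ins that handle only primitives and reference types.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.reflect.Array;

final class DefaultValueSketch {
    /**
     * Hypothetical stand-in for defaultValueConstant: allocate a
     * one-element array, load its single (default) element, and wrap
     * that element in a constant method handle of type ()type.
     */
    static MethodHandle defaultValueConstant(Class<?> type) {
        Object oneElementArray = Array.newInstance(type, 1);
        Object defaultValue = Array.get(oneElementArray, 0); // boxed zero, or null
        return MethodHandles.constant(type, defaultValue);
    }

    /** Invokes the constant handle, boxing a primitive result if necessary. */
    static Object defaultOf(Class<?> type) {
        try {
            return defaultValueConstant(type).invoke();
        } catch (Throwable t) {
            throw new AssertionError(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(defaultOf(int.class));
        System.out.println(defaultOf(String.class));
    }
}
```

Note that the reference-type case returns `null`, matching the parenthetical remark about L-types.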

The `substitutabilityTest` method returns a method handle which
compares two operands of the given type for substitutability.
Specifically, if the type is a Q-type, fields are compared pairwise
for substitutability, and the result is the logical conjunction of all
the comparisons.  Primitives and references are substitutable if and
only if they compare equal using the appropriate version of the Java
`==` operator, _except_ that floats and doubles are first converted to
their "raw bits" before comparison.  (The method may support
non-Q-types.  If it does, an L-type will be compared using `acmp`
reference comparison, with a possible exception for Q-type boxes.)
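
The comparison rule can be approximated with reflection over an ordinary final class. The sketch below is an assumption-laden illustration, not the proposed method handle: `SubstitutabilitySketch`, `Pair`, and `substitutable` are made-up names, and nested value-typed fields are not handled.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

final class SubstitutabilitySketch {
    /** Hypothetical stand-in for a value-capable class. */
    static final class Pair {
        final double d;
        final int i;
        Pair(double d, int i) { this.d = d; this.i = i; }
    }

    /** Field-wise conjunction; floats and doubles are compared by raw bits. */
    static boolean substitutable(Object a, Object b) {
        if (a.getClass() != b.getClass()) return false;
        try {
            for (Field f : a.getClass().getDeclaredFields()) {
                if (Modifier.isStatic(f.getModifiers())) continue;
                f.setAccessible(true);
                Class<?> t = f.getType();
                boolean same;
                if (t == double.class) {
                    same = Double.doubleToRawLongBits(f.getDouble(a))
                            == Double.doubleToRawLongBits(f.getDouble(b));
                } else if (t == float.class) {
                    same = Float.floatToRawIntBits(f.getFloat(a))
                            == Float.floatToRawIntBits(f.getFloat(b));
                } else if (t.isPrimitive()) {
                    same = f.get(a).equals(f.get(b)); // boxed equals matches == here
                } else {
                    same = f.get(a) == f.get(b);      // reference comparison
                }
                if (!same) return false;
            }
            return true;
        } catch (IllegalAccessException e) {
            throw new AssertionError(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(substitutable(new Pair(Double.NaN, 1), new Pair(Double.NaN, 1)));
        System.out.println(substitutable(new Pair(0.0, 1), new Pair(-0.0, 1)));
    }
}
```

The raw-bits rule is what makes this differ from `==`: a NaN is substitutable for an identical NaN, while `0.0` and `-0.0` are not substitutable for each other.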

Likewise, the `substitutabilityHashCode` method returns a method
handle which accepts a single operand of the given type, and produces
a hash code which is guaranteed to be equal for two values of that
type if they are substitutable for each other, and is likely to be
different otherwise.  (The method may support non-Q-types.  If it
does, an L-type will be hashed using `System.identityHashCode`,
and primitives hashed using their own bit patterns.)

> <span class="smaller">
(It is an open question whether to expand the size of this hash code
to 64 bits.  It will probably be defined, for starters, as a 32-bit
composition of the hash codes of the value type fields, using legacy
hash code values.  The composition of sub-codes will probably use, at
first, a base-31 polynomial, even though that composition technique is
deeply suboptimal.)
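
A base-31 polynomial over legacy field hash codes might look like the following sketch. The initial seed of 1 is an assumption borrowed from `java.util.List.hashCode`; the proposal does not fix it, and the names `HashCompositionSketch`, `combine`, and `hashOfComplex` are illustrative.

```java
final class HashCompositionSketch {
    /** Folds field hash codes into one 32-bit code: h = 31 * h + fieldHash. */
    static int combine(int... fieldHashes) {
        int h = 1; // assumed seed, as in List.hashCode
        for (int fieldHash : fieldHashes) {
            h = 31 * h + fieldHash;
        }
        return h;
    }

    /** Hash for a hypothetical two-field complex value, using legacy Double hashes. */
    static int hashOfComplex(double re, double im) {
        return combine(Double.hashCode(re), Double.hashCode(im));
    }

    public static void main(String[] args) {
        System.out.println(hashOfComplex(1.5, -2.0));
    }
}
```

With this seed the result coincides with `java.util.Arrays.hashCode` applied to a `double[]` holding the same fields.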

The `findWither` method works analogously to `Lookup.findSetter`,
except that the resulting method handle always creates a new value, a
full copy of the old value, except that the specified field is changed
to contain the new value.  Since values have no identity, this is the
only logically possible way to update a field value.
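
The copy-with-one-field-changed behavior can be approximated today with existing method handle combinators over an immutable class. This sketch is not the proposed `findWither`; `WitherSketch`, `Complex`, and `findReWither` are hypothetical names, and the handle works on an ordinary object rather than a Q-type.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

final class WitherSketch {
    /** Hypothetical immutable stand-in for a value type. */
    static final class Complex {
        final double re, im;
        Complex(double re, double im) { this.re = re; this.im = im; }
    }

    /** Builds a handle of type (Complex,double)Complex that replaces re and copies im. */
    static MethodHandle findReWither(MethodHandles.Lookup lookup)
            throws ReflectiveOperationException {
        MethodHandle ctor = lookup.findConstructor(Complex.class,
                MethodType.methodType(void.class, double.class, double.class)); // (double,double)Complex
        MethodHandle getIm = lookup.findGetter(Complex.class, "im", double.class); // (Complex)double
        // Feed the old value's im into the constructor: (double,Complex)Complex.
        MethodHandle copyIm = MethodHandles.filterArguments(ctor, 1, getIm);
        // Put the old value first, like a setter's receiver: (Complex,double)Complex.
        return MethodHandles.permuteArguments(copyIm,
                MethodType.methodType(Complex.class, Complex.class, double.class), 1, 0);
    }

    static Complex withRe(Complex c, double newRe) {
        try {
            return (Complex) findReWither(MethodHandles.lookup()).invokeExact(c, newRe);
        } catch (Throwable t) {
            throw new AssertionError(t);
        }
    }

    public static void main(String[] args) {
        Complex c = new Complex(3.0, 4.0);
        Complex pureImaginary = withRe(c, 0.0);
        System.out.println(pureImaginary.re + " " + pureImaginary.im + " " + c.re);
    }
}
```

The original `c` is left untouched, which is the functional-update behavior the wither is meant to provide.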

In order to restrict the use of wither primitives, the `refc`
parameter will be checked against the lookup-class; if they are not
the same type (and not a coordinated pair of Q-type and L-type), the
access will fail.  The access restriction may be broadened later.  A
value-type may of course define named wither methods that encapsulate
primitive wither actions.  Eventually, a `withfield` bytecode might
be created to express field update directly, in which case the same
issues of access restriction must be addressed.

> <span class="smaller">
(The name _wither_ method does not mean a way to blight or shrivel
something--certainly a shady activity.  It refers to a naming
convention for methods that perform functional update of record
values.  Asking a complex number `c.withRe(0)` would return a new
pure-imaginary complex number.  By contrast, `c.setRe(0)`, a call to a
_setter_ method, would seem to mutate the complex number, removing any
non-zero real component.  Setter methods are appropriate to mutable
objects, while wither methods are appropriate to values.  Note that a
method can in fact be a getter, setter, or wither method even if it
does not begin with one of those standard words.  The eventual
conventions for value types may well discourage forms like `withRe(0)`
in favor of simply `re(0)`.)

It is likely that these methods in `ValueTypeSupport` will eventually
become virtual methods of `Lookup` itself (if that is the leading
argument), else static methods of `MethodHandles`.

## Future work

This minimal proposal is by nature temporary and provisional.  It
gives a necessary foundation for further work, rather than a final
specification.  Some of the further work will be similarly provisional
in nature, but over time we will build on our successes and learn
from our mistakes, eventually creating a well-designed specification
that can take its place in the sun.

This present set of features that support value types will be
difficult to work with; this is intentional.  The rest of this
document sketches a few additional features which may enable
experiments not practical or possible in the minimized proposal.

Therefore, this last section may be safely skipped.  Any such features
will be given their own supporting documentation if they are pursued.
It may be of interest, however, to people who have noticed missing
features in the minimal values proposal.

<div class="smaller">

### Denoting Q-types in Java source code

At a minimum, no language changes are needed to work with Q-types.  A
combination of JVM hacks (value-capable classes), annotation-driven
classfile transformations, and direct bytecode generation are enough
to exercise interesting micro-benchmarks.  Method handles supply a
useful alternative to direct bytecode generation, and they will be
made fully capable of working with Q-types (as described above).

Nevertheless, there is nothing like language support.  It is likely
that very early experiments with `javac` will create simple ways to
refer to Q-types and create variables for them, directly in Java code
(subject to contextual restrictions, of course).

In particular, constructors for objects have a very different bytecode
shape than seemingly-equivalent constructors for value types.  (The
syntax for Java object constructors is a perfectly fine notation for
value type constructors, as long as all fields are final.)  It would
be reasonable for javac to take on the burden of byte-compiling both
versions of each constructor of a value-capable class.

Likewise, direct invocation of value type constructors, and direct
access of value type methods and fields, would be convenient to use
from Java source code, even if they had to be compiled to
invokedynamic calls, until bytecode support was completed.

### More constants

Additional enhancements to the constant pool may allow creation of
constants derived from bootstrap methods.  Such features are not in
the scope of the present document.  They are described in the OpenJDK RFE
[JDK-8161256][].  This RFE mentions the present enhancement of
`CONSTANT_Class`.

If this RFE is implemented, it may be possible to delay a few of the
steps described in this section, such as using Q-types as receiver
types for `CONSTANT_MethodHandle` constants.  The key requirement, in
any case, is that invokedynamic instructions be able to refer to a full
range of operations on Q-types, since the invokedynamic instructions
are standing in as temporary place-holders for bytecodes we are not yet
implementing.

Independently of user-bootstrapped constants, Q-types in the constant
pool might be carried, most gracefully, by variations on the
`CONSTANT_Class` constant.  Right now, we choose to mangle type
descriptors in `CONSTANT_Class` constants as an easy-to-implement
place-holder, but the final design could introduce new constant pool
types to carry the required distinctions.

For example, `CONSTANT_Class` could be kept as-is, and re-labeled
`CONSTANT_ReferenceType`.  Then, a new `CONSTANT_Type` constant could
support arbitrary descriptors.  (Perhaps it would have other
substructure required by reified generic parameters, but that's
probably yet another kind of constant.)  Or, a `CONSTANT_ValueType`
tag could be introduced for symmetry with `CONSTANT_ReferenceType`,
and some other way could be found for mentioning primitive
pseudo-classes.  (They are useful as parameters to BSMs.)

### Q-replacement within value-capable classes

A value-capable class, compiled from Java source, may have additional
annotations (or perhaps attributes) on selected fields and methods
which cause the introduction of Q-types, as a bytecode-level
transformation when the value-capable class's file is loaded or
compiled.

Two transformations which seem useful may be called _Q-replacement_
and _Q-overloading_.  The first deletes L-types and replaces them by
Q-types, while the second simply copies methods, replacing some or all
of the L-types in their descriptors by corresponding Q-types.  This
set of ideas is tracked as [JDK-8164889][].

An alternative to annotation-driven Q-replacement would be an
experimental language feature allowing Q-types to be mentioned
directly in Java source.  Such experiments are likely to happen
as part of Project Valhalla, and may happen early enough to make
transformation unnecessary.

### More bytecodes

The library method handle `defaultValueConstant` could be replaced by
a new `vnew` bytecode, or by a prefixed `aconst_null` bytecode.

The library method handle `substitutabilityTest` could be replaced by
a new `vcmp` bytecode, or by a prefixed `if_acmpeq` bytecode.

The library method handle `findWither` could be replaced by a new
`vwithfield` bytecode.

The library method handle `findGetter` could be replaced by a suitably
enhanced `getfield` bytecode.

The library method handle `arrayConstructor` could be replaced by a
suitably enhanced `anewarray` or `multianewarray` bytecode.

The library method handle `arrayElementGetter` could be replaced by a
new `vaload` bytecode, or a prefixed `aaload` bytecode.

The library method handle `arrayElementSetter` could be replaced by a
new `vastore` bytecode, or a prefixed `aastore` bytecode.

The library method handle `arrayLength` could be replaced by a
suitably enhanced `arraylength` bytecode.

### Bridge-o-matic

In some cases, supplying Q-replaced API points in classes is just a
matter of providing suitable bridge methods.  Bytecode transformers or
generators can avoid the need to specify the bodies of such bridge
methods if the bridges are (instead of bytecodes) endowed with
suitably organized bootstrap methods.  This set of ideas has many
additional uses, including auto-generation of standard `equals`,
`hashCode`, and `toString` methods.  It is tracked as [JDK-8164891][].

### Heisenboxes

As suggested above, L-types for values are value-based, and some
version of the JVM may attempt to enforce this in various ways, such
as the following:

  * Synchronizing a boxed Q-type value may throw an exception like
    `IllegalMonitorStateException`.
  * Reference comparison (Java operator `==`, or the `acmp`
    instruction) may report "true" on two equivalent boxed Q-type
    values, even if the references previously returned "false", or
    "false" when they previously returned "true".  Such variation
    would of course be subject to the logic of substitutability of the
    underlying Q-types.  Two boxes that were once detected as equal
    references would be permanently substitutable for each other.
  * Attempts to reflectively store values into the fields of boxed
    Q-type values may fail, even after `setAccessible` is called.
  * Attempts to reflectively invoke the constructor for the box may
    fail, even after `setAccessible` is called.

A box whose identity status is uncertain from observation to
observation is called a "heisenbox".  To pursue the analogy, a
reference equality (`==`, `acmp`) observation of `true` for two
heisenboxes "collapses" them into the same object, since they are then
proven fully inter-substitutable, hence their Q-values are equivalent
also.  Two copies of the reference can later decohere, reporting
inequality, despite the continued inter-substitutability of the boxed
values.  The equality predicate could be investigated by wiring it to
a box containing Schr&ouml;dinger's cat, with many puzzling and sad
results...

This set of ideas is tracked as [JDK-8163133][].

</div>

## References

[values]: <http://cr.openjdk.java.net/~jrose/values/values.html>  
[valhalla-dev]: <http://mail.openjdk.java.net/pipermail/valhalla-dev/>  
[goetz-jvmls15]: <http://www.oracle.com/technetwork/java/jvmls2015-goetz-2637900.pdf>  
[valsem-0411]: <http://mail.openjdk.java.net/pipermail/valhalla-spec-experts/2016-April/000118.html>  
[simms-vbcs]: <http://mail.openjdk.java.net/pipermail/valhalla-dev/2016-June/001981.html>  
[graves-jvmls16]: <http://youtu.be/Z2XgO1H6xPM?list=PLX8CzqL3ArzUY6rQAQTwI_jKvqJxrRrP_>  
[value-based]: <http://docs.oracle.com/javase/8/docs/api/java/lang/doc-files/ValueBased.html>  
[Long2.java]: <http://hg.openjdk.java.net/panama/panama/jdk/file/70b3ceb485cf/src/java.base/share/classes/java/lang/Long2.java>  
[cimadamore-refman]: <http://cr.openjdk.java.net/~mcimadamore/reflection-manifesto.html>  
[JDK-8164891]: <http://bugs.openjdk.java.net/browse/JDK-8164891>  
[JDK-8161256]: <http://bugs.openjdk.java.net/browse/JDK-8161256>  
[JDK-8164889]: <http://bugs.openjdk.java.net/browse/JDK-8164889>  
[JDK-8163133]: <http://bugs.openjdk.java.net/browse/JDK-8163133>  

\[values]: <http://cr.openjdk.java.net/~jrose/values/values.html>  
\[valhalla-dev]: <http://mail.openjdk.java.net/pipermail/valhalla-dev/>  
\[goetz-jvmls15]: <http://www.oracle.com/technetwork/java/jvmls2015-goetz-2637900.pdf>  
\[valsem-0411]: <http://mail.openjdk.java.net/pipermail/valhalla-spec-experts/2016-April/000118.html>  
\[simms-vbcs]: <http://mail.openjdk.java.net/pipermail/valhalla-dev/2016-June/001981.html>  
\[graves-jvmls16]: <http://youtu.be/Z2XgO1H6xPM?list=PLX8CzqL3ArzUY6rQAQTwI_jKvqJxrRrP_>  
\[value-based]: <http://docs.oracle.com/javase/8/docs/api/java/lang/doc-files/ValueBased.html>  
\[goetz-jvmls16]: <http://www.youtube.com/watch?v=Tc9vs_HFHVo>  
\[Long2.java]: <http://hg.openjdk.java.net/panama/panama/jdk/file/70b3ceb485cf/src/java.base/share/classes/java/lang/Long2.java>  
\[cimadamore-refman]: <http://cr.openjdk.java.net/~mcimadamore/reflection-manifesto.html>  
\[JDK-8164891]: <http://bugs.openjdk.java.net/browse/JDK-8164891>  
\[JDK-8161256]: <http://bugs.openjdk.java.net/browse/JDK-8161256>  
\[JDK-8164889]: <http://bugs.openjdk.java.net/browse/JDK-8164889>  
\[JDK-8163133]: <http://bugs.openjdk.java.net/browse/JDK-8163133>
