When we talk about use cases for Valhalla, we've often considered a very broad
set of class abstractions that represent immutable, identity-free data. JEP 401
mentions varieties of integers and floats, points, dates and times, tuples,
records, subarrays, cursors, etc. However, as shorthand this broad set often
gets reduced to an example like Point or Int128, and these latter examples are
not necessarily representative of all candidate value types.
Specifically, our favorite example classes have a property that doesn't
generalize: they'll happily accept any combination of field values as a valid
instance. (In fact, they're even happy to accept any combination of *bits* of
the appropriate length.) Many candidate primitive classes don't have this
property—the constructors do important validation work, and only certain
combinations of fields are allowed to represent valid instances.
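
As a rough illustration of the distinction, here is a minimal sketch. Ordinary Java records stand in for value classes, and the names Point, Range, and InvariantDemo are hypothetical, chosen only for this example; nothing here is proposed syntax.

```java
// Illustrative only: ordinary records stand in for value classes.
// Point, Range, and InvariantDemo are hypothetical names.

// Every pair of ints is a meaningful Point; no validation is needed.
record Point(int x, int y) {}

// Only some pairs of ints are a meaningful Range: the constructor
// enforces lo <= hi, and other methods rely on that invariant.
record Range(int lo, int hi) {
    Range {
        if (lo > hi) {
            throw new IllegalArgumentException(
                "lo must not exceed hi: " + lo + " > " + hi);
        }
    }

    // Widened to long so the subtraction cannot overflow;
    // non-negative only because the constructor checked lo <= hi.
    long length() {
        return (long) hi - lo;
    }
}

class InvariantDemo {
    public static void main(String[] args) {
        // Any combination of field values is accepted.
        System.out.println(new Point(Integer.MIN_VALUE, 42));

        System.out.println(new Range(3, 7).length());

        try {
            new Range(7, 3); // rejected by the constructor
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Point has no invariant to protect, so it does not matter how an instance comes into existence. Range does, so any path that produces an instance without running the constructor can break length().
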
Related areas of concern that we've had on the radar for a while:
- The "all zeros is your default value" strategy forces an all-zero instance
into the class's value set, even if that doesn't make sense for the class. Many
candidate classes have no reasonable default at all, leading naturally to a wish
for "null is your default value" (or other, more exotic, strategies involving
revisiting the idea that every type has a default value). We've provided
'P.ref' for those use sites that *need* null, but haven't provided a complete
story for value types that want it to be *their* default value, too.
- Non-atomic heap updates can be used to create new instances that arbitrarily
combine previously-validated instances' fields. There is no guarantee that the
new combination of fields is semantically valid. Again, while there's precedent
for this with 'double' and 'long' (JLS 17.7), those are special cases that
don't generalize—any combination of double bit fields is *still a valid
double*. (This is usually described as "tearing", although JLS 17.6 has
something else in mind when it uses that word...) The language provides
'volatile' as a use-site opt-in to atomicity, and we've toyed with a
declaration-site opt-in as well. But object integrity being "off" by default
may not be ideal.
- Existing class types like LocalDate are both nullable and atomic. These are
useful properties to preserve during migration; nullability, in particular, is
essential for source compatibility. We've provided reference-default
declarations as a mechanism to make reference types (which have these
properties) the default, with 'P.val' as an opt-in to value types. But in doing
so we take away the many benefits of value types by default, and force new code
to work with the "bad name".
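
The first two concerns above can be simulated deterministically. This is a sketch under stated assumptions: a record plus a mutable holder stand in for a flattened value type, the names Span, FlatSlot, and IntegrityDemo are hypothetical, and the "mid-update" state is staged by hand rather than produced by a real data race.

```java
// Simulation only: Span, FlatSlot, and IntegrityDemo are hypothetical names.
// A record plus a mutable holder stand in for a flattened value type.

// Invariant: start < end. Note that the all-zero pair (0, 0) violates it.
record Span(int start, int end) {
    Span {
        if (start >= end) {
            throw new IllegalArgumentException(
                "need start < end, got " + start + ", " + end);
        }
    }
}

// Models flattened storage whose two fields are written independently,
// the way a non-atomic heap update might write them.
class FlatSlot {
    int start; // both fields begin at zero, like a fresh array element
    int end;

    void writeStart(Span s) { start = s.start(); }
    void writeEnd(Span s)   { end = s.end(); }

    // Would the current contents pass Span's constructor check?
    boolean holdsValidSpan() { return start < end; }
}

class IntegrityDemo {
    public static void main(String[] args) {
        FlatSlot slot = new FlatSlot();
        // The all-zero default never went through the constructor.
        System.out.println("all-zero default valid? " + slot.holdsValidSpan());

        Span a = new Span(1, 2);
        slot.writeStart(a);
        slot.writeEnd(a);
        System.out.println("after storing a valid? " + slot.holdsValidSpan());

        Span b = new Span(10, 20);
        slot.writeStart(b);
        // A reader arriving here would observe start=10, end=2: a mix of
        // two validated instances that is not itself valid.
        System.out.println("mid-update valid? " + slot.holdsValidSpan());

        slot.writeEnd(b);
        System.out.println("after storing b valid? " + slot.holdsValidSpan());
    }
}
```

Both bad states arise without the constructor ever being asked, which is why validation alone cannot protect this kind of class.
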
While we can provide enough knobs to accommodate all of these special cases,
we're left with a complex user model which asks class authors to make n
different choices whose consequences they may not immediately grasp, and class
users to keep 2^n different categories straight in their heads.
As an alternative, we've been exploring whether a simpler model is workable. It
is becoming clear that there are (at least) two clusters of uses for value
types. The "classic" value types are like numerics -- they'll happily accept
any combination of field values as a valid instance, and the zero value is a
sensible (often the best possible) default value. They make relatively little
use of encapsulation. These are the ones that best "work like an int." The
"encapsulated" value types are those that are more like typical aggregates
("codes like a class") -- their constructors do important validation work, and
only certain combinations of fields are allowed to represent valid instances.
These are more likely to lack a valid zero value (and hence want to be
nullable).
Some questions to consider for this approach:
- How do we group features into clusters so that they meet the sweet spot of
user expectations and use cases while minimizing complexity? Is two clusters
the right number? Is two already too many? (And what do we call them? What
keywords best convey the intended intuitions?)
- If there are knobs within the clusters, what are the right defaults? E.g.,
should atomicity be opt-in or opt-out?
- What are the performance costs (or, in the other direction, performance
gains) associated with each feature? For certain feature combinations, have we
canceled out the performance gains over identity classes (and at that point, is
that combination even worth supporting)?