----- Original Message -----
> From: "Dan Heidinga" <heidi...@redhat.com>
> To: "daniel smith" <daniel.sm...@oracle.com>
> Cc: "valhalla-spec-experts" <valhalla-spec-experts@openjdk.java.net>
> Sent: Thursday, February 24, 2022 4:39:52 PM
> Subject: Re: Evolving instance creation

> Repeating what I said in the EG meeting:
>
> * "new" carries the mental model of allocating space. For identity
> objects, that's on the heap. For values, that may just be stack space
> / registers. But it indicates that some kind of allocation / demand
> for new storage has occurred.
>
> * It's important that "new" returns a unique instance. That invariant
> has existed since Java's inception and we should be careful about
> breaking it. In the case of values, two identical values can't be
> differentiated so I think we're safe to say they are unique but
> indistinguishable as no user program can differentiate them.

Yes, it's more about == being different than "new" being different.
"new" always creates a new instance, but in the case of value types, == does
not let us see whether the instances are different or not.

> The rest of this is more of a language design question than a VM one.
> The `Foo()` (without a new) is a good starting point for a canonical
> factory model. The challenge will be in expressing the difference
> between the factory method and the constructor as they need to be
> distinct items in the source (different invariants, different return
> values, etc)
>
> --Dan

Rémi

> On Tue, Feb 22, 2022 at 4:17 PM Dan Smith <daniel.sm...@oracle.com> wrote:
>>
>> One of the longstanding properties of class instance creation
>> expressions ('new Foo()') is that the instance being produced is
>> unique, that is, not '==' to any previously-created instance.
>>
>> Value classes will disrupt this invariant, because it's possible to
>> "create" an instance of a value class that already exists:
>>
>>     new Point(1, 2) == new Point(1, 2) // always true
>>
>> A related, possibly-overlapping new Java feature idea (not concretely
>> proposed, but something the language might want in the future) is the
>> declaration of canonical factory methods in a class, which
>> intentionally *don't* promise unique instances (for example, they
>> might implement interning).
>> These factories would be like constructors in that they wouldn't have
>> a unique method name, but otherwise would behave like ad hoc static
>> factory methods: take some arguments, use them to create/locate an
>> appropriate instance, return it.
>>
>> I want to focus here on the usage of class instance creation
>> expressions, and how to approach changes to their semantics. This
>> involves balancing the needs of programmers who depend on the unique
>> instance invariant with those who don't care and would prefer fewer
>> knobs/less complexity.
>>
>> Here are three approaches that I could imagine pursuing:
>>
>> (1) Value classes are a special case for 'new Foo()'
>>
>> This is the plan of record: the unique instance invariant continues to
>> hold for 'new Foo()' where Foo is an identity class, but if Foo is a
>> value class, you might get an existing instance.
>>
>> In bytecode, the translation of 'new Foo()' depends on the kind of
>> class (as determined at compile time). Identity class creation
>> continues to be implemented via
>> 'new Foo; dup; invokespecial Foo.<init>()V'. Value class creation
>> occurs via 'invokestatic Foo.<newvalue>()LFoo;' (method name
>> bikeshedding tk). There is no compatibility between the two (e.g., if
>> an identity class becomes a value class).
>>
>> In a way, it shouldn't be surprising that a value class doesn't
>> guarantee unique instances, because uniqueness is closely tied to
>> identity. So special-casing 'new Foo()' isn't that different from
>> special-casing 'Object.equals': in the absence of identity, we'll do
>> something reasonable, but not quite the same.
>>
>> Factories don't enter into this story at all. If we end up having
>> unnamed factories in the future, they will be declared and invoked
>> with a separate syntax, and will be declarable both by identity
>> classes and value classes.
>> (Value class factories don't seem particularly compelling, but they
>> could, say, be used to smooth migration, like 'Integer.valueOf'.)
>>
>> Biggest concerns: for now, it can be surprising that 'new' doesn't
>> always give you a unique instance. In a future with factories,
>> navigating between the 'new' syntax and the factory invocation syntax
>> may be burdensome, with style wars about which approach is better.
>>
>> (2) 'new Foo()' as a general-purpose creation tool
>>
>> In this approach, 'new Foo()' is the use-site syntax for *both*
>> factory and constructor invocation. Factories and constructors live in
>> the same overload resolution "namespace", and all will be considered
>> by the use site.
>>
>> In bytecode, the preferred translation of 'new Foo()' is
>> 'invokestatic Foo.<new>()LFoo;'. Note that this is the case for both
>> value classes *and identity classes*. For compatibility,
>> 'new/dup/<init>' also needs to be supported for now; eventually, it
>> might be deprecated. Refactoring between constructors and factories is
>> generally compatible.
>>
>> Because this re-interpretation of 'new Foo()' supports factories,
>> there is no unique instance invariant. At best, particular classes can
>> document that they produce unique instances, and clients who need this
>> behavior should ensure they're working with classes that promise it.
>> (It's not as simple as looking for a *current* factory, because
>> constructors can be refactored to factories.)
>>
>> For developers who don't care about unique instances, this is the
>> simplest approach: whenever you want an instance of Foo, you say
>> 'new Foo()'.
>>
>> Biggest concerns: we've demoted an ironclad semantic guarantee to an
>> optional property of some classes. For those developers/use cases who
>> care about the unique instance invariant, that may be difficult,
>> especially because we're undoing a longstanding property rather than
>> designing it this way from the beginning.
>>
>> (3) 'new Foo()' for unique instances and just 'Foo()' otherwise
>>
>> Here, the 'new' keyword is reserved for cases in which a unique
>> instance is guaranteed. For value class creation, factory invocation,
>> and constructor invocation when unique instances don't matter, a bare
>> 'Foo()' call is used instead. 'new Point()' would be an error: this
>> syntax doesn't work with value classes.
>>
>> In bytecode, 'new Foo()' always compiles to 'new/dup/<init>', while
>> plain 'Foo()' typically compiles to 'invokestatic Foo.<make>()LFoo;'
>> (method name bikeshedding tk). For compatibility, plain 'Foo()' would
>> support 'new/dup/<init>' invocations as well, if that's all the class
>> provides. Refactoring between constructors and factories is generally
>> compatible for plain 'Foo()' use sites, but not 'new Foo()' use sites.
>>
>> The plain 'Foo()' would become the preferred style for general-purpose
>> usage, while 'new Foo()' would (eventually, after a long migration
>> period) signal an interest in the unique instance guarantee. Java code
>> written with the updated style is a little lighter on "ceremony".
>>
>> Biggest concerns: a somewhat arbitrary shift in coding style for all
>> programmers to learn, which at a minimum must be adopted when working
>> with value classes.
>>
>> ---
>>
>> What are your thoughts about the significance of the unique instance
>> invariant? Is it important enough to design instance creation syntax
>> around it? Do either (2) or (3) above sound like a better destination
>> than the plan of record?
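
To make the invariant under discussion concrete, here is a minimal sketch in today's Java (no value classes). It contrasts 'new' on an identity class, which always yields a fresh instance, with a factory that may hand back an existing one. The class `Label` and its `of` method are hypothetical names invented for illustration; they are not part of any proposal in this thread. The `Integer.valueOf` check relies on its documented caching of values from -128 to 127.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class InstanceCreationDemo {

    // Hypothetical identity class, used only for illustration.
    static final class Label {
        private static final Map<String, Label> CACHE = new ConcurrentHashMap<>();
        final String text;

        // Ordinary constructor: every 'new Label(...)' allocates a distinct object.
        Label(String text) {
            this.text = text;
        }

        // Ad hoc named static factory that interns by text, so it may
        // return an instance that already exists.
        static Label of(String text) {
            return CACHE.computeIfAbsent(text, Label::new);
        }
    }

    // 'new' on an identity class: the two results are never '=='.
    static boolean constructorGivesUniqueInstances() {
        return new Label("a") != new Label("a");
    }

    // Interning factory: equal arguments yield the same instance.
    static boolean factoryMayReuseInstances() {
        return Label.of("a") == Label.of("a");
    }

    // Integer.valueOf is documented to cache values in the range -128..127.
    static boolean integerValueOfReusesSmallValues() {
        return Integer.valueOf(1) == Integer.valueOf(1);
    }

    public static void main(String[] args) {
        System.out.println("new gives unique instances: " + constructorGivesUniqueInstances());
        System.out.println("factory may reuse instances: " + factoryMayReuseInstances());
        System.out.println("Integer.valueOf reuses small values: " + integerValueOfReusesSmallValues());
    }
}
```

Under approach (2), a use site written as 'new Label("a")' could end up bound to something like the factory above after a refactoring, which is why the uniqueness guarantee would become a per-class documentation matter rather than a language-level one.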