Re: Finding the spirit of L-World

Brian Goetz Fri, 22 Feb 2019 13:49:30 -0800

Thanks for bringing up .equals, and the possibility to ban == on values, both of which have been touched on in the past but we've not really focused much on these. It also helped me clarify why I think there's really only one answer here.

As to equals(), having == be a substitutability test does not make “equals()” obsolete; far from it. The existing analogue of == vs. equals() with reference types holds (pretty much exactly) for values when == is a substitutability check.


For example, consider a class like

    value class StringWrapper { String s; }

The substitutability test here would be to compute (w, u) -> (w.s == u.s); This _is_ the "are they exactly the same value" subst test, and something we should be able to express. But it is not the implementation we'd often want for equals -- we'd likely want to delegate to String::equals:


    value class StringWrapper {
        String s;

        public boolean equals(Object o) { return o instanceof StringWrapper sw && s.equals(sw.s); }
    }

So == (continues to) means "are you exactly the same thing"; .equals() (continues to) means "are you logically the same thing", as with refs today. And as with refs, the former is a sensible starting point for the latter (sometimes it's good enough), but we may want to refine it to allow physically different things to be treated as logically the same. (The difference is really just a quantitative one; there are just more values than refs (e.g., Complex) for which `==` is already the right answer for `equals()`.) So while the substitutability test is quantitatively _closer_ to what equals() is likely to be, it's still not always going to be the same (and it's usually going to be simpler). And just as we support both now, for reasons, we probably will still want to do so.
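To make the two notions concrete, here is a minimal sketch in today's Java. Value classes are not available in released Java, so an ordinary final class stands in for `StringWrapper`; the class name `StringWrapperSketch` and the static method `substitutable` are hypothetical and only model what `==` would compute under the substitutability interpretation.

```java
// Sketch only: a plain final class standing in for the hypothetical
// `value class StringWrapper`; `substitutable` models the proposed `==`.
final class StringWrapperSketch {
    final String s;

    StringWrapperSketch(String s) { this.s = s; }

    // "Are you exactly the same thing": compare the fields with ==,
    // so both wrappers must hold the very same String reference.
    static boolean substitutable(StringWrapperSketch w, StringWrapperSketch u) {
        return w.s == u.s;
    }

    // "Are you logically the same thing": delegate to String::equals,
    // so distinct String objects with equal characters compare equal.
    @Override
    public boolean equals(Object o) {
        return o instanceof StringWrapperSketch sw && s.equals(sw.s);
    }

    @Override
    public int hashCode() { return s.hashCode(); }

    public static void main(String[] args) {
        String a = new String("hi");
        String b = new String("hi");   // same characters, different object
        StringWrapperSketch w = new StringWrapperSketch(a);
        StringWrapperSketch u = new StringWrapperSketch(b);
        System.out.println(substitutable(w, u)); // false: w.s and u.s are different references
        System.out.println(w.equals(u));         // true: the strings are logically equal
    }
}
```

The substitutability check is the stricter of the two: whenever it holds here, `equals()` holds as well, but not the other way around.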

Coming back to our choices, there are four possible interpretations of ==v so far:

 - The LW1 interpretation is "== means identity, and values have no identity, so == always says false"
 - The substitutability interpretation is whether the two operands have no observable differences
 - The third is: upcall to .equals()
 - The fourth is: Don't allow it at all.

IMO the first is considerably worse than useless; the user is allowed to ask a harmless and familiar question, and is guaranteed to get a surprising answer. If that's the case, don't let them ask at all. (And you agree, but, see below.)

The second interpretation is a sound generalization of `==` on refs and primitives; it means "are you exactly the same thing", which can be given a precise linguistic meaning, and can be further refined with logical equality tests where desired. The main objection is that it is more expensive for the VM to implement, and the cost model has a broader variance. (These are not nothing, I just don't think they trump intuitive semantics or compatibility.)

If the second interpretation gives VM engineers fits, the third one is even worse, as it means upcalling to arbitrary Java code from the ACMP bytecode. It also gives me fits, because it tries to go back 25 years and rewrite what == means for objects. (And it probably gives you fits, because you've commented frequently on how often equals() implementations are wrong.)

The fourth answer, ban it, is surely better than the first, but let's pull on that string.

Let's say `==` is meaningless on values, so we ban it. But, just as with the first, we have a problem for code that trucks in Object (including erased generics). If this code uses == to compare user-provided values to user-provided sentinels, this code will /just stop working/. And even if they are willing to rewrite it, there's now no convenient and reliable way to write code that tests "are you the same thing I saw before".
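As an illustration of the kind of existing code at stake, here is a minimal sketch of the sentinel idiom on an erased type parameter. The class `SentinelScan` and the method `takeUntil` are hypothetical names, not taken from any library; the point is only that `item == endMarker` is compiled against erased T, long before the code could know whether a caller will pass a ref or a value.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that trucks only in erased T.
final class SentinelScan {
    // Collects elements up to, but not including, the caller's end marker.
    static <T> List<T> takeUntil(Iterable<T> items, T endMarker) {
        List<T> out = new ArrayList<>();
        for (T item : items) {
            // "Are you the same thing I was handed?" Deliberately ==, not equals():
            // an element that merely looks like the marker must not stop the scan.
            if (item == endMarker) {
                break;
            }
            out.add(item);
        }
        return out;
    }

    public static void main(String[] args) {
        String end = new String("END");
        String lookalike = new String("END"); // equals(end), but a different object
        List<String> items = List.of("a", lookalike, "b", end, "c");
        // The scan passes the lookalike and stops only at the marker itself.
        System.out.println(takeUntil(items, end).size());
    }
}
```

If `==` were banned or always false once T is instantiated with a value, this method would either stop compiling or never find its marker; under the substitutability interpretation it keeps its meaning unchanged.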

One of the subtle (but ultimately, good, I think) things about L-world is that /values are Objects/. That means, if you take an Object parameter (or a T, for erased generics), someone can pass you a value, and your code should still work. (If it does still work, we've achieved (yet again) that elusive form of /forward compatibility/ -- code that was written before the language had feature X, can deal perfectly well with X.)

OK, so if we ban == on values, should we ban it on generic code too? That's the sound choice, since we're quantifying over types for which == may not be defined. But that's neither source- nor binary-compatible with existing code. Which means, at least to keep this code working, we still have to assign a meaning to == on T when one or both of the operands are a value. (Of which there are three choices so far, detailed above.) So for existing sources and binaries, we should give ==T a meaning, otherwise this code breaks. Now, what about Object? There exists plenty of code that accepts Object and uses == on it. So we have to continue to assign a meaning to ==Object too. Again, we have three choices. And if we can assign a sound meaning for T== and Object==, which works when you pass a value in, why not use that for value== too?

So my claim is: banning it is effectively impossible; we at least have to pick one of the other interpretations to fall back on for existing sources and binaries, and if we're going to do that, we should just do that.

Upleveling.... your concern about "mental database" is a valid one, and one I've been worried about too. This is why I've been on a search-and-destroy mission to eliminate gratuitous asymmetries between values and references as we bring them closer together in the type system. (I don't want people who write code that trucks in Object, or erased T, to have to be writing two versions of their code, one for refs and one for values, or even to be thinking much about the differences.) On that score (downleveling again):

- The "false" interpretation means that you can ask == of values, but the question is meaningless. That means, if you are ever to be exposed to values, you have the following bad choices: give up on discriminating between values, or do something different for values and refs, or just use equals() all the time. If these are your only options, that's pretty terrible.

- The "ban it" interpretation is similar; you don't get to ask the question, so you're stuck with doing one thing for refs and one for values, or always using .equals(). It also seems impractical; we will end up reinventing one of the other solutions for compatibility reasons only.

- The "call up to Java" interpretation means that the treatment of == on refs and values is about as different as you can possibly get! Again, this means people will end up either using .equals() all the time, or writing different code for refs and values.

- /*The substitutability test is the only interpretation that is consistent with existing understanding and coding idioms, and which will "just work" when values start getting injected into code that was compiled years ago that takes Object / erased T and has no conception of values. It is the only version that doesn't require that people rewrite their code when values start showing up in their HashMap, or constantly ask themselves "is this instance a value or a ref."*/

The distinction between == and .equals() in Java may have its problems, but it's how Object works, and people have learned idioms that work for it. Preserving that intuition, and that code, seems to me to be the highest priority. Option 4 feels to me like a wishful attempt to try and go back and fix history, which is a worthy goal but we've all watched enough science fiction to know how that ends.





On 2/22/2019 3:38 PM, Brian Goetz wrote:

On Feb 22, 2019, at 2:42 PM, Kevin Bourrillion <kev...@google.com> wrote:
Fair point that `==` has always been the test of /absolute/ substitutability. But I think this is overlooking something big: People implement equals() in order to ask for "substitutability for virtually all intents and purposes". Of course, most code should never be going anywhere near identity hash maps or synchronizing on value-like things, etc. And that means that equals() has become the substitutability test that people WANT.
This in turn means that every usage of `==` on a non-primitive type (named class) is always suspicious. As a reader and maintainer of code, I need to think about this carefully. Is it a Class<?> -- if so == is harmless but also .equals() is harmless and it's not worth switching idioms. Is it an enum type? I have to go look it up to find out, in which case it is once again both harmless and pointless (especially if I can replace with switch!). Barring those, then it's either a risky micro-optimization or some other bizarre coding choice that I need to be very careful around.
I think we should make users write `equals` to test value types. If they write `==`, they are indicating a special situation where they need identity semantics, which don't make sense for value types, and that should be an error.
One of the concerns I've always had about value types is that developers would be forced to maintain a mental database of which types are value types and which are reference types, and that they could not hope to assess the correctness of code they read or write without having that. In a world where users commonly need to do "absolutely substitutable" checks, then this proposal would be the way to achieve that. But, I don't think that's the world we're in.
Thoughts?
On Thu, Feb 21, 2019 at 9:59 AM Brian Goetz <brian.go...@oracle.com> wrote:
    More on substitutability and why it is desirable...

    > #### Equality
    >
    > Now we need to define equality.  The terminology is messy, as so many
    > of the terms we might want to use (object, value, instance) already
    > have associations. For now, we'll describe a _substitutability_
    > predicate on two instances:
    >
    >    - Two refs are substitutable if they refer to the same object
    >      identity.
    >    - Two primitives are substitutable if they are `==` (modulo special
    >      pleading for `NaN` -- see `Float::equals` and `Double::equals`).
    >    - Two values `a` and `b` are substitutable if they are of the same
    >      type, and for each of the fields `f` of that type, `a.f` and `b.f`
    >      are substitutable.
    >
    > We then say that for any two objects, `a == b` iff a and b are
    > substitutable.

    Currently, our type system has refs and primitives, and the == predicate
    applies on all of them.  And for all the types we have today (with the
    almost-too-small-to-mention anomaly of NaN), == *already is* a
    substitutability predicate (where substitutability means, informally:
    "no observable difference between the two arguments").  Two refs are
    substitutable if they refer to the same object identity; two primitives
    are substitutable if they refer to the same value (modulo NaN).

    VM engineers like to refer to `==` on refs as "identity equality", but
    that's really an implementation detail.  What it really means is: are
    the two things the same.  And that's what `==` means for primitives too,
    and that's how the other 99.99% of users think of it too.

    The natural interpretation of `==` in a world with values is to extend
    this "are these two things the same" to values too.  The
    substitutability relation above applies the same "are you the same"
    logic equally to refs, values, and primitives.  No sharp edges (except
    the NaNsense that we are already stuck with).
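The field-wise rule quoted above, including the NaN caveat, can be sketched in today's Java. This is an illustration under stated assumptions, not the proposed VM semantics: `ComplexSketch` is a hypothetical stand-in for a value type, and comparing doubles via `Double.doubleToLongBits` (the comparison `Double::equals` uses) is one reading of the "special pleading for NaN".

```java
// Sketch only: models the quoted substitutability rule for a two-field value.
final class ComplexSketch {
    final double re;
    final double im;

    ComplexSketch(double re, double im) {
        this.re = re;
        this.im = im;
    }

    // Primitive case, assuming Double::equals-style comparison:
    // NaN matches NaN, and 0.0 is distinguished from -0.0.
    static boolean substitutable(double a, double b) {
        return Double.doubleToLongBits(a) == Double.doubleToLongBits(b);
    }

    // Value case: same type, and every field pairwise substitutable.
    static boolean substitutable(ComplexSketch a, ComplexSketch b) {
        return substitutable(a.re, b.re) && substitutable(a.im, b.im);
    }

    public static void main(String[] args) {
        System.out.println(Double.NaN == Double.NaN);              // false: the existing anomaly
        System.out.println(substitutable(Double.NaN, Double.NaN)); // true under this sketch
        System.out.println(substitutable(new ComplexSketch(1.0, Double.NaN),
                                         new ComplexSketch(1.0, Double.NaN)));
    }
}
```

With that choice, two complex numbers built from the same components are always substitutable, even when a component is NaN, which primitive `==` alone would not give.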




--
Kevin Bourrillion | Java Librarian | Google, Inc. | kev...@google.com
