Thanks for bringing up .equals() and the possibility of banning == on values; both have been touched on in the past, but we've not really focused on them.  It also helped me clarify why I think there's really only one answer here.

As to equals(), having == be a substitutability test does not make “equals()” obsolete — far from it.  The existing analogue of == vs .equals() with reference types holds (pretty much exactly) for values when == is a substitutability check.

For example, consider a class like

    value class StringWrapper { String s; }

The substitutability test here would be to compute (w, u) -> (w.s == u.s).  This _is_ the "are they exactly the same value" substitutability test, and something we should be able to express.  But it is not the implementation we'd often want for equals -- we'd likely want to delegate to String::equals:

    value class StringWrapper {
        String s;

        public boolean equals(Object o) { return  o instanceof StringWrapper sw && s.equals(sw.s); }
    }

So == continues to mean "are you exactly the same thing", and .equals() continues to mean "are you logically the same thing", as with refs today.  And as with refs, the former is a sensible starting point for the latter (sometimes it's good enough), but we may want to refine it to allow physically different things to be treated as logically the same.  (The difference is really just a quantitative one; there are just more values than refs (e.g., Complex) for which `==` is already the right answer for `equals()`.)  So while the substitutability test is quantitatively _closer_ to what equals() is likely to be, it's still not always going to be the same (and it's usually going to be simpler).  And just as we support both now, for reasons, we probably will still want to do so.
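To make the two questions concrete in today's Java, here is a minimal sketch.  Value classes aren't available in released Java, so an ordinary final class stands in for StringWrapper, and the static helper `substitutable` is a hypothetical stand-in for what `==` would compute under the substitutability interpretation; neither is a proposal for how it would actually be spelled.

```java
import java.util.Objects;

// Ordinary (identity) class standing in for the value class above.
final class StringWrapper {
    final String s;

    StringWrapper(String s) { this.s = s; }

    // Hypothetical helper: models `==` under the substitutability
    // interpretation by comparing the fields with `==`.
    static boolean substitutable(StringWrapper w, StringWrapper u) {
        return w.s == u.s;
    }

    // Logical equality: delegates to String::equals (null-safe).
    @Override
    public boolean equals(Object o) {
        return o instanceof StringWrapper sw && Objects.equals(s, sw.s);
    }

    @Override
    public int hashCode() { return Objects.hashCode(s); }
}

final class WrapperDemo {
    public static void main(String[] args) {
        String shared = "hello";
        String copy = new String("hello");   // same contents, distinct object

        StringWrapper a = new StringWrapper(shared);
        StringWrapper b = new StringWrapper(shared);
        StringWrapper c = new StringWrapper(copy);

        System.out.println(StringWrapper.substitutable(a, b)); // true
        System.out.println(StringWrapper.substitutable(a, c)); // false
        System.out.println(a.equals(c));                       // true
    }
}
```

The third line is the refinement: `a` and `c` are physically different but logically the same.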

Coming back to our choices, there are four possible interpretations of ==v so far:

 - The LW1 interpretation is "== means identity, and values have no identity, so == always says false"
 - The substitutability interpretation is whether the two operands have no observable differences
 - The third is: upcall to .equals()
 - The fourth is: Don't allow it at all.

IMO the first is considerably worse than useless; the user is allowed to ask a harmless and familiar question, and is guaranteed to get a surprising answer.  If that's the case, don't let them ask at all.  (And you agree, but, see below.)

The second interpretation is a sound generalization of `==` on refs and primitives; it means "are you exactly the same thing", which can be given a precise linguistic meaning, and can be further refined with logical equality tests where desired.  The main objection is that it is more expensive for the VM to implement, and the cost model has a broader variance.  (These are not nothing, I just don't think they trump intuitive semantics or compatibility.)
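As a rough illustration of that "precise meaning", the recursive definition quoted further down this thread can be sketched in today's Java.  This is only a model under loud assumptions: records stand in for values (real records have identity), reflection stands in for what the VM would do, and the names `Subst`, `Complex`, and `Named` are made up for the example.

```java
import java.lang.reflect.RecordComponent;

// Hypothetical stand-ins for value types.
record Complex(double re, double im) {}
record Named(String name, Complex z) {}

final class Subst {
    // Models the substitutability predicate:
    //  - refs: same object identity
    //  - primitives: equal, with NaN handled as in Double::equals
    //  - "values" (records here): same type, and every field substitutable
    static boolean substitutable(Object a, Object b) {
        if (a == b) return true;                        // same ref, or both null
        if (a == null || b == null) return false;
        if (a.getClass() != b.getClass()) return false;
        if (!a.getClass().isRecord()) return false;     // two distinct identity objects
        try {
            for (RecordComponent rc : a.getClass().getRecordComponents()) {
                Object x = rc.getAccessor().invoke(a);
                Object y = rc.getAccessor().invoke(b);
                // Primitive fields arrive boxed; the wrapper's equals() gives the
                // NaN-aware comparison. Everything else recurses.
                boolean same = rc.getType().isPrimitive() ? x.equals(y) : substitutable(x, y);
                if (!same) return false;
            }
            return true;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(substitutable(new Complex(1, 2), new Complex(1, 2)));   // true
        System.out.println(substitutable(new Complex(Double.NaN, 0),
                                         new Complex(Double.NaN, 0)));             // true
        System.out.println(substitutable(new Named("z", new Complex(1, 2)),
                                         new Named(new String("z"), new Complex(1, 2)))); // false
    }
}
```

The last case is false because the two `name` fields are different String objects, which is exactly where a class author might choose to refine equals().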

If the second interpretation gives VM engineers fits, the third one is even worse, as it means upcalling to arbitrary Java code from the ACMP bytecode.  It also gives me fits, because it tries to go back 25 years and rewrite what == means for objects.  (And it probably gives you fits, because you've commented frequently on how often equals() implementations are wrong.)

The fourth answer, ban it, is surely better than the first, but let's pull on that string.

Let's say `==` is meaningless on values, so we ban it.  But, just as with the first, we have a problem for code that trucks in Object (including erased generics).  If this code uses == to compare user-provided values to a user-provided sentinel, this code will /just stop working/.  And even if its authors are willing to rewrite it, there's now no convenient and reliable way to write code that tests "are you the same thing I saw before".

One of the subtle (but ultimately, good, I think) things about L-world is that /values are Objects/.  That means, if you take an Object parameter (or a T, for erased generics), someone can pass you a value, and your code should still work.  (If it does still work, we've achieved (yet again) that elusive form of /forward compatibility/ -- code that was written before the language had feature X, can deal perfectly well with X.)

OK, so if we ban == on values, should we ban it on generic code too?  That's the sound choice, since we're quantifying over types for which == may not be defined.  But that's neither source- nor binary-compatible with existing code.  Which means, at least to keep this code working, we still have to assign a meaning to == on T when one or both of the operands are a value.  (Of which there are three choices so far, detailed above.)  So for existing sources and binaries, we should give ==T a meaning, otherwise this code breaks.  Now, what about Object?  There exists plenty of code that accepts Object and uses == on it.  So we have to continue to assign a meaning to ==Object too.  Again, we have three choices.  And if we can assign a sound meaning for ==T and ==Object, which works when you pass a value in, why not use that for ==v too?

So my claim is: banning it is effectively impossible; we at least have to pick one of the other interpretations to fall back on for existing sources and binaries, and if we're going to do that, we should just do that.
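For concreteness, this is the kind of existing generic code I have in mind; a minimal sketch, with `SentinelScan` and `countUntil` being names invented for the example.  It uses == on an erased T to ask "are you the sentinel I was handed", and it is already compiled and out in the world.

```java
import java.util.ArrayList;
import java.util.List;

final class SentinelScan {
    // Counts the elements that come before the caller's sentinel.
    // The `==` deliberately asks "is this the very thing I was given?",
    // not "is this logically equal to it?".
    static <T> int countUntil(List<T> items, T sentinel) {
        int n = 0;
        for (T item : items) {
            if (item == sentinel) break;
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String end = new String("END");   // the sentinel instance
        List<String> items = new ArrayList<>();
        items.add("a");
        items.add(new String("END"));     // equals() the sentinel, but is not it
        items.add(end);
        items.add("b");
        System.out.println(countUntil(items, end)); // 2
    }
}
```

Whatever we decide, `item == sentinel` in this method has to keep meaning something sensible when T turns out to be a value; rewriting it with equals() would change the answer here from 2 to 1.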


Upleveling....   your concern about "mental database" is a valid one, and one I've been worried about too.  This is why I've been on a search-and-destroy mission to eliminate gratuitous asymmetries between values and references as we bring them closer together in the type system.  (I don't want people who write code that trucks in Object, or erased T, to have to be writing two versions of their code, one for refs and one for values, or even to be thinking much about the differences.)  On that score (downleveling again):

 - The "false" interpretation means that you can ask == of values, but the question is meaningless.  That means, if you are ever to be exposed to values, you have the following bad choices: give up on discriminating between values, or do something different for values and refs, or just use equals() all the time.  If these are your only options, that's pretty terrible.

 - The "ban it" interpretation is similar; you don't get to ask the question, so you're stuck with doing one thing for refs and one for values, or always using .equals().  It also seems impractical; we will end up reinventing one of the other solutions for compatibility reasons only.

 - The "call up to Java" interpretation means that the treatment of == on refs and on values is about as different as you can possibly get!  Again, this means people will end up either using .equals() all the time, or writing different code for refs and values.

 - The substitutability test is the only interpretation that is consistent with existing understanding and coding idioms, and which will "just work" when values start getting injected into code that was compiled years ago, takes Object / erased T, and has no conception of values.  It is the only version that doesn't require people to rewrite their code when values start showing up in their HashMap, or to constantly ask themselves "is this instance a value or a ref?"


The distinction between == and .equals() in Java may have its problems, but it's how Object works, and people have learned idioms that work for it.  Preserving that intuition, and that code, seems to me to be the highest priority.  Option 4 feels to me like a wishful attempt to try and go back and fix history, which is a worthy goal but we've all watched enough science fiction to know how that ends.




On 2/22/2019 3:38 PM, Brian Goetz wrote:

On Feb 22, 2019, at 2:42 PM, Kevin Bourrillion <kev...@google.com> wrote:

Fair point that `==` has always been the test of /absolute/ substitutability. But I think this is overlooking something big: People implement equals() in order to ask for "substitutability for virtually all intents and purposes". Of course, most code should never be going anywhere near identity hash maps or synchronizing on value-like things, etc. And that means that equals() has become the substitutability test that people WANT.

This in turn means that every usage of `==` on a non-primitive type (named class) is always suspicious. As a reader and maintainer of code, I need to think about this carefully. Is it a Class<?> -- if so == is harmless but also .equals() is harmless and it's not worth switching idioms. Is it an enum type? I have to go look it up to find out, in which case it is once again both harmless and pointless (especially if I can replace with switch!). Barring those, then it's either a risky micro-optimization or some other bizarre coding choice that I need to be very careful around.

I think we should make users write `equals` to test value types. If they write `==`, they are indicating a special situation where they need identity semantics, which don't make sense for value types, and that should be an error.

One of the concerns I've always had about value types is that developers would be forced to maintain a mental database of which types are value types and which are reference types, and that they could not hope to assess the correctness of code they read or write without having that. In a world where users commonly need to do "absolutely substitutable" checks, then this proposal would be the way to achieve that. But, I don't think that's the world we're in.

Thoughts?



On Thu, Feb 21, 2019 at 9:59 AM Brian Goetz <brian.go...@oracle.com> wrote:

    More on substitutability and why it is desirable...

    > #### Equality
    >
    > Now we need to define equality.  The terminology is messy, as so many
    > of the terms we might want to use (object, value, instance) already
    > have associations. For now, we'll describe a _substitutability_
    > predicate on two instances:
    >
    >    - Two refs are substitutable if they refer to the same object
    >      identity.
    >    - Two primitives are substitutable if they are `==` (modulo special
    >      pleading for `NaN` -- see `Float::equals` and `Double::equals`).
    >    - Two values `a` and `b` are substitutable if they are of the same
    >      type, and for each of the fields `f` of that type, `a.f` and `b.f`
    >      are substitutable.
    >
    > We then say that for any two objects, `a == b` iff a and b are
    > substitutable.

    Currently, our type system has refs and primitives, and the == predicate
    applies to all of them.  And for all the types we have today (with the
    almost-too-small-to-mention anomaly of NaN), == *already is* a
    substitutability predicate (where substitutability means, informally:
    "no observable difference between the two arguments").  Two refs are
    substitutable if they refer to the same object identity; two primitives
    are substitutable if they refer to the same value (modulo NaN).

    VM engineers like to refer to `==` on refs as "identity equality", but
    that's really an implementation detail.  What it really means is: are
    the two things the same.  And that's what `==` means for primitives too,
    and that's how the other 99.99% of users think of it too.

    The natural interpretation of `==` in a world with values is to extend
    this "are these two things the same" to values too.  The
    substitutability relation above applies the same "are you the same"
    logic equally to refs, values, and primitives.  No sharp edges (except
    the NaNsense that we are already stuck with).




--
Kevin Bourrillion | Java Librarian | Google, Inc. | kev...@google.com

