Equality for values -- new analysis, same conclusion

Brian Goetz brian.goetz at oracle.com
Thu Aug 29 15:43:31 UTC 2019

> Braindump thoughts went as follows:
> LocalDate.Inline
> LocalDate.Val
> VLocalDate
> LocalDateV
> LDate
> LocDate

Sounds about right.  The first two have a big advantage over the rest, in that they are suitable for mechanical derivation by language-level type operations (it would be reasonable for a language to know that the “box” type of V is a class called “V.Box”, but it would be kind of unreasonable to burn in a naming convention like BoxOfV.)  

> Do migrated inline classes have different names to highlight the
> migration?

I see where you’re going: colonize the lower-case name syntax for values.  We’ve had this discussion before, and set it aside then, but there’s new information now, so it’s fair to dust it off and reexamine it.  The last 36 times someone suggested “let’s just spell value types with lower case letters, like primitives”, we said “way, way too early to have that discussion”, in part because making that syntax choice would have thrown the possibility of compatible in-place migration out the window (the tail wagging the dog).  (This is one of many reasons why we don’t commit to syntax too early, or even allow it to take up too much airtime; it constrains the possible feature space in ways that are hard to even see.)  And at the time, we were quite interested in supporting a compatible migration where Optional “just” becomes a value type, and invested a great deal of work in understanding the costs of making that possible, since this would be the ideal outcome for users: existing code just gets faster.  (And, as comments earlier in this thread suggest, you were not thrilled that we were unable to succeed on that quest, because it means coming up with new names for a bunch of perfectly good types with perfectly good names.)

But there’s new information: we seem to be converging on a scheme where the “box” is a separate nominal type, which means that this suggestion now becomes _possible_.  (It still has challenges, and it’s still kind of early to discuss it, because there are still some serious object-model issues we really should work out first.  But it’s a good reminder not to forget it.)

> Highlighting migration seems like a bad idea in the long
> term.

I know what you mean, and agree, but let me restate it in a less “absolutist” fashion.

Migration compatibility is essential; flag days are often fatal.  And in any migration, there are three phases: old world, transition, and new world.  Ideally, we would like to arrive at the situation where the new, idiomatic code in the new world isn’t stuck with a crappy, migration-induced syntax forever, and any migration-induced compromises fade out as the transition phase comes to an end.  (This is exactly why, for example, in L2 you can’t say `Foo<V>`; you have to say `Foo<nullable projection of V>`, because when Foo becomes specialized, we want the default to be “specialized Foo of V”, without requiring extra syntax that says “specialize me please.”  We’d rather the pain accrue to the transition phase than to the new-world phase.)

Sometimes this is possible; sometimes it is not.  We work hard to minimize the collateral damage when it is not.  

> Specifically, in the long term it seems like the best name
> should go to the inline class, not the box. This seems to rule out
> `LocalDate.Inline` as surely we don't want to refer to `Ratio.Inline`
> or `Long128.Inline` everywhere.

You’ve just mixed two cases, though.  For new classes, like Ratio and Long128, we have no compatibility constraint with previous utterances of the class name; for existing classes like LocalDate, we do.  Yes, it would be nice to have the “polarity” be the same in both cases, but that would-be-nice runs smack into the other would-be-nice of the above principle.  

> For a naive user, this behaviour of == is deeply unhelpful and

(10 yard penalty for gratuitous overuse of “deeply”)

> What if inline classes are divided between "pure" and "impure" (better
> names needed). A "pure" inline class has (1) a sensible default value,
> not null/exception (2) a correct == implementation that always works
> as would be expected of a primitive type (3) no reference type fields.
> An "impure" inline class may contain reference type fields, may have
> no meaningful default value and should never be compared using == in
> normal code.

Yes, you’ve put your finger on the essential discomfort here, which is that when values have identity in the transitive closure of their state, much hilarity ensues.  And some values are well-behaved in the sense that they are “values all the way down” (VATWD).  (FYI, for a detailed exploration of the values-all-the-way-down imperative in a Java-like language, see the LiMe language: https://researcher.watson.ibm.com/researcher/files/us-bacon/Huang08Liquid.pdf )  For values like Complex, Point, Rational, etc., there is no identity anywhere in the transitive closure of the state, so there are no places where `==` and `equals()` would disagree.  (However, I would point out that not all such VATWD classes necessarily have great default values.)
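To make the `==`-versus-`equals()` point concrete, here is a minimal sketch in today’s Java (16+). Inline classes can’t be written yet, so records stand in for value classes, and the helper `substitutable` is hypothetical: it models `==` on a value as a component-by-component comparison (primitives by value, nested records recursively, any other reference by identity). That is an assumption for illustration, not the specified Valhalla semantics. For types whose state is values all the way down the two comparisons agree; as soon as a `String` sits in the state, they can disagree.

```java
import java.lang.reflect.RecordComponent;

public class ValueEqualityDemo {

    // Stand-ins for "values all the way down" types: only primitive state.
    record Point(int x, int y) {}
    record Segment(Point from, Point to) {}

    // Stand-in for an identity-polluted value: a tuple holding a String.
    record Tagged(String tag, int n) {}

    /**
     * Hypothetical model of `==` on values: compare component by component,
     * primitives by value, nested records recursively, any other reference by identity.
     */
    static boolean substitutable(Record a, Record b) {
        if (a.getClass() != b.getClass()) return false;
        try {
            for (RecordComponent c : a.getClass().getRecordComponents()) {
                Object x = c.getAccessor().invoke(a);
                Object y = c.getAccessor().invoke(b);
                if (c.getType().isPrimitive()) {
                    if (!x.equals(y)) return false;              // boxed wrappers compare by value
                } else if (x instanceof Record rx && y instanceof Record ry) {
                    if (!substitutable(rx, ry)) return false;    // recurse into nested values
                } else if (x != y) {
                    return false;                                // identity comparison
                }
            }
            return true;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Point p = new Point(1, 2), q = new Point(1, 2);
        System.out.println(substitutable(p, q) + " " + p.equals(q));   // true true

        Segment s = new Segment(new Point(0, 0), new Point(3, 4));
        Segment t = new Segment(new Point(0, 0), new Point(3, 4));
        System.out.println(substitutable(s, t) + " " + s.equals(t));   // true true

        // Equal contents, but two distinct String objects in the state.
        Tagged u = new Tagged(new String("hi"), 1);
        Tagged v = new Tagged(new String("hi"), 1);
        System.out.println(substitutable(u, v) + " " + u.equals(v));   // false true
    }
}
```

The last line is the tuple-with-a-String case: structurally equal, yet not substitutable, because the identity of the `String` leaks into the comparison.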

Early on, it was asked “why not constrain value types to be values all the way down?”, and about 30 ms later, someone realized “that means you couldn’t have tuples with String in them”, so clearly restricting values to only the “well-behaved” ones is a non-starter.

As you point out, another point in the spectrum is to split over well-behavedness, and mark them in the type system (“identity-polluted value class Foo”), so the compiler could warn when you try to (say) use `==` on an identity-polluted value, but I don’t think anyone would really thank us for this either.  I get the appeal of splitting here, but I’m skeptical that the return on specificity carries its weight.  
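For a sense of what such a split would require, here is a rough sketch of the check a compiler or lint would have to perform before warning on `==`: walk the transitive closure of a type’s state and flag anything identity-bearing. Again, records stand in for value classes (Java 16+), and the names `valuesAllTheWayDown` and `lintEquality` are hypothetical. Non-record reference types (String, Object, arrays, enums) are treated conservatively as possibly carrying identity, and a self-referential record is treated as polluted.

```java
import java.lang.reflect.RecordComponent;
import java.util.HashSet;
import java.util.Set;

public class IdentityPollutionCheck {

    record Complex(double re, double im) {}
    record Line(Complex from, Complex to) {}
    record Pair(String first, int second) {}
    record Node(int value, Node next) {}

    /** True if no identity-bearing type appears in the transitive closure of the state. */
    static boolean valuesAllTheWayDown(Class<?> type) {
        return check(type, new HashSet<>());
    }

    private static boolean check(Class<?> type, Set<Class<?>> inProgress) {
        if (type.isPrimitive()) return true;        // int, double, ...: no identity
        if (!type.isRecord()) return false;         // String, Object, arrays, ...: may carry identity
        if (!inProgress.add(type)) return false;    // self-referential: conservatively polluted
        for (RecordComponent c : type.getRecordComponents()) {
            if (!check(c.getType(), inProgress)) return false;
        }
        inProgress.remove(type);
        return true;
    }

    /** What a hypothetical lint might report for `a == b` on values of this type. */
    static String lintEquality(Class<?> type) {
        return valuesAllTheWayDown(type)
                ? type.getSimpleName() + ": no identity in its state"
                : type.getSimpleName() + ": identity-polluted, warn on ==";
    }

    public static void main(String[] args) {
        for (Class<?> c : new Class<?>[] {Complex.class, Line.class, Pair.class, Node.class}) {
            System.out.println(lintEquality(c));
        }
    }
}
```

The check itself is cheap; the cost the paragraph above worries about is elsewhere, in making users carry the "polluted or not" distinction around in the type system.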

> Then we say that "pure" inline classes are named using lowerCamelCase,
> and "impure" inline classes using UpperCamelCase.

That’s an interesting twist on where I thought you were going, which was “let’s just use lower case for all value classes”: you’re suggesting marking the distinction between VATWD and non-VATWD values via naming convention rather than language semantics.  (Besides the fact that I think lumping is better than splitting here, I think it’s pretty questionable for the language to make such inferences based on a naming convention.)  But “use lower case for values” is workable, and connects to another concern that has been raised in the past: managing the cognitive load of asking users to carry around a mental database of which classes have identity and which do not.  So it may well be time to take this idea back off the shelf where we placed it many years ago, and give it another look.

> Thus,`localDate` is the inline class for the `LocalDate` box (assuming
> agreement on a suitable default value).

But, setting aside the point immediately above, and as pointed out way above, now that we’ve given up on compatible in-place migration, this again becomes a possibility.  (Again, as mentioned above, I don’t want to dive any deeper on it now, because we have more important object-model concerns, such as “what is the nature of the conversion from localDate to LocalDate: subtyping, boxing, widening, something else?”, which again we should nail down first.)

> The overall user model is simpler I think. All types that are
> lowerCamelCase behave like primitives wrt == and null, and all types
> that are UpperCamelCase behave like reference types wrt == and null.

I get where you’re going, but I’m not sure it is spending the complexity budget where it gets the most return.  Implicitly, this proposal says “the biggest problem with values is the equality semantics, so let’s spend all our syntax budget highlighting the difference.”  But I am not convinced that’s our biggest problem, and if it’s not, this is an awfully big chunk of syntax budget to spend on it.
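For reference, today’s Java already has exactly one lowercase/uppercase pair with the split the proposal describes: `int` and `Integer`. This minimal sketch shows how they differ with respect to `==` and null. It says nothing about Valhalla syntax, and the method names are hypothetical.

```java
public class PrimitiveVsBox {

    /** Primitive ==: always compares values. */
    static boolean primitiveEq(int a, int b) { return a == b; }

    /** Box ==: compares references, so it can disagree with equals(). */
    static boolean boxedSameRef(Integer a, Integer b) { return a == b; }

    /** Unboxing a null box fails: the null-hostility of the primitive leaks through. */
    static boolean unboxingNullThrows(Integer box) {
        try {
            int unused = box;   // implicit box.intValue()
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(primitiveEq(1000, 1000));    // true
        Integer x = 1000, y = 1000;
        System.out.println(x.equals(y));                // true
        // Depends on the Integer cache (configurable), so not guaranteed either way.
        System.out.println(boxedSameRef(x, y));
        System.out.println(unboxingNullThrows(null));   // true
    }
}
```

Users have learned this one pair; the proposal would ask them to apply the same rule, keyed on the case of the first letter, to every value class.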

Good thoughts!


More information about the valhalla-dev mailing list