value type hygiene
daniel.smith at oracle.com
Wed May 9 23:46:07 UTC 2018
> On May 6, 2018, at 3:17 AM, John Rose <john.r.rose at oracle.com> wrote:
> Like many of us, I have been thinking about the problems of keeping values, nulls,
> and objects separate in L-world. I wrote up some long-ish notes on the subject.
> I hope it will help us wrap our arms around the problem, and get it solved.
> TL;DR: Remi was right in January. We need a ValueTypes attribute.
So I've been digesting this for a few days. I don't like it much. Subtle contextual dependencies are a good recipe for exploits and general confusion. If it were the only way forward, okay, but I find myself frequently thinking, "yeah, but... Q types!"
The way you've framed the problem has evolved from the original idea. Which is fine, but it's helpful to review: the idea was to make a choice between two type hierarchies, U-world and L-world:
The crux of the choice was: in what way do value types interact with legacy bytecode? Does the old code reject values, or does it get automatically enhanced to work with them?
We acknowledged that, in the latter hierarchy, we must push many operations into the top, which minimizes the need for 'R' and 'Q', perhaps so much that they can be elided entirely. You said in a November write-up:
"The Q-type syntax is *maybe* needed, but in any case does not appear in a parallel position of importance with the dominant L-type syntax."
In other words, working exclusively with L types wasn't a requirement, it was a might-be-nice.
So we set out on an experiment to see how far we could get without 'R' and 'Q'. My read of the current situation is that we've probably stretched that to the breaking point, so: good experiment, we've learned some things, and we now understand what 'Q' types buy us.
Another read is that we're not ready to end the experiment yet, we have a few tricks up our sleeves, and we can force this to work. That's fair, but I'm not convinced we need to force it. Leaving descriptors unchanged is not a hard requirement.
(To be clear about my preferred alternative: we introduce Q types as first-class types (applicable to value classes only), update the descriptor syntax, assert QFoo <: LFoo, and ask compilers to use Qs when they want to guarantee non-nullability and allow flattenability. Compilers generate bridge methods (and bridge fields?) where needed/if desired.)
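As a rough illustration of the relationship sketched in that parenthetical, here is a toy Java model, not JVM code. The names QTypeSketch, Kind, TypeRef, and isSubtype are hypothetical, and the model deliberately ignores the class hierarchy: it only captures "QFoo <: LFoo for the same class" and "Q means non-nullable".

```java
public final class QTypeSketch {
    // Hypothetical: the two descriptor flavors discussed in this thread.
    enum Kind { L, Q }

    // A type reference is a flavor plus a class name, e.g. Q + "Foo" -> "QFoo;".
    record TypeRef(Kind kind, String className) {
        String descriptor() { return kind.name() + className + ";"; }
        // Assumption from the proposal: only the L flavor admits null.
        boolean nullable() { return kind == Kind.L; }
    }

    // Toy subtype check: same class name, and either the same flavor or Q -> L.
    // Ordinary superclass/interface subtyping is intentionally left out.
    static boolean isSubtype(TypeRef sub, TypeRef sup) {
        if (!sub.className().equals(sup.className())) {
            return false;
        }
        return sub.kind() == sup.kind()
                || (sub.kind() == Kind.Q && sup.kind() == Kind.L);
    }

    public static void main(String[] args) {
        TypeRef q = new TypeRef(Kind.Q, "Foo");
        TypeRef l = new TypeRef(Kind.L, "Foo");
        System.out.println(q.descriptor() + " <: " + l.descriptor() + " ? " + isSubtype(q, l));
        System.out.println(l.descriptor() + " <: " + q.descriptor() + " ? " + isSubtype(l, q));
    }
}
```

In this model a compiler that wants non-nullability and flattenability would pick the Q flavor, and a bridge would be needed wherever an L-flavored value flows to a Q-flavored use, since that direction is not a subtype conversion.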
You talk a little about why it's nice to avoid changing descriptors:
"L-world is backward compatible with tools that must parse classfile descriptors, since it leaves descriptor syntax unchanged. There have been no changes to this syntax in almost thirty years, and there is a huge volume of code that depends on its stability. The HotSpot JVM itself makes hundreds of distinct decisions based on descriptor syntax which would need careful review and testing if they were to be adapted to take account of a new descriptor type ("QFoo;", etc.)."
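To make the tooling cost in that quote concrete, here is a minimal sketch of the kind of decision a descriptor-parsing tool makes on the leading character of a field descriptor. DescriptorSketch, Kind, and the acceptQ flag are hypothetical names for illustration; the 'Q' tag is the one proposed in this thread, not part of the existing classfile format, and the check looks only at the first character rather than validating the whole descriptor.

```java
public final class DescriptorSketch {
    enum Kind { PRIMITIVE, ARRAY, REFERENCE, VALUE }

    // Classify a field descriptor by its first character.
    // With acceptQ == false this behaves like a legacy tool and rejects "QFoo;".
    static Kind classify(String descriptor, boolean acceptQ) {
        if (descriptor.isEmpty()) {
            throw new IllegalArgumentException("empty descriptor");
        }
        char tag = descriptor.charAt(0);
        switch (tag) {
            case 'B': case 'C': case 'D': case 'F':
            case 'I': case 'J': case 'S': case 'Z':
                return Kind.PRIMITIVE;
            case '[':
                return Kind.ARRAY;
            case 'L':
                return Kind.REFERENCE;
            case 'Q':
                // Hypothetical new tag: only tools updated for it accept it.
                if (acceptQ) {
                    return Kind.VALUE;
                }
                // fall through to the error for legacy behavior
            default:
                throw new IllegalArgumentException("unknown descriptor tag: " + tag);
        }
    }

    public static void main(String[] args) {
        System.out.println(classify("LFoo;", false));
        System.out.println(classify("QFoo;", true));
    }
}
```

Every switch of this shape in existing tools is a place that would need the extra case, which is the review-and-testing burden the quote describes.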
Okay, put that in the "pro" column for "Should we leave descriptors untouched?" In the "con" column is all the weird new complexity in this proposal. Notably:
- The mess of overloading and implicit adaptations. Huge complexity cost here, from spec to implementation to debugging. We've been there before, and have always thrown up our hands and retreated (not always for the same reasons, but still).
- The JVM "knows" internally about the two kinds of types, but we won't give users the ability to directly express them, or inspect them with reflection. That mismatch seems bound to bite us repeatedly.
- We talk a lot about nullability being a migration problem, but it is sometimes just a really nice feature! All things being equal, not being able to freely talk about nullable value types is limiting.
I'd rather spend the feature budget on getting dusty code to work with shiny new descriptors than on dealing with these problems/compromises.
I guess that, before going all in on this approach, it would be helpful for me to see a more complete exploration of the relative costs.