Valhalla basic concepts / terminology
brian.goetz at oracle.com
Fri May 22 19:36:42 UTC 2020
> * There are two kinds of objects/instances; the notions "object" and
> "instance" apply equally to both kinds. These are "inline objects"
> and "identity objects". Statements like "it's an instance, so that
> means it's on the heap" and "you can lock on any object" become
> invalid, but statements like "42 is an instance of `int`" /are/ valid.
From a pedagogical perspective, it's not clear whether we are better
off framing it as a partitioning (there are two kinds, red and blue) or
as a special property (in addition to their state, some objects have a
special hidden property: their identity).
We have been going down the former path, but I am starting to think the
latter path is more helpful; rather than cleaving the world of objects
in two, instead highlight how some (many!) objects are "special".
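To make the "hidden property" framing concrete, here is a minimal sketch in today's Java, where every class instance is an identity object. The names `Point` and `IdentityDemo` are made up for illustration and nothing here uses Valhalla syntax; it only shows that two objects can agree on all their state and still differ in identity.

```java
import java.util.Objects;

class IdentityDemo {
    // True when a and b have equal state but are distinct objects.
    static boolean sameStateDifferentIdentity(Point a, Point b) {
        return a.equals(b) && a != b;
    }

    public static void main(String[] args) {
        Point a = new Point(1, 2);
        Point b = new Point(1, 2);
        System.out.println(a.equals(b)); // prints true: same state
        System.out.println(a == b);      // prints false: different identity
    }
}

// Hypothetical example class; in today's Java every instance has identity.
final class Point {
    final int x;
    final int y;

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    // State-based equality: compares only the fields.
    @Override
    public boolean equals(Object o) {
        return o instanceof Point
                && ((Point) o).x == x
                && ((Point) o).y == y;
    }

    @Override
    public int hashCode() {
        return Objects.hash(x, y);
    }
}
```

Under the "special property" framing, an inline `Point` would be the same class minus the hidden extra: only the state-based comparison would remain meaningful.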
> o (do we intend to use the term "object", or use the term
> "instance", or define the two differently somehow?)
To the extent we can avoid redefining these things, I think it is easier
to just leave these terms in place.
> * Identity objects get the benefits of identity, at the cost that
> you may only store /references/ to them. They will ~always go onto
> the heap (modulo invisible vm tricks).
Yes. Again, pedagogically, I am not sure whether the heap association
is helpful or hurtful; on the one hand, C programmers will understand
the notion of "pointer to heap node", but on the other, this is focusing
on implementation details, not concepts.
One of the oddest things here is that you can have references to all
objects, but can only pass/store inline objects directly -- it's like a
2x2 matrix with one corner blacked out.
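The 2x2 matrix can be written down as a tiny model. This is only a sketch of the rule as stated above; the enum and method names (`ObjectKind`, `Storage`, `allowed`) are invented for illustration and do not correspond to any real API.

```java
class StorageMatrix {
    // The two kinds of objects discussed in this thread.
    enum ObjectKind { IDENTITY, INLINE }

    // The two ways a variable could hold an object.
    enum Storage { BY_REFERENCE, DIRECT }

    // The blacked-out corner: an identity object cannot be stored directly.
    static boolean allowed(ObjectKind kind, Storage storage) {
        return !(kind == ObjectKind.IDENTITY && storage == Storage.DIRECT);
    }

    public static void main(String[] args) {
        for (ObjectKind kind : ObjectKind.values()) {
            for (Storage storage : Storage.values()) {
                System.out.println(kind + " / " + storage + " -> " + allowed(kind, storage));
            }
        }
    }
}
```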
> * Inline objects forgo the benefits of identity to give you the
> /option/ to store either a reference to a heap object or the data
> itself inline.
> o (Users choose by e.g. writing either `Foo.val` or `Foo.ref`,
> though one would be the default)
Yes. It is worth noting here that we would like for the actual
incidence of `.ref` and `.val` in real code to be almost negligible.
Maurizio likens them to "raw types", in the sense that we need them to
complete the type system, and there are cases where they are
unavoidable, but the other 99.9% of the time, you just say "Point".
> * We can also sort concrete classes into just two groups: "inline
> classes" and "identity classes", each of which begets only its own
> kind of objects/instances.
Yes. All the instances of a class C are either identity objects, or
inline objects.
> We don't say "value types" anymore because the term "value" (as in
> "value set") applies to /all/ types.
Yes. The appeal of "value" comes from "pass by value", but there is too
much baggage associated with the word value.
The choice of inline is not perfect; it's a strange word to most people,
but it comes with the intuition that an inline object's layout will be
"inlined" into containing objects/arrays. But, it doesn't mean that its
methods will always be inlined (though that is more likely as they are
final and the VM will have sharp type information.)
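The "inlined layout" intuition can be approximated by hand in today's Java. The sketch below contrasts an array of references to `Point` objects with a manually flattened `int[]` that stores the same fields interleaved. The class and method names are hypothetical, and the hand-rolled flat array is only an analogy for what the VM might do with an inline class; it is not a description of the actual layout strategy.

```java
class FlatteningSketch {
    // Identity-style element: a Point[] holds references to these objects.
    static final class Point {
        final int x;
        final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }
    }

    // Array of references: each element points at a separate Point object.
    static long sumXByReference(Point[] points) {
        long sum = 0;
        for (Point p : points) {
            sum += p.x;
        }
        return sum;
    }

    // Hand-flattened layout: fields stored interleaved as [x0, y0, x1, y1, ...].
    static int[] flatten(Point[] points) {
        int[] flat = new int[points.length * 2];
        for (int i = 0; i < points.length; i++) {
            flat[2 * i] = points[i].x;
            flat[2 * i + 1] = points[i].y;
        }
        return flat;
    }

    // Same sum, but reading the x fields straight out of the flat array.
    static long sumXFlat(int[] flat) {
        long sum = 0;
        for (int i = 0; i < flat.length; i += 2) {
            sum += flat[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        Point[] points = { new Point(1, 10), new Point(2, 20), new Point(3, 30) };
        int[] flat = flatten(points);
        System.out.println(sumXByReference(points) == sumXFlat(flat));
    }
}
```

The point of the analogy is that the flat version has no per-element object to point at, which is exactly why such elements cannot carry identity.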
> * A concrete /class/ is either an "identity class" or an "inline
> class". But a compile-time /type/ is distinguished not by "inline
> vs identity" but by "inline vs /reference/".
Yeah, this is the other hard one. In fact, it took us years to realize
that the key distinction is not reference vs primitive/inline, but
_identity_ vs inline.
Here's the scorecard:
Object is a reference type.
For an identity or abstract class C, C is a reference type.
For an interface I, I is a reference type.
For an inline class V, V is an inline type.
Primitives are inline types.
A reference type always holds a reference to an object (which might be
inline or identity), or null.
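The scorecard above reads naturally as a lookup from "what kind of declaration is this" to "what kind of type does its plain name denote". Here is a sketch of that lookup; the enums and the `kindOf` method are invented for illustration, and the model deliberately ignores explicit `.ref` / `.val` projections.

```java
class Scorecard {
    // Hypothetical categories of declarations, one per scorecard line.
    enum Declaration { OBJECT, IDENTITY_CLASS, ABSTRACT_CLASS, INTERFACE, INLINE_CLASS, PRIMITIVE }

    // The two kinds of compile-time type the scorecard distinguishes.
    enum TypeKind { REFERENCE_TYPE, INLINE_TYPE }

    // Maps each declaration category to the kind of type its plain name denotes.
    static TypeKind kindOf(Declaration declaration) {
        switch (declaration) {
            case INLINE_CLASS:
            case PRIMITIVE:
                return TypeKind.INLINE_TYPE;
            default:
                // Object, identity classes, abstract classes, interfaces.
                return TypeKind.REFERENCE_TYPE;
        }
    }

    public static void main(String[] args) {
        for (Declaration declaration : Declaration.values()) {
            System.out.println(declaration + " -> " + kindOf(declaration));
        }
    }
}
```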
> o must hold a "reference" (or null)
> + Condition: the type (or, for a type variable, its bound):
> is neither an interface nor "almost-interface"; or is a
> subtype of IdentityObject; or is an inline class that
> specifies ref-default; or bears an explicit `.ref`.
> + this is probably what the term "reference type" needs to
> apply to now. For example it is currently "reference
> types" that my nullness analysis project is concerned with
> and I think it would remain that way.
> + key: it's always a reference to an instance (well, unless
> it's null), but that might be either kind of instance.
> o must hold an inline object
> + Condition: it's a subtype of InlineObject (perhaps by
> being an `inline class` itself that is a val-default....
> or by being primitive?); or bears an explicit `.val`.
> + this is probably what the term "inline /type/" should
> refer to.
> o might hold either?
> + or can this not happen because you would be forced to
> write `.ref/.val`?
I'm not sure I follow what this section is asking?
> * Primitive types /are/ inline classes, full stop.
> o It's just that for compatibility reasons they get to have
> custom-built reference projections instead of only the
> general-purpose `Foo.val` treatment.
That's where we hope to get, but we will have to break a few eggs to get
there.
Egg #1: synchronization on wrappers. Today, you can (but should not)
synchronize on a j.l.Integer; to achieve this goal, this would throw.
Egg #1a: possibly, depending on where we land for weak refs, a similar
thing will happen for WeakReference<Integer>.
Egg #2: equality. Today, equality on wrappers is identity based, and
the primitive cache makes some small wrappers == to each other; to get
to this goal, == would actually be equality on the contained number.
This is arguably better, but different from how it works now.
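For reference, here is what Egg #1 and Egg #2 look like in today's Java. This is a sketch of current behavior only; the class and method names are made up, and recent javac versions are expected to warn about the `synchronized` block. Whether two boxed values outside the range -128..127 are `==` is not guaranteed either way, so the code prints that result without promising it.

```java
class WrapperEggs {
    // Egg #1: today you can take the monitor of a boxed Integer.
    static boolean canLockOn(Integer boxed) {
        synchronized (boxed) {
            return Thread.holdsLock(boxed);
        }
    }

    // Egg #2: == on wrappers compares references today, not the numbers.
    static boolean sameReference(Integer a, Integer b) {
        return a == b;
    }

    public static void main(String[] args) {
        Integer small1 = 127, small2 = 127; // boxing must cache -128..127
        Integer big1 = 1000, big2 = 1000;   // caching here is implementation-dependent

        System.out.println(sameReference(small1, small2)); // prints true (cached)
        System.out.println(sameReference(big1, big2));     // depends on the cache
        System.out.println(big1.equals(big2));             // prints true (same number)
        System.out.println(canLockOn(big1));               // prints true today
    }
}
```

If wrappers became inline classes as described, the first three lines would all agree, and the last call would throw instead of returning.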