Evolving CONSTANT_Class

Brian Goetz brian.goetz at oracle.com
Mon Jun 15 19:28:39 UTC 2020

> Here's a table listing all the type-flavored uses (where "X" means 
> "allowed here" and "~" means "maybe not essential, but the semantics 
> would be clear"):

More specifically, in the first two columns X means "allowed now", and 
in the later columns, X means "proposed."  Note too that the proposed 
Species column is identical to the proposed Class name column.

The primitive column is interesting: we are probably going to 
translate all of these away to some sort of `Qint` type when they appear 
in these places, so in the JVM they are probably not needed.

> Another way to handle it is to distinguish between a *species*, which 
> is a class-like entity, and a *species type*. It's helpful to remember 
> that there may be inline types of species (that is, a "Q envelope" of 
> a species).

I think this is a fruitful direction; I can have `ArrayList[T] extends 
List[T]` where it is a class-like use, and I can have `Foo[T].x` where 
it is a type-like use.

> 1) Treat everything in the class/interface table as a degenerate use 
> of a type. A class name is always interpreted as an L type.

Given that a specializable class Foo<T> gives rise to species Foo[x] and 
Foo[y], _and_ a class type Foo such that Foo[t] <: Foo for all t, the 
duality between class and type here seems inevitable.
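
As a toy illustration of that duality (a sketch only; `ClassType` and 
`Species` are names I am making up here, not proposed JVM constructs), 
one class type can sit above many species, each of which is a subtype of 
it while remaining distinct from its siblings:

```java
import java.util.List;

// Toy model only: ClassType and Species are hypothetical names,
// not real or proposed JVM constructs.
class SpeciesDemo {
    // The unparameterized class type, e.g. Foo.
    record ClassType(String name) {}

    // One species of a specializable class, e.g. Foo[x].
    record Species(ClassType base, List<String> typeArgs) {
        // Models Foo[t] <: Foo: a species is a subtype of its own class type.
        boolean isSubtypeOf(ClassType type) {
            return base.equals(type);
        }

        @Override
        public String toString() {
            return base.name() + "[" + String.join(",", typeArgs) + "]";
        }
    }

    public static void main(String[] args) {
        ClassType foo = new ClassType("Foo");
        Species fooX = new Species(foo, List.of("x"));
        Species fooY = new Species(foo, List.of("y"));
        System.out.println(fooX + " <: Foo? " + fooX.isSubtypeOf(foo));
        System.out.println(fooY + " <: Foo? " + fooY.isSubtypeOf(foo));
        System.out.println("same species? " + fooX.equals(fooY));
    }
}
```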

> - When a Class constant is viewed as a type (for (1) that's always, 
> for (2) that's for type-flavored references), the implicit L envelope 
> is a historical wart. Do we also support explicit L descriptors? Do we 
> try to migrate the world away from the implicit envelopes?

I would love to migrate away, but I suspect the cost/benefit isn't 
there.  Historical warts are OK.

> - Should we add primitive types? How are they spelled? (The standard 
> descriptor syntax for primitives is already interpreted as a bare 
> class name.)

Given the way we are thinking for translation, where there is going to 
be some Q type that stands in for primitives when used in class-y 
contexts (if for no other reason than the double-slot thing), I don't 
think this is needed.
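
For anyone who wants to see the double-slot thing concretely, here is a 
small sketch using the existing `java.lang.invoke.MethodType` API, which 
reports how many argument slots a descriptor needs (`long` and `double` 
take two each).  The class name `SlotDemo` and the helper 
`parameterSlots` are just for illustration:

```java
import java.lang.invoke.MethodType;

class SlotDemo {
    // Number of argument slots needed to invoke a method with this descriptor.
    static int parameterSlots(String descriptor) {
        MethodType mt = MethodType.fromMethodDescriptorString(
                descriptor, SlotDemo.class.getClassLoader());
        return mt.parameterSlotCount();
    }

    public static void main(String[] args) {
        System.out.println(parameterSlots("(II)V")); // two ints
        System.out.println(parameterSlots("(IJ)V")); // int plus long
        System.out.println(parameterSlots("(DJ)V")); // double plus long
    }
}
```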

> - How do we handle type variables, both top-level and nested? Either 
> we embed constant pool pointers in Utf8 entries (yuck!), or we need to 
> extend Class constants to support references both to Utf8 entries and 
> to [some new thing].

This is the stringy-vs-tree problem we've been wrestling with for a long 
time.  The solution to this problem seems to hinge on the solution to 
that one.
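
To make the "tree" side of that concrete, here is a rough sketch, not a 
proposal: a descriptor modeled as a tree whose leaves may be holes that 
point at a constant pool index rather than inline text.  All of the names 
(`Node`, `Named`, `Hole`, `render`) and the index 7 are invented for the 
example; the bracketed output syntax follows the "LList[QVal;];" 
spelling used elsewhere in this thread.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical tree encoding of a parameterized descriptor; none of these
// names correspond to real class-file structures.
class DescriptorTreeDemo {
    interface Node {
        // Render to a flat descriptor string, resolving holes via bindings.
        String render(Map<Integer, String> bindings);
    }

    // A concrete type with an envelope ('L' or 'Q') and optional arguments.
    record Named(char envelope, String name, List<Node> args) implements Node {
        public String render(Map<Integer, String> bindings) {
            String rendered = args.stream()
                    .map(a -> a.render(bindings))
                    .collect(Collectors.joining());
            String suffix = args.isEmpty() ? "" : "[" + rendered + "]";
            return envelope + name + suffix + ";";
        }
    }

    // A hole: refers to a (pretend) constant pool index instead of text.
    record Hole(int cpIndex) implements Node {
        public String render(Map<Integer, String> bindings) {
            String bound = bindings.get(cpIndex);
            if (bound == null) {
                throw new IllegalStateException("unbound constant #" + cpIndex);
            }
            return bound;
        }
    }

    public static void main(String[] args) {
        // Tree form of "LList[$T];" where $T lives at pretend index 7.
        Node listOfT = new Named('L', "List", List.of(new Hole(7)));
        System.out.println(listOfT.render(Map.of(7, "QVal;")));
    }
}
```

The stringy alternative would have to smuggle the index 7 into the Utf8 
text itself, which is the "yuck" case from the quoted question.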

> - Should we revisit "naked" descriptor references, allowing them to 
> point to either bare Utf8 entries or Class constants and 
> MethodType/[something else] constants? Do we try to migrate the world 
> away from naked descriptor references?

I think this may well fall out of the "trees vs strings" discussion.

> I'm appealing here to a design principle that seems to have driven the 
> original constant pool design: Class constants are for things that get 
> resolved (and can be cached); descriptor strings are little more than 
> fancy names. This principle doesn't always get followed: the verifier 
> sometimes loads classes named by descriptors; array type class 
> constants resolve their element types without a separate entry; more 
> recently, StackMapTables use Class constants to represent types, and 
> MethodTypes resolve method descriptors "as if" there were class 
> constants for all of the parameter types. But I think these, 
> especially the recent ones, are mistakes, and I still think the 
> original notion is a useful separation of concerns that we should try 
> to follow in our design.

The tension that comes up here is that we want to be able to match 
descriptors between clients and declarations.  I don't want to invent 
one way to describe class constants for species, and another way to 
embed species in descriptors.

Now, it may be possible (depending on our translation strategy) that we 
don't need to embed species in descriptors, because we're just going to 
erase descriptors, and put the specialization information somewhere 
else, for the VM to use opportunistically.  That would make the 
splitting strategy more appealing.

> - For bare descriptors (type of a field), it's fine to use something 
> like "LList[QVal;];". Or maybe it's useful to describe descriptors in 
> terms of Class/Species constants. In any case, there's still a need to 
> figure out how to parameterize a descriptor with live constants 
> ("LList[$T];"), but I think this can be set aside as a separate 
> problem.

This is the one I'm alluding to above.

> So I think we need CONSTANT_SpecializedMethodref, which has 1) a 
> pointer to a Methodref constant, and 2) pointers to some resolvable 
> constants (typically, but maybe not exclusively, representing types). 
> (Caveat: there are some details about the interaction between type 
> arguments, overriding, and method resolution that I'm hand-waving 
> about. Maybe the encoding will be stacked a little differently.)

We've been around this merry-go-round a few times too, going back and 
forth between cramming stuff into the descriptor string and putting the 
method types somewhere else.  Again, the translation story (can we leave 
descriptors alone) impinges on this.

Don't forget that when you have a local generic class nested in a 
generic method, the method args implicitly parameterize the nested 
class.  Which means that when we refer to a species of the local class, 
we have to supply the type arguments for both the method and for the 
local class (and any other enclosing classes).  Again, there is a 
lump/split choice here; we can smoosh together the arguments, or provide 
a trail of witnesses to the enclosing arguments.  If we choose the 
latter, then it might be a mix of C_SMRef and C_Species.
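
A source-level example of the nesting in question, in today's Java (the 
names `LocalGenericDemo`, `pairUp`, and `Pair` are made up for 
illustration).  The local class `Pair<U>` silently depends on the 
method's `T`, so naming one particular species of `Pair` would need a 
binding for both:

```java
import java.util.List;

class LocalGenericDemo {
    // T belongs to the method; U belongs to the local class.
    static <T> List<Object> pairUp(T first, String second) {
        class Pair<U> {
            final T left;  // T is captured from the enclosing generic method
            final U right;

            Pair(T left, U right) {
                this.left = left;
                this.right = right;
            }

            List<Object> toList() {
                return List.of(left, right);
            }
        }
        // Pair<String> here is implicitly also parameterized by T.
        return new Pair<String>(first, second).toList();
    }

    public static void main(String[] args) {
        System.out.println(pairUp(42, "answer"));
    }
}
```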

More information about the valhalla-spec-observers mailing list