What's in a CONSTANT_Class?

John Rose john.r.rose at oracle.com
Fri Jun 16 02:18:31 UTC 2017

On Jun 15, 2017, at 3:09 PM, forax at univ-mlv.fr wrote:
> (BTW, neither you nor Karen did answer to my mail asking why we need UFoo; )

I'm working on a manifesto about this.  Short answer is Q-Foo and L-Foo need to
be disjoint types, so that interpreter processing of L-Foo is undisturbed and Q-Foo
can use non-heap buffering.  But, some language features require a *union*
of Q-Foo and L-Foo.  That could be Q-MaybeRef<Foo> but it is so fundamental
to translation strategies that it seems to merit a new type-kind.  It can be the
final one, since it is a disjoint union containing all values of the other type-kinds
(L,Q,I,J,F,D).  To soften the blow, we can then align Q and U type-kinds to use
exactly the same representation, so that in practice verified Q-values are
carried using the same format as verified U-values (of the same class) but
with a little less exercise of the power of the carrier (refs and nulls don't appear).
The motivating uses of U-types are any-kinded type parameters and interfaces
which are implemented by values.  Once accepted, there are many serendipitous
uses of U-types that arise, including use cases where you want "Q-Foo or null"
(that's U-Foo) or "Foo and I don't want to know if it was boxed or not" (U-Foo again).
Finally, putting U-types in the heap, if we get that far, gives us frozen arrays
and frozen objects "for free".

Trust me, we spent many months trying to find a way to implement type-vars
and interfaces without U-types and the alternatives were all worse.  (Always
box for interface calls, but no heisenboxes?  No.  Always box any-type vars?
Also no.  Always specialize any-generic code at bytecode level?  No, no, no.
Use U-types for operands in generic algos and interface defaults?  Yes!)

> I agree that we may need a tree of constants but only if the interpreter needs that in order to interpret the code.
> In my opinion, we should only use a tree of constants if it makes sense for the interpreter, otherwise, the constant should be flattened as a String.

There are other reasons to avoid strings, notably better type checking.
Also footprint, if the complexity exponent goes above 2 (as I already argued).

>> You can patch this with repeat operators, but the natural format
>> is a tree, which represents all subparts uniformly, rather than some
>> as a defining use, and others as repeated uses.
>
> I fully agree. Specializing the code should not require to patch constants of the constant pool.
> The patchable content should be represented by an index inside a tree and the interpreter should maintain an array (in fact two arrays because you have method parameter types and class parameter types) of the corresponding type arguments.

(Another manifesto I'm working on!)  IMO, enhancing constant pools so they can
be smoothly parameterized is Job One for VM support of extended generics.
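
A minimal sketch of the "index inside a tree" idea quoted above, assuming a
toy model in which type arguments are plain strings.  Every name here
(TypeTreeSketch, Node, classParam, methodParam, apply) is hypothetical and
comes from neither the JVM specification nor any Valhalla prototype.  The
point it illustrates is that one shared tree can be resolved against
different argument arrays without patching any constant.

```java
import java.util.List;

// Hypothetical model: a constant is a tree whose holes are indexes
// into two argument arrays (class-level and method-level).
final class TypeTreeSketch {
    interface Node {
        String resolve(List<String> classArgs, List<String> methodArgs);
    }

    // A leaf naming a concrete type.
    static Node leaf(String name) {
        return (c, m) -> name;
    }

    // A hole filled from the class-level type arguments.
    static Node classParam(int index) {
        return (c, m) -> c.get(index);
    }

    // A hole filled from the method-level type arguments.
    static Node methodParam(int index) {
        return (c, m) -> m.get(index);
    }

    // An interior node: a generic class applied to argument subtrees.
    // Rendering to a string is only a way to show the result here.
    static Node apply(String name, Node... args) {
        return (c, m) -> {
            StringBuilder sb = new StringBuilder(name).append('<');
            for (int i = 0; i < args.length; i++) {
                if (i > 0) sb.append(',');
                sb.append(args[i].resolve(c, m));
            }
            return sb.append('>').toString();
        };
    }
}
```

The tree itself never changes; only the two lists passed to resolve differ
from one specialization to the next.
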
>> Actually, for the proposed extension, you look at the *first* character to see
>> if it is a ';'.  It's a different place (already existing) in the system where you
>> check to see whether the name is of the form Foo or "LFoo;", and strip
>> the decorations in the latter case.  You *could* get away with Class["QFoo;"]
>> but I don't recommend it, because it's a little harder to decode for both
>> human readers and parsers.
>
> i do not understand why ?

We read a prefix first so it sets a context, and then we read the rest.
We human readers are used to this.  It's like backslash superquoting.
If you don't know what you are reading until you read the end, then
you have to read it twice.

For a computer, the flag is at foo[0] rather than foo[foo.length-1], which
is a little less complex.  And putting the flag at the beginning allows the
computer to stream over the text, choosing a parser up front, and then
like the human reader it can "read the rest" with confidence.  Streaming
over small strings like this has no performance advantage, but streaming
code is easier to understand and reason about, hence less buggy.

There's also my prejudice, to be frank.  I've always been annoyed by the
obtuseness of the check sym[0]=='L' && sym[sym.length-1]==';'.  That
sort of oddity breeds bugs, compared to a left-to-right parse.
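
The contrast can be sketched as follows, assuming the leading-';'
convention discussed in this thread (for example ";QFoo;") and ignoring
array names entirely.  ClassNameSketch and its methods are hypothetical
names for illustration, not VM code.

```java
// Hypothetical decoder for CONSTANT_Class name strings.
final class ClassNameSketch {
    final char kind;    // 'L' or 'Q'
    final String name;  // bare class name, e.g. "Foo"

    ClassNameSketch(char kind, String name) {
        this.kind = kind;
        this.name = name;
    }

    // Left-to-right parse: the first character selects the parser.
    static ClassNameSketch parse(String s) {
        if (s.isEmpty()) {
            throw new IllegalArgumentException("empty name");
        }
        if (s.charAt(0) != ';') {
            // Legacy form: a bare name such as "Foo" (arrays not handled here).
            return new ClassNameSketch('L', s);
        }
        // Extended form: ';', a kind letter, the name, a closing ';'.
        if (s.length() < 4) {
            throw new IllegalArgumentException("too short: " + s);
        }
        char kind = s.charAt(1);
        if (kind != 'L' && kind != 'Q') {
            throw new IllegalArgumentException("unknown kind: " + kind);
        }
        int end = s.indexOf(';', 2);  // scan forward to the terminator
        if (end != s.length() - 1) {
            throw new IllegalArgumentException("bad terminator: " + s);
        }
        return new ClassNameSketch(kind, s.substring(2, end));
    }

    // The both-ends check, shown only for comparison.
    static boolean looksLikeLDescriptor(String sym) {
        return sym.length() >= 2
            && sym.charAt(0) == 'L'
            && sym.charAt(sym.length() - 1) == ';';
    }
}
```

In parse, the decision about which form is being read is made at index 0
and never revisited; looksLikeLDescriptor has to consult both ends before
it knows anything.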

>> If I understand what you are saying, that's not MVT at all, since it
>> would force a revolution in tools.  So we won't do that.  It's overwhelmingly
>> likely that legacy uses of CONSTANT_Class will coexist with new
>> CP forms for multiple releases, even if this gives up the advantages
>> of normal forms.
>
> yes, it's post MVT and given there will be other changes in the constant pool, tools will need to be updated so we can also mandate CONSTANT_Class to use only the descriptor format at that time.

Those mandates are harder to pull off than they look beforehand.
(Who knew modules would be so excruciatingly hard to "mandate"?)
I think we'll have to do a gentle introduction with peaceful coexistence
for at least a couple years.

>> [In the crystal ball]
>> Beyond MVT, the CONSTANT_Class[";QFoo;"] wants to become either
>> a Pack200 style thing like this:
>>   CONSTANT_Type[format="Q.;", args={ClassFile["Foo"]}]
>> or (preferentially) a Dan-tree-like thing like this:
>>   CONSTANT_ClassType[mode=Q, class=ClassFile["Foo"]]
>
> i disagree, i think we should limit ourselves to use tree of constants only when we need type substitution,
> Constant_Class("QFoo;") gets you enough information, and if the format is normalized to be always descriptor, you can easily disambiguate using the first character.

The repetition of the string "Foo" in two constants is a failure of normalization.
This is not just theoretical:  If we have one ClassFile["Foo"] in the whole CP
for getting at "Foo.class", then we have a clear (non-buggy) algorithm for
making queries about the contents of "Foo.class".  This is something the
VM team has asked for, if I'm not mistaken.  (Karen?)

So, more-lighter trees are not just for the interpreter, not just for type variables,
but have a number of properties which make them (on balance) better than
fewer-heavier trees, or just strings.
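
A minimal sketch of the normalization point, assuming a toy constant pool
that interns its entries.  PoolSketch and its methods are hypothetical;
the entry spellings only echo the ClassFile[...] and ClassType[...]
notation used in this thread and are not a real class-file format.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical pool: each distinct entry gets exactly one slot.
final class PoolSketch {
    private final List<String> entries = new ArrayList<>();
    private final Map<String, Integer> index = new HashMap<>();

    // String keys are only a shortcut for entry equality in this sketch.
    private int intern(String entry) {
        Integer existing = index.get(entry);
        if (existing != null) {
            return existing;
        }
        entries.add(entry);
        int slot = entries.size() - 1;
        index.put(entry, slot);
        return slot;
    }

    // The single place where the class name is spelled out.
    int classFile(String name) {
        return intern("ClassFile[" + name + "]");
    }

    // A mode ('L' or 'Q') plus a reference to the shared ClassFile slot.
    int classType(char mode, String name) {
        int cf = classFile(name);
        return intern("ClassType[mode=" + mode + ",class=#" + cf + "]");
    }

    int size() {
        return entries.size();
    }
}
```

With this shape, asking for both the Q and the L view of Foo yields three
entries, and the name "Foo" appears in only one of them, so any query about
the contents of Foo.class has one obvious slot to start from.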

— John
