Structural descriptors in the constant pool

Dan Smith daniel.smith at
Wed Dec 20 23:49:37 UTC 2017

As discussed at the meeting today, here's an outline of what a structural descriptor (pointer-based rather than string-based) in the constant pool might look like.

New constants representing types:

CONSTANT_PrimitiveType_info {
   u1 tag;
   u1 kind; // any valid 'atype' as specified for 'newarray', or T_VOID
}

CONSTANT_ArrayType_info {
   u1 tag;
   u2 component_type_index; // a type constant
}

(An alternative encoding of ArrayType would consist of an element type and a dimensions count. Putting each lower-dimension array type in its own constant potentially improves sharing, but adds overhead where the component types would otherwise go unused.)
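To make the sharing trade-off concrete, here is a minimal Python sketch. It assumes a toy in-memory constant pool that reuses an identical existing entry. `Pool`, `PrimitiveType`, `ArrayType` and `ArrayTypeDims` (the element-plus-dimensions alternative) are hypothetical stand-ins for class-file structures, and the primitive `kind` is modeled as a descriptor letter rather than the real u1 atype code.

```python
from dataclasses import dataclass

# Hypothetical stand-ins for the proposed constants (not a real class-file API).
@dataclass(frozen=True)
class PrimitiveType:
    kind: str                    # descriptor letter standing in for the u1 atype code

@dataclass(frozen=True)
class ArrayType:
    component_type_index: int    # chained encoding: one constant per dimension

@dataclass(frozen=True)
class ArrayTypeDims:
    element_type_index: int      # alternative encoding: element type + dimensions count
    dimensions: int

class Pool:
    """Toy constant pool that reuses an existing entry when one is identical."""
    def __init__(self):
        self.entries = []
        self._index = {}

    def add(self, const):
        if const not in self._index:
            self.entries.append(const)
            self._index[const] = len(self.entries)  # 1-based, as in class files
        return self._index[const]

def add_chained(pool, element, dims):
    idx = pool.add(element)
    for _ in range(dims):
        idx = pool.add(ArrayType(idx))  # each lower-dimension type gets a constant
    return idx

def add_flat(pool, element, dims):
    return pool.add(ArrayTypeDims(pool.add(element), dims))

int_t = PrimitiveType("I")
chained, flat = Pool(), Pool()
add_chained(chained, int_t, 3)   # int[][][]
add_flat(flat, int_t, 3)
print(len(chained.entries), len(flat.entries))   # 4 2
add_chained(chained, int_t, 1)   # int[] is already part of the chain
add_flat(flat, int_t, 1)         # needs a new entry
print(len(chained.entries), len(flat.entries))   # 4 3
```

The chained form pays for int[] and int[][] even when only int[][][] is mentioned, but gets them for free once they are used; the element-plus-count form is the reverse.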

A "type constant" is one of:
- Class
- ArrayType
- PrimitiveType

For historical reasons, a CONSTANT_Class may also refer to an array type, but this usage is discouraged.

New constant and attribute for method descriptors (in the spirit of NameAndType and BootstrapMethods, these are supplementary substructures of a MethodType):

CONSTANT_ParametersAndReturn_info {
   u1 tag;
   u2 return_type_index; // a type constant
   u2 parameter_types_attr_index; // an entry in TypeLists
}

TypeLists_attribute {
   u2 attribute_name_index; // Utf8 "TypeLists"
   u4 attribute_length;
   u2 num_type_lists;
   {   u2 num_types;
       u2 types[num_types]; // type constants
   } type_lists[num_type_lists];
}

(Two design goals constrain this solution: 1) re-use MethodType as the preferred representation of method descriptors, and 2) avoid introducing a new variable-length constant pool entry. Hence the two levels of indirection: one provides a non-Utf8 constant for MethodType to reference, and the other offloads the variable-length list to an attribute. We can drop these goals to reduce the indirection.)
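The two indirections can be followed mechanically to recover a conventional descriptor string. The Python sketch below assumes a dict-based pool, models a Utf8 entry as a plain `str`, treats `parameter_types_attr_index` as a 0-based index into the TypeLists attribute (the outline does not say), and represents the primitive `kind` by its descriptor letter. All class and function names are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical in-memory model; the pool is a dict from 1-based index to constant.
@dataclass(frozen=True)
class PrimitiveType:
    kind: str                        # descriptor letter ("I", "V", ...) for the u1 kind

@dataclass(frozen=True)
class ClassConst:
    name: str                        # internal name, e.g. "java/lang/String"

@dataclass(frozen=True)
class ArrayType:
    component_type_index: int

@dataclass(frozen=True)
class ParametersAndReturn:
    return_type_index: int
    parameter_types_attr_index: int  # assumed 0-based index into TypeLists

@dataclass(frozen=True)
class MethodType:
    descriptor_index: int            # ParametersAndReturn, or a legacy Utf8 (a str here)

def field_descriptor(pool, idx):
    c = pool[idx]
    if isinstance(c, PrimitiveType):
        return c.kind
    if isinstance(c, ClassConst):
        # A Class naming an array type (the discouraged legacy usage) already
        # holds a descriptor such as "[I".
        return c.name if c.name.startswith("[") else f"L{c.name};"
    if isinstance(c, ArrayType):
        return "[" + field_descriptor(pool, c.component_type_index)
    raise ValueError(f"#{idx} is not a type constant")

def method_descriptor(pool, type_lists, method_type_idx):
    target = pool[pool[method_type_idx].descriptor_index]  # first indirection
    if isinstance(target, str):      # legacy Utf8 method descriptor
        return target
    params = type_lists[target.parameter_types_attr_index]  # second indirection
    return ("(" + "".join(field_descriptor(pool, i) for i in params) + ")"
            + field_descriptor(pool, target.return_type_index))

pool = {
    1: PrimitiveType("V"),
    2: PrimitiveType("I"),
    3: ClassConst("java/lang/String"),
    4: ArrayType(3),
    5: ParametersAndReturn(return_type_index=1, parameter_types_attr_index=0),
    6: MethodType(5),
    7: "(J)Z",                       # a legacy Utf8 entry
    8: MethodType(7),
}
type_lists = [[2, 4]]                # TypeLists attribute: one list, (int, String[])
print(method_descriptor(pool, type_lists, 6))   # (I[Ljava/lang/String;)V
print(method_descriptor(pool, type_lists, 8))   # (J)Z
```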

Changes in usage of constants:
- A MethodType can refer to a ParametersAndReturn (preferred) or a Utf8 method descriptor (legacy)
- The descriptor_index of a field may refer to a type constant (preferred) or a Utf8 field descriptor (legacy)
- The descriptor_index of a method may refer to a MethodType (preferred) or a Utf8 method descriptor (legacy)
- The descriptor of a NameAndType may refer to a type constant or MethodType (preferred), or a Utf8 field/method descriptor (legacy)
- ldc can refer to any type constant
- checkcast/instanceof can refer to a Class or an ArrayType
- anewarray/multianewarray can refer to any type constant (and arguably the opcode names should be changed; newarray can be viewed as a compact shortcut, like iconst_0)
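The reference rules listed above amount to a table of permitted referent kinds per referencing site. A minimal Python sketch of such a check follows; the site and kind strings are hypothetical labels, and ldc is left out because it also accepts the existing loadable constants, which this table does not model.

```python
# Hypothetical labels; "Utf8" marks the legacy string-descriptor form.
TYPE_CONSTANTS = frozenset({"Class", "ArrayType", "PrimitiveType"})

ALLOWED_REFERENTS = {
    "MethodType.descriptor": frozenset({"ParametersAndReturn", "Utf8"}),
    "field.descriptor_index": TYPE_CONSTANTS | {"Utf8"},
    "method.descriptor_index": frozenset({"MethodType", "Utf8"}),
    "NameAndType.descriptor": TYPE_CONSTANTS | {"MethodType", "Utf8"},
    "checkcast": frozenset({"Class", "ArrayType"}),
    "instanceof": frozenset({"Class", "ArrayType"}),
    "anewarray": TYPE_CONSTANTS,
    "multianewarray": TYPE_CONSTANTS,
}

def check_reference(site, referent_kind):
    """Raise ValueError if `site` may not refer to a constant of `referent_kind`."""
    allowed = ALLOWED_REFERENTS[site]
    if referent_kind not in allowed:
        raise ValueError(f"{site} cannot refer to {referent_kind}; "
                         f"allowed: {', '.join(sorted(allowed))}")

check_reference("anewarray", "PrimitiveType")   # accepted, covering what newarray does
try:
    check_reference("checkcast", "PrimitiveType")
except ValueError as e:
    print(e)   # checkcast cannot refer to PrimitiveType; allowed: ArrayType, Class
```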

Changes in interpreting descriptors:
- A MethodType that refers to a Utf8 treats the descriptor as if it were expressed with a fresh ParametersAndReturn referencing fresh type constants
- A field or NameAndType that refers to a Utf8 field descriptor treats the descriptor as if it were expressed with a fresh type constant
- A method or NameAndType that refers to a Utf8 method descriptor treats the descriptor as if it were expressed with a fresh MethodType referencing fresh type constants
- Descriptor comparisons (during field/method resolution and method selection) continue to occur *without resolving* the referenced type constants. They are simply tree equality tests.
- Class loader constraints are generated for matching Class constants, identified by recursing through the trees.
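These interpretation rules can be sketched in Python under stated assumptions. Legacy descriptors are parsed into immutable trees standing in for fresh constants. Descriptor comparison is ordinary structural equality, so no class is loaded. Loader constraints are collected by walking the Class leaves. The node types and function names are hypothetical, and a constraint is shown simply as a (class name, loader, loader) triple.

```python
from dataclasses import dataclass
from typing import Tuple, Union

# Hypothetical tree nodes mirroring PrimitiveType, Class, ArrayType and
# ParametersAndReturn; Class names are kept as strings and never resolved.
@dataclass(frozen=True)
class Prim:
    kind: str

@dataclass(frozen=True)
class Cls:
    name: str

@dataclass(frozen=True)
class Arr:
    component: "Type"

Type = Union[Prim, Cls, Arr]

@dataclass(frozen=True)
class Method:
    params: Tuple[Type, ...]
    ret: Type

def _parse_type(s, i, allow_void=False):
    c = s[i]
    if c == "[":
        comp, i = _parse_type(s, i + 1)
        return Arr(comp), i
    if c == "L":
        end = s.index(";", i)
        return Cls(s[i + 1:end]), end + 1
    if c in "BCDFIJSZ" or (allow_void and c == "V"):
        return Prim(c), i + 1
    raise ValueError(f"bad descriptor {s!r} at offset {i}")

def parse_method(s):
    """Read a legacy Utf8 method descriptor as if it were fresh structural constants."""
    if not s.startswith("("):
        raise ValueError(f"not a method descriptor: {s!r}")
    i, params = 1, []
    while s[i] != ")":
        t, i = _parse_type(s, i)
        params.append(t)
    ret, i = _parse_type(s, i + 1, allow_void=True)
    if i != len(s):
        raise ValueError(f"trailing characters in {s!r}")
    return Method(tuple(params), ret)

def class_names(node):
    """Yield every Class name, recursing through the tree."""
    if isinstance(node, Method):
        for p in node.params:
            yield from class_names(p)
        yield from class_names(node.ret)
    elif isinstance(node, Arr):
        yield from class_names(node.component)
    elif isinstance(node, Cls):
        yield node.name

def loader_constraints(a, loader_a, b, loader_b):
    """Match descriptors by tree equality, then pair up the Class leaves."""
    if a != b:                       # plain structural ==, no class loading
        raise ValueError("descriptors do not match")
    return sorted({(n, loader_a, loader_b) for n in class_names(a)})

legacy = parse_method("(I[Ljava/lang/String;)Ljava/lang/Object;")
structural = Method((Prim("I"), Arr(Cls("java/lang/String"))), Cls("java/lang/Object"))
print(legacy == structural)          # True
print(loader_constraints(legacy, "L1", structural, "L2"))
# [('java/lang/Object', 'L1', 'L2'), ('java/lang/String', 'L1', 'L2')]
```

Because the descriptor grammar is unambiguous, two descriptor strings are equal exactly when their trees are, which is consistent with implementations continuing to compare strings.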

(Note that implementations can continue to work with strings, if desired.)

More information about the valhalla-spec-observers mailing list