optional arguments for bootstrap methods

John Rose john.r.rose at oracle.com
Fri Oct 22 01:43:06 PDT 2010

Based partly on our discussions at the Summit about "live constants", and also based on the likely requirements of Project Lambda, the JSR 292 EG is likely to allow any single invokedynamic instruction to pass one or more extra constant values into the bootstrap method invocation.

Here is the current thinking.  Language implementors, please tell us if we are missing anything.

We call these "static arguments", in contrast to the normal "dynamic arguments" that are received on every method call.  For invokedynamic, the dynamic arguments are received, as if by 'invokeExact', by the method handle that the BSM has bound to that invokedynamic instruction instance.  The BSM decides, once at link time, which method handle to choose based on the static arguments.

There are three standard static arguments always passed to the BSM:
 1. an indication of the caller class (note: this is likely to change to a MethodHandles.Lookup capability)
 2. a String naming the method apparently being called
 3. a MethodType indicating the dynamic arguments and return value types

The String and MethodType are extracted from the NameAndType constant at the invokedynamic site.

The invokedynamic instruction points to a constant pool entry that looks like this:

struct InvokeDynamic_info {
  u1 tag; // always CONSTANT_InvokeDynamic = 18
  u2 bsm_index;   // ref to CONSTANT_MethodHandle
  u2 descr_index; // ref to CONSTANT_NameAndType
  u2 argc;  // count of optional static arguments
  u2 argv[argc];  // refs to anything 'ldc' can refer to (int, long, float, double, class, method handle, method type)
}

If we take this path, we will switch to the tag '18', to reduce confusion when old and new class files are mixed.

The existing tag '17' for the no-extra-args format will drop out of use and be illegal in JDK7 FCS.
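
To make the proposed layout concrete, here is a minimal sketch of a reader for one such entry.  The class InvokeDynamicInfo and its methods are hypothetical names chosen for illustration, and the layout is only the one proposed in this message, so treat it as a sketch rather than a description of any shipped class file format.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical reader for the proposed entry layout; not a real JDK class.
final class InvokeDynamicInfo {
    static final int CONSTANT_INVOKE_DYNAMIC = 18; // proposed tag value

    final int bsmIndex;   // ref to CONSTANT_MethodHandle
    final int descrIndex; // ref to CONSTANT_NameAndType
    final int[] argv;     // refs to the optional static arguments

    private InvokeDynamicInfo(int bsmIndex, int descrIndex, int[] argv) {
        this.bsmIndex = bsmIndex;
        this.descrIndex = descrIndex;
        this.argv = argv;
    }

    // Reads tag, bsm_index, descr_index, argc, then argc u2 references.
    static InvokeDynamicInfo read(DataInputStream in) throws IOException {
        int tag = in.readUnsignedByte();
        if (tag != CONSTANT_INVOKE_DYNAMIC) {
            throw new IOException("unexpected tag " + tag);
        }
        int bsm = in.readUnsignedShort();
        int descr = in.readUnsignedShort();
        int argc = in.readUnsignedShort();
        int[] argv = new int[argc];
        for (int i = 0; i < argc; i++) {
            argv[i] = in.readUnsignedShort();
        }
        return new InvokeDynamicInfo(bsm, descr, argv);
    }

    // Convenience wrapper for bytes already in memory.
    static InvokeDynamicInfo parse(byte[] bytes) {
        try {
            return read(new DataInputStream(new ByteArrayInputStream(bytes)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

An entry with two static arguments occupies 1+2+2+2+2*2 = 11 bytes in this layout; an entry carrying the old tag '17' is rejected by the reader.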

Depending on the value of argc, the BSM will be invoked in one of three ways:
  if (argc == 0)  binding = bsm.invokeGeneric(lookup, name, type);
  if (argc == 1)  binding = bsm.invokeGeneric(lookup, name, type, (Object) argv[0]);
  if (argc > 1)  binding = bsm.invokeGeneric(lookup, name, type, (Object[]) argv);

Note that the BSM, since it is derived from a CONSTANT_MethodHandle, can only be a "direct method handle", a pointer to a Java method.  It cannot be adapted (e.g., as a spreader or collector).  But in user-visible code, it would be reasonable to express a typical BSM as an overloaded method, whose third overloading takes a varargs array:
  MethodHandle myBSM(MethodHandles.Lookup look, String name, MethodType type);
  MethodHandle myBSM(MethodHandles.Lookup look, String name, MethodType type, Object arg);
  MethodHandle myBSM(MethodHandles.Lookup look, String name, MethodType type, Object... args);
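
The three invocation shapes and the overloaded BSM can be simulated in plain Java.  The sketch below is only an approximation of the linkage step: the class BsmDemo, the helper methods link and describe, and the strings the BSMs return are all invented for illustration, and it uses MethodHandle.invoke as found in current JDKs in place of the 'invokeGeneric' call written above.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

final class BsmDemo {
    // Hypothetical BSM overloads, one per invocation shape.
    // Each returns a constant handle describing how it was linked.
    static MethodHandle myBSM(MethodHandles.Lookup look, String name, MethodType type) {
        return MethodHandles.constant(String.class, name + "/0");
    }

    static MethodHandle myBSM(MethodHandles.Lookup look, String name, MethodType type, Object arg) {
        return MethodHandles.constant(String.class, name + "/1:" + arg);
    }

    static MethodHandle myBSM(MethodHandles.Lookup look, String name, MethodType type, Object... args) {
        return MethodHandles.constant(String.class, name + "/n:" + args.length);
    }

    // Simulated linkage: choose the invocation shape from argc.
    static MethodHandle link(String name, MethodType type, Object... argv) throws Throwable {
        MethodHandles.Lookup lookup = MethodHandles.lookup();
        MethodType base = MethodType.methodType(MethodHandle.class,
                MethodHandles.Lookup.class, String.class, MethodType.class);
        if (argv.length == 0) {
            MethodHandle bsm = lookup.findStatic(BsmDemo.class, "myBSM", base);
            return (MethodHandle) bsm.invoke(lookup, name, type);
        } else if (argv.length == 1) {
            MethodHandle bsm = lookup.findStatic(BsmDemo.class, "myBSM",
                    base.appendParameterTypes(Object.class));
            return (MethodHandle) bsm.invoke(lookup, name, type, argv[0]);
        } else {
            MethodHandle bsm = lookup.findStatic(BsmDemo.class, "myBSM",
                    base.appendParameterTypes(Object[].class));
            // The whole array is passed as the single trailing argument.
            return (MethodHandle) bsm.invoke(lookup, name, type, argv);
        }
    }

    // Links a zero-argument call site named "foo" and calls it once.
    static String describe(Object... argv) {
        try {
            MethodHandle target = link("foo", MethodType.methodType(String.class), argv);
            return (String) target.invoke();
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }
}
```

With this sketch, describe() selects the first overload, describe(42) the second, and describe("a", "b", 3) the varargs one.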

It is natural to ask why we are using varargs, when we could just specify that the extra static arguments be passed positionally.  The simple answer is that positional arguments are of limited use, but a varargs array can be used to encode very rich and useful BSM arguments.

Since very few Java methods take more than 10 parameters, allowing up to 255 extra arguments is not very interesting.  (Actually the limit would be 251 non-long non-double arguments, since there are three to start with, plus the BSM itself.)  Writing a BSM which takes (say) 100 arguments would be silly.  (Note that BSMs cannot be collectArguments adapters; they have to be simple JVM methods or constructors.)  And a related one that takes 99 arguments would have to be a completely distinct method.  It is clear that any large number of arguments has to be passed in an array.  So let's pass them all in a trailing varargs parameter.

Will users want more than a couple of extra static arguments?  I think so.  It will provide a way to bind interesting specifications directly into the classfile, without cumbersome bytecode-based construction.  Examples:
 - a serialized AST structure, built from a mix of strings and method handles, to be interpreted
 - complex application-defined constants, such as lists or sets
 - similarly, templates for partly-constant data structures (the invokedynamic builds a factory method for the template)
 - vtables (i.e., maps of names to method handles)

All of these things can be created by executable bytecodes in <clinit>, but implementors will (in many cases) be able to create them more compactly from series of constants.  For example, a list of integer values will occupy 2+1+4 bytes per element if encoded as a sequence of static arguments.  (The '2' is the argv element; the 1+4 is the CONSTANT_Integer.)  Using <clinit> style bytecodes, the same element will require (1+2+2+1)+1+4 bytes, where the parenthesized numbers stand for a sequence of "aload buf; bipush J; ldc N; aastore".  (This sequence stores the element into an object array, which is going to be passed to something like Arrays.asList.)  The ratio is 7 to 11.  For integer values which repeat, the constant pool entry is shared, so the ratio is closer to 2 to 6.
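
The byte counts can be checked with a small calculation.  This is a sketch under the same simplifying assumptions as the estimate above (one-byte aload, two-byte bipush and ldc, no boxing or array setup cost); the class and method names are made up for the example.

```java
// Hypothetical cost model for the two encodings; sizes are in bytes.
final class EncodingCost {
    static final int ARGV_REF = 2;         // one u2 entry in argv
    static final int CONSTANT_INTEGER = 5; // 1 tag byte + 4 value bytes
    static final int CLINIT_STORE = 6;     // aload(1) + bipush(2) + ldc(2) + aastore(1)

    // Static-argument encoding: one argv ref per element, one pool entry per distinct value.
    static int staticArgBytes(int elements, int distinctValues) {
        return elements * ARGV_REF + distinctValues * CONSTANT_INTEGER;
    }

    // <clinit> encoding: one store sequence per element, one pool entry per distinct value.
    static int clinitBytes(int elements, int distinctValues) {
        return elements * CLINIT_STORE + distinctValues * CONSTANT_INTEGER;
    }
}
```

For a single element the two methods give 7 and 11 bytes.  For 100 elements that all repeat one value they give 205 and 605 bytes, which approaches the 2 to 6 ratio.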

There is a limit to this technique, of course, since the constant pool has only 65535 constants.  But this limit is shared with the <clinit> style technique.

A key use case for one or two BSM arguments is closure construction for Project Lambda.  Here, an extra static argument can specify a private synthetic method which gives the body (code, not data) of the closure.  The data parts are normal dynamic arguments.  The BSM produces a factory function which (efficiently) binds the data values to the statically specified closure body.  A second BSM argument might be the SAM type intended for the closure.  (That could also be inferred from the MethodType.)
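
A rough sketch of that closure pattern follows.  Everything here is an assumption made for illustration: the class ClosureDemo, the BSM name closureBSM, the choice of IntUnaryOperator as the SAM type, and the use of a Java lambda inside the helper (which postdates this proposal) merely to keep the example short.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.util.function.IntUnaryOperator;

final class ClosureDemo {
    // Stand-in for the private synthetic method holding the closure body.
    // First parameter: captured data.  Second parameter: the SAM argument.
    static int body(int captured, int x) {
        return captured + x;
    }

    // Binds one captured value to the body and wraps the result in the SAM type.
    static IntUnaryOperator makeClosure(MethodHandle body, int captured) {
        MethodHandle bound = MethodHandles.insertArguments(body, 0, captured);
        return x -> {
            try {
                return (int) bound.invoke(x);
            } catch (Throwable t) {
                throw new RuntimeException(t);
            }
        };
    }

    // Hypothetical BSM: the single static argument is the closure body.
    // The returned handle is the factory; its dynamic argument is the captured data.
    static MethodHandle closureBSM(MethodHandles.Lookup look, String name,
                                   MethodType type, Object arg)
            throws ReflectiveOperationException {
        MethodHandle maker = look.findStatic(ClosureDemo.class, "makeClosure",
                MethodType.methodType(IntUnaryOperator.class, MethodHandle.class, int.class));
        return MethodHandles.insertArguments(maker, 0, (MethodHandle) arg).asType(type);
    }

    // Simulates linking the site once, then creating and calling a closure.
    static int demo(int captured, int x) {
        try {
            MethodHandles.Lookup look = MethodHandles.lookup();
            MethodHandle bodyHandle = look.findStatic(ClosureDemo.class, "body",
                    MethodType.methodType(int.class, int.class, int.class));
            MethodHandle factory = closureBSM(look, "lambda",
                    MethodType.methodType(IntUnaryOperator.class, int.class), bodyHandle);
            IntUnaryOperator closure = (IntUnaryOperator) factory.invoke(captured);
            return closure.applyAsInt(x);
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }
}
```

Here the SAM type is read from the return type of the call site's MethodType, which corresponds to the "inferred from the MethodType" option.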

Another key use case is an invokedynamic instruction that implements an arbitrary live constant, by linking the call site (of zero arguments) to a method handle which always returns the desired constant.  (MethodHandles.constant will do this.)  The only missing bit is the serialized data behind the live constant.  Again, allowing an essentially unbounded array gives implementors the right degree (I think) of flexibility.
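
As a minimal sketch of the live-constant case, the BSM below turns its static arguments into an unmodifiable list and links the site to a constant handle.  The class LiveConstantDemo and its methods are hypothetical; only MethodHandles.constant comes from the text above.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class LiveConstantDemo {
    // Hypothetical BSM: builds the constant once, at link time.
    static MethodHandle listBSM(MethodHandles.Lookup look, String name,
                                MethodType type, Object... args) {
        List<Object> value =
                Collections.unmodifiableList(new ArrayList<Object>(Arrays.asList(args)));
        return MethodHandles.constant(List.class, value).asType(type);
    }

    // Simulates linking a zero-argument call site that returns a List.
    static MethodHandle linkSite(Object... staticArgs) {
        return listBSM(MethodHandles.lookup(), "list",
                MethodType.methodType(List.class), staticArgs);
    }

    // Simulates executing the linked call site.
    static List<?> evaluate(MethodHandle site) {
        try {
            return (List<?>) site.invoke();
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }
}
```

Every execution of the linked site returns the same list object, since the BSM runs only once per site.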

If, instead of constants, we want templated values (think Groovy strings like "hello, $name"), the statically determined structure of the value can be expressed in static arguments to an invokedynamic, with the inserted values ("$name") passed on the stack.  The BSM produces a factory function which builds the desired result.  The BSM might use a templating engine to partially evaluate the static structure, so that the dynamically changing parts can be combined in at full speed.
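
The templated-value case might look roughly like this.  The encoding is an assumption made for the sketch: the static arguments are the literal fragments of the template, with one more fragment than there are inserted values, so "hello, $name" becomes the two fragments "hello, " and "".  The class TemplateDemo and its methods are hypothetical, and no partial evaluation is attempted.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.util.Arrays;

final class TemplateDemo {
    // Interleaves fragments and values: f0 v0 f1 v1 ... fn.
    static String render(String[] fragments, Object[] values) {
        StringBuilder sb = new StringBuilder(fragments[0]);
        for (int i = 0; i < values.length; i++) {
            sb.append(values[i]).append(fragments[i + 1]);
        }
        return sb.toString();
    }

    // Hypothetical BSM: static args are the fragments, dynamic args are the values.
    static MethodHandle templateBSM(MethodHandles.Lookup look, String name,
                                    MethodType type, Object... args)
            throws ReflectiveOperationException {
        if (args.length != type.parameterCount() + 1) {
            throw new IllegalArgumentException("need one more fragment than values");
        }
        String[] fragments = Arrays.copyOf(args, args.length, String[].class);
        MethodHandle render = look.findStatic(TemplateDemo.class, "render",
                MethodType.methodType(String.class, String[].class, Object[].class));
        // Bind the fragments, then collect the dynamic arguments into the values array.
        return MethodHandles.insertArguments(render, 0, (Object) fragments)
                .asCollector(Object[].class, type.parameterCount())
                .asType(type);
    }

    // Simulates the call site for "hello, $name".
    static String greet(String who) {
        try {
            MethodHandle factory = templateBSM(MethodHandles.lookup(), "gstring",
                    MethodType.methodType(String.class, String.class), "hello, ", "");
            return (String) factory.invoke(who);
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }
}
```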

(A useful thing missing here is substructure sharing:  What if two invokedynamic instructions need almost the same static arguments?  This can be dealt with in user code, via a static table created in <clinit> or a similar method.  Shared values can be referred to by small integers assigned by the language backend.  In essence, the components we are proposing help language implementors to build better versions of constant pools and vtables, with compactness and efficiency similar to the corresponding native structures.)
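
One way such a shared table could look is sketched below.  The table contents, the class SharedTableDemo, and the BSM sharedBSM are all invented for the example; the only idea taken from the paragraph above is that a small integer static argument indexes a table built in <clinit>.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.util.Arrays;
import java.util.Collections;

final class SharedTableDemo {
    // Table of shared values, built once in the static initializer (<clinit>).
    static final Object[] SHARED;
    static {
        SHARED = new Object[] {
            Collections.unmodifiableList(Arrays.asList("x", "y", "z")),
            "a long shared string"
        };
    }

    // Hypothetical BSM: the static argument is an index assigned by the backend.
    static MethodHandle sharedBSM(MethodHandles.Lookup look, String name,
                                  MethodType type, Object arg) {
        int index = (Integer) arg;
        return MethodHandles.constant(Object.class, SHARED[index]).asType(type);
    }

    // Simulates linking and executing one call site that refers to a shared value.
    static Object evaluate(int index, Class<?> resultType) {
        try {
            MethodHandle site = sharedBSM(MethodHandles.lookup(), "shared",
                    MethodType.methodType(resultType), index);
            return site.invoke();
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }
}
```

Two call sites that pass the same index end up returning the same object, at a cost of one CONSTANT_Integer plus one argv reference each.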

In conclusion:  It is true that most use cases for BSM arguments will only need one or two extra arguments.  But if we allow an array of strings, integers, method handles, etc., with a reasonable length, suddenly our language implementor friends have a flexible and natural way to encode the "serialized" version of their live constants.

So, let's not allow just one or two static arguments, and not a useless 251, either, but rather a useful 65535.

(I'd go for a larger number, 2**31-1, but it would not mesh with the other 16-bit numbers in the class file format.  That's got to be fixed in a big format revision, another day.)

-- John

More information about the mlvm-dev mailing list