lazy statics design notes

John Rose john.r.rose at
Sat Mar 2 23:15:16 UTC 2019

Remi, Maurizio, Brian, I shot my last round, and I'm out.
I agree we shouldn't tinker with the (value sets of the) types.

Instead let's reach for ways to adjoin extra sentinel values
in the case of lazies (and optionals, and lazies of optionals),
of both null-default and zero-default types.  These sentinel
values will encode as disjoint from the base value set of the
type T (whether T is null-default/ref or zero-default/prim).

Sentinels will denote the states outside of the normal "T value
is present" state, either:  unbound-lazy or empty-optional.
A lazy optional needs both sentinels, while a plain lazy or
optional needs just one.

In the case where T is a reference, the JVM might add in one
or two new references (perhaps with tag bits for extra dynamic
checking).  This can be done outside the safe type system in
the case of the JVM, if it puts the right decoding barriers in
the right places, to strip the sentinels before using them in
a T-safe manner.
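
A minimal sketch of the reference case, in ordinary Java rather than
inside the JVM.  The class name LazyOptionalRef, its sentinel names,
and the Supplier-based initializer are assumptions made for
illustration; none of them come from the proposal.  The slot is
stored as Object so that two out-of-type references can be adjoined,
and the private decode step plays the role of the decoding barrier.
The sketch is not thread-safe.

```java
import java.util.NoSuchElementException;
import java.util.Optional;
import java.util.function.Supplier;

// Hypothetical illustration: a lazy-optional slot for a reference type T.
final class LazyOptionalRef<T> {
    // Two sentinels, each a distinct object that can never be a legitimate T.
    private static final Object UNBOUND = new Object(); // lazy not yet computed
    private static final Object EMPTY = new Object();   // optional is empty

    private Object raw = UNBOUND;              // untyped storage, outside T's type
    private final Supplier<Optional<T>> init;  // hypothetical binding code

    LazyOptionalRef(Supplier<Optional<T>> init) { this.init = init; }

    boolean isBound() { return raw != UNBOUND; }

    boolean isPresent() { return decode() != EMPTY; }

    // Strips both sentinels before handing the value out as a T.
    @SuppressWarnings("unchecked")
    T get() {
        Object r = decode();
        if (r == EMPTY) throw new NoSuchElementException("empty optional");
        return (T) r;
    }

    // Decoding barrier: binds on first use, so UNBOUND is never returned.
    private Object decode() {
        if (raw == UNBOUND) {
            Optional<T> o = init.get();
            raw = o.isPresent() ? o.get() : EMPTY;
        }
        return raw;
    }
}
```

A plain lazy or a plain optional would need only one of the two
sentinel objects.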

In the case where T's encoding space is fully occupied (like int)
the sentinel will have to take the form of an extra field of
two states.  One is "I'm the sentinel" and the other is "there's
a T value in my other component".  This is just Optional all
over again, which uses a sentinel (null!) today.

(If two sentinels are required, for a lazy-optional, then the extra
field can take three states.  Or we append two extra fields.)
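
As a rough sketch of the extra-field approach for int (the class
name LazyOptionalInt and the three state constants are hypothetical),
a single side field with three states covers the lazy-optional case:

```java
// Hypothetical illustration: int has no spare bit patterns, so the
// sentinel states live in a separate field next to the payload.
final class LazyOptionalInt {
    static final byte UNBOUND = 0; // initial, all-zero state
    static final byte EMPTY = 1;   // bound, but the optional is empty
    static final byte PRESENT = 2; // bound, payload holds an int value

    private byte state;  // Java zero-initializes this to UNBOUND
    private int payload; // meaningful only when state == PRESENT

    byte state() { return state; }

    void bindEmpty() {
        payload = 0;
        state = EMPTY;
    }

    void bindValue(int v) {
        payload = v;
        state = PRESENT;
    }

    int get() {
        if (state != PRESENT) {
            throw new IllegalStateException("no int value present");
        }
        return payload;
    }
}
```

A plain optional or a plain lazy would use only two of the three
states, which is the one-bit case described above.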

If we are buffering T on the heap in a stand-alone object, the
extra state can (with some ad hoc hacking) be folded into the
object header, because it is almost certain that the object header
has some slack that's usable for the purpose.  Since buffered value
objects won't need to store synchronization state (individually,
at least), the bits which usually denote synchronization state can
be co-opted to store a sentinel state, for a buffered T.  This usually
won't be necessary, though, since if a T value is buffered, the
client that is holding the reference is also capable of holding
a real null, which more directly represents an out-of-type value
point for T.  This is today's situation with Integer, which is
null-default, while its payload type int is zeroable but not nullable.

If we were to load a value-like Integer onto the stack, the extra
sentinel field would have to be manufactured like this:
    boolean hasPayload = (p != null);
    int payload = (p == null ? int.default : p.value);
This pair of values on stack would act like a value type whose
default zero bits encode null, while an ordinary int payload value
would be accompanied by a 'true' bit that distinguishes it from
the null encoding.  This value type should, of course, be null-default,
even though it carries a zero-default payload.
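
The pair can be modeled in today's Java as a small class, purely to
show the round trip.  The name UnpackedInteger and its methods are
hypothetical, and a real implementation would keep the two components
on the stack rather than allocate an object.

```java
// Hypothetical illustration: the (hasPayload, payload) pair for Integer.
final class UnpackedInteger {
    final boolean hasPayload; // false (the zero bit) encodes null
    final int payload;        // 0 whenever hasPayload is false

    private UnpackedInteger(boolean hasPayload, int payload) {
        this.hasPayload = hasPayload;
        this.payload = payload;
    }

    // Mirrors the two assignments above: null becomes all-zero bits.
    static UnpackedInteger unpack(Integer p) {
        return (p == null)
                ? new UnpackedInteger(false, 0)
                : new UnpackedInteger(true, p.intValue());
    }

    // Inverse mapping: the all-zero pair decodes back to null.
    Integer repack() {
        return hasPayload ? Integer.valueOf(payload) : null;
    }
}
```

Note that unpack(0) and unpack(null) differ only in the hasPayload
component, which is what keeps an ordinary zero distinct from null.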

In the case where T's encoding space has some slack (like boolean)
a sentinel or two can be created by using unencoded bit patterns.
If T is a value type containing a reference or floating point field,
then the option exists to "steal" the encoding from inside that field.
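
One way to picture stealing an encoding from a floating point field
is to reserve a NaN bit pattern that normal stores never produce.
The sketch below is an assumption-laden illustration, not a proposed
design: the class name OptionalFloat and the choice of 0x7fc00001 are
hypothetical.  It relies on Float.floatToIntBits collapsing every NaN
to the canonical pattern 0x7fc00000, so no stored value can collide
with the reserved pattern.

```java
import java.util.NoSuchElementException;

// Hypothetical illustration: an optional float with no extra field.
final class OptionalFloat {
    // A NaN bit pattern that Float.floatToIntBits never returns.
    private static final int EMPTY_BITS = 0x7fc00001;

    private final int bits; // raw encoding of either a float or the sentinel

    private OptionalFloat(int bits) { this.bits = bits; }

    static OptionalFloat empty() { return new OptionalFloat(EMPTY_BITS); }

    // floatToIntBits canonicalizes NaNs, keeping them off the sentinel.
    static OptionalFloat of(float v) {
        return new OptionalFloat(Float.floatToIntBits(v));
    }

    boolean isPresent() { return bits != EMPTY_BITS; }

    float get() {
        if (!isPresent()) throw new NoSuchElementException("empty");
        return Float.intBitsToFloat(bits);
    }
}
```

The cost of this choice is that distinct NaN payloads are not
preserved: every stored NaN reads back as the canonical one.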

The all-zero-bits state is favorable in the heap because it is most
reliably the first state of the object.  In the case of both optional
and lazy (lazy-optional is just lazy here), the sentinel encodes
the initial state, which encourages us to implement the sentinel
with a default value (zero or null) for T.  This means that the normal
corresponding default (zero or null) should actually be encoded
with a special sentinel value.
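
For a type with slack such as boolean, the swap might look like the
following sketch.  The class name LazyBoolean and the particular byte
values are hypothetical.  The all-zero byte is reserved for the
unbound state, so the normal default (false) is moved to a non-zero
pattern.  The sketch is not thread-safe.

```java
import java.util.function.BooleanSupplier;

// Hypothetical illustration: a lazy boolean stored in one byte.
final class LazyBoolean {
    private static final byte UNBOUND = 0;        // initial all-zero state
    private static final byte TRUE = 1;           // bound to true
    private static final byte FALSE_SENTINEL = 2; // bound to the default, false

    private byte bits; // zero-initialized, hence starts UNBOUND
    private final BooleanSupplier binder; // hypothetical binding code

    LazyBoolean(BooleanSupplier binder) { this.binder = binder; }

    boolean get() {
        if (bits == UNBOUND) {
            // Bind once; false is stored as the sentinel, never as zero.
            bits = binder.getAsBoolean() ? TRUE : FALSE_SENTINEL;
        }
        return bits == TRUE;
    }

    // Exposes the raw encoding, for inspection only.
    byte rawBits() { return bits; }
}
```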

On the stack the all-zero-bits state is less directly useful, but of
course it's good if the stack and heap encodings can be as close
as possible.

The getfield operation which loads a lazy instance field should do
two things: 1. check for the encoding of the unbound state (which
should be all-zeroes), 2. check for the encoding of the bound-to-default
state (which should be a specially crafted sentinel).  In case 1, the
lazy value binding code must be executed.  In case 2, the sentinel
must be replaced by a true default value.  Something like this probably
needs to happen anyway for null-default value types, since the
zero-default encoding of a null-default value type needs to be
replaced by a null pointer when it is loaded.
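
The two checks can be sketched in library-level Java for a
null-default T.  This models the idea only; it is not how the JVM
would implement getfield, and the names LazyField, BOUND_NULL, and
load are hypothetical.  The sketch is single-threaded.

```java
import java.util.function.Supplier;

// Hypothetical illustration: the load path of a lazy instance field.
final class LazyField<T> {
    // Out-of-type sentinel meaning "bound, and the bound value is null".
    private static final Object BOUND_NULL = new Object();

    private Object raw; // null (all-zero bits) means unbound
    private final Supplier<? extends T> binder; // hypothetical binding code

    LazyField(Supplier<? extends T> binder) { this.binder = binder; }

    @SuppressWarnings("unchecked")
    T load() {
        Object r = raw;
        if (r == null) {
            // Case 1: unbound, so run the binding code and store the result.
            T v = binder.get();
            r = (v == null) ? BOUND_NULL : v;
            raw = r;
        }
        // Case 2: replace the sentinel with the true default value.
        return (r == BOUND_NULL) ? null : (T) r;
    }
}
```

Without the sentinel, a binder that returns null would be re-run on
every load, since the stored null would be indistinguishable from the
unbound state.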

It looks to me like there are at least three places where a "raw"
value is "wrapped" to give it adjusted semantics.  First, a null-default
value type wraps the underlying zero-default bits by swapping
out the zero and swapping in the null.  Second, an optional wraps
the internal value by adjoining the "empty" value point.  Third,
a lazy wraps its non-lazy value by adjoining the "unbound" state.

Sentinels are just one way to do it; surely there are others.  But if
you don't use sentinels in some capacity to overlay new values on
T's value set, you probably need a side bit to convey the variable's
state; as I've said before, managing that correctly seems to require
transactional memory.
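
To illustrate the contrast: when the state is overlaid on the value
itself, the transition from unbound to bound is a single-word update,
which one compare-and-set can publish.  A separate side bit would
make state and value two locations that must change together.  The
sketch below is one possible rendering, with hypothetical names
(AtomicLazy, BOUND_NULL); it allows racing threads to run the binder
more than once but lets only one result win.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

// Hypothetical illustration: a thread-safe lazy using a sentinel overlay.
final class AtomicLazy<T> {
    // Out-of-type sentinel meaning "bound to null".
    private static final Object BOUND_NULL = new Object();

    // A null cell means unbound; state and value share this one word.
    private final AtomicReference<Object> cell = new AtomicReference<>();
    private final Supplier<? extends T> binder; // hypothetical binding code

    AtomicLazy(Supplier<? extends T> binder) { this.binder = binder; }

    @SuppressWarnings("unchecked")
    T get() {
        Object r = cell.get();
        if (r == null) {
            T v = binder.get(); // may run in several racing threads
            Object encoded = (v == null) ? BOUND_NULL : v;
            // One CAS binds the cell; a losing thread adopts the winner's value.
            r = cell.compareAndSet(null, encoded) ? encoded : cell.get();
        }
        return (r == BOUND_NULL) ? null : (T) r;
    }
}
```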

Condy doesn't require a sentinel.  But of course HotSpot *internally*
uses a sentinel to distinguish a resolved null value from the unresolved
state.  The unresolved state is a null pointer while a resolved null
is a special out-of-type non-null reference (called "the null sentinel")
which condy swaps out for a resolved null after it does the null check.
That's the same trick as I've described above.  Surprise; I wrote it.
Great minds may think alike, but mediocre minds think the same
thing repeatedly.

— John

More information about the valhalla-spec-observers mailing list