john.r.rose at oracle.com
Wed Dec 5 00:03:29 UTC 2018
On Dec 4, 2018, at 1:38 PM, dean.long at oracle.com wrote:
> On 12/4/18 12:32 PM, Doug Lea wrote:
>> On 12/4/18 12:12 PM, Andrew Haley wrote:
>>> Exactly, yes. The problem is that the current TheadLocal code is very
>>> complex, and if we restrict ourselves to a simple get() we can do
>> Could you explain? Of the possibilities I'm aware of that might be cheaper:
>> * The cheapest version is to access a field of current Thread/Fiber, as
>> is possible with Threads by defining Thread subclasses.
(This one doesn't scale to several modules defining several FLs.
It's probably an oversimplification to have just one, or just some
small N. Making it open-ended large N requires a search and/or
a sparse table.)
>> * Close behind is to have an index associated with each Thread/Fiber
>> that users could then use to access data in a separate array (or
>> whatever) that they otherwise manage themselves.
> I suppose we could also invert the above and having an index associated with
> each thread local to access an array in each fiber. We could allocate the index
> for the life of the VM if we had a flavor of "permanent" thread local.
> However, recycling indexes and resizing arrays when a non-permanent
> thread local becomes collectable by the GC doesn't sound trivial to me.
This is a stop on the way to the "sweet spot" I mentioned. You can certainly
allocate the index for the life of the VM, but if you also forbid collisions
(by using a plain array) you must allocate slices of those arrays for the
life of the VM, even if the FL has disappeared, perhaps due to class
unloading. (You can also try to storage-manage slices of those arrays,
recycling them when FLs are GCed, but that's not IMO a sweet spot.)
Keeping the binding index arrays (sometimes called "displays") limited
in size requires either a static model which assigns and packs
indexes into small arrays, or else allowing collisions. Collisions require search,
which is where I think the sweet spot it: Just make sure the search is
O(1) on average, and maybe O(log(N)) at worst, like HashMap.
So, I'd guess you are on the right track, but you haven't taken footprint
into account yet.
>> * Of variants hinted at by John Rose, the only potentially fast kind I
>> know would be to stack-allocate at initial frames of a Thread/Fiber, and
>> use a new form of VarHandle that can be passed in calls or even somehow
>> implicitly accessed via some form of "display" so they can be accessed
>> by children (in the same or a nested Thread/Fiber)
>> (see https://en.wikipedia.org/wiki/Call_stack#Lexically_nested_routines)
Implicit access is really at the root of the problem space we are discussing.
The thing being accessed is somehow coupled to the current JVM stack
frame, but without being a local, stack element, or monitor. We will end
up adding yet another kind of stack frame state here, I predict; I think it
will wedge in nicely next to the existing JVM monitor array, both in spec.
and implementation. Getting a caller's implicit state into a callee is what
requires all the clever searching and/or display management.
For the record again, I think we are right to jump directly to frame
locals after being forced to abandon ThreadLocal for a number of use
cases. Frame locals should align very directly to the JVM's frame-based
semantics, and so are likely to optimize better than hanging them off
of other, less primary entities. And if we succeed in making them fast,
library writers will be encouraged to move most of their use cases off
of ThreadLocal and onto frame locals.
Using a VH is a good way to package up the lookup "smarts". So is an
general Java object, but general APIs are not signature-polymorphic, so
VHs are likely to play a role here. Doug's observation is that if you can
obtain a constant VH you can parley that (with an extra indirection or so)
into non-constant proper variable.
I would like to reply, though, that lookup of an implicit *constant* value,
possibly inherited from a caller frame, is the proper primitive to start with.
In many cases, you just need a constant value (think "final Java variable").
In other cases, you can have the defining frame bind the constant value
to a mutable "box", either as a crude 1-element Java array, or using a VH.
Also, regarding frame-scoped heaps, those are IMO not the proper primitive
here. A frame-scoped heap must be paired with some sort of implicit access
to a root set and/or factory for the heap, so something like inherited frame
local constants is needed to go along with frame-scoped heaps. But if you
have inherited frame-local constants, you can represent heap roots using
VHs or Maps or the like, and represent factories using MHs or Suppliers
or the like. As for the storage lifetime of the frame-scoped heaps, it is an
interesting and useful performance model that when you leave the scope
the heap is (or might be) bulk-deallocated. But that model can be readily
approximated by other means than frame-scoped heaps. Just using
inherited frame-local displays, plus ordinary GC-able data structures,
will perform adequately for non-RT applications; there's not always a
need to do resettable storage. Properly resettable storage, OTOH,
requires special concurrency guarantees, such as linear chain of
ownership, or it is not safe. Without touching inheritable frame-local
variables, the Panama project is experimenting with resettable
resource scopes (which can malloc and free safely on behalf of
a Java thread). To me, all this indicates that frame-scoped heaps and
frane-local variables are independent capabilities, neither subsuming
> If I understand this correctly, we wouldn't have to map from a shared key object
> to something stack-local, because we would only allow access through
> something that is already stack-local. However, I don't see how the inner
> callee lookup could be directly associated with the outer binding. If each frame
> was passed a hidden list of bound VarHandles, then it seems like lookup would
> still need to search that list, though the list is probably short.
Yep. IMO the challenge is to allow the search but keep it short. BTW,
some outlier events, like deoptimization, or even JIT compilation, are
likely to overturn the carefully managed displays and require an episode
of searching. What I'm saying is that we should not reject a technique
if it *sometimes* requires searching, only if it *often* requires it, or maybe
if there is no way of making a QOS guarantee (such as O(log something)).
BTW, Dean, what you've done is a really cool first step. I'm not throwing
any shade here, just encouraging us to follow the logic of the tradeoffs.
One thing I like about your prototype is that you avoid registering user-defined
methods (which requires something like annotations to call them out statically)
in favor of a single method that creates *all* the bindings. I do think a more
human-friendly design might involve pairs of statically marked (annotated)
methods and private-static-final VH-like entities to get their values, rather
than unnamed FrameLocal objects, not statically associated with metadata.
There's got to be a metadata link, which in the prototype is to FL::doEnclosed,
but something more open-ended may be desirable in the end, and that leads
to a less object-oriented and more metadata-oriented design.
More information about the loom-dev