dl at cs.oswego.edu
Tue Dec 4 20:32:53 UTC 2018
On 12/4/18 12:12 PM, Andrew Haley wrote:
> Exactly, yes. The problem is that the current ThreadLocal code is very
> complex, and if we restrict ourselves to a simple get() we can do
Could you explain? Of the possibilities I'm aware of that might be cheaper:
* The cheapest version is to access a field of current Thread/Fiber, as
is possible with Threads by defining Thread subclasses.
* Close behind is to have an index associated with each Thread/Fiber
that users could then use to access data in a separate array (or
whatever) that they otherwise manage themselves.
* Of variants hinted at by John Rose, the only potentially fast kind I
know would be to stack-allocate at initial frames of a Thread/Fiber, and
use a new form of VarHandle that can be passed in calls or even somehow
implicitly accessed via some form of "display" so they can be accessed
by children (in the same or a nested Thread/Fiber).
* Some updated form of RTSJ ScopedMemory regions.
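To make the first two options above concrete, here is a minimal sketch using plain Threads only (no Fibers). All class and method names (PerThreadState, ContextThread, IndexedThread, fieldDemo, indexDemo) are hypothetical and are not taken from the JDK or from any Loom prototype; the index is simply passed to the constructor as an assumption about how indices might be handed out.

```java
// Hypothetical illustration; none of these names exist in the JDK.
class PerThreadState {

    // Option 1: per-thread data lives directly in a field of a Thread subclass.
    static final class ContextThread extends Thread {
        int hits; // written only by this thread while it runs

        ContextThread(Runnable task) { super(task); }

        static ContextThread current() {
            // The cast fails if the caller is not running on a ContextThread.
            return (ContextThread) Thread.currentThread();
        }
    }

    // Option 2: each thread carries a small index into an array the user manages.
    static final class IndexedThread extends Thread {
        final int index;

        IndexedThread(int index, Runnable task) {
            super(task);
            this.index = index;
        }
    }

    private static void join(Thread t) {
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static int fieldDemo() {
        ContextThread t = new ContextThread(() -> {
            for (int i = 0; i < 3; i++) {
                ContextThread.current().hits++; // one cast plus one field access
            }
        });
        t.start();
        join(t);
        return t.hits; // safe to read after join()
    }

    public static long[] indexDemo(int threads) {
        long[] perThread = new long[threads]; // user-managed side storage
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new IndexedThread(i, () -> {
                IndexedThread self = (IndexedThread) Thread.currentThread();
                perThread[self.index] += 10; // each thread touches only its own slot
            });
            workers[i].start();
        }
        for (Thread w : workers) {
            join(w);
        }
        return perThread;
    }

    public static void main(String[] args) {
        System.out.println(fieldDemo());
        System.out.println(java.util.Arrays.toString(indexDemo(4)));
    }
}
```

Both variants avoid a hashed lookup entirely, but they only work for code that controls how its threads are created, which is the restriction that makes them cheaper than a general ThreadLocal.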
> The fast path is 12 field loads, 5 conditional branches, and these are
> dependent loads, so have a lot of latency. We also suffer a fair bit
> from mispredicted branches, from the look of the profile.
* It would be possible to create a variant of ThreadLocal that does not
use WeakReferences, requiring explicit removal. This would reduce
* Some usages might be able to tolerate a version providing only
"static" ThreadLocals, that can use compile-time constants vs hashed keys.
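As a rough illustration of how these two restrictions could combine, the sketch below keeps per-thread values in a fixed array of strong references, indexed by compile-time constants, and relies on an explicit remove(). It is an assumption-laden toy, not the JDK's ThreadLocal implementation and not a design proposed in this thread; SlotLocals, SlotThread, and the slot constants are all hypothetical names.

```java
// Hypothetical illustration; none of these names exist in the JDK.
final class SlotLocals {

    // Compile-time constant slot numbers stand in for hashed ThreadLocal keys.
    static final int USER_SLOT = 0;
    static final int TRACE_SLOT = 1;
    static final int SLOT_COUNT = 2;

    static final class SlotThread extends Thread {
        // Strong references only: nothing is cleared unless remove() is called.
        final Object[] slots = new Object[SLOT_COUNT];

        SlotThread(Runnable task) { super(task); }
    }

    private static SlotThread current() {
        // The cast fails if the caller is not running on a SlotThread.
        return (SlotThread) Thread.currentThread();
    }

    static Object get(int slot) { return current().slots[slot]; }

    static void set(int slot, Object value) { current().slots[slot] = value; }

    // Explicit removal replaces the WeakReference-based cleanup.
    static void remove(int slot) { current().slots[slot] = null; }

    public static String demo() {
        String[] seen = new String[2];
        SlotThread t = new SlotThread(() -> {
            set(USER_SLOT, "alice");
            seen[0] = (String) get(USER_SLOT);
            remove(USER_SLOT);
            seen[1] = String.valueOf(get(USER_SLOT)); // "null" after removal
        });
        t.start();
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0] + "," + seen[1];
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The cost of this shape is that a forgotten remove() keeps the value reachable for the lifetime of the thread, and the set of slots must be known up front.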
Short of these restrictions, despite the overhead, current ThreadLocals
seem to be faster than any other general mechanism anyone has tried. But
any further ideas for making them cheaper would be welcome.
(Note also that we are sitting on some updates that will reduce garbage
retention of ThreadLocals under some GCs at the expense of adding a
"long" field per ThreadLocal. See