dl at cs.oswego.edu
Sun Dec 16 13:10:56 UTC 2018
On 12/9/18 3:12 PM, Alan Bateman wrote:
>> - Proxy for processor-local
>> - Contention-avoidance schemes
> This one is somewhat specialized and too early to say if it makes sense
> to expose as an API or not. For now, it's the "processorid" branch in
> the loom repo where there is an Unsafe method to get the processor id.
I did some experimentation, and I'm pretty sure that this is NOT a good
alternative to using per-thread fields in current usages. These
(including LongAdder/Striped64, ForkJoinPool, Exchanger) dynamically
spread out (stripe) contention by arranging that different threads use
different variables/queues/whatever and then coordinating from there.
This is done by associating a form of hash code ("probe" field) with
each thread, and (infrequently) changing these probe hash codes on
contention.
An alternative is to use processorId as an index of some kind. Note that
usages must be (and already are) thread-safe because a thread may move
to a different core after obtaining the ID, so it mostly acts as a hint.
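
A minimal sketch of the probe-based scheme described above, not the
actual Striped64 code. The class name, the fixed table size, and the
ThreadLocal standing in for the per-thread probe field are all
assumptions made for illustration; Striped64 keeps the probe in a
Thread field and grows its table dynamically.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicLongArray;

// Hypothetical illustration class; not part of java.util.concurrent.
class StripedCounterSketch {
    // Fixed power-of-two cell count (assumption; Striped64 resizes on contention).
    private static final int CELLS = 16;
    // Space cells 8 longs (64 bytes) apart to reduce false sharing.
    private static final int STRIDE = 8;
    private final AtomicLongArray cells = new AtomicLongArray(CELLS * STRIDE);

    // Stand-in for the per-thread "probe" hash code (assumption: a ThreadLocal).
    private static final ThreadLocal<int[]> PROBE =
        ThreadLocal.withInitial(() -> new int[] { initialProbe() });

    private static int initialProbe() {
        int p = ThreadLocalRandom.current().nextInt();
        return p == 0 ? 1 : p; // xorshift below needs a nonzero seed
    }

    // Xorshift step: maps a nonzero probe to another nonzero probe.
    private static int advance(int p) {
        p ^= p << 13;
        p ^= p >>> 17;
        p ^= p << 5;
        return p;
    }

    void add(long x) {
        int[] probe = PROBE.get();
        for (;;) {
            int i = (probe[0] & (CELLS - 1)) * STRIDE;
            long v = cells.get(i);
            if (cells.compareAndSet(i, v, v + x)) {
                return; // common case: a single successful CAS
            }
            // CAS failed, so another thread hit the same cell: change the probe.
            probe[0] = advance(probe[0]);
        }
    }

    long sum() {
        long s = 0;
        for (int c = 0; c < CELLS; c++) {
            s += cells.get(c * STRIDE);
        }
        return s;
    }

    public static void main(String[] args) throws InterruptedException {
        StripedCounterSketch counter = new StripedCounterSketch();
        Thread[] threads = new Thread[4];
        for (int t = 0; t < threads.length; t++) {
            threads[t] = new Thread(() -> {
                for (int n = 0; n < 10_000; n++) {
                    counter.add(1);
                }
            });
            threads[t].start();
        }
        for (Thread th : threads) {
            th.join();
        }
        System.out.println(counter.sum()); // prints 40000
    }
}
```

A processorId-based variant would replace the probe lookup with the
processor ID as the index, but it would still need the CAS, since the
thread may migrate between reading the ID and updating the cell.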
There are a few contexts where processorId might be the only reasonable
option, but in current usages it is noticeably worse than the current
probe-based approach, for two reasons:
* The underlying instructions (variants of CPUID on x86, plus decoding
to turn results into indices) tend to be slow. Their cost varies across
the Intel and AMD machines I tested on, but is usually not reliably lower
than that of a conditional atomic instruction (CompareAndSet etc) that we
are trying to avoid calling more than once.
* In part because of GC-based pause/resumes that lead to rescheduling,
threads tend to frequently move across cores (at least on Linux test
machines). Spaces are dynamically created upon contention, so using
processorIds adds footprint without improving throughput. For example,
due to safepoints during expansion, some spaces are created that aren't
used because processorIds change on next use.
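
The footprint point can be illustrated with a toy model; it is not a
measurement and not how Striped64 is implemented. The class name is
hypothetical, the sequence of processor IDs is made up, and cells here
are created on first use rather than only upon contention, which keeps
the example deterministic.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicReferenceArray;

// Hypothetical illustration class: cells are allocated lazily, per index hint.
class LazyCellsSketch {
    private final AtomicReferenceArray<AtomicLong> cells;

    LazyCellsSketch(int size) {
        cells = new AtomicReferenceArray<>(size);
    }

    // Adds x to the cell selected by hint, creating that cell if absent.
    void add(int hint, long x) {
        int i = Math.floorMod(hint, cells.length());
        AtomicLong c = cells.get(i);
        if (c == null) {
            AtomicLong fresh = new AtomicLong();
            // If another thread installed a cell first, use theirs.
            c = cells.compareAndSet(i, null, fresh) ? fresh : cells.get(i);
        }
        c.addAndGet(x);
    }

    // Number of cells that have been allocated so far.
    int allocated() {
        int n = 0;
        for (int i = 0; i < cells.length(); i++) {
            if (cells.get(i) != null) {
                n++;
            }
        }
        return n;
    }

    long sum() {
        long s = 0;
        for (int i = 0; i < cells.length(); i++) {
            AtomicLong c = cells.get(i);
            if (c != null) {
                s += c.get();
            }
        }
        return s;
    }

    public static void main(String[] args) {
        // One thread with a stable probe keeps hitting the same cell.
        LazyCellsSketch stable = new LazyCellsSketch(8);
        int probe = 5; // made-up probe value
        for (int n = 0; n < 6; n++) {
            stable.add(probe, 1);
        }

        // The same thread, rescheduled across cores between updates
        // (made-up ID sequence), leaves a trail of allocated cells.
        LazyCellsSketch migrating = new LazyCellsSketch(8);
        int[] processorIds = { 0, 3, 1, 3, 2, 0 };
        for (int id : processorIds) {
            migrating.add(id, 1);
        }

        System.out.println(stable.allocated());    // prints 1
        System.out.println(migrating.allocated()); // prints 4
    }
}
```

Both counters hold the same total, but the migrating one has allocated
four cells for a single thread's updates, which is extra footprint with
no gain in throughput.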
On microbenchmarks for LongAdder, the net results range from about 20%
to more than 100% slowdown across machines. Less systematic experiments
with the other uses don't look promising either. This does not strictly
argue for dropping getProcessorId because it might still be useful in
other contexts. But it is not a good replacement for current usages.
(While experimenting, I came up with a couple of tiny tweaks to
Striped64 that also apply to the current probe-based version, so will
check those in to make LongAdder etc a few percent faster.)