RFR (S): CR 8005926: (thread) Merge ThreadLocalRandom state into java.lang.Thread

Submitting this on behalf of Doug Lea. The webrev is here:

Bottom line: merge ThreadLocalRandom state into Thread, to optimize many
use cases around j.u.c.* code. Simple performance tests on a 2x2 i5,
Linux x86_64, 4 threads, 5 forks, 3x3s warmup, 5x3s measurement:

JDK8 (baseline)
              TLR.nextInt():  6.4 +- 0.1 ns/op
    TLR.current().nextInt(): 16.1 +- 0.4 ns/op
         TL.get().nextInt(): 19.1 +- 0.6 ns/op

JDK8 (patched)
              TLR.nextInt():  6.5 +- 0.2 ns/op
    TLR.current().nextInt():  6.4 +- 0.1 ns/op
         TL.get().nextInt(): 17.2 +- 2.0 ns/op

The first line shows the peak performance of TLR itself; everything above
that is ThreadLocal overhead. One can see that in the patched version
TLR.current() bypasses the ThreadLocal machinery completely, and its
overhead is slim to none.
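
For readers who want to see the three access patterns side by side, here is
a minimal sketch. It is not the benchmark used for the numbers above: the
class name, the method names and the bound parameter are made up for
illustration, and I am assuming "TL" means a plain java.util.Random kept in
a ThreadLocal. A naive loop like the one in main() is not a reliable way to
measure nanosecond-scale costs; it only shows what each row calls.

```java
import java.util.Random;
import java.util.concurrent.ThreadLocalRandom;

public class TlrAccessPatterns {
    // Assumption: the "TL.get()" row is a plain Random held in a ThreadLocal.
    static final ThreadLocal<Random> TL = ThreadLocal.withInitial(Random::new);

    // "TLR.nextInt()": the generator was looked up once and is reused.
    // The instance must only be used by the thread that obtained it.
    static int cachedInstance(ThreadLocalRandom r, int bound) {
        return r.nextInt(bound);
    }

    // "TLR.current().nextInt()": look up the generator on every call.
    static int viaCurrent(int bound) {
        return ThreadLocalRandom.current().nextInt(bound);
    }

    // "TL.get().nextInt()": go through the regular ThreadLocal map.
    static int viaThreadLocal(int bound) {
        return TL.get().nextInt(bound);
    }

    public static void main(String[] args) {
        ThreadLocalRandom r = ThreadLocalRandom.current();
        long sum = 0;
        for (int i = 0; i < 1_000_000; i++) {
            sum += cachedInstance(r, 100) + viaCurrent(100) + viaThreadLocal(100);
        }
        // Print the sum so the calls cannot be optimized away entirely.
        System.out.println("checksum: " + sum);
    }
}
```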

N.B. It gets especially interesting when there are many ThreadLocals
registered. Making 1M ThreadLocals and pre-touching them bloats the
thread-local maps, and we get:

JDK8 (baseline), contaminators = 1M:
              TLR.nextInt():  6.4 +- 0.1 ns/op
    TLR.current().nextInt(): 21.7 +- 5.3 ns/op
         TL.get().nextInt(): 28.7 +- 1.1 ns/op

JDK8 (patched), contaminators = 1M:
              TLR.nextInt():  6.6 +- 0.2 ns/op
    TLR.current().nextInt():  6.5 +- 0.1 ns/op
         TL.get().nextInt(): 29.4 +- 0.5 ns/op

Note that the patched version successfully dodges this pathological case.
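
The "contaminators" setup can be sketched roughly as follows. This is an
illustration under assumptions, not the actual test code: the class and
method names are hypothetical, and I am guessing that "pre-touching" means
setting a value from the measuring thread so that each ThreadLocal gets an
entry in that thread's map.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

public class ThreadLocalContaminators {
    // Shared dummy value; only the map entries matter, not what they hold.
    private static final Object VALUE = new Object();

    // Creates `count` ThreadLocals and sets each one from the calling
    // thread, so the calling thread's thread-local map grows accordingly.
    static List<ThreadLocal<Object>> contaminate(int count) {
        List<ThreadLocal<Object>> keep = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            ThreadLocal<Object> tl = new ThreadLocal<>();
            tl.set(VALUE); // pre-touch: forces an entry into this thread's map
            keep.add(tl);
        }
        // Returning the list keeps the ThreadLocals strongly reachable,
        // so their map entries are not cleared as stale.
        return keep;
    }

    public static void main(String[] args) {
        // 1M matches the runs above; pass a smaller count if heap is tight.
        int count = args.length > 0 ? Integer.parseInt(args[0]) : 1_000_000;
        List<ThreadLocal<Object>> contaminators = contaminate(count);
        // Any lookup made by this thread now runs against the bloated map.
        int sample = ThreadLocalRandom.current().nextInt(100);
        System.out.println(contaminators.size() + " contaminators, sample = " + sample);
    }
}
```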

Testing:
  - Doug tested on his platforms
  - Tested Linux x86_64 to build and run successfully
  - JPRT builds are OK
  - JPRT tests are OK (modulo some weird lambda/default-methods test
    failures in jdk8/tl)

Attribution:
  - dl: original patch
  - shade: testing, copyright headers, etc.


