Code Review request for bug fixes found by the Contended Locking project

Daniel D. Daugherty daniel.daugherty at
Fri Jan 18 09:38:19 PST 2013

On 1/17/13 8:49 PM, Vitaly Davidovich wrote:
> Hi Dan,
> Yup, those are the patterns I was referring to.

First off, credit where credit is due. The original author of these
changes is Dave Dice from Sun/Oracle Labs. I'm shepherding these
changes into HSX-25 for Dave.

> So to make sure I understand correctly, you're saying that some 
> pthreads implementations either (a) allow code motion by compiler that 
> moves other code above the unlock() call or (b) generate assembly 
> instructions that don't entail a full fence and thus allow the CPU to 
> reorder surrounding memory accesses.

Yes, but more important than what a particular implementation does
is the fact that this is permissible (and should be permissible).

> I'm a bit unclear on what it means, at compiler or CPU/assembly level, 
> for pthreads implementation to defer memory fences until a later point 
> in time - is there an example?

The minimum requirement upon a pthreads implementation is "entry
consistency with regards to the same lock". How this translates
from a particular pthreads implementation down to the underlying
chip level is highly dependent on both of those components and
way too gory to dive into in this thread. There are academic
papers on this topic that will likely be good sleep aids... :-)

> Is this a theoretical issue that you guys spotted or did someone come 
> across a bug that was due to this?

The set of fixes that are the subject of this review are an extension
of work done on a real bug way back in December 2009:

Changeset: 95e9083cf4a7
Author:    dholmes
Date:      2009-12-01 22:29 -0500

6822370: ReentrantReadWriteLock: threads hung when there are no threads holding onto the lock (Netra x4450)
Summary: This day one bug is caused by missing memory barriers in various Parker::park() paths that can result in lost wakeups and hangs.
Reviewed-by: dice, acorn

! src/os/linux/vm/os_linux.cpp
! src/os/solaris/vm/os_solaris.cpp

> Also, what performance effect, if any, will this have for platforms 
> using a strong pthreads impl? Two consecutive full barriers will be 
> executed right after each other (don't know whether compiler will pick 
> up on this and coalesce them into 1).  Maybe the 2nd one is "cheap" 
> since the first one will take the hit of stalling the pipeline and 
> emptying store buffers.

Dave Dice has asserted that the new fence() calls added in this
changeset will not affect performance on any of the platforms
that he tested.

> Finally, what supported architectures/platforms could have this 
> problem? I know Alpha is funky, but I don't think hotspot supports 
> it.  What's left, PowerPC or something?

I believe the platforms that we are worried about are ARM and PowerPC.
However, that's not really the point. To reiterate what I said last night:

     pthreads is allowed to presume that everyone is playing nice and
     using locks; this allows pthreads to make some optimizations of
     their own like deferring memory syncing to certain points.

There is no way to ask a pthreads implementation:

     What optimizations do you have that might mess up my optimizations?

so you have no way of knowing whether the pthreads you are depending on
is "strong" or "loose". The other important point is that you have no
guarantee that the "pthreads" you have today won't add optimizations
tomorrow that will mess up your own optimizations.

> Sorry to sidetrack this thread a bit. :)

You aren't asking anything that we didn't agonize over in our internal
review cycle. However, I do hope it doesn't take 50+ e-mails to resolve
the issue to your satisfaction.

> Thanks for your response

No problem. You should check out Dave Dice's blog (referred to in the
triangular-race.txt document) and then check out the academic papers
that Dave refers to...


> Sent from my phone
> On Jan 17, 2013 10:13 PM, "Daniel D. Daugherty" 
> <daniel.daugherty at <mailto:daniel.daugherty at>> wrote:
>     Vitaly,
>     Thanks for the question. Reply embedded below.
>     On 1/17/13 4:16 PM, Vitaly Davidovich wrote:
>>     Hi Dan,
>>     Just curious - what's the reason for adding OrderAccess::fence()
>>     after the pthread_mutex_unlock() calls? Is this working around
>>     some libc issues or something? I'd expect those unlocks to
>>     provide the ordering and visibility guarantee.
>     I presume you are asking about one of these two patterns:
>     src/os/linux/vm/os_linux.cpp:
>     4979 void os::PlatformEvent::park() {
>     <snip>
>     5004     _Event = 0 ;
>     5005      status = pthread_mutex_unlock(_mutex);
>     5006      assert_status(status == 0, status, "mutex_unlock");
>     5007     // Paranoia to ensure our locked and lock-free paths interact
>     5008     // correctly with each other.
>     5009     OrderAccess::fence();
>     or
>     5194 void Parker::park(bool isAbsolute, jlong time) {
>     <snip>
>     5240     _counter = 0;
>     5241     status = pthread_mutex_unlock(_mutex);
>     5242     assert (status == 0, "invariant") ;
>     5243     // Paranoia to ensure our locked and lock-free paths interact
>     5244     // correctly with each other and Java-level accesses.
>     5245     OrderAccess::fence();
>     5246     return;
>     The short answer is that we use both locked and lock-free paths
>     in os::PlatformEvent::park(), os::PlatformEvent::unpark() and in
>     Parker::park(). Those optimizations on our part may not play well
>     with optimizations done in a looser pthreads implementation. This
>     is _not_ a bug in the pthreads implementation. pthreads is allowed
>     to presume that everyone is playing nice and using locks; this
>     allows pthreads to make some optimizations of their own like
>     deferring memory syncing to certain points.
>     The slightly longer answer is:
>     These changes were made as part of the fix for the following bug:
>         8004902 correctness fixes motivated by contended locking work
>     (6607129)
>     In the last paragraph of the triangular-race.txt attachment of
>     8004902,
>     I wrote the following about these fence() calls:
>>     The addition of fence() calls after the second and third settings of
>>     "_counter = 0" was done in the original fix for 6822370 for similar
>>     reasons: worries about weaker memory model semantics. Those two
>>     settings
>>     of "_counter = 0" are done while the mutex lock is held and they
>>     are followed by unlock(_mutex) operations. In all of the transaction
>>     diagrams, I included a "// _counter flushes" comment after T2 did an
>>     "unlock(_mutex)" operation. However, there is concern that
>>     unlock(_mutex)
>>     may not force _counter to be flushed. It appears that a loose
>>     pthreads
>>     implementation is allowed to be lax on memory flushes on unlock
>>     as long
>>     as everyone uses locks. Since unpark() always uses locks, I think
>>     it is
>>     OK for the transaction diagrams to show T2's unlock() calls
>>     including the
>>     "// _counter flushes" comment. In support of this, I'll note that the
>>     original fix for 6822370 does not include fence() calls after
>>     unpark()
>>     calls unlock(_mutex). Dave D's latest changes also do not include
>>     fence()
>>     calls after unpark() calls unlock(_mutex)
>     There is some context in the above explanation that assumes that you
>     have read other parts of triangular-race.txt.
>     For the extremely long answer about the entire triangular race, I
>     refer you to the triangular-race.txt attachment.
>     Thanks again for the question. Please let me know if I have not
>     addressed
>     the question to your satisfaction.
>     Dan
>>     Thanks
>>     Sent from my phone
>>     On Jan 17, 2013 1:46 PM, "Daniel D. Daugherty"
>>     <daniel.daugherty at
>>     <mailto:daniel.daugherty at>> wrote:
>>         Greetings,
>>         I have been working on the Contended Locking project and
>>         there are some
>>         bug fixes that I'd like to push into HSX-25 independently of
>>         the Contended
>>         Locking project.
>>         I'm using three different bug IDs to track these very
>>         distinct bug
>>         fixes:
>>             6444286 Possible naked oop related to biased locking
>>         revocation
>>                     safepoint in jni_exit()
>>             8004902 correctness fixes motivated by contended locking
>>         work (6607129)
>>             8004903 VMThread::execute() calls
>>         Thread::check_for_valid_safepoint_state()
>>                     on concurrent VM ops
>>         Here is the URL for the webrev:
>> <>
>>         These changes have been through two internal rounds of code
>>         review so I
>>         think they are ready for wider review. The changes for
>>         8004902 are very
>>         tricky and there is long write-up attached to the 8004902 bug
>>         report that
>>         explains the intricate details of the "triangular race". The
>>         changes for
>>         6444286 and 8004903 are more straightforward.
>>         These changes have been through JPRT build-and-test and have
>>         also been
>>         tested with the vm.parallel_class_loading and vm.quick
>>         subsuites via
>>         the Aurora ad-hoc testing mechanism.
>>         The current set of reviewers is:
>>             acorn, dholmes, dice
>>         As always, other reviewers are welcome. Comments, questions and
>>         suggestions are also welcome.
>>         Dan


More information about the hotspot-runtime-dev mailing list