MFENCE vs. LOCK addl

Jiva, Azeem Azeem.Jiva at
Wed Feb 25 05:53:54 PST 2009

I was looking at memory barrier performance and noticed that HotSpot
uses MFENCE as a memory barrier in 64-bit mode. MFENCE is significantly
slower than a LOCKed instruction, since MFENCE is serializing
(similar to CPUID). I'd like to recommend the following change:

// Serializes memory.
void Assembler::mfence() {
  // Memory barriers are only needed on multiprocessors
  if (os::is_MP()) {
    // All usable chips support "locked" instructions which suffice
    // as barriers, and are much faster than the alternative of
    // using cpuid or mfence instructions. We use here a locked add
    // [rsp],0. This is conveniently otherwise a no-op except for
    // blowing flags (which we save and restore.)
    pushf();                  // Save eflags register
    lock();                   // LOCK prefix for the following add
    addl(Address(rsp, 0), 0); // Assert the lock# signal here
    popf();                   // Restore eflags register
  }
}


Sorry it's not a diff, but I'm not set up with Mercurial yet. The only
application I've run is SPECjbb2005, and it shows no regressions or
gains, mostly because the code generated for SPECjbb2005 doesn't use
MFENCE in any significant amount.
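
Since SPECjbb2005 barely exercises the barrier, a standalone
micro-benchmark is probably a better way to compare the two sequences.
The following is only a rough sketch, not HotSpot code: the names
fence_mfence, fence_locked_add and time_fence are made up for
illustration, and it assumes GCC or Clang inline assembly on x86 (other
targets fall back to a standard C++ fence, which makes the comparison
meaningless there). Inside compiler inline assembly the "cc" clobber
stands in for the pushf/popf pair used in the assembler stub.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstdint>

// Full barrier using MFENCE on x86; portable fallback on other targets.
inline void fence_mfence() {
#if defined(__x86_64__) || defined(__i386__)
  __asm__ __volatile__("mfence" ::: "memory");
#else
  std::atomic_thread_fence(std::memory_order_seq_cst);
#endif
}

// Full barrier using a LOCKed add of 0 to the top of the stack.
// The add leaves memory unchanged but clobbers the flags ("cc").
inline void fence_locked_add() {
#if defined(__x86_64__)
  __asm__ __volatile__("lock; addl $0, (%%rsp)" ::: "memory", "cc");
#elif defined(__i386__)
  __asm__ __volatile__("lock; addl $0, (%%esp)" ::: "memory", "cc");
#else
  std::atomic_thread_fence(std::memory_order_seq_cst);
#endif
}

// Runs `fence` back to back `iterations` times and returns the elapsed
// wall-clock time in nanoseconds. Loop and call overhead are included,
// but they are the same for both barriers being compared.
template <typename Fence>
std::int64_t time_fence(Fence fence, std::int64_t iterations) {
  assert(iterations >= 0);
  auto start = std::chrono::steady_clock::now();
  for (std::int64_t i = 0; i < iterations; ++i) {
    fence();
  }
  auto stop = std::chrono::steady_clock::now();
  return std::chrono::duration_cast<std::chrono::nanoseconds>(
             stop - start).count();
}
```

Calling time_fence(fence_mfence, n) and time_fence(fence_locked_add, n)
with the same large n gives a rough per-barrier cost for each. An
uncontended loop like this says nothing about behaviour under real
memory traffic, so treat the numbers as indicative only.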

Azeem Jiva
AMD Java Labs
T 512.602.0907

More information about the hotspot-compiler-dev mailing list