These are primarily emitted for volatile fences, so many programs won't care about their speed at all.  If you look at my other email, it suggests the difference is that Intel chips prior to Nehalem had a heavier-weight implementation of lock addl than was required.  mfence stayed approximately the same between processor versions, with its speed pretty much tracking the relative clock speeds: 2.4 GHz for the Tigerton and 2.8 GHz for Nehalem.  The original data suggested no performance change on Nehalem when switching instructions, so Nehalem probably doesn't care either way.
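
For anyone who wants to poke at this on their own hardware, here is a rough sketch of the kind of code that hits the fence.  It assumes the JIT places a full fence (mfence or lock addl, whichever it picks) after each volatile store on x86.  The class and method names are made up for illustration, and this is a crude timing loop rather than a proper benchmark, so treat the numbers as a hint at best.

```java
public class VolatileStoreCost {
    // A store to a volatile field is where the JIT is assumed to emit the fence.
    static volatile long fenced;
    // A plain field for comparison; the JIT is free to optimize these stores heavily.
    static long plain;

    // Time n volatile stores; each one should be followed by the fence instruction.
    static long timeVolatileStores(int n) {
        long start = System.nanoTime();
        for (int i = 0; i < n; i++) {
            fenced = i;
        }
        return System.nanoTime() - start;
    }

    // Time n plain stores; no fence is required for these.
    static long timePlainStores(int n) {
        long start = System.nanoTime();
        for (int i = 0; i < n; i++) {
            plain = i;
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        int n = 10_000_000;
        // Warm up so both loops are compiled before the measured runs.
        timeVolatileStores(n);
        timePlainStores(n);
        System.out.println("volatile stores ns: " + timeVolatileStores(n));
        System.out.println("plain stores ns:    " + timePlainStores(n));
    }
}
```

Running the same class on two machines and dividing the volatile-store time by the clock period gives a rough cycles-per-fence figure to compare.  Code that rarely writes volatile fields will not notice the difference either way.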


