RFR(M): JDK-8163011: AArch64: NMT detail stack trace cleanup

Chris Plummer chris.plummer at oracle.com
Wed Sep 6 21:20:48 UTC 2017

On 9/6/17 12:26 AM, dmitry.samersov wrote:
> Chris,
>> You are always skipping 3
>> frames when it's a non-product build, and that isn't uniformly correct
>> for all platforms and for slowdebug vs fastdebug.
> NMT_InternalFrames means *maximum possible* number of internal frames.
> We check *a name of the frame* before we skip it, so we don't need to
> know how many frames got inlined on in particular build by particular
> compiler and this is the main idea of the fix.
> i.e. It's safe to set (e.g.) NMT_InternalFrames = 10 and keep this logic
> in production.
> I guard the changes by #ifndef PRODUCT to avoid memory overhead in
> production but if we can afford memory/performance penalty of
> storing 3 or 4 extra frames, it's better to keep this logic
> enabled ever in a product build.
> *Testing:*
> I'd tested 3 builds (release, fastdebug, slowdebug) on two platforms
> (Linux/x86_64, Linux/aarch64). All hotspot/runtime/NMT tests are passed
> including CheckForProperDetailStackTrace. Also, manual comparison of
> stacktraces on Linux/x86_64 don't show any changes in output.
Hi Dmitry,

So your assumption is that product builds don't need any additional 
frames pruned, and debug builds at most need os::get_native_stack and 
NativeCallStack:: pruned. In that case should add AllocateHeap to the 
list and simplify CheckForProperDetailStackTrace.java:

         // AllocateHeap shouldn't be in the output because it is 
supposed to always be inlined.
         // We check for that here, but allow it for Aix, Solaris and 
Windows slowdebug builds
         // because the compiler ends up not inlining AllocateHeap.
         Boolean okToHaveAllocateHeap =
             Platform.isSlowDebugBuild() &&
             (Platform.isAix() || Platform.isSolaris() || 
         if (!okToHaveAllocateHeap) {

A couple of questions:

  * Are you sure the frame name will always be demangled? I recall
    either macos or solaris not doing this reliably. I think this is why
    the test looks for patterns like "
    .*ModuleEntryTable.*new_entry.*\n". It looks like you didn't test
    either of these platforms.
  * What about platforms that do not do tail calls for
    os::get_native_stack, even for PRODUCT builds. I don't see that
    you've tested any of these either. I think you'll find
    os::get_native_stack is the NMT backtrace on these platforms after
    your changes.

I was also going to ask about os::current_frame() and get_previous_fp(), 
but if I understand correctly, you always limit the frames you print to 
4, so you never get to these frames. Is that correct? However I'd still 
be concerned that your changes cause os::current_frame() to no longer be 
consistent. I fixed it to always return the frame of whoever calls 
os:current_frame(). After your changes sometimes it will return the 
frame for os::current_frame(). This might have unintended side affects 
for other stack walking/printing code.



> -Dmitry
> On 05.09.2017 21:34, Chris Plummer wrote:
>> Hi Dmitry,
>> I've looked over the changes and some of the comments so far, and do
>> agree with Zhengyu regarding removal _NMT_NOINLINE_, but I also have
>> concerns about other platform dependent code you have removed.
>> _NMT_NOINLINE is only defined for slowdebug builds. You now are instead
>> trying to change the frame skipping logic to be based on the PRODUCT
>> flag. fastdebug builds do not set the PRODUCT flag, so you cannot
>> possibly get the frame skipping for both slowdebug and fastdebug builds
>> by checking the PRODUCT flag since they each have different frame
>> skipping requirements. Since we have/had no other way of telling the
>> difference between slowdebug and fastdebug builds, I continued to
>> leverage the _NMT_NOINLINE flag for this.
>> NMT_InternalFrames will should not always be 3 for non product builds.
>> It should not only vary between slowdebug and fastdebug, but it also
>> vary between platforms. This is due both to compiler differences and
>> code differences. That's why there is code like this:
>>    36     // We need to skip the NativeCallStack::NativeCallStack frame
>> if a tail call is NOT used
>>    37     // to call os::get_native_stack. A tail call is used if
>> _NMT_NOINLINE_ is not defined
>>    38     // (which means this is not a slowdebug build), and we are on
>> 64-bit (except Windows).
>>    39     // This is not necessarily a rule, but what has been obvserved
>> to date.
>>    40 #define TAIL_CALL (!defined(_NMT_NOINLINE_) && !defined(WINDOWS) &&
>> defined(_LP64))
>>    41 #if !TAIL_CALL
>>    42     toSkip++;
>>    43 #if (defined(_NMT_NOINLINE_) && defined(BSD) && defined(_LP64))
>>    44     // Mac OS X slowdebug builds have this odd behavior where
>> NativeCallStack::NativeCallStack
>>    45     // appears as two frames, so we need to skip an extra frame.
>>    46     toSkip++;
>>    47 #endif
>>    48 #endif
>> And also in _get_previous_fp(), which varies in name and implementation
>> for various os/cpu combos, the logic for the number of frames to skip is
>> not always the same.
>> You noted that:
>>> 1. This patch doesn't affect product build. On product build we have all
>>> NMT frames inlined and don't need to skip anything.
>> It's not true that for product builds we don't need to skip anything.
>> The code above indicates the tail call differences on some platforms,
>> which requires skipping a frame in product builds on some platforms, but
>> not others.
>> Thomas wrote:
>>> Code is easier to read now and less vulnerable
>>> to compiler decisions.
>> When the reality is that with your changes compiler decisions are being
>> ignored, as are implementation decisions. You are always skipping 3
>> frames when it's a non-product build, and that isn't uniformly correct
>> for all platforms and for slowdebug vs fastdebug.
>> I think your NMT_InternalFrames solution makes parts of the code easier
>> to read, but in order to be correct its computation needs to be platform
>> dependent, and also account for slowdebug/fastdebug differences.
>> I did write an NMT test to try to make sure frame skipping is correct.
>> It's called CheckForProperDetailStackTrace.java. However, I doubt it's
>> 100% reliable. You should run it with all 3 build flavors: release,
>> slowdebug, fastdebug. And you need to run it on all supported platforms.
>> For linux-arm32, it would be good to actually use an -marm build instead
>> of -mthumb since we don't even get stack traces with -mthumb.
>> thanks,
>> Chris
>> On 9/5/17 8:28 AM, Zhengyu Gu wrote:
>>> Hi Dmitry,
>>> I have concerns on this change:
>>> Although, you only extend tracking stacks for none-production build,
>>> eliminating _NMT_NOINLINE_ actually affect production code. Have you
>>> tested production build?
>>> Chris (cc'ed) worked on fixing stack walking before this change, we
>>> should get a feedback from him.
>>> Thanks,
>>> -Zhengyu
>>> On 09/05/2017 10:49 AM, dmitry.samersov wrote:
>>>> Everybody,
>>>> Please, review updated webrev:
>>>> http://cr.openjdk.java.net/~dsamersoff/JDK-8163011/webrev.06/
>>>> Only files below different from the previous webrev.
>>>>    src/share/vm/services/nmtCommon.hpp
>>>>    src/share/vm/utilities/nativeCallStack.cpp
>>>>    src/share/vm/utilities/nativeCallStack.hpp
>>>> 1. Changes guarded by #ifndef PRODUCT
>>>> 2. Addressed Thomas comments
>>>> -Dmitry
>>>> On 31.08.2017 10:49, dmitry.samersov wrote:
>>>>> Everybody,
>>>>> Please review:
>>>>> http://cr.openjdk.java.net/~dsamersoff/JDK-8163011/webrev.05/
>>>>>       I would propose different approach to fix JDK-8133740
>>>>> platform-independent way: record all frames but strip unnecessary
>>>>> NMT-internal ones on printing.
>>>>> This approach is safe (we don't depend to compiler inlining and we
>>>>> never
>>>>> strip non-NMT frames) and platform independent, but cost us some extra
>>>>> memory.
>>>>> -Dmitry

More information about the hotspot-runtime-dev mailing list