RFR(M): 8133749, 8133747, 8133740: NMT detail stack trace cleanup

Chris Plummer chris.plummer at oracle.com
Thu Aug 18 19:50:50 UTC 2016

On 8/18/16 12:42 PM, Chris Plummer wrote:
> Hi Dmitry,
> These changes were already pushed last week. Chris was just responding 
> to a request I had to see if any of the open ports also needed any of 
> the fixes that were done for other platforms.
Sorry, I meant Andrew, not Chris.

> On 8/18/16 9:18 AM, Dmitry Samersoff wrote:
>> Chris,
>> 1. (general) intptr_t* _get_previous_fp()
>> I'm not sure we should rely on the fact that _get_previous_fp is
>> inlined.
> It would be nice to not rely on gcc for inlining (or not inlining), 
> but unfortunately the main theme of these bugs is fixing cases were we 
> are assuming, and the assumption is wrong. The following CR was filed 
> to address in general the fragility of relying on what compilers are 
> doing for inlining:
> JDK-8163899 NMT frame skipping code is fragile
>> AFAIK, gcc doesn't inline function if it has inline assembly.
> That wasn't my observation. If it were true, the test wouldn't be 
> passing on both debug and non-debug builds.
>> So it might be better to mark it explicitly by
>> __attribute__ ((noinline))
> I think what actually would be best is to get rid of 
> _get_previous_fp() and move it's logic into os:current_frame(). That's 
> pointed out in another related CR I filed:
> JDK-8163900 os::current_frame has a misleading name
>> 2. os_solaris_x86.cpp:299
>> Should we check os::is_first_C_frame(&myframe) before asking for caller
>> frame ?
> The other platforms don't do this. At best maybe they all should do so 
> as an assert to make sure the logic isn't flawed. If the logic is 
> flawed, it will crash, so the assert would just catch the reason earlier.
> thanks,
> Chris
>> -Dmitry
>> On 2016-08-02 23:31, Chris Plummer wrote:
>>> Hello,
>>> Please review the following:
>>> webrev:
>>> http://cr.openjdk.java.net/~cjplummer/8133749-8133747-8133740/webrev-01/webrev.hotspot/ 
>>> Bugs fixed:
>>> JDK-8133749: os::current_frame() is not returning the proper frame on
>>> ARM and solaris-x64
>>> https://bugs.openjdk.java.net/browse/JDK-8133749
>>> JDK-8133747: NMT includes an extra stack frame due to assumption NMT is
>>> making on tail calls being used
>>> https://bugs.openjdk.java.net/browse/JDK-8133747
>>> JDK-8133740: NMT for Linux/x86/x64 and bsd/x64 slowdebug builds 
>>> includes
>>> NativeCallStack::NativeCallStack() frame in backtrace
>>> https://bugs.openjdk.java.net/browse/JDK-8133740
>>> The above bugs all result in the NMT detail stack traces including 
>>> extra
>>> frames in the stack traces. Certain frames are suppose to be skipped,
>>> but sometimes are not. The frames that show up are:
>>>    NativeCallStack::NativeCallStack
>>>    os::get_native_stack
>>> These are both methods used to generate the stack trace, and therefore
>>> should not be included it.  However, under some (most) circumstances,
>>> they were.
>>> Also, there was no test to make sure that any NMT detail output is
>>> generated, or that it is correct. I've added one with this webrev. Of
>>> the 27 possible builds (9 platforms * 3 build flavors), only 9 of 
>>> the 27
>>> initially passed this new test. They were the product and fastdebug
>>> builds for solaris-sparc, bsd-x64, and linux-x64; and the slowdebug
>>> builds for solaris-x64, windows-x86, and windows-x64. All the rest
>>> failed. They now all pass with my fixes in place.
>>> Here's a summary of the changes:
>>> src/os/posix/vm/os_posix.cpp
>>> src/os/windows/vm/os_windows.cpp
>>> JDK-8133747 fixes: There was some frame skipping logic here which was
>>> sort of correct, but was misplace. There are no extra frames being 
>>> added
>>> in os::get_native_stack() due to lack of inlining or lack of a tail
>>> call, so no need for toSkip++ here. The logic has been moved to
>>> NativeCallStack::NativeCallStack, which is where the tail call is
>>> (sometimes) made, and also corrected (see nativeCallStack.cpp below).
>>> src/share/vm/utilities/nativeCallStack.cpp
>>> JDK-8133747 fixes: The frame skipping logic that was moved here assumed
>>> that NativeCallStack::NativeCallStack would not appear in the call 
>>> stack
>>> (due to a tail call be using to call os::get_native_stack) except in
>>> slow debug builds. However, some platforms also don't use a tail call
>>> even when optimized. From what I can tell that is the case for 32-bit
>>> platforms and for windows.
>>> src/os_cpu/bsd_x86/vm/os_bsd_x86.cpp
>>> src/os_cpu/windows_x86/vm/os_windows_x86.cpp
>>> src/os_cpu/linux_x86/vm/os_linux_x86.cpp
>>> JDK-8133740 fixes: When _get_previous_fp is not inlined, we need to 
>>> skip
>>> one extra frame
>>> src/os_cpu/solaris_x86/vm/os_solaris_x86.cpp
>>> JDK-8133749 fixes: os:current_frame() was not consistent with other
>>> platforms and needs to skip one more frame. This means it returns the
>>> frame for the caller's caller. So when called by os:get_native_stack(),
>>> it returns the frame for whoever called os::get_native_stack(). 
>>> Although
>>> not intuitive, this is what os:get_native_stack() expects. Probably a
>>> method rename and/or a behavior change is justified here, but I would
>>> prefer to do that with a followup CR if anyone has a good suggestion on
>>> what to do.
>>> test/runtime/NMT/CheckForProperDetailStackTrace.java
>>> This is the new NTM detail test. It checks for frames that shouldn't be
>>> present and validates at least one stack trace is what is expected.
>>> I verified that the above test now passes on all supported platforms,
>>> and also did a full jprt "-testset hotpot" run. I plan on doing some 
>>> RBT
>>> testing with NMT detail enabled before committing.
>>> Regarding the community contributed ports that Oracle does not support,
>>> I didn't make any changes there, but it looks like some of these 
>>> bugs do
>>> exist. Notably:
>>> -linux-aarch64: Looks like it suffers from JDK-8133740. The changes 
>>> done
>>> to the
>>>   os_linux_x86.cp should also be applied here.
>>> -linux-ppc: Hard to say for sure since the implementation of
>>> os::current_frame is
>>>   different than others, but it looks to me like it suffers from both
>>> JDK-8133749
>>>   and JDK-8133740.
>>> -aix-ppc: Looks to be the same implementation as linux-ppc, so would
>>> need the
>>>   same changes.
>>> These ports may also be suffering from JDK-8133747, but that fix is in
>>> shared code (nativeCallStack.cpp). My changes there will need some
>>> tweaking for these ports they don't use a tail call to call
>>> os::get_native_stack().
>>> If the maintainers of these ports could send me some NMT detail output,
>>> I can advise better on what changes are needed. Then you can implement
>>> and test them, and then send them back to me and I'll include them with
>>> my changes. What I need is the following command run on product and
>>> slowdebug builds. Initially run without any of my changes applied. If
>>> needed I may followup with a request that they be run with the changes
>>> applied:
>>> bin/java -XX:+UnlockDiagnosticVMOptions -XX:NativeMemoryTracking=detail
>>> -XX:+PrintNMTStatistics -version
>>> thanks,
>>> Chris

More information about the hotspot-runtime-dev mailing list