RFR(M): 8133749, 8133747, 8133740: NMT detail stack trace cleanup

Chris Plummer chris.plummer at oracle.com
Wed Aug 10 05:39:30 UTC 2016

On 8/9/16 9:19 PM, David Holmes wrote:
> On 10/08/2016 1:56 PM, Chris Plummer wrote:
>> On 8/8/16 5:52 PM, David Holmes wrote:
>>> On 9/08/2016 6:22 AM, Chris Plummer wrote:
>>>> Hi David,
>>>> Did you want me to implement any of the additional cleanup work I
>>>> mentioned: manually inline _get_previous_fp, change 
>>>> os::current_frame()
>>>> to walk back one less frame, possibly rename os::current_frame()?
>>> Up to you. I'm not insisting on anything, but the less reliance we
>>> have on uncheckable (at build time) compiler behaviour, the better.
>> Ok. Given the relatively short amount of time left to resolve p4s, I
>> think I will just leave it as-is. I've started some more robust testing
>> the NMT detailed enabled. Once that completes I'll do the push. I'll
>> also file a couple of RFEs for further cleanup/improvements.
>> Can I consider the changes officially reviewed by you now?
> Yes.
> Thanks,
> David

>> thanks,
>> Chris
>>> Thanks,
>>> David
>>>> thanks,
>>>> Chris
>>>> On 8/7/16 4:26 PM, David Holmes wrote:
>>>>> Hi Chris,
>>>>> I don't have any good suggestions for this. So go with (2) and lets
>>>>> work on (3).
>>>>> Thanks,
>>>>> David
>>>>> On 5/08/2016 5:05 PM, Chris Plummer wrote:
>>>>>> Hi David,
>>>>>> If fixing os::current_frame() to have a better name and also make
>>>>>> it go
>>>>>> up one less frame makes these changes more palatable, I'm willing to
>>>>>> make that change. I would prefer to do it with a follow up CR (it
>>>>>> would
>>>>>> probably have to be an RFE), but will do it with these changes if
>>>>>> necessary. I still pull hairs over the proper name for this method,
>>>>>> even
>>>>>> if it is modified to return the frame of whoever called it. Usually
>>>>>> the
>>>>>> meaning conveyed by a method's name does not change based on
>>>>>> whether you
>>>>>> choose the caller's or callee's point of view, but in this case it
>>>>>> does,
>>>>>> and I'm not sure which point of view makes more sense. If we choose
>>>>>> the
>>>>>> caller's point of view, then the proper name remains
>>>>>> os::current_frame(). If we choose the callee's point of view, 
>>>>>> then it
>>>>>> should be os::callers_frame(). Maybe there's a name that is
>>>>>> agnostic and
>>>>>> means the same thing from both view points. I just haven't 
>>>>>> thought of
>>>>>> one yet.
>>>>>> With respect to ALWAYSINLINE, it does not work for solaris and 
>>>>>> windows
>>>>>> slowdebug builds. Note the special case in the test I wrote to
>>>>>> allow for
>>>>>> AllocateHeap() in the stack trace in this case, even though it
>>>>>> shouldn't
>>>>>> be there because it uses ALWAYSINLINE. I could have made changes in
>>>>>> the
>>>>>> source to get rid of it from the stack trace, but I didn't feel the
>>>>>> source code disruption was worth it for a slowdebug build, 
>>>>>> especially
>>>>>> since there are only a allocation call sites where it is a 
>>>>>> problem. I
>>>>>> could use ALWAYSINLINE for the cases where it will work to inline
>>>>>> _get_previous_fp, but I don't really see that as being any more
>>>>>> reliable
>>>>>> than what is there now.
>>>>>> As for making _get_previous_fp() a macro, that's made more 
>>>>>> complicated
>>>>>> because it has #ifdefs already. I could move its implementation
>>>>>> directly
>>>>>> into os::current_frame(). That would fix the inlining problem. I 
>>>>>> think
>>>>>> it could also use some cleanup with the #ifdefs. For example, for
>>>>>> linux-x86 do we have to worry about the SPARC_WORKS and __clang__
>>>>>> cases?
>>>>>> And yes, even with my changes the code is no less fragile, and no 
>>>>>> less
>>>>>> misdirected in its approach to getting a consistent allocation back
>>>>>> trace.  As I see it, there are 3 options:
>>>>>> (1) Do nothing, and leave it both broken and fragile.
>>>>>> (2) Do the cleanup I've done to at least correct the known stack 
>>>>>> trace
>>>>>> issues.
>>>>>> (3) Find another solution that doesn't suffer from these fragility
>>>>>> issues.
>>>>>> Note that (3) does not preclude doing (2) first, and (2) seems a
>>>>>> better
>>>>>> alternative than leaving it in its broken state (1). That's why I 
>>>>>> have
>>>>>> pursued these changes even though I know things will still be 
>>>>>> fragile.
>>>>>> thanks,
>>>>>> Chris
>>>>>> On 8/4/16 9:47 PM, David Holmes wrote:
>>>>>>> Hi Chris,
>>>>>>> On 5/08/2016 7:53 AM, Chris Plummer wrote:
>>>>>>>> Ping!
>>>>>>> I took another look at this and my earlier comments from 
>>>>>>> JDK-8133749.
>>>>>>> I hate to see the functionality "fixed" yet still have a completely
>>>>>>> confusing and mis-named API. I'm still far from convinced that
>>>>>>> returning the callers caller wasn't an "error" that was done due to
>>>>>>> the lack of inlining and the appearance of an unexpected 
>>>>>>> stackframe.
>>>>>>> You've now made things consistent - but os::current_frame() is
>>>>>>> completely mis-leading in name. And I'm still concerned that
>>>>>>> correctness here depends on C compiler inlining choices, with no 
>>>>>>> way
>>>>>>> to verify at build time that they were indeed inlined or not!
>>>>>>> Don't we
>>>>>>> have ALWAYSINLINE to mark things like _get_previous_fp ? For that
>>>>>>> matter shouldn't _get_previous_fp be a macro so inlining plays no
>>>>>>> role ?
>>>>>>> Sorry but this code seems to simply limp from one broken state to
>>>>>>> another due to its fragility.
>>>>>>> Thanks,
>>>>>>> David
>>>>>>> -----
>>>>>>>> On 8/2/16 1:31 PM, Chris Plummer wrote:
>>>>>>>>> Hello,
>>>>>>>>> Please review the following:
>>>>>>>>> webrev:
>>>>>>>>> http://cr.openjdk.java.net/~cjplummer/8133749-8133747-8133740/webrev-01/webrev.hotspot/ 
>>>>>>>>> Bugs fixed:
>>>>>>>>> JDK-8133749: os::current_frame() is not returning the proper
>>>>>>>>> frame on
>>>>>>>>> ARM and solaris-x64
>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8133749
>>>>>>>>> JDK-8133747: NMT includes an extra stack frame due to assumption
>>>>>>>>> NMT
>>>>>>>>> is making on tail calls being used
>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8133747
>>>>>>>>> JDK-8133740: NMT for Linux/x86/x64 and bsd/x64 slowdebug builds
>>>>>>>>> includes NativeCallStack::NativeCallStack() frame in backtrace
>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8133740
>>>>>>>>> The above bugs all result in the NMT detail stack traces 
>>>>>>>>> including
>>>>>>>>> extra frames in the stack traces. Certain frames are suppose 
>>>>>>>>> to be
>>>>>>>>> skipped, but sometimes are not. The frames that show up are:
>>>>>>>>>   NativeCallStack::NativeCallStack
>>>>>>>>>   os::get_native_stack
>>>>>>>>> These are both methods used to generate the stack trace, and
>>>>>>>>> therefore
>>>>>>>>> should not be included it.  However, under some (most)
>>>>>>>>> circumstances,
>>>>>>>>> they were.
>>>>>>>>> Also, there was no test to make sure that any NMT detail 
>>>>>>>>> output is
>>>>>>>>> generated, or that it is correct. I've added one with this
>>>>>>>>> webrev. Of
>>>>>>>>> the 27 possible builds (9 platforms * 3 build flavors), only 9
>>>>>>>>> of the
>>>>>>>>> 27 initially passed this new test. They were the product and
>>>>>>>>> fastdebug
>>>>>>>>> builds for solaris-sparc, bsd-x64, and linux-x64; and the 
>>>>>>>>> slowdebug
>>>>>>>>> builds for solaris-x64, windows-x86, and windows-x64. All the 
>>>>>>>>> rest
>>>>>>>>> failed. They now all pass with my fixes in place.
>>>>>>>>> Here's a summary of the changes:
>>>>>>>>> src/os/posix/vm/os_posix.cpp
>>>>>>>>> src/os/windows/vm/os_windows.cpp
>>>>>>>>> JDK-8133747 fixes: There was some frame skipping logic here
>>>>>>>>> which was
>>>>>>>>> sort of correct, but was misplace. There are no extra frames 
>>>>>>>>> being
>>>>>>>>> added in os::get_native_stack() due to lack of inlining or lack
>>>>>>>>> of a
>>>>>>>>> tail call, so no need for toSkip++ here. The logic has been
>>>>>>>>> moved to
>>>>>>>>> NativeCallStack::NativeCallStack, which is where the tail call is
>>>>>>>>> (sometimes) made, and also corrected (see nativeCallStack.cpp
>>>>>>>>> below).
>>>>>>>>> src/share/vm/utilities/nativeCallStack.cpp
>>>>>>>>> JDK-8133747 fixes: The frame skipping logic that was moved here
>>>>>>>>> assumed that NativeCallStack::NativeCallStack would not appear
>>>>>>>>> in the
>>>>>>>>> call stack (due to a tail call be using to call
>>>>>>>>> os::get_native_stack)
>>>>>>>>> except in slow debug builds. However, some platforms also don't
>>>>>>>>> use a
>>>>>>>>> tail call even when optimized. From what I can tell that is the
>>>>>>>>> case
>>>>>>>>> for 32-bit platforms and for windows.
>>>>>>>>> src/os_cpu/bsd_x86/vm/os_bsd_x86.cpp
>>>>>>>>> src/os_cpu/windows_x86/vm/os_windows_x86.cpp
>>>>>>>>> src/os_cpu/linux_x86/vm/os_linux_x86.cpp
>>>>>>>>> JDK-8133740 fixes: When _get_previous_fp is not inlined, we 
>>>>>>>>> need to
>>>>>>>>> skip one extra frame
>>>>>>>>> src/os_cpu/solaris_x86/vm/os_solaris_x86.cpp
>>>>>>>>> JDK-8133749 fixes: os:current_frame() was not consistent with 
>>>>>>>>> other
>>>>>>>>> platforms and needs to skip one more frame. This means it
>>>>>>>>> returns the
>>>>>>>>> frame for the caller's caller. So when called by
>>>>>>>>> os:get_native_stack(), it returns the frame for whoever called
>>>>>>>>> os::get_native_stack(). Although not intuitive, this is what
>>>>>>>>> os:get_native_stack() expects. Probably a method rename and/or a
>>>>>>>>> behavior change is justified here, but I would prefer to do that
>>>>>>>>> with
>>>>>>>>> a followup CR if anyone has a good suggestion on what to do.
>>>>>>>>> test/runtime/NMT/CheckForProperDetailStackTrace.java
>>>>>>>>> This is the new NTM detail test. It checks for frames that
>>>>>>>>> shouldn't
>>>>>>>>> be present and validates at least one stack trace is what is
>>>>>>>>> expected.
>>>>>>>>> I verified that the above test now passes on all supported
>>>>>>>>> platforms,
>>>>>>>>> and also did a full jprt "-testset hotpot" run. I plan on doing
>>>>>>>>> some
>>>>>>>>> RBT testing with NMT detail enabled before committing.
>>>>>>>>> Regarding the community contributed ports that Oracle does not
>>>>>>>>> support, I didn't make any changes there, but it looks like 
>>>>>>>>> some of
>>>>>>>>> these bugs do exist. Notably:
>>>>>>>>> -linux-aarch64: Looks like it suffers from JDK-8133740. The 
>>>>>>>>> changes
>>>>>>>>> done to the
>>>>>>>>>  os_linux_x86.cp should also be applied here.
>>>>>>>>> -linux-ppc: Hard to say for sure since the implementation of
>>>>>>>>> os::current_frame is
>>>>>>>>>  different than others, but it looks to me like it suffers from
>>>>>>>>> both
>>>>>>>>> JDK-8133749
>>>>>>>>>  and JDK-8133740.
>>>>>>>>> -aix-ppc: Looks to be the same implementation as linux-ppc, so
>>>>>>>>> would
>>>>>>>>> need the
>>>>>>>>>  same changes.
>>>>>>>>> These ports may also be suffering from JDK-8133747, but that fix
>>>>>>>>> is in
>>>>>>>>> shared code (nativeCallStack.cpp). My changes there will need 
>>>>>>>>> some
>>>>>>>>> tweaking for these ports they don't use a tail call to call
>>>>>>>>> os::get_native_stack().
>>>>>>>>> If the maintainers of these ports could send me some NMT detail
>>>>>>>>> output, I can advise better on what changes are needed. Then you
>>>>>>>>> can
>>>>>>>>> implement and test them, and then send them back to me and I'll
>>>>>>>>> include them with my changes. What I need is the following 
>>>>>>>>> command
>>>>>>>>> run
>>>>>>>>> on product and slowdebug builds. Initially run without any of my
>>>>>>>>> changes applied. If needed I may followup with a request that
>>>>>>>>> they be
>>>>>>>>> run with the changes applied:
>>>>>>>>> bin/java -XX:+UnlockDiagnosticVMOptions
>>>>>>>>> -XX:NativeMemoryTracking=detail -XX:+PrintNMTStatistics -version
>>>>>>>>> thanks,
>>>>>>>>> Chris

More information about the hotspot-runtime-dev mailing list