Extend NMT to JDK native libraries?

Zhengyu Gu zgu at redhat.com
Wed Nov 21 15:31:22 UTC 2018

Hi Thomas,

On 11/21/18 9:59 AM, Thomas Stüfe wrote:
> Thanks Zhengyu.
> Hm, closed as won't fix I see. But the solution outlined there differs
> from our proposal.

It was just a placeholder. In fact, we did discuss the approach that 
you proposed. IIRC, we concluded that the initial effort was just too great.

Global replacement of ::malloc()/::free()/::realloc() can be a good 
starting point, but in the end we do want library code to follow HotSpot's 
convention and provide correct memory tags. A single mtOther for all 
library code does not seem very helpful.
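
To make the difference between wholesale replacement and per-library tags
concrete, here is a minimal sketch in C. It is not HotSpot or JDK code: the
tag enum, lib_malloc(), lib_free(), lib_alloc_calls() and LIB_DEFAULT_TAG are
all hypothetical names invented for illustration. Redefining malloc/free as
macros is the wholesale technique discussed in this thread; strictly speaking
ISO C reserves those names, so treat it as a common practice rather than
portable C.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical tags, loosely modelled on HotSpot's mt* memory types. */
typedef enum { TAG_OTHER, TAG_NIO, TAG_ZIP, TAG_COUNT } mem_tag;

static size_t alloc_calls[TAG_COUNT];   /* allocations seen per tag */

/* Hypothetical tagged entry points. A real port would forward to the JVM;
 * here they only count and then call libc. */
void *lib_malloc(size_t size, mem_tag tag) {
    assert(tag < TAG_COUNT);
    alloc_calls[tag]++;
    return malloc(size);
}

void lib_free(void *p) {
    free(p);
}

size_t lib_alloc_calls(mem_tag tag) {
    return alloc_calls[tag];
}

/* Wholesale redirection. The macros come after the definitions above so
 * that lib_malloc/lib_free themselves still reach libc. A library could
 * define LIB_DEFAULT_TAG before this point to get its own tag instead of
 * the catch-all TAG_OTHER. */
#ifndef LIB_DEFAULT_TAG
#define LIB_DEFAULT_TAG TAG_OTHER
#endif
#define malloc(size) lib_malloc((size), LIB_DEFAULT_TAG)
#define free(p)      lib_free(p)

/* Stand-in for existing library code that is unaware of the redirection. */
char *make_buffer(size_t n) {
    return malloc(n);   /* expands to lib_malloc(n, LIB_DEFAULT_TAG) */
}
```

With the default tag, everything that goes through the macro is accounted
under TAG_OTHER, which is the "single mtOther" situation. Call sites that
pass an explicit tag, e.g. lib_malloc(n, TAG_NIO), are the ones that give a
useful breakdown.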

I do believe NMT can be very helpful for library development too. It 
helped us to catch several memory leaks during Shenandoah development.

The gradual approach you mentioned makes sense to me. As a matter of fact, 
we already started with Unsafe.allocateMemory() :-)



> (I think even if we were to instrument parts of the JDK - e.g. just
> NIO - this would already be very helpful. In parts we do this already
> for mtOther.).
> On Wed, Nov 21, 2018 at 3:54 PM Zhengyu Gu <zgu at redhat.com> wrote:
>> FYI: There was a phase 2 RFE: Native Memory Tracking (Phase 2)
>> https://bugs.openjdk.java.net/browse/JDK-6995789
>> -Zhengyu
>> On 11/21/18 9:28 AM, Thomas Stüfe wrote:
>>> Hi all,
>>> (yet again not sure if this is serviceability-dev or not - I start at
>>> hs-dev, feel free to move this mail around.)
>>> Do we have any plans to extend NMT to cover native JDK libraries too?
>>> That would be a really cool feature.
>>> --
>>> We at SAP have done a similar thing in the past:
>>> We have a monitoring facility in our port which tracks C-heap
>>> allocations, non-imaginatively called "malloc statistic". This feature
>>> predates NMT somewhat - had we had NMT at that time, we would not have
>>> bothered. Our Malloc statistic is less powerful than NMT and
>>> implementation-wise completely at odds with it, so I never felt the
>>> urge to bring it upstream. However, one thing we did do is we extended
>>> its coverage to the JDK native code.
>>> This has been quite helpful in the past to find leaks in JDK, see
>>> e.g.: https://bugs.openjdk.java.net/browse/JDK-8155211
>>> We did this by exposing os::malloc, os::free etc from libjvm.so
>>> ("JVM_malloc", "JVM_realloc", "JVM_free"). In the JDK native code, we
>>> then either manually replaced calls to raw ::malloc(), ::free() etc
>>> with JVM_malloc(), JVM_free(). Or, in places where this was possible,
>>> we did this replacement stuff wholesale by employing a header which
>>> re-defined malloc(), free() etc to JVM_malloc, JVM_free etc. Of course,
>>> we also had to add a number of linkage dependencies to the libjvm.so.
>>> All this is pretty standard stuff.
>>> One detail stood out: malloc headers are evil. In our experience, JDK
>>> native code was more difficult to control and "unbalanced
>>> malloc/frees" kept creeping in - especially with the
>>> wholesale-redefinition technique. Unbalanced mallocs/frees means cases
>>> where malloc() is instrumented but ::free() stays raw, or the other
>>> way around. Both combinations are catastrophic since os::malloc uses
>>> malloc headers. We typically corrupted the C-Heap and crashed, often
>>> much later in completely unrelated places.
>>> These types of bugs were very hard to spot and hence very expensive.
>>> And they can creep in in many ways. One example, there exist a
>>> surprising number of system APIs which return results in C-heap and
>>> require the user to free that, which of course must happen with raw
>>> ::free(), not os::free().
>>> We fixed this by not using malloc headers. That means a pointer
>>> returned by os::malloc() is compatible with raw ::free() and vice
>>> versa. The only bad thing happening would be our statistic numbers
>>> being slightly off.
>>> Instead of malloc headers we use a hand-groomed hash table to track
>>> the malloced memory. It is actually quite fast, fast enough that this
>>> malloc statistic feature is on-by-default in our port.
>>> --
>>> Of course, if we extend NMT to JDK native code we also would want to
>>> extend it to mmap() etc - we never did this with our statistic, since
>>> it only tracked malloc.
>>> What do you think? Did anyone else play with similar ideas? Would it
>>> be worth the effort?
>>> Cheers, Thomas
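
The header-free tracking described in the quoted mail can be sketched
roughly as follows. This is not the SAP implementation, which is not shown in
this thread. It is a minimal, single-threaded illustration that assumes a
fixed-size open-addressing table keyed by pointer; tracked_malloc(),
tracked_free() and tracked_live_bytes() are hypothetical names. Because no
header is placed in front of the user block, every pointer is an ordinary
malloc() pointer and can be mixed with raw free().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define TABLE_SLOTS 4096                      /* power of two, fixed for brevity */
#define TOMBSTONE   ((void *)(uintptr_t)1)    /* marks a deleted slot */

typedef struct {
    void  *ptr;    /* NULL = never used, TOMBSTONE = deleted */
    size_t size;
} slot_t;

static slot_t table[TABLE_SLOTS];
static size_t tracked_bytes;

static size_t hash_ptr(const void *p) {
    uintptr_t v = (uintptr_t)p;
    v ^= v >> 16;
    return (size_t)((v >> 4) & (TABLE_SLOTS - 1));
}

/* Record p -> size, using linear probing. */
static void table_put(void *p, size_t size) {
    assert(p != NULL && p != TOMBSTONE);
    size_t i = hash_ptr(p);
    slot_t *target = NULL;
    for (size_t n = 0; n < TABLE_SLOTS; n++, i = (i + 1) & (TABLE_SLOTS - 1)) {
        slot_t *s = &table[i];
        if (s->ptr == p) {
            /* Stale entry: the block was released with raw free() earlier
             * and the allocator handed the address out again. */
            tracked_bytes -= s->size;
            target = s;
            break;
        }
        if (s->ptr == TOMBSTONE) {
            if (target == NULL) target = s;   /* reusable, but keep probing */
            continue;
        }
        if (s->ptr == NULL) {
            if (target == NULL) target = s;
            break;
        }
    }
    if (target == NULL) return;               /* table full: block stays untracked */
    target->ptr = p;
    target->size = size;
    tracked_bytes += size;
}

/* Forget p if it is known; unknown pointers are silently ignored. */
static void table_remove(void *p) {
    size_t i = hash_ptr(p);
    for (size_t n = 0; n < TABLE_SLOTS; n++, i = (i + 1) & (TABLE_SLOTS - 1)) {
        slot_t *s = &table[i];
        if (s->ptr == NULL) return;           /* end of probe chain */
        if (s->ptr == p) {
            tracked_bytes -= s->size;
            s->ptr = TOMBSTONE;
            s->size = 0;
            return;
        }
    }
}

void *tracked_malloc(size_t size) {
    void *p = malloc(size);
    if (p != NULL) table_put(p, size);
    return p;                                 /* plain malloc pointer, no header */
}

void tracked_free(void *p) {
    if (p == NULL) return;
    table_remove(p);                          /* no-op for raw-malloc'ed blocks */
    free(p);
}

size_t tracked_live_bytes(void) {
    return tracked_bytes;
}
```

In this sketch, passing a raw malloc() pointer to tracked_free() is harmless,
and passing a tracked pointer to raw free() only leaves a stale table entry,
so the statistic drifts but the C-heap is not corrupted. A production version
would additionally need locking (or a lock-free table) and a table that can
grow.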
