Extend NMT to JDK native libraries?

Zhengyu Gu zgu at redhat.com
Wed Nov 21 16:26:11 UTC 2018

> But note the unbalanced-malloc issue. The more we expose NMT to the
> JDK wild, the more probable are such bugs. Within the hotspot all is
> nice and tidy.

Hotspot also has a few unbalanced-malloc instances. And yes, we will
find more in the libraries. I would consider fixing them an initial
investment, especially if we can do it module by module, no?

Mismatched statistics are quite annoying ... I did see people actually
counting bytes and expecting them to match :-) JDK-8191369 was actually
driven by customers who tried to match smaps.



> If we wanted to avoid these bugs, we would have to remove malloc
> headers from both os::malloc() and NMT MallocTracker and add a malloc
> pointer hashtable of some sorts to the latter. This is not very
> difficult, but would still need an initial investment.

> Thanks, Thomas
>> Thanks,
>> -Zhengyu
>>> (I think even if we were to instrument parts of the JDK - e.g. just
>>> NIO - this would already be very helpful. In parts we do this already
>>> for mtOther.).
>>> On Wed, Nov 21, 2018 at 3:54 PM Zhengyu Gu <zgu at redhat.com> wrote:
>>>> FYI: There was a phase 2 RFE: Native Memory Tracking (Phase 2)
>>>> https://bugs.openjdk.java.net/browse/JDK-6995789
>>>> -Zhengyu
>>>> On 11/21/18 9:28 AM, Thomas Stüfe wrote:
>>>>> Hi all,
>>>>> (yet again not sure if this is serviceability-dev or not - I start at
>>>>> hs-dev, feel free to move this mail around.)
>>>>> Do we have any plans to extend NMT to cover native JDK libraries too?
>>>>> That would be a really cool feature.
>>>>> --
>>>>> We at SAP have done a similar thing in the past:
>>>>> We have a monitoring facility in our port which tracks C-heap
>>>>> allocations, non-imaginatively called "malloc statistic". This feature
>>>>> predates NMT somewhat - had we had NMT at that time, we would not have
>>>>> bothered. Our Malloc statistic is less powerful than NMT and
>>>>> implementation-wise completely at odds with it, so I never felt the
>>>>> urge to bring it upstream. However, one thing we did do is we extended
>>>>> its coverage to the JDK native code.
>>>>> This has been quite helpful in the past to find leaks in JDK, see
>>>>> e.g.: https://bugs.openjdk.java.net/browse/JDK-8155211
>>>>> We did this by exposing os::malloc, os::free etc. from libjvm.so
>>>>> ("JVM_malloc", "JVM_realloc", "JVM_free"). In the JDK native code, we
>>>>> then either manually replaced calls to raw ::malloc(), ::free() etc.
>>>>> with JVM_malloc(), JVM_free(), or, in places where this was possible,
>>>>> we did the replacement wholesale by employing a header which
>>>>> redefined malloc(), free() etc. as JVM_malloc(), JVM_free() etc. Of course,
>>>>> we also had to add a number of linkage dependencies to the libjvm.so.
>>>>> All this is pretty standard stuff.
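
The wholesale-redefinition technique described above can be sketched roughly as follows. This is only an illustration: the JVM_malloc()/JVM_free() bodies here are hypothetical stand-ins with simple counters (in the setup described they would be exported from libjvm.so and forward to os::malloc/os::free), and make_greeting(), drop_greeting() and outstanding_allocs() are made-up names.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for the entry points exported from libjvm.so.
 * Here they only count calls and forward to the raw libc functions. */
static size_t tracked_allocs = 0;
static size_t tracked_frees = 0;

void *JVM_malloc(size_t size) {
    tracked_allocs++;
    return malloc(size);          /* macro not defined yet: raw malloc */
}

void JVM_free(void *ptr) {
    if (ptr != NULL) tracked_frees++;
    free(ptr);                    /* macro not defined yet: raw free */
}

/* What the redirecting header would contain. It has to be included
 * after the system headers so their declarations are not rewritten. */
#define malloc(size) JVM_malloc(size)
#define free(ptr)    JVM_free(ptr)

/* Unmodified "library" code: every call below is now instrumented. */
char *make_greeting(void) {
    char *buf = malloc(16);
    if (buf != NULL) strcpy(buf, "hello");
    return buf;
}

void drop_greeting(char *buf) {
    free(buf);
}

/* Number of instrumented allocations that were not yet freed. */
size_t outstanding_allocs(void) {
    assert(tracked_allocs >= tracked_frees);
    return tracked_allocs - tracked_frees;
}
```

Calling make_greeting() and then drop_greeting() brings outstanding_allocs() back to zero. Note that a function-like macro only rewrites call expressions: code that passes plain `free` as a function pointer would still get the raw function, which is one way the two sides can end up mismatched.
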
>>>>> One detail stood out: malloc headers are evil. In our experience, JDK
>>>>> native code was more difficult to control and "unbalanced
>>>>> malloc/frees" kept creeping in - especially with the
>>>>> wholesale-redefinition technique. Unbalanced mallocs/frees means cases
>>>>> where malloc() is instrumented but ::free() stays raw, or the other
>>>>> way around. Both combinations are catastrophic since os::malloc uses
>>>>> malloc headers. We typically corrupted the C-Heap and crashed, often
>>>>> much later in completely unrelated places.
>>>>> These types of bugs were very hard to spot and hence very expensive.
>>>>> And they can creep in in many ways. One example: there exists a
>>>>> surprising number of system APIs which return results in C-heap and
>>>>> require the user to free them, which of course must happen with raw
>>>>> ::free(), not os::free().
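
To make the hazard concrete, here is a minimal sketch of a header-prefixed allocator. It is a simplified assumption, not HotSpot's actual header layout; hdr_t, hdr_malloc(), hdr_free() and hdr_size() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Bookkeeping header placed in front of each user block. The union
 * with max_align_t keeps the user pointer suitably aligned. */
typedef union {
    size_t size;
    max_align_t align;
} hdr_t;

void *hdr_malloc(size_t size) {
    hdr_t *h = malloc(sizeof(hdr_t) + size);
    if (h == NULL) return NULL;
    h->size = size;
    return h + 1;                 /* user pointer = block start + header */
}

void hdr_free(void *p) {
    if (p == NULL) return;
    free((hdr_t *)p - 1);         /* step back to the real block start */
}

/* Only valid for pointers that came from hdr_malloc(). */
size_t hdr_size(const void *p) {
    assert(p != NULL);
    return ((const hdr_t *)p - 1)->size;
}
```

The pointer returned by hdr_malloc() is not the address the underlying allocator handed out, so passing it to raw free() is undefined behavior. In the other direction, hdr_free() on a raw malloc() pointer steps back into memory that is not a header and frees the wrong address. Neither case is exercised here, since both corrupt the heap.
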
>>>>> We fixed this by not using malloc headers. That means a pointer
>>>>> returned by os::malloc() is compatible with raw ::free() and vice
>>>>> versa. The only bad thing happening would be our statistic numbers
>>>>> being slightly off.
>>>>> Instead of malloc headers we use a hand-groomed hash table to track
>>>>> the malloced memory. It is actually quite fast, fast enough that this
>>>>> malloc statistic feature is on-by-default in our port.
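
A header-less tracker along these lines might look like the sketch below. It is an assumption about the general shape only, not the SAP implementation: the chained table, NBUCKETS, and the names tracked_malloc(), tracked_free() and tracked_live_bytes() are hypothetical, and there is no locking, so it is not thread-safe as written.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define NBUCKETS 1024

/* One side-table entry per tracked block, chained per bucket. */
typedef struct entry {
    void *ptr;
    size_t size;
    struct entry *next;
} entry_t;

static entry_t *buckets[NBUCKETS];
static size_t live_bytes = 0;

static size_t bucket_of(const void *p) {
    /* Drop the low bits, which are mostly zero due to alignment. */
    return (size_t)(((uintptr_t)p >> 4) % NBUCKETS);
}

void *tracked_malloc(size_t size) {
    void *p = malloc(size);       /* returned unchanged, no header */
    if (p == NULL) return NULL;
    entry_t *e = malloc(sizeof *e);
    if (e != NULL) {              /* if bookkeeping fails, skip tracking */
        size_t b = bucket_of(p);
        e->ptr = p;
        e->size = size;
        e->next = buckets[b];
        buckets[b] = e;
        live_bytes += size;
    }
    return p;
}

void tracked_free(void *p) {
    if (p == NULL) return;
    entry_t **link = &buckets[bucket_of(p)];
    while (*link != NULL && (*link)->ptr != p) link = &(*link)->next;
    if (*link != NULL) {          /* unknown pointers are simply not counted */
        entry_t *e = *link;
        *link = e->next;
        assert(live_bytes >= e->size);
        live_bytes -= e->size;
        free(e);
    }
    free(p);
}

size_t tracked_live_bytes(void) {
    return live_bytes;
}
```

Because the user pointer is exactly what malloc() returned, a raw free() of a tracked block only leaves a stale table entry, and tracked_free() of a raw malloc() block just finds nothing to remove. In both cases the numbers drift but the heap stays intact.
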
>>>>> --
>>>>> Of course, if we extend NMT to JDK native code we also would want to
>>>>> extend it to mmap() etc - we never did this with our statistic, since
>>>>> it only tracked malloc.
>>>>> What do you think? Did anyone else play with similar ideas? Would it
>>>>> be worth the effort?
>>>>> Cheers, Thomas
