Stack walking performance issue
amirhadadi at hotmail.com
Sun Mar 17 14:43:15 UTC 2019
We've encountered a performance issue on an instance deployed in Docker with Ubuntu 16.04, Linux kernel 4.4.0-92-generic, and OpenJDK 11.0.2.
The issue shows up with ZGC but not with G1.
During a period of 10 days after deployment, CPU usage goes up steadily at a rate of ~10% per day. Eventually we have to restart all instances.
I profiled our app with async-profiler after 4 days of uptime and found that the following stack is the most frequent: https://gist.github.com/amirhadadi/48b6f84e3b2412124e817a50608e6ddd
I then restarted the instance and waited 10 minutes before profiling again; stack walking shows up much less in the samples: https://gist.github.com/amirhadadi/0c43b087b9bfd995119a97cbf3557d21
This is how the stack walk looks when profiling an instance deployed with G1 after 3 days of uptime: https://gist.github.com/amirhadadi/224c33a19bfd9ea8dcc264cefc641496
Please help me figure this one out.