Re: how to tune gc for tomcat server on large machine that uses almost all old generation smallish objects

Michal Frajt michal at frajt.eu
Thu Dec 14 16:34:00 UTC 2017


Hi Andy,

How many ConcurrentHashMap instances do you actually have in your 16 gig heap? Not sure if I understand your map structure correctly - "But the first char of the key takes you to the second tier of ConcurrentHashMaps and so". Could you provide a histogram of your application when running full (before you start LRU sweeping)? Do you need the ConcurrentHashMaps if you have several tiers which already act as concurrent segments? Did you consider open addressing maps (Trove, Koloboke), which eliminate the need for map nodes (there would be some trade-off when removing)? Did you consider storing a char or even byte array instead of the String instance? Do you remove a ConcurrentHashMap tier when it gets completely empty after the LRU sweep? All this might significantly reduce the heap requirement, shortening the GC time.
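
A minimal sketch of the byte array idea, assuming the cached values are plain text that round-trips through UTF-8. The class name ByteValueCache is hypothetical and not taken from Andy's code. On JDK 8 a String wraps a separate char[] holding two bytes per char, so keeping only a byte[] in the map means one object per value instead of two and roughly half the payload for ASCII data; from JDK 9 on, compact strings already narrow that gap.

```java
import java.nio.charset.StandardCharsets;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical cache that stores values as UTF-8 bytes rather than String instances. */
final class ByteValueCache {
    private final ConcurrentHashMap<String, byte[]> map = new ConcurrentHashMap<>();

    /** Encodes the value once on insertion; only the byte[] stays referenced by the map. */
    void put(String key, String value) {
        map.put(key, value.getBytes(StandardCharsets.UTF_8));
    }

    /** Decodes on every read, which allocates a short-lived String per call. */
    String get(String key) {
        byte[] raw = map.get(key);
        return raw == null ? null : new String(raw, StandardCharsets.UTF_8);
    }

    int size() {
        return map.size();
    }
}
```

The trade-off is that each get allocates a temporary String, so the long-lived footprint shrinks at the price of more young-generation garbage.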

Regards, 
Michal 
 


From: "hotspot-gc-dev" hotspot-gc-dev-bounces at openjdk.java.net
To: "Andy Nuss" andrew_nuss at yahoo.com
Cc: "hotspot-gc-dev at openjdk.java.net openjdk.java.net" hotspot-gc-dev at openjdk.java.net
Date: Thu, 14 Dec 2017 08:19:21 +0100
Subject: Re: how to tune gc for tomcat server on large machine that uses almost all old generation smallish objects


Hi Andy,
What you are describing is fairly routine caching behavior with a small twist in that the objects being held in this case are quite regular in size. Again, I wouldn’t design with the collector in mind, whereas I certainly design with memory efficiency as a reasonable goal.
As for GC, in the JVM there are two basic strategies which I tend to label evacuating and in-place. G1 is completely evacuating and consequently the cost (aka pause duration) is (in most cases) a function of the number of live objects. The trigger for a young generational collection is when you have consumed all of the Eden regions. Thus the interval between young collections is roughly the size of Eden divided by your allocation rate. The trigger for a Concurrent Mark of tenured is when it consumes 45% of available heap (the default). Thus the interval between Concurrent Mark cycles is roughly 45% of the heap size divided by your promotion rate. Additionally G1 keeps some memory on reserve to avoid painting the collector into a Full GC corner.
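
The two trigger relationships above can be turned into a back-of-the-envelope calculation. This is only a sketch: the class and method names are hypothetical, and the Eden size, allocation rate and promotion rate in main are made-up figures, not measurements from Andy's server. Real values would have to come from a GC log.

```java
/** Rough estimate of G1 trigger intervals; every input figure is hypothetical. */
final class G1IntervalEstimate {

    /** Seconds between young collections: Eden size divided by allocation rate. */
    static double youngGcIntervalSeconds(double edenMb, double allocationMbPerSec) {
        return edenMb / allocationMbPerSec;
    }

    /** Seconds for tenured to grow from empty to the IHOP threshold at a steady promotion rate. */
    static double concurrentMarkIntervalSeconds(double heapMb, double ihopPercent,
                                                double promotionMbPerSec) {
        return heapMb * (ihopPercent / 100.0) / promotionMbPerSec;
    }

    public static void main(String[] args) {
        double heapMb = 16 * 1024;      // the 16 gig heap from the thread
        double edenMb = 1024;           // assumed Eden size
        double allocationMbPerSec = 64; // assumed allocation rate
        double promotionMbPerSec = 2;   // assumed promotion rate

        System.out.println("young GC roughly every "
                + youngGcIntervalSeconds(edenMb, allocationMbPerSec) + " s");
        System.out.println("concurrent mark roughly every "
                + concurrentMarkIntervalSeconds(heapMb, 45, promotionMbPerSec) + " s");
    }
}
```

With these assumed inputs Eden fills in about 16 seconds, while tenured needs on the order of an hour to reach 45% of the heap. A cache that is already near full sits above the threshold permanently, which is why the IHOP setting matters for this workload.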
Issues specific to caching are very large live sets that result in inflated copy costs as data flows from Eden through survivor and finally into tenured space. In these cases I’ve found that it’s better to slow down the frequency of collections, as this will result in you experiencing the same pause time but less frequently. Another tactic that I’ve found helpful on occasion is to lower the Initiating Heap Occupancy Percent (aka IHOP) from its default value of 45% to a value that is consistently within the live set, meaning you’ll run back-to-back concurrent cycles. And I’ve got a bag of other tactics that I’ve used with varying degrees of success. Which one would be for you? I’ve no idea. Tuning a collector isn’t something you can do after reading a few tips from StackOverflow. GC behavior is an emergent reaction to the workload that you place on it, meaning the only way to really understand how it’s all going to work is to run production-like experiments (or better yet, run in production) and look at a GC log. (Shameless plug: Censum, my GC log visualization tooling, helps.)
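
As a complement to a GC log (not a replacement for one), the running JVM exposes cumulative collection counts and times through the standard java.lang.management API. A minimal sketch follows; the class name GcSnapshot is hypothetical, and these figures are totals since JVM start, so they say nothing about the distribution of individual pauses.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

/** Hypothetical helper that sums the cumulative GC counters of the running JVM. */
final class GcSnapshot {

    /** Total number of collections across all collectors; undefined (-1) counts are skipped. */
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    /** Approximate accumulated collection time in milliseconds; undefined (-1) values are skipped. */
    static long totalCollectionMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime();
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Print one line per collector the JVM reports.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName() + ": count=" + gc.getCollectionCount()
                    + ", timeMs=" + gc.getCollectionTime());
        }
    }
}
```

Sampling these totals before and after an experiment gives a coarse view of how much collection work a workload caused; per-pause detail still requires the GC log.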
I understand your concerns in wanting to avoid the dreaded GC pause but I’d also look at your efforts in two ways. First, it’s an opportunity to get a better understanding of GC, and secondly, recognize that this feels like a premature optimization, as you’re trying to solve a problem that you (well, none of us, to be fair and honest) don’t fully understand and may not actually have. Let me recommend some names that have written about how G1 works: Charlie Hunt in his performance tuning book, Poonam Parhar in her blog entries, Monica Beckwith in a number of different places, Simone Bordet in a number of places. I should add that hotspot-gc-use at openjdk.java.net is a more appropriate list for these types of questions. We also have a number of GC related discussions on our mailing list, friends at jclarity.com. I’ve also recorded a session with Dr. Heinz Kabutz on his https://javaspecialists.teachable.com/ site. I’ll get an exact link if you email me offline.
Kind regards,
Kirk Pepperdine

On Dec 13, 2017, at 9:55 PM, Andy Nuss <andrew_nuss at yahoo.com> wrote:
Let me try to explain.  On a 16 gig heap, I anticipate that almost 97% of the heap in use at any given moment is ~30 and ~100 char strings.  The rest is small pointer objects in the ConcurrentHashMap, also long-held, and Tomcat's NIO stuff.  So at any moment in time, most of the in-use heap (and I will keep about 20% unused to aid GC) is a huge number of long-held strings.  Over time, as the single servlet receives requests to cache newly accessed key/val pairs, the number of strings grows to the maximum I allow.  At that point, a background thread sweeps away half of the LRU key/value pairs (30 and 100 char strings).  Now they are unreferenced and sweepable.  That's all I do.  Then the servlet keeps receiving requests to put more key/val pairs, as well as handling get requests.  At the point in time where I clear all the LRU pairs, which might take minutes to iterate, G1 can start doing its thing, not that it will know to do so immediately.  I'm worried that whenever G1 does its thing, because the sweepable stuff is 100% small oldgen objects, servlet threads will time out on the client side.  Not that this happens several times a day, but if G1 does take a long time to sweep a massive heap with all oldgen objects that are small, the *only* concern is that servlet requests will time out during this period.
Realize I know nothing about GC, except that periodically Eclipse hangs due to GC and then crashes on me, i.e. after 4 hours of editing.  And all the blogs I found talked about newgen and TLAB and other things, assuming typical ephemeral usage, which is not at all the case on this particular machine instance.  Again, all long-held small strings, growing and growing steadily over time, and then suddenly half are freed reference-wise by me.
If there are no GC settings that make that sweepable stuff happen in a non-blocking thread, and tomcat's servlets could all hang once every other day for many many seconds on this 16 gig machine (the so-called long gc-pause that people blog about), that might motivate me to abandon this and use the memcached product.
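
For readers following the thread, the cache behavior described above can be sketched as follows. This is a simplified single-tier illustration under stated assumptions, not Andy's implementation: the class name HalfSweepCache, the logical clock and the snapshot list are all hypothetical, and the real design uses several tiers of ConcurrentHashMaps and a reusable temporary array.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical cache that drops the least recently used half of its entries on demand. */
final class HalfSweepCache {

    private static final class Entry {
        final String value;
        volatile long lastUsed; // logical timestamp of the most recent put or get
        Entry(String value, long lastUsed) {
            this.value = value;
            this.lastUsed = lastUsed;
        }
    }

    private final ConcurrentHashMap<String, Entry> map = new ConcurrentHashMap<>();
    private final AtomicLong clock = new AtomicLong(); // logical clock, not wall time
    private final int maxEntries;

    HalfSweepCache(int maxEntries) {
        this.maxEntries = maxEntries;
    }

    void put(String key, String value) {
        map.put(key, new Entry(value, clock.incrementAndGet()));
    }

    String get(String key) {
        Entry e = map.get(key);
        if (e == null) {
            return null;
        }
        e.lastUsed = clock.incrementAndGet(); // mark as recently used
        return e.value;
    }

    int size() {
        return map.size();
    }

    boolean needsSweep() {
        return map.size() >= maxEntries;
    }

    /** Removes the least recently used half of the entries and returns how many were removed. */
    int sweepOldestHalf() {
        // Copy (key, timestamp) pairs first so the sort sees stable values.
        List<Map.Entry<String, Long>> snapshot = new ArrayList<>();
        for (Map.Entry<String, Entry> e : map.entrySet()) {
            snapshot.add(new AbstractMap.SimpleImmutableEntry<>(e.getKey(), e.getValue().lastUsed));
        }
        snapshot.sort(Map.Entry.comparingByValue());

        int removed = 0;
        for (int i = 0; i < snapshot.size() / 2; i++) {
            String key = snapshot.get(i).getKey();
            long seen = snapshot.get(i).getValue();
            Entry current = map.get(key);
            // Skip entries that were touched or replaced after the snapshot was taken.
            if (current != null && current.lastUsed == seen && map.remove(key, current)) {
                removed++;
            }
        }
        return removed;
    }
}
```

A background thread would call sweepOldestHalf() whenever needsSweep() returns true. After the sweep the removed strings are unreachable, and G1 reclaims their regions in later concurrent and mixed cycles rather than at the moment of removal.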

On Wednesday, December 13, 2017, 12:15:38 PM PST, Kirk Pepperdine <kirk at kodewerk.com> wrote:

Hi Andy,
On Dec 13, 2017, at 8:34 PM, Andy Nuss <andrew_nuss at yahoo.com> wrote:
Thanks Kirk,
The array is just a temporary buffer that I hold onto, with its entries cleared to null after my LRU sweep.  The references that are freed to GC are in the ConcurrentHashMaps, and are all roughly 30 char and 100 char strings (key/vals), so I assume that when I do my LRU sweep when needed, it's freeing a ton of small strings,

which G1 has to reallocate into bigger chunks, and mark freed, and so on,
Not sure I understand this bit. Can you explain what you mean by this?
so that I can in the future add new such strings to the LRU cache.  The concern was whether this sweep of old gen strings scattered all over the huge heap would cause Tomcat NIO-based threads to "hang", i.e. not respond quickly, or whether G1 would do things less pre-emptively.  Are you basically saying that "no, Tomcat servlet response time won't be significantly affected by a G1 sweep"?

I’m not sure what your goal is here. I would say, design as needed and let the collector do its thing. That said, temporary humongous allocations are not well managed by G1. Better to create the buffer up front and cache it for future downstream use.
As for a sweep… what I think you’re asking about is object copy costs. These costs should and typically do dominate pause time. Object copy cost is proportional to the number of live objects in the collection set (CSet). If G1 string deduplication is enabled (-XX:+UseStringDeduplication), Strings become candidates for dedup once they reach the age threshold (-XX:StringDeduplicationAgeThreshold, default 3), so with most heap configurations, duplicate Strings will be dedup’ed before they hit tenured.

Also, I was wondering does anyone know how memcached works, and why it is used in preference to a custom design such as mine which seems a lot simpler.  I.e. it seems that with "memcached", you have to worry about "slabs" and memcached's own heap management, and waste a lot of memory.

I’m the wrong person to defend the use of memcached. It certainly does serve a purpose. That said, using it to offload temp objects means you end up creating your own garbage collector, and as you can see by the efforts GC engineers put into each implementation, it’s a non-trivial undertaking.
Kind regards,
Kirk

                
            