RFR: 8232588: G1 concurrent System.gc can return early or late

Thomas Schatzl thomas.schatzl at oracle.com
Thu Nov 7 09:58:28 UTC 2019


On 07.11.19 01:14, Kim Barrett wrote:
>> On Nov 6, 2019, at 10:24 AM, Thomas Schatzl <thomas.schatzl at oracle.com> wrote:
>> Hi
>> On 31.10.19 21:53, Kim Barrett wrote:
>>> A change is that waiting by a user-requested GC for a concurrent marking
>>> cycle to complete used to be performed with the thread transitioned to
>>> native and without safepoint checks on the associated monitor lock and wait.
>>> This was noted as having been cribbed from CMS.  Coleen and I looked at this
>>> and could not come up with a reason for doing that for G1 (anymore, after
>>> the recent spate of locking improvements), so there's a new G1-specific
>>> monitor being used and the locking and waiting is now "normal".  (This makes
>>> the FullGCCount_lock monitor largely CMS-specific.)
>> I do not see a reason for the extra monitor (why it is better to have aseparate monitor instead of reusing the existing one with almost the same name?), it does not seem to add information. It is still used in GenCollectedHeap/Serial, so can't be removed with CMS.
> The old monitor is _safepoint_check_never and uses
> no_safepoint_checking locking and waiting, while the new one is
> _safepoint_check_always and uses corresponding locking and waiting
> (which only checks for safepoints from Java threads). The latter is
> the proper configuration for G1's usage. I didn't want to change the
> other one and test and such, with CMS about to go away. And I think
> with CMS gone there are no longer any waiters for it and it becomes
> rather vestigal and can probably be eliminated. I was planning to file
> an RFE to explore that after this change and CMS removal have
> happened.


>> - Not sure, but maybe this logging should be moved to trace level? In either case I would suggest to improve it to cover the try_collect call too, i.e. with appropriate "attempt"/"complete"/"discard" messages.
> Agreed on trace vs debug level.  There are already “completion" messages.  Maybe
> what you are asking for is more detail on those?

One case I am missing is a "discard" message in the case described in 

>> Otherwise the logging looks a bit incomplete to me, i.e. only attempts to initiate a concurrent collection cauuse messages. I understand that this might generate too many messages for gc+debug, that is the reason to potentially move to gc+trace.
> I'm intentionally only minimally touching the other cases in
> try_collect(). Or am I misunderstanding your comment? (There are
> logging messages for completion, waiting, and retries, in addition to
> the attempts to initiate messages.)  Or maybe this is more about
> wanting additional detail in “completion” messages?

In this case I was looking at the log messages only (on gc+debug/trace 
level), sorry. I found the other messages at gc+alloc=trace in the code.

It would be nice to think about consolidating these messages a bit (in a 
separate CR) as they are somewhat different in style now (and use 
different log tags).

While looking through the code, I found another minor issue I think: to 
schedule a standard evacuation pause (in line 2270+, wouldn't it be 
better to call do_collection_pause() instead of executing the VMOps 

The existing code there seems to miss the check whether the VMOps 
gc_prologue succeeded.

>> - feel free to ignore: in such log messages, if I add thread id for debugging purposes I tend to put the thread id value first so that if you have a log in front of you, and ask your viewer to highlight a particular id, all highlights are in the same column.
>> I.e.
>> .... stuff from unified logging] <thread-id> Message
>> instead of having the thread-id somewhere located in the message like in
>> this case where it is somewhere in the middle.
>> (I kind of also prefer the raw Thread::current() value instead of the pretty name, fwiw)
> I don't care that much how the thread is identified; I used the name
> as something human readable in a log file. The address of the Thread
> doesn't seem that useful though. (Or are you saying the logging system
> provides something for threads that I'm not aware of? Wouldn't be the
> first time I've overlooked some feature of logging.) I'll move it to
> the front of the log message though.

Nah, it's fine to use the name, just some (obvious) rambling thought.

And you're right with your suspicion, there is a decorattor for the 
thread id...

>> - a comment why we need a separate gc_counter_less_than method instead of a < comparison would be nice (I know, because of roll-over of these counts).
> Done.
>> - pre-existing:
>> g1VMOperations.cpp:124-127: maybe this could be changed to
>> } else if (g1h->should_upgrade....) {
>> }
>> Also please fix the indentation of the parameter list in the call to do_full_collection() in line 133.
> Done.  Also fixed one or two other formatting nits.  And added some
> more comments to the new VMOp's doit.
> But I wonder if there might be a pre-existing bug here. The call to
> should_upgrade was introduced as part of JDK-8211425. This changed the
> previous check to add policy()->force_upgrade_to_full(), which has a
> default false that is overridden by G1HeterogeneousHeapPolicy to
> return true for _manager->has_borrowed_regions(). From reading the RFR
> thread discussion, the point is that if we were to proceed without the
> full GC here we could apparently have a heap size that is greater than
> MaxHeapSize.
> But I don't see why we couldn't be in that situation and still be able
> to successfully perform the requested allocation and return.
> So I wonder if the should_upgrade should be called unconditionally,
> and then attempt the requested allocation (if any).

I agree here. The should_upgrade_to_full() check should be first.

> New webrevs:
> full: https://cr.openjdk.java.net/~kbarrett/8232588/open.01/
> incr: https://cr.openjdk.java.net/~kbarrett/8232588/open.01.inc/

Looks good (obviously missing the items discussed here).


More information about the hotspot-gc-dev mailing list