Strange Long ParNew GC Pauses (Sample Code Included)

Srinivas Ramakrishna ysr1729 at gmail.com
Tue Jun 2 00:45:11 UTC 2015


Hi Jon --

I missed this earlier, but just saw the bug you opened for this issue.
Out of curiosity, is the issue that the fix-up of the "suffix block"
doesn't go sufficiently far back in that block, or is it something else?

thanks!
-- ramki


On Thu, May 14, 2015 at 6:35 PM, Jon Masamitsu <jon.masamitsu at oracle.com>
wrote:

>  Jungwoo,
>
> Thanks for the information.
>
> There's a bug in the current code which tries to patch up the BOT after
> a block is split.  I don't understand yet what is wrong but a simple
> workaround (which is known to be slower than necessary) avoids the
> much of the long pauses.  After this has been fixed, we should compare
> the performances of our two implementations.
>
> Jon
>
>
>
> On 5/14/2015 2:52 PM, Jungwoo Ha wrote:
>
> We have a faster implementation for card table scan, and on our
> implementation ~20s pause goes down to 1~2s with above-mentioned
> experiments. It doesn't solve the problem but it alleviates it (and
> actually shows that card table scan is faster).
> If someone can sponsor this, I will create a patch for the review.
>
>
> On Wed, May 13, 2015 at 6:36 PM, Srinivas Ramakrishna <ysr1729 at gmail.com>
> wrote:
>
>> Hi Jon --
>>
>>  thanks, just a comment inline below....
>>
>> On Wed, May 13, 2015 at 4:06 PM, Jon Masamitsu <jon.masamitsu at oracle.com>
>> wrote:
>>
>>>
>>>
>>> On 5/13/2015 11:40 AM, Srinivas Ramakrishna wrote:
>>>
>>> Hi Jon --
>>>
>>>  My expectation is that for any large block spanning multiple cards, if
>>> you walk the BOT entries for that block from lower addresses to higher
>>> addresses, the BOT entries should be monotone increasing (albeit not
>>> strictly monotone) until the last card for that block (but one). In the
>>> example you gave below, is D > C > N_words, with the BOT entries in some
>>> interior window of the same block/object going (low to high addresses in
>>> the window) like " .... DDDDCCCCDDDD...." ? In other words, there are
>>> lower-valued entries C within a range of higher-valued entries D, with both
>>> C and D denoting logarithmic entries for the block in question?
>>>
>>>
>>>  Yes, that's what I'm seeing.
>>>
>>>
>>>  If you look in the BOT methods, there was a parameter "reducing"
>>> passed to some of the mark_block() methods which was supposed to do some
>>> kind of assertion checking towards maintenance of this monotonicity, but
>>> looking at it now, that may not be sufficient and the assert could be
>>> strengthened at the "end" of the range being marked. More worryingly, I
>>> believe I had turned off even that assert before I left (ca 6uXX?) because
>>> it was failing at that time, but the corresponding bug hadn't been fixed,
>>> and seems as though it still isn't. This might possibly  have some bearing
>>> on the non-monotonicity in the entries you see.
>>>
>>>
>>>  Thanks for that bit of information.   So what I'm seeing is not
>>> necessarily the cause for the long pauses.  But see my next comment.
>>>
>>>
>>>  I'll hopefully get some time to go over this code again. I was looking
>>> at the code on the train ride into work this morning, and trying to
>>> remember the details, but have not quite delved again into all of its
>>> details yet or why the assert I mention above should be failing.
>>>
>>>
>>>  After the logarithmic stride back to find the block containing the
>>> start of the object, there is a object by object walk forward to find the
>>> to find the start of the object
>>>
>>>    558    while (n <= addr) {
>>>    559      debug_only(HeapWord* last = q);   // for debugging
>>>    560      q = n;
>>>    561      n += _sp->block_size(n);
>>>
>>> [...]
>>>
>>> From instrumentation I think the long pauses originate in this forward
>>> walk
>>> (mostly in the block_size() method).   So maybe the BOT
>>> "DDDDDCCCCC...CDDDD"
>>> indicates that BOT at the start of an objects is being trashed and we're
>>> going way too far back.  I'm trying to use VerifyBlockOffsetArray but it
>>> seems to
>>> have bit rotted.    I'll keep trying.
>>>
>>
>>  Ah, that's what I was afraid of. This must likely mean that when a
>> large block is split, the suffix from the split
>> (which is typically returned to the free list) isn't getting fixed  up as
>> was the intention, or when the sweeper
>> coalesces smaller blocks into a larger one, the new larger block isn't
>> getting fixed correctly (although that seems
>> unlikely). Using VerifyBlockOffsetArray and asserting if the forward walk
>> is more than a card's worth in length
>> would seem to be one way to catch that badness closer to when it
>> happens...
>>
>>  Thanks for the update, Jon!
>> -- ramki
>>
>>
>>>
>>> Jon
>>>
>>>
>>>
>>>  It might be worthwhile (if it is possible) to do a bisect on whatever
>>> the first known JDK (7uXX?) where the performance regression appears and
>>> try to narrow down the change that may have caused the regression, using
>>> James' test program. I have a sneaking suspicion (because of the elided
>>> assert) that it might well be a change I had made towards closing some
>>> BOT-walking/card-scanning cracks when initializing CMS objects ca 6uXX.
>>>
>>>  James, in case you have read thus far, can you provide the exact
>>> version number of the 6uXX Oracle jdk you used that did _not_ show the
>>> performance regression for your test program?
>>>
>>>  Thanks!
>>> -- ramki
>>>
>>> On Tue, May 12, 2015 at 1:50 PM, Jon Masamitsu <jon.masamitsu at oracle.com
>>> > wrote:
>>>
>>>>  Ramki,
>>>>
>>>> The logarithmic BOT is taking big steps back to
>>>> find the start of a large object.  But something is odd.
>>>>
>>>> Is there a reason why the BOT entries would decrease for a while
>>>> and then increase without a value that gives the start of an object?
>>>>
>>>> &_array->_offset_array[index] = 0xffffffff6550f600 "DDDDDDDDDDDDDD ..."
>>>>
>>>> and then down the road.
>>>>
>>>>  print &_array->_offset_array[0x110600U],
>>>> _array->_offset_array[0x10f600U >
>>>> &_array->_offset_array[0x110600U] = 0xffffffff65510600
>>>> "CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
>>>> CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCDDDDDDDDDDDDDDDD
>>>> DDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD
>>>> DDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD
>>>> DDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD
>>>> DDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD
>>>> DDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD
>>>> DDDDDDDDDDDDDDDDDDDDDDDDDD
>>>>
>>>> 'D' is 0x44
>>>>
>>>> N_words is 0x40
>>>>
>>>> Jon
>>>>
>>>>
>>>> On 05/12/2015 08:46 AM, Srinivas Ramakrishna wrote:
>>>>
>>>> Thanks for that background Jon; I'll try and look at the 8uxx code and
>>>> run James's example when I get a few spare cycles (which won't be soon,
>>>> unfortunately).
>>>>
>>>>  -- ramki
>>>>
>>>> ysr1729
>>>>
>>>> On May 8, 2015, at 14:25, Jon Masamitsu <jon.masamitsu at oracle.com>
>>>> wrote:
>>>>
>>>>   Ramki,
>>>>
>>>> I compared the memory/blockOffsetTable.* files between 6u35 and 7 and
>>>> there
>>>> were essentially no differences.  The .cpp file had some changes in
>>>> assertions.
>>>> The other two were the same except for comments.
>>>>
>>>> Between 6 and 6u35 (didn't pin point it), we changed from the rather
>>>> weak N_powers=3  :-) to N_powers=14.
>>>>
>>>> There seem to be two problematic cases.
>>>>
>>>> 1)  Spike in ParNew times that lasts for 2 (or a few) GC's
>>>> 2) Continually growing ParNew times
>>>>
>>>> From just reading the CR
>>>>
>>>> https://bugs.openjdk.java.net/browse/JDK-8079274
>>>>
>>>> the test case seems to reproduce 1).  The test case
>>>> has a large object that lives till the application exits.
>>>> If it were a broken logarithmic BOT, I would not expect
>>>> the ParNew pauses to shrink again.
>>>>
>>>> Jon
>>>>
>>>>
>>>>
>>>> On 5/8/2015 11:49 AM, Srinivas Ramakrishna wrote:
>>>>
>>>> This is indeed interesting...
>>>>
>>>>  Jesper, did you figure out why (as James seems to note) the problem
>>>> is seen in JDK 8 but not in JDK 6 (he's using ParNew in both cases). Also,
>>>> isn't the BOT logarithmic, which will allow threads to "quickly" traverse
>>>> back to the start of even a large object? Did something change in the BOT
>>>> that might have led to what James sees as a regression, or did the range
>>>> that he characterizes as a "blind spot" merely move from one place to
>>>> another?
>>>>
>>>>  Unfortunately, I have not been keeping up with the changes in this
>>>> area and am not working with JDK 6 code much, so can't offer any clues, but
>>>> am surprised that this would be a regression. (I would expect the same
>>>> performance pathologies in this code to exist in 6 and forward -- has it
>>>> been established conclusively that this is indeed a regression?)
>>>>
>>>>  thanks!
>>>> -- ramki
>>>>
>>>> On Fri, May 8, 2015 at 6:40 AM, James Lent <jlent at tivo.com> wrote:
>>>>
>>>>>  The explanation of the problem added by Jesper Wilhelmsson on 2015-05-06
>>>>> 08:59 to JDK-8079274 is interesting, but, not very satisfying.
>>>>>
>>>>> 1) The problem affects more than this one specific breakpoint.  It
>>>>> impacts objects in at least the range of about 480M to 512M.  The comment
>>>>> "A larger object probably triggers some higher level code that are able to
>>>>> skip scanning of this part of the memory" does not inspire confidence. It
>>>>> makes even more concerned that we do not understand this issue and that
>>>>> there could be other triggers.  It seems like "the higher level code" has a
>>>>> 32M blind spot.  Further I can create scenarios where the GC code takes
>>>>> progressively longer to clear the issue.  Presumably it completes the lower
>>>>> block scan during each GC cycle.  Why does it take longer and longer each
>>>>> time (my guess is it has something to do with the number of objects moved
>>>>> to survivor space).  If I cache 1 in 100 instead of 1 in 10 then there are
>>>>> 11 progressively slower ParNewGC (after several fast one) and the issue
>>>>> takes minutes to clear:
>>>>>
>>>>> $ /usr/lib/jvm/java-7-oracle/bin/java -verbose:gc
>>>>> -XX:+PrintGCTimeStamps -XX:+PrintGCDetails -XX:+UseParNewGC
>>>>> -XX:+UseConcMarkSweepGC -Xmx4G -Xms4G LongParNewPause 536870384 100 100
>>>>> 0.308: [GC0.308: [ParNew: 272640K->3029K(306688K), 0.0113180 secs]
>>>>> 796927K->527316K(4160256K), 0.0114040 secs] [Times: user=0.05 sys=0.00,
>>>>> real=0.02 secs]
>>>>> 0.367: [GC0.367: [ParNew: 275669K->7766K(306688K), 0.0140840 secs]
>>>>> 799956K->532053K(4160256K), 0.0141630 secs] [Times: user=0.04 sys=0.01,
>>>>> real=0.02 secs]
>>>>> 0.430: [GC0.430: [ParNew: 280406K->10057K(306688K), 0.0120810 secs]
>>>>> 804693K->534344K(4160256K), 0.0121690 secs] [Times: user=0.05 sys=0.00,
>>>>> real=0.02 secs]
>>>>> 0.490: [GC0.490: [ParNew: 282697K->15383K(306688K), 0.0128540 secs]
>>>>> 806984K->539670K(4160256K), 0.0129390 secs] [Times: user=0.05 sys=0.00,
>>>>> real=0.02 secs]
>>>>> 0.551: [GC0.551: [ParNew: 288023K->21006K(306688K), 0.0146480 secs]
>>>>> 812310K->545294K(4160256K), 0.0147650 secs] [Times: user=0.05 sys=0.01,
>>>>> real=0.01 secs]
>>>>> 0.613: [GC0.613: [ParNew: 293646K->26805K(306688K), 0.0141250 secs]
>>>>> 817934K->551092K(4160256K), 0.0142050 secs] [Times: user=0.05 sys=0.00,
>>>>> real=0.01 secs]
>>>>> 0.675: [GC0.675: [ParNew: 299445K->24258K(306688K), 0.4816840 secs]
>>>>> 823732K->551436K(4160256K), 0.4817620 secs] [Times: user=1.90 sys=0.00,
>>>>> real=0.48 secs]
>>>>> 1.205: [GC1.205: [ParNew: 296898K->17914K(306688K), 3.4216550 secs]
>>>>> 824076K->547788K(4160256K), 3.4217180 secs] [Times: user=13.50 sys=0.00,
>>>>> real=3.42 secs]
>>>>> 4.674: [GC4.674: [ParNew: 290554K->17433K(306688K), 6.3406820 secs]
>>>>> 820428K->550031K(4160256K), 6.3407410 secs] [Times: user=25.00 sys=0.02,
>>>>> real=6.34 secs]
>>>>> 11.062: [GC11.062: [ParNew: 290073K->17315K(306688K), 10.9493130 secs]
>>>>> 822671K->552636K(4160256K), 10.9493890 secs] [Times: user=42.98 sys=0.03,
>>>>> real=10.95 secs]
>>>>> 22.059: [GC22.059: [ParNew: 289955K->17758K(306688K), 16.0539650 secs]
>>>>> 825276K->555798K(4160256K), 16.0540250 secs] [Times: user=62.44 sys=0.09,
>>>>> real=16.05 secs]
>>>>> 38.161: [GC38.161: [ParNew: 290398K->17745K(306688K), 20.6900820 secs]
>>>>> 828438K->558512K(4160256K), 20.6901420 secs] [Times: user=80.79 sys=0.08,
>>>>> real=20.69 secs]
>>>>> 58.898: [GC58.898: [ParNew: 290385K->21826K(306688K), 24.6781700 secs]
>>>>> 831152K->565310K(4160256K), 24.6782710 secs] [Times: user=95.58 sys=0.05,
>>>>> real=24.68 secs]
>>>>> 83.625: [GC83.625: [ParNew: 294466K->21834K(306688K), 30.3501800 secs]
>>>>> 837950K->568044K(4160256K), 30.3502610 secs] [Times: user=115.80 sys=0.11,
>>>>> real=30.35 secs]
>>>>> 114.022: [GC114.022: [ParNew: 294474K->21836K(306688K), 33.6648800
>>>>> secs] 840684K->570772K(4160256K), 33.6649390 secs] [Times: user=127.38
>>>>> sys=0.14, real=33.67 secs]
>>>>> 147.736: [GC147.736: [ParNew: 294476K->18454K(306688K), 38.9828430
>>>>> secs] 843412K->570117K(4160256K), 38.9829040 secs] [Times: user=149.14
>>>>> sys=0.15, real=38.99 secs]
>>>>> 186.766: [GC186.766: [ParNew: 291094K->26602K(306688K), 39.8726730
>>>>> secs] 842757K->580977K(4160256K), 39.8727330 secs] [Times: user=156.35
>>>>> sys=0.11, real=39.88 secs]
>>>>> 226.686: [GC226.686: [ParNew: 299242K->26612K(306688K), 0.0174470
>>>>> secs] 853617K->583714K(4160256K), 0.0175300 secs] [Times: user=0.06
>>>>> sys=0.00, real=0.02 secs]
>>>>>
>>>>> 2) The suggestion to use only 1 thread is not acceptable to me.  Use
>>>>> parallel GC with one thread?  Further it did not work for me (in fact it
>>>>> made things worse).  "user" time does now equal "real" time (and the new
>>>>> size is 1/4 as big) so I think the parameter took effect:
>>>>>
>>>>> $ /usr/lib/jvm/java-8-oracle/bin/java -verbose:gc
>>>>> -XX:+PrintGCTimeStamps -XX:+PrintGCDetails -XX:+UseParNewGC
>>>>> -XX:+UseConcMarkSweepGC -Xmx4G -Xms4G -XX:ParallelGCThreads=1
>>>>> LongParNewPause 536870384 100 10
>>>>> 0.226: [GC (Allocation Failure) 0.226: [ParNew: 68160K->6980K(76672K),
>>>>> 0.0489756 secs] 592447K->531267K(4185792K), 0.0490718 secs] [Times:
>>>>> user=0.05 sys=0.00, real=0.05 secs]
>>>>> 0.291: [GC (Allocation Failure) 0.291: [ParNew: 75140K->8512K(76672K),
>>>>> 29.8707281 secs] 599427K->539479K(4185792K), 29.8707954 secs] [Times:
>>>>> user=29.80 sys=0.00, real=29.88 secs]
>>>>> 30.180: [GC (Allocation Failure) 30.180: [ParNew:
>>>>> 76672K->8093K(76672K), 30.6357699 secs] 607639K->545772K(4185792K),
>>>>> 30.6358257 secs] [Times: user=30.57 sys=0.00, real=30.63 secs]
>>>>> 60.837: [GC (Allocation Failure) 60.837: [ParNew:
>>>>> 76253K->6830K(76672K), 69.2115255 secs] 613932K->552169K(4185792K),
>>>>> 69.2115783 secs] [Times: user=69.05 sys=0.01, real=69.22 secs]
>>>>> 130.068: [GC (Allocation Failure) 130.068: [ParNew:
>>>>> 74990K->8226K(76672K), 132.5118903 secs] 620329K->560381K(4185792K),
>>>>> 132.5119459 secs] [Times: user=132.21 sys=0.00, real=132.52 secs]
>>>>> 262.603: [GC (Allocation Failure) 262.603: [ParNew:
>>>>> 76386K->8512K(76672K), 0.0400138 secs] 628541K->568748K(4185792K),
>>>>> 0.0400687 secs] [Times: user=0.04 sys=0.00, real=0.04 secs]
>>>>>
>>>>> 3) Applications should not be expected to tune the GC settings to
>>>>> account for the size of one static object.  The connection between the size
>>>>> of this object and the GC performance was not easy to even find.  Further
>>>>> the parameter ParGCCardsPerStrideChunk is pretty low level and not one I
>>>>> think many application developers are familiar with.
>>>>>
>>>>> Error: VM option 'ParGCCardsPerStrideChunk' is diagnostic and must be
>>>>> enabled via -XX:+UnlockDiagnosticVMOptions.
>>>>> Error: Could not create the Java Virtual Machine.
>>>>> Error: A fatal exception has occurred. Program will exit.
>>>>>
>>>>>  4) I can not reproduce this issue in Oracle 6.  This appears to be a
>>>>> regression.  How can fixing a regression be called an enhancement request?
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> ------------------------------
>>>>>
>>>>> This email and any attachments may contain confidential and privileged
>>>>> material for the sole use of the intended recipient. Any review, copying,
>>>>> or distribution of this email (or any attachments) by others is prohibited.
>>>>> If you are not the intended recipient, please contact the sender
>>>>> immediately and permanently delete this email and any attachments. No
>>>>> employee or agent of TiVo Inc. is authorized to conclude any binding
>>>>> agreement on behalf of TiVo Inc. by email. Binding agreements with TiVo
>>>>> Inc. may only be made by a signed written agreement.
>>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>
>
>
>  --
>  Jungwoo Ha | Java Platform Team | jwha at google.com
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.openjdk.java.net/pipermail/hotspot-gc-dev/attachments/20150601/1ecf8fcb/attachment-0001.html>


More information about the hotspot-gc-dev mailing list