RFR for bug JDK-8004807: java/util/Timer/Args.java failing intermittently in HS testing
roger.riggs at oracle.com
Wed Jun 4 20:23:43 UTC 2014
Hi Martin, Eric,
Of several hundred failures of this test, most occurred in JRE runs with
-Xcomp set. A few failures occurred with -Xmixed, and none with -Xint.
The printed "elapsed" times (not normalized to hardware or OS) range from
24 to 132 ms, with most falling into several buckets in the 30s, 40s,
50s and 70s.
I don't spot anything in the Timer.mainLoop code that might break when
optimized, but that's one possibility.
On 6/4/2014 3:25 PM, Martin Buchholz wrote:
> Tests for Timer are inherently timing (!) dependent.
> It's reasonable for tests to assume that:
> - reasonable events like creating a thread and executing a simple task
> should complete in less than, say 2500ms.
> - system clock will not change by a significant amount (> 1 sec) during the
> test. Yes, that means Timer tests are likely to fail during daylight
> saving time switchover - we can live with that. (we could even try to fix
> that, by detecting deviations between clock time and elapsed time, but
> probably not worth it)
> Can you detect any real-world unreliability in my latest version of the
> test, not counting daylight saving time switch?
> I continue to resist your efforts to "fix" the test by removing chances for
> the SUT code to go wrong.
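The deviation check mentioned above (clock time versus elapsed time) could be sketched roughly as follows. This is only an illustration, not part of Args.java: the class ClockDrift, its method names, and the 1000 ms tolerance are hypothetical, and it assumes System.nanoTime() as the elapsed-time source, since it is not affected by wall-clock adjustments.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not taken from the test under discussion.
public class ClockDrift {
    /** Wall-clock progress minus monotonic progress, in milliseconds. */
    static long driftMillis(long wallStartMs, long nanoStart,
                            long wallEndMs, long nanoEnd) {
        long wallElapsed = wallEndMs - wallStartMs;
        long monoElapsed = TimeUnit.NANOSECONDS.toMillis(nanoEnd - nanoStart);
        return wallElapsed - monoElapsed;
    }

    /** True if the two clocks disagree by more than the tolerance. */
    static boolean clockJumped(long driftMs, long toleranceMs) {
        return Math.abs(driftMs) > toleranceMs;
    }

    public static void main(String[] args) throws InterruptedException {
        long wall0 = System.currentTimeMillis();
        long nano0 = System.nanoTime();
        Thread.sleep(50); // stands in for the body of a timing-sensitive test
        long drift = driftMillis(wall0, nano0,
                                 System.currentTimeMillis(), System.nanoTime());
        // Assumed tolerance of 1000 ms, matching the "> 1 sec" figure above.
        System.out.println("drift ms = " + drift
                + ", jumped = " + clockJumped(drift, 1000));
    }
}
```

A test could use such a check to skip or retry a run rather than report a failure when the wall clock moved underneath it.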
> On Tue, Jun 3, 2014 at 11:28 PM, Eric Wang <yiming.wang at oracle.com> wrote:
>> Hi Martin,
>> Thanks for the explanation. Now I understand why you set DELAY_MS to
>> 100 seconds: it prevents failures on a slow host. However, I
>> still have some concerns.
>> Because the test schedules tasks at a time in the past, all
>> 13 tasks should be executed immediately and finish within a short time.
>> If the elapsed-time limit is set to 50s (DELAY_MS/2), the
>> timer has plenty of time to finish the tasks, so I wonder whether that
>> test point is lost.
>> Back to the original test: I think it is a test stabilization
>> issue, because the original test assumes that the timer will be cancelled
>> within 1 second, before the 14th task is called. This assumption may not
>> hold for 2 reasons:
>> 1. The test may be executed in jtreg concurrent mode on a slow host.
>> 2. The system clock of a virtual machine may not be accurate (it may run
>> faster than the physical clock).
>> To support this point, I changed the test as attached to print the
>> execution times, to see whether the timer behaves as the API
>> documentation describes. The result is as expected.
>> The unrepeated task executed immediately: 
>> The repeated task executed immediately and then repeated every 1 second:
>> [1401855509337, 1401855510337, 1401855511338]
>> The fixed-rate task executed immediately and caught up on the delay:
>> [1401855509338, 1401855509338, 1401855509338, 1401855509338, 1401855509338,
>> 1401855509338, 1401855509338, 1401855509338, 1401855509338, 1401855509338,
>> 1401855509338, 1401855509836, 1401855510836]
>> On 2014/6/4 9:16, Martin Buchholz wrote:
>> On Tue, Jun 3, 2014 at 6:12 PM, Eric Wang <yiming.wang at oracle.com> wrote:
>>> Hi Martin,
>>> sleep(1000) is not enough to reproduce the failure, because it is much
>>> shorter than the period DELAY_MS (10*1000) of the repeated task created by
>>> "scheduleAtFixedRate(t, counter(y3), past, DELAY_MS)".
>>> With sleep(DELAY_MS), the failure can be reproduced.
>> Well sure, then the task is rescheduled, so I expect it to fail in this case.
>> But in my version I had set DELAY_MS to 100 seconds. The point of
>> extending DELAY_MS is to prevent flaky failures on a slow machine.
>> Again, how do we know that this test hasn't found a Timer bug?
>> I still can't reproduce it.