Code Review - 6957683 TEST_BUG: test/java/util/concurrent/ThreadPoolExecutor/ failing

David Holmes david.holmes at
Tue Nov 22 02:33:10 UTC 2011


On 22/11/2011 6:26 AM, Gary Adams wrote:
> Not quite as sure about the internal timeouts in this regression test.
> Here's a proposed change that cranks up the timeouts. Since the test
> harness defaults to 2 minutes there's no reason the service termination
> should timeout more quickly. For the thread startup, I added a countdown
> latch so the main thread waits til the worker threads have started.
> Also, after the service termination completes increased the delay to
> ensure the thread count check will be accurate.

Sorry but I think my initial analysis of this problem was incorrect. The 
initial test failure reported 3 failed cases:

1 not equal to 0
11 not equal to 10
1 not equal to 0

which corresponded to failures at the marked lines:

     public static void main(String[] args) throws Throwable {
         CustomTPE tpe = new CustomTPE();
         equal(tpe.getCorePoolSize(), threadCount);
         equal(countExecutorThreads(), 0);
         for (int i = 0; i < threadCount; i++)
             tpe.submit(new Runnable() { public void run() {}});
         equal(countExecutorThreads(), threadCount);
         equal(CustomTask.births.get(), threadCount);
         tpe.awaitTermination(Long.MAX_VALUE, TimeUnit.NANOSECONDS);
/*1*/   equal(countExecutorThreads(), 0);

         CustomSTPE stpe = new CustomSTPE();
         for (int i = 0; i < threadCount; i++)
             stpe.submit(new Runnable() { public void run() {}});
         equal(CustomSTPE.decorations.get(), threadCount);
/*2*/   equal(countExecutorThreads(), threadCount);
         stpe.awaitTermination(Long.MAX_VALUE, TimeUnit.NANOSECONDS);
/*3/    equal(countExecutorThreads(), 0);

         System.out.printf("%nPassed = %d, failed = %d%n%n", passed, 
         if (failed > 0) throw new Exception("Some tests failed");

However, I think the failure at /*2*/ is actually seeing the same 
"extra" thread as reported at /*1*/. It could even be the same thread 
again at /*3*/ but we have other independent failure cases for /*3*/.

When we create the STPE each submit will create and start a worker 
thread until we reach the core pool size. These worker threads will be 
immediately visible to the countExecutorThreads() logic because it 
simply enumerates the ThreadGroup. I was mistakenly thinking that these 
threads need to actually get a chance to execute to become visible to 
the counting logic, but that is not the case.

Consequently you can elide the change to add the latch and just leave 
the defensive timeouts on the awaitTermination with the increased sleep 
delays to give the worker threads a chance to really terminate.

Sorry for sending you down the wrong path on this one.


More information about the core-libs-dev mailing list