RFR: JDK-8032050: TEST_BUG: java/rmi/activation/Activatable/shutdownGracefully/ShutdownGracefully.java fails intermittently

Tristan Yan tristan.yan at oracle.com
Wed Feb 12 05:39:19 UTC 2014

Thank you for your thorough mail. This is very educational. I took you advice and generated a new webrev for this.

I appreciate you can review this again.

On Feb 11, 2014, at 8:32 AM, Stuart Marks <stuart.marks at oracle.com> wrote:

> Hi Tristan,
> Sorry about my recurring delays.
> Several comments on these changes.
> JavaVM.java --
> The waitFor(timeout) method is mostly ok. The new thread started at line 208 and following seems unnecessary, though. This code is reached when a timeout occurs, so throwing TimeoutException is the only thing necessary in this case. Should the code to start the new thread be removed?
> There should be a similar check for vm == null as in the waitFor() [no args] method.
> ShutdownGracefully.java --
> The condition that's checked after calling rmid.waitFor(SHUTDOWN_TIMEOUT) is incorrect. It's testing the exit status against zero. Offhand, when and if rmid exits, it might exit with a nonzero exit status. If rmid has exited at this point, then the test should succeed.
> Instead of testing against zero, the code should catch TimeoutException, which means that rmid is still running. It's probably reasonable to catch TimeoutException, print a message, and then let the finally-block destroy the rmid. Calling TestLibrary.bomb() from within the try-block is confusing, since that method throws an exception, which is then caught by the catch-block, when then calls TestLibrary.bomb() again.
> We should also make sure to test the success case properly. If rmid.waitFor() returns in a timely fashion without throwing TimeoutException, it doesn't matter what the exit status is. (It might be worth printing it out.) At that point we know that rmid *has* exited gracefully, so we need to set rmid to null so that the finally-block doesn't attempt to destroy rmid redundantly. Some additional messages about rmid having exited and the test passing are also warranted for this case.
> Some additional cleanup can be done here as well, over and above the changes you've proposed. (This stuff is left over from earlier RMI test messes.) In order to shut down an active object, the code here spawns a new thread that sleeps for a while and then deactivates this object. This isn't necessary. (It might have been necessary in the past.) It's sufficient simply to unexport this object and then deactivate it, directly within the shutdown() method. See
> test/java/rmi/activation/ActivationSystem/unregisterGroup/UnregisterGroup.java
> for an example of this. In addition, the run() method can be removed, and the "implements Runnable" declaration can also be removed from the ShutdownGracefully test class.
> Finally, revisiting some code farther up in the test, the try-block at lines 135-140 issues a registration request that the test expects to fail. If it succeeds, the message at line 139 isn't very clear; it should say that the registration request succeeded unexpectedly. This should cause the test to fail. We still probably want to go through the waitFor(timeout) path and eventual rmid cleanup, but a flag should be set here to ensure that the test indeed fails if the registration succeeds unexpectedly, and the messages should clearly indicate that this is going on.
> A good way to test this last case is to change rmid's security manager to the normal security manager java.lang.SecurityManager instead of TestSecurityManager.
> Thanks,
> s'marks
> On 2/10/14 2:59 AM, Tristan Yan wrote:
>> Hi Stuart
>> Could you help to review this.
>> Thank you
>> Tristan
>> On Jan 31, 2014, at 4:36 PM, Tristan Yan <tristan.yan at oracle.com
>> <mailto:tristan.yan at oracle.com>> wrote:
>>> Thank you for fixing JDK-8023541. Then I leave ActivationLibrary.java for now.
>>> I still did some clean up following your suggestion.
>>> 1. I changed waitFor(long timeout) method, this method is going to use code
>>> like Process.waitFor(timeout, unit). This can be backported to JDK7. Also
>>> exitValue is kept as a return value. For making sure there is no Pipe leak, a
>>> cleanup thread will start when timeout happens.
>>> 2. Change in ShutdownGracefully is a little tricky. I think we should just
>>> destroy JVM once exception is thrown. So I move the wait logic into try block
>>> instead keep them in finally block.
>>> Can you receive it again.
>>> http://cr.openjdk.java.net/~tyan/JDK-8032050/webrev.02/
>>> Thank you
>>> Tristan
>>> On 01/29/2014 03:16 PM, Stuart Marks wrote:
>>>> Hi Tristan,
>>>> I don't want to put the workaround into ActivationLibrary.rmidRunning() for a
>>>> null return from the lookup, because this is only a workaround for an actual
>>>> bug in rmid initialization. See the review I just posted for JDK-8023541.
>>>> Adding JavaVM.waitFor(timeout) is something that would be useful in general,
>>>> but it needs to be handled carefully. It uses the new
>>>> Process.waitFor(timeout, unit) which is new in Java SE 8; this makes
>>>> backporting to 7 more complicated. Not clear whether we'll do so, but I don't
>>>> want to forclose the opportunity without discussion. It's also not clear how
>>>> one can get the vm's exit status after JavaVM.waitFor() has returned true.
>>>> With the Process API it's possible simply to call waitFor() or exitValue().
>>>> With JavaVM, a new API needs to be created, or the rule has to be established
>>>> that one must call JavaVM.waitFor() to collect the exit status as well as to
>>>> close the pipes from the subprocess. If JavaVM.waitFor(timeout, unit) is
>>>> called without subsequently calling JavaVM.waitFor(), the pipes are leaked.
>>>> In ShutdownGracefully.java, the finally-block needs to check to see if rmid
>>>> is still running, and if it is, to shut it down. Simply calling
>>>> waitFor(timeout, unit) isn't sufficient, because if the rmid process is still
>>>> running, it will be left running.
>>>> The straightforward approach would be to call ActivationLibrary.rmidRunning()
>>>> to test if it's still running. Unfortunately this isn't quite right, because
>>>> rmidRunning() has a timeout loop in it -- which should probably be removed.
>>>> (I think there's a bug for this.) Another approach would be simply to call
>>>> rmid.destroy(). This calls rmid's shutdown() method first, which is
>>>> reasonable, but I'm not sure it kills the process if that fails. In any case,
>>>> this already has a timeout loop waiting for the process to die, so
>>>> ShutdownGracefully.java needn't use a new waitFor(timeout, unit) call.
>>>> Removing the commented-out code that starts with "no longer needed" is good,
>>>> and removing the ShutdownDetectThread is also good, since that's unnecessary.
>>>> There are some more cleanups I have in mind here but I'd like to see a
>>>> revised webrev before proceeding.
>>>> Thanks,
>>>> s'marks
>>>> On 1/25/14 8:57 PM, Tristan Yan wrote:
>>>>> Hi Stuart
>>>>> Thank you for your review and suggestion.
>>>>> Yes, since this failure mode is very hard to be reproduced. I guess it's
>>>>> make sense  to do some hack. And I also noticed in
>>>>> ActivationLibrary.rmidRunning. It does try to look up ActivationSystem but
>>>>> doesn't check if it's null. So I add the logic to make sure we will look up
>>>>> the non-null ActivationSystem. Also I did some cleanup if you don't mind.
>>>>> Add a waitFor(long timeout, TimeUnit unit) for JavaVM. Which we can have a
>>>>> better waitFor control.
>>>>> I appreciate you can review the code again.
>>>>> http://cr.openjdk.java.net/~tyan/JDK-8032050/webrev.01/
>>>>> Thank you
>>>>> Tristan
>>>>> On 01/25/2014 10:20 AM, Stuart Marks wrote:
>>>>>> On 1/23/14 10:34 PM, Tristan Yan wrote:
>>>>>>> Hi All
>>>>>>> Could you review the bug fix for JDK-8032050.
>>>>>>> http://cr.openjdk.java.net/~tyan/JDK-8032050/webrev.00/
>>>>>>> Description:
>>>>>>> This rare happened failure caused because when RMID starts. It don't
>>>>>>> guarantee
>>>>>>> sun.rmi.server.Activation.startActivation finishes.
>>>>>>> Fix by adding a iterative getSystem with a 5 seconds timeout.
>>>>>> Hi Tristan,
>>>>>> Adding a timing/retry loop into this test isn't the correct approach for
>>>>>> fixing this test.
>>>>>> The timing loop isn't necessary because there is already a timing loop in
>>>>>> RMID.start() in the RMI test library. (There's another timing loop in
>>>>>> ActivationLibrary.rmidRunning() which should probably be removed.) So the
>>>>>> intent of this library call is that it start rmid and wait for it to become
>>>>>> ready. That logic doesn't need to be added to the test.
>>>>>> In the bug report JDK-8032050 you had mentioned that the
>>>>>> NullPointerException was suspicious. You're right! I took a look and it
>>>>>> seemed like it was related to JDK-8023541, and I added a note to this
>>>>>> effect to the bug report. The problem here is that rmid can come up and
>>>>>> transiently return null instead of the stub of the activation system.
>>>>>> That's what JDK-8023541 covers. I think that rmid itself needs to be fixed,
>>>>>> though modifying the timing loop in the RMI test library to wait for rmid
>>>>>> to come up *and* for the lookup to return non-null is an easy way to fix
>>>>>> the problem. (Or at least cover it up.)
>>>>>> The next step in the analysis is to determine, given that
>>>>>> ActivationLibrary.getSystem can sometimes return null, whether this has
>>>>>> actually caused this test failure. This is pretty easy to determine; just
>>>>>> hack in a line "system = null" in the right place and run the test. I've
>>>>>> done this, and the test times out and the output log is pretty much
>>>>>> identical to the one in the bug report. (I recommend you try this
>>>>>> yourself.) So I think it's fairly safe to say that the problem in
>>>>>> JDK-8023541 has caused the failure listed in JDK-8032050.
>>>>>> I can see a couple ways to proceed here. One way is just to close this out
>>>>>> as a duplicate of JDK-8023541 since that bug caused this failure.
>>>>>> Another is that this test could use some cleaning up. This bug certainly
>>>>>> covers a failure, but the messages emitted are confusing and in some cases
>>>>>> completely wrong. For example, the "rmid has shutdown" message at line 180
>>>>>> is incorrect, because in this case rmid is still running and the wait()
>>>>>> call has timed out. Most of the code here can be replaced with calls to
>>>>>> various bits of the RMI test library. There are a bunch of other things in
>>>>>> this test that could be cleaned up as well.
>>>>>> It's up to you how you'd like to proceed.
>>>>>> s'marks

More information about the core-libs-dev mailing list