RFR: JDK-8032050: TEST_BUG: java/rmi/activation/Activatable/shutdownGracefully/ShutdownGracefully.java fails intermittently

Tristan Yan tristan.yan at oracle.com
Fri Jan 31 08:36:56 UTC 2014

Thank you for fixing JDK-8023541. Then I leave ActivationLibrary.java 
for now.
I still did some clean up following your suggestion.
1. I changed waitFor(long timeout) method, this method is going to use 
code like Process.waitFor(timeout, unit). This can be backported to 
JDK7. Also exitValue is kept as a return value. For making sure there is 
no Pipe leak, a cleanup thread will start when timeout happens.
2. Change in ShutdownGracefully is a little tricky. I think we should 
just destroy JVM once exception is thrown. So I move the wait logic into 
try block instead keep them in finally block.
Can you receive it again.
Thank you

On 01/29/2014 03:16 PM, Stuart Marks wrote:
> Hi Tristan,
> I don't want to put the workaround into 
> ActivationLibrary.rmidRunning() for a null return from the lookup, 
> because this is only a workaround for an actual bug in rmid 
> initialization. See the review I just posted for JDK-8023541.
> Adding JavaVM.waitFor(timeout) is something that would be useful in 
> general, but it needs to be handled carefully. It uses the new 
> Process.waitFor(timeout, unit) which is new in Java SE 8; this makes 
> backporting to 7 more complicated. Not clear whether we'll do so, but 
> I don't want to forclose the opportunity without discussion. It's also 
> not clear how one can get the vm's exit status after JavaVM.waitFor() 
> has returned true. With the Process API it's possible simply to call 
> waitFor() or exitValue(). With JavaVM, a new API needs to be created, 
> or the rule has to be established that one must call JavaVM.waitFor() 
> to collect the exit status as well as to close the pipes from the 
> subprocess. If JavaVM.waitFor(timeout, unit) is called without 
> subsequently calling JavaVM.waitFor(), the pipes are leaked.
> In ShutdownGracefully.java, the finally-block needs to check to see if 
> rmid is still running, and if it is, to shut it down. Simply calling 
> waitFor(timeout, unit) isn't sufficient, because if the rmid process 
> is still running, it will be left running.
> The straightforward approach would be to call 
> ActivationLibrary.rmidRunning() to test if it's still running. 
> Unfortunately this isn't quite right, because rmidRunning() has a 
> timeout loop in it -- which should probably be removed. (I think 
> there's a bug for this.) Another approach would be simply to call 
> rmid.destroy(). This calls rmid's shutdown() method first, which is 
> reasonable, but I'm not sure it kills the process if that fails. In 
> any case, this already has a timeout loop waiting for the process to 
> die, so ShutdownGracefully.java needn't use a new waitFor(timeout, 
> unit) call.
> Removing the commented-out code that starts with "no longer needed" is 
> good, and removing the ShutdownDetectThread is also good, since that's 
> unnecessary.
> There are some more cleanups I have in mind here but I'd like to see a 
> revised webrev before proceeding.
> Thanks,
> s'marks
> On 1/25/14 8:57 PM, Tristan Yan wrote:
>> Hi Stuart
>> Thank you for your review and suggestion.
>> Yes, since this failure mode is very hard to be reproduced. I guess 
>> it's make sense  to do some hack. And I also noticed in 
>> ActivationLibrary.rmidRunning. It does try to look up 
>> ActivationSystem but doesn't check if it's null. So I add the logic 
>> to make sure we will look up the non-null ActivationSystem. Also I 
>> did some cleanup if you don't mind. Add a waitFor(long timeout, 
>> TimeUnit unit) for JavaVM. Which we can have a better waitFor control.
>> I appreciate you can review the code again.
>> http://cr.openjdk.java.net/~tyan/JDK-8032050/webrev.01/
>> Thank you
>> Tristan
>> On 01/25/2014 10:20 AM, Stuart Marks wrote:
>>> On 1/23/14 10:34 PM, Tristan Yan wrote:
>>>> Hi All
>>>> Could you review the bug fix for JDK-8032050.
>>>> http://cr.openjdk.java.net/~tyan/JDK-8032050/webrev.00/
>>>> Description:
>>>> This rare happened failure caused because when RMID starts. It 
>>>> don't guarantee
>>>> sun.rmi.server.Activation.startActivation finishes.
>>>> Fix by adding a iterative getSystem with a 5 seconds timeout.
>>> Hi Tristan,
>>> Adding a timing/retry loop into this test isn't the correct approach 
>>> for fixing this test.
>>> The timing loop isn't necessary because there is already a timing 
>>> loop in RMID.start() in the RMI test library. (There's another 
>>> timing loop in ActivationLibrary.rmidRunning() which should probably 
>>> be removed.) So the intent of this library call is that it start 
>>> rmid and wait for it to become ready. That logic doesn't need to be 
>>> added to the test.
>>> In the bug report JDK-8032050 you had mentioned that the 
>>> NullPointerException was suspicious. You're right! I took a look and 
>>> it seemed like it was related to JDK-8023541, and I added a note to 
>>> this effect to the bug report. The problem here is that rmid can 
>>> come up and transiently return null instead of the stub of the 
>>> activation system. That's what JDK-8023541 covers. I think that rmid 
>>> itself needs to be fixed, though modifying the timing loop in the 
>>> RMI test library to wait for rmid to come up *and* for the lookup to 
>>> return non-null is an easy way to fix the problem. (Or at least 
>>> cover it up.)
>>> The next step in the analysis is to determine, given that 
>>> ActivationLibrary.getSystem can sometimes return null, whether this 
>>> has actually caused this test failure. This is pretty easy to 
>>> determine; just hack in a line "system = null" in the right place 
>>> and run the test. I've done this, and the test times out and the 
>>> output log is pretty much identical to the one in the bug report. (I 
>>> recommend you try this yourself.) So I think it's fairly safe to say 
>>> that the problem in JDK-8023541 has caused the failure listed in 
>>> JDK-8032050.
>>> I can see a couple ways to proceed here. One way is just to close 
>>> this out as a duplicate of JDK-8023541 since that bug caused this 
>>> failure.
>>> Another is that this test could use some cleaning up. This bug 
>>> certainly covers a failure, but the messages emitted are confusing 
>>> and in some cases completely wrong. For example, the "rmid has 
>>> shutdown" message at line 180 is incorrect, because in this case 
>>> rmid is still running and the wait() call has timed out. Most of the 
>>> code here can be replaced with calls to various bits of the RMI test 
>>> library. There are a bunch of other things in this test that could 
>>> be cleaned up as well.
>>> It's up to you how you'd like to proceed.
>>> s'marks

More information about the core-libs-dev mailing list