A race problem about select in a small time window
zhouyx at linux.vnet.ibm.com
Mon Dec 17 00:56:44 PST 2012
Here is the problem in detail: there is a small time window in which a
race among 3 threads makes select() keep returning 0 without blocking.
I wrote a testcase (http://cr.openjdk.java.net/~zhouyx/OJDK-714/webrev0.2/)
which needs to modify the lib code to reproduce it, because the time window
is very small.
The scenario to reproduce it is described as follows, using Tx for thread x:
1. T1 (the user code) is selecting on a channel (call it C). It has just
returned from the native select function, and the NIO lib select method is
checking whether the returned channel is interested in the event; then 2
happens;
2. T2 is closing channel C. It has only set the open variable to false and
has not yet actually closed the channel; then 3 happens;
3. T3 sets the interestOps of the channel to 0. // 0 means the channel is
not interested in anything; normally the channel would be put into the
cancel list.
In this scenario, T1 returns from select with 0, which means no channel is
selected (because channel C, returned from the native invocation, is not
interested in anything, it is not returned to the application). Then T1
invokes select again (usually in a loop; this is how select is designed to
be used). In the normal case, the select method checks whether any channels
should be cancelled and removes them from the set to be selected. Then it
goes on to the native select function.
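
For reference, the usual select loop mentioned above looks roughly like
the sketch below. It is only an illustration, not the testcase: the class
name SelectLoopSketch, the use of a Pipe, the 1 second timeout and the
bound of 10 attempts are all assumptions made to keep it self-contained.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.util.Iterator;

public class SelectLoopSketch {
    /** Waits for one byte on a pipe using a select loop and returns it. */
    static byte readOneByte() {
        try {
            Pipe pipe = Pipe.open();
            try (Selector selector = Selector.open();
                 Pipe.SourceChannel source = pipe.source();
                 Pipe.SinkChannel sink = pipe.sink()) {
                source.configureBlocking(false);
                source.register(selector, SelectionKey.OP_READ);
                sink.write(ByteBuffer.wrap(new byte[] {42}));

                ByteBuffer buf = ByteBuffer.allocate(1);
                // Bounded here only so the sketch cannot loop forever.
                for (int attempt = 0; attempt < 10; attempt++) {
                    int n = selector.select(1000); // blocks for up to 1 second
                    if (n == 0) {
                        continue; // nothing selected, go around again
                    }
                    Iterator<SelectionKey> it = selector.selectedKeys().iterator();
                    while (it.hasNext()) {
                        SelectionKey key = it.next();
                        it.remove(); // selected keys must be removed by the caller
                        if (key.isValid() && key.isReadable() && source.read(buf) > 0) {
                            return buf.get(0);
                        }
                    }
                }
                throw new IOException("no data became readable");
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("read byte: " + readOneByte());
    }
}
```

The "n == 0, continue" branch is the one that matters here: a loop like
this spins if select() keeps returning 0 without blocking.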
The problem is: the select method first checks whether the channel is
closed; if it is closed, the select method doesn't put it into the cancel
list.
In the above scenario, channel C is marked as closed but not actually
closed yet, and its interestOps has been set to 0 (which means cancel). So
the select method doesn't put C into the cancel list (due to the problem),
which means the native select set still contains channel C. So the native
select always returns C and the NIO select always returns 0, until the
channel is finally closed.
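
The interaction above can be modelled in a few lines. This is a toy model
only, not the JDK code and not the fix in the webrev: ToyChannel,
ToySelector, nativeSet and skipClosed are hypothetical names, and the
"native select" is simplified to "every channel still in the set is
reported ready".

```java
import java.util.HashSet;
import java.util.Set;

public class SelectRaceModel {
    /** Toy stand-in for channel C together with its selection key. */
    static final class ToyChannel {
        boolean open = true;   // flipped to false first when close starts (T2)
        int interestOps = 1;   // 0 means "interested in nothing" (T3)
    }

    /** Toy selector; nativeSet models the set handed to native select. */
    static final class ToySelector {
        final Set<ToyChannel> nativeSet = new HashSet<>();
        final boolean skipClosed; // true models the check described above

        ToySelector(boolean skipClosed) {
            this.skipClosed = skipClosed;
        }

        /** Returns the number of channels reported to the application. */
        int select() {
            // Cancellation pass: drop channels with interestOps == 0, unless
            // the (modelled) closed-check makes us skip a closing channel.
            nativeSet.removeIf(c -> c.interestOps == 0 && !(skipClosed && !c.open));
            // Modelled native select: every remaining channel is "ready", but
            // only channels interested in something are returned to the user.
            int selected = 0;
            for (ToyChannel c : nativeSet) {
                if (c.interestOps != 0) {
                    selected++;
                }
            }
            return selected;
        }
    }

    /** Replays steps 2 and 3, then T1's next select; true if C is stuck. */
    static boolean stuckAfterRace(boolean skipClosed) {
        ToySelector selector = new ToySelector(skipClosed);
        ToyChannel c = new ToyChannel();
        selector.nativeSet.add(c);
        c.open = false;     // T2: marked closed, not yet really closed
        c.interestOps = 0;  // T3: interested in nothing
        int selected = selector.select();
        return selected == 0 && selector.nativeSet.contains(c);
    }

    public static void main(String[] args) {
        System.out.println("with closed-check:    stuck = " + stuckAfterRace(true));
        System.out.println("without closed-check: stuck = " + stuckAfterRace(false));
    }
}
```

With the closed-check in place C stays in the modelled native set while
select reports 0; without it C is removed on the next select.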
The testcase: http://cr.openjdk.java.net/~zhouyx/OJDK-714/webrev0.2/
A working fix: http://cr.openjdk.java.net/~zhouyx/OJDK-714/webrev_fix/
Please have a look.
On Wed, Dec 5, 2012 at 6:10 PM, Alan Bateman <Alan.Bateman at oracle.com> wrote:
> On 05/12/2012 02:47, Sean Chou wrote:
> A small problem I'm still checking. So the closeLock is just to make sure
> the channel is closed only once, is that right?
> This is how close is specified:
> "If this channel is already closed then invoking this method has no
> effect.
> This method may be invoked at any time. If some other thread has already
> invoked it, however, then another invocation will block until the first
> invocation is complete, after which it will return without effect."
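
A minimal sketch of a close() that behaves as specified in the quote,
assuming a lock object named closeLock as in the question. It is not the
JDK source; IdempotentCloseSketch, releaseResources and the releases
counter are hypothetical names used only for illustration.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class IdempotentCloseSketch {
    private final Object closeLock = new Object();
    private volatile boolean open = true;
    // Counts how many times the underlying resource was released.
    final AtomicInteger releases = new AtomicInteger();

    public boolean isOpen() {
        return open;
    }

    public void close() {
        // A second caller blocks here until the first close has finished.
        synchronized (closeLock) {
            if (!open) {
                return; // already closed: no effect
            }
            open = false;       // other threads now see "closed" ...
            releaseResources(); // ... before the real close has completed
        }
    }

    private void releaseResources() {
        releases.incrementAndGet();
    }

    public static void main(String[] args) {
        IdempotentCloseSketch ch = new IdempotentCloseSketch();
        ch.close();
        ch.close();
        System.out.println("open = " + ch.isOpen() + ", releases = " + ch.releases.get());
    }
}
```

The gap between "open = false" and the end of releaseResources() is the
window step 2 of the scenario relies on.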