RFR 8014870: Faster KDC availability check in Kerberos
weijun.wang at oracle.com
Tue Jun 24 08:17:39 UTC 2014
Please review the code change at
In Kerberos, when requesting a ticket, we try multiple KDC servers, multiple times. Before this fix, we connect to a server and wait for 30 seconds (the default kdc_timeout). If there is no answer, we go to the next KDC, wait another 30 seconds, and so on. If none of the KDCs replies, we do more rounds of connections (max_retries, default 3) and finally fail. Altogether, with 3 KDCs we will wait at most 3*30*3=270 seconds.
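To make the 270-second figure concrete, here is a minimal sketch of the old sequential style. It assumes UDP transport and uses plain java.net.DatagramSocket. The class and method names (SequentialKdcSender, sendSequential) are hypothetical, and this is not the actual KdcComm code. With three dead KDCs, maxRetries=3 and kdcTimeoutMs=30000, the loop blocks in receive() nine times for 30 seconds each.

```java
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetSocketAddress;
import java.net.SocketTimeoutException;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of the pre-fix behaviour, not the real KdcComm.
public class SequentialKdcSender {

    /** Tries each KDC in turn, blocking up to kdcTimeoutMs on each; returns null if all fail. */
    public static byte[] sendSequential(List<InetSocketAddress> kdcs, byte[] request,
            int maxRetries, int kdcTimeoutMs) throws IOException {
        try (DatagramSocket socket = new DatagramSocket()) {
            socket.setSoTimeout(kdcTimeoutMs); // every receive() waits at most this long
            byte[] buf = new byte[65536];
            for (int round = 0; round < maxRetries; round++) {
                for (InetSocketAddress kdc : kdcs) {
                    socket.send(new DatagramPacket(request, request.length, kdc));
                    DatagramPacket reply = new DatagramPacket(buf, buf.length);
                    try {
                        socket.receive(reply); // blocks: nothing else happens meanwhile
                        return Arrays.copyOf(reply.getData(), reply.getLength());
                    } catch (SocketTimeoutException e) {
                        // No answer within the timeout: only now move on to the next KDC.
                    }
                }
            }
            return null; // every KDC timed out in every round
        }
    }
}
```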
After this fix, connections are non-blocking and a new one is started every second, so the waits for different KDCs overlap. The kdc_timeout default is also reduced to 10 seconds (the same as other vendors). In the worst case, we will wait 3*3+10=19 seconds.
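The overlapping style can be sketched with one non-blocking DatagramChannel and a Selector. This is an illustration under stated assumptions (UDP only, a single shared channel for all attempts, a fixed interval passed in as intervalMs), not the code in the change under review; StaggeredKdcSender and sendStaggered are hypothetical names. A new attempt goes out every interval whether or not earlier ones have answered, and the first reply from any listed KDC wins.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.SocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.DatagramChannel;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of "start one attempt per interval, wait for all at once".
public class StaggeredKdcSender {

    /** Returns the first reply from any KDC, or null once the last attempt has timed out. */
    public static byte[] sendStaggered(List<InetSocketAddress> kdcs, byte[] request,
            int maxRetries, long intervalMs, long kdcTimeoutMs) throws IOException {
        // Flatten the rounds: KDC1, KDC2, KDC3, KDC1, KDC2, KDC3, ...
        List<InetSocketAddress> attempts = new ArrayList<>();
        for (int r = 0; r < maxRetries; r++) {
            attempts.addAll(kdcs);
        }
        try (Selector selector = Selector.open();
             DatagramChannel ch = DatagramChannel.open()) {
            ch.configureBlocking(false);
            ch.register(selector, SelectionKey.OP_READ);
            ByteBuffer in = ByteBuffer.allocate(65536);
            long start = System.nanoTime();
            int next = 0; // index of the next attempt to send
            while (true) {
                long now = (System.nanoTime() - start) / 1_000_000;
                // Send every attempt whose scheduled time (next * intervalMs) has come.
                // A non-blocking send may drop the datagram if buffers are full; ignored here.
                while (next < attempts.size() && now >= next * intervalMs) {
                    ch.send(ByteBuffer.wrap(request), attempts.get(next));
                    next++;
                }
                // Wake up for the next scheduled send, or when the last attempt times out.
                long wakeAt = next < attempts.size()
                        ? next * intervalMs
                        : (attempts.size() - 1) * intervalMs + kdcTimeoutMs;
                if (next == attempts.size() && now >= wakeAt) {
                    return null;
                }
                selector.select(Math.max(1, wakeAt - now)); // 0 would mean "block forever"
                selector.selectedKeys().clear();
                in.clear();
                SocketAddress from = ch.receive(in); // null if nothing is waiting
                if (from != null && kdcs.contains(from)) {
                    in.flip();
                    byte[] reply = new byte[in.remaining()];
                    in.get(reply);
                    return reply;
                }
            }
        }
    }
}
```

In this sketch the total wait is bounded by roughly (attempts - 1) * intervalMs + kdcTimeoutMs, which is the same kind of bound as the 3*3+10 figure above.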
You might say that reducing kdc_timeout to 10 seconds is what matters most here, but the wait-together style actually helps much more. Suppose only the 3rd KDC is alive: the old code has to wait 60 seconds before it even tries to connect to it, while the new code only needs 2 seconds, and this has nothing to do with kdc_timeout.
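The arithmetic in the paragraphs above fits in a tiny timing model. It assumes a live KDC replies instantly, attempts start exactly one interval apart, and aliveIndex is 0-based; KdcWaitModel and its method names are made up for illustration. The upper bound for the new style is written the way the post computes it (attempts times interval, plus one timeout).

```java
// Hypothetical timing model of the two strategies; all times in seconds.
public class KdcWaitModel {

    /** Old style: every KDC in every round gets the full timeout. */
    public static long oldWorstCase(int kdcs, int retries, long timeout) {
        return (long) kdcs * retries * timeout;
    }

    /** Old style: time before the only live KDC is even contacted (first round). */
    public static long oldTimeToReach(int aliveIndex, long timeout) {
        return aliveIndex * timeout;
    }

    /** New style: one attempt per interval, then one timeout for the last attempt. */
    public static long newUpperBound(int kdcs, int retries, long interval, long timeout) {
        return (long) kdcs * retries * interval + timeout;
    }

    /** New style: reaching the live KDC depends only on the interval, not on the timeout. */
    public static long newTimeToReach(int aliveIndex, long interval) {
        return aliveIndex * interval;
    }
}
```

Under these assumptions oldTimeToReach(2, 30) is 60 and newTimeToReach(2, 1) is 2. The latter takes no timeout argument at all, which is the point being made: the gain comes from overlapping the waits, not from the shorter kdc_timeout.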
Because of this, I've removed the old krb5.kdc.bad.policy security property. It is no longer worth remembering which KDCs are alive and which are not.
All real changes are inside the KdcComm.java file. The others are tests and the removal of code that is no longer needed.
I've included net-dev@ because these are all NIO calls, which I am not familiar with.