RFR(S): 8155608: String intrinsic range checks are not strict enough
tobias.hartmann at oracle.com
Mon May 9 11:49:10 UTC 2016
thanks for the feedback!
On 30.04.2016 02:47, John Rose wrote:
>> On Apr 29, 2016, at 1:00 PM, Christian Thalinger <christian.thalinger at oracle.com> wrote:
>>> On Apr 28, 2016, at 9:23 PM, Tobias Hartmann <tobias.hartmann at oracle.com> wrote:
>>> Hi Chris,
>>> On 28.04.2016 22:11, Christian Thalinger wrote:
>>>>> On Apr 28, 2016, at 12:45 AM, Tobias Hartmann <tobias.hartmann at oracle.com <mailto:tobias.hartmann at oracle.com>> wrote:
>>>>> please review the following patch:
>>>> + checkBoundsOffCount(dstOff << 1, len << 1, dst.length);
>>>> It’s funny that we still do << 1 instead of * 2 when every compiler on this planet can optimize that. Yeah, yeah, I know, it’s because of the interpreter but does it really matter?
>>> I used it more for consistency because we use "<< 1" in all the other places in StringLatin1, StringUTF16 and String as well. I think this originated from the "value.length >> String.coder()" use case to get the length depending on the String encoding. Besides that, I'm not sure if interpreter speed really matters here but the String methods are executed a lot (especially during startup).
>>>> Actually, I would prefer:
>>>> + checkBoundsOffCount(dstOff * Character.BYTES, len * Character.BYTES, dst.length);
>>> I agree that this is more readable but for consistency I would like to go with the "<< 1" approach.
>> Again, a loss for maintainability versus consistency (a.k.a. “we’ve always done it this way”).
> That '1' is an anti-pattern I call "naked constant". Which "1" is it, after all?
> The maintainer of code needs to know which condition (of thousands of possibilities) dictates a "1" here.
> In HotSpot we discourage such nakedness, in favor of named constants:
> The Java code should follow this practice also.
Right, I agree. I filed JDK-8156530  to fix this.
> Also, independently, I'm having a hard time figuring out how to prove that, between the Java code and the HotSpot intrinsic (and interpreter/C1/C2), the range checking logic is consistently applied.
> My best advice for this is, always, put the range checking logic in one place, in Java code. This makes me profoundly suspicious of *any* public method that is also marked @HSIC, if it takes any array arguments. If the public method is an intrinsic, it means we are trusting C++ IR-assembly code to securely check Java array bounds. That is unnatural, and (as history proves repeatedly) subject to errors.
Yes, we had that initially but I had to move some of the checks from the Java code into the intrinsic because of JDK-8142303 (see ).
I filed JDK-8156534  to check if we can move the checks back into the Java code without problems similar to JDK-8142303. I will also change the implementation of some of the intrinsics with JDK-8139132 and JDK-8146547 and do some refactoring.
> Instead, we should have a Java routine which checks bounds, and a *private* intrinsic which is provably called *only after* the Java checker is called.
The problem is that some of the intrinsified String methods are heavily used and C2 is not always able to remove the range checks introduced by the public wrapper method. For example, StringUTF16::getChar() is used in places where we implicitly know that the access is in bounds (for example, because the array length is always a multiple of two) but C2 cannot prove this. Therefore, if we want to avoid a regression, we have to call the unchecked version of the intrinsic in some cases.
More information about the hotspot-dev