RFR - JDK-8202442 - String::unescape (Code Review)

Jim Laskey james.laskey at oracle.com
Thu Sep 20 12:52:08 UTC 2018

Modified as per Stuart's suggestion.

> On Sep 20, 2018, at 7:45 AM, Chris Hegarty <chris.hegarty at oracle.com> wrote:
>> On 19 Sep 2018, at 23:21, Stuart Marks <stuart.marks at oracle.com> wrote:
>> ...
>>      * Each unicode escape in the form \unnnn is translated to the
>>      * unicode character whose code point is {@code 0xnnnn}. Care should be
>>      * taken when using UTF-16 surrogate pairs to ensure that the high
>>      * surrogate (U+D800..U+DBFF) is immediately followed by a low surrogate
>>      * (U+DC00..U+DFFF) otherwise a
>>      * {@link java.nio.charset.CharacterCodingException} may occur during UTF-8
>>      * decoding.
>>
>> I know you're going to update this based on Naoto's comments, but I'd
>> suggest rethinking this section. The \unnnn construct is called a
>> "Unicode escape" per JLS 3.3, but how it's handled has little to do with
>> Unicode. The nnnn digits are simply translated into a 16-bit 'char'
>> value. Any such value will work, even if it's an invalid UTF-16 code
>> unit (such as 0xFFF0) or an unpaired surrogate.
> I had a similar comment/question. CCE is a checked exception, and
> since the method does not declare that it throws CCE, I took a look
> at the implementation and came to the same conclusion as Stuart.
> Additionally, why should non-character code points, like \uFFFE, be
> translated? If it’s a non-character code point or a malformed surrogate
> pair, would it not be better to just leave it as-is?
> -Chris.
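
To make the point above concrete, here is a minimal sketch, not the code under review. The class UnicodeEscapeSketch and both of its methods are hypothetical names invented for illustration. It assumes the translation is a plain hex-digits-to-char conversion, as Stuart describes, and it ignores escaped backslashes and every other escape form. It suggests that an unpaired surrogate survives translation untouched, and that a CharacterCodingException would only surface later, if the result is passed through a strict charset encoder.

```java
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.StandardCharsets;

class UnicodeEscapeSketch {

    // Hypothetical helper: replaces each backslash, 'u', four-hex-digit
    // sequence with the single char that has that 16-bit value.
    static String translateUnicodeEscapes(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        int i = 0;
        while (i < s.length()) {
            int value = escapeValueAt(s, i);
            if (value >= 0) {
                sb.append((char) value); // no surrogate or validity check
                i += 6;
            } else {
                sb.append(s.charAt(i));
                i++;
            }
        }
        return sb.toString();
    }

    // Returns the value of the escape starting at index i, or -1 if no
    // well-formed escape starts there.
    static int escapeValueAt(String s, int i) {
        if (i + 6 > s.length() || s.charAt(i) != '\\' || s.charAt(i + 1) != 'u') {
            return -1;
        }
        int value = 0;
        for (int k = i + 2; k < i + 6; k++) {
            int digit = Character.digit(s.charAt(k), 16);
            if (digit < 0) {
                return -1;
            }
            value = (value << 4) | digit;
        }
        return value;
    }

    // A new CharsetEncoder reports malformed input by default, so encode()
    // throws CharacterCodingException for an unpaired surrogate.
    static boolean encodesAsUtf8(String s) {
        try {
            StandardCharsets.UTF_8.newEncoder().encode(CharBuffer.wrap(s));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String lone = translateUnicodeEscapes("a\\uD800b");     // unpaired high surrogate
        String pair = translateUnicodeEscapes("\\uD83D\\uDE00"); // well-formed pair
        String nonChar = translateUnicodeEscapes("\\uFFFE");     // non-character code point

        System.out.println(lone.length() + " " + encodesAsUtf8(lone));
        System.out.println(pair.codePointCount(0, pair.length()) + " " + encodesAsUtf8(pair));
        System.out.println(nonChar.length() + " " + encodesAsUtf8(nonChar));
    }
}
```

In this sketch the translation itself never throws. Only the strict encoder rejects the unpaired surrogate, while the non-character U+FFFE encodes without complaint. Whether the real method should instead leave such escapes untranslated, as Chris suggests, is a design question this sketch does not settle.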
