<html>
  <head>
    <meta content="text/html; charset=ISO-8859-1"
      http-equiv="Content-Type">
  </head>
  <body text="#000000" bgcolor="#FFFFFF">
    <div class="moz-cite-prefix">Hi Martin,<br>
      <br>
      I have updated the test based on your comments and split the old
      logic into different functions to test legal or illegal surrogate
      pairs. Can you please help to review it?<br>
      <a
        href="http://cr.openjdk.java.net/%7Eewang/JDK-8022879/webrev.01/">http://cr.openjdk.java.net/~ewang/JDK-8022879/webrev.01/</a><br>
      <br>
      For your last comment "reversed surrogate", I'm not sure whether
      my understanding is correct. Does that mean reverse legal
      surrogate e.g. \uD800\uDC00 to \uDC00\uD800 which is kind of
      malformed surrogate.<br>
      I have covered this case in the test.<br>
      <br>
      Thanks,<br>
      Eric<br>
      On 2013/12/14 8:15, Martin Buchholz wrote:<br>
    </div>
    <blockquote
cite="mid:CA+kOe095rJ4-jsHkXfn=NoxVMy+8d-WNmqpwawE0qGRU9kZe3A@mail.gmail.com"
      type="cite">
      <div dir="ltr">I do like this approach. &nbsp;I think we can have some
        cleanup to make it a little clearer what is being tested.
        <div><br>
        </div>
        <div>Below, it makes no sense to check canEncode when you are
          creating a decoder.</div>
        <div>
          <pre style="color:rgb(0,0,0)"><span class="" style="color:blue;font-weight:bold">  60             try {</span>
<span class="" style="color:blue;font-weight:bold">  61                 de = charset.newDecoder();</span>
<span class="" style="color:blue;font-weight:bold">  62             } catch (UnsupportedOperationException ex) {</span>
<span class="" style="color:blue;font-weight:bold">  63                 if (canEncode) {</span>
<span class="" style="color:blue;font-weight:bold">  64                     throw ex;</span>
<span class="" style="color:blue;font-weight:bold">  65                 }</span>
<span class="" style="color:blue;font-weight:bold">  66             }</span>
</pre>
        </div>
        <div><span class="" style="color:blue;font-weight:bold"><br>
          </span></div>
        <div><font color="#0000ff"><b>Once you do this:</b></font></div>
        <div><span class="">
            <pre style="color:rgb(0,0,0);font-weight:normal"><span class="" style="color:blue">  68             if (!canEncode) {</span>
<span class="" style="color:blue">  69                 continue;</span>
<span class="" style="color:blue">  70             }</span>
</pre>
            <div style="color:blue;font-weight:bold"><span class=""
                style="color:blue">the loop should no longer test
                canEncode, but does, obscuring the logic.</span></div>
            <div style="color:blue;font-weight:bold"><span class=""
                style="color:blue"><br>
              </span></div>
            <div style="color:blue;font-weight:bold"><span class=""
                style="color:blue">Use standard whitespace:</span></div>
            <div><span class="">
                <pre style="color:rgb(0,0,0);font-weight:normal"><span class="" style="color:blue"> 109                 }finally {</span>
</pre>
                <div style="color:blue;font-weight:bold"><span class=""
                    style="color:blue">When encoding legal surrogate
                    pair, should see&nbsp;</span><span
                    style="font-weight:normal">UnmappableCharacterException,
                    not&nbsp;</span><span style="font-weight:normal">MalformedInputException.</span></div>
                <div style="color:blue;font-weight:bold"><span
                    style="font-weight:normal">Reverse is true with
                    illegal surrogate pair.</span></div>
                <div style="color:blue;font-weight:bold"><span
                    style="font-weight:normal"><br>
                  </span></div>
                <div style="color:blue;font-weight:bold"><span
                    style="font-weight:normal">should probably tests
                    isolated, in addition to reversed, surrogates.</span></div>
                <div style="color:blue;font-weight:bold"><span
                    style="font-weight:normal"><br>
                  </span></div>
                <div><span style="color:blue;font-weight:normal">Compare
                    my old test&nbsp;</span><font color="#0000ff">FindCanEncodeBugs.</font></div>
              </span></div>
          </span></div>
        <div><span class="" style="color:blue;font-weight:bold"><br>
          </span></div>
      </div>
      <div class="gmail_extra"><br>
        <br>
        <div class="gmail_quote">On Thu, Dec 12, 2013 at 4:51 AM, Eric
          Wang <span dir="ltr">&lt;<a moz-do-not-send="true"
              href="mailto:yiming.wang@oracle.com" target="_blank">yiming.wang@oracle.com</a>&gt;</span>
          wrote:<br>
          <blockquote class="gmail_quote" style="margin:0 0 0
            .8ex;border-left:1px #ccc solid;padding-left:1ex">
            <div text="#000000" bgcolor="#FFFFFF">
              <div>Hi Martin &amp; All,<br>
                <br>
                Please review the changes for MalformedSurrogates.<br>
                <a moz-do-not-send="true"
                  href="http://cr.openjdk.java.net/%7Eewang/JDK-8022879/webrev.00/"
                  target="_blank">http://cr.openjdk.java.net/~ewang/JDK-8022879/webrev.00/</a><br>
                <br>
                The logic of changes are<br>
                1. Iterate all charsets of system to check whether the
                charset supports encode.<br>
                2. If yes, check whether it supports surrogate.<br>
                3. If yes, check whether it rejects malformed surrogate
                and can en(de)code normal surrogate.<br>
                <br>
                Thanks,<br>
                Eric
                <div>
                  <div class="h5"><br>
                    On 2013/11/8 0:23, Martin Buchholz wrote:<br>
                  </div>
                </div>
              </div>
              <div>
                <div class="h5">
                  <blockquote type="cite">
                    <div dir="ltr">I still like my old idea of iterating
                      over all charsets and checking their
                      reasonableness properties.
                      <div><br>
                      </div>
                      <div>Probably all charsets bundled with the jdk
                        should reject unpaired surrogates when encoding.
                        &nbsp;Check it! &nbsp;Failure to do so might be considered
                        a security bug.</div>
                      <div><br>
                      </div>
                      <div>I would use&nbsp;CodingErrorAction.REPORT &nbsp;instead
                        of REPLACE and examine that the resulting
                        exception occurs and has reasonable detail.</div>
                      <div><br>
                      </div>
                      <div>For properly paired surrogates, check that
                        the charset can encode the resulting codepoint
                        using canEncode, and if so, check that encoding
                        succeeds.</div>
                    </div>
                    <div class="gmail_extra"><br>
                      <br>
                      <div class="gmail_quote">On Thu, Nov 7, 2013 at
                        7:22 AM, Eric Wang <span dir="ltr">&lt;<a
                            moz-do-not-send="true"
                            href="mailto:yiming.wang@oracle.com"
                            target="_blank">yiming.wang@oracle.com</a>&gt;</span>
                        wrote:<br>
                        <blockquote class="gmail_quote" style="margin:0
                          0 0 .8ex;border-left:1px #ccc
                          solid;padding-left:1ex">
                          <div text="#000000" bgcolor="#FFFFFF">
                            <div lang="x-western">
                              <div>
                                <p class="MsoNormal">&nbsp;H<span
                                    style="font-size:12.0pt">i Everyone</span></p>
                                <p class="MsoNormal"><span
                                    style="font-size:12.0pt">I am
                                    working on bug </span><span
                                    style="font-size:12.0pt"><a
                                      moz-do-not-send="true"
                                      href="https://bugs.openjdk.java.net/browse/JDK-8022879"
                                      target="_blank">https://bugs.openjdk.java.net/browse/JDK-8022879</a>.&nbsp;


                                    <br>
                                    The test <a moz-do-not-send="true"
href="http://hg.openjdk.java.net/jdk8/tl/jdk/file/44fa6bf42846/test/sun/nio/cs/MalformedSurrogates.java"
                                      target="_blank">sun/nio/cs/MalformedSurrogates.java</a>
                                    doesn't run if the system default
                                    encoding is UTF-8. But
                                    unfortunately, UTF-8 is the default
                                    charset of most test machines, it
                                    means the test get few chances to be
                                    executed. <br>
                                    Another defect is the test would
                                    failed if the default charset is
                                    UTF-16 or UTF-32 as the test doesn't
                                    take the 2 charsets into
                                    consideration.<br>
                                  </span></p>
                                <p class="MsoNormal"><span
                                    style="font-size:12.0pt">The idea of
                                    fix&nbsp; is no matter what system
                                    charset it is, the test should
                                    always be executed. Here thanks
                                    Martin's suggestion that instead of
                                    checking byte size, we can use
                                    CharsetEncoder.canEncode() and </span><span
                                    style="font-size:12.0pt"><span
                                      style="font-size:12.0pt">CharsetEncoder.onMalformedInput(CodingErrorAction.REPLACE)


                                    </span>to check and replace
                                    malformed chars.<br>
                                  </span></p>
                                <p class="MsoNormal"><span
                                    style="font-size:12.0pt">So the test
                                    can be re-designed as below:<br>
                                    <br>
                                    1. To use CharsetEncoder.canEncode()
                                    to check whether the string includes
                                    malformed characters. <br>
                                    2. If a string includes malformed
                                    characters e.g.
                                    "abc\uD800\uDB00efgh", then set
                                    CharsetEncoder.onMalformedInput(CodingErrorAction.REPLACE)
                                    to replace the malformed characters
                                    to the replacement "?" when calling
                                    CharsetEncoder.encode() method. <br>
                                    3. Verified by decoding the encoded
                                    ByteBuffer to CharBuffer, check
                                    whether it includes replacement "?"
                                    and compare it with old string, if
                                    not equal, then test passed.<br>
                                    4. If a sting doesn't include
                                    malformed characters e.g.
                                    "abc\uD800\uDC00efgh", the
                                    CharsetEncoder.encode() converts it
                                    to ByteBuffer which doesn't include
                                    replacement "?" <br>
                                    5. Verified by decoding the encoded
                                    ByteBuffer to CharBuffer, confirm
                                    that it doesn't include replacment
                                    "?" and compare it with old string,
                                    if equal, then test passed.<br>
                                  </span></p>
                                <p class="MsoNormal"><span
                                    style="font-size:12.0pt">Please let
                                    me know if you have any comments or
                                    suggestions.</span></p>
                                <p class="MsoNormal"><span
                                    style="font-size:12.0pt">Thanks,<br>
                                    Eric<br>
                                  </span></p>
                              </div>
                            </div>
                          </div>
                        </blockquote>
                      </div>
                      <br>
                    </div>
                  </blockquote>
                  <br>
                </div>
              </div>
            </div>
          </blockquote>
        </div>
        <br>
      </div>
    </blockquote>
    <br>
  </body>
</html>