<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html;
      charset=windows-1252">
  </head>
  <body>
    <p>Re SubChar.java</p>
    <p>The correct solution is to dynamically generate any files with
      non-standard whitespace, even if it is as simple as <br>
      <tt>Files.writeString(tmpPath, Files.readString(srcPath).trim())</tt><br>
    </p>
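    <p>A slightly fuller sketch of that idea, assuming Java 11 or later.
      The class and method names are hypothetical and not part of the
      langtools tests. It uses <tt>stripTrailing()</tt> rather than
      <tt>trim()</tt> so that only the end of the file is touched:</p>

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper for illustration; not part of the langtools test suite.
class GeneratedSource {
    // Writes `text` to a fresh temp file with trailing whitespace removed,
    // so the generated file cannot end in a newline.
    static Path writeWithoutTrailingNewline(String text) {
        try {
            Path tmp = Files.createTempFile("SubChar", ".java");
            return Files.writeString(tmp, text.stripTrailing());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads a file back as a string (checked exception wrapped for brevity).
    static String read(Path p) {
        try {
            return Files.readString(p);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The doubled backslash keeps the escape as six literal characters.
        Path p = writeWithoutTrailingNewline("class SubChar {}\n/* \\u001A */\n");
        System.out.println(read(p).endsWith("*/"));
    }
}
```

    <p>A test could then compile the generated file instead of relying on
      a checked-in source whose last byte an editor might change.</p>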
    <p>-- Jon<br>
    </p>
    <div class="moz-cite-prefix">On 8/14/20 11:09 AM, Jim Laskey wrote:<br>
    </div>
    <blockquote type="cite"
      cite="mid:2570DC5F-0534-41C1-9D07-A9C73178D560@oracle.com">
      Thank you.
      <div class=""><br class="">
      </div>
      <div class="">The commentary for SubChar.java reads.</div>
      <div class=""><br class="">
      </div>
      <div class="">
        <div class="">
          <div class="">"Note: this source file has been crafted very
            carefully to end with the</div>
          <div class="">unicode escape sequence for the control-Z
            character without a</div>
          <div class="">following newline.  The scanner is specified to
            allow control-Z there.</div>
          <div class="">If you edit this source file, please make sure
            that your editor does</div>
          <div class="">not insert a newline after that trailing line."</div>
        </div>
      </div>
      <div class=""><br class="">
      </div>
      <div class="">The person that checked in the test did exactly that
        - added a newline after that trailing line (this was done when
        moving to Mercurial in 2007.)</div>
      <div class=""><br class="">
      </div>
      <div class="">This patch is to remove that newline. Note that
        jcheck doesn't check tests for missing last newline.</div>
      <div class=""><br class="">
      </div>
      <div class="">Cheers,</div>
      <div class=""><br class="">
      </div>
      <div class="">-- Jim</div>
      <div class=""><br class="">
      </div>
      <div class="">
        <div><br class="">
          <blockquote type="cite" class="">
            <div class="">On Aug 14, 2020, at 2:42 PM, Maurizio
              Cimadamore &lt;<a
                href="mailto:maurizio.cimadamore@oracle.com" class=""
                moz-do-not-send="true">maurizio.cimadamore@oracle.com</a>&gt;
              wrote:</div>
            <br class="Apple-interchange-newline">
            <div class="">
              <div class="">
                <p class="">Hi Jim,<br class="">
                  this is a very good cleanup. I like how the new code
                  makes the tokenizers a lot less verbose than before,
                  and I like how you have refactored the various
                  UnicodeReader vs. DocReader (now
                  PositionTrackingReader), as the status quo was messy,
                  and we had a lot of flexibility on paper that wasn't
                  really used in practice and ended up making the code
                  more complex than it needed to be.</p>
                <p class="">Big thumbs up from me.</p>
                <p class="">Minor comment: what's up with SubChar.java?
                  Webrev is empty, but patch reports following diff:</p>
                <pre class="">iff --git a/test/langtools/tools/javac/unicode/SubChar.java b/test/langtools/tools/javac/unicode/SubChar.java
--- a/test/langtools/tools/javac/unicode/SubChar.java
+++ b/test/langtools/tools/javac/unicode/SubChar.java
@@ -45,4 +45,4 @@
         return;
     }
 }
-/* \u001A */
+/* \u001A */
\ No newline at end of file</pre>
                <p class=""><br class="">
                </p>
                <p class="">Is that deliberate?<br class="">
                </p>
                <p class=""><br class="">
                </p>
                <p class="">Maurizio<br class="">
                </p>
                <p class=""><br class="">
                </p>
                <div class="moz-cite-prefix">On 13/08/2020 18:32, Jim
                  Laskey wrote:<br class="">
                </div>
                <blockquote type="cite"
                  cite="mid:721293C9-C990-4AB2-808A-D3C9A973DC88@oracle.com"
                  class="">
                  <span class="">
                    <div style="font-size: 12px;" class=""><span
                        class="" style="font-size: 14px;"><span class="">webrev: <a
href="http://cr.openjdk.java.net/~jlaskey/8224225/webrev-04" class=""
                            moz-do-not-send="true">http://cr.openjdk.java.net/~jlaskey/8224225/webrev-04</a></span></span></div>
                    <div style="font-size: 12px;" class=""><span
                        class="" style="font-size: 14px;">jbs: <a
                          href="https://bugs.openjdk.java.net/browse/JDK-8224225"
                          class="" moz-do-not-send="true">https://bugs.openjdk.java.net/browse/JDK-8224225</a></span></div>
                    <div style="font-size: 12px;" class=""><br class="">
                    </div>
                    <div style="font-size: 12px;" class=""><span
                        style="font-size: 14px;" class="">I recommend
                        looking at the "new" versions of 1.
                        UnicodeReader, then 2. JavaTokenizer and then 3.
                        JavadocTokenizer before venturing into the
                        diffs.</span></div>
                    <div style="font-size: 12px;" class=""><span
                        style="font-size: 14px;" class=""><br class="">
                      </span></div>
                    <div class="">
                      <div style="font-size: 12px;" class=""><span
                          class="" style="font-size: 14px;"><br class="">
                        </span></div>
                      <div class=""><span class="" style="font-size:
                          14px;">Rationale, under the heading of
                          technical debt: T</span><span
                          style="font-size: 14px;" class="">here is a
                          lot "going on" in the </span><span
                          style="font-size: 14px;" class="">JavaTokenizer/</span><span
                          style="font-size: 14px;" class="">JavadocTokenizer</span><span
                          style="font-size: 14px;" class=""> that needed
                          to be cleaned up.</span></div>
                    </div>
                  </span>
                  <div class="">
                    <div class=""><span class="" style="font-size:
                        14px;"><br class="">
                      </span></div>
                    <div class=""><span class="" style="font-size:
                        14px;"><span class="">- The </span>UnicodeReader<span
                          class=""> shouldn't really be accumulating
                          characters for literals.</span></span></div>
                    <div class=""><span class="" style="font-size:
                        14px;">- A tokenizer shouldn't need to be aware
                        of the unicode translations.</span></div>
                    <div class=""><span class="" style="font-size:
                        14px;">- There is no need for peek ahead.</span></div>
                    <div class=""><span class="" style="font-size:
                        14px;">- There were a lot of repetitive tasks
                        that should be done in methods instead of
                        complex expressions.</span></div>
                    <div class=""><span class="" style="font-size:
                        14px;">- Names of existing methods were often
                        confusing.</span></div>
                    <div class=""><br class="">
                    </div>
                    <div class=""><span class="" style="font-size:
To avoid disruption">
                        14px;">To avoid disruption, I avoided changing
                        logic, except in the UnicodeReader. There are
                        some relics in the JavaTokenizer/</span><span
                        style="font-size: 14px;" class="">JavadocTokenizer</span><span
                        style="font-size: 14px;" class=""> that could be
                        cleaned up but require deeper analysis.</span></div>
                    <div class=""><span class="" style="font-size:
                        14px;"><br class="">
                      </span></div>
                    <div class=""><span class="" style="font-size:
Some details;">
                        14px;">Some details:</span></div>
                    <div class=""><span class="" style="font-size:
                        14px;">
                        <div class="" style="font-size: 12px;"><span
                            class="" style="font-size: 14px;">  </span></div>
                        <div class="" style="font-size: 12px;"><span
                            style="font-size: 14px;" class="">-
                            UnicodeReader was reworked to provide
                            tokenizers a running stream of unicode
                            characters/codepoints. Steps:</span></div>
                      </span></div>
                    <div class=""><span class="" style="font-size:
                        14px;"><span class="Apple-tab-span" style="white-space: pre;">        </span>-
                        characters are read from the buffer.</span></div>
                    <div class=""><span class="" style="font-size:
                        14px;"><span class="Apple-tab-span" style="white-space: pre;">        </span>-
                        if the character is a '\' then check to see if
                        the character is the beginning of a unicode
                        escape sequence. If so, then translate.</span></div>
                    <div class=""><span class="" style="font-size:
                        14px;"><span class="Apple-tab-span" style="white-space: pre;">        </span>-
                        if the character is a high surrogate then check
                        to see if next character is the low surrogate.
                        If so then combine.</span></div>
                    <div class=""><span class="" style="font-size:
                        14px;"><span class="Apple-tab-span" style="white-space: pre;">                </span>- A
                        tokenizer can test a codepoint with the
                        isSurrogate predicate (when/if needed.)</span></div>
                    <div class=""><span class="" style="font-size:
                        14px;">  The result of putting this logic on
                        UnicodeReader's shoulders means that a tokenizer
                        does not need have any </span><span
                        style="font-size: 14px;" class="">unicode</span> "<span
                        style="font-size: 14px;" class="">logical."</span></div>
                    <div class=""><br class="">
                    </div>
                    <div class=""><span class="" style="font-size:
                        14px;"><span class="">- The old </span>UnicodeReader
                        modified the source buffer to insert an EOI
                        character at the end to mark the last
                        character. </span></div>
                    <div class=""><span class="" style="font-size:
                        14px;"><span class="Apple-tab-span" style="white-space: pre;">        </span>- This
                        meant the buffer had to be large enough (grown)
                        to accommodate.</span></div>
                    <div class=""><span class="" style="font-size:
                        14px;"><span class="Apple-tab-span" style="white-space: pre;">        </span>- There
                        really was no need since we can simply return an
                        EOI when trying to read past the end of buffer.</span></div>
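                    <div class=""><span class="" style="font-size:
                        14px;">A minimal sketch of that idea, assuming EOI
                        is the control-Z character (0x1A); the class name
                        is hypothetical:</span></div>

```java
// Hypothetical sketch; not the actual javac UnicodeReader.
class BoundedReader {
    // End-of-input marker, assumed here to be control-Z.
    static final char EOI = 0x1A;

    private final char[] buf;
    private int pos = 0;

    // The buffer is used as-is: it is never grown or modified.
    BoundedReader(char[] buf) {
        this.buf = buf;
    }

    // Returns the next character, or EOI once the end has been reached.
    char next() {
        return pos < buf.length ? buf[pos++] : EOI;
    }

    public static void main(String[] args) {
        BoundedReader r = new BoundedReader("ab".toCharArray());
        r.next();
        r.next();
        System.out.println(r.next() == EOI);
    }
}
```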
                    <div class=""><span class="" style="font-size:
                        14px;"><br class="">
                      </span></div>
                    <div class=""><span class="" style="font-size:
                        14px;">- The only buffer mutability left behind
                        is when reading digits.</span></div>
                    <div class=""><span class="" style="font-size:
                        14px;"><span class="Apple-tab-span" style="white-space: pre;">        </span>- Unicode
                        digits are still replaced with ASCII digits.</span></div>
                    <div class=""><span class="" style="font-size:
                        14px;"><span class="Apple-tab-span" style="white-space:pre">          </span>- This
                        seems unnecessary, but I didn't want to risk
                        messing around with the existing logic. Maybe
                        someone can enlighten me.</span></div>
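                    <div class=""><span class="" style="font-size:
                        14px;">For reference, the replacement amounts to
                        something like the following sketch. It builds a
                        new string rather than mutating a buffer, and the
                        class and method names are made up:</span></div>

```java
// Hypothetical sketch of mapping Unicode decimal digits to ASCII digits.
class DigitNormalizer {
    static String toAsciiDigits(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char ch = s.charAt(i);
            // Character.digit gives the numeric value of any decimal digit,
            // ASCII or not; rebuild the matching ASCII character from it.
            sb.append(Character.isDigit(ch)
                    ? (char) ('0' + Character.digit(ch, 10))
                    : ch);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Arabic-Indic digits one and two, followed by ASCII text.
        System.out.println(toAsciiDigits("\u0661\u0662x3"));
    }
}
```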
                    <div class=""><span class="" style="font-size:
                        14px;"><br class="">
                      </span></div>
                    <div class=""><span class="" style="font-size:
                        14px;">- The sequence '\\' is special cased in
                        the UnicodeReader so that the sequence "\\uXXXX"
                        is handled properly.</span></div>
                    <div class=""><span class="" style="font-size:
                        14px;"><span class="Apple-tab-span" style="white-space: pre;">        </span>- Thus,
                        tokenizers don't have to special case '\\'
                        (happened frequently in the JavadocTokenizer.)</span></div>
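                    <div class=""><span class="" style="font-size:
                        14px;">To make the '\\' case concrete, here is a
                        small sketch (hypothetical names, assuming
                        well-formed escapes with a single 'u') in which a
                        backslash pair is consumed as a unit, so the
                        second backslash can never start an escape:</span></div>

```java
// Hypothetical sketch; assumes well-formed four-digit escapes with one 'u'.
class BackslashPairs {
    // Translates unicode escapes in `src`, leaving escaped backslashes alone.
    static String translate(String src) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < src.length()) {
            char ch = src.charAt(i);
            if (ch == '\\' && i + 1 < src.length() && src.charAt(i + 1) == '\\') {
                // Keep both backslashes and skip past the pair, so a
                // following 'u' is ordinary text.
                out.append("\\\\");
                i += 2;
            } else if (ch == '\\' && i + 5 < src.length() && src.charAt(i + 1) == 'u') {
                // A lone backslash followed by 'u' and four hex digits.
                out.append((char) Integer.parseInt(src.substring(i + 2, i + 6), 16));
                i += 6;
            } else {
                out.append(ch);
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(translate("\\u0041"));   // single backslash: translated
        System.out.println(translate("\\\\u0041")); // backslash pair: left as text
    }
}
```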
                    <div class=""><span class="" style="font-size:
                        14px;"><br class="">
                      </span></div>
                    <div class=""><span class="" style="font-size:
                        14px;">- JavaTokenizer was modified to
                        accumulate scanned literals in a StringBuilder.</span></div>
                    <div class=""><span class="" style="font-size:
                        14px;"><span class="Apple-tab-span" style="white-space: pre;">        </span>- This
                        simplified/clarified the code significantly.</span></div>
                    <div class=""><span class="" style="font-size:
                        14px;"><br class="">
                      </span></div>
                    <div class=""><span class="" style="font-size:
                        14px;">- Since a lot of the functionality needed
                        by the JavaTokenizer comes directly from
                        a UnicodeReader, I made JavaTokenizer a subclass
                        of UnicodeReader.</span></div>
                    <div class=""><span class="" style="font-size:
                        14px;"><span class="Apple-tab-span" style="white-space: pre;">        </span>- Otherwise,
                        I would have had to reference "reader."
                        everywhere or would have to create
                        JavaTokenizer methods to repeat the same logic.
                        This was simpler and cleaner.</span></div>
                    <div class=""><span class="" style="font-size:
                        14px;"><br class="">
                      </span></div>
                    <div class=""><span class="" style="font-size:
                        14px;">- Since the pattern "if (ch == 'X')
                        bpos++" occurred a lot, I switched to using "if
                        (accept('X')) " patterns.</span></div>
                    <div class=""><span class="" style="font-size:
                        14px;"><span class="Apple-tab-span" style="white-space: pre;">        </span>- Actually,
                        I tightened up a lot of these patterns, as you
                        will see in the code.</span></div>
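                    <div class=""><span class="" style="font-size:
                        14px;">The shape of the accept pattern, as a
                        stand-alone sketch (the class and the position()
                        helper are hypothetical; only the accept name
                        comes from the change):</span></div>

```java
// Hypothetical sketch of the accept pattern; not the javac implementation.
class AcceptSketch {
    private final String src;
    private int pos = 0;

    AcceptSketch(String src) {
        this.src = src;
    }

    // Consumes the current character only if it matches `expected`.
    // Stands in for the older "if (ch == 'X') bpos++" idiom.
    boolean accept(char expected) {
        if (pos < src.length() && src.charAt(pos) == expected) {
            pos++;
            return true;
        }
        return false;
    }

    int position() {
        return pos;
    }

    public static void main(String[] args) {
        AcceptSketch s = new AcceptSketch("0x1F");
        // Recognize a hexadecimal prefix without touching the index directly.
        boolean hex = s.accept('0') && (s.accept('x') || s.accept('X'));
        System.out.println(hex + " at " + s.position());
    }
}
```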
                    <div class=""><span class="" style="font-size:
                        14px;"><br class="">
                      </span></div>
                    <div class=""><span class="" style="font-size:
                        14px;">- There are a lot of great mysteries
                        in JavadocTokenizer, but I think I cracked most
                        of them. The code is simpler and more modular.</span></div>
                    <div class=""><br class="">
                    </div>
                    <div class=""><span class="" style="font-size:
                        14px;">
                        <div class="" style="font-size: 12px;"><span
                            class="" style="font-size: 14px;">- The </span><span
                            style="font-size: 14px;" class="">new
                            scanner is slower to warm up due </span><span
                            style="font-size: 14px;" class="">to new
                            layers of method calls (e.g., HelloWorld is 5%
                            slower). However, once warmed up, this new </span><span
                            style="font-size: 14px;" class="">scanner</span><span
                            style="font-size: 14px;" class=""> is faster
                            than the existing code. The JDK java code
                            compiles 5-10% faster.</span></div>
                        <div class=""><br class="">
                        </div>
                      </span></div>
                    <div class=""><span class="" style="font-size:
                        14px;">All comments, suggestions and
                        contributions welcome.</span></div>
                    <div class=""><br class="">
                    </div>
                    <div class=""><span class="" style="font-size:
                        14px;">Cheers,</span></div>
                    <div class=""><span class="" style="font-size:
                        14px;"><br class="">
                      </span></div>
                    <div class=""><span class="" style="font-size:
                        14px;">--- Jim</span></div>
                  </div>
                </blockquote>
              </div>
            </div>
          </blockquote>
        </div>
        <br class="">
      </div>
    </blockquote>
  </body>
</html>