RFR: 8170769 Provide a simple hexdump facility for binary data

Vincent Ryan vincent.x.ryan at oracle.com
Sun Dec 2 22:54:53 UTC 2018

Thanks for the thorough review.
Responses in-line.

> On 26 Nov 2018, at 19:49, Roger Riggs <Roger.Riggs at oracle.com> wrote:
> Hi,
> Thanks for updating this proposal.  Some comments on the javadoc/spec.
> The class name Hex isn't very evocative of its function.    
> HexFormat would convey a more complete idea as to its function and be similar to
> to the existing DecimalFormat and NumberFormat classes though they do not share
> any of the APIs or framework.  


> - The 2nd example using Files.newInputStream is not recommended
> since the stream to the file may not be closed.  
> The recommended form would use try-with-resources.
>  try (InputStream is = Files.newInputStream(Paths.get("mydata"))) {
>      Hex.dumpAsStream(is, 64,
>        (offset, chunk, fromIndex, toIndex) ->
>            String.format("%d %s",
>                offset / 64 + 1,
>                Hex.toPrintableString(chunk, fromIndex, toIndex)))
>        .forEachOrdered(System.out::println);
>  } catch (IOException ioe) { 
>   ... 
>  }
> The 'forEachOrdered' should not be necessary and may raise questions about why.
> if there's no good reason, use 'forEach’.


> - The HEXDUMP_FORMATTER should be explicit about the format it generates.
>   Though it may declare itself compatible with hexdump(1) it should specify
>   the format it produces to avoid an external reference that might change unexpectedly.


> - toFormattedHexString:
> References to the line-separator should be javadoc linked to System#lineSeparator.

OK. And maybe shorten this method to simply: toFormattedString ?

> Alternatively, should this class follow the lead of String.align/indent and use
> the normalized line terminator "\n" consistently? In which case, it should be explicit.

> Clarify exactly the behavior:
> "If the final chunk is less than 16 +bytes+ then +the result+ is padded with spaces to match the length of the preceding chunks. "
> Only the built-in formatter asserts that every result is the same length.  A provided formatter
> may not conform.  soften "will be" -> "may be" in "the final chunk …"


> - everywhere: "binary buffer" -> "byte buffer"  or byte array


> - toPrintableString:  The ASCII limitation seems quite dated, 8-bit clean is the norm and
> except for control characters  and 0x7f, the rest of the range is printable in most Locales.  
> Strict conformance to hexdump is not a requirement.

I can see the benefit to including the accented letters > 0x7F but is it really useful to also display
graphical and non-printable characters > 0x7F ?

> - dumpAsStream(InputStream)
>   It is inconsistent (and possibly wrong) that the final chunk may be shorter since the HEXDUMP_FORMATTER
>   uses the formatter that always pads.  (Btw, the implementation appends \n and should be using lineSeparator.)
>    "If the input is not a multiple of 16 bytes then the final chunk will be shorter than the preceding chunks.”

I’ve clarified the language. The final chunk will indeed be shorter but the output 'will be' padded by HEXDUMP_FORMATTER
and 'may be' padded when a custom formatter is supplied.

> - dumpAsStream(bytes, from, to, chunk, formatter)
>  Can it more explicit about the unchecked exception?
>  "If an error occurs in the formatter then an unchecked exception will be thrown from the underlying Stream method.”

What do you suggest? The formatter may throw any exception and it doesn’t get thrown until the stream is processed.

> - Another thought about the methods returning Stream<String>.
>   They require a concrete string to be created for each chunk.  
>   There may be some efficiency and flexibility to be gained if formatters can return a CharSequence.
>   In many cases it would be a string, but it would allow the formatter to return a StringBuilder
>   that could be used directly without needing to construct a short lived string.

Sounds OK but I’ll need to prototype this and get back to you.

> Thanks, Roger
> p.s.
> I wrote code for the use case of formatting bytes in the form that can be compiled with javac.
> it took rather more code than I expected but I don't have any specific suggestions.
>         byte[] b = ...;
>         final String indent = " ".repeat(8);
>         final StringBuilder sb = new StringBuilder(16 * 5);
>         sb.append(indent);
>         System.out.print("    byte[] bytes = {\n");
>         try (ByteArrayInputStream is = new ByteArrayInputStream(b)) {
>             dumpAsStream(is, 16, (o, c, s, e) -> {
>                 sb.setLength(indent.length());
>                 for (int i = s; i < e; i++) {
>                     sb.append(c[i]);
>                     sb.append(", ");
>                 }
>                 sb.append(System.lineSeparator());
>                 return sb.toString();
>             }).forEach(s -> System.out.print(s));
>         }
>         System.out.print("    }\n");
> On 11/23/2018 09:51 AM, Vincent Ryan wrote:
>> Hello, 
>> Please review this proposal for a new API to conveniently generate and display binary data using hexadecimal string representation. 
>> It supports both bulk and stream operations and it can also generate the well-known hexdump format [1]. 
>> This latest revision addresses review comments provided earlier. 
>> These include adding methods to support for data supplied in a ByteBuffer, exposing the default formatter implementation as a public 
>> static and dropping the superfluous 'Hex' term from several method names. 
>> Thanks 
>> Bug: https://bugs.openjdk.java.net/browse/JDK-8170769 <https://bugs.openjdk.java.net/browse/JDK-8170769> 
>> API: http://cr.openjdk.java.net/~vinnie/8170769/javadoc.06/api/java.base/java/util/Hex.html <http://cr.openjdk.java.net/~vinnie/8170769/javadoc.06/api/java.base/java/util/Hex.html>
>> Webrev: http://cr.openjdk.java.net/~vinnie/8170769/webrev.06/ <http://cr.openjdk.java.net/~vinnie/8170769/webrev.06/> 
>> ____ 
>> [1] https://docs.oracle.com/cd/E88353_01/html/E37839/hexdump-1.html <https://docs.oracle.com/cd/E88353_01/html/E37839/hexdump-1.html> 

More information about the core-libs-dev mailing list