RFR 8124977 cmdline encoding challenges on Windows

Kirk Shoop Kirk.Shoop at microsoft.com
Thu Aug 6 02:45:09 UTC 2015

> -----Original Message-----
> From: Xueming Shen [mailto:xueming.shen at oracle.com]
> Sent: Monday, July 20, 2015 11:50 AM
> On 07/20/2015 10:22 AM, Kirk Shoop wrote:
> > So when default system locale differs from the active one, we have
> > different behavior on Linux and Windows. The new options allow a windows
> > user to select the same behavior that one would expect on unix. The
> > switches can certainly be removed, if the compatibility impact is acceptable.
> Kirk, on Windows file.encoding comes from the user locale and
> sun.jnu.encoding comes from the system locale setting. sun.jnu.encoding
> exists purely so that the text-encoding-sensitive jnu functions can
> communicate with the underlying Windows system API when the system locale
> and the user locale are set to different values. On unix/linux/osx, these two
> are always set to the same value. Yes, there might be input/output issues if
> the encoding used by the console (OEM codepage) is not compatible with the
> encoding used by the "user locale" and you are trying to use
> System.in/out/err for input/output to the console.
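
To make the mismatch concrete, here is a minimal sketch in Java (not part of the patch). It assumes the writer encodes text as UTF-8 and the reader decodes the same bytes as ISO-8859-1; both charsets stand in for the real user-locale and OEM code pages only because every JVM is guaranteed to have them. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class EncodingMismatchSketch {
    // Encode with one charset and decode with another, the way a console
    // would if its code page did not match the encoding used by System.out.
    static String misdecode(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String original = "\u00e9";            // LATIN SMALL LETTER E WITH ACUTE
        String garbled = misdecode(original);
        // Print code points rather than the characters themselves, so the
        // result does not depend on the console running this sketch.
        garbled.codePoints()
               .forEach(cp -> System.out.println(Integer.toHexString(cp)));
    }
}
```

One character goes in and two unrelated characters come out, which is the kind of garbling seen when the console code page and the stream encoding disagree.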
> Here is the original CCC request regarding the sun.jnu.encoding, which might
> provide some background info.
> http://cr.openjdk.java.net/~sherman/4958170.html
> If you/we are NOT going to change the encoding used by the underlying
> console, I don't think we need to or should change the "encoding" used by
> java.io.Console. As I suggested in my previous email, the
> Java_java_io_Console_encoding() implementation probably needs to be updated
> to return utf8 if the cp == 65001 (that was 10 years ago, I'm not sure if
> 65001 was really used back then when we wrote this code). My
> understanding of the issue here is that if you continue to use the "A" version
> of the API to parse/get the arguments, and try to solve the possible issue
> triggered by the "incompatibility" of the OEM encoding used by the console
> and the user locale encoding used by System.in/out/err, it's fine to
> define a new system property to specify a preferred encoding for the
> launcher to use, but this "preferred" encoding should not be used by
> java.io.Console.
> But isn't it more reasonable to simply always use the "W" version for this
> purpose in launcher?
> -Sherman

Thank you for the valuable feedback. We have vastly simplified the original patch. 
The new webrev is here:

This webrev uses GetCommandLineW on Windows to retrieve the UTF-16 command line. It also supports the 65001 (UTF-8) code page (set by chcp 65001), so that when -Dsun.jnu.encoding="UTF-8" is supplied, the console output (stdout and stderr) is in UTF-8.
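
As a rough illustration of the code page handling, here is a minimal sketch written in Java rather than the native C used by the launcher and Console code. The class and method names are hypothetical, and the "cp" + number fallback is an assumption about the naming convention, not a copy of the webrev. The only behavior taken from this thread is that code page 65001 should map to UTF-8.

```java
import java.nio.charset.Charset;

public class ConsoleEncodingSketch {
    // Hypothetical helper: turn a Windows console code page number into a
    // charset name. Returns null when there is no console (cp == 0).
    static String charsetNameForCodePage(int cp) {
        if (cp == 0) {
            return null;          // no console attached
        }
        if (cp == 65001) {
            return "UTF-8";       // code page selected by "chcp 65001"
        }
        return "cp" + cp;         // assumed fallback naming, e.g. "cp437"
    }

    public static void main(String[] args) {
        String name = charsetNameForCodePage(65001);
        // UTF-8 is one of the charsets every Java platform must support.
        System.out.println(name + " supported: " + Charset.isSupported(name));
    }
}
```

The special case matters because a generic "cp65001" name is not something a JVM is guaranteed to recognize, whereas "UTF-8" always resolves.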

There are no new commandline switches.

Please let us know if there is anything else that needs improvement.

Kirk and Valery
