RFR 8124977 cmdline encoding challenges on Windows

Kirk Shoop Kirk.Shoop at microsoft.com
Mon Jul 20 17:22:53 UTC 2015


Yes, the changes do not update the underlying console encoding. The guidance is that applications should not change the console encoding, because doing so would affect other commands run in the same console. A script must explicitly use chcp to change the code page.

The change adds a new option, "file.encoding.unicode", rather than changing the existing "file.encoding" and "sun.jnu.encoding" properties. Those are used in several places, and touching them could break backward compatibility.

The two system properties, "file.encoding" and "sun.jnu.encoding", are set by getEncodingInternal(). Then, during initialization, file.encoding is set to the value of sun.jnu.encoding, so the two properties have the same value.

These properties are used to set the default charset for:
*	"platform strings" encoding
*	output streams (encoding of System.out is initialized by this default charset)
*	replacement of unsafe characters in URLs
*	the FTP client
*	the network client
Any Java application could rely on these properties as well.
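As a rough illustration of how ordinary application code ends up depending on this default, here is a minimal sketch. The class and method names are made up for this example and are not part of the webrev. It only shows that a writer created without an explicit charset falls back to Charset.defaultCharset().

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.nio.charset.Charset;

// Hypothetical helper class, for illustration only.
class DefaultCharsetProbe {
    // The charset the JVM selected as its default at startup.
    public static Charset jvmDefault() {
        return Charset.defaultCharset();
    }

    // The charset used by a writer that was created without naming one.
    public static Charset writerDefault() {
        OutputStreamWriter writer = new OutputStreamWriter(new ByteArrayOutputStream());
        // getEncoding() may return a historical name; forName() resolves the alias.
        return Charset.forName(writer.getEncoding());
    }

    public static void main(String[] args) {
        System.out.println("file.encoding property = " + System.getProperty("file.encoding"));
        System.out.println("Charset.defaultCharset() = " + jvmDefault());
        System.out.println("OutputStreamWriter default = " + writerDefault());
    }
}
```

Any code that omits the charset argument in this way inherits whatever the launcher decided, which is why changing the existing properties is risky.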

On OS X and Linux, both properties are set according to the currently active locale, while on Windows they are set from the system default locale.

This behavior can be observed with this simple test:

class Main {
    public static void main(String[] args) throws Exception {
        // Print the two encoding properties chosen by the launcher.
        System.out.println("file.encoding=" + System.getProperty("file.encoding"));
        System.out.println("sun.jnu.encoding=" + System.getProperty("sun.jnu.encoding"));
    }
}

Linux test:
$ sudo localedef -c -i ru_RU -f CP1251 ru_RU.CP1251

$ javac main.java

$ java Main

$ export LC_ALL=ru_RU.cp1251

$ java Main

Windows test:
>javac main.java

>java Main

>chcp 65001
Active code page: 65001

>java Main

So when the default system locale differs from the active one, the behavior differs between Linux and Windows. The new options allow a Windows user to select the same behavior that one would expect on Unix. The switches can certainly be removed if the compatibility impact is acceptable.
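To make the practical effect of such a mismatch concrete, here is a small sketch. It is not taken from the webrev: the class name is hypothetical, windows-1251 stands in for a default charset taken from the system locale, and UTF-8 stands in for a console running under code page 65001. It assumes the windows-1251 charset is available in the runtime.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, for illustration only.
class CharsetMismatchDemo {
    // Encode with the writer's charset, then decode with the reader's charset,
    // as happens when the JVM and the console disagree about the encoding.
    public static String transcode(String text, Charset writerCharset, Charset readerCharset) {
        byte[] bytes = text.getBytes(writerCharset);
        return new String(bytes, readerCharset);
    }

    public static void main(String[] args) {
        // Assumption: the runtime ships the windows-1251 charset.
        Charset cp1251 = Charset.forName("windows-1251");
        // A Russian word written with Unicode escapes so the source file's
        // own encoding does not matter.
        String original = "\u041f\u0440\u0438\u0432\u0435\u0442";

        String matched = transcode(original, cp1251, cp1251);
        String mismatched = transcode(original, cp1251, StandardCharsets.UTF_8);

        System.out.println("matching charsets preserved text: " + matched.equals(original));
        System.out.println("mismatched charsets preserved text: " + mismatched.equals(original));
    }
}
```

The booleans are printed instead of the strings themselves so that the result does not depend on the encoding of the console running the demo.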

The defaultUnicodeCharset() method is in the Charset class because it is called from the java.io.Console constructor as well as from LauncherHelper, so its scope has to be wider than just LauncherHelper.java.

Kirk and Valery

> -----Original Message-----
> From: Xueming Shen [mailto:xueming.shen at oracle.com]
> Sent: Saturday, July 11, 2015 11:51 AM
> Hi Kirk,
> Two questions
> (1) Why do you need to change the "encoding" used by the java.io.Console class. My understanding
>       is that the console encoding is specifically used to "talk" to the underlying terminal, it should just
>       be the one used by the underlying terminal/console. I don't think the proposed change updates
>       the underlying console encoding (something like chcp)  when -Dwindows.UnicodeConsole=true
>       specified, if I read the webrev correctly. Instead, the Java_java_io_Console_encoding() probably
>       need to be updated to return utf8 if the cp == 65001, so if the underlying terminal/console is
>       using cp65001, the java.io.Console should encode/decode in utf8.
>       I would assume the encoding of java.io.Console should have nothing to do with using
>       GetCommandLineW() to parse the arguments in unicode in launcher?
> (2) Why do you need a defaultUnicodeCharset() in Charset class? Seems to me the scope should/could
>       be limited inside LauncherHelper.java?
> Thanks,
> -Sherman
