file.encoding vs. sun.jnu.encoding(?) on OS X
Alan.Bateman at oracle.com
Fri Nov 9 05:12:21 PST 2012
On 08/11/2012 19:25, Scott Kovatch wrote:
> I want to bring up something that is causing a lot of confusion, and is generating a lot of bugs on OS X.
> What is the relationship between path names and file.encoding? Or, maybe more correctly, _why_ is there some relationship between path names and file.encoding? On OS X filenames are ALWAYS in UTF-8, so the current locale should never come into play.
> I was about to launch into a discussion (rant) about our use of nl_langinfo(CODESET) for file.encoding, but the more I look into it, I don't think that's the problem, though you can also make a case that all text files on OS X are UTF-8 by default as well. I'm wondering if this has something to do with sun.jnu.encoding being set to the same value as file.encoding.
> -- Scott K.
I've seen several mails on macosx-port-dev about this, although I think
several issues have been conflated, which makes for confusing reading.
One of the issues is that HFS normalizes to a variant of NFD and the
changes that came via the Mac port weren't right. Sherman has
re-implemented this via 7130915 in jdk8 and it has been back-ported to
7u for 7u12. We've had confirmation from several people that this
resolves the issues that they were seeing.
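
To illustrate the normalization point, here is a minimal sketch using java.text.Normalizer. It is not the code from 7130915; the class and method names are made up for the example, and standard NFD is used only as an approximation, since HFS applies a variant of NFD rather than NFD exactly.

```java
import java.text.Normalizer;

// Hypothetical demo class, not part of the JDK fix discussed above.
public class NfdFileNameDemo {

    // Decompose a name: precomposed characters become base + combining mark.
    static String toNfd(String name) {
        return Normalizer.normalize(name, Normalizer.Form.NFD);
    }

    // Recompose a name into its precomposed (NFC) form.
    static String toNfc(String name) {
        return Normalizer.normalize(name, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        String composed = "caf\u00e9.txt";      // 'e' with acute as one code point
        String decomposed = toNfd(composed);    // 'e' followed by U+0301

        System.out.println(composed.length() + " vs " + decomposed.length()); // 8 vs 9
        System.out.println(composed.equals(decomposed));                      // false
        System.out.println(toNfc(decomposed).equals(composed));               // true
    }
}
```

The two strings look identical when printed but do not compare equal, which is the kind of mismatch that shows up when a name handed to the file system comes back in decomposed form.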
The other issue, and I think the issue that you are trying to get at, is
that sun.jnu.encoding is being set based on the locale whereas you are
saying that it should always be UTF-8 on Mac. I think we need to create
a bug on that and it would be great if you can get technical references
so that we know this is the right thing to do. There are at least two
places in the property initialization that would need to be updated to
do this. I don't think we should change file.encoding as that would
change the default encoding for the file contents whereas the issues all
seem to be related to the encoding/decoding of file names.
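
As a rough illustration of why the charset used for file names matters, the sketch below decodes the UTF-8 bytes of a name with a different charset. ISO-8859-1 stands in here for "whatever a non-UTF-8 locale selects"; that choice, and the class and method names, are assumptions for the example and not what the JDK does internally.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class showing a file-name charset mismatch.
public class FileNameCharsetMismatch {

    // Encode a name as UTF-8 (how the name is stored, per the discussion above),
    // then decode those bytes with the given charset.
    static String decodeUtf8BytesAs(String name, Charset decodeWith) {
        byte[] onDisk = name.getBytes(StandardCharsets.UTF_8);
        return new String(onDisk, decodeWith);
    }

    public static void main(String[] args) {
        String name = "caf\u00e9";

        // Matching charset: the name round-trips unchanged.
        System.out.println(decodeUtf8BytesAs(name, StandardCharsets.UTF_8).equals(name)); // true

        // Mismatched charset: the two UTF-8 bytes of the accented 'e' become two characters.
        String garbled = decodeUtf8BytesAs(name, StandardCharsets.ISO_8859_1);
        System.out.println(garbled.equals(name)); // false
        System.out.println(garbled.length());     // 5
    }
}
```

Changing only the charset used for names leaves the default charset for file contents alone, which is the distinction drawn above between sun.jnu.encoding and file.encoding.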
The final issue is just consistent use of sun.jnu.encoding. This
property was originally only used for Windows but now we have cases
where it may differ from file.encoding on other platforms. Sherman
brought up 7050570, which is addressing something different again but
part of it does fix up the new file system API to use sun.jnu.encoding.
I haven't seen any mails on macosx-port-dev that look like this issue
but we should get it in anyway (Sherman - you asked why I hadn't pushed
that several months ago, sorry, it's been on my list as a low priority
item and low priority items have been starved of cycles).
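
For anyone who wants to check whether the two properties differ on their own system, a small sketch follows. Note that sun.jnu.encoding is an internal, unsupported property, so the code allows for it being absent; the class and method names are made up for the example.

```java
import java.nio.charset.Charset;

// Hypothetical helper that reports the encoding-related settings of the running VM.
public class JnuEncodingCheck {

    // Read a system property, substituting a marker when it is not set.
    static String prop(String key) {
        String value = System.getProperty(key);
        return value == null ? "(not set)" : value;
    }

    // Build a short report of the values discussed in this thread.
    static String report() {
        String fileEnc = prop("file.encoding");
        String jnuEnc = prop("sun.jnu.encoding");
        return "file.encoding=" + fileEnc + "\n"
             + "sun.jnu.encoding=" + jnuEnc + "\n"
             + "defaultCharset=" + Charset.defaultCharset().name() + "\n"
             + "differ=" + !fileEnc.equalsIgnoreCase(jnuEnc);
    }

    public static void main(String[] args) {
        System.out.println(report());
    }
}
```

Running it under different locale settings (for example with LANG or LC_ALL changed) is one way to see how the values are derived on a given platform.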