possible problem with JNI GetStringUTFChars
david.holmes at oracle.com
Fri Jan 25 06:04:24 UTC 2019
On 25/01/2019 4:39 am, Alan Snyder wrote:
> Thank you. That post does explain what is happening, but leaves open the question of whether GetStringUTFChars should be changed.
> What is the value of the current implementation of GetStringUTFChars versus one that returns true UTF-8?
Well, that's really a HotSpot question as it concerns JNI, but this is
ancient history. There's little point musing over the "why" of decisions
made back in the late 1990s. But I suspect the main reason is the
avoidance of embedded NUL characters.
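To make the NUL point concrete: modified UTF-8 encodes U+0000 as the two bytes 0xC0 0x80 rather than as a single zero byte. As a result, the C string that GetStringUTFChars returns has no zero byte before its terminator. The Java sketch below shows this with java.io.DataOutputStream.writeUTF, which documents the same modified UTF-8 format. It assumes JNI uses that same encoding, which is consistent with the byte values in the original report. The class and helper names are made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class NulEncodingDemo {
    // Hypothetical helper: encodes s as modified UTF-8 using
    // DataOutputStream.writeUTF, then drops the 2-byte length prefix
    // that writeUTF writes before the string bytes.
    public static byte[] modifiedUtf8(String s) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(s);
        } catch (IOException e) {
            // Not expected for an in-memory stream.
            throw new UncheckedIOException(e);
        }
        byte[] all = bytes.toByteArray();
        return Arrays.copyOfRange(all, 2, all.length);
    }

    public static void main(String[] args) {
        String s = "a\u0000b";
        // Standard UTF-8: U+0000 becomes a single zero byte.
        System.out.println(Arrays.toString(s.getBytes(StandardCharsets.UTF_8))); // [97, 0, 98]
        // Modified UTF-8: U+0000 becomes 0xC0 0x80, so no zero byte appears
        // and a C caller can rely on the terminating NUL.
        System.out.println(Arrays.toString(modifiedUtf8(s))); // [97, -64, -128, 98]
    }
}
```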
The only bug report I can see on this (basically the same issue you are
reporting) was back in 2004:
so it simply has not been an issue. As per the SO article that Claes
referenced, anyone needing true UTF-8 has a couple of paths to achieve that.
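One possible path, if the conversion has to happen in native code, is to post-process the bytes from GetStringUTFChars. Each 0xC0 0x80 is turned back into 0x00, and each pair of 3-byte surrogate encodings is merged into a single 4-byte UTF-8 sequence. This is not necessarily the approach the SO answers recommend. Another option is to call String.getBytes with a UTF-8 charset from native code through JNI. The sketch below is written in Java so the byte logic is easy to run, and it should port almost line for line to C. It assumes well-formed input and passes unpaired surrogates through unchanged. The class name is hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

public class ModifiedUtf8ToUtf8 {
    // Converts modified UTF-8 to standard UTF-8. Assumes well-formed input;
    // no validation is done, and unpaired surrogates are copied as-is.
    public static byte[] convert(byte[] in) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(in.length);
        int i = 0;
        while (i < in.length) {
            int b0 = in[i] & 0xFF;
            // 0xC0 0x80 is modified UTF-8's encoding of U+0000.
            if (b0 == 0xC0 && i + 1 < in.length && (in[i + 1] & 0xFF) == 0x80) {
                out.write(0);
                i += 2;
                continue;
            }
            // ED A0..AF xx is a high surrogate; if ED B0..BF xx (a low
            // surrogate) follows, merge the pair into one code point.
            if (b0 == 0xED && i + 5 < in.length
                    && (in[i + 1] & 0xF0) == 0xA0
                    && (in[i + 3] & 0xFF) == 0xED
                    && (in[i + 4] & 0xF0) == 0xB0) {
                // Low 10 bits of each surrogate.
                int high = ((in[i + 1] & 0x0F) << 6) | (in[i + 2] & 0x3F);
                int low = ((in[i + 4] & 0x0F) << 6) | (in[i + 5] & 0x3F);
                int cp = 0x10000 + (high << 10) + low;
                // Standard 4-byte UTF-8 encoding of the code point.
                out.write(0xF0 | (cp >> 18));
                out.write(0x80 | ((cp >> 12) & 0x3F));
                out.write(0x80 | ((cp >> 6) & 0x3F));
                out.write(0x80 | (cp & 0x3F));
                i += 6;
                continue;
            }
            // Everything else is already standard UTF-8; copy the byte.
            out.write(b0);
            i++;
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // The GetStringUTFChars bytes from the original report.
        byte[] modified = {-19, -96, -68, -19, -67, -69};
        System.out.println(Arrays.toString(convert(modified))); // [-16, -97, -115, -69]
    }
}
```

Applied to the six bytes from the original report, this yields the same four bytes that String.getBytes() produced.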
>> On Jan 24, 2019, at 10:32 AM, Claes Redestad <claes.redestad at oracle.com> wrote:
>> Hi Alan,
>> GetStringUTFChars unfortunately doesn't give you true UTF-8, but a modified UTF-8 sequence
>> as used by the VM internally for historical reasons.
>> See answers to this related question on SO (which contains links to official docs):
>> On 2019-01-24 19:23, Alan Snyder wrote:
>>> I am having a problem with file names that contain emojis when passed to a macOS system call.
>>> Things work when I convert the path to bytes in Java, but fail (file not found) when I convert the path to bytes in native code using GetStringUTFChars.
>>> For example, where String.getBytes() returns
>>> -16 -97 -115 -69
>>> GetStringUTFChars returns:
>>> -19 -96 -68 -19 -67 -69
>>> I’m not a UTF expert, so can someone say whether I should file a bug report?
>>> (Tested in JDK 9, 11, and a fairly recent 12)
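For anyone else reading: the two sequences in the report are the same character, U+1F37B, in two encodings. A Java String holds it as the surrogate pair U+D83C U+DF7B. Standard UTF-8 encodes the code point as one 4-byte sequence. Modified UTF-8 encodes each surrogate on its own as 3 bytes (0xED 0xA0 0xBC, then 0xED 0xBD 0xBB). The sketch below reproduces both sequences in pure Java. It assumes GetStringUTFChars emits the same modified UTF-8 as DataOutputStream.writeUTF, and the class and helper names are made up.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class SupplementaryEncodingDemo {
    // Hypothetical helper: modified UTF-8 via writeUTF, minus the
    // 2-byte length prefix that writeUTF writes first.
    public static byte[] modifiedUtf8(String s) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(s);
        } catch (IOException e) {
            // Not expected for an in-memory stream.
            throw new UncheckedIOException(e);
        }
        byte[] all = bytes.toByteArray();
        return Arrays.copyOfRange(all, 2, all.length);
    }

    public static void main(String[] args) {
        // U+1F37B lies outside the BMP, so the String stores it as the
        // surrogate pair U+D83C U+DF7B.
        String s = new String(Character.toChars(0x1F37B));
        // Standard UTF-8: one 4-byte sequence for the whole code point.
        System.out.println(Arrays.toString(s.getBytes(StandardCharsets.UTF_8)));
        // prints [-16, -97, -115, -69]
        // Modified UTF-8: each surrogate encoded separately as 3 bytes.
        System.out.println(Arrays.toString(modifiedUtf8(s)));
        // prints [-19, -96, -68, -19, -67, -69]
    }
}
```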
More information about the core-libs-dev mailing list