RFR: 8240197: Cannot start JVM when $JAVA_HOME includes CJK characters
ralf.schmelter at sap.com
Mon Mar 2 13:24:22 UTC 2020
thanks for your patch. I have some remarks:
1. wcstombs_s creates UTF-16, as does MultiByteToWideChar. That wasn't the problem. The problem was that wcstombs_s uses the current locale, which is "C" initially. One is normally expected to set the locale to the system locale via setlocale(LC_ALL, ""), as it is done in ParseLocale() on Unix.
One could fix the original problem by setting the locale (probably not a good idea, since this changes the locale of the whole process). Or one could specify the correct locale to wcstombs like:
static _locale_t loc = _create_locale(LC_ALL, "");
err = ::_mbstowcs_s_l(&converted_chars, path_start, buf_len + 1, buf, buf_len, loc);
Both fix the problem. But I agree, using the MultiByteToWideChar method is better, since you don't have to create the locale.
There are some other places in the code where mbstowcs is used on Windows (canonicalize_md.c for example). They should probably fixed in the same way.
2.I don't think that using strlen(buf) as the number of wchars to allocate for the converted string is really wrong. There are no code pages supported by Java which would map a single byte to a Unicode character > U+ffff. But since one can get the always correct value via MultiByteToWideChar, this is futureproof and obviously right.
3. I would not use the CP_THREAD_ACP code page in MultiByteToWideChar and instead use CP_ACP, because the latter always uses the system locale. And this is the locale used to convert the arguments to bytes in the first place.
4. You replaced the malloc/memcpy for the string which will be used to call native_path() with a strdup. This leads to a memory overwrite, when a path like "C:" is used, since native_path() will convert it to "C:.". And the debug version of the VM will detect this, since the tail guard will be partially overwritten. You can check this by running:
make run-test TEST=gtest:os_windows
5. Using WCHAR and LPWSTR seems odd to me. Nobody uses LPCSTR or CHAR for char* and char. But if you think WCHAR is better, I would adjust the return value of the method and the functions which use it too.
6. When you handle a relative path, you now only allocate enough space to hold the relative path, not the generated absolute path. Since a relative path is likely to be smaller than the absolute path, the _wfullpath call later will likely fail. That was the reason we used JVM_MAXPATHLEN for relative paths. You could fix this like in this patch http://cr.openjdk.java.net/~rschmelter/webrevs/tmp/unicode_convert.patch
More information about the hotspot-runtime-dev