JDK 9 RFR of 8039474: sun.misc.CharacterDecoder.decodeBuffer should use getBytes(iso8859-1)

Chris Hegarty chris.hegarty at oracle.com
Thu Apr 10 19:03:16 UTC 2014

On 10 Apr 2014, at 19:50, Xueming Shen <xueming.shen at oracle.com> wrote:

> On 04/10/2014 11:38 AM, Mike Duigou wrote:
>> On Apr 10 2014, at 11:08 , Chris Hegarty<chris.hegarty at oracle.com>  wrote:
>>>> On 10 Apr 2014, at 18:40, Mike Duigou<mike.duigou at oracle.com>  wrote:
>>>>> On Apr 10 2014, at 03:21 , Chris Hegarty<chris.hegarty at oracle.com>  wrote:
>>>>>> On 10 Apr 2014, at 11:03, Ulf Zibis<Ulf.Zibis at CoSoCo.de>  wrote:
>>>>>> Hi Chris,
>>>>>> Am 10.04.2014 11:04, schrieb Chris Hegarty:
>>>>>>> Trivially, you could (but do not have to) use java.nio.charset.StandardCharsets.ISO_8859_1 to avoid the cost of the String to Charset lookup.
>>>>>> In earlier tests, Sherman and I found that the cost of initializing a new charset object is higher than the cost of looking up an existing object in the cache.
>>>>>> And it's even better to use the same String instance for the lookup that was used to cache the charset.
>>>>> Interesting… thanks for letting me know.  Presumably, the assumption is that StandardCharsets has not already been initialized elsewhere, by another dependency.
>>>> Generally it's safe to assume that StandardCharsets will already be initialized. If it isn't initialized we should consider it an amortized cost.
>>> In which case, why would the string version be more performant than the version that already takes the Charset? Doesn't the string version need to do a lookup?
>> There is a cache in StringCoding that is used only in the byte[] getBytes(String charsetName) case, not in the byte[] getBytes(Charset charset) case. The rationale in StringCoding::decode(Charset cs, byte[] ba, int off, int len) may need to be revisited, as it is certainly surprising that using the string constant charset name is faster than using the Charset constant.
> It is surprising :-) In theory you can't cache the de/encoder of a charset from
> the external world, as the same charset might return a different de/encoder next
> time. So it was decided back then not to cache the de/encoder for an incoming
> charset. It might be reasonable to cache those from StandardCharsets though.

I would say that it is more than reasonable. ;-) It is surprising to me too that using the Charset constant is not as fast as using a constant string.
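
For reference, the variants under discussion can be sketched roughly as below. This is only an illustration, not the actual StringCoding code: the class name Latin1Bytes, its method names, and the per-thread encoder cache are all hypothetical. The cache is keyed to one known JDK charset, which is the case Sherman suggests could reasonably be cached.

```java
import java.io.UnsupportedEncodingException;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper class, for illustration only.
class Latin1Bytes {
    // Assumption: caching is acceptable here because ISO_8859_1 is a JDK
    // built-in charset, so its encoder cannot change between calls.
    private static final ThreadLocal<CharsetEncoder> LATIN1_ENCODER =
        ThreadLocal.withInitial(() -> StandardCharsets.ISO_8859_1.newEncoder());

    // Variant 1: charset given by name (the form the thread reports as faster).
    static byte[] byName(String s) {
        try {
            return s.getBytes("ISO-8859-1");
        } catch (UnsupportedEncodingException e) {
            // ISO-8859-1 is required on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    // Variant 2: charset given as a constant; no checked exception to handle.
    static byte[] byConstant(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1);
    }

    // Variant 3: reuse a per-thread encoder for the trusted charset.
    // Unlike getBytes, a fresh CharsetEncoder reports unmappable characters
    // instead of replacing them, so this throws for non-Latin-1 input.
    static byte[] byCachedEncoder(String s) {
        try {
            // encode(CharBuffer) resets the encoder before encoding.
            ByteBuffer bb = LATIN1_ENCODER.get().encode(CharBuffer.wrap(s));
            byte[] out = new byte[bb.remaining()];
            bb.get(out);
            return out;
        } catch (CharacterCodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        String s = "SGVsbG8gd29ybGQ=";
        System.out.println(Arrays.equals(byName(s), byConstant(s)));
        System.out.println(Arrays.equals(byName(s), byCachedEncoder(s)));
    }
}
```

All three produce the same bytes for Latin-1 input; they differ only in how the encoder is obtained, which is where the cost difference discussed above would come from.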


> -Sherman
>> Mike
