JDK 9 RFR of 8039474: sun.misc.CharacterDecoder.decodeBuffer should use getBytes(iso8859-1)
xueming.shen at oracle.com
Thu Apr 10 19:17:22 UTC 2014
On 04/10/2014 12:03 PM, Chris Hegarty wrote:
> On 10 Apr 2014, at 19:50, Xueming Shen<xueming.shen at oracle.com> wrote:
>> On 04/10/2014 11:38 AM, Mike Duigou wrote:
>>> On Apr 10 2014, at 11:08 , Chris Hegarty<chris.hegarty at oracle.com> wrote:
>>>>> On 10 Apr 2014, at 18:40, Mike Duigou<mike.duigou at oracle.com> wrote:
>>>>>> On Apr 10 2014, at 03:21 , Chris Hegarty<chris.hegarty at oracle.com> wrote:
>>>>>>> On 10 Apr 2014, at 11:03, Ulf Zibis<Ulf.Zibis at CoSoCo.de> wrote:
>>>>>>> Hi Chris,
>>>>>>> Am 10.04.2014 11:04, schrieb Chris Hegarty:
>>>>>>>> Trivially, you could (but do not have to) use java.nio.charset.StandardCharsets.ISO_8859_1 to avoid the cost of the String to Charset lookup.
>>>>>>> In earlier tests Sherman and I found that the cost of initializing a new Charset object is higher than the lookup of an existing object in the cache.
>>>>>>> And it's even better to use the same String instance for the lookup which was used to cache the charset.
>>>>>> Interesting… thanks for letting me know. Presumably, there is an assumption that StandardCharsets is not initialized elsewhere, by another dependency.
>>>>> Generally it's safe to assume that StandardCharsets will already be initialized. If it isn't initialized we should consider it an amortized cost.
>>>> In which case why would the string version be more performant than the version that already takes the Charset? Doesn't the string version need to do a lookup?
>>> There is a cache in StringCoding that is only used in the byte[] getBytes(String charsetName) case but not in the byte[] getBytes(Charset charset) case. The rationale in StringCoding::decode(Charset cs, byte[] ba, int off, int len) may need to be revisited, as it is certainly surprising that using the string constant charset name is faster than using the Charset constant.
>> It is surprising :-) In theory you can't cache the de/encoder of a charset from
>> the external world, as the same charset might return a different de/encoder next
>> time. So it was decided back then not to cache the de/encoder for an incoming
>> charset. It might be reasonable to cache those from StandardCharsets though.
> I would say that it is more than reasonable. ;-) And it is surprising to me too that this usage is not as fast as a constant string.
Charset.equals() does explicitly mention "same canonical name", as below:
* Tells whether or not this object is equal to another.
* <p> Two charsets are equal if, and only if, they have the same canonical
* names. A charset is never equal to any other type of object. </p>
* @return <tt>true</tt> if, and only if, this charset is equal to the
* given object
But it is very reasonable :-) to assume someone might pass in a home-made
charset implementation with the same canonical name as the one in our/jdk
charset repository. Then we have another debate on which one should be
used in this case.
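
To make the concern concrete, here is a minimal sketch of such a home-made
charset. The class names HomeMadeCharsetDemo and HomeMadeLatin1 are
hypothetical and exist only for illustration; the charset claims the
canonical name "ISO-8859-1" but hands out US-ASCII coders. It is equal to
the JDK's ISO_8859_1 according to Charset.equals(), yet it encodes
differently, which is why a de/encoder cache keyed on the canonical name
would have to pick one of the two.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class HomeMadeCharsetDemo {

    // Hypothetical home-made charset: it reuses the canonical name
    // "ISO-8859-1" but delegates to the US-ASCII coders.
    public static final class HomeMadeLatin1 extends Charset {
        public HomeMadeLatin1() {
            super("ISO-8859-1", null); // null means "no aliases"
        }

        @Override
        public boolean contains(Charset cs) {
            return StandardCharsets.US_ASCII.contains(cs);
        }

        @Override
        public CharsetDecoder newDecoder() {
            return StandardCharsets.US_ASCII.newDecoder();
        }

        @Override
        public CharsetEncoder newEncoder() {
            return StandardCharsets.US_ASCII.newEncoder();
        }
    }

    public static void main(String[] args) {
        Charset jdk = StandardCharsets.ISO_8859_1;
        Charset homeMade = new HomeMadeLatin1();
        String s = "\u00e9"; // LATIN SMALL LETTER E WITH ACUTE

        // Equality and hashCode are based on the canonical name only.
        System.out.println(homeMade.equals(jdk));                   // true
        System.out.println(homeMade.hashCode() == jdk.hashCode());  // true

        // The encoded bytes differ: 0xE9 versus the '?' replacement.
        System.out.println(Arrays.toString(s.getBytes(jdk)));       // [-23]
        System.out.println(Arrays.toString(s.getBytes(homeMade)));  // [63]
    }
}
```

With the name-based lookup, getBytes("ISO-8859-1") always resolves to the
JDK's own charset, so this ambiguity only arises for the getBytes(Charset)
form.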