Codereview request: CR 7040220 java/char_encodin Optimize UTF-8 charset for String.getBytes()/toCharArray()

Xueming Shen xueming.shen at
Thu Apr 28 06:34:27 UTC 2011


This is motivated by Neil's request to optimize the common-case UTF-8
path for native ZipFile.getEntry calls [1]. As I said in my reply [2],
I believe a better approach might be to "patch" the UTF-8 charset
directly to implement the sun.nio.cs.ArrayDecoder/ArrayEncoder
interfaces, which speed up array-based encoding/decoding under certain
circumstances, as we did for all single-byte charsets in #6636323 [3].
I have an old blog post [4] with some data on this optimization.
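To make the idea concrete, here is a minimal sketch of array-based UTF-8 coding with an ASCII fast path. It is an illustration under stated assumptions, not the patch under review: SketchArrayDecoder, SketchArrayEncoder and Utf8ArraySketch are hypothetical names that only mirror the idea behind sun.nio.cs.ArrayDecoder/ArrayEncoder (coding directly between arrays, without ByteBuffer/CharBuffer wrappers), and everything from the first non-ASCII unit onward is simply handed to the standard JDK coder.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical, modeled on the idea of sun.nio.cs.ArrayDecoder; NOT the JDK's internal API. */
interface SketchArrayDecoder {
    /** Decodes src[sp, sp+len) into dst (capacity >= len); returns the number of chars written. */
    int decode(byte[] src, int sp, int len, char[] dst);
}

/** Hypothetical, modeled on the idea of sun.nio.cs.ArrayEncoder; NOT the JDK's internal API. */
interface SketchArrayEncoder {
    /** Encodes src[sp, sp+len) into dst (capacity >= 3*len); returns the number of bytes written. */
    int encode(char[] src, int sp, int len, byte[] dst);
}

/** UTF-8 sketch: tight ASCII loop first, standard JDK coder for the remainder. */
final class Utf8ArraySketch implements SketchArrayDecoder, SketchArrayEncoder {
    @Override
    public int decode(byte[] src, int sp, int len, char[] dst) {
        int end = sp + len;
        int dp = 0;
        // Bytes 0x00..0x7F are complete one-byte sequences and map 1:1 to chars.
        while (sp < end && src[sp] >= 0) {
            dst[dp++] = (char) src[sp++];
        }
        if (sp < end) {
            // Remainder may hold multi-byte or malformed input: defer to the JDK decoder,
            // which replaces malformed input with U+FFFD (never more chars than bytes).
            String rest = new String(src, sp, end - sp, StandardCharsets.UTF_8);
            rest.getChars(0, rest.length(), dst, dp);
            dp += rest.length();
        }
        return dp;
    }

    @Override
    public int encode(char[] src, int sp, int len, byte[] dst) {
        int end = sp + len;
        int dp = 0;
        // Chars below U+0080 encode to a single byte with the same value.
        while (sp < end && src[sp] < 0x80) {
            dst[dp++] = (byte) src[sp++];
        }
        if (sp < end) {
            // Remainder (including any surrogate pairs) goes through the JDK encoder.
            byte[] rest = new String(src, sp, end - sp).getBytes(StandardCharsets.UTF_8);
            System.arraycopy(rest, 0, dst, dp, rest.length);
            dp += rest.length;
        }
        return dp;
    }
}
```

Splitting at the first non-ASCII unit is safe because ASCII bytes never occur inside a multi-byte UTF-8 sequence (and surrogates are never below U+0080), so the output matches coding the whole array in one go.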

The original plan was to do the same for our new UTF-8 charset [5] in
JDK7 as well, but (excuse, excuse) I was too busy to come back to this
topic until two days ago. After two days of small tweaks here and there,
and testing the corner cases I could think of, I'm happy with the result
and think it is worth sending out for a code review for JDK7, knowing we
only have a couple of days left.

The webrev is at

The tests are intended to verify that the coding results of the new
String.getBytes()/toCharArray() paths match those of the existing
implementation.
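A consistency check in that spirit might compare String's array-based path with the buffer-based Charset.encode/Charset.decode path; both replace bad input instead of throwing. This is a hypothetical sketch (Utf8ConsistencyCheck and its methods are illustrative names), not the tests from the webrev.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Compares String's coding results with the buffer-based Charset API for UTF-8. */
final class Utf8ConsistencyCheck {
    private static final Charset UTF8 = StandardCharsets.UTF_8;

    /** Reference encoding via Charset.encode (uses a CharsetEncoder with REPLACE actions). */
    static byte[] encodeViaBuffers(String s) {
        ByteBuffer bb = UTF8.encode(CharBuffer.wrap(s));
        byte[] out = new byte[bb.remaining()];
        bb.get(out);
        return out;
    }

    /** Reference decoding via Charset.decode (uses a CharsetDecoder with REPLACE actions). */
    static String decodeViaBuffers(byte[] b) {
        return UTF8.decode(ByteBuffer.wrap(b)).toString();
    }

    /** True if String.getBytes and new String(byte[], UTF_8) agree with the buffer path for s. */
    static boolean agrees(String s) {
        byte[] fast = s.getBytes(UTF8);
        if (!Arrays.equals(fast, encodeViaBuffers(s))) {
            return false;
        }
        return new String(fast, UTF8).equals(decodeViaBuffers(fast));
    }
}
```

Calling agrees() on strings that mix 1- to 4-byte characters, such as "caf\u00e9" or "\ud83d\ude00", should return true; a fuller test would also need to cover malformed byte sequences and unpaired surrogates, which this sketch does not exercise.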

The performance results of running StrCodingBenchmarkUTF8 (included in
the webrev) on my Linux box, in -client and -server mode respectively,
are at

The microbenchmark measures 1-, 2-, 3- and 4-byte UTF-8 sequences
separately, with data lengths ranging from 12 bytes to thousands.
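For a rough idea of how such inputs can be generated, the following sketch builds strings made entirely of 1-, 2-, 3- or 4-byte UTF-8 sequences at a given encoded length and times String.getBytes() on them. It is not StrCodingBenchmarkUTF8; the class name, sample code points, lengths and iteration counts are assumptions, and naive System.nanoTime() loops are easily distorted by JIT warm-up, so a proper harness such as JMH is preferable for real numbers.

```java
import java.nio.charset.StandardCharsets;

/** Rough timing of String.getBytes(UTF_8) by UTF-8 sequence width; not a rigorous benchmark. */
final class Utf8WidthBench {
    // One sample code point per UTF-8 width: U+0061 (1 byte), U+00E9 (2), U+4E2D (3), U+1F600 (4).
    static final int[] SAMPLE = { 0x61, 0xE9, 0x4E2D, 0x1F600 };

    static volatile long sink; // consumes results so the JIT cannot discard the work

    /** Builds a string whose UTF-8 form is exactly byteLen bytes of width-byte sequences. */
    static String build(int width, int byteLen) {
        if (byteLen % width != 0) {
            throw new IllegalArgumentException("byteLen must be a multiple of width");
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < byteLen / width; i++) {
            sb.appendCodePoint(SAMPLE[width - 1]);
        }
        return sb.toString();
    }

    /** Average nanoseconds per getBytes call, measured after an equally long warm-up loop. */
    static double nanosPerCall(String s, int iterations) {
        for (int i = 0; i < iterations; i++) {
            sink += s.getBytes(StandardCharsets.UTF_8).length; // warm-up
        }
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sink += s.getBytes(StandardCharsets.UTF_8).length;
        }
        return (System.nanoTime() - start) / (double) iterations;
    }

    public static void main(String[] args) {
        int[] byteLens = { 12, 120, 1200, 12000 }; // each divisible by 1, 2, 3 and 4
        for (int width = 1; width <= 4; width++) {
            for (int len : byteLens) {
                String s = build(width, len);
                System.out.printf("%d-byte seqs, %5d bytes: %.1f ns/call%n",
                        width, len, nanosPerCall(s, 10_000));
            }
        }
    }
}
```

Fixing the encoded byte length rather than the char count keeps the work per call comparable across widths; note that 4-byte sequences are two chars (a surrogate pair) each.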


