java-nio-charset-enhanced -- Milestone 4 is released

Martin Buchholz martinrb at
Fri Mar 27 21:49:24 UTC 2009

Again, Ulf, I love the sort of stuff you're doing.

I hope to be able to contribute some engineering
to your effort myself someday.

In the meantime, we need some infrastructure to guarantee that
the behavior of the charsets is completely unchanged as we optimize.
I have some code left behind at Sun to do that, i.e. compare different
JDKs w.r.t. charset compatibility.
Hopefully Sun engineers can resurrect that code and perhaps put it
into a public Mercurial repo somewhere.
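
To make the idea concrete, here is a minimal sketch of such a comparison.
It is not the code left behind at Sun; the class name CharsetDump and its
output format are made up for illustration. It decodes each of the 256
single-byte inputs with every installed charset and prints one line per
charset: name, aliases, and a hash of the decode table. Running it under
two JDKs and diffing the output would flag any charset whose single-byte
decoding or alias set changed. Multi-byte sequences and encoders are not
covered.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.util.Locale;
import java.util.TreeSet;

// Hypothetical helper, not part of any JDK test suite.
final class CharsetDump {

    /** Returns one line per byte value: "XX -> U+hhhh" or "XX -> <error>". */
    static String dump(Charset cs) {
        StringBuilder sb = new StringBuilder();
        for (int b = 0; b < 256; b++) {
            // Fresh decoder per byte so no state leaks between inputs.
            CharsetDecoder dec = cs.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT);
            sb.append(String.format(Locale.ROOT, "%02X ->", b));
            try {
                CharBuffer out = dec.decode(ByteBuffer.wrap(new byte[] {(byte) b}));
                while (out.hasRemaining()) {
                    sb.append(String.format(Locale.ROOT, " U+%04X", (int) out.get()));
                }
            } catch (CharacterCodingException e) {
                // Malformed or unmappable input for this charset.
                sb.append(" <error>");
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        for (Charset cs : Charset.availableCharsets().values()) {
            // String.hashCode is specified, so the value is comparable across JDKs.
            System.out.println(cs.name() + " " + new TreeSet<String>(cs.aliases())
                    + " " + Integer.toHexString(dump(cs).hashCode()));
        }
    }
}
```

Printing the full dump(cs) text instead of its hash shows exactly which
byte values differ once a mismatch is found.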

Another approach is to take the code in tests like my
Find{En,De} tests which compare direct
vs. regular buffers, and retarget it to compare two different jdks.
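
As a rough illustration of the direct-vs-regular comparison (a sketch only,
not the actual test code; the class DirectVsHeap and its method names are
invented here), the following decodes the same bytes once through heap
buffers and once through direct buffers and checks that the results agree.
Retargeting it to two JDKs would mean writing the decoded output to a file
on each JDK and comparing the files, instead of comparing in-process.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;

// Hypothetical helper for illustration only.
final class DirectVsHeap {

    /** Decodes input with REPLACE error actions, using heap or direct buffers. */
    static String decode(Charset cs, byte[] input, boolean direct) {
        ByteBuffer in = direct ? ByteBuffer.allocateDirect(input.length)
                               : ByteBuffer.allocate(input.length);
        in.put(input);
        in.flip();

        CharsetDecoder dec = cs.newDecoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        int cap = (int) Math.ceil(input.length * (double) dec.maxCharsPerByte()) + 1;
        // A direct CharBuffer is obtained as a view of a direct ByteBuffer.
        CharBuffer out = direct ? ByteBuffer.allocateDirect(cap * 2).asCharBuffer()
                                : CharBuffer.allocate(cap);

        CoderResult r = dec.decode(in, out, true);
        if (r.isOverflow()) throw new IllegalStateException("output buffer too small");
        r = dec.flush(out);
        if (r.isOverflow()) throw new IllegalStateException("output buffer too small");
        out.flip();
        return out.toString();
    }

    /** True if heap and direct decoding of all 256 byte values agree. */
    static boolean consistent(Charset cs) {
        byte[] all = new byte[256];
        for (int i = 0; i < 256; i++) all[i] = (byte) i;
        return decode(cs, all, false).equals(decode(cs, all, true));
    }

    public static void main(String[] args) {
        for (String name : new String[] {"US-ASCII", "ISO-8859-1", "UTF-8"}) {
            Charset cs = Charset.forName(name);
            System.out.println(name + (consistent(cs) ? " ok" : " MISMATCH"));
        }
    }
}
```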

It's too difficult to give credit to external contributors.
One problem is that the Contributed-by: line is a red flag to
lawyers and other folks that might cause the legality of the change
to be questioned without end.  Let's try to get Ulf a proper commit bit
and make sure the legal questions come to an end.


On Fri, Mar 27, 2009 at 13:29, Ulf Zibis <Ulf.Zibis at> wrote:
> Hi folks,
> milestone 4 of charset enhancement is released.
> - I reduced the jar-footprint, concerning entire single-byte needs, compared
> to original JDK 6 binaries, down to 7 %, which also should perform class
> loading, (not to forget: encoder maps are lazy initialized), even though
> there are added 21 specialized coder algorithms.
> - In this release there is only 1 class <SingleByteCharset> for all
> single-byte charsets, which reads decoder mapping + all names including
> aliases from a small data file (69..731 Bytes, average 250 Bytes). This is
> possible, because numerous charsets can inherit their mappings (256 2-byte
> chars) from each other, and empty or 1:1 ranges (especially \u0000..\u007F)
> are filled by constructor.
> - Additionally a set of 7 Decoder and 14 Encoder classes do their work,
> specially speed + memory optimised for the charsets, having diverse
> character spreading and frequency of occurrence. A special MapCalculator
> class for playing with different parameters is provided in the test package.
> - The aliases and historical names should no longer be statically and entirely
> loaded, provided and linked from StandardCharsets class. They additionally
> could be easily edited in files standard-charsets and extended-charsets (refer
> Bug Id: 6795538). If some day they are defined entirely upper-case, they
> could be omitted completely, as they are redundantly case-standardised
> existing in the FastCharsetProvider lookup maps. Determining the
> 'contains()' references by this way would be also reasonable (refer Bug Id:
> 6761481), but containment of ASCII is already calculated automatically.
> See my projects home: --->
> I believe, these techniques could also be used for most multi-byte charsets,
> especially inheriting maps to reduce entire charsets footprint.
> Outlook Milestone 5 : Final performance optimisation by dedicated inlining,
> exception catching, surrogate handling etc..
>                                   Urgently waiting for Christian Thalinger's
> optimization of "widening conversions".
> Happy Easter,
> -Ulf
> P.S.: I'm on the way, providing changesets slice by slice for OpenJDK 7.
> BTW: Is there a way to add author and/or contributor annotation in the
> sources to honour the investigation of external collaborators (almost 1 year
> in my case)?

More information about the core-libs-dev mailing list