Charset.lookupViaProviders uses new ServiceLoader instance on each miss.

Bernd Eckenfels ecki at
Tue Jan 13 00:37:55 UTC 2015

On Mon, 12 Jan 2015 15:31:24 -0800,
Xueming Shen <xueming.shen at> wrote:

> We do have map based cache in standard and extended charset provider
> implementation already. The standard version is prehashedmap based, it
> probably should be fast enough. The extended version is a treemap
> based, which probably is slow.
> I'm a little surprised that the 2-elements cache in Charset actually
> makes big difference for the prehashedmap based cache in standard
> charsets provider.

One option would be to place the cache after the prehashed fast
provider, and/or at least to check for the well-known standard charsets
(UTF-8, ISO-8859-1 and US-ASCII) before they pollute the cache.

Either approach keeps the frequently used but cheaply obtained charsets
from occupying the 2 slots.
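
To illustrate the idea, here is a minimal sketch of a two-slot cache
with a fast path in front of it. This is not the code in
java.nio.charset.Charset; the class TwoSlotCharsetCache, the method
wellKnown() and the slowLookups counter are made up for this example,
Charset.forName() merely stands in for the slow provider lookup, and
alias handling (UTF8, ASCII, ...) is left out.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only, not the JDK implementation.
final class TwoSlotCharsetCache {
    private String name1, name2;   // names held in slot 1 and slot 2
    private Charset cs1, cs2;      // charsets held in slot 1 and slot 2
    int slowLookups = 0;           // counts how often the slow path ran

    // Fast path: canonical names of the three well-known charsets only.
    static Charset wellKnown(String name) {
        if ("UTF-8".equalsIgnoreCase(name)) return StandardCharsets.UTF_8;
        if ("ISO-8859-1".equalsIgnoreCase(name)) return StandardCharsets.ISO_8859_1;
        if ("US-ASCII".equalsIgnoreCase(name)) return StandardCharsets.US_ASCII;
        return null;
    }

    Charset lookup(String name) {
        Charset cs = wellKnown(name);
        if (cs != null) return cs;          // never occupies a slot
        if (name.equals(name1)) return cs1;
        if (name.equals(name2)) {
            // Hit in slot 2: swap the slots so the entry moves to slot 1.
            String tmpName = name1; Charset tmpCs = cs1;
            name1 = name2; cs1 = cs2;
            name2 = tmpName; cs2 = tmpCs;
            return cs1;
        }
        // Miss: do the slow lookup, push slot 1 down and evict slot 2.
        slowLookups++;
        cs = Charset.forName(name);
        name2 = name1; cs2 = cs1;
        name1 = name; cs1 = cs;
        return cs;
    }
}
```

With this arrangement, code that alternates between, say, UTF-16 and
UTF-16BE while also asking for UTF-8 or US-ASCII in between keeps
hitting the two slots, because the well-known names never evict
anything.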

> Yes, we don't cache the non-builtin provider.  It might trigger
> performance hit if certain number of charsets from that provider is
> frequently looked up.

Especially because not even the ServiceLoader (and therefore the
provider) is cached. And since AbstractCharsetProvider caches in an
instance variable, it loses its memory on every lookup. This may
affect custom providers.
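
The effect can be shown with a small simulation. This is only a sketch:
CachingProvider is a made-up stand-in for a provider that caches in an
instance field (it is not AbstractCharsetProvider), and the
expensiveLoads counter exists only to make the difference visible.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical simulation, not JDK code.
final class ProviderCacheDemo {
    // Stand-in for a provider whose cache lives in an instance variable.
    static final class CachingProvider {
        static int expensiveLoads = 0;
        private final Map<String, String> cache = new HashMap<>();

        String charsetForName(String name) {
            return cache.computeIfAbsent(name, n -> {
                expensiveLoads++;           // the costly part we want to avoid
                return "charset:" + n;
            });
        }
    }

    // Performs the same lookup repeatedly and reports how many were expensive.
    static int loadsFor(Supplier<CachingProvider> providers, int lookups) {
        CachingProvider.expensiveLoads = 0;
        for (int i = 0; i < lookups; i++) {
            providers.get().charsetForName("X-Custom");
        }
        return CachingProvider.expensiveLoads;
    }

    // A fresh provider per lookup: the instance cache is always empty.
    static int freshProviderPerLookup(int lookups) {
        return loadsFor(CachingProvider::new, lookups);
    }

    // One retained provider: the instance cache survives between lookups.
    static int sharedProvider(int lookups) {
        CachingProvider shared = new CachingProvider();
        return loadsFor(() -> shared, lookups);
    }
}
```

In this sketch, five lookups through fresh providers cost five
expensive loads, while five lookups through one retained provider cost
a single load.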

BTW: I just found out that the stack trace I was showing is from Java 6,
where ExtendedCharsets is found via the ServiceLoader and
reinitialized on each lookup (hence the many HashMap.put() operations).
I just tested it: on 7 and 8 this is no longer a problem. So the
really slow re-initialization of ExtendedCharsets confused the picture.

With Java 6 the ServiceLoader returns:
provider class=sun.nio.cs.ext.ExtendedCharsets
  instance=sun.nio.cs.ext.ExtendedCharsets@5e743399
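
A listing in this format could be produced with something like the
following. This is a sketch and not the exact test program used; the
class and method names are made up, and which providers show up (if
any) depends on the Java version and the class path.

```java
import java.nio.charset.spi.CharsetProvider;
import java.util.ServiceLoader;

// Hypothetical helper that lists the charset providers visible to ServiceLoader.
final class ListCharsetProviders {
    // Prints each provider's class and instance; returns how many were found.
    static int printProviders() {
        int count = 0;
        for (CharsetProvider p : ServiceLoader.load(CharsetProvider.class)) {
            System.out.println("provider class=" + p.getClass().getName());
            System.out.println("  instance=" + p);
            count++;
        }
        return count;
    }
}
```

Calling printProviders() twice and comparing the instance hash codes
shows whether each ServiceLoader hands out a new provider object.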

This seems to affect not only custom charset providers: in my stack
traces I often see the constructor of ExtendedCharsets, which puts
some two dozen pre-initialized charsets into the map (again and again).



> -Sherman
> On 01/12/2015 02:41 PM, Martin Buchholz wrote:
> > I don't disagree about 3 elements in the cache being better than 2.
> > When I implemented this crappy cache, it was all about "2 is better
> > than 1". Yeah, maybe we could get a "real" cache someday...
> >
> > On Mon, Jan 12, 2015 at 1:26 PM, Ulf Zibis<Ulf.Zibis at>  wrote:
> >
> >> On 12.01.2015 at 20:42, Martin Buchholz wrote:
> >>
> >>> Historical notes:
> >>>
> >>> I added the two-element cache many years ago, assuming that code
> >>> that repeatedly accessed more than 2 charsets would be rare.
> >>>
> >> I doubt this assumption, see:
> >> bugdatabase/
> >>
> >> -Ulf
> >>
> >>
