Charset.lookupViaProviders uses new ServiceLoader instance on each miss.

Bernd Eckenfels ecki at
Mon Jan 12 23:57:27 UTC 2015


I typically see "ASCII", "ISO88591" and "UTF8" in use, plus the one or two
platform/filename encodings on Windows (ANSI + OEM). That makes at least 5
commonly used charsets (I am not sure whether UTF-16 (BE/LE) is also used internally).

And this assumes that none of these charsets is also requested via one of its aliases.

But of course, this does not mean they are requested frequently
and in alternation. (Maybe we should instrument that?)

The problem is that the standard charset provider is rather fast, but a hit there also evicts an entry from the cache. That's why I was thinking of a somewhat larger cache that is consulted only after the first- and second-level cache and the standard/extended charset lookups (i.e. just before ServiceLoader).
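To make the idea concrete, here is a minimal sketch of such a larger cache, assuming a simple LRU policy on top of LinkedHashMap. The class CharsetLruCache and its methods are hypothetical and not part of the JDK; Charset.forName merely stands in for the expensive provider lookup that the cache would sit in front of.

```java
import java.nio.charset.Charset;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not JDK code): a small LRU cache of charset lookups.
final class CharsetLruCache {
    private final Map<String, Charset> cache;

    CharsetLruCache(final int maxEntries) {
        // accessOrder = true keeps the least recently used entry first,
        // so removeEldestEntry drops exactly that one when the cache overflows.
        this.cache = new LinkedHashMap<String, Charset>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Charset> eldest) {
                return size() > maxEntries;
            }
        };
    }

    // Keys are the names as requested, so an alias and the canonical
    // name occupy separate entries.
    synchronized Charset lookup(String name) {
        Charset cs = cache.get(name);
        if (cs == null) {
            cs = Charset.forName(name); // stand-in for the slow lookup path
            cache.put(name, cs);
        }
        return cs;
    }

    synchronized boolean isCached(String name) {
        return cache.containsKey(name); // does not change the access order
    }

    synchronized int size() {
        return cache.size();
    }

    public static void main(String[] args) {
        CharsetLruCache c = new CharsetLruCache(5);
        System.out.println(c.lookup("UTF-8") + " cached: " + c.isCached("UTF-8"));
    }
}
```

With five or so entries, alternating between the commonly used charsets listed above would no longer push each other out; whether that is worth the extra synchronization is something the instrumentation would have to show.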

While digging through the source, I found a place where defaultCharset() -> string -> string verify -> string lookup is done:

Can this be changed to:

# if (csn == null)
#   return new StreamDecoder(in, l, Charset.defaultCharset());
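The same difference can be seen from the public API. The sketch below is only an illustration (the class DefaultCharsetReaderDemo and its methods are made up for this mail): one path turns the default charset into a name that has to be verified and looked up again, the other hands the Charset object straight to the reader.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;

// Hypothetical demo class: two ways to open a reader with the default charset.
final class DefaultCharsetReaderDemo {

    // Charset -> String -> name check -> lookup by name.
    static String decodeViaName(byte[] bytes) {
        try (Reader r = new InputStreamReader(
                new ByteArrayInputStream(bytes), Charset.defaultCharset().name())) {
            return readAll(r);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Passes the Charset object directly; no lookup by name is needed.
    static String decodeDirect(byte[] bytes) {
        try (Reader r = new InputStreamReader(
                new ByteArrayInputStream(bytes), Charset.defaultCharset())) {
            return readAll(r);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static String readAll(Reader r) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[256];
        int n;
        while ((n = r.read(buf)) != -1) {
            sb.append(buf, 0, n);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] data = "hello".getBytes(Charset.defaultCharset());
        System.out.println(decodeViaName(data).equals(decodeDirect(data)));
    }
}
```

Both variants decode identically; the second simply avoids going through the charset cache at all.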

There are also a few places where a lookup by literal name can be replaced with a static StandardCharsets.XXX constant to further reduce the number of lookups: (defaultCharset())
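As a small illustration of that replacement (the class StandardCharsetsDemo is invented for this example and does not refer to any particular JDK call site):

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: literal name lookup versus a static constant.
final class StandardCharsetsDemo {

    // Resolves the charset by name on every call, going through Charset's lookup.
    static byte[] encodeByName(String s) {
        return s.getBytes(Charset.forName("UTF-8"));
    }

    // Uses the preloaded constant; no lookup by name happens here.
    static byte[] encodeByConstant(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] a = encodeByName("charset");
        byte[] b = encodeByConstant("charset");
        System.out.println(java.util.Arrays.equals(a, b));
    }
}
```

The result is the same either way; the constant just keeps these call sites from competing for the small lookup cache.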


On Mon, 12 Jan 2015 14:41:20 -0800,
Martin Buchholz <martinrb at> wrote:

> I don't disagree about 3 elements in the cache being better than 2.
> When I implemented this crappy cache, it was all about "2 is better
> than 1". Yeah, maybe we could get a "real" cache someday...
> On Mon, Jan 12, 2015 at 1:26 PM, Ulf Zibis <Ulf.Zibis at> wrote:
> >
> > On 12.01.2015 at 20:42, Martin Buchholz wrote:
> >
> >> Historical notes:
> >>
> >> I added the two-element cache many years ago, assuming that code
> >> that repeatedly accessed more than 2 charsets would be rare.
> >>
> >
> > I doubt this assumption, see:
> > bugdatabase/
> >
> > -Ulf
> >
> >

More information about the core-libs-dev mailing list