Dismal performance of String.intern()
aleksey.shipilev at oracle.com
Wed Jun 12 18:38:20 UTC 2013
On 06/12/2013 10:27 PM, Steven Schlansker wrote:
> Thank you everyone for the valuable input!
> On Jun 11, 2013, at 1:52 AM, Aleksey Shipilev
> <aleksey.shipilev at oracle.com> wrote:
>> On 06/11/2013 12:31 PM, Remi Forax wrote:
>>> On 06/10/2013 08:06 PM, Steven Schlansker wrote: Hi Steven, the
>>> main issue is that intern() doesn't work in isolation,
>>> I think it's better to change the JSON Parser implementation to
>>> use its own cache (or not) and not rely on String.intern().
>> IMO, String.intern() is the gateway into the VM symbol table, and
>> should be regarded as such. Any improvements for String.intern()
>> should then be on the VM (native) side.
>> Also, I think most people confuse String interning and String
>> de-duplication. Using interning to improve memory footprint is
>> overkill. Smart deduplicators may carefully balance the overheads
>> of deduplication vs. the memory footprint
> Yes, maybe this is in fact the real problem here. The JavaDoc for
> String does not in any way reflect what you and the other JDK
> developers seem to assume -- that intern() is mostly a "for JVM use"
> method and is not really intended for use by end users. Maybe a
> documentation update to reflect that fact would be appropriate?
> Something indicating that the implementation is specialized for VM
> usage and is not optimal for end user code might help clear up
> confusion. Does that sound like a good idea?
Yes, that's a good idea. I'd like to hear the opinions of the real
corelibs maintainers though. Again, String.intern() performance
improvements are very much welcome, because it is still needed in many
> I understand that this is confusing the contract of the method with
> the implementation a bit. I just feel that the sentiment I get here
> ("Why would you do that? Don't use intern, just do it yourself!") is
> mismatched with the implicit fit-for-purpose I expect from core Java
> classes, and a warning might help reduce confusion.
I think in the JDK 8 timeframe, we could even make a bold move and introduce
some other auxiliary method, say, String.deduplicate(), which relaxes the
interning requirement. That would untie us from initialization races: we
could ignore deduplication while the VM is still booting, and only
then allow e.g. ConcurrentHashMap to load and plug in to deduplication.
Although, I vaguely remember the pending VM improvements for
deduplicating Strings on the fly without user hints (really, walk the
Strings during full GC), so it might be redundant.
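
To make the interning vs. deduplication distinction concrete, here is a
minimal user-level sketch of a best-effort deduplicator. Everything in it
is an assumption for illustration: the class name StringDeduplicator, the
MAX_ENTRIES bound, and the strong-reference ConcurrentHashMap are
hypothetical, and this is not the String.deduplicate() floated above nor
any existing JDK API. The point is only that a deduplicator may decline to
canonicalize (here, once the cache is full), which String.intern() is not
allowed to do.

```java
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical best-effort deduplicator; a sketch, not a JDK API. */
final class StringDeduplicator {
    // Assumed bound, chosen arbitrarily for illustration.
    private static final int MAX_ENTRIES = 10_000;

    // Assumption: strong references are acceptable; a production cache
    // would likely need eviction or weak references.
    private static final ConcurrentHashMap<String, String> CACHE =
            new ConcurrentHashMap<>();

    private StringDeduplicator() {}

    /**
     * Returns a String equal to s, possibly a previously seen instance.
     * Unlike String.intern(), no canonical identity is promised: when
     * the cache is full, the argument is returned unchanged.
     */
    static String deduplicate(String s) {
        if (s == null) {
            return null;
        }
        String cached = CACHE.get(s);
        if (cached != null) {
            return cached;
        }
        if (CACHE.size() >= MAX_ENTRIES) {
            return s; // relaxed contract: skipping deduplication is legal
        }
        String previous = CACHE.putIfAbsent(s, s);
        return previous != null ? previous : s;
    }
}
```

Callers of such a helper can rely on equals() only, never on ==, which is
exactly the requirement being relaxed relative to intern().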
More information about the core-libs-dev