RFR (M): JDK-6394757: rev1: AbstractSet.removeAll semantics are surprisingly dependent on relative sizes
stuart.marks at oracle.com
Mon May 20 23:53:48 UTC 2019
On 5/16/19 11:48 PM, Peter Levart wrote:
> As Stuart says, optimizations mustn't change semantics. So is there a possibly
> narrower optimization, that doesn't change the semantics?
> Here's a sketch of one:
> - create a marker interface (could be JDK-internal) and mark all conforming Set
> implementations with it
> - use AbstractSet.removeAll optimization only when both collections implement
> the marker interface
It's certainly reasonable for there to be some concern over the potential
performance loss. I don't particularly like the fact that even as we're
improving the semantics, that some programs will suffer a slowdown, even those
that aren't actually affected by the semantic issues.
However, I'm not convinced that we really understand the performance issue. When
this was brought up in January,  the claim was that this wasted thousands of
CPU hours. I'm skeptical. To back this up, in a followup message  the
original poster linked to a Stack Overflow question  that turns out to have
been derived from an old article by Jon Skeet . To be sure, this is an
interesting technical discussion, but it doesn't count as evidence that this is
an actual performance problem.
If you look at the example closely, it turns out that it calls removeAll() and
passes an argument a List with 300,000 elements. If you're doing operations on
large lists, maybe you shouldn't be surprised to see a slowdown. Maybe instead
you should have been surprised (in the iteration-switched case) that it ran
reasonably fast at all.
Nonetheless it remains a possibility that some code that worked properly and
performed reasonably well will suffer a slowdown. One way -- and I stress that
this is only one way -- that performance might be improved is to switch the
collection upon which iteration is done. The linked bug reports have a bunch of
suggestions for doing this, including checking to see whether the argument is a
List. To these I'd add the stipulation that, the iteration can be switched if it
can be determined that it is *safe* to do so; and the safety depends on whether
the two collections use compatible membership semantics.
Collections that use equals() are compatible with each other; keysets of
IdentityHashMap are compatible with each other; SortedSets that use identical
comparison methods are compatible with each other. But note, SortedSets that use
a comparisons methods that are consistent with equals are also compatible with
collections that use equals(). And it seems likely to be undecidable whether two
different (!=) comparison methods are in fact compatible.
In any case it would interesting to try to chew on this notion of compatible
membership semantics for a while. It might be difficult to come up with a
complete and correct test. It's probably easier to come up with an
approximation, like ConcurrentSkipListMap's equals() method.  I'm not sure
such a discussion would be fruitful, though, until we're sure we understand how
much of a performance problem this actually is.
More information about the core-libs-dev