JDK-6982173: Small problem causing thousands of wasted CPU hours

Stuart Marks stuart.marks at oracle.com
Fri Jan 25 00:07:42 UTC 2019

On 1/23/19 12:05 PM, Jan Peter Stotz wrote:
> like many other I ran into bug JDK-698217 which is about AbstractSet.removeAll() 
> and it's "aptitude" in wasting CPU time if you accidentally hit this bug. And 
> there are hundred of developers hitting this bug every year.
> I simply don't understand why there was no progress in 8 years, on a severe 
> performance issue (a removeAll method on an efficient set that can require 
> O(n^2)!) caused by code that was written to speed-up the removeAll implementation.
> Which makes this bug worse is that it is triggered by the relative size of the 
> current set compared to the collection passed as parameter.
> Therefore for most developers this means not to use this buggy function at all 
> (once they realized how worse it is).

I wasn't aware that hundreds of developers are hitting this bug every year. I 
haven't seen any mention of it (besides in the bug database) on Twitter, on 
Reddit, on DZone, at the conferences I attend, or in several years of 
core-libs-dev emails. Well, it was mentioned on core-libs-dev once in 2011 [1] 
although that was a suggestion for improvement, not a complaint about performance.

Nonetheless this is a real problem, and if it's causing difficulties we can 
certainly take a look at it.

There is, however, more to the story. The ACTUAL problem is a semantic one; see 
JDK-6394757. [2] Briefly, consider x.removeAll(y). Depending on the relative 
sizes of x and y, this method might end up using either x's or y's definition of 
membership, which could differ from each other. (See the bug report for an 
example.) Thus the semantics of this method depend upon the relative sizes of 
the collections, which is arguably flawed.

Worse, this behavior is specified to iterate this set or the argument, depending 
upon their relative sizes. [3] So, fixing this will require an incompatible 
specification change.

The obvious way to fix this is to get rid of the "optimizations" (that turn out 
not to be optimizations at all in some cases) and replace it with a simple loop:

     public boolean removeAll(Collection<?> c) {
         boolean modified = false;
         for (Object e : c)
             modified |= remove(e);
         return modified;

I would argue that iterating the argument and calling remove() on "this" are the 
right semantics, because you want set membership to be determined by this set, 
not by whatever collection you pass as an argument. However, I note that 
AbstractCollection.removeAll and other removeAll implementations iterate over 
"this" and do a contains() check on the argument. The revised 
AbstractSet.removeAll would be an outlier here, though it makes sense to me to 
do it this way.

Is it worth the incompatibility?


[1] http://mail.openjdk.java.net/pipermail/core-libs-dev/2011-July/007125.html

[2] https://bugs.openjdk.java.net/browse/JDK-6394757


More information about the core-libs-dev mailing list