# AbstractCollection.removeAll(Collection) and AbstractSet.removeAll(Collection)

Jeff Hain jeffhain at rocketmail.com
Wed Jul 13 20:02:07 UTC 2011

```> It doesn't work because the complexity of size() can be O(n).
> In my opinion, only the checks for emptyness are Ok.

The calls to "size()" are already there in current implementation,
and I think with the idea that in most cases they are O(1).
If they are O(n), it's of course pointless to use them, since it
makes the whole treatment at least O(max(n1,n2)) (with n1 = this.size()
and n2 = c.size()), while simply iterating on "c" without checking
sizes makes it O(n2) (considering this.remove(Object) to be in O(1) (*)).
But, then, the whole treatment remains linear at worse, and if,
as it is often the case, they are O(1), it allows to make it
O(min(n2,n1)) if "c" is also a set.
Considering sizes  therefore allows to have O(max(n1,n2))  instead
of O(n2) in rare cases, and O(min(n1,n2)) instead of O(n2) in a fair
amount of cases, which can be considered a win. Also, if size()'s
are O(n), you are most likely doing concurrency, so you are smart,
know what you do, and can make your own removeAll ;)
What seems to have been overlooked in current implementation,
is that this optimization, that works for sets, makes things actually
worse if "c" is a list, in which case it's best to always iterate on
"c", iterating on "this" resulting in a quadratic behavior.

To make things simple, and have a treatment always in O(n2),
we could just stick to iterating on "c" without checking sizes:

public boolean AbstractSet.removeAll(Collection<?> c) {
if (isEmpty() || c.isEmpty()) {
return false;
}
boolean modified = false;
for  (Iterator<?> i = c.iterator(); i.hasNext(); ) {
if (remove(i.next())) {
modified = true;
if (isEmpty()) {
break;
}
}
}
return modified;
}

(*) We could refine considering the case of tree sets.

> The overriden equals method should be no problem
> if the Implementation is an set. So it should be
> possible to exit in this case in the AbstractSet
> Implementation, isn't it?

For AbstractCollection.removeAll(Collection),
that wouldn't work, since each instance to remove
in "c" can have multiple occurrences in "this".

For  AbstractSet.removeAll(Collection),
that wouldn't work either, because if "c" is
based on "equals", and "c" is an identity set
(based on "=="), multiple elements of "this"
could be equal to a same element of "c",
and removing as many as "c" contains doesn't
mean you can't find more elements in "this"
that would be equal to elements of "c".

A best effort implementation, that should
work if collections are not sets of different
kinds, would look like this:

public boolean AbstractSet.removeAll(Collection<?> c) {
if (isEmpty() || c.isEmpty()) {
return false;
}
boolean modified = false;
final int n1 = size();
final int n2 = c.size();
final int cContainsAssumedComplexity = (c instanceof Set<?>) ? 1 : n2;
if (n2 <= n1 *  (long)cContainsAssumedComplexity) {
for (Iterator<?> i = c.iterator(); i.hasNext(); ) {
if (remove(i.next())) {
modified = true;
if (isEmpty()) {
break;
}
}
}
} else {
// Here, we know "c" is a Set
// (else, we tested "n2 <= n1 * (long)n2",
// which is always true for n1 > 0 and n2 > 0).
if (n2 != Integer.MAX_VALUE)  {
// Will stop iterating once
// we removed from "this" as
// many elements as "c" contains.
// XXX Does not work if "this" is an identity set,
// and "c" based on "equals".
int toRemove = n2;
for (Iterator<?> i = iterator(); i.hasNext(); ) {
if (c.contains(i.next())) {
i.remove();
modified =  true;
if (--toRemove == 0) {
break;
}
}
}
} else {
// "c" might contain more than Integer.MAX_VALUE
// elements, so we can't break early.
for (Iterator<?> i = iterator(); i.hasNext(); ) {
if (c.contains(i.next()))  {
i.remove();
modified = true;
}
}
}
}
return modified;
}

Jeff

________________________________
De : Rémi Forax <forax at univ-mlv.fr>
À : core-libs-dev at openjdk.java.net
Envoyé le : Mer 13 juillet 2011, 18h 27min 48s
Objet : Re: AbstractCollection.removeAll(Collection) and
AbstractSet.removeAll(Collection)

It doesn't work because the  complexity of size() can be O(n).
In my opinion, only the checks for emptyness are Ok.

Rémi

On 07/13/2011 05:40 PM, Sebastian Sickelmann wrote:
> Jeff Hain wrote:
>> Hello.
>>   I have some remarks about the methods named in the subject (JDK 6u26, JDK
>> 7b147).
>>     1) AbstractCollection.removeAll(Collection)
>>     This method could return immediately if the specified collection is
>> empty, not to iterate uselessly over all "this", or if "this" is empty, not
to
>> create an iterator for nothing.
>>     I also thought about returning as soon as as many elements as the
>> specified collection contains have been removed, but it would not behave
>>  properly with collections containing more than Integer.MAX_VALUE elements,
>> or with some overriden equals methods etc.
>  The overriden equals method should be no problem if the Implementation  is an
>set. So it should be possible to exit in this case in the  AbstractSet
>Implementation, isn't it?
>
>>     2) AbstractSet.removeAll(Collection)
>>     Let n1 be the size of "this", and n2 the size of the specified
collection.
>>     This method is in O(n1*n2) if the specified collection has a
>>contains(Object)
>> method in O(n2) (as an ArrayList does), and n1<= n2.
>>     I think a better implementation could be done by taking care of the
>> "setness" of the specified collection, and taking care of collections sizes
>> differently.
>>     Also:
>> - when iterating on the specified collection, the method  could return as
>> soon as "this" is empty, not to continue useless iterations.
>> - when iterating on "this", the method could return if the specified
>> collection is empty, not to iterate over all "this" uselessly.
>>
>>   # Current implementation:
>> public boolean removeAll(Collection<?>  c) {
>>     boolean modified = false;
>>     if (size()>  c.size()) {
>>        for (Iterator<?>  i = c.iterator(); i.hasNext(); )
>>           modified |= remove(i.next());
>>     } else {
>>        for (Iterator<?>  i = iterator(); i.hasNext(); ) {
>>           if (c.contains(i.next())) {
>>              i.remove();
>>               modified = true;
>>           }
>>        }
>>     }
>>     return modified;
>> }
>>
>>   # Alternative implementation (not tested):public boolean
>> removeAll(Collection<?> c) {
>>     // Quick check on emptinesses, to make later treatments simpler.
>>     if (isEmpty() || c.isEmpty()) {
>>        return false;
>>     }
>>     boolean modified = false;
>>     final int n1 = size();
>>     final int n2 = c.size();
>>     // If the "c" is not a set, we are pessimistic about its "contains"
>>    // method
>>    // complexity.
>>     final int  cContainsAssumedComplexity = (c instanceof Set<?>) ? 1 : n2;
>>     // If we iterate over "this", assumed complexity is O(n2).
>>     // If we iterate over "c", assumed complexity is O(n1 *
>>cContainsAssumedComplexity).
>>     // Cast to long to avoid overflow.
>>     if (n2<  n1 * (long)cContainsAssumedComplexity) {
>>        for (Iterator<?>  i = c.iterator(); i.hasNext(); ) {
>>           if (remove(i.next())) {
>>              modified = true;
>>              if (isEmpty()) {
>>                 break;
>>              }
>>           }
>>         }
>>     } else {
>>        for (Iterator<?>  i = iterator(); i.hasNext(); ) {
>>           if (c.contains(i.next())) {
>>              i.remove();
>>              modified = true;
>>           }
>>        }
>>     }
>>     return modified;
>> }
> look good to me.
>
>>
>>
>> Regards,
>>
>> Jeff
>>
```