AbstractCollection.removeAll(Collection) and AbstractSet.removeAll(Collection)
Jeff Hain
jeffhain at rocketmail.com
Fri Jul 15 19:51:05 UTC 2011
So, it looks like bug 4266874 pointed out the same performance issue I did,
and that its fix did break things as described in bug 6394757
(Why wasn't the AbstractSet implementation removed?
Not to break performances?).
A work-around for the quadratic behavior is proposed in
discussion about bug 5028425 : "foo.removeAll(new HashSet(baz));"
The fix I proposed makes use of the faulty algorithm systematically if
"c" is not a Set, whereas as it is now it only occurs if "size() > c.size()".
Anyway, in case anyone would like to override AbstractSet.removeAll(Collection)
with it, here is a n'th version of this removeAll operation, which uses possibly
O(n)
"isEmpty()" in less cases (not in the loop, and allowing "good garbage"
iterators instead),
and only checks sizes if "c" is a Set:
public boolean removeAll(Collection<?> c) {
boolean modified = false;
if (c instanceof Set<?>) {
// Not using "c.isEmpty()", which might be in O(n):
// if "c" is empty, "c.size()" should be fast even if also in O(n),
// and if "c" is not empty, we need to compute it anyway.
final int cSize = c.size();
if (cSize == 0) {
return false;
}
final int thisSize = size();
if (thisSize < cSize) {
if (thisSize == 0) {
return false;
}
for (Iterator<?> i = iterator(); i.hasNext(); ) {
if (c.contains(i.next())) {
i.remove();
modified = true;
}
}
return modified;
}
// Will iterate on "c".
} else {
if (isEmpty()) {
return false;
}
}
// Sometimes bad (bug 6394757) but often faster.
for (Iterator<?> i = c.iterator(); i.hasNext(); ) {
modified |= remove(i.next());
}
return modified;
}
Jeff
________________________________
De : Jason Mehrens <jason_mehrens at hotmail.com>
À : mike.duigou at oracle.com; jeffhain at rocketmail.com
Cc : core-libs-dev at openjdk.java.net
Envoyé le : Jeu 14 juillet 2011, 0h 19min 04s
Objet : RE: AbstractCollection.removeAll(Collection) and
AbstractSet.removeAll(Collection)
Mike,
The history is in the evaluation of
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6394757
I don't think that even adding a empty check can be considered an optimization
when dealing with two abstract things. The iterator creation here is 'good
garbage' and worst case AbstractCollection.isEmpty could be O(N).
JDKs 5 and 6 shipped with two collections (CHM views) that had O(N) isEmpty
methods.
I think the following evaluations really cover the issues regarding this:
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6519662
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=5028425
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6633605
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6250140
Jason
> Subject: Re: AbstractCollection.removeAll(Collection) and
>AbstractSet.removeAll(Collection)
> From: mike.duigou at oracle.com
> Date: Wed, 13 Jul 2011 13:34:43 -0700
> To: jeffhain at rocketmail.com
> CC: core-libs-dev at openjdk.java.net
>
> It looks to me that AbstractCollection could benefit from a c.isEmpty() check
>but that's the only optimization I currently see. It's necessary to iterate the
>target collection because values can be repeated for some collection types.
>
> I agree that AbstractSet.removeAll appears to make a poor judgement that the
>cost of this.contains() and c.contains() are similar. I'd like to know if there
>are common usage patterns where iteration over c isn't the best choice. (adding
>isEmpty() checks for both this and c of course).
>
> Josh? Martin? Doug? Any history why AbstractSet has this optimization?
>
> Mike
>
> On Jul 13 2011, at 13:02 , Jeff Hain wrote:
>
> >
> >> It doesn't work because the complexity of size() can be O(n).
> >> In my opinion, only the checks for emptyness are Ok.
> >
> > The calls to "size()" are already there in current implementation,
> > and I think with the idea that in most cases they are O(1).
> > If they are O(n), it's of course pointless to use them, since it
> > makes the whole treatment at least O(max(n1,n2)) (with n1 = this.size()
> > and n2 = c.size()), while simply iterating on "c" without checking
> > sizes makes it O(n2) (considering this.remove(Object) to be in O(1) (*)).
> > But, then, the whole treatment remains linear at worse, and if,
> > as it is often the case, they are O(1), it allows to make it
> > O(min(n2,n1)) if "c" is also a set.
> > Considering sizes therefore allows to have O(max(n1,n2)) instead
> > of O(n2) in rare cases, and O(min(n1,n2)) instead of O(n2) in a fair
> > amount of cases, which can be considered a win. Also, if size()'s
> > are O(n), you are most likely doing concurrency, so you are smart,
> > know what you do, and can make your own removeAll ;)
> > What seems to have been overlooked in current implementation,
> > is that this optimization, that works for sets, makes things actually
> > worse if "c" is a list, in which case it's best to always iterate on
> > "c", iterating on "this" resulting in a quadratic behavior.
> >
> > To make things simple, and have a treatment always in O(n2),
> > we could just stick to iterating on "c" without checking sizes:
> >
> > public boolean AbstractSet.removeAll(Collection<?> c) {
> > if (isEmpty() || c.isEmpty()) {
> > return false;
> > }
> > boolean modified = false;
> > for (Iterator<?> i = c.iterator(); i.hasNext(); ) {
> > if (remove(i.next())) {
> > modified = true;
> > if (isEmpty()) {
> > break;
> > }
> > }
> > }
> > return modified;
> > }
> >
> > (*) We could refine considering the case of tree sets.
> >
> >
> >
> >> The overriden equals method should be no problem
> >> if the Implementation is an set. So it should be
> >> possible to exit in this case in the AbstractSet
> >> Implementation, isn't it?
> >
> > For AbstractCollection.removeAll(Collection),
> > that wouldn't work, since each instance to remove
> > in "c" can have multiple occurrences in "this".
> >
> > For AbstractSet.removeAll(Collection),
> > that wouldn't work either, because if "c" is
> > based on "equals", and "c" is an identity set
> > (based on "=="), multiple elements of "this"
> > could be equal to a same element of "c",
> > and removing as many as "c" contains doesn't
> > mean you can't find more elements in "this"
> > that would be equal to elements of "c".
> >
> > A best effort implementation, that should
> > work if collections are not sets of different
> > kinds, would look like this:
> >
> > public boolean AbstractSet.removeAll(Collection<?> c) {
> > if (isEmpty() || c.isEmpty()) {
> > return false;
> > }
> > boolean modified = false;
> > final int n1 = size();
> > final int n2 = c.size();
> > final int cContainsAssumedComplexity = (c instanceof Set<?>) ? 1 : n2;
> > if (n2 <= n1 * (long)cContainsAssumedComplexity) {
> > for (Iterator<?> i = c.iterator(); i.hasNext(); ) {
> > if (remove(i.next())) {
> > modified = true;
> > if (isEmpty()) {
> > break;
> > }
> > }
> > }
> > } else {
> > // Here, we know "c" is a Set
> > // (else, we tested "n2 <= n1 * (long)n2",
> > // which is always true for n1 > 0 and n2 > 0).
> > if (n2 != Integer.MAX_VALUE) {
> > // Will stop iterating once
> > // we removed from "this" as
> > // many elements as "c" contains.
> > // XXX Does not work if "this" is an identity set,
> > // and "c" based on "equals".
> > int toRemove = n2;
> > for (Iterator<?> i = iterator(); i.hasNext(); ) {
> > if (c.contains(i.next())) {
> > i.remove();
> > modified = true;
> > if (--toRemove == 0) {
> > break;
> > }
> > }
> > }
> > } else {
> > // "c" might contain more than Integer.MAX_VALUE
> > // elements, so we can't break early.
> > for (Iterator<?> i = iterator(); i.hasNext(); ) {
> > if (c.contains(i.next())) {
> > i.remove();
> > modified = true;
> > }
> > }
> > }
> > }
> > return modified;
> > }
> >
> >
> >
> > Jeff
> >
> >
> >
> > ________________________________
> > De : Rémi Forax <forax at univ-mlv.fr>
> > À : core-libs-dev at openjdk.java.net
> > Envoyé le : Mer 13 juillet 2011, 18h 27min 48s
> > Objet : Re: AbstractCollection.removeAll(Collection) and
> > AbstractSet.removeAll(Collection)
> >
> > It doesn't work because the complexity of size() can be O(n).
> > In my opinion, only the checks for emptyness are Ok.
> >
> > Rémi
> >
> >
> > On 07/13/2011 05:40 PM, Sebastian Sickelmann wrote:
> >> Jeff Hain wrote:
> >>> Hello.
> >>> I have some remarks about the methods named in the subject (JDK 6u26, JDK
> >>> 7b147).
> >>> 1) AbstractCollection.removeAll(Collection)
> >>> This method could return immediately if the specified collection is
> >>> empty, not to iterate uselessly over all "this", or if "this" is empty, not
>
> > to
> >>> create an iterator for nothing.
> >>> I also thought about returning as soon as as many elements as the
> >>> specified collection contains have been removed, but it would not behave
> >>> properly with collections containing more than Integer.MAX_VALUE elements,
> >>> or with some overriden equals methods etc.
> >> The overriden equals method should be no problem if the Implementation is an
>
> >> set. So it should be possible to exit in this case in the AbstractSet
> >> Implementation, isn't it?
> >>
> >>> 2) AbstractSet.removeAll(Collection)
> >>> Let n1 be the size of "this", and n2 the size of the specified
> > collection.
> >>> This method is in O(n1*n2) if the specified collection has a
> >>> contains(Object)
> >>> method in O(n2) (as an ArrayList does), and n1<= n2.
> >>> I think a better implementation could be done by taking care of the
> >>> "setness" of the specified collection, and taking care of collections
sizes
> >>> differently.
> >>> Also:
> >>> - when iterating on the specified collection, the method could return as
> >>> soon as "this" is empty, not to continue useless iterations.
> >>> - when iterating on "this", the method could return if the specified
> >>> collection is empty, not to iterate over all "this" uselessly.
> >>>
> >>> # Current implementation:
> >>> public boolean removeAll(Collection<?> c) {
> >>> boolean modified = false;
> >>> if (size()> c.size()) {
> >>> for (Iterator<?> i = c.iterator(); i.hasNext(); )
> >>> modified |= remove(i.next());
> >>> } else {
> >>> for (Iterator<?> i = iterator(); i.hasNext(); ) {
> >>> if (c.contains(i.next())) {
> >>> i.remove();
> >>> modified = true;
> >>> }
> >>> }
> >>> }
> >>> return modified;
> >>> }
> >>>
> >>> # Alternative implementation (not tested):public boolean
> >>> removeAll(Collection<?> c) {
> >>> // Quick check on emptinesses, to make later treatments simpler.
> >>> if (isEmpty() || c.isEmpty()) {
> >>> return false;
> >>> }
> >>> boolean modified = false;
> >>> final int n1 = size();
> >>> final int n2 = c.size();
> >>> // If the "c" is not a set, we are pessimistic about its "contains"
> >>> // method
> >>> // complexity.
> >>> final int cContainsAssumedComplexity = (c instanceof Set<?>) ? 1 : n2;
> >>> // If we iterate over "this", assumed complexity is O(n2).
> >>> // If we iterate over "c", assumed complexity is O(n1 *
> >>> cContainsAssumedComplexity).
> >>> // Cast to long to avoid overflow.
> >>> if (n2< n1 * (long)cContainsAssumedComplexity) {
> >>> for (Iterator<?> i = c.iterator(); i.hasNext(); ) {
> >>> if (remove(i.next())) {
> >>> modified = true;
> >>> if (isEmpty()) {
> >>> break;
> >>> }
> >>> }
> >>> }
> >>> } else {
> >>> for (Iterator<?> i = iterator(); i.hasNext(); ) {
> >>> if (c.contains(i.next())) {
> >>> i.remove();
> >>> modified = true;
> >>> }
> >>> }
> >>> }
> >>> return modified;
> >>> }
> >> look good to me.
> >>
> >>>
> >>>
> >>> Regards,
> >>>
> >>> Jeff
> >>>
>
More information about the core-libs-dev
mailing list