RFR (M): JDK-6394757: rev1: AbstractSet.removeAll semantics are surprisingly dependent on relative sizes

Alan Snyder javalists at cbfiddle.com
Wed May 22 15:32:52 UTC 2019

Hi Stuart,

It seems we have one major disagreement. You believe the collections framework specification is inconsistent. I believe it is consistent. (We agree that there is room for improvement.) That disagreement is making communication more difficult.

Based on your reply, I assume the disagreement revolves around this fact:

Using public classes provided by the framework, it is possible to:

* Create an object that is an instance of Set but does not support the general contract of Set.

* Create an object that is an instance of Map but does not support the general contract of Map.

I quoted the relevant paragraphs from the specifications of these classes in my previous message. These paragraphs make it clear that the area where these objects fail to support the general contract of their interfaces is that they define membership in a way that is inconsistent with equals() and hashCode(). The specification goes to some length to explain that membership is based on equals() and hashCode() and that optimizations based on that dependence are permitted.

Obviously, there is a conflict here, but I would not say that the specification itself is inconsistent (which would imply that the specification needs to be “fixed”).

Instead, I see this as a pragmatic decision (one might say “kludge”, but that is not the point) by the designers of the framework to allow these concrete classes to be used both to create objects that conform (semantically) to their interfaces and to create objects that do not (fully) conform to their interfaces. The designers could have instead said that the non-conforming uses are erroneous and not supported, which would have been a cleaner solution. Instead, they documented the potential non-conforming cases.

I presume that the non-conforming cases are blessed (not rejected) because they are useful (the classes are described as being well defined either way) and possibly also because conformance is not generally testable.

Having an object that does not conform (semantically) to its interface is an anomaly. It introduces the possibility of unexpected behavior if one does not fully understand the divergence between the object’s behavior and the interface specification and the potential impact of that divergence. This is the sense in which these objects are “second class” or “ersatz”. It is not an overstatement or a new idea, it is the plain reading of the current specification.

If you pass one of these objects as a parameter to a method that declares the parameter to be of the interface type, then that method is free to assume that object satisfies the (general contract of the) interface. If the object does not, the method may not behave as expected. If there is a bug, it is in the caller, not the method.

removeAll() is an example of such a method. As its parameter is of type Collection, the framework specification implies that it is a correct implementation to iterate over the parameter, call hashCode() on the elements returned by the iterator, and call equals() only on the target elements with a matching hash, which is what removeAll() currently may do when the target is a HashSet. (The fact that removeAll() uses the sizes of the collections to choose an implementation may be troubling, but that does not imply that either of the two implementations are incorrect. I believe they are both correct. Using your terms, I believe that removeAll(), viewed as a binary operation, is symmetric or reversible in the current specification.)

Does this disagreement over consistency of the specification matter? Perhaps not in practical terms. Even if the current specification is consistent, it is too easy to make mistakes using the above classes. A developer must read the specification carefully and make logical deductions to understand why removeAll() might behave in an unexpected way. That is not a good thing. In addition, whether or not the disputed implementation of removalAll() is strictly a bug, your proposed change to removeAll() is still an incompatible change, both in specification and performance, and needs to be evaluated relative to the benefits of the change. (I assume a CSR is forthcoming to initiate this evaluation formally.)

By the way, I would describe the benefit of your change in practical terms as reducing the probability of programming errors. Improving consistency is vague; it is also problematic if one does not believe that the specification is currently inconsistent.


> On May 20, 2019, at 3:57 PM, Stuart Marks <stuart.marks at oracle.com> wrote:
> On 5/16/19 10:06 PM, Alan Snyder wrote:
>> Could you explain the inconsistency in the specification that affects removeAll()? I don’t see it.
> It's the assumption that the operation can be reversed without changing its semantics. This isn't true, given the existence of SortedSet et. al. This is the point of this bug report. The inconsistency exists in the collections framework specifications taken as a whole, not removeAll() in isolation.
>> I believe I already accepted the fact of ersatz Collections.
> Of course I don't know your state of mind, but it it doesn't sound to me like you've accepted them. You're calling them "ersatz" collections, which sound to me like "fake" collections. You're carving them out of "real" collections and treating them like second-class citizens. I would have thought that was an overstatement, until I read this:
>> Promoting ersatz Collections to first class citizens would be a significant incompatible change...
> My goal here is to reconcile the various parts of the collections framework specifications so that they are more self-consistent as a whole. Treating certain pieces as second-class citizens makes the problem worse, not better.
>> What I would most like to preserve is the performance advantages of using hash coding, which apparently was a goal of the original design:
>> "implementations of the various Collections Framework interfaces are free to take advantage of the specified behavior of underlying {@link Object} methods wherever the implementor deems it appropriate"
>> Promoting ersatz Collections to first class citizens would be a significant incompatible change to the specification because it invalidates this statement.
> My changes don't affect the relationship of equals() and hashCode() in the slightest.
> s'marks

More information about the core-libs-dev mailing list