Review for CR 6728865 : Improved heuristics for Collections.disjoint() [updated]
Ulf.Zibis at gmx.de
Tue Jan 4 05:06:53 UTC 2011
Am 04.01.2011 02:24, schrieb Rémi Forax:
> On 01/04/2011 02:02 AM, Ulf Zibis wrote:
>> For the same reason you could set the empty-check at 1st place. This would result in the most
>> simple (and fast) code (see my before post).
>> - in all implementations, I have seen, "isEmpty()" is equivalent to "size() == 0"
> It's semantically equivalent by definition but the complexities aren't the same.
> near line 1740
Yes, of coarse, but ...
For Collections.disjoint(c1, c2), having n1 = c1.size(), n2 = c2.size()
Cost at minimun:
- c1.isEmpty() || c2.isEmpty()
Cost at worst:
- n1 iterations over c1 for computing c1.size()
- if n1 != 0: n2 iterations over c2 for computing c2.size()
- if n1 == 0 or n2 == 0: instatiation of c1.iterator() or n1 * (iteration + c2.contains())
- if n1 > n2: (n1 - n2)/2 * (iteration + comparison)
- if n1 <= n2: nothing
Note: comparison is likely much more expensive than iteration
As the optimization against the collection's sizes only makes sense if at least 1 collection is very
small, it would be advantageous to have a size(int max) method, which internally should iterate at
most max times instead of size times.
In the meantime we could do a workaround by iterating let's say at most 5 steps on each collection
to check it's size.
Similar: Resultset.sizeIsLessThan(int n).
BTW: containsKey() and size() could be simplified by introducing doFind(V value) similar to doGet(K
key), similar for ConcurrentHashMap
> near line 400
Thanks for the additional example.
> near line 700.
ConcurrentHashMap is NOT a Collection!
>> - "cost a small amount in others" applies rarely; the only implementation, I have seen, where
>> "Set.size() can be expensive", is ConcurrentSkipListSet (are there others?).
> By example, Collections.newSetFromMap(new ConcurrentHashMap<>).
As substitute for missing ConcurrentHashSet.
> Also don't forget all Sets that map JDBC results.
JDBC results size computation could be much faster. See:
The few collections, for which "Set.size() can be expensive" holds, are from concurrent package or
wrap a JDBC result.
Can't we check that?
More information about the core-libs-dev