RFR: JDK-8205461 Create Collector which merges results of two other collectors

Stuart Marks stuart.marks at oracle.com
Tue Sep 25 00:33:04 UTC 2018

Webrev looks good.

In the CSR, I updated the webrev link to point to the latest, I set the 
fix-version to 12, and I set the scope to SE. I've marked the CSR reviewed.

The next thing is for you to mark the CSR as Finalized.



On 9/24/18 3:39 AM, Tagir Valeev wrote:
> Ok, teeing. Webrev updated:
> http://cr.openjdk.java.net/~tvaleev/webrev/8205461/r6/
> CSR updated accordingly:
> https://bugs.openjdk.java.net/browse/JDK-8209685
> With best regards,
> Tagir Valeev.
> On Fri, Sep 21, 2018 at 8:26 PM Brian Goetz <brian.goetz at oracle.com> wrote:
>> The example of ISS is a good one.  It is analogous to the question of
>> "when is it right to write a class, and when is it right to write a
>> function?"  And the answer is, of course, "it depends."  ISS was an
>> obvious grouping, but even there there was significant disagreement
>> during its design about what it should support and not (especially with
>> regard to sum-of-squares calculations), and extra work done to make it
>> extensible.  If you're writing from scratch, you might well consider
>> writing something like ISS.
>> But ... the whole motivation for having "teeing" _at all_ is that you
>> have some existing collectors you want to reuse!  It seems a little
>> silly to claim "I definitely will want to reuse two collectors, so much
>> so that we need a new method, but can't imagine ever wanting to reuse
>> three."
>> So, while I am not saying we have to solve the N-way problem now, I
>> think we'd be silly to pick a naming scheme that falls apart when we try
>> to go past two.   So I'm still at "teeing".  It works for two, and it
>> works for larger numbers as well.
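The N-way variant mentioned above can be sketched as follows. This is
only an illustration under stated assumptions: `teeingAll`,
`countAndSum` and the class name are hypothetical, no such method
exists in java.util.stream, and the sketch requires all collectors to
share one result type so the results fit in a List.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collector;
import java.util.stream.Collectors;

class TeeingAllSketch {

    // Hypothetical N-way helper; nothing like this exists in java.util.stream.
    @SuppressWarnings("unchecked")
    static <T, R> Collector<T, ?, List<R>> teeingAll(List<Collector<T, ?, R>> collectors) {
        // Treat each collector's private accumulation type as Object so the
        // per-collector states can live side by side in one list.
        List<Collector<T, Object, R>> cs = new ArrayList<>();
        for (Collector<T, ?, R> c : collectors) {
            cs.add((Collector<T, Object, R>) c);
        }
        return Collector.<T, List<Object>, List<R>>of(
                // Supplier: one fresh accumulation state per collector.
                () -> {
                    List<Object> states = new ArrayList<>();
                    for (Collector<T, Object, R> c : cs) {
                        states.add(c.supplier().get());
                    }
                    return states;
                },
                // Accumulator: feed every element to every collector.
                (states, element) -> {
                    for (int i = 0; i < cs.size(); i++) {
                        cs.get(i).accumulator().accept(states.get(i), element);
                    }
                },
                // Combiner: merge partial states position by position.
                (left, right) -> {
                    for (int i = 0; i < cs.size(); i++) {
                        left.set(i, cs.get(i).combiner().apply(left.get(i), right.get(i)));
                    }
                    return left;
                },
                // Finisher: results are positional, in collector order.
                states -> {
                    List<R> results = new ArrayList<>();
                    for (int i = 0; i < cs.size(); i++) {
                        results.add(cs.get(i).finisher().apply(states.get(i)));
                    }
                    return results;
                });
    }

    // Example use: element count and element sum in a single pass.
    static List<Long> countAndSum(List<Integer> values) {
        List<Collector<Integer, ?, Long>> collectors = List.of(
                Collectors.<Integer>counting(),
                Collectors.<Integer>summingLong(Integer::longValue));
        return values.stream().collect(teeingAll(collectors));
    }

    public static void main(String[] args) {
        System.out.println(countAndSum(List.of(1, 2, 3))); // prints [3, 6]
    }
}
```

Note that the caller has to remember that index 0 is the count and
index 1 is the sum, which is the readability concern raised elsewhere
in this thread.
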
>> On 9/16/2018 5:23 AM, Tagir Valeev wrote:
>>> Hello, Brian!
>>> Regarding more than two collectors. Some libraries definitely have
>>> analogs (e.g. [1]) which combine more than two collectors. In my
>>> opinion, combining two collectors this way is an upper limit for
>>> readable code. Especially if you are going to collect to a list, you
>>> will have a list of untyped and unnamed results which positionally
>>> correspond to the collectors. If you have more than two collectors to
>>> combine, writing a separate accumulator class with accept/combine
>>> methods and creating a collector from scratch would be much easier
>>> to read and support. A good example is IntSummaryStatistics and the
>>> corresponding summarizingInt collector. It could be emulated combining
>>> four collectors (maxBy, minBy, summingInt, counting), but having a
>>> dedicated class IntSummaryStatistics which does all four things
>>> explicitly is much better. It could be easily reused outside of Stream
>>> API context, it has well-named and well-typed accessor methods and it
>>> may contain other domain-specific methods like average(). Imagine if
>>> it were a List of four elements and you had to call summary.get(1) to
>>> get a maximum. So I think that supporting more than two collectors
>>> would encourage obscure programming.
>>> With best regards,
>>> Tagir Valeev
>>> [1] https://github.com/jOOQ/jOOL/blob/889d87c85ca57bafd4eddd78e0f7ae2804d2ee86/jOOL/src/main/java/org/jooq/lambda/tuple/Tuple.java#L1282
>>> (don't ask me why!)
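The contrast between a dedicated summary class and a positional list
can be illustrated with a short sketch. `summarizingInt` and
`IntSummaryStatistics` are the existing JDK API; the class name and
the `summarizePositionally` helper are hypothetical and exist only to
show the less readable alternative.

```java
import java.util.IntSummaryStatistics;
import java.util.List;
import java.util.stream.Collectors;

class SummarySketch {

    // Dedicated class: results are reached through named, typed accessors.
    static IntSummaryStatistics summarize(List<Integer> values) {
        return values.stream().collect(Collectors.summarizingInt(Integer::intValue));
    }

    // Hypothetical positional variant: the same numbers as an unnamed list,
    // in the order min, max, sum, count.
    static List<Long> summarizePositionally(List<Integer> values) {
        IntSummaryStatistics s = summarize(values);
        return List.of((long) s.getMin(), (long) s.getMax(), s.getSum(), s.getCount());
    }

    public static void main(String[] args) {
        List<Integer> values = List.of(3, 1, 4, 1, 5);
        System.out.println(summarize(values).getMax());           // named accessor
        System.out.println(summarizePositionally(values).get(1)); // position 1 is the maximum
    }
}
```

Both calls in `main` print the same maximum, but only the first one
says so in the code.
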
>>> On Sat, Sep 15, 2018 at 10:36 PM Brian Goetz <brian.goetz at oracle.com> wrote:
>>>> tl;dr: "Duplexing" is an OK name, though I think `teeing` is less likely
>>>> to be a name we regret, for reasons outlined below.
>>>> The behavior of this Collector is:
>>>>     - duplicate the stream into two identical streams
>>>>     - collect the two streams with two collectors, yielding two results
>>>>     - merge the two results into a single result
>>>> Obviously, a name like `duplexingAndCollectingAndThenMerging`, which,
>>>> while entirely accurate and explanatory, is "a bit" unwieldy.  So the
>>>> questions are:
>>>>     - how much can we drop and still be accurate
>>>>     - which parts are best to drop.
>>>> When we pick names, we are not just trying to pick the best name for
>>>> now, but we should imagine all the possible operations one might ever
>>>> want to do in the future (names in the JDK are forever) and make a
>>>> reasonable attempt to imagine whether this could cause confusion or
>>>> regret in the future.
>>>> To evaluate "duplexing" here (which seems the most important thing to
>>>> keep), I'd ask: is there any other reasonable way to imagine a
>>>> `duplexing` collect operation, now or in the future?
>>>> One could imagine wanting an operation that takes a stream and produces
>>>> two streams whose contents are that of the original stream.  And
>>>> "duplex" is a good name for that.  But, it is not a Collector; it would
>>>> be a stream transform, like concat.  So that doesn't seem a conflict; a
>>>> duplexing collector and a duplexing stream transform are sort of from
>>>> "different namespaces."
>>>> Can one imagine a "duplexing" Collector that doesn't do any collection?
>>>> I cannot.  Something that returns a pair of streams would not be a
>>>> Collector, but something else. So dropping AndCollecting seems justified.
>>>> What about "AndThenMerging"?  The purpose of collect is to reduce the
>>>> stream into a summary description.  Can we imagine a duplexing operation
>>>> that doesn't merge the two results, but instead just returns a tuple of
>>>> the results?  Yes, I can totally imagine this, especially once we have
>>>> value types and records, which makes returning ad-hoc tuples cheaper
>>>> (syntactically, heap-wise, CPU-wise.)  So I think this is quite a
>>>> reasonable possibility. But, I would have no problem with an overload
>>>> that didn't take a merger and returned a tuple of the results, and was
>>>> still called `duplexing`.
>>>> So I'm fine with dropping all the extra AndThisAndThat.
>>>> Finally, there's one other obvious direction we might extend this --
>>>> more than two collectors.  There's no reason why we can only do two; we
>>>> could take a (likely homogeneous) varargs of Collectors, and return a
>>>> List of results -- which itself could then be streamed into another
>>>> collector.  This actually sounds pretty useful (though I'm not
>>>> suggesting doing this right now.) And, I think it would be silly if this
>>>> were not called the same thing as the two-collector version (just as it
>>>> would be silly to have separate names for "concat two" and "concat n".)
>>>> And, this is where I think "duplexing" runs out of gas -- duplex implies
>>>> "two".  Pedantic argue-for-the-sake-of-argument folks might observe that
>>>> "tee" also has bilateral symmetry, but I think a four-way "tee" is
>>>> clearly less of an arity abuse than a four-way "duplex", and the
>>>> plumbing industry would agree:
>>>> https://www.amazon.com/Way-Tee-PVC-Fitting-Furniture/dp/B017AO2WCM
>>>> So, for these reasons, I still think "teeing" has a better balance of
>>>> being both evocative of what it does and likely to stand the test of time.
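The three-step behavior described in this message (duplicate, collect
twice, merge) can be sketched with the two-collector form. This
assumes JDK 12 or later, where the method is available as
`Collectors.teeing`; the class and the `average` helper are
hypothetical names chosen for the example.

```java
import java.util.List;
import java.util.stream.Collectors;

class TeeingSketch {

    // Hypothetical helper: averages integers by sending every element to
    // two downstream collectors and merging their results.
    static double average(List<Integer> values) {
        return values.stream().collect(Collectors.teeing(
                Collectors.summingInt(Integer::intValue), // first collector: sum
                Collectors.counting(),                    // second collector: count
                // merger: combines the two results into a single result
                (sum, count) -> count == 0 ? 0.0 : sum.doubleValue() / count));
    }

    public static void main(String[] args) {
        System.out.println(average(List.of(1, 2, 3, 4))); // prints 2.5
    }
}
```

The stream is traversed once; each element is passed to both
collectors, and the merger runs only after both have finished.
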
>>>> On 9/14/2018 1:09 PM, Stuart Marks wrote:
>>>>> First, naming. I think "duplex" as the root word wins! Using
>>>>> "duplexing" to conform to many of the other collectors is fine; so,
>>>>> "duplexing" is good.
