RFR: JDK-8205461 Create Collector which merges results of two other collectors

Tagir Valeev amaembo at gmail.com
Mon Sep 24 10:39:53 UTC 2018

Ok, teeing. Webrev updated:
CSR updated accordingly:

With best regards,
Tagir Valeev.
On Fri, Sep 21, 2018 at 8:26 PM Brian Goetz <brian.goetz at oracle.com> wrote:
> The example of ISS is a good one.  It is analogous to the question of
> "when is it right to write a class, and when is it right to write a
> function?"  And the answer is, of course, "it depends."  ISS was an
> obvious grouping, but even there there was significant disagreement
> during its design about what it should support and not (especially with
> regard to sum-of-squares calculations), and extra work done to make it
> extensible.  If you're writing from scratch, you might well consider
> writing something like ISS.
> But ... the whole motivation for having "teeing" _at all_ is that you
> have some existing collectors you want to reuse!  It seems a little
> silly to claim "I definitely will want to reuse two collectors, so much
> so that we need a new method, but can't imagine ever wanting to reuse
> three."
> So, while I am not saying we have to solve the N-way problem now, I
> think we'd be silly to pick a naming scheme that falls apart when we try
> to go past two.   So I'm still at "teeing".  It works for two, and it
> works for larger numbers as well.
> On 9/16/2018 5:23 AM, Tagir Valeev wrote:
> > Hello, Brian!
> >
> > Regarding more than two collectors. Some libraries definitely have
> > analogs (e.g. [1]) which combine more than two collectors. In my
> > opinion combining two collectors this way is an upper limit for
> > readable code. Especially if you are going to collect to the list, you
> > will have a list of untyped and unnamed results which positionally
> > correspond to the collectors. If you have more than two collectors to
> > combine, writing a separate accumulator class with accept/combine
> > methods and creating a collector from scratch would be much easier
> > to read and support. A good example is IntSummaryStatistics and the
> > corresponding summarizingInt collector. It could be emulated combining
> > four collectors (maxBy, minBy, summingInt, counting), but having a
> > dedicated class IntSummaryStatistics which does all four things
> > explicitly is much better. It could be easily reused outside of Stream
> > API context, it has well-named and well-typed accessor methods and it
> > may contain other domain-specific methods like average(). Imagine if
> > it were a List of four elements and you had to call summary.get(1) to
> > get a maximum. So I think that supporting more than two collectors
> > would encourage obscure programming.
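A minimal sketch of the point about IntSummaryStatistics, using the existing `Collectors.summarizingInt` collector. The class name `SummarySketch`, the helper `summarize`, and the sample values are hypothetical, chosen only for illustration. The named accessors are what a positional `List` of four results would lose.

```java
import java.util.IntSummaryStatistics;
import java.util.List;
import java.util.stream.Collectors;

public class SummarySketch {

    // Hypothetical helper: collects min, max, sum and count in a single pass.
    static IntSummaryStatistics summarize(List<Integer> values) {
        return values.stream().collect(Collectors.summarizingInt(Integer::intValue));
    }

    public static void main(String[] args) {
        IntSummaryStatistics stats = summarize(List.of(3, 1, 4, 1, 5));

        // Each result is reached through a named, typed accessor
        // rather than through something like summary.get(1).
        System.out.println("min     = " + stats.getMin());
        System.out.println("max     = " + stats.getMax());
        System.out.println("sum     = " + stats.getSum());
        System.out.println("count   = " + stats.getCount());
        System.out.println("average = " + stats.getAverage());
    }
}
```

The same object can also be fed directly with `stats.accept(int)` outside of a stream, which is the reuse outside the Stream API mentioned above.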
> >
> > With best regards,
> > Tagir Valeev
> >
> > [1] https://github.com/jOOQ/jOOL/blob/889d87c85ca57bafd4eddd78e0f7ae2804d2ee86/jOOL/src/main/java/org/jooq/lambda/tuple/Tuple.java#L1282
> > (don't ask me why!)
> >
> > On Sat, Sep 15, 2018 at 10:36 PM Brian Goetz <brian.goetz at oracle.com> wrote:
> >> tl;dr: "Duplexing" is an OK name, though I think `teeing` is less likely
> >> to be a name we regret, for reasons outlined below.
> >>
> >>
> >> The behavior of this Collector is:
> >>    - duplicate the stream into two identical streams
> >>    - collect the two streams with two collectors, yielding two results
> >>    - merge the two results into a single result
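The three steps above can be sketched as follows, assuming the collector is exposed as `Collectors.teeing` (the name this thread settles on, available in the JDK since version 12). The class name `TeeingSketch` and the helper `average` are hypothetical and only serve the illustration.

```java
import java.util.List;
import java.util.stream.Collectors;

public class TeeingSketch {

    // Hypothetical helper: every element is passed to both downstream
    // collectors, and the two results are merged into a single value.
    static double average(List<Integer> values) {
        return values.stream().collect(Collectors.teeing(
                Collectors.summingInt(i -> i),   // first result: the sum
                Collectors.counting(),           // second result: the element count
                (sum, count) -> count == 0 ? 0.0 : sum.doubleValue() / count));
    }

    public static void main(String[] args) {
        System.out.println(average(List.of(1, 2, 3, 4)));
    }
}
```

The merger is the only place where the two intermediate results meet, so each downstream collector can be reused unchanged.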
> >>
> >> Obviously, a name like `duplexingAndCollectingAndThenMerging`, which,
> >> while entirely accurate and explanatory, is "a bit" unwieldy.  So the
> >> questions are:
> >>    - how much can we drop and still be accurate
> >>    - which parts are best to drop.
> >>
> >> When we pick names, we are not just trying to pick the best name for
> >> now, but we should imagine all the possible operations one might ever
> >> want to do in the future (names in the JDK are forever) and make a
> >> reasonable attempt to imagine whether this could cause confusion or
> >> regret in the future.
> >>
> >> To evaluate "duplexing" here (which seems the most important thing to
> >> keep), I'd ask: is there any other reasonable way to imagine a
> >> `duplexing` collect operation, now or in the future?
> >>
> >> One could imagine wanting an operation that takes a stream and produces
> >> two streams whose contents are that of the original stream.  And
> >> "duplex" is a good name for that.  But, it is not a Collector; it would
> >> be a stream transform, like concat.  So that doesn't seem a conflict; a
> >> duplexing collector and a duplexing stream transform are sort of from
> >> "different namespaces."
> >>
> >> Can one imagine a "duplexing" Collector that doesn't do any collection?
> >> I cannot.  Something that returns a pair of streams would not be a
> >> Collector, but something else. So dropping AndCollecting seems justified.
> >>
> >> What about "AndThenMerging"?  The purpose of collect is to reduce the
> >> stream into a summary description.  Can we imagine a duplexing operation
> >> that doesn't merge the two results, but instead just returns a tuple of
> >> the results?  Yes, I can totally imagine this, especially once we have
> >> value types and records, which makes returning ad-hoc tuples cheaper
> >> (syntactically, heap-wise, CPU-wise.)  So I think this is quite a
> >> reasonable possibility. But, I would have no problem with an overload
> >> that didn't take a merger and returned a tuple of the result, and was
> >> still called `duplexing`.
> >>
> >> So I'm fine with dropping all the extra AndThisAndThat.
> >>
> >> Finally, there's one other obvious direction we might extend this --
> >> more than two collectors.  There's no reason why we can only do two; we
> >> could take a (likely homogeneous) varargs of Collectors, and return a
> >> List of results -- which itself could then be streamed into another
> >> collector.  This actually sounds pretty useful (though I'm not
> >> suggesting doing this right now.) And, I think it would be silly if this
> >> were not called the same thing as the two-collector version (just as it
> >> would be silly to have separate names for "concat two" and "concat n".)
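A rough sketch of what such an N-way variant could look like, under the assumption that all downstream collectors share one result type. Nothing here is a JDK API: `teeingAll`, `Slot`, `countAndSum`, and the class name `NWayTeeingSketch` are hypothetical names invented for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collector;
import java.util.stream.Collectors;

public class NWayTeeingSketch {

    // Pairs one downstream collector with its own mutable accumulation state.
    private static final class Slot<T, A, R> {
        private final Collector<T, A, R> collector;
        private A state;

        private Slot(Collector<T, A, R> collector) {
            this.collector = collector;
            this.state = collector.supplier().get();
        }

        static <T, A, R> Slot<T, A, R> of(Collector<T, A, R> collector) {
            return new Slot<>(collector);
        }

        void add(T element) {
            collector.accumulator().accept(state, element);
        }

        // Slots at the same position wrap the same collector, so the cast is safe.
        @SuppressWarnings("unchecked")
        void merge(Slot<T, ?, R> other) {
            state = collector.combiner().apply(state, (A) other.state);
        }

        R finish() {
            return collector.finisher().apply(state);
        }
    }

    // Hypothetical N-way analog: results come back positionally in a List.
    static <T, R> Collector<T, ?, List<R>> teeingAll(
            List<? extends Collector<T, ?, R>> downstream) {
        return Collector.<T, List<Slot<T, ?, R>>, List<R>>of(
                () -> {
                    List<Slot<T, ?, R>> slots = new ArrayList<>();
                    for (Collector<T, ?, R> c : downstream) {
                        slots.add(Slot.of(c));
                    }
                    return slots;
                },
                (slots, element) -> {
                    for (Slot<T, ?, R> slot : slots) {
                        slot.add(element);
                    }
                },
                (left, right) -> {
                    for (int i = 0; i < left.size(); i++) {
                        left.get(i).merge(right.get(i));
                    }
                    return left;
                },
                slots -> {
                    List<R> results = new ArrayList<>();
                    for (Slot<T, ?, R> slot : slots) {
                        results.add(slot.finish());
                    }
                    return results;
                });
    }

    // Hypothetical usage: element count and sum, returned as [count, sum].
    static List<Long> countAndSum(List<Integer> values) {
        Collector<Integer, ?, Long> count = Collectors.counting();
        Collector<Integer, ?, Long> sum = Collectors.summingLong(Integer::longValue);
        return values.stream().collect(teeingAll(List.of(count, sum)));
    }

    public static void main(String[] args) {
        System.out.println(countAndSum(List.of(3, 1, 4, 1, 5)));
    }
}
```

The results are only identifiable by position in the returned list, which is the readability concern raised elsewhere in this thread.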
> >>
> >> And, this is where I think "duplexing" runs out of gas -- duplex implies
> >> "two".  Pedantic argue-for-the-sake-of-argument folks might observe that
> >> "tee" also has bilateral symmetry, but I think it is hard to deny
> >> that a four-way "tee" is less of an arity abuse
> >> than a four-way "duplex", and the plumbing industry would agree:
> >>
> >> https://www.amazon.com/Way-Tee-PVC-Fitting-Furniture/dp/B017AO2WCM
> >>
> >> So, for these reasons, I still think "teeing" has a better balance of
> >> being both evocative of what it does and likely to stand the test of time.
> >>
> >>
> >>
> >>
> >> On 9/14/2018 1:09 PM, Stuart Marks wrote:
> >>> First, naming. I think "duplex" as the root word wins! Using
> >>> "duplexing" to conform to many of the other collectors is fine; so,
> >>> "duplexing" is good.

