Extending Collector to handle a post-transform

Brian Goetz brian.goetz at oracle.com
Fri Jun 14 18:42:17 PDT 2013

Browsing the Javadoc for Collectors, the merge functions (throw, 
first-wins, last-wins) seem kind of out of place, and are hardly used 
anywhere in our implementation.  If we ditched them, Collectors would be 
entirely about Collectors, which I think would be an improvement? 
They're trivial to just write inline:

   (x,y) -> x

instead of firstWinsMerger().
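
For illustration, here is a minimal sketch of the three merge policies
written inline as the third argument to Collectors.toMap. The class and
method names (InlineMergers, firstWins, lastWins, throwOnDuplicate) are
hypothetical, and the toMap overload is assumed to be the one in the
released Java 8 API, which may differ from the draft being discussed.

```java
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper class; only the lambdas passed to toMap matter here.
class InlineMergers {
    // Keys words by first letter; on a collision the earlier word is kept.
    static Map<Character, String> firstWins(String... words) {
        return Stream.of(words)
                     .collect(Collectors.toMap(w -> w.charAt(0), w -> w, (x, y) -> x));
    }

    // Same keying; on a collision the later word replaces the earlier one.
    static Map<Character, String> lastWins(String... words) {
        return Stream.of(words)
                     .collect(Collectors.toMap(w -> w.charAt(0), w -> w, (x, y) -> y));
    }

    // Same keying; any collision is reported as an error.
    static Map<Character, String> throwOnDuplicate(String... words) {
        return Stream.of(words)
                     .collect(Collectors.toMap(w -> w.charAt(0), w -> w,
                         (x, y) -> { throw new IllegalStateException("Duplicate value " + x); }));
    }
}
```

Each policy is a one-line lambda at the call site, so nothing is lost in
readability by dropping the named factory methods.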

On 6/14/2013 5:01 PM, Brian Goetz wrote:
> BTW, this notion of a parallel reduction as a quad of functions:
> (initial-result, accumulate-element, merge-result, final-transform)
> shows up in a lot of places.  Here are just two that were pointed out to
> us as we explored this feature:
> User defined aggregates in MS SQL Server:
> http://technet.microsoft.com/en-us/library/ms131051(v=sql.90).aspx
> (Thanks Erik for this pointer.)
> Ypnos: declarative, parallel structured grid programming.
> (http://doi.acm.org/10.1145/1708046.1708053), which describes a
> Haskell-hosted EDSL for parallel stencil computations:
>     Some reductions generate values of a different type to the element
>     type of a grid. A structure called a Reducer packs together a number
>     of functions for parallel reduction under reduction operators of
>     this type.
>     The mkReducer constructor builds a Reducer, taking four parameters:
>      • A function reducing an element and partially-reduced value to
>        another partially-reduced value: (a → b → b)
>      • A function combining two partially-reduced values, possibly from
>        two reduction processes on subgrids: (b → b → b)
>      • An initial partial result: b
>      • A final conversion function that converts the partial-result to a
>        final value: (b → c).
> (Thanks Guy for this pointer.)
> In addition, we got requests for this feature from the Oracle "Sumatra"
> team, which is exploring the practicality of transparently translating
> Java bulk operations to run on GPUs.  The notions from the "Ypnos" paper
> above show up all over the GPGPU literature.
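
The quad of functions described above (initial-result,
accumulate-element, merge-result, final-transform) can be sketched as a
Java Collector that averages integers. This assumes the four-function
Collector.of factory from the released Java 8 API, which may not match
the draft under discussion; the class name AveragingSketch and the
long[2] accumulator layout are illustrative choices only.

```java
import java.util.stream.Collector;

// Hypothetical example: an averaging reduction expressed as four functions.
class AveragingSketch {
    // T = Integer (elements), A = long[] holding {sum, count}, R = Double.
    static Collector<Integer, long[], Double> averaging() {
        return Collector.of(
            () -> new long[2],                                    // initial-result
            (acc, x) -> { acc[0] += x; acc[1]++; },               // accumulate-element
            (a, b) -> { a[0] += b[0]; a[1] += b[1]; return a; },  // merge-result
            acc -> acc[1] == 0 ? 0.0 : (double) acc[0] / acc[1]   // final-transform
        );
    }
}
```

The intermediate type (long[]) differs from both the element type and
the result type, which is the case the final-transform exists to
handle. Passing averaging() to Stream.collect works the same way for
sequential and parallel streams, since merge-result combines the
partial {sum, count} pairs.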
> On 6/12/2013 11:39 PM, Mike Duigou wrote:
>> On Jun 11 2013, at 10:04 , Brian Goetz wrote:
>>>> What's bad?
>>>>   - More generics in Collector signatures.  For Collectors that
>>>> don't want to export their intermediate type, they are declared as
>>>> Collector<T, ?, R>, which users may find disturbing. (The obvious
>>>> attempts to make the extra type arg go away don't work.)
>> For me this extra type parameter for the intermediary on Collector is
>> no different than the extra type param on BaseStream. Any time you
>> have a type variable that is not part of the user's generification
>> it's going to feel uncomfortable. For Collector the extra param goes
>> largely unnoticed, though, because Collector is rarely assigned. Collector
>> is mostly used as an argument, and in this case the wildcard is invisible.
>> The types (and wildcards) just flow through unobserved. This seems
>> fine and overall it's a huge benefit to handle the post-transform in
>> the Collector.
>> Mike
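
A short sketch of the point about wildcards: when a collector declared
as Collector<T, ?, R> is passed directly as an argument, the '?' never
appears in user code; it only shows up if the collector is assigned to
a variable. The class and method names below are hypothetical, and
Collectors.summingInt is assumed as in the released Java 8 API.

```java
import java.util.stream.Collector;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical example of a collector that hides its intermediate type.
class WildcardSketch {
    // The intermediate type is not exported, hence the '?'.
    static Collector<String, ?, Integer> totalLength() {
        return Collectors.summingInt(String::length);
    }

    // Common case: the collector is used as an argument; no wildcard is written.
    static int asArgument(String... words) {
        return Stream.of(words).collect(totalLength());
    }

    // Rare case: assigning the collector makes the wildcard visible.
    static int assigned(String... words) {
        Collector<String, ?, Integer> c = totalLength();
        return Stream.of(words).collect(c);
    }
}
```

Both methods compute the same result; only the second forces the caller
to spell out the extra type parameter.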
