Extending Collector to handle a post-transform
brian.goetz at oracle.com
Fri Jun 14 18:42:17 PDT 2013
Browsing the Javadoc for Collectors, the merge functions (throw,
first-wins, last-wins) seem kind of out of place, and are hardly used
anywhere in our implementation. If we ditched them, Collectors would be
entirely about Collectors, which I think would be an improvement?
They're trivial to just write inline:
(x,y) -> x
instead of firstWinsMerger().
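
For illustration only, here is a minimal sketch of what the inline form looks like at a call site. It assumes the three-argument Collectors.toMap as it later shipped in Java 8, which may differ from the draft API under discussion. The class and method names (InlineMergeExample, firstWordByInitial, lastWordByInitial) are hypothetical.

```java
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class InlineMergeExample {
    // "First wins": when two words share an initial, keep the one seen first.
    static Map<Character, String> firstWordByInitial(Stream<String> words) {
        return words.collect(Collectors.toMap(
                w -> w.charAt(0),        // key: first letter
                Function.identity(),     // value: the word itself
                (x, y) -> x));           // inline merge instead of firstWinsMerger()
    }

    // "Last wins" is the same call with the other argument returned.
    static Map<Character, String> lastWordByInitial(Stream<String> words) {
        return words.collect(Collectors.toMap(
                w -> w.charAt(0),
                Function.identity(),
                (x, y) -> y));
    }

    public static void main(String[] args) {
        System.out.println(firstWordByInitial(Stream.of("apple", "avocado", "banana")));
        System.out.println(lastWordByInitial(Stream.of("apple", "avocado", "banana")));
    }
}
```

The lambda says what happens on a key collision right at the call site, which is the argument for dropping the named helpers.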
On 6/14/2013 5:01 PM, Brian Goetz wrote:
> BTW, this notion of a parallel reduction as a quad of functions:
> (initial-result, accumulate-element, merge-result, final-transform)
> shows up in a lot of places. Here are just two that were pointed out to
> us as we explored this feature:
> User defined aggregates in MS SQL Server:
> (Thanks Erik for this pointer.)
> Ypnos: declarative, parallel structured grid programming.
> (http://doi.acm.org/10.1145/1708046.1708053), which describes a
> Haskell-hosted EDSL for parallel stencil computations:
> Some reductions generate values of a different type to the element
> type of a grid. A structure called a Reducer packs together a number
> of functions for parallel reduction under reduction operators of this
> type.
> The mkReducer constructor builds a Reducer, taking four parameters:
> • A function reducing an element and partially-reduced value to
> another partially-reduced value: (a → b → b)
> • A function combining two partially-reduced values, possibly from
> two reduction processes on subgrids: (b → b → b)
> • An initial partial result: b
> • A final conversion function that converts the partial-result to a
> final value: (b → c).
> (Thanks Guy for this pointer.)
> In addition, we got requests for this feature from the Oracle "Sumatra"
> team, which is exploring the practicality of transparently translating
> Java bulk operations to run on GPUs. The notions from the "Ypnos" paper
> above show up all over the GPGPU literature.
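
As a rough sketch of how the quad described above maps onto a collector, the following computes an average with an explicit final transform. It assumes the four-function Collector.of factory as it later shipped in Java 8. The names QuadReduction, Acc, and averaging are hypothetical and not part of any proposal in this thread.

```java
import java.util.stream.Collector;
import java.util.stream.Stream;

public class QuadReduction {
    // Mutable partial result: running sum and element count.
    static final class Acc {
        long sum;
        long count;
    }

    // The intermediate type Acc is hidden behind the wildcard.
    static Collector<Integer, ?, Double> averaging() {
        return Collector.of(
                Acc::new,                                     // initial-result
                (acc, x) -> { acc.sum += x; acc.count++; },   // accumulate-element
                (a, b) -> {                                   // merge-result
                    a.sum += b.sum;
                    a.count += b.count;
                    return a;
                },
                acc -> acc.count == 0                         // final-transform
                        ? 0.0
                        : (double) acc.sum / acc.count);
    }

    public static void main(String[] args) {
        double avg = Stream.of(1, 2, 3, 4).parallel().collect(averaging());
        System.out.println(avg);
    }
}
```

Without the final transform, the caller would get back the (sum, count) pair and have to do the division itself.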
> On 6/12/2013 11:39 PM, Mike Duigou wrote:
>> On Jun 11 2013, at 10:04 , Brian Goetz wrote:
>>>> What's bad?
>>>> - More generics in Collector signatures. For Collectors that
>>>> don't want to export their intermediate type, they are declared as
>>>> Collector<T, ?, R>, which users may find disturbing. (The obvious
>>>> attempts to make the extra type arg go away don't work.)
>> For me this extra type parameter for the intermediary on Collector is
>> no different than the extra type param on BaseStream. Any time you
>> have a type variable that is not part of the user's generification
>> it's going to feel uncomfortable. For Collector the extra param goes
>> largely unnoticed, though, because Collector is rarely assigned.
>> Collector is mostly used as an argument, and in this case the
>> wildcard is invisible.
>> The types (and wildcards) just flow through unobserved. This seems
>> fine and overall it's a huge benefit to handle the post-transform in
>> the Collector.
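
To make the point about the wildcard concrete, here is a small sketch contrasting the two usages. It assumes Collectors.collectingAndThen and Collectors.toList as they later shipped in Java 8. The class and method names (WildcardFlow, asArgument, assigned) are hypothetical.

```java
import java.util.Collections;
import java.util.List;
import java.util.stream.Collector;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class WildcardFlow {
    // Common case: the collector is passed straight to collect(), so the
    // intermediate type never appears in user code.
    static List<String> asArgument(Stream<String> s) {
        return s.collect(Collectors.collectingAndThen(
                Collectors.toList(), Collections::unmodifiableList));
    }

    // Rare case: the collector is assigned to a variable, and the
    // wildcard for the intermediate type becomes visible.
    static List<String> assigned(Stream<String> s) {
        Collector<String, ?, List<String>> c = Collectors.collectingAndThen(
                Collectors.toList(), Collections::unmodifiableList);
        return s.collect(c);
    }

    public static void main(String[] args) {
        System.out.println(asArgument(Stream.of("a", "b")));
        System.out.println(assigned(Stream.of("a", "b")));
    }
}
```

Both methods produce the same result; only the second one forces the reader to look at Collector<String, ?, List<String>>.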
More information about the lambda-libs-spec-observers mailing list