Initial request for review of JDK-8006572 DoubleStream.sum()/average() implementations that reduce numerical errors
Joe Darcy
joe.darcy at oracle.com
Wed Nov 20 08:21:35 UTC 2013
Hi Paul,
On 11/18/2013 07:38 AM, Paul Sandoz wrote:
> Hi Joe,
>
> You can use the three arg form of collect for DoublePipeline.sum/average impls, which is already used for average:
>
> public final OptionalDouble average() {
> double[] avg = collect(() -> new double[2],
> (ll, i) -> {
> ll[0]++;
> ll[1] += i;
> },
> (ll, rr) -> {
> ll[0] += rr[0];
> ll[1] += rr[1];
> });
> return avg[0] > 0
> ? OptionalDouble.of(avg[1] / avg[0])
> : OptionalDouble.empty();
> }
>
> That would be the most expedient way.
Thanks for the tip. For the moment, I'm feeling a bit expedient and used
the array-based approach in an iteration of the change:
http://cr.openjdk.java.net/~darcy/8006572.3/
This allows one of the kernels of the compensated summation logic to be
shared in two of the use-sites.
For the test, all 6 scenarios fail on a JDK *without* the compensated
summation code and all 6 pass with it; all the other java/util/streams
regression tests pass with the new code too.
Off-list, I've asked a numerically inclined colleague to look over the
summation logic and how the compensated summation states are combined.
Thanks,
-Joe
> However, for sum() it is somewhat unfortunate to have to create a double[1] box for each leaf. If you are willing to write a little more more code you can create your own double sum ReducingSink implementation as Brian suggests.
>
>
> Testing wise you don't need to implement an Iterator, you can do the following:
>
> DoubleStream.iterate(base, e -> increment).limit(count)
>
> It might be more convenient to test as follows:
>
> Stream<Boolean> s = Stream.iterate(false, e -> true).limit(count); // [*]
> DoubleSummaryStatistics stats = s.collect(Collectors.summarizingDouble(e -> e ? increment : base)); // Cross fingers that compiles!
>
> Stream<Boolean> s = Stream.iterate(false, e -> true).limit(count);
> Double d = s.iterate(false, e -> true).limit(count)..collect(Collectors.summingDouble(e ? increment : base));
>
> Thereby covering both Collector implementations.
>
> I guess it might be harder to test the combining step using parallel streams since combining will be platform dependent (depends on #cores) unless you track how things are combined. Perhaps the Collector instance could be tested directly with combination?
>
> Paul.
>
> [*] Another way is to use stream concatenation:
>
> Stream.concat(Stream.of(false), IntStream.range(1, count).mapToObj(e -> true))
> Stream.concat(Stream.of(false), Stream.generate(() -> true)).limit(count)
>
>
> On Nov 14, 2013, at 9:08 AM, Joe Darcy <joe.darcy at oracle.com> wrote:
>
>> Hello,
>>
>> Please take an initial look over a fix for
>>
>> JDK-8006572 DoubleStream.sum() & DoubleSummaryStats implementations that reduce numerical errors
>> http://cr.openjdk.java.net/~darcy/8006572.0/
>>
>> The basic approach is to use compensated summation
>>
>> http://en.wikipedia.org/wiki/Kahan_summation_algorithm
>>
>> to computed streams-related sum and average statistics in the various locations that this can be done.
>>
>> All existing streams tests pass and new newly-written test passes too.
>>
>> I believe the DoubleSummaryStatistics.java portion, including the test, is fully review-worthy. In the test, for the sample computation in question, the naive summation implementation had a error of 500,000 ulps compared to 2 ups with the new implementation.
>>
>> Two other locations I've found where this summation technique should be used are in
>>
>> java.util.stream.Collectors.{summingDouble, averagingDouble}
>>
>> and
>>
>> java.util.stream.DoublePipeline.{sum, average}
>>
>> DoublePipeline is the primary implementation class of DoubleStream.
>>
>> For Collectors, the proposed code is a fairly clear adaptation of how the current code passes state around; there is not currently a dedicated test for the new summation technique in this location.
>>
>> I'm new to the streams API so for DoublePipeline I don't know the idiomatic way to phrase the collect I want to perform over the code. (Based on my current understanding, I believe I want to perform a collect rather than a reduce since for the compensated summation I need to maintain some additional state.) Guidance here welcome.
>>
>> Thanks,
>>
>> -Joe
More information about the core-libs-dev
mailing list