RFR: 8180352: Add Stream.toList() method

Brian Goetz brian.goetz at oracle.com
Fri Feb 5 15:27:49 UTC 2021

> I have been reading previous threads, the original bug request, and exploring the javadoc and implementation of toList() on Stream in JDK 16. I don’t want to waste time rehashing previous discussions, but I want to understand and prioritize the motivation of this change, and propose what I believe is a safer alternative name for this method based on the current implementation: Stream.toUnmodifiableList().

Big +1 to "let's not rehash previous discussions, but help us understand 
the motivation."  Stewarding the core libraries is a complex task, and 
there are rarely hard-and-fast rules for doing the Right Thing.

Your question seems to have two main aspects:

  - Why this method, why not others, and why now
  - Why take such a strong anti-mutability position with this method

The desire for a Stream::toList method has a long history; when we first 
did streams, it was one of the first convenience methods to be 
"requested".  We resisted then, for good reasons, but we knew this saga 
was not over.

"Convenience" methods are a constant challenge in the JDK.  On the one 
hand, they are, well, convenient, and we want Java to be easy and 
pleasant to program in.  On the other, the number of potentially useful 
imaginable convenience methods is infinite, and the widespread 
perception is that they are so easy that all that is needed is for 
someone to propose the idea.  (The (admittedly soft) criteria we use for 
judging whether a convenience method meets the bar are an interesting 
topic, which we can discuss separately.)

There are basically two stable points with respect to convenience 
methods in API design: zero tolerance, and "don't worry, be happy". In 
the former, the methods of an API are like a basis (ideally, an 
orthonormal one) of a vector space; the minimum number of API points 
from which you can derive all possible usages.  At the other extreme, 
every reasonable combination of methods gets its own special form of 
expression.  Of course, both are extremes (Stream::count and 
IntStream::sum are conveniences for reduce, and even Haskell's Monad has 
multiple ways to represent bind), but APIs tend to align themselves in 
one direction or another.  And, as the JDK APIs go, Streams treats 
sparsity and orthogonality as virtues to be striven for.
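For readers who want the "conveniences for reduce" remark spelled out, here is a minimal Java sketch. The class and method names are made up for illustration; only standard java.util.stream calls are used. Each method computes the same result as the built-in convenience it mirrors.

```java
import java.util.List;
import java.util.stream.IntStream;

// Illustrative only: count() and sum() re-derived from reduce().
class ReduceConveniences {

    // Equivalent to stream().count(): map every element to 1L, then add.
    static long countViaReduce(List<String> items) {
        return items.stream().map(s -> 1L).reduce(0L, Long::sum);
    }

    // Equivalent to IntStream.sum(): identity 0, combine with integer addition.
    static int sumViaReduce(int[] values) {
        return IntStream.of(values).reduce(0, Integer::sum);
    }

    public static void main(String[] args) {
        List<String> words = List.of("a", "b", "c");
        int[] nums = {1, 2, 3, 4};
        System.out.println(countViaReduce(words) == words.stream().count()); // true
        System.out.println(sumViaReduce(nums) == IntStream.of(nums).sum());   // true
    }
}
```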

Eclipse Collections chooses a different (and also valid!) philosophy: 
completeness, and it walks the walk.  (Having 81 (template-generated) 
implementations of HashMap is proof.) Similarly, Tagir's StreamEx is an 
example of an extension to Stream that takes the other approach.  And 
both are great!  But also, they are not how the JDK rolls.  Which is 
fine; it's a big ecosystem, and there's room for multiple philosophies, 
and each can find its fans and detractors.

The calls for a convenience for Stream::toList have come pretty much 
continuously since we first resisted it (but we knew even then that if 
we had a lifetime budget for just one convenience method, it would end 
up being toList.)  We knew then that there would be questions to ask 
about what the ideal dial settings would be for toList, and were not yet 
ready to confront the question, nor did we want to add fuel to the 
demands for more convenience methods ("No toSet?  Inconsistent!").

When an API is new, and all things are possible, we tend to be in 
"imagine everything we could put into it" mode, and streams was no 
different.  It is wise to resist this temptation -- and maybe even 
over-rotate in the other direction -- to allow some time for the 
spirit of what you've built to make itself clear; even creators are not 
always immediately clear on the nuances of their creation.  So we tried 
hard to resist the calls for unnecessary methods, knowing that they 
could always be added, but not taken away, and also, allowing for the 
true gaps to emerge from usage.  (The first method to be added, 
takeWhile(), was the very opposite of a convenience; it represented a 
reasonable use case that the original design didn't support.)
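To make the takeWhile() point concrete (it arrived in Java 9): on an infinite source, filter() never terminates, and limit() needs a count known up front, whereas takeWhile() stops at the first element that fails the predicate. A minimal sketch; the helper name is hypothetical.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class TakeWhileDemo {
    // Powers of two below a bound. The source is infinite, so filter()
    // alone would run forever, and limit() would need to know the count.
    static List<Integer> powersOfTwoBelow(int bound) {
        return Stream.iterate(1, n -> n * 2)
                .takeWhile(n -> n < bound) // stops at the first n >= bound
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(powersOfTwoBelow(100)); // [1, 2, 4, 8, 16, 32, 64]
    }
}
```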

So, why toList now?  Well, a number of reasons.  Collecting to a list is 
one of the most common terminal operations, so any small irritant (like 
a clumsy locution) adds up.  And, as has been pointed out, it can be 
more efficient if it is brought into the stream core rather than held at 
arm's length through Collector.  So if we're going to compromise our 
principles in one place, after thinking about it for a long time, this 
seemed a worthy candidate.  (And still, we hesitated, because we knew it 
would be firing the starting gun for the "But where's toSet?" arguments.)
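For comparison, here are the two spellings side by side. This sketch assumes JDK 16 or later for Stream::toList; the class and method names are hypothetical. Both produce a list with the same elements; they differ in the length of the locution and in what the spec promises about the returned List.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class ToListLocutions {
    // Pre-JDK 16 spelling: route the stream through a Collector.
    static List<String> viaCollector(Stream<String> s) {
        return s.map(String::toUpperCase).collect(Collectors.toList());
    }

    // JDK 16+ spelling: the terminal operation lives on Stream itself.
    static List<String> viaToList(Stream<String> s) {
        return s.map(String::toUpperCase).toList();
    }

    public static void main(String[] args) {
        System.out.println(viaCollector(Stream.of("a", "b"))); // [A, B]
        System.out.println(viaToList(Stream.of("a", "b")));    // [A, B]
    }
}
```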

So yes, there are lots of good reasons to continue to Just Say No to 
conveniences, but, there are also reasonable times to make exceptions -- 
especially when it is not purely about convenience. And, data suggests 
that toList is 5-10x more popular than the next most popular collector, 
so there's a clear argument to say that toList is pretty special, and we 
can stop there.

> List is a mutable interface.

This is true to an extent (though even the specification of List makes 
it clear the mutative methods are strictly optional), but even if it 
were absolutely true, I am still not sure how relevant it is to what 
streams should do.  When I wrote Collectors::toList, ArrayList was 
indeed the obvious default implementation choice -- but it was also 
obviously not a very good choice.  We didn't have an efficient 
unmodifiable collection at the time, and wrapping with unmodifiableList 
seemed like taxing a lot of well-behaved users for the would-be sins of 
the few.  But if we had efficient unmodifiable collections then, I would 
absolutely, positively have made that choice.
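The "strictly optional" point is easy to check: the List interface documents add, remove, set and the other mutators as optional operations that may throw UnsupportedOperationException. The probe below is a hypothetical helper (note that it really does call add when the list supports it). It shows an ArrayList accepting the call, while a List.of list (Java 9+) and a Collections.unmodifiableList wrapper both reject it.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class OptionalMutators {
    // Returns true if add() is supported (and leaves the element added),
    // false if the implementation opts out of mutation.
    static boolean supportsAdd(List<String> list) {
        try {
            list.add("x");
            return true;
        } catch (UnsupportedOperationException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(supportsAdd(new ArrayList<>()));                              // true
        System.out.println(supportsAdd(List.of("a")));                                   // false
        System.out.println(supportsAdd(Collections.unmodifiableList(new ArrayList<>()))); // false
    }
}
```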

Streams is an API that takes functional principles to heart, sometimes 
even in ways that are uncomfortable to Java developers. (For example, it 
imposes constraints on the lambdas we pass to its methods, which are the 
Java analogues of purity and side-effect freedom -- which are not 
necessarily familiar constraints.)  Data structures are about managing 
and organizing data in memory, but streams are about capturing and 
composing behavior, not data. (Obviously, streams consume and produce 
data at their extreme points, but the API tries to make the fewest possible 
assumptions about the form that data takes.)  Where Stream meets List, 
Stream is allowed to have an opinion about what kinds of lists it likes 
better, and an unmodifiable list seems far more in the spirit of 
Streams.  And of course, collect(toCollection(f)) lets you collect to 
whatever sort of collection you like.
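A sketch of that toCollection escape hatch: the caller supplies the collection factory, so the stream makes no choice at all about the target type. TreeSet and LinkedList are just example targets; the class and method names are hypothetical.

```java
import java.util.LinkedList;
import java.util.TreeSet;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class ToCollectionDemo {
    // Sorted and de-duplicated: the TreeSet, not the stream, decides that.
    static TreeSet<String> sortedUnique(Stream<String> s) {
        return s.collect(Collectors.toCollection(TreeSet::new));
    }

    // Encounter order preserved, in a LinkedList the caller can keep editing.
    static LinkedList<String> linked(Stream<String> s) {
        return s.collect(Collectors.toCollection(LinkedList::new));
    }

    public static void main(String[] args) {
        System.out.println(sortedUnique(Stream.of("pear", "apple", "pear", "fig"))); // [apple, fig, pear]
        LinkedList<String> queue = linked(Stream.of("b", "c"));
        queue.addFirst("a");
        System.out.println(queue); // [a, b, c]
    }
}
```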

> A convention was established in 2014 with Collectors.toList() returning a mutable List (ArrayList).

I am having a hard time expressing just how much I disagree with the 
sentiment behind this claim.  I knew, when I was writing 
Collectors::toList, that I would someday be having this discussion; my 
best efforts to head this discussion off were memorialized in the 
specification for Collectors::toList:

> There are no guarantees on the type, mutability, serializability, or 
> thread-safety of the `List` returned; if more control over the 
> returned `List` is required, use `toCollection(Supplier)` 
> <https://docs.oracle.com/javase/8/docs/api/java/util/stream/Collectors.html#toCollection-java.util.function.Supplier->.

I'd hope this would be interpreted as: "Dear developer who assumes that, 
just because this returns ArrayList today, that somehow it is reasonable 
to assume toList will always return an ArrayList: you are wrong, and I 
hope you have the good sense to never make this argument out loud."

The reason that this "reasonable-seeming" assumption -- that what the 
first implementation does is reasonable to take as normative, even when 
the spec says otherwise -- is so toxic is that it cripples the ability 
of the platform to evolve.  There's a reason we write specifications for 
APIs: implementations are intrinsically accidental and 
contextual, and context changes out from under us.  Even when writing 
it, I was aware of the degree to which programmers would be 
overwhelmingly tempted (despite what I hoped was their better judgment) 
to count on the mutability of the returned list if that is what they 
wanted.  Saying `toCollection(ArrayList::new)`, which guarantees exactly 
the characteristics such users would want, is Just Not That Hard.  Sure, 
saying toList() is easier, but the tradeoff there is you accept whatever 
(compliant) List the library wants to serve up, and the library gets 
some say in what that is, which might even vary from Tuesday to 
Wednesday.   A toList() method should try to balance the competing 
concerns for what is the most reasonable default, and when the JDK 
improves in a way that shifts that balance, or the context shifts, the 
JDK should be able to improve with it.
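A minimal sketch of "ask for it if you need it", assuming JDK 16+: code that mutates the result names its collection via toCollection(ArrayList::new), while code that only reads takes whatever Stream::toList hands back. Per the JDK 16 javadoc, the list returned by Stream::toList is unmodifiable, so an accidental add fails fast with UnsupportedOperationException. The class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class AskForMutability {
    // This caller mutates the result, so it asks for a mutable list by name:
    // toCollection fills exactly the ArrayList that the supplier creates.
    static ArrayList<String> withSentinel(Stream<String> s) {
        ArrayList<String> list = s.collect(Collectors.toCollection(ArrayList::new));
        list.add("END"); // relies on a specified type, not an implementation accident
        return list;
    }

    // This caller only reads, so it accepts whatever List Stream::toList serves up.
    static List<String> snapshot(Stream<String> s) {
        return s.toList(); // JDK 16+; specified to be unmodifiable
    }

    public static void main(String[] args) {
        System.out.println(withSentinel(Stream.of("a", "b"))); // [a, b, END]

        List<String> readOnly = snapshot(Stream.of("a", "b"));
        try {
            readOnly.add("END");
            System.out.println("unexpectedly mutable");
        } catch (UnsupportedOperationException e) {
            System.out.println("toList() result rejected add"); // this branch runs
        }
    }
}
```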

So, this "establish a convention" claim is dangerous because it pushes 
us towards the assumption that everything the JDK does, even the things 
it *clearly specifies as implementation details that might change*, can 
never change.  Which means we would have to be *even more deliberate* 
about anything we do, which means the rate at which we can move forward 
is *even slower*.

But, you are making an even stronger claim than that!   We're not trying 
to change the implementation of Collectors::toList (which the spec makes 
clear might happen.)  We're adding _another_ method with that 
name, somewhere else.  Which makes the above argument even more 
dangerous -- essentially, it says "don't use a word in any API ever, 
unless you are prepared to interpret it exactly the same way in all 
future contexts."  Surely, you see how this doesn't lead to a world we 
want to live in.

So, what should `Stream::toList` mean?  It should mean: return whatever 
kind of list that Streams thinks is the best all-around default 
implementation to use, based on the best understanding of what typical 
users want.  This involves balancing a lot of things, and that balance 
can move over time.

We could call this toUnmodifiableList, and there's surely a certain 
logic to that.  But, this is likely to have unintended consequences.  
First, the fact that the name is fussier makes it even less attractive 
as a convenience, which is an argument to not do it at all.  Users who 
mostly count characters (which is sadly common) would be more likely to 
continue to use collect(toList()), even if the new method is better in 
multiple ways.  If we have Stream::toUnmodifiableList, it is *even more 
likely* to generate demands for other toXxxList conveniences.  Worse, it 
would likely generate arguments for a toList that works the same as 
collect(toList()) -- which takes an existing "accidental mutability" 
problem and guarantees that problem into the infinite future.  It's bad 
enough that collect(toList()) yields a mutable list -- it would be even 
worse for Stream::toList to do the same.  Most users don't need 
mutability, and are better off not getting it if they don't need it; 
they should ask for it if they need it.

> [1] Example usages of Eclipse Collections toList:
> // toList result is mutable for all of these usages with Eclipse Collections
> List list1 = mutableSet.toList();
> List list2 = mutableSet.asLazy().toList();
> List list3 = mutableSet.asParallel(Executors.newWorkStealingPool(), 10).toList();
> List list4 = mutableSet.stream().collect(Collectors.toList());
> List list5 = mutableSet.stream().collect(Collectors2.toList());

These are nice, but there's a subtle difference here that is salient.  
Eclipse Collections attempts to integrate data management and behavioral 
composition into a single library.  This is a fine goal, but it does 
mean that the behavioral methods have more responsibility to fit with 
the data-management side of the story.

Streams took an almost opposite interpretation -- one reason NOT to do a 
Stream::toList method was that it overly coupled Streams to 
Collections.  Laundering stream-to-List via a specific collector (which 
is clearly more of a "plug in" than core functionality) seemed 
preferable.  We chose more of an arm's-length relationship between Stream 
and Collections.  Again, different philosophies. (Adding Stream::toList 
goes back on that a bit, after thinking about it for a bunch of years, 
and deciding it was OK in this case.)
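To illustrate the "plug in" nature of collectors, here is a hypothetical user-defined collector built with Collector.of. Stream supplies only the elements; everything List-specific lives in the four functions passed in. It is a sketch for illustration, not the JDK's implementation of any particular collector.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collector;
import java.util.stream.Stream;

class PluginCollector {
    // Hypothetical collector: Stream knows nothing about List here; the
    // supplier, accumulator, combiner and finisher all come from outside.
    static <T> Collector<T, ?, List<T>> toReadOnlyList() {
        return Collector.<T, List<T>, List<T>>of(
                ArrayList::new,                                        // fresh mutable buffer
                List::add,                                             // append one element
                (left, right) -> { left.addAll(right); return left; }, // merge parallel partial results
                Collections::unmodifiableList);                        // hand out a read-only view
    }

    public static void main(String[] args) {
        List<String> list = Stream.of("a", "b").collect(toReadOnlyList());
        System.out.println(list); // [a, b]
    }
}
```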

The primary cost here is a seeming "inconsistency", because people have 
been able to convince themselves that `toList()` means "to ArrayList", 
and now, there will be cases where that is not true. Given the choice 
between catering to explicitly wrong assumptions (the spec even says 
"don't make this assumption"!), and improving the platform over time, I 
choose the latter.  Consistency is a good baseline goal, but 
consistencies can be taken to foolish extremes.

More information about the core-libs-dev mailing list