Design for collections upgrades
Craig P. Motlin
cmotlin at gmail.com
Mon Mar 14 06:30:15 PDT 2011
There are two forms of lazy evaluation, and I see people using the same term
(Stream) to refer to both. In Scala they are called view and stream; I was
confused about the difference, so I asked on Stack Overflow. Basically
the difference is:
In a view elements are recomputed each time they are accessed. In a stream
elements are retained as they are evaluated.
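
A minimal Java sketch of that distinction may help. MappedView and MemoizedMap are hypothetical names invented for illustration, not Scala's or any JDK API, and the memoizing variant caches per index, which is a simplification of how a real stream retains its evaluated prefix.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

class ViewVersusStream {
    /** View-style (hypothetical): stores nothing, so every get() re-runs the function. */
    static final class MappedView<A, B> {
        private final List<A> source;
        private final Function<A, B> fn;

        MappedView(List<A> source, Function<A, B> fn) {
            this.source = source;
            this.fn = fn;
        }

        B get(int index) {
            return fn.apply(source.get(index));
        }
    }

    /** Stream-style (hypothetical): computes an element on first access, then retains it. */
    static final class MemoizedMap<A, B> {
        private final List<A> source;
        private final Function<A, B> fn;
        private final Map<Integer, B> cache = new HashMap<>();

        MemoizedMap(List<A> source, Function<A, B> fn) {
            this.source = source;
            this.fn = fn;
        }

        B get(int index) {
            if (!cache.containsKey(index)) {
                cache.put(index, fn.apply(source.get(index)));
            }
            return cache.get(index);
        }
    }

    /** Reads element 0 twice through a view and reports how often the function ran. */
    static int callsWithView() {
        AtomicInteger calls = new AtomicInteger();
        MappedView<Integer, Integer> view = new MappedView<Integer, Integer>(
                Arrays.asList(1, 2, 3), x -> { calls.incrementAndGet(); return x * 10; });
        view.get(0);
        view.get(0);
        return calls.get();
    }

    /** Same two reads through the memoizing wrapper. */
    static int callsWithMemo() {
        AtomicInteger calls = new AtomicInteger();
        MemoizedMap<Integer, Integer> memo = new MemoizedMap<Integer, Integer>(
                Arrays.asList(1, 2, 3), x -> { calls.incrementAndGet(); return x * 10; });
        memo.get(0);
        memo.get(0);
        return calls.get();
    }

    public static void main(String[] args) {
        System.out.println("view calls: " + callsWithView()); // 2
        System.out.println("memo calls: " + callsWithMemo()); // 1
    }
}
```

The view trades repeated computation for zero storage, while the memoizing form pays memory to avoid recomputation.
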
I think that Java collections ought to support both, which is another reason
I think that eager ought to be the default. See the full explanation here:
On Mon, Mar 14, 2011 at 9:25 AM, Craig P. Motlin <cmotlin at gmail.com> wrote:
> The problem is that the result of calling filter or transform on a lazy
> collection is another lazy collection, which makes them very hard to adopt.
> There's a lot of code out there that's already doing transformations like
> public List<Address> doTransform(List<Person> people) {
>     List<Address> result = new ArrayList<Address>();
>     for (Person person : people) {
>         result.add(person.getAddress());
>     }
>     return result;
> }
> Since it's public, it might be really hard to change the return type to a
> lazy collection. The same argument applies for fields. In order to use a
> lambda here, the code would become transform().asList(), which will be
> slower than the original code. And if it's slower, it will hurt adoption.
> There is one problem with covariant return types at compile time. It's easy
> for filter(), it can always be covariant. It's a little different for
> transform(). When you transform an ArrayList, you should get back another
> ArrayList by default. Sometimes people will want to collapse duplicates but
> you shouldn't have to create an ArrayList just to create a HashSet right
> afterwards. So it makes sense to have a second form of transform() and
> filter() that take a second argument, a mutable collection to dump their
> output into. HashSet.transform() should also return a HashSet by default.
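
The two-argument form described above could look roughly like the following sketch. The static helper transform and its signature are assumptions made for illustration; the message does not specify an exact API, and a real design would presumably put the method on the collection interface itself.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

class TransformInto {
    /**
     * Hypothetical helper: applies fn to each source element and adds the
     * result to the caller-supplied target, which is returned with its
     * static type preserved.
     */
    static <A, B, C extends Collection<B>> C transform(
            Iterable<A> source, Function<? super A, ? extends B> fn, C target) {
        for (A element : source) {
            target.add(fn.apply(element));
        }
        return target;
    }

    /** Collapses duplicate lengths without building an intermediate list. */
    static Set<Integer> nameLengths() {
        List<String> names = Arrays.asList("Ann", "Bob", "Craig");
        return transform(names, String::length, new HashSet<Integer>());
    }

    public static void main(String[] args) {
        System.out.println(nameLengths().size()); // 2, since "Ann" and "Bob" share length 3
    }
}
```

Because the caller picks the target, the output goes straight into the HashSet and no throwaway ArrayList is created.
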
> For TreeSet, that's impossible though. If you're starting with a
> TreeSet<Person> then it was constructed with some Comparator<Person>. You
> can't return SortedSet<Address> from transform() because there's no
> Comparator<Address>. It actually makes sense for SortedSet.transform() to
> return a List. Since SortedSet extends Set, there are two difficult choices.
> - Have Set and SortedSet both return HashSet
> - Have Set return Set and SortedSet return List but the static type
> they return is just Collection
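
The TreeSet problem can be seen in a small sketch. Person, Address, and transformToList are hypothetical names for illustration; the only library behavior relied on is that a TreeSet built without a Comparator rejects elements that are not Comparable (on current JDKs this happens on the first add).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

class SortedTransform {
    /** Hypothetical value type with no natural ordering. */
    static final class Address {
        final String city;
        Address(String city) { this.city = city; }
    }

    static final class Person {
        final String name;
        final Address address;
        Person(String name, Address address) {
            this.name = name;
            this.address = address;
        }
    }

    /** A TreeSet<Address> without a Comparator cannot order its elements. */
    static boolean sortedSetOfAddressesFails() {
        try {
            new TreeSet<Address>().add(new Address("Oslo"));
            return false;
        } catch (ClassCastException expected) {
            return true;
        }
    }

    /** Transforming a sorted set into a List keeps the source iteration order. */
    static List<Address> transformToList(SortedSet<Person> people) {
        List<Address> result = new ArrayList<Address>();
        for (Person person : people) {
            result.add(person.address);
        }
        return result;
    }

    static List<String> citiesInNameOrder() {
        Comparator<Person> byName = Comparator.comparing((Person p) -> p.name);
        SortedSet<Person> people = new TreeSet<Person>(byName);
        people.add(new Person("Zed", new Address("Athens")));
        people.add(new Person("Amy", new Address("Zurich")));
        List<String> cities = new ArrayList<String>();
        for (Address address : transformToList(people)) {
            cities.add(address.city);
        }
        return cities; // Amy sorts before Zed, so Zurich comes first
    }

    public static void main(String[] args) {
        System.out.println(sortedSetOfAddressesFails());
        System.out.println(citiesInNameOrder());
    }
}
```

A List result keeps the one property of the source that still makes sense after the transform, namely its iteration order.
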
> On Mon, Mar 14, 2011 at 4:07 AM, Peter Levart <peter.levart at marand.si> wrote:
>> On 03/10/11, Rémi Forax wrote:
>> > Le 10/03/2011 13:10, Peter Levart a écrit :
>> > > Let Iterable by default (via defender methods) be lazy but let
>> > > Collection (also via defender methods) override the same methods and
>> > > make them eager. Also provide two methods to switch between eager and
>> > > lazy on the Iterable itself (and override in Collection and subtypes).
>> > If Collection.filter overrides Iterable.filter, they should have the
>> > same semantics.
>> > And as I already said, how would toEager() be implemented?
>> Now after more discussion on the list, I'm not supporting my suggestion
>> any further. But I'm not keen on a new "Stream" type either. The original
>> fear that lazy evaluation of filtering/mapping/reducing/etc. operations on
>> collections would feel unnatural to the casual programmer, because the
>> collections library currently does not use any form of lazy evaluation,
>> is, I think, at least partially unfounded.
>> Lazy evaluation of operations on collections can be described as having two
>> subtle semantic properties:
>> 1. The evaluation of predicates, mappers, reducers, etc. happens at the
>> latest possible moment, when the resulting data is requested (iterated over,
>> collected into a proper collection, etc.)
>> 2. The original data that is the source for the operations is stored in
>> the source collection (the resulting Iterable is not detached from the
>> original data; it is a transformed view over the source collection). So
>> modifying the source collection before or in the middle of iteration over
>> the result has a consequence on the result.
>> The 2nd property is already known to users of the collections library
>> (Map.keySet(), Map.values(), ...) and therefore, if properly documented,
>> should not cause any additional confusion.
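
The second property can be illustrated with Map.keySet(), which is an existing live view, next to a lazily filtered Iterable. The filter helper below is a hypothetical sketch written for this illustration, not a proposed or existing JDK method.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.Set;
import java.util.function.Predicate;

class LiveViews {
    /** Hypothetical lazy filter: each iteration re-reads the source collection. */
    static <T> Iterable<T> filter(Iterable<T> source, Predicate<? super T> keep) {
        return () -> new Iterator<T>() {
            private final Iterator<T> it = source.iterator();
            private T pending;
            private boolean ready;

            @Override
            public boolean hasNext() {
                // Advance until a matching element is found or the source is exhausted.
                while (!ready && it.hasNext()) {
                    T candidate = it.next();
                    if (keep.test(candidate)) {
                        pending = candidate;
                        ready = true;
                    }
                }
                return ready;
            }

            @Override
            public T next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                ready = false;
                return pending;
            }
        };
    }

    /** keySet() is a view: a later put() on the map is visible through it. */
    static int keySetSizeAfterPut() {
        Map<String, Integer> ages = new HashMap<String, Integer>();
        ages.put("Ann", 30);
        Set<String> names = ages.keySet();
        ages.put("Bob", 40);
        return names.size();
    }

    /** The filtered view is created before 4 is added, yet iteration sees it. */
    static List<Integer> evensAfterAdd() {
        List<Integer> source = new ArrayList<Integer>(Arrays.asList(1, 2, 3));
        Iterable<Integer> evens = filter(source, n -> n % 2 == 0);
        source.add(4);
        List<Integer> seen = new ArrayList<Integer>();
        for (Integer n : evens) {
            seen.add(n);
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(keySetSizeAfterPut()); // 2
        System.out.println(evensAfterAdd());      // [2, 4]
    }
}
```
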
>> The 1st property is something new that everyone will have to learn since
>> more and more APIs can be expected to exhibit this behaviour after lambdas
>> are introduced to Java. Programmers will have to learn about pure functions
>> and why they are their friends...
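
The first property, and why impure functions make it visible, can be sketched as follows. The map helper is a hypothetical lazy mapper invented for this example; the counter is a deliberate side effect that exposes when the function actually runs.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

class DeferredEvaluation {
    /** Hypothetical lazy map: fn runs only while the result is being iterated. */
    static <A, B> Iterable<B> map(Iterable<A> source, Function<? super A, ? extends B> fn) {
        return () -> {
            Iterator<A> it = source.iterator();
            return new Iterator<B>() {
                @Override
                public boolean hasNext() {
                    return it.hasNext();
                }

                @Override
                public B next() {
                    return fn.apply(it.next());
                }
            };
        };
    }

    /** Returns the call count after building the view, after one pass, and after two. */
    static int[] callCounts() {
        AtomicInteger calls = new AtomicInteger();
        List<Integer> source = Arrays.asList(1, 2, 3);
        // Impure on purpose: the side effect reveals when evaluation happens.
        Iterable<Integer> doubled = map(source, n -> { calls.incrementAndGet(); return n * 2; });
        int afterBuild = calls.get();
        for (Integer ignored : doubled) { }
        int afterFirstPass = calls.get();
        for (Integer ignored : doubled) { }
        int afterSecondPass = calls.get();
        return new int[] {afterBuild, afterFirstPass, afterSecondPass};
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(callCounts())); // [0, 3, 6]
    }
}
```

With a pure function the timing and the number of evaluations would be unobservable, which is the sense in which pure functions are the programmer's friends here.
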
>> I'm now a supporter of lazy java.lang.Iterable (and by inheritance also
>> java.util.Collection). Overall it represents a simpler and more consistent
>> API with no need for an additional Stream type. It does not exhibit the
>> problems that eager evaluation does:
>> - what should the eager result type be and what implementation should be
>> chosen (ArrayList/HashSet/TreeSet)
>> - inherent sub-optimality (think of how easy it is to write: s = s +
>> "xxx"; s = s + "..."; ...instead of using StringBuilder)
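
The sub-optimality point can be made concrete with a sketch of chained eager operations, in the spirit of the string concatenation analogy. The filter and transform helpers and the allocation counter are hypothetical and exist only to make the intermediate collection visible.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;

class EagerChaining {
    /** Counts result lists created by the eager helpers below. */
    static int listsAllocated = 0;

    /** Hypothetical eager filter: always materializes a new list. */
    static <T> List<T> filter(List<T> source, Predicate<? super T> keep) {
        List<T> out = new ArrayList<T>();
        listsAllocated++;
        for (T element : source) {
            if (keep.test(element)) {
                out.add(element);
            }
        }
        return out;
    }

    /** Hypothetical eager transform: always materializes a new list. */
    static <A, B> List<B> transform(List<A> source, Function<? super A, ? extends B> fn) {
        List<B> out = new ArrayList<B>();
        listsAllocated++;
        for (A element : source) {
            out.add(fn.apply(element));
        }
        return out;
    }

    /** Easy to write, but the list of evens is built only to be thrown away. */
    static List<String> chained(List<Integer> numbers) {
        List<Integer> evens = filter(numbers, n -> n % 2 == 0);
        return transform(evens, n -> "#" + n);
    }

    /** Hand-fused equivalent: one pass, one result list. */
    static List<String> fused(List<Integer> numbers) {
        List<String> out = new ArrayList<String>();
        for (Integer n : numbers) {
            if (n % 2 == 0) {
                out.add("#" + n);
            }
        }
        return out;
    }

    /** Runs the chained form once and reports how many lists it allocated. */
    static int listsAllocatedByChained() {
        listsAllocated = 0;
        chained(Arrays.asList(1, 2, 3, 4));
        return listsAllocated;
    }

    static boolean sameResult() {
        List<Integer> numbers = Arrays.asList(1, 2, 3, 4);
        return chained(numbers).equals(fused(numbers));
    }

    public static void main(String[] args) {
        System.out.println(sameResult());              // true
        System.out.println(listsAllocatedByChained()); // 2
    }
}
```

A lazy pipeline would let the library do the fusion that the fused method does by hand.
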
>> > Rémi