RFR(m): 8177290 add copy factory methods for unmodifiable List, Set, Map

John Rose john.r.rose at oracle.com
Wed Nov 22 03:05:24 UTC 2017

On Nov 18, 2017, at 7:34 PM, John Rose <john.r.rose at oracle.com> wrote:
> On Oct 30, 2017, at 6:50 PM, Stuart Marks <stuart.marks at oracle.com> wrote:
>> (also includes 8184690: add Collectors for collecting into unmodifiable List, Set, and Map)
> Now I'm going to be picky about the names of the Collectors;
> please bear with me a moment.  Consider `toUnmodifiableList`
> and its two cousins (`toUSet`, `toUMap`).
> The most natural names `toList` (etc.) are already taken.
> They mean "do whatever is most useful for the collector,
> perhaps returning unmodifiable or modifiable results".

Let me adjust my position here, FTR.  I am now aware (thanks
Brian) that `toList` is not broken but intentionally under-specified,
pending future changes.  It is my personal hope that the future
changes will specify that the result of `toList` is safely publishable
and an unmodifiable non-view.  This issue is tracked as JDK-8180352.

(Logically possible alternatives would seem to include an otherwise
unconstrained mutable list, an ArrayList, or a continuation of the
"Chef's Choice" policy in effect today.  I suspect we could find
advocates for all of those positions, as evidenced by comments
on JDK-8180352.  I just added mine for the record.)

If at some point in the future `toList` does produce the same
kind of safe list as `List.of`, then I won't have anything to be
picky about.  Other folks can use `toCollection(ArrayList::new)`
or some other explicit opt-in for mutability.

For now, a security-conscious user like me can work with
`Collectors.toUnmodifiableList`.  That API point will (I hope)
have a short career, ending when `toList` does something at
least as useful.
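
For illustration, here is a minimal sketch of the two explicit choices
mentioned above, assuming JDK 10 or later (where
`Collectors.toUnmodifiableList` is available).  The class and method
names (`CollectorChoices`, `mutableResult`, `unmodifiableResult`,
`rejectsAdd`) are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class CollectorChoices {
    // Explicit opt-in for mutability: the caller names the container it wants.
    public static List<String> mutableResult() {
        return Stream.of("a", "b", "c")
                     .collect(Collectors.toCollection(ArrayList::new));
    }

    // Explicit request for an unmodifiable result.
    public static List<String> unmodifiableResult() {
        return Stream.of("a", "b", "c")
                     .collect(Collectors.toUnmodifiableList());
    }

    // Helper for this sketch: reports whether the list rejects add().
    public static boolean rejectsAdd(List<String> list) {
        try {
            list.add("d");
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }
}
```

Under these assumptions `rejectsAdd(unmodifiableResult())` should be true
and `rejectsAdd(mutableResult())` false, whereas plain `toList` promises
neither outcome.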

If at some point in the future `toList` switches to guarantee
the mutable ArrayList (as it seems to supply today), then the
hope for a simple and safe `toList` will be dashed, and we
will have to look for something else that is explicitly oriented
towards value-based classes, such as `Collectors.valueList`
and/or `Stream.values`.

Value-based classes are an important "attractor" for API
design, because they can be simultaneously safe and
performant, compared to Java arrays and ArrayList.

(The performance comes from the elimination of copies
under the hood, as well as structure flattening, optimizations
which potential modification makes impossible.  The safety
comes from reduction of difficult-to-find TOCTTOU attack
surfaces, above and beyond the usual claimed benefits of "FP style".)
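
The TOCTTOU point can be sketched as follows.  This is a hypothetical
example, not code from the JDK: `SnapshotDemo`, `Allowlist`, and the
"evil." host prefix are invented for illustration.  The holder validates
its input once; whether that check stays meaningful depends on whether it
kept an unmodifiable snapshot (`List.copyOf`) or the caller's mutable list.

```java
import java.util.List;

public class SnapshotDemo {
    // Hypothetical holder that checks its input once and trusts it afterwards.
    public static final class Allowlist {
        private final List<String> hosts;

        public Allowlist(List<String> hosts, boolean snapshot) {
            // Copy first, then validate the copy, so the check and the
            // later uses see the same elements.
            List<String> h = snapshot ? List.copyOf(hosts) : hosts;
            for (String s : h) {
                if (s.startsWith("evil.")) {
                    throw new IllegalArgumentException(s);
                }
            }
            this.hosts = h;
        }

        public boolean permits(String host) {
            return hosts.contains(host);
        }
    }
}
```

If the caller appends "evil.example" to its list after construction, the
non-snapshot `Allowlist` will permit it even though the check passed; the
snapshot version will not.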

When we get to Valhalla value types, there will be many
more value-based classes in the world, since a value
type cannot be anything other than a value-based class.
Whatever we do with `toList` in the future, it should take
into account value-based usages.  They are here and
more are coming.

One more point:  When programming with value-based
classes, the container is almost irrelevant, and the contents
are the whole story.  Thus, API points which return value-based
multiple values should probably not mention the container
type (List) unless there is a real danger of ambiguity.
If I have a tree node type and I want to use a value-based
List to encode its children, it should look like this:

   List<Node> children() { return this.children; }

or this:

   List<Node> children() { return List.copyOf(this.children); }

not these:

   List<Node> childrenList() ...
   List<Node> listOfChildren() ...
   List<Node> childrenUnmodifiableList() ...
   private List<Node> secretChildrenArray() ...


An API in the value-based style could say, in a class header or
package intro page, that collection values are value-based if not
otherwise specified, and then simply use unmodifiable building
blocks everywhere.
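
A fuller sketch of the tree node in that style might look like the
following.  The `label` field and the constructor shape are assumptions
added to make the example self-contained; only `children()` comes from the
snippets above.  The copy is taken once, at construction, so the accessor
can hand out the field directly.

```java
import java.util.List;

public final class Node {
    private final String label;
    private final List<Node> children;   // always an unmodifiable copy

    public Node(String label, List<Node> children) {
        this.label = label;
        // List.copyOf snapshots the argument (and rejects null elements).
        this.children = List.copyOf(children);
    }

    public String label() { return label; }

    // Plural noun, no mention of the container type in the name.
    public List<Node> children() { return children; }
}
```

Callers that need a mutable working copy can still write
`new ArrayList<>(node.children())` explicitly.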

In any case, I think the de-emphasis of container identity enabled
by value-based types provides a subtle but strong push on API
design, towards plural nouns, and away from explicit discussion
of container details.

— John

P.S. Ten years ago I designed the MethodType API, which talks 
about a single return type and multiple parameter types.  The
corresponding API points are returnType and (not parameterTypes
but) parameterList *and* parameterArray.  Because at the time
arrays and lists were both heavily used carriers for multiple values,
we had to focus on the box types (List, Array).  Also we didn't
have VBC rules yet.  In the future I hope similar API designs can
(a) ignore legacy arrays, and (b) just use VBC lists, and then
(c) be a little less noisy about how the data is carried around.
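
For readers unfamiliar with the two MethodType accessors, a small sketch
of how they behave today.  `MethodTypeCarriers` and its method names are
invented for this example; `parameterList`, `parameterArray`, and
`parameterType` are the real API points.

```java
import java.lang.invoke.MethodType;
import java.util.List;

public class MethodTypeCarriers {
    // A method type for (int, long) -> String.
    public static MethodType sample() {
        return MethodType.methodType(String.class, int.class, long.class);
    }

    // parameterArray() hands back a fresh array, so writing into it
    // leaves the MethodType itself unchanged.
    public static boolean arrayIsACopy() {
        MethodType mt = sample();
        Class<?>[] params = mt.parameterArray();
        params[0] = Object.class;
        return mt.parameterType(0) == int.class;
    }

    // parameterList() is documented to return an immutable list.
    public static boolean listRejectsWrites() {
        List<Class<?>> params = sample().parameterList();
        try {
            params.set(0, Object.class);
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }
}
```

The array form has to pay for a defensive copy on every call; the list
form does not need one.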

P.P.S. Fully retiring arrays will require dislodging them from
their favored position with varargs and similar spots in the stack.
