Updated document on data classes and sealed types
Alan Malloy
amalloy at google.com
Thu Mar 7 19:12:30 UTC 2019
I have two remarks about this proposal. The first is basically: why allow
overriding accessors? If a record is required to have a one-to-one
correspondence between its (private final) fields and its public accessors,
and is required to “give up [its] data freely to all requestors”, what
possible override could be correct? It makes sense to allow overriding the
constructor, for validation and normalization, but once the fields are
cemented in place, what could an accessor do but return its corresponding
field?
My second remark is much more long-winded, and inspired by the first. The
TL;DR version is: what about normalization and derived fields? In the
longer version below, I’ll be using Fraction as an example of a simple
class that could be a record instead, where normalization is reducing a
fraction to simplest form. However, please generalize from this: it could
apply to any record where a derived field can be computed from the provided
fields by computing a perhaps-expensive pure function on them.
I tried answering my first question by saying, “ah, we could do
normalization in the accessors instead of the constructor”. Besides Point,
one classic example of a pair type is Fraction. I can easily imagine
Fractional Fran rejoicing over the introduction of records, and saying
“time to implement Fraction as a record!” And of course a Fraction should
always be in simplest form (in particular because this is necessary for
equals() to behave well). Therefore, Fran adds a GCD call to her
constructor, so that (even without overriding them) her numerator() and
denominator() field accessors always return values that are relatively
prime, and denominator() is always positive. She publishes her Fraction
record as a library, and all is well.
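
A minimal sketch of what Fran's record might look like, assuming the record syntax that later shipped in Java 16 (the proposal under discussion may differ). The name EagerFraction and the gcd helper are illustrative, not taken from the proposal, and overflow is ignored as in the rest of this example.

```java
// Sketch only: normalization happens once, in the constructor.
record EagerFraction(int numerator, int denominator) {
    // Compact constructor: validate and normalize before the fields are assigned.
    EagerFraction {
        if (denominator == 0) {
            throw new ArithmeticException("zero denominator");
        }
        int g = gcd(Math.abs(numerator), Math.abs(denominator));
        if (denominator < 0) {
            g = -g; // dividing by a negative gcd makes the denominator positive
        }
        numerator /= g;
        denominator /= g;
    }

    // Euclid's algorithm on non-negative inputs.
    private static int gcd(int a, int b) {
        while (b != 0) {
            int t = a % b;
            a = b;
            b = t;
        }
        return a;
    }
}
```

Because the stored fields are already reduced, the generated accessors, equals() and hashCode() need no overrides: new EagerFraction(1, 2) and new EagerFraction(3, 6) hold the same field values.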
Productive Peter is happy too: he has a use case in mind for Fraction. He
has a List<Fraction>, and wants to multiply them all together. He of course
wants the final result in simplest form, but doesn’t want to waste a bunch
of time reducing intermediate results to their simplest form. He’d rather
just build the next term as new Fraction(num1*num2, denom1*denom2), and
reduce the final result at the end (assume for this example that we are not
worried about intermediate terms exceeding the size of an int, or that
Fraction#mul(Fraction) is smart enough to reduce when overflowing is the
alternative). Peter is a plausible use-case for overriding the accessor
methods: he wants a constructor that does no normalization, with accessor
functions normalizing instead. But there’s still wasted work here:
numerator() and denominator() each have to perform a GCD calculation on the
same numbers, and will even have to repeat it if someone calls them
multiple times (e.g. for equals() and hashCode() when using a Fraction as a
map key).
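
Peter's variant can be sketched the same way, again assuming the Java 16 record syntax. DeferredFraction, mul and the gcdCalls counter are hypothetical names; the counter exists only to make the repeated work visible. A non-zero denominator is assumed and overflow is ignored.

```java
// Sketch only: the constructor stores raw values; the accessors normalize.
record DeferredFraction(int numerator, int denominator) {
    // Illustrative counter of GCD computations (not part of any real API).
    static int gcdCalls = 0;

    // Multiplication builds the next term without reducing it.
    DeferredFraction mul(DeferredFraction other) {
        return new DeferredFraction(numerator * other.numerator,
                                    denominator * other.denominator);
    }

    // Each accessor reduces independently, so each pays for its own GCD.
    @Override
    public int numerator() {
        return numerator / signedGcd();
    }

    @Override
    public int denominator() {
        return denominator / signedGcd();
    }

    // GCD of the raw fields, negated when the raw denominator is negative
    // so that the reduced denominator comes out positive.
    private int signedGcd() {
        gcdCalls++;
        int a = Math.abs(numerator);
        int b = Math.abs(denominator);
        while (b != 0) {
            int t = a % b;
            a = b;
            b = t;
        }
        return denominator < 0 ? -a : a;
    }
}
```

Reading numerator() and denominator() once each costs two GCD computations, and every further call costs another. A real version would also need equals() and hashCode() that agree with the reduced values; they are omitted here.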
A Fraction library can’t satisfy both Fran and Peter. It has to choose a
place to do this normalization, or else decline to do it at all. But
declining is no solution: the class then has very sharp edges, and is
really no more useful than a Pair<Integer, Integer>.
There are two possible solutions I see to this. The first is to permit some
kind of derived-field mechanism, preferably lazy. Then, Fraction’s
constructor would save a thunk for producing the reduced form, and refer to
that thunk in the numerator() and denominator() accessors, but ignore it in
the #mul method so that we don’t pay the cost of reducing unless we want it
(here, imagine reducing a Fraction is more expensive than allocating a
thunk).
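
As a rough illustration of the lazy idea, here is a sketch written as an ordinary class, since it needs state beyond the two components. Instead of a literal thunk object it uses a cached field computed on first access, which is a simplification of what is described above. LazyFraction and the reductions counter are hypothetical names; a non-zero denominator is assumed, overflow is ignored, and the cache is not thread-safe.

```java
// Sketch only: reduce at most once, and only when a reduced value is requested.
final class LazyFraction {
    // Illustrative counter of reductions performed (not part of any real API).
    static int reductions = 0;

    private final int rawNum;
    private final int rawDen;
    private int[] reduced; // cached {numerator, denominator}; null until first needed

    LazyFraction(int numerator, int denominator) {
        this.rawNum = numerator;
        this.rawDen = denominator;
    }

    // Works on the raw values, so no reduction cost is paid here.
    LazyFraction mul(LazyFraction other) {
        return new LazyFraction(rawNum * other.rawNum, rawDen * other.rawDen);
    }

    int numerator() {
        return reduced()[0];
    }

    int denominator() {
        return reduced()[1];
    }

    // Computes the reduced form on first use and caches it.
    private int[] reduced() {
        if (reduced == null) {
            reductions++;
            int a = Math.abs(rawNum);
            int b = Math.abs(rawDen);
            while (b != 0) {
                int t = a % b;
                a = b;
                b = t;
            }
            int g = rawDen < 0 ? -a : a; // keeps the reduced denominator positive
            reduced = new int[] {rawNum / g, rawDen / g};
        }
        return reduced;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof LazyFraction
                && numerator() == ((LazyFraction) o).numerator()
                && denominator() == ((LazyFraction) o).denominator();
    }

    @Override
    public int hashCode() {
        return 31 * numerator() + denominator();
    }
}
```

Here mul never reduces, and the first accessor call on a given instance pays for one GCD that later calls reuse, including those made by equals() and hashCode().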
The second is to simply say that Fraction is a bad candidate for a record,
because it wants to decouple its interface from its implementation. I think
this is actually the right approach, but it may be unconvincing because of
how “obvious” it is that a Fraction is just a pair with some extra
calculations to perform based on its components. If we say that Fraction is
a bad record, I worry that many more bad records like it will be built, and
their subtle problems discovered only after their APIs have been published
and committed to. Further, if this is indeed a bad record, I can’t think of
any other good use case for overriding an accessor method (my first remark).
On Fri, Mar 1, 2019 at 12:28 PM Brian Goetz <brian.goetz at oracle.com> wrote:
> I've updated the document on data classes here:
>
> http://cr.openjdk.java.net/~briangoetz/amber/datum.html
>
> (older versions of the document are retained in the same directory for
> historical comparison.)
>
> While the previous version was mostly about tradeoffs, this version
> takes a much more opinionated interpretation of the feature, offering
> more examples of use cases of where it is intended to be used (and not
> used). Many of the "under consideration" flexibilities (extension,
> mutability, additional fields) have collapsed to their more restrictive
> form; while some people will be disappointed because it doesn't solve
> the worst of their boilerplate problems, our conclusion is: records are
> a powerful feature, but they're not necessarily the delivery vehicle for
> easing all the (often self-inflicted) pain of JavaBeans. We can
> continue to explore relief for these situations too as separate
> features, but trying to be all things to all classes has delayed the
> records train long enough, and I'm convinced they're separate problems
> that want separate solutions. Time to let the records train roll.
>
> I've also combined the information on sealed types in this document, as
> the two are so tightly related.
>
> Comments welcome.
>