IdentityObject & abstract superclasses
daniel.smith at oracle.com
Thu Aug 20 22:07:00 UTC 2020
> On Aug 17, 2020, at 5:44 PM, Dan Smith <daniel.smith at oracle.com> wrote:
> Alternative story:
> An inline class must not extend IdentityObject through any of its superclasses. (That's it.)
> A non-inline class implicitly implements IdentityObject if it 1) is concrete, 2) declares an instance field, or 3) has a non-empty <init> method. Abstract classes can also explicitly implement IdentityObject.
> Changing a class so that it implements IdentityObject is a binary incompatible change.
> Now we have a highly-visible concept (IdentityObject) that programmers should generally be aware of anyway, and they should readily understand a difference between abstract classes that implement IdentityObject and those that don't.
Refining some of these details. I spent some time looking at abstract classes in the JDK to figure out what the best strategy is for deciding which implement IdentityObject.
Something I want to be careful about: adding or removing 'IdentityObject' is a binary incompatible change. (Adding will break inline subclasses, removing will break clients depending on subtyping—although, FWIW, the verifier doesn't check superinterface types.) So we don't want to make it too easy to accidentally change whether a class implements IdentityObject.
First, the low-hanging fruit: concrete non-inline classes (except Object) always implement IdentityObject. An abstract class indirectly implements IdentityObject when it extends a concrete class.
In java.base, there are 479 abstract classes, and 455 that don't extend a concrete class.
In java.desktop (and dependencies other than java.base; built on macOS), there are 528 abstract classes, and 491 that don't extend a concrete class.
In jdk.compiler (and dependencies other than java.base), there are 479 abstract classes, and 459 that don't extend a concrete class.
Second: instance fields. An inline class can't have a superclass with instance fields, which makes sense because there's no mechanism for mutating superclass instance fields, during instance initialization or later in an instance method. (It would be possible to design a factory-based initialization process, compartmentalized across multiple superclasses, but we haven't done so and don't plan to.)
Inner classes have an implicit "enclosing instance" field.
125 extend another class with instance fields
17 more are inner classes
164 more declare their own instance fields
= 149 remaining
118 extend another class with instance fields
16 more are inner classes
173 more declare their own instance fields
= 184 remaining
210 extend another class with instance fields
21 more are inner classes
124 more declare their own instance fields
= 104 remaining
Most abstract classes—at least ⅔ in these code bases—declare or inherit instance fields.
I think it makes sense that an abstract class that declares instance fields, or an inner abstract class, automatically implements IdentityObject. We can infer this both at compile time and class load time.
There's a little risk here that refactorings involving private fields could cause surprises. Traditionally, factoring something out into a private field (maybe for performance?) is an implementation detail. But I think we can get programmers to buy into the mindset that declaring a field is an implicit opt-out from inline subclasses. It's something new that requires some attention as you're programming.
Third: constructors. There is no delegated instance initialization process for inline classes, so a superclass of an inline class must not have any instance initialization logic. Just where we draw the line on "any instance initialization logic" is unclear.
9 with a non-empty constructor (perform SecurityManager checks)
16 with a non-empty super constructor
1 with an empty constructor with a parameter (java.security.cert.CertStoreSpi)
1 with multiple constructors, all empty (java.security.SecureRandomSpi)
6 with an empty constructor with restricted access*
58 with an unrestricted empty no-arg constructor
58 with no constructor
2 with a non-empty constructor
2 with an empty constructor with restricted access*
46 with an unrestricted empty no-arg constructor
134 with no constructor
4 with a non-empty constructor
1 with an empty constructor with restricted access*
20 with an unrestricted empty no-arg constructor
79 with no constructor
(*Restricted access means a package-access or private constructor in a public or protected class, and a private constructor in a package-access class.)
As a baseline, all the "no constructor" cases should be inline superclass candidates, so should not implement IdentityObject.
However, it's quite common to redundantly declare an empty constructor, e.g. for documentation purposes. (In fact, javac just added an optional lint warning supporting a coding convention in which constructors should always be explicit—see JDK-8071961.) Those classes probably shouldn't be IdentityObjects, either.
Restricted access doesn't necessarily need to prevent inline subclasses, and doesn't have much to do with identity. We could leave it to the compiler to check super constructor access. But how to enforce that at run time? (There is no 'invokespecial' in the bytecode....)
Constructor parameters are a borderline signal that you expect something to happen on instance initialization, even if you don't actually do anything with the parameter right now. If you have *multiple* constructors, maybe we only care about the no-args one.
And then there are constructors with something more than 'super()' in their body. (Instance initializers, too, although I found no examples of them in any of the abstract classes I looked at.) These classes clearly need to implement IdentityObject in order to guarantee the traditional superclass initialization behavior.
An approach I like, though it may be too disruptive: it's a compile-time error to declare a non-empty constructor in an abstract class unless that class implements IdentityObject (explicitly, indirectly, or by virtue of having fields). That would trigger ~10 compiler errors in java.base, 2 in java.desktop, and 4 in jdk.compiler. The advantage is that there's no ambiguity or surprises—for an abstract class without fields, you either opt in to IdentityObject explicitly, or you implicitly assert that you don't intend to, and then the compiler catches any problems with that assertion.
Alternatively, we can once again infer 'implements IdentityObject' if a constructor has parameters or a body consisting of anything other than 'super()'. It's fairly easy to change the implicit 'implements' by doing something like adding a 'println' statement—but in practice, those sorts of changes may not be all that likely.
In the JVM:
- We've discussed a flag on <init> methods to indicate that they are "empty". It seems a shame to do that when the compiler can also generate 'implements IdentityObject'. But this is a "wrong default" situation—abstract classes need to be presumed IdentityObjects unless they explicitly choose not to be. (I worry, e.g., about bytecode generators that didn't get the memo and continue to assume checks in their <init> methods will always be executed.) So I think we'll need a rule that says you implicitly implement IdentityObject at class load time unless you have a special "empty" <init> method.
- Encoding "empty": current proposal is that an empty <init> is ACC_ABSTRACT and has no Code attribute. We could revisit if we like, maybe with a new flag, or a class-level attribute, but this still seems pretty reasonable.
- How to handle access control? I do think it's worth supporting limited access somehow (totally reasonable to allow inline subclasses in the current package, but reject subclasses elsewhere). I don't love any of the options, but maybe the best approach is to add a preparation- or initialization-time check that the superclass has a "<init>()V" method with appropriate access? Or maybe we could have some kind of class-level attribute that controls access for subclassing (this is sort of like sealing)?
Hmm, the more I think about this, the less enthusiastic I am about encoding important class properties in an "<init>" method that's never going to be referenced...
Finally: synchronized methods. These strike me as a pretty subtle implementation detail. But they're also *very* closely tied to the semantics of IdentityObject. Among our remaining candidate classes, they're pretty rare:
7 with synchronized methods
= 116 can safely allow inline subclasses (24% of all abstracts)
7 with synchronized methods
= 175 can safely allow inline subclasses (33% of all abstracts)
0 with synchronized methods
= 100 can safely allow inline subclasses (21% of all abstracts)
So synchronized methods on these field-free, initialization-free classes are pretty rare, but not unheard of. (Examples: java.net.URLStreamHandler, java.io.InputStream, java.lang.invoke.MethodHandleImpl.)
Again, I think we can go one of two ways:
- It's a compiler error to declare a 'synchronized' method if you don't already implement IdentityObject.
- Classes with synchronized methods implicitly implement IdentityObject.
The first strategy would produce 7 new compiler errors in java.base and 7 more in java.desktop... but fixing those errors might prompt some reflection about the tradeoff between synchronizing on 'this' and allowing inline subclasses. The second strategy is frictionless to adopt, but may lead to surprises when code is refactored (note that 'synchronized' isn't documented by javadoc).
In the JVM: I'm much less concerned about surprising inline subclasses in this case. The bytecode generator can explicitly implement IdentityObject if they want, and if they didn't/forgot/weren't aware of it, invoking the method on an inline class instance will throw an exception, just as it would if they'd used 'monitorenter' in the method body instead.
More information about the valhalla-spec-observers