Draft JVMS changes for Nestmates
daniel.smith at oracle.com
Wed Apr 19 19:21:56 UTC 2017
Thanks a lot for the feedback! My comments below:
> On Apr 19, 2017, at 12:40 AM, David Holmes <David.Holmes at oracle.com> wrote:
>> JEP name
>> I don't know how permanent JEP names are supposed to be, but I'd prefer a different name at this point. Something like: "Expanded JVM Access to Private Members"—shorter, focused on the feature itself rather than its relationship to the Java language. Or maybe "Class Nests for Access to Private Members".
> The intent of the name is to aid in the contextual positioning of this JEP ie address why do we even need this. The JEP addresses an inconsistency between access controls in the Java programming language and access controls in the JVM. The primary goal here is to align the two, while at the same time allowing for additional usage of nestmates in the future.
Yeah, but I think explaining the motivation in the title makes for a long title and limits the imagination to one particular use case.
>> The JEP uses "nest top" to describe the class that nest members reference; I prefer "host class", which better describes the class's role and isn't tied to the Java "top level class" concept. I know we use "host class" internally in Hotspot, perhaps when working with anonymous classes (of the JVM flavor), but I think in that context it will ultimately mean the same thing? Are we comfortable repurposing the term in this way?
> Given that a nest-top type must be a top-level type, I think nest-top fits perfectly.
This is a good example of my point about the title: the JVM has no concept* of "top level type", and doesn't need one. We just need a specific class to act as the reference point by which a group of classes can talk about a nest.
For example, I can imagine some language introducing a class file—foo.bar.Nest$$001—solely for the purpose of providing a hook for this reference point. javac's choice to designate a top-level class as the nest-top/nest-host is a compilation strategy.
(*Yes, there's an InnerClasses attribute, and Signature attributes, and lots of other stuff to facilitate compilation in the Java language. But those represent auxiliary metadata, not core JVM concepts.)
> I dislike "host class" because that is already used for the quite different case of VMACs. I can live with other names but I think they should incorporate "nest" eg. nest-holder, nest-host. But I think nest-top is an ideal match. (And no I didn't invent this terminology :) ).
Okay. Probably best to sit on this for awhile and circle back to it, see how we're all feeling.
>> I follow Brian's model when it comes to nest membership (5.4.4): every class belongs to a nest, possibly (in the absence of MemberOfNest) the nest hosted by itself. Many nests are singletons without any associated explicit attributes.
> I have not yet seen a follow up from John as to whether we require or just allow, an empty NestMembers attribute to indicate a singleton nest member.
If in our model *every* class has a nest, then we definitely don't want to require every class to have a NestMembers attribute.
In my spec (5.4.4), the presence of a NestMembers attribute is irrelevant. A class without a MemberOfNest attribute belongs to its own nest, and there's no need to validate anything about NestMembers.
>> Verification of MemberOfNest
>> I include a discussion block about different options of validating MemberOfNest. I think the consensus, and my preference, is to do it during verification of the member class. (NestMembers, on the other hand, is never validated, except as a side-effect of checking MemberOfNest.)
> Given verification is optional the current approach is to validate during linking just prior to the verification step. Whether that can conceptually be considered part of verification I am unsure.
I think this is the source of a lot of anomalies—spec treats verification as an essential part of linking, while Hotspot treats it as an optional add-on.
You should be okay, for some definition of "just prior". These are the constraints, as I read them:
- "A class or interface is completely loaded before it is linked." (5.4)
- Loading includes structural checks (CFE), version checks (UCVE), and validation and loading of superclasses and superinterfaces (5.3.5)
- "Errors detected during linkage are thrown at a point in the program where some action is taken by the program that might, directly or indirectly, require linkage to the class or interface involved in the error." (5.4)
That last one is pretty mysterious to me (what does "some action" mean? what about "indirectly"?), but seems to grant pretty wide latitude on the timing.
>> Verification of invokespecial
>> Allowing invokespecial to refer to classes other than the current class and its supers is a significant change. I noticed and relied heavily on the parallel with invokevirtual making protected method calls. So I tried to make the two as similar as possible. In a few places, the treatment of protected methods doesn't seem ideal, and rather than trying to mirror that with non-private invokespecial, I modified the protected method treatment.
> This concerns me because protected-access is a somewhat complex/messy issue so aligning private-access with it seems to go in the wrong direction. Cleaning up the protected access rules, while perhaps desirable, is technically out-of-scope for this JEP in my opinion.
>> The "protected check" of verification, in particular, was a mess before, and I think I've made it a lot more manageable, and compatible with a parallel rule for invokespecial. I could use some feedback on exactly what Hotspot's verifier has been doing here, though, since I'm pretty sure it didn't match the old specified rules.
> Might be better to take this up separately.
Yeah, I hear you. But I worry about the technical debt if we just look the other way and specify/implement a fresh invokespecial check separately. The subtle inconsistencies between the two will lead to all sorts of bugs and tweaks in the future; the duplicate code/spec will be an ongoing maintenance burden.
From the spec perspective, pulling on the thread looks like this:
- The new verification we need on invokespecial looks almost the same as the verification we need on protected methods.
- Okay, I'll re-use that rule. But it's a complicated mess, and has bugs, so I've got to clean it up first.
- What exactly are these rules trying to achieve? I can see the hard constraints (in configuration X, an error occurs), but there's a soft requirement to minimize class loading.
- To figure out when class loading is acceptable, I need to know what the reference implementation actually does.
>> MethodHandle resolution
>> The spec (126.96.36.199) is vague about what errors can occur during MethodHandle resolution. I assume any linkage error specified for the instruction can also occur via MethodHandle resolution, and that will include failures due to invokespecial improperly referencing a class.
> This is an area I have least understanding of, but this seems to cover it:
> "In each step, any exception that can be thrown as a result of failure of resolution of a class or interface or field or method reference can be thrown as a result of failure of method handle resolution."
The ambiguity is in the definition of "resolution". The preferred interpretation, which everyone seems to confirm reflects reality, is that it includes all linkage errors specified for the referencing instruction. I.e., errors from both both 5.4.3 and 6.5 can occur.
>> Dynamic checking of invokespecial receivers
>> When invokespecial involves interface types, the verifier can't guarantee that receiver objects actually implement that interface (JDK-8134358). It's an open question whether this is really a problem, but I included a tentative fix in the invokespecial runtime rules.
> This seems out of scope in the general case, but there may be nestmate specific actions required.
I touch it here because:
- It's a dynamic companion to verification, and we've modified the verification rules
- There's an existing rule that may or may not be intended to perform part of this dynamic check; that needs to be addressed somehow
- Any rule we introduce is impacted by the new treatment of private methods, so easier to get it right while working on this feature
- Easier to address small things like this when it's swapped in already
If necessary, we can yank it and address the bug separately in the future, but it seems convenient to tackle it now.
>> Compiler changes
>> The JEP text can't seem to decide if compiler changes are part of it, or a follow-up exercise. I think we're best off explicitly including the compiler changes, which will provide opportunities for design validation and testing.
> Not sure what gives you that impression, but javac changes are an essential part of this. Lacking a compiler spec I assume this will be handled informally. But the attribute definitions (see comments below) should dictate what a source compiler is required to do.
- Would be clearest if "Goals" mentioned modifying javac
- Language like "javac can" suggests optional followup work; should have a paragraph saying "javac will" instead
- Remove the paragraph saying "it is not strictly required to remove these immediately as part of this feature"
>> Security risk
>> The JEP text should acknowledge that, while this does allow compilers to grant finer-grained access to members shared by nestmates, it also pushes compilers to grant broader access to members that were previously kept private. It's a trade-off, and presumably a good one because nestmates are completely trusted, while package-mates might sometimes be suspect.
> I'm not following you here. What broader access is being granted to members that were previously kept private? This effort allows private-only access to members that were previously package-accessible.
Old approach is to widen access to private members to the package level on a per-member basis.
New approach is to widen access to private members to the nest level on a per-class basis.
Nest is narrower than package (good!), but per-class is more permissive than per-member (bad). Some members that are compiled as totally private in JDK 9 will be incidentally shared with all nest members in <future JDK version>.
> 4.7 Attributes
> Regarding the Exceptions attribute, it seems to be used only to store data to be passed back to the libverify and the reflection API.
I'm not familiar with libverify, but I think you're confirming that the attribute is on par with Signature? Something for tools and standard APIs to rely on, but irrelevant to execution?
> Regarding the definitions of NestMembers and MemberOfNest ... I modelled the definitions in:
> based on those for the innerclass (and related) attributes. I'm not sure why you moved away from those definitions as they pinned down exactly when these attributes are expected to appear. Without those parts the current definitions seem optional - ie they define what the attribute means if present, but they do not require its presence. I do not believe these attributes should be optional for classfile version 54, but required whenever a nest does exist. A java source compiler writer should look at these definitions and know when they must generate these attributes.
These sorts of assertions belong in the "Java Compiler Specification". (Yes, this document does not exist. Yes, that's kind of a problem, but we muddle through.)
In an ideal world, JVMS would not acknowledge that the Java Programming Language exists. It should stand on its own, and Java is merely a client.
As is, we have some history that has led to some dependencies on Java in spec, but these are generally auxiliary items: attributes that don't impact JVM behavior (4.7), a tutorial on how to compile Java code (3), occasional explanatory references to Java language concepts (e.g., 2.9).
But it would be wrong for the specification of a "critical" attribute (4.7) to be defined in terms of Java language concepts. It should stand on its own, general enough to meet the needs of Java language compilers and others with different use cases.
> 4.7.26 The MemberOfNest Attribute
> "the class is implicitly a member of its own nest"
> I suggest changing "a" to "the sole". Otherwise "a" implies there may be other members. As these are static properties being defined I don't think we need to be accounting for some future dynamic expansion of a nest.
I intended to include both the cases of a singleton nest and the case of a class that has a NestMembers attribute. But I'll add a clarifying sentence.
> 4.9.2 Structural Constraints
> The changes to invokespecial seem okay, but I have a problem with terminology. In:
> "If invokespecial is used to invoke a non-private, non-<init> method, referenced via a superclass or a direct superinterface, ..."
> what does "referenced via" mean?
I'll change to "where the referenced class is"
> "If getfield or putfield is used to access a protected field referenced in a superclass ..."
> and following, you changed "declared" to read "referenced". Again I do not understand what this is supposed to mean. It is the declaration site of the field or method that is needed to determine whether that member is in the same or another package. To me "declared in a superclass ..." was exactly correct. Your note does not make the change any clearer to me.
Good catch. It's important that the rule be limited to cases that reference a superclass*; but it's also important that the declaring class be used for the package check.
(*Here's why: if I reference my subclass, and that resolves to a field declared in my superclass, the verifier will perform no check, because it may not know that the referenced class is a subclass.)
"If `getfield` or `putfield` is used to access a `protected` field by reference to a superclass, and that field is declared in a different run-time package than the current class or interface, then the type of the class instance being accessed must be assignable to the current class or interface."
> 4.10.1 Verification by Type Checking
> Does the order in the rules mandate the order in which the VM must actually check things? classHasValidNest seems much more complex than the actual runtime process of validation (though it may just be exposing distinct steps that are lumped together at runtime).
Probably not, but dependencies between inputs/outputs do suggest an ordering.
(We could probably get into an argument with a logician or a Prolog expert about whether there's such a thing as an "output" at all, and whether it would be acceptable for a JVM to load every class in the universe in order to test certain predicates, which is why it's probably not a great idea to specify the verifier using Prolog rules. Alas.)
The important elements are:
1) Resolve the host class using the current class's loader
2) Check that the host class belongs to the same package
3) Check that the current class is named in the host's NestMembers
4) Resolve the current class's name in the host class's loader, and make sure it produces the same class
> 188.8.131.52 Type Checking for Restricted Member References
> I'm not very good at reading or understanding these rules, but I'm surprised that the invokespecial rules seem to make no mention of nestmates at all:
> "A method reference is allowed by an invokespecial instruction if it is not a restricted invokespecial reference, ..."
> This seems to open things up to all private method references??
There's a careful partitioning of error conditions that has to happen in order to avoid too much class loading.
184.108.40.206 is only concerned with ensuring, in narrow cases, that the type on the stack is assignable to the current class. For invokespecial, the narrow case is a method reference naming a superclass/direct superinterface and a method not named "<init>", where the resolved method is not private.
Separately, 5.4.4 ensures that resolving a private method will produce an IllegalAccessError if the method is not in the same nest.
Also separately, 6.5 ensures that if the resolved method is not private, the referenced class is the current class, a superclass, or a direct superinterface.
And also separately, 220.127.116.11 ensures that the type on the stack is always assignable to the referenced class.
> 5.4.4 Access Control
> I don't think you need to restate:
> "A class with a MemberOfNest attribute belongs to the nest hosted by the referenced host class.
> A class without a MemberOfNest attribute implicitly belongs to a nest hosted by the class itself. (If the class also lacks a NestMembers attribute, then the nest has only one member.)"
> The rule is simply stated as is: "belonging to the same nest as D" - belonging to the same nest is, or should be, already defined elsewhere.
My intent is that this *is* the definition. But sounds like you're expecting that definition to be in 4.7.25/4.7.26, which is probably more intuitive. I'll change that.
> I'm avoiding commenting on the protected access changes but do want to raise a concern that there are no changes to Method resolution (18.104.22.168) yet resolution relies on the definition of access control to prune/discard inaccessible methods. It appears now that we're allowing more potential successful resolutions, then using the additional rules on the invoke* bytecodes to try and discard any undesirable results.
Yeah, this motivated my question about MethodHandle resolution: it seems that when we talk about "resolution", we really mean the process defined in 5.4.3, followed by any linkage checks defined in 6.5. If so, this is just a presentational reshuffling (with the minor exceptions I noted in the document).
Why bother? Because if the check on the referenced class of a protected invokevirtual/getfield/putfield is considered an access check (22.214.171.124), then the check on the referenced class of a non-private invokespecial probably ought to be considered an access check, too, but that leads to a concept of "accessible" that is really unwieldy and context-dependent.
> 6.5 getfield (and others)
> You note "These rules are redundant: verification already guarantees them " but this brings me back to an area I keep raising concerns about: the split between static verification based checks and dynamic runtime checks. Yes the verifier precludes certain things but if we run without verification the rules expressed for the bytecodes are considered to be still required at runtime. If you delete them because they overlap with verifier rules we have no way to tell which rules must be enforced regardless of verification status.
I propose removing them because on their face they appear completely redundant, an artifact of a time when we didn't have clear definitions of loading, linking, verification, etc. But if we decide it really is helpful to keep some or all of these assertions, that's fine.
It seems to me that attempting to interpret unverified bytecodes is an implementation-specific feature that JVMS knows nothing about. Deciding which rules must be enforced in the absence of verification is an important part of designing that feature, and in an ideal world some comprehensive documentation about which verification assertions you've chosen to enforce dynamically would be maintained somewhere, but not in JVMS. If we want JVMS to maintain that list, then running without verification should be a first-class fully-specified feature.
The status quo, I'm guessing, is that a few dynamic checks are sprinkled throughout JVMS, but plenty of other checks are performed without any supporting spec text. Which isn't a great place to be in.
> 6.5 invokespecial
> You note regarding the IllegalAccessError:
> "This replaces a VerifyError previously specified by 4.9.2. The check must be delayed until after resolution in order to determine whether the referenced method is private."
> We know the access flags for the referenced method at verification time - shouldn't that be the sole basis for the verifier check? Afterall it is verifying the static properties of the classfile and bytecode. If the actual resolved method has different access to the referenced method then that may lead to ICCE (depending on the exact nature of the change - a private method made public may not be an issue for example).
Verification relies on the descriptor to tell it that this method expects an int and returns nothing. Those kinds of checks can be performed locally.
The descriptor doesn't tell you if the method is private. Only way to find out is to resolve SomeOtherClass, then resolve "foo(I)V".
If we wanted to do the check at verification time, we'd have to load every class named by an invokespecial instruction.
(I think you might be conflating declaration-site metadata, which we have easy access to, with use-site metadata, which requires resolving references.)
> 126.96.36.199 Method resolution
> You did not make any changes here, but as per my comment in the bug report we do require, IMHO, further tightening here to ensure nestmate invokespecial invocations do not resolve to method implementations that they should not. The example here is a hierarchy of nestmates (C extends B extends A) where A and C both declare a private method "void m()" and we have an invocation c.m() where c is an instance of class C. We then "separately compile" C to remove the definition of m(). At runtime method resolution will locate A.m, even though the method reference was for C.m. Normally the accessibility rules would exclude A.m from being a viable candidate but here the classes are nestmates so A.m is accessible. But it would be wrong to invoke A.m. This case should throw ICCE or NSME.
You're taking issue with the fact that method resolution will match private methods in superclasses, when our intuition (and the Java language model) is that private methods are not inherited.
I agree: https://bugs.openjdk.java.net/browse/JDK-8021581
Is this the right venue to address that issue? As demonstrated above, I'm happy to lump in tangentially-related bug fixes. :-)
You could argue that this situation isn't so different from the status quo, but it does seem unique that you can delete 1 private method from a consistently-compiled program and then get a silent behavioral change.
In practice, javac won't separately compile nestmates, so this would be hard to reproduce. More generally, the feature is not designed for separate compilation, so real-world scenarios may be hard to come by, even in other languages.
Anyway, I'm game to take another look at JDK-8021581 if we want.
More information about the valhalla-spec-observers