java.lang.constant.ClassDesc and TypeDescriptor for hidden class??

forax at univ-mlv.fr forax at univ-mlv.fr
Thu Apr 16 09:34:28 UTC 2020


Hi Mandy, 

We have forgotten MethodType.toMethodDescriptorString() [1], it has to be updated too. 

I still disagree with John, it's fine to use the scheme c' has a name for the Hidden Class. I still think the Java methods that says in their spec that they return a descriptor should return a valid descriptor thus throw an exception if there is a hidden class somewhere in the descriptor so a user has to decide how to handle Hidden Class before calling those methods. 
Anyway, we should move on that, the patch is fine for me if the javadoc of toMethodDescriptorString is updated too, so let's record that i disagree and move on. 

And for ASM, we have already talked to deprecate any methods that takes a live object in Type (j.l.Class and j.l.r.Method) because it causes surprising classloading loop when ASM is used in a classloader or in an agent, hidden class not having a proper name is another data point in that direction. 

regards, 
Rémi 

[1] https://docs.oracle.com/en/java/javase/14/docs/api/java.base/java/lang/invoke/MethodType.html#toMethodDescriptorString() 

> De: "mandy chung" <mandy.chung at oracle.com>
> À: "Peter Levart" <peter.levart at gmail.com>, "Remi Forax" <forax at univ-mlv.fr>
> Cc: "John Rose" <john.r.rose at oracle.com>, "Brian Goetz"
> <brian.goetz at oracle.com>, "valhalla-dev" <valhalla-dev at openjdk.java.net>
> Envoyé: Mercredi 15 Avril 2020 20:40:48
> Objet: Re: java.lang.constant.ClassDesc and TypeDescriptor for hidden class??

> Hi Peter, Remi,

> Lois and Harold have given further feedback. They are concerned with option c's
> impact to JVM implementations due to this new form of signature whose name is
> outside the "[;"] envelope whereas signatures are always inside the "[;]"
> envelopes since day 1. Option c' appears to have a higher compatibility risk
> not only in existing libraries but probably also VM implementations.

> Option c has a low compatibility risk while existing code still needs to be
> updated to support hidden classes. As John advices [1], it would be a mistake
> for `Class::descriptorString` to throw an exception in the long run. We have
> exhausted the differences among these options.

> Spec change proposal is:
> - extend TypeDescriptor for entities that cannot be described nominally. If it
> can be described nominally, TypeDescriptor::descriptorString returns a
> field/method descriptor conforming to JVMS 4.3. If it cannot be described
> nominally, the result string is not a type descriptor. No nominal descriptor
> can be produced.
> - specify in the javadoc for Class::descriptorString and
> MethodType::descriptorString that the result string when it can be described
> nominally conforming to JVMS 4.3 or when it cannot be described nominally.
> - No JVMS change

> Here is the updated specdiff and webrev:
> Specdiff:
> [
> http://cr.openjdk.java.net/~mchung/valhalla/webrevs/hidden-classes/specdiff-descriptor-string/overview-summary.html
> |
> http://cr.openjdk.java.net/~mchung/valhalla/webrevs/hidden-classes/specdiff-descriptor-string/overview-summary.html
> ]

> Webrev:
> [
> http://cr.openjdk.java.net/~mchung/valhalla/webrevs/hidden-classes/webrev.06-descriptor-string/
> |
> http://cr.openjdk.java.net/~mchung/valhalla/webrevs/hidden-classes/webrev.06-descriptor-string/
> ]

> Please review.
> Thanks
> Mandy
> [1] [
> https://mail.openjdk.java.net/pipermail/valhalla-dev/2020-April/007093.html |
> https://mail.openjdk.java.net/pipermail/valhalla-dev/2020-April/007093.html ]

> On 4/14/20 2:16 AM, Peter Levart wrote:

>> On 4/13/20 9:09 PM, Mandy Chung wrote:

>>> On 4/12/20 5:14 AM, Remi Forax wrote:

>>>>> The problem is not that 'c' is easier to parse, but that 'c`' is not
>>>>> parsable at all. Do we really want unparsable method descriptors?

>>>>> If the problem is preventing resolving of hidden class names or
>>>>> descriptors, then it seems that making the method descriptors unparsable
>>>>> is not the right place to do that.

>>>> I agree with Peter,
>>>> throwing an exception is better, there is no way to encode a hidden class in a
>>>> descriptor because a hidden class has no name you can lookup,
>>>> if the API return an unparsable method descriptor, the user code will throw an
>>>> exception anyway.

>>> Several points that are noteworthy:

>>> 1. A resolved method never has a hidden class in its signature as a hidden class
>>> cannot be discovered by any class loader.

>> Right.

>>> 2. When VM fails to resolve a symbolic reference to a hidden class, it might
>>> print its name or descriptor string in the error message. Lois and Harold can
>>> confirm if this should or should not cause any issue (I can't see how it would
>>> cause any issue yet).

>>> 3. The only way to get a method descriptor with a hidden class in it is by
>>> constructing `MethodType` with a `Class` object representing a hidden class.

>> Or by custom code that manipulates class descriptors using String operations.
>> Suppose there's code that doesn't want to eagerly resolve types and just
>> manipulates Strings. Surely a class descriptor of a HC can only be obtained
>> when there *is* a HC already present, but ... never underestimate programmers'
>> imagination when (s)he is combining information from various sources, some of
>> them might be resolvable types, some might be just descriptors, etc...

> True. Our imagination is powerful!

> The main thing is that a bad method descriptor will fail to resolve.

>>> 4. `Class::descriptorString` on a hidden class is human-readable but not a valid
>>> descriptor (both option c and c')

>>> 5. The special character chosen by option c and c' is an illegal character for
>>> an unqualified name ("." ";" "[" "/" see JVMS 4.2.2). This way loading a class
>>> of the name of a hidden class will always get CNFE via bytecode linkage or
>>> Class::forName etc (either from Class::getName or mapped from
>>> Class::descriptorString).

>> Right. The JVMS may remain unchanged. But that doesn't mean that
>> Class.descriptorString() couldn't be specified to return a JVMS valid
>> descriptor for classical named types, while for HCs (or derived types like
>> arrays) it would return a special unresolvable descriptor with carefully
>> specified syntax. Such a syntax that would play well when composed into the
>> syntax of higher-level descriptors like method type descriptor. Why would we
>> want that? Because by that we get a more predictable failure mode. We only fail
>> when/if the type described by such descriptor tries to be resolved.

>> In this respect, both variants 'c' and 'c`' as you said, violate JVMS spec for
>> valid class descriptor, but 'c' has a more carefully chosen syntax.

> 'c' keeps the "[;" envelope which is the long-standing format.

> I would say putting the name inside the "[;" envelope may not break existing
> tools (for example if they never use the name to find the class) whereas
> putting the suffix following ';' is harder to predict how existing tools are
> impacted.

>>> For existing tools that map a descriptor string by trimming "L;" envelope and/or
>>> replacing "/" with ".", "Lfoo/Foo;/123Z" (option c') may be mapped to "foo.Foo"
>>> and ".123Z" (if used ";" as a separator)...

>> I would say that for existing tools that treat a single class descriptor at
>> once, with option 'c`' they won't treat ';' as a separator between multiple
>> elements. I would say that existing code that tries to trim 'L;' would either:

>> - remove the 'L' prefix and strip the string of ';' character wherever it is,
>> which would produce "foo/Foo/123Z" and consequently "foo.Foo.123Z" (a valid
>> binary name)
>> - grok and fail (for example because something that starts with 'L' does not end
>> with ';')
>> - if the code is "hackish" it might blindly trim the last character if the 1st
>> is '[', so we would end up with "foo/Foo;/123" and consequently with
>> "foo.Foo;.123" (not a valid binary name)
>> - something else that neither of us can imagine now

>>> or "foo.Foo/123Z" which are invalid name whereas "Lfoo/Foo.123Z;" (option c) may
>>> have higher chance be mapped to "foo.Foo.123Z" which is a valid binary name.

>> Right, but neither is 'c`' immune to that interpretation. At least the failure
>> mode of 'c' is more predictable.

>>> ";" and "[" are already used for descriptor. The remaining ones are "." and "/".

>>> JDWP and JDI are examples of existing tools that obtain the type descriptor by
>>> calling JVM TI `GetClassSignature` and then trims the "L;" envelope and replace
>>> "/" with ".". Option c produces "foo.Foo.123Z" as the resulting string which
>>> might make it harder

>> And what does option 'c`' produce?

>> Existing JDK tools could be updated from day 0. Existing 3rd party tools would
>> have to be updated too in either case. Typical failure mode for option 'c'
>> would be that class "foo.Foo.123Z" can't be found. Who knows what kind of
>> failure modes would option 'c`' produce if parsing was done in C for example.
>> Are crashes excluded?

>>> 6. Throwing an exception (option a) may make existing libraries to catch issues
>>> very early on. I see the consistency that John made about dual-use APIs that
>>> prints a human-readable but not resolvable descriptor. I got convinced that
>>> option c and c' have the benefit over option a.

>>> 7. Existing tools or scripts that parse the descriptor string have to be updated
>>> in both option c and c' to properly handle hidden classes. Option c may just
>>> hide the problem which is bad if it's left unnoticed but happens in customer
>>> environments.

>> I doubt that the problem would be hidden with option 'c'. Either the code would
>> just work (because it needs not resolve the descriptor of HC) or it would grok
>> on trying to resolve it. In theory the binary name "foo.Foo.123Z" could be
>> resolved into a real class, but that's hardly possible in practice unless you
>> specifically construct such case. And option 'c`' is not immune to that as
>> well. So I don't think that we would suddenly see a bunch of wrong resolvings
>> where "foo.Foo.123Z" would actually be resolved successfully. You have a say in
>> how the suffix in "foo.Foo/suffix" is constructed and by using something that
>> is not a usual name the chances can be minimized.

>>> My only concern is the compatibility risk on existing agents that assume JVM TI
>>> `GetClassSignature` returns a valid type descriptor and use it to resolution.
>>> Both option c and c' return an invalid descriptor string and so I consider the
>>> impact is about the same. JDI and JDWP have to be updated to work with either
>>> new form. As John noted, option c' has the fail-fast properties that may help
>>> existing code to diagnose issues during migration.

>>> That's my summary why I went with option c'. The preference is "slightly".

>>> Any other thought?

>> I think that it is easier to debug a more predictable failure even if it happens
>> a little later (when resolving the descriptor) than it is to debug an
>> unpredictable (unimagined) failure which supposedly happens a little earlier.
>> In that respect, option 'a' is most predictable, but it might be "to early"
>> (for example, what if some code just wants to log the descriptor). And 'c`'
>> seems a little scary to me, because I can't imagine all the possible failures.

>> Regards, Peter


More information about the valhalla-dev mailing list