Proposal for generics over primitives needs a rethink

Remi Forax forax at univ-mlv.fr
Wed Dec 31 22:58:25 UTC 2014


Just to add that there is another cost to the C# approach: you must
have exactly the same bytecode for Set<int> and Set<boolean>, so you
cannot, for example, specialize Set<boolean> to use a bitset.
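For comparison, here is roughly what such a specialization might look like if written by hand in today's Java. BooleanSet is a hypothetical class, not part of the JDK or of any Valhalla proposal. A set of booleans can only ever contain false, true, or both, so its whole state fits in two bits of a java.util.BitSet. That is a layout choice a single shared instantiation cannot make for one particular type argument.

```java
import java.util.BitSet;

// Hypothetical hand-written specialization of "Set<boolean>".
// Bit 0 records "contains false", bit 1 records "contains true".
final class BooleanSet {
    private final BitSet bits = new BitSet(2);

    private static int index(boolean b) {
        return b ? 1 : 0;
    }

    /** Adds b; returns true if the set changed. */
    boolean add(boolean b) {
        int i = index(b);
        boolean wasPresent = bits.get(i);
        bits.set(i);
        return !wasPresent;
    }

    boolean contains(boolean b) {
        return bits.get(index(b));
    }

    /** Removes b; returns true if it was present. */
    boolean remove(boolean b) {
        int i = index(b);
        boolean wasPresent = bits.get(i);
        bits.clear(i);
        return wasPresent;
    }

    int size() {
        // Number of set bits = number of distinct booleans stored.
        return bits.cardinality();
    }
}
```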

As for introducing a type Any in the VM, it means that we now have a
type that can store either a primitive or a reference.
We know how to represent this kind of type using tag bits (as V8 does)
or NaN boxing (as Mozilla's *Monkey VMs do).
Basically, representing Any in the VM amounts to transforming a
Java VM into something very close to a JavaScript VM.
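A minimal sketch of the NaN-boxing idea, written in Java purely for illustration. NanBox and its tag constant are assumptions, not how V8 or SpiderMonkey actually lay out their values. Every value is a 64-bit long: ordinary doubles are kept as their IEEE 754 bits, and an int is hidden in the payload of a NaN whose high bits act as a tag. Note what is missing: plain Java cannot store an object reference in a long, and a VM doing so would need its garbage collector to recognise tagged references in every slot, which is the kind of deep change described above.

```java
// Hypothetical NaN-boxing sketch: ints and doubles share one 64-bit representation.
final class NanBox {
    // High 16 bits 0x7FF9 = exponent all ones + nonzero mantissa, i.e. a NaN
    // pattern that no canonicalized double ever produces. Used as the int tag.
    private static final long INT_TAG = 0x7FF9_0000_0000_0000L;
    private static final long TAG_MASK = 0xFFFF_0000_0000_0000L;

    static long boxDouble(double d) {
        // Canonicalize NaN (to 0x7FF8...) so a real NaN never looks like a tagged int.
        return Double.isNaN(d) ? Double.doubleToLongBits(Double.NaN)
                               : Double.doubleToRawLongBits(d);
    }

    static long boxInt(int i) {
        // Low 32 bits hold the int; the high bits carry the tag.
        return INT_TAG | (i & 0xFFFF_FFFFL);
    }

    static boolean isInt(long v) {
        return (v & TAG_MASK) == INT_TAG;
    }

    static int unboxInt(long v) {
        return (int) v; // keeps only the low 32 bits
    }

    static double unboxDouble(long v) {
        return Double.longBitsToDouble(v);
    }
}
```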

happy new year,
Rémi

On 12/31/2014 11:33 PM, Brian Goetz wrote:
> Thanks, Gavin, for bringing up this point.
>
> I'm actually a little surprised that no one has asked this question 
> before; after all, the "why not 'just' have an Any type" question is 
> kind of an obvious one after you start thinking about this problem for 
> a few minutes.
>
> *Obviously* it would be more desirable to integrate primitives and 
> values into generics by leaning on the existing notion of type bound, 
> rather than introducing all the additional complexity that we're 
> considering.  (Also obviously, this must have occurred to us in the 
> first five minutes of thought.  So why, after so much effort, have we 
> said nothing about this possible approach? Indeed, it's on our (long) 
> to-do list to write up some of our analysis of various roads not 
> taken, including this one.)
>
> When designing a language (at least, one intended for real work), you 
> need to pay attention to both the part where it meets the user, *and* 
> the part where it meets the compilation target; if the mapping between 
> source-level concepts and target-level concepts is not sufficiently 
> straightforward, bad things will happen.  But most suggestions we
> receive for evolving Java tend to focus only on the former.  (This is 
> natural; developers usually only see the source code, not the 
> bytecode, and even some language designers are willing to accept 
> dramatic impedance mismatches between source code and bytecode if it 
> gets them to their expressiveness goals.)
>
> But the reality is that, if we were to ignore the latter, people would 
> be happy for a few minutes and then unhappy forever due to the parade 
> of corner cases, complexity, and performance potholes that this 
> approach generally leads to.  We don't want to do this to our users.  
> (We are lucky enough to have some control over our compilation target, 
> but we're also constrained there as well by the same compatibility 
> requirements.)
>
> For the record, the reason we rejected a unified 'Any' type is: it is 
> a fiction.  (A "unifiction"; see
> https://twitter.com/BrianGoetz/status/461539994197585920.)  Sure,
> it's easy to use 'Any' as a pseudo-type bound, and we could certainly 
> choose to denote "Foo<any T>" as "Foo<T extends Any>", but all this 
> does is draw the user further into the Any fiction while not actually 
> making it a reality.
>
> Where the wheels start to come off the wagon is: how do we represent a 
> variable of type 'Any' in bytecode (a field, local variable, or method
> parameter or return type)?  If we can't answer that, we can't allow 
> use of Any in these places.  And solving this problem amounts to only 
> slightly less than a total redesign of the JVM and bytecode 
> architecture.  So this harmless-seeming question (couched in claims of 
> "simpler" and "more elegant") amounts to "Why not just redesign the VM 
> completely?"
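To make the representation problem concrete: in today's bytecode the only slot type that can hold either an int or a reference is Object, reached through boxing. In the hypothetical Cell class below (an illustration, not part of any proposal), the field declared as T is compiled as an Object field, so storing 42 actually stores a java.lang.Integer.

```java
// Hypothetical generic holder. Because generics are erased, the field 'value'
// is an Object slot in the class file; an int can only live there boxed.
final class Cell<T> {
    private T value;

    Cell(T value) {
        this.value = value;
    }

    T get() {
        return value;
    }

    // Runtime class of whatever actually sits in the erased slot.
    String storedClass() {
        return value == null ? "null" : value.getClass().getName();
    }

    // Reflection reports the erased declared type of 'value' (Object), not T.
    static Class<?> erasedFieldType() {
        try {
            return Cell.class.getDeclaredField("value").getType();
        } catch (NoSuchFieldException e) {
            throw new IllegalStateException(e);
        }
    }
}
```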
>
> Languages that have attempted to unify primitives and references on 
> the JVM, with the existing bytecode architecture, while retaining some 
> sort of compatibility with existing Java idioms, have failed at doing 
> so. (And I am thankful to have those experiments to inform our work 
> here!) As a concrete example, I point you to Paul Phillips' excellent 
> "Scala War Stories" talk from JVM Language Summit 2013, which covers 
> the failure of such unifictions, and more:
>   http://medianetwork.oracle.com/video/player/2623635250001
>
> But you might say "Wait a second, C# managed to pull off this redesign 
> of the VM to support polymorphism over objects and primitives".  And 
> indeed they did, and overall their solution is quite elegant.  And 
> obviously, we must have known about this example, so why wouldn't we 
> explore this?
>
> Well, obviously we have.  The cost of the C# approach is that existing 
> classes could not be gradually migrated to be generic; existing 
> collections had to be effectively deprecated and replaced, or a "flag 
> day" had to be declared where all the code (library and client) 
> changes simultaneously.  These are not options for us.  At the risk of 
> being obnoxious, C# was able to get away with it because at the time, 
> they had a very small base of existing users and code and were not yet 
> successful enough to have to worry about compatibility.  Lucky for 
> them, unlucky for us.
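A minimal sketch of the gradual migration that erasure makes possible (LegacyClient and ModernClient are hypothetical names). Code written against the raw, pre-generics List type keeps compiling, with unchecked warnings, against today's generic java.util.List, because List<String> and the raw List are the same class at runtime. Neither side had to be deprecated or changed in lockstep.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical client written before generics existed: it uses the raw List type.
final class LegacyClient {
    @SuppressWarnings({"rawtypes", "unchecked"})
    static List makeNames() {
        List names = new ArrayList();
        names.add("alice");
        names.add("bob");
        return names;
    }
}

// Newer, generic-aware code consumes the legacy result directly.
final class ModernClient {
    static int totalLength() {
        @SuppressWarnings("unchecked")
        List<String> names = LegacyClient.makeNames(); // unchecked conversion from raw List
        int total = 0;
        for (String s : names) {
            total += s.length();
        }
        return total;
    }
}
```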
>
> Some more comments inline.
>
>> I'm rather concerned with this proposal
>
> We're concerned with it too, as I think we've made quite clear. Here's 
> the position we're in: if we wait until we have a complete, 100% 
> solution before sharing anything, people throw rocks at us for doing 
> everything behind closed doors, but if we share our working thoughts 
> in progress, people throw rocks at us for being half-baked.  We've 
> chosen the latter poison, so by all means, throw your rocks, but don't 
> kid yourself that you've spotted something that no one else has.  (And 
> please, check the attitude at the door, it's just not helpful.)
>
>> What this proposal does is introduce parametric polymorphism over
>> primitive types, while leaving it impossible to abstract over
>> primitives and reference types with subtype polymorphism. Thus, at the
>> intersection of the two systems of abstraction, namely, *variance*, we
>> get the broken behavior that a List<int> isn't a List<?>.
>
> Indeed, we've already pointed this out, and it's not pretty.  All 
> constructive suggestions accepted.  But implicitly dropping a key 
> requirement (like gradual migration compatibility), and then claiming 
> there's an obvious answer, is not really helpful.
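For reference, this is the property at stake: in current Java every parameterization of List is a subtype of the wildcard type List<?>, so a single method can accept lists of any element type. The Wildcards class below is only an illustration; according to the point raised above, a List<int> under the proposal would not be a List<?>, and so could not be passed to countNonNull.

```java
import java.util.List;

// Illustration of wildcard variance: List<Integer>, List<String>, etc. are all
// subtypes of List<?>, so this one method accepts any of them.
final class Wildcards {
    static int countNonNull(List<?> xs) {
        int n = 0;
        for (Object x : xs) {
            if (x != null) {
                n++;
            }
        }
        return n;
    }
}
```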
>
>> I therefore suggest a different, simpler, and much more natural
>> starting point for this work: stop pretending that there is no type
>> Any.
>
> This is a particularly funny way to put it, as it is the notion that 
> there *is* an Any type which requires pretending!  There is simply no 
> way (without boxing) to represent this on the JVM as it currently 
> stands.  But if boxing were good enough, then we wouldn't need to do 
> anything -- we'd just write ArrayList<Integer> and be done with it!  
> But obviously boxing isn't good enough, since we're having this 
> conversation. Which means you need a VM story for how we're going to 
> represent a flattened array of XY-points, make ArrayList<int> 
> actually be backed by an int[] array, or inline value types into 
> containing classes, all while still playing nicely with generics.  Where the 
> data hits the heap is where the boxing story (and therefore the Any 
> story) falls apart.
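As a rough illustration of what "ArrayList<int> backed by an int[] array" means at the heap level, here is a hand-specialized list (IntArrayList is a hypothetical name, not a JDK class). ArrayList<Integer> keeps an Object[] of references to Integer objects, whereas this class stores the values themselves contiguously in an int[]. The goal described above is for the generic ArrayList to get that layout without anyone writing such a class by hand.

```java
import java.util.Arrays;

// Hypothetical hand-specialized list of ints: values are stored flat in an int[],
// with no per-element Integer objects.
final class IntArrayList {
    private int[] elements = new int[8];
    private int size;

    void add(int value) {
        if (size == elements.length) {
            // Double the backing array when it is full.
            elements = Arrays.copyOf(elements, size * 2);
        }
        elements[size++] = value;
    }

    int get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("Index: " + index + ", Size: " + size);
        }
        return elements[index];
    }

    int size() {
        return size;
    }

    // Sums directly over the primitive array; no unboxing involved.
    long sum() {
        long total = 0;
        for (int i = 0; i < size; i++) {
            total += elements[i];
        }
        return total;
    }
}
```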
>
> All in all, you paint a picture of a beautiful world, but not the one we
> find ourselves living in.  If we were designing a language from scratch,
> or didn't have users, or hated our users, we would certainly be 
> exploring this approach in preference to the current approach we've 
> staked out.  (This is so obvious I wish I didn't even have to say it.) 
> But we're not ready to throw our users under the bus to the degree 
> that this approach seems to entail.  But if we've missed something 
> obvious, by all means, point it out (but please, constructively).
>
> And, feel free to prove us wrong!  Try implementing the changes you 
> are envisioning in the JVM, and show how they can get us to the goal!
>
>



More information about the valhalla-dev mailing list