TruffleSOM Status Update
java at stefan-marr.de
Fri Dec 20 10:01:15 PST 2013
I finally got around to adapting the TruffleSOM parser to handle access to the outer lexical scopes of blocks differently. This avoids the use of materialized frames. Together with the type specializations for primitive operations/builtins, this completes my list of open known issues that needed to be solved in TruffleSOM. However, the performance results are less than impressive: TruffleSOM is roughly 20-30x slower than RPySOM. And I have no idea where to start fixing that.
Ok, so a few more details.
Upvalues: Access to Lexical Block Scopes
In SOM Smalltalk, control structures are implemented with message sends and blocks, so it is common to have some nesting of blocks, and for blocks to access variables in their lexical scope.
foobar: arg = ( | a |
  arg ifTrue:  [ a := #bar ]
      ifFalse: [ a := #foo ].
)
This simple example relies, in the worst case, on a library implementation of #ifTrue:ifFalse:. In TruffleSOM, it is already specialized with a distinct AST node. But still, we have the issue that two blocks need to access `a` outside of its direct scope. Originally, I solved that the classic way and kept pointers to materialized frames in each block in order to find the lexical scope. Now, I solve it differently: methods and blocks know during compilation whether a variable is accessed outside of its context. If so, the variable is turned into an `upvalue`. Thus, it is not stored in the frame, but in a so-called `upvalue` array.
In this example, the method foobar has an upvalue array with one element: `a`. The two blocks do not have any variables, so, they also don’t have any upvalues. But, the blocks need a reference to the outer lexical scope, in order to access the upvalues.
The upvalue array itself is kept in an `Arguments` object that is passed on method invocation.
Since each method/block knows the number of upvalues it has, the array can be created together with the `Arguments` object for it.
While this approach avoids the materialization of frames, it probably has its own issues.
First, the `Arguments` object is now stored in block objects as the lexical outer context, in order to allow the necessary access to upvalues. Thus, the `Arguments` object ‘escapes’, which might be problematic, because the JavaDoc of com.oracle.truffle.api.Arguments says it shouldn’t.
Second, the `Arguments` object uses a plain Object array, and thus, all values that it stores need to be boxed, which might not be desirable for simple integer loops.
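To make the scheme above a bit more concrete, here is a minimal plain-Java sketch of it. It does not use any Truffle types, and it is not the actual TruffleSOM code: the names `ActivationArgs`, `Block`, `foobar`, and `storeCounter` are made up for illustration, and `foobar` returns the value of `a` only so that the result can be inspected. It shows both issues: the arguments object is referenced from the block objects, and an int stored in the upvalue array ends up boxed.

```java
// Plain-Java sketch; all names are hypothetical and no Truffle API is used.
final class UpvalueSketch {

    /** Stand-in for the per-activation arguments object. */
    static final class ActivationArgs {
        final Object self;
        final Object[] arguments;
        final Object[] upvalues; // one slot per variable accessed from a nested block

        ActivationArgs(Object self, Object[] arguments, int numUpvalues) {
            this.self = self;
            this.arguments = arguments;
            // The number of upvalues is known once the method is compiled,
            // so the array can be allocated together with this object.
            this.upvalues = new Object[numUpvalues];
        }
    }

    /** A block that stores a fixed value into an upvalue of its outer scope. */
    static final class Block {
        final ActivationArgs outerContext; // the arguments object escapes here
        final int upvalueIndex;
        final Object valueToStore;

        Block(ActivationArgs outerContext, int upvalueIndex, Object valueToStore) {
            this.outerContext = outerContext;
            this.upvalueIndex = upvalueIndex;
            this.valueToStore = valueToStore;
        }

        void value() {
            outerContext.upvalues[upvalueIndex] = valueToStore;
        }
    }

    /** Mirrors the foobar: example; `a` is upvalue 0 of the method. */
    static Object foobar(boolean arg) {
        ActivationArgs args = new ActivationArgs(null, new Object[] { arg }, 1);
        Block whenTrue = new Block(args, 0, "bar");
        Block whenFalse = new Block(args, 0, "foo");
        (arg ? whenTrue : whenFalse).value();
        return args.upvalues[0];
    }

    /** Storing an int in the Object[] autoboxes it to java.lang.Integer. */
    static Object storeCounter(int i) {
        ActivationArgs args = new ActivationArgs(null, new Object[0], 1);
        args.upvalues[0] = i;
        return args.upvalues[0];
    }

    public static void main(String[] unused) {
        System.out.println(foobar(true));
        System.out.println(foobar(false));
        System.out.println(storeCounter(7) instanceof Integer);
    }
}
```

The blocks themselves have no upvalue slots here, matching the example, but each holds a reference to the outer `ActivationArgs`.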
Arguments Object Use
Another aspect of my Arguments use might be problematic as well. I now use it as one combined structure to store all relevant information of the activation (method or block). Thus, it also has the reference to `self`, the method arguments as an Object array, and my marker to indicate whether a frame is on the stack or not. Earlier, I stored all this information in separate slots in the frame. But now, to simplify the handling of context, I keep everything together in the `Arguments` object. I am not sure whether I could actually do it any other way without having to materialize frames at some point. But this means that `Arguments` objects do escape, and self is always some boxed Java object, even though it might ideally be an unboxed primitive int.
But, since arguments are passed as an Object as in the SimpleLanguage, I guess that this is common practice?
For the type specialization, I already mentioned a few details. Now that the materialized frames are out of the way, I hoped the type specialization would also lead to better performance, but the improvement is only in the 30% range, and thus doesn’t really help to close the huge gap with RPySOM. I think something else is still going very wrong here.
Overall, my design with implicit casts to unwrap SOM objects leads to the situation that most of the time, there are just Java objects floating around. Only when an object gets assigned to a SOM object field, it gets wrapped as a SOM object. So, I think, that reaches one of the relevant goals, however, it is Java objects instead of Java primitive values. I don’t know how that could be fixed, and whether it actually needs to be fixed.
My memory tells me that one of the main rules was that a parent node should not make assumptions about a child node. This manifests itself in using `executeGeneric(.)` or equivalent methods to execute child nodes. However, I don’t see how the boxing could be avoided then, even if we have nicely specialized primitives that implement, for instance, the addition of two unboxed Java ints. I guess the partial evaluator will see such patterns and just do the right thing?
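To illustrate the boxing question, here is a small plain-Java sketch of the two ways a parent could execute its children. It is only a model of the idea, not TruffleSOM or Truffle code: `ExprNode`, `IntLiteral`, `AddNode`, `executeInt`, and `UnexpectedResult` are names I made up for it, and the fallback in `AddNode.executeGeneric` simply throws where a real interpreter would presumably respecialize the node.

```java
// Plain-Java model; all names are hypothetical and no Truffle API is used.
final class NodeSketch {

    /** Signals that a type-specialized execute method saw another type. */
    static final class UnexpectedResult extends Exception {
        final Object result;

        UnexpectedResult(Object result) {
            this.result = result;
        }
    }

    abstract static class ExprNode {
        /** Always works, but returns a boxed value for ints. */
        abstract Object executeGeneric();

        /** Default: go through the generic path and unbox. */
        int executeInt() throws UnexpectedResult {
            Object value = executeGeneric();
            if (value instanceof Integer) {
                return (Integer) value;
            }
            throw new UnexpectedResult(value);
        }
    }

    static final class IntLiteral extends ExprNode {
        final int value;

        IntLiteral(int value) {
            this.value = value;
        }

        @Override
        Object executeGeneric() {
            return value; // autoboxes to Integer
        }

        @Override
        int executeInt() {
            return value; // stays a primitive int
        }
    }

    static final class AddNode extends ExprNode {
        final ExprNode left;
        final ExprNode right;

        AddNode(ExprNode left, ExprNode right) {
            this.left = left;
            this.right = right;
        }

        @Override
        int executeInt() throws UnexpectedResult {
            // The parent asks its children for ints directly, so no
            // Integer objects are created on this path.
            return left.executeInt() + right.executeInt();
        }

        @Override
        Object executeGeneric() {
            try {
                return executeInt(); // boxed only at the generic boundary
            } catch (UnexpectedResult e) {
                // A real interpreter would switch to a more general node here.
                throw new IllegalStateException("non-int operand: " + e.result);
            }
        }
    }

    public static void main(String[] unused) throws UnexpectedResult {
        ExprNode sum = new AddNode(new IntLiteral(1), new AddNode(new IntLiteral(2), new IntLiteral(3)));
        System.out.println(sum.executeInt());
    }
}
```

In this model, a parent that only ever calls `executeGeneric()` on its children gets an `Integer` back from every child, while one that calls `executeInt()` keeps primitives all the way down. Whether the partial evaluator removes the boxing in the first case is exactly what I don't know.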
Well, I don’t know. I also spent a bit of time with igv and -G:Dump= to poke around in the graphs to try to understand where it is going wrong, but without much success so far.
The Truffle ASTs look nice and as I would expect them to look. Inlining seems to work, as well as specialization. However, it still generates enormous amounts of stuff for simple integer loops. And I don’t know how to start reducing that to the necessary things.
Any help on how to speed things up, and on what could be wrong, is highly appreciated.
Software Languages Lab
Vrije Universiteit Brussel