Valhalla, startup, performance of interpreter, and vwithfield

Brian Goetz brian.goetz at oracle.com
Mon May 13 21:46:59 UTC 2019


Thanks Sergey. You are right, I had forgotten to put on my “interpreter glasses”.  All looks good.  


> On May 13, 2019, at 5:26 PM, Sergey Kuksenko <sergey.kuksenko at oracle.com> wrote:
> 
> 
>> On 5/13/19 7:00 AM, Brian Goetz wrote:
>> This is good news.  I want to ask further about the numbers you cite here.  You compare value creation to classic object creation, but obviously we want value creation to be faster.
> 
> In the interpreter? I am afraid that value creation in the interpreter can't be faster than classic object creation. Interpreting value types is still slower than interpreting equivalent classic objects, but the difference has been reduced drastically. Also, I didn't find any scenario where interpreter performance has a significant impact on startup time. The first execution, which involves class loading, verification, etc., is 500x slower than subsequent executions in the interpreter (for both classic objects and value types).
> 
>>  When you say it is comparable to classic object creation costs, I assume that you are not including the allocation cost, and comparing only the field write costs?
> 
> No, it includes the allocation cost. Don't forget that I am talking about interpreter performance. Here is a breakdown:
> 
> 1. Classic object creation: ~230 ns (500 cycles) for the whole object creation. It can be split into ~200 ns (440 cycles) for object allocation and ~30 ns (60 cycles) for field initialization.
> 
> 2. Value type creation: each single vdefault or vwithfield operation costs ~200 ns (440 cycles). That is on par with (even slightly better than) full classic object creation. This looks reasonable, because each single vdefault or vwithfield operation "creates" an object (or something similar to one). Of course, the more fields a value has, the higher the interpreter cost of assembling the full object.
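A rough way to see how the interpreter numbers above scale with field count is a simple additive model. This is a sketch under stated assumptions, not a measurement: it assumes a value is built with one vdefault plus one vwithfield per field, that every such operation costs the same ~200 ns, and that costs simply add up. The classic figure is held at the single ~230 ns number quoted above, which understates classic cost for objects with more fields. `InterpreterCostModel` and its members are hypothetical names.

```java
// Rough additive cost model built from the interpreter numbers quoted above.
// Assumptions (not measured): one vdefault + one vwithfield per field, each ~200 ns,
// costs add linearly; classic creation is held at the single ~230 ns figure.
final class InterpreterCostModel {
    static final double VALUE_OP_NS = 200.0;         // one vdefault or one vwithfield
    static final double CLASSIC_CREATION_NS = 230.0; // allocation (~200) + field init (~30)

    // Estimated interpreter cost of building a value with the given number of fields.
    static double valueCreationNs(int fields) {
        if (fields < 0) throw new IllegalArgumentException("fields must be >= 0");
        return VALUE_OP_NS * (fields + 1); // vdefault, then one vwithfield per field
    }

    public static void main(String[] args) {
        for (int fields : new int[] {1, 2, 10}) {
            double v = valueCreationNs(fields);
            System.out.printf("%2d fields: value ~%.0f ns vs classic ~%.0f ns (%.1fx)%n",
                    fields, v, CLASSIC_CREATION_NS, v / CLASSIC_CREATION_NS);
        }
    }
}
```

The linear term is the point: in the interpreter every extra field is another full-price operation, which is why value creation cannot beat classic allocation there even though each individual step is comparable to one classic allocation.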
> 
> As for compiled code, after C2 compilation we have the following numbers:
> 
> For example (a class with two fields):
> 
> 1. Classic object creation: 14.9ns (total cost) (G1GC)
> 
> 1.1 Classic object creation - only fields write cost: 0.99ns
> 
> 2. Value type (full creation): 0.97ns   (slightly better than just the field-write cost of the classic object).
> 
> Note: all examples here were measured with all data fitting perfectly into CPU caches, even for classic objects. All value-type benefits from better cache locality were intentionally excluded.
> 
> 
>>> I did a quick evaluation of startup and interpreter performance costs. I have to take back my words that "vwithfield is a major contributor to interpreter speed, and a merged (or fused) vwithfield could improve interpreter performance". It was quite a long time ago that I last looked into interpreter performance. I have to say that a huge amount of work has been done on the interpreter since then, and now I don't consider interpreter performance an issue. As for vwithfield, the cost of a single vwithfield (in the interpreter) is now approximately 200 ns (at a 2.2 GHz clock frequency). That is neither a big nor a small value. If we compare the cost of value creation with the cost of creating a similar classic Java object (simple writes), then a single vwithfield costs ~7%-10% of the whole object creation. So I am guessing that if you have a value with 10 fields (and 10 vwithfield operations), you may double the value creation cost, but it will have only a minor impact on overall execution.
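To illustrate what "merged (or fused) vwithfield" would save, the sketch below uses plain Java records as an analogy. It is not Valhalla syntax or bytecode, and `Point3`, `zero`, and the `withX/withY/withZ` methods are hypothetical. Each wither returns a fresh copy with one field changed, the way each vwithfield yields a new value, while the canonical constructor stands in for a single fused operation that sets all fields at once. A naive interpreter pays for every intermediate copy in the chained form.

```java
// Analogy only: plain Java records, not Valhalla bytecode. Point3 is hypothetical.
// zero() mimics vdefault; each withX/withY/withZ mimics one vwithfield (new copy,
// one field changed); the canonical constructor plays a hypothetical fused operation.
final class FusedVsChained {
    static int created = 0; // number of Point3 instances constructed so far

    record Point3(int x, int y, int z) {
        Point3 {
            created++; // count every construction, whichever path produced it
        }
        static Point3 zero() { return new Point3(0, 0, 0); } // vdefault-like
        Point3 withX(int v) { return new Point3(v, y, z); }  // vwithfield-like
        Point3 withY(int v) { return new Point3(x, v, z); }
        Point3 withZ(int v) { return new Point3(x, y, v); }
    }

    public static void main(String[] args) {
        created = 0;
        Point3 chained = Point3.zero().withX(1).withY(2).withZ(3);
        int chainedCount = created; // 1 default + 3 single-field copies

        created = 0;
        Point3 fused = new Point3(1, 2, 3); // all fields set in one step
        int fusedCount = created;

        System.out.println(chained.equals(fused));              // true: same components
        System.out.println(chainedCount + " vs " + fusedCount); // 4 vs 1
    }
}
```

With N fields the chained form produces N + 1 values versus one for the fused form; the measurements in this thread suggest that gap is no longer large enough to matter for overall execution.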
>>> 
>>> Also, if we look at startup for the first execution of code, the interpreter takes less than 1%. All other actions (class loading, verification, etc.) take much more time. As for "time to performance", I haven't evaluated it yet; the interpreter's impact could be higher in that case. At the same time, a working TieredCompilation will improve "time to performance" much more than any interpreter tuning.
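The "less than 1%" startup figure bounds what interpreter tuning can buy. The sketch below applies Amdahl's law (a standard technique brought in here, not something from the thread) with the interpreter's share taken as the quoted 1% upper bound; the speedup factors are hypothetical inputs. Even an infinitely fast interpreter would shorten first-execution time by at most about 1%.

```java
// Amdahl's law applied to the startup figure quoted above (interpreter < 1% of
// first execution). The 1% share is the quoted upper bound; speedups are hypothetical.
final class StartupBound {
    // Overall speedup when a fraction f of the total time is made s times faster.
    static double amdahl(double f, double s) {
        return 1.0 / ((1.0 - f) + f / s);
    }

    public static void main(String[] args) {
        double f = 0.01; // interpreter share of first-execution time (upper bound)
        for (double s : new double[] {2.0, 10.0, Double.POSITIVE_INFINITY}) {
            double saved = 1.0 - 1.0 / amdahl(f, s); // fraction of total time saved
            String label = Double.isInfinite(s) ? "infinitely" : s + "x";
            System.out.printf("interpreter %s faster -> startup %.2f%% shorter%n",
                    label, saved * 100);
        }
    }
}
```

The same bound explains the closing remark: for startup, the remaining 99% (class loading, verification, etc.) dominates, and for "time to performance" getting code out of the interpreter via tiered compilation helps more than making the interpreter itself faster.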



More information about the valhalla-dev mailing list