hg: hsx/hotspot-comp/hotspot: 7013347: allow crypto functions to be called inline to enhance performance
john.r.rose at oracle.com
Tue Feb 7 15:46:21 PST 2012
On Feb 6, 2012, at 6:20 PM, Krystal Mok wrote:
> The original thinking behind modifying the UEP sequence in native wrapper is that the more space it takes, the more likely it'll become a code bubble that affects i-cache. The patch in the earlier post also modified the place where it checks thread state, which should also reduce code bubble for the common case.
Let's remember to control our tendency (I mean yours, mine, and everybody's) to make optimizations prematurely.
Just because a bit of code looks ugly, or a data structure contains a surplus byte, does *not* mean it will pay to smooth it out. Remember the following maxim: "If you touch it it will break." In the case of Hotspot, we have many times seen simple, innocuous changes cause unanticipated breakage, in either correctness or performance. To change the metaphor to binoculars or camera lenses, vigorously removing the lint can scratch the finish. A well-intentioned change can include unintentional costs, including escalating source code complexity, surprising performance interactions, noisy skew between versions or across ports, or schedule slips due to stress test failures.
In this case, when removing icache bubbles, it would be best to detect them in real benchmarks (not micro-benchmarks) and then follow this paradigm: First fix the problems that are known to affect performance. Such work will probably occupy more than half of our professional time, because there are plenty of such problems. To turn around the logic, if you see impossibly bubbly code (like nop sequences) in hotspot output, that is prima facie evidence that the "bad" code is irrelevant to performance, and probably also non-trivial to clean up.
I apologize if you have already done the empirical investigations for this case, and I know everyone already knows these generalities. But I can also tell you that, at least in the teams I have been on, if we don't remind ourselves from time to time to pick the tasks that matter (and can be proved to matter), we will become less effective as engineers. Nobody wants that, so I'm reminding.
In the case of UEP, it seems likely that those instructions are usually not in code cache for typical workloads. A code cache block contains many words of header information, immediately preceding the UEP, and all of that stuff appears to be irrelevant to performance. If we want to experiment with cache effects, I suggest that a more fruitful line of investigation would be to either rigidly segregate hot nmethod code from all other nmethod contents, or do cache-line aware assignment (coloring) and padding to pack the hot nmethod code (post UEP, almost always) most tightly into icache. I think OS engineers use similar techniques for perturbing thread stack addresses, to spread stack frames more evenly across dcache.
(And, to be clear, I do believe passionately in the value of code cleanups, and hope that every change can leave the workspace cleaner than before. The best engineering change, IMO, is one which throws away bad code and finds a cleaner-and-better solution for a difficult-and-practical problem. Another good change tweaks the system in an obviously correct way, without increasing source complexity, to achieve a localized beneficial result. Tom's suggestion of moving the UEP up past the nops sounds to me like such a thing.)
Hope this helps...
-------------- next part --------------
An HTML attachment was scrubbed...
More information about the hotspot-compiler-dev