RFR: JDK-8234507 [lworld] Macro ASM version of access_value_copy
frederic.parain at oracle.com
Thu Nov 21 16:20:06 UTC 2019
Thank you for this work, it was huge but it seems you tackled it nicely.
Nice refactoring of the allocation code.
And the performance numbers sound great.
Here’s my feedback on your webrev:
line 227: does the method really need to be virtual?
line 4520: using temp_reg instead of obj for the temp register of test_klass_is_empty_value()
would make the code easier to read (just my opinion)
line 107: I don’t get the comment (seems to be a bad copy/paste from line 574)
line 551: does the method really need to be virtual?
line 4339: the template of defaultvalue could be simplified by using the new get_empty_value_oop()
line 3704: qputfield could be re-written to use access_value_copy() and avoid the runtime call
(this could be delayed to another CR)
line 3795: Label slow_path is not used
line 508: [style] add empty line?
line 109: allocate_instance_buffer() it would be helpful to have a comment here to explicitly
state that the allocated instance memory has not been cleared, so initialization of
its payload *must* follow before returning the oop to Java (there’s a comment in
collectedHeap.hpp, but people have to dig in to find it)
> On Nov 21, 2019, at 06:10, David Simms <david.simms at oracle.com> wrote:
> Updated webrev0, Tests pass
> On 20/11/19 3:07 PM, David Simms wrote:
>> From further testing it looks like eden_allocate requires "obj" arg as "rax", so some juggling of regs remains to be done in "InterpreterMacroAssembler::read_flattened_field()"
>> On 20/11/19 2:36 PM, David Simms wrote:
>>> Bug/RFE: https://bugs.openjdk.java.net/browse/JDK-8234507
>>> Webrev: http://cr.openjdk.java.net/~dsimms/valhalla/8234507/webrev0/
>>> In working towards complete support for the "Heap Access API", I have added "MacroAssembler::access_value_copy". Given the complexity of performing essentially a "clone" operation (oop iteration applying barrier logic interspersed with memcpy), it is currently implemented simply as a leaf call (has very little overhead).
>>> There maybe be further opportunities to optimized for value type containing no oops etc, but this general purpose variant would probably remain as "fall back" implementation, so it seem like decent first step for building on.
>>> In order to provide testing for "MacroAssembler::access_value_copy", I created a "fast path" alternative for "InterpreterRuntime::read_flattened_field()" see: "InterpreterMacroAssembler::read_flattened_field()". This required quite a number of support methods, i.e. assembler versions of:
>>> * "InstanceKlass::get_value_field_klass()"
>>> * "ValueKlass::read_flattened_field()"
>>> o is_empty_value()
>>> o default_value()
>>> o allocate_instance()
>>> o data_for_oop()
>>> This increased the scope of the change, most of these ancillary helpers make up the bulk of the change. I feel this was necessary in order to evaluate whether the performance gains were the amount of engineering required to avoid interpreter runtime calls. Micro-benchmarking show a x3 performance gain for reading int sized value type field (and x2 gain for reading empty value type field). So answer seems to confirm Frederic's desire to remove as many runtime calls as possible.
>>> Of interest: "MacroAssembler::allocate_instance()". I took the guts out of "TemplateTable::_new" in order to allocate "oop buffer". This seems like fairly general code (not specific to interpreter), thinking we can refactor some of our other tlab allocations to use this (perhaps "MacroAssembler::store_value_type_fields_to_buf")
>>> Still running testing, there may be a subtle bug hiding...but this patch is taking it's time, best gather feedback at this point.
>>> /David Simms
More information about the valhalla-dev