RFR(XS): JDK-8199780: SetMemory0 and CopyMemory0 in unsafe.cpp need to resolve their operands
paul.sandoz at oracle.com
Tue Mar 20 16:55:23 UTC 2018
> On Mar 20, 2018, at 9:25 AM, Erik Österlund <erik.osterlund at oracle.com> wrote:
> Hi Roman,
> On 2018-03-20 16:40, Roman Kennke wrote:
>> Am 20.03.2018 um 12:00 schrieb Erik Österlund:
>>> Hi Roman,
>>> On 2018-03-20 11:36, Roman Kennke wrote:
>>>> Same reason as splitting resolve -> resolve_for_read/resolve_for_write
>>>> in other routines: being able to distinguish read and write access.
>>>> Also, I'd rather be careful about putting this stuff in central places
>>>> that might over-cover it.
>>> It sounds like the motivation for this (in my opinion more fragile)
>>> call-site-chasing code is optimization.
>>> What is the performance difference? Has this shown up in any profiles?
>>> Whenever robustness is traded for performance, it would be great to have
>>> some understanding about how much performance was lost.
>> I don't have numbers. But it is not hard to see that copying potentially
>> large arrays twice has some impact. It may only really matter in
>> interpreter and C1, because C2 would most likely intrinsify anything
>> that would show up in profiles, but this would still amount to a startup
>> time penalty, I would think. I don't really intend to trade robustness
>> for performance: my goal is to make a robust API that also allows GCs to
>> be efficient.
> Conversely, I would be surprised if there were a considerable difference in startup time due to hitting an unnecessary write barrier for an arraycopy during startup, happening precisely while concurrent relocation is going on and while the object has been unmodified since before relocation started. If you want to change the API to something that is, in my opinion, more fragile purely for optimization purposes, I think it would be appropriate to at least measure whether it makes a difference, so that we get a good understanding of why we are doing this.
I concur. In prior updates to Unsafe that I was involved in, we focused on correctness; if performance was a concern or priority, then an intrinsic would be developed for C2 and possibly for C1.