Use DAX in ZGC

Per Liden per.liden at oracle.com
Fri Feb 14 14:08:55 UTC 2020


Hi,

On 2/14/20 2:31 PM, Yasumasa Suenaga wrote:
> Hi Per,
> 
> On 2020/02/14 20:52, Per Liden wrote:
>> Hi Yasumasa,
>>
>> On 2/14/20 10:07 AM, Yasumasa Suenaga wrote:
>>> Hi all,
>>>
>>> I tried to allocate the heap on DAX on Linux with -XX:AllocateHeapAt, but 
>>> it failed.
>>> It seems to be allowed only when the filesystem is hugetlbfs or tmpfs.
>>>
>>> According to the kernel documentation [1], DAX is supported in ext2, ext4, 
>>> and xfs.
>>> We also need to mount the filesystem with "-o dax".
>>>
>>> I want to use ZGC on DAX, so I want to introduce a new option, 
>>> -XX:ZAllowHeapOnFileSystem, to allow any filesystem to be used as backing 
>>> storage.
>>> What do you think of this change?
>>
>>
>> +  experimental(bool, ZAllowHeapOnFileSystem, false,     \
>> +          "Allow to use filesystem as Java heap backing storage "     \
>> +          "specified by -XX:AllocateHeapAt")     \
>> +     \
>>
>> Instead of adding a new option it would be preferable to automatically 
>> detect that it's a dax mounted filesystem. But I haven't had a chance 
>> to look into the best way of doing that.
> 
> I thought so, but I guess it is difficult.
> PMDK also does not check it automatically.
> 
>    
> https://github.com/pmem/pmdk/blob/master/src/libpmem2/pmem2_utils_linux.c#L18
> 
> In addition, we don't seem to be able to get the mount option ("-o dax") 
> via a syscall.
> When I strace'ed `mount -o dax ...`, I saw "-o dax" passed as the 5th 
> argument (const void *data). It is handled by each filesystem, so 
> I could not get it.
> 
> As another solution, we could use /proc/mounts, but it might be complex.

I was maybe hoping you could get this information through some ioctl() 
command on the file descriptor?
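
For reference, the /proc/mounts approach mentioned above could look roughly like the sketch below. It is only an illustration, not part of the proposed patch: the names is_dax_mount and has_dax_option are hypothetical, the caller is assumed to already know the exact mount point, and only the plain "dax" option spelling is matched.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: true if the comma-separated mount option list
// contains a plain "dax" entry (other spellings are not handled here).
static bool has_dax_option(const std::string& options) {
  std::istringstream in(options);
  std::string opt;
  while (std::getline(in, opt, ',')) {
    if (opt == "dax") {
      return true;
    }
  }
  return false;
}

// Hypothetical helper: scans text in /proc/mounts format
// ("device dir fstype options dump pass", one mount per line) and reports
// whether mount_point is mounted with "dax". If the same mount point
// appears more than once, the last entry wins, since later mounts shadow
// earlier ones.
bool is_dax_mount(std::istream& mounts, const std::string& mount_point) {
  bool result = false;
  std::string line;
  while (std::getline(mounts, line)) {
    std::istringstream fields(line);
    std::string device, dir, fstype, options;
    if (!(fields >> device >> dir >> fstype >> options)) {
      continue;  // skip malformed lines
    }
    if (dir == mount_point) {
      result = has_dax_option(options);
    }
  }
  return result;
}
```

A caller would open /proc/mounts with std::ifstream and pass the stream in. Mapping the -XX:AllocateHeapAt path to its mount point, and handling escaped characters in mount point names, is left out here, which is part of why this route might be complex.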

> 
> 
>>     const size_t expected_block_size = is_tmpfs() ? os::vm_page_size() : os::large_page_size();
>> -  if (expected_block_size != _block_size) {
>> +  if (!ZAllowHeapOnFileSystem && (expected_block_size != _block_size)) {
>>       log_error(gc)("%s filesystem has unexpected block size " SIZE_FORMAT " (expected " SIZE_FORMAT ")",
>>                     is_tmpfs() ? ZFILESYSTEM_TMPFS : ZFILESYSTEM_HUGETLBFS, _block_size, expected_block_size);
>>       return;
>>     }
>>
>> This part looks potentially dangerous, since we might then be working 
>> with an incorrect _block_size.
> 
> I guess the block size in almost all filesystems is 4KB, even with DAX.
> (XFS allows variable block sizes...)

With your current patch, a user could use -XX:AllocateHeapAt to point to 
any kind of file system, which (at least in theory) could have any block 
size. For things to work down the road we must ensure that ZGranuleSize 
is a multiple of _block_size.
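
To make that constraint concrete, here is a minimal sketch of the kind of check I have in mind. The function name and the 2M constant are assumptions for illustration only; the real check would use ZGranuleSize and the _block_size reported by the filesystem.

```cpp
#include <cstddef>

// Assumed value for illustration only; the real code would use ZGranuleSize.
constexpr std::size_t kAssumedGranuleSize = 2 * 1024 * 1024;

// Hypothetical helper: true if granule_size is a non-zero whole multiple of
// block_size, so a granule never covers a partial filesystem block.
// A zero block size, or a block larger than the granule, is rejected.
bool granule_is_multiple_of_block(std::size_t granule_size,
                                  std::size_t block_size) {
  if (block_size == 0 || block_size > granule_size) {
    return false;
  }
  return granule_size % block_size == 0;
}
```

With a check like this, a 4K or 2M block size would pass against a 2M granule, while an odd block size or one larger than the granule would be rejected up front instead of failing later.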

> 
>    
> https://nvdimm.wiki.kernel.org/2mib_fs_dax
> 
> So I think we can limit _block_size to the OS page size (4KB).
> 
> 
>>   int ZPhysicalMemoryBacking::create_file_fd(const char* name) const {
>> +  if (ZAllowHeapOnFileSystem && (AllocateHeapAt == NULL)) {
>> +    log_error(gc)("-XX:AllocateHeapAt is needed when ZAllowHeapOnFileSystem is specified");
>> +    return -1;
>> +  }
>> +
>>     const char* const filesystem = ZLargePages::is_explicit()
>>                                    ? ZFILESYSTEM_HUGETLBFS
>>                                    : ZFILESYSTEM_TMPFS;
>>
>> This part looks unnecessary, no?
> 
> I added ZAllowHeapOnFileSystem to be used together with AllocateHeapAt.
> So I want to warn if AllocateHeapAt == NULL.

Yes, but that seems unnecessary, and I suggest it's removed.

cheers,
/Per

> 
> 
> Thanks,
> 
> Yasumasa
> 
> 
>> cheers,
>> Per
>>
>>>
>>>    http://cr.openjdk.java.net/~ysuenaga/dax-z/
>>>
>>> If it can be accepted, I will file it in JBS and propose a CSR.
>>>
>>>
>>> Thanks,
>>>
>>> Yasumasa
>>>
>>>
>>> [1] 
>>> https://www.kernel.org/doc/Documentation/filesystems/dax.txt
> 


More information about the hotspot-gc-dev mailing list