Proposal for adding O_DIRECT support into JDK 9
yingqi.lu at intel.com
Thu Oct 13 22:37:07 UTC 2016
O_DIRECT is widely adopted in applications designed for high IO throughput, such as web servers and databases. It bypasses the filesystem cache and the Linux readahead buffer, which sometimes slow down throughput and cause unpredictable IO performance. As an example, we recently measured Apache Cassandra (one of the most popular distributed database systems written in Java) running on default buffered IO: lowering the readahead buffer from 128KB (the default) to 8KB improves throughput by up to 4X. This is a typical use case for DirectIO, and we expect to see even greater gains from using it. As another example, we have enabled the Hadoop Distributed File System (HDFS) with O_DIRECT through native calls and also measured significant performance gains running a cloud workload. It is important to have O_DIRECT supported by Java so that all Java-based applications can take advantage of it.
On the other hand, we agree that O_DIRECT has certain limitations. That is why it is normally recommended only as a performance option. Our purpose is to enable it inside Java and provide application writers a uniform and secure way to use it. All of the limitations apply to native applications as well.
1. Regarding the alignment restriction, I cross-checked with our kernel experts, and we think it is safe to align the memory buffer used by DirectIO to the kernel page size.
2. In terms of running O_DIRECT I/O concurrently with fork(2) on a privately mapped memory buffer, the issue can be solved by creating the DirectIO memory buffer using shmat(2) or mmap(2) with the MAP_SHARED flag. Alternatively, madvise(2) with MADV_DONTFORK can be applied to the memory buffer to avoid the issue as well. Details can be found at http://man7.org/linux/man-pages/man2/open.2.html
3. Combining DirectIO and buffered IO on the same file is not recommended for performance reasons. However, we think application writers should be familiar with all of these caveats before using it.
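The alignment point in item 1 has a practical consequence beyond the buffer address: the open(2) man page notes that O_DIRECT may also impose alignment restrictions on the transfer length and the file offset. Below is a minimal sketch, assuming a power-of-two alignment of 4096 bytes (a common page size; the real requirement varies by filesystem and kernel), of how a caller could widen an arbitrary read request to the enclosing aligned span and then copy out only the bytes it asked for. `DirectIOAlign`, `alignDown` and `alignUp` are hypothetical names, not part of any JDK API.

```java
public final class DirectIOAlign {
    // Assumed alignment (a common Linux page size); the actual value depends on
    // the filesystem and kernel, and must be a power of two for the masks below.
    static final long ALIGNMENT = 4096;

    /** Rounds value down to the previous multiple of alignment. */
    static long alignDown(long value, long alignment) {
        requirePowerOfTwo(alignment);
        return value & ~(alignment - 1);
    }

    /** Rounds value up to the next multiple of alignment. */
    static long alignUp(long value, long alignment) {
        requirePowerOfTwo(alignment);
        return (value + alignment - 1) & ~(alignment - 1);
    }

    private static void requirePowerOfTwo(long alignment) {
        if (alignment <= 0 || Long.bitCount(alignment) != 1) {
            throw new IllegalArgumentException("alignment must be a power of two: " + alignment);
        }
    }

    public static void main(String[] args) {
        long position = 5000;  // caller's unaligned file offset
        long length = 100;     // caller's unaligned byte count
        long start = alignDown(position, ALIGNMENT);       // 4096
        long end = alignUp(position + length, ALIGNMENT);  // 8192
        long skip = position - start;                      // 904
        // Read the aligned span [start, end) with O_DIRECT, then copy bytes
        // [skip, skip + length) of that span back to the caller.
        System.out.println("aligned read [" + start + ", " + end + "), copy "
                + skip + ".." + (skip + length));
    }
}
```

Running `main` prints `aligned read [4096, 8192), copy 904..1004`. Reading a slightly larger aligned span and copying out the requested window is one common way to serve unaligned requests on top of an aligned-only I/O path, at the cost of the extra copy.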
Now, we are planning to make the following changes to the existing patch; we hope it will be structured a little better this way :-)
1. Create a function to allocate an aligned DirectByteBuffer and use it for Direct I/O (the default path also uses a DirectByteBuffer, but an unaligned one). The buffer will be aligned to a page boundary. There is existing code in Direct-X-Buffer.java.template for VM.isDirectMemoryPageAligned, which we will follow as an example. We think this will address the "extra copy" issue in the last version of the patch.
2. Move all the changes to FileDispatcherImpl.c to Java level.
3. Remove the changes to FileDescriptor and do the DirectIO check inside FileChannelImpl.
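Item 1 of the plan could look roughly like the following sketch. It is not the patch itself: `AlignedBuffers`, `allocateAligned` and `isDirectIOCompatible` are hypothetical names, the 4096-byte page size is an assumption (there is no public JDK API that reports it), and the code relies on `ByteBuffer.alignedSlice` and `ByteBuffer.alignmentOffset`, which are available from Java 9. It over-allocates a direct buffer by `alignment - 1` bytes and slices out an aligned window, an idea similar to the `VM.isDirectMemoryPageAligned` path in Direct-X-Buffer.java.template; the check function is only an illustration of the kind of validation item 3 would place in FileChannelImpl.

```java
import java.nio.ByteBuffer;

public final class AlignedBuffers {
    /**
     * Hypothetical helper: returns a direct buffer whose first byte lies on an
     * `alignment` boundary and whose capacity is `capacity` rounded up to a
     * multiple of `alignment`. Requires Java 9+ for ByteBuffer.alignedSlice.
     */
    static ByteBuffer allocateAligned(int capacity, int alignment) {
        if (alignment <= 0 || Integer.bitCount(alignment) != 1) {
            throw new IllegalArgumentException("alignment must be a power of two");
        }
        if (capacity < 0 || (long) capacity + 2L * alignment > Integer.MAX_VALUE) {
            throw new IllegalArgumentException("capacity out of range: " + capacity);
        }
        // Round the requested size up to whole alignment units.
        int rounded = (capacity + alignment - 1) & -alignment;
        // Over-allocate by alignment - 1 bytes so an aligned window of
        // `rounded` bytes always fits, wherever the memory is placed.
        ByteBuffer raw = ByteBuffer.allocateDirect(rounded + alignment - 1);
        // alignedSlice rounds position up and limit down to boundaries measured
        // from the native address, leaving exactly `rounded` bytes here.
        return raw.alignedSlice(alignment);
    }

    /** Hypothetical pre-flight check in the spirit of O_DIRECT's restrictions. */
    static boolean isDirectIOCompatible(ByteBuffer buf, long filePosition, int alignment) {
        return buf.isDirect()  // short-circuits before alignmentOffset on heap buffers
                && buf.alignmentOffset(buf.position(), alignment) == 0
                && (filePosition & (alignment - 1)) == 0
                && (buf.remaining() & (alignment - 1)) == 0;
    }

    public static void main(String[] args) {
        int pageSize = 4096; // assumption: no public JDK API reports the page size
        ByteBuffer buf = allocateAligned(10_000, pageSize);
        System.out.println(buf.capacity());                            // 12288
        System.out.println(buf.alignmentOffset(0, pageSize));          // 0
        System.out.println(isDirectIOCompatible(buf, 0, pageSize));    // true
        System.out.println(isDirectIOCompatible(buf, 100, pageSize));  // false
    }
}
```

Because the slice shares memory with the larger raw allocation, the raw buffer stays reachable for as long as the slice is in use; the over-allocation costs at most `alignment - 1` bytes beyond the rounded capacity per buffer.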
>From: Alan Burlison [mailto:Alan.Burlison at oracle.com]
>Sent: Thursday, October 13, 2016 5:20 AM
>To: Brian Burkhalter <brian.burkhalter at oracle.com>; Lu, Yingqi
><yingqi.lu at intel.com>
>Cc: Kharbas, Kishor <kishor.kharbas at intel.com>; nio-dev at openjdk.java.net;
>core-libs-dev at openjdk.java.net; Kaczmarek, Eric <eric.kaczmarek at intel.com>
>Subject: Re: Proposal for adding O_DIRECT support into JDK 9
>On 06/10/2016 00:31, Brian Burkhalter wrote:
>> Given that the functionality of O_DIRECT on Linux appears to be
>> supported by other interfaces on OS X, Solaris, and Windows, I wonder
>> whether the patch will need to be refactored in some way to
>> accommodate these other operating systems? For reference it looks as
>if direct I/O on OS X uses the F_NOCACHE command of fcntl(2)
>(although per some online comments this might have some problems),
>Solaris uses the advice argument of directio(3c), and Windows uses
>> a combination of flags passed to CreateFile() [3, 4].
>The Linux open(2) manpage contains a long list of warnings about O_DIRECT:
>In Linux alignment restrictions vary by filesystem and kernel version and might be
>absent entirely. However there is currently no filesystem-independent interface
>for an application to discover these restrictions for a given file or filesystem.
>O_DIRECT I/Os should never be run concurrently with the fork(2) system call, if
>the memory buffer is a private mapping (i.e., any mapping created with the
>mmap(2) MAP_PRIVATE flag; this includes memory allocated on the heap and
>statically allocated buffers). Any such I/Os, whether submitted via an
>asynchronous I/O interface or from another thread in the process, should be
>completed before fork(2) is called. Failure to do so can result in data corruption and undefined
>behavior in parent and child processes.
>Applications should avoid mixing O_DIRECT and normal I/O to the same file, and
>especially to overlapping byte regions in the same file.
>Even when the filesystem correctly handles the coherency issues in this situation,
>overall I/O throughput is likely to be slower than using either mode alone.
>Likewise, applications should avoid mixing mmap(2) of files with direct I/O to the same files.
> "The thing that has always disturbed me about O_DIRECT is that the whole
>interface is just stupid, and was probably designed by a deranged monkey on
>some serious mind-controlling substances." - Linus
>Adding support for O_DIRECT has a far wider impact than adding just another IO
>handle flag. As such I'm opposed to this change as it seems to be prone to cause
>hard-to-diagnose failures on Linux and it is also specific to just Linux.