NIO performance

Alan Bateman Alan.Bateman at Sun.COM
Sat Sep 5 07:14:35 PDT 2009

John Hendrikx wrote:
> I noticed that using a FileVisitor to iterate over a single directory 
> is far faster than doing a DirectoryStream + readBasicFileAttributes 
> combination.
>
> Results for 9000 file directory:
>   Simple DirectoryStream iteration = 300 ms
>   DirectoryStream iteration + readBasicFileAttributes for each entry
>     = 9000 ms
>   FileVisitor which skips all subdirectories (but does return
>     BasicFileAttributes for each entry) = 480 ms
>
> Using a FileVisitor to iterate over a single directory seems somewhat 
> clumsy so I looked at the implementation to see if there was a better 
> way, but I found that it is basically cheating (Path seems to be an 
> instance of BasicFileAttributesHolder which obviously is a lot faster 
> than doing your own Attributes.readBasicFileAttributes(path) call).
> I guess what I'm saying is that I didn't really expect that -- I would 
> have expected that for reading a single directory (+ attributes) there 
> would be a simple way to do it like DirectoryStream currently 
> provides.  Currently, I think that many would fall for the trap of 
> iterating over a DirectoryStream and calling readBasicFileAttributes 
> on each entry which is very slow.  Of course now that I figured this 
> out it is no real problem to just wrap a FileVisitor in my own class 
> to read a single directory.
> I hope this feedback is useful.
Files.walkFileTree is essentially an internal iterator built on an 
external iterator (DirectoryStream). So for the maxDepth == 1 case it 
is reasonable to expect the performance to be the same as using 
DirectoryStream to iterate over all entries in the directory and 
calling Attributes.readBasicFileAttributes to read the attributes of 
each file. 

The anomaly you are seeing is a Windows-only anomaly. Elsewhere (on 
Solaris and Linux at least) the performance will be as you would expect. 
For example, I did a quick test on Solaris with a directory of 9000 
files: the simple iteration took 22ms, the iteration + reading the 
attributes took 88ms, and walkFileTree with maxDepth == 1 took 83ms. 

On Windows, Files.walkFileTree is so much faster because the attributes 
are obtained during the directory traversal, so the implementation can 
avoid re-reading them. If it re-read the attributes for each file then 
it would take about the same time as calling 
Attributes.readBasicFileAttributes for each file in the directory, an 
operation that is expensive on Windows. 

One thing to say is that the difference isn't as obvious with NTFS. For 
example, I repeated your test with a directory of 9000 files and the 
simple iteration took 21ms, the iteration + reading the attributes took 
237ms, and Files.walkFileTree took 20ms. With FAT32, or when the volume 
is remote, the difference is very obvious - I'd guess this is what you 
are testing on.
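
For anyone who wants to repeat the comparison, here is a rough sketch of 
the three approaches being timed. It is only an illustration: it uses 
the java.nio.file API as it was eventually released 
(Files.readAttributes in place of the Attributes.readBasicFileAttributes 
call discussed in this thread), and the class and method names 
(DirScanTimings, iterateOnly, and so on) are made up for the example. 
Single-run wall-clock timings like these are crude and depend heavily 
on the file system and on caching.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.FileVisitOption;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.EnumSet;

// Hypothetical helper class, not part of the JDK.
class DirScanTimings {

    // Approach 1: iterate over the entries without touching attributes.
    static int iterateOnly(Path dir) throws IOException {
        int count = 0;
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path entry : stream) {
                count++;
            }
        }
        return count;
    }

    // Approach 2: iterate, then read the basic attributes of each entry
    // with a separate call (the slow path on Windows described above).
    static int iterateAndReadAttributes(Path dir) throws IOException {
        int count = 0;
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path entry : stream) {
                BasicFileAttributes attrs =
                    Files.readAttributes(entry, BasicFileAttributes.class);
                if (attrs != null) {
                    count++;
                }
            }
        }
        return count;
    }

    // Approach 3: walkFileTree limited to maxDepth == 1. At the depth
    // limit every entry, including subdirectories, goes to visitFile.
    static int walkDepthOne(Path dir) throws IOException {
        final int[] count = {0};
        Files.walkFileTree(dir, EnumSet.noneOf(FileVisitOption.class), 1,
            new SimpleFileVisitor<Path>() {
                @Override
                public FileVisitResult visitFile(Path file,
                                                 BasicFileAttributes attrs) {
                    count[0]++;
                    return FileVisitResult.CONTINUE;
                }
            });
        return count[0];
    }

    public static void main(String[] args) throws IOException {
        // Directory to scan; defaults to the current directory.
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");

        long t0 = System.nanoTime();
        int a = iterateOnly(dir);
        long t1 = System.nanoTime();
        int b = iterateAndReadAttributes(dir);
        long t2 = System.nanoTime();
        int c = walkDepthOne(dir);
        long t3 = System.nanoTime();

        System.out.println("iteration only:         " + a + " entries, "
            + (t1 - t0) / 1_000_000 + " ms");
        System.out.println("iteration + attributes: " + b + " entries, "
            + (t2 - t1) / 1_000_000 + " ms");
        System.out.println("walkFileTree depth 1:   " + c + " entries, "
            + (t3 - t2) / 1_000_000 + " ms");
    }
}
```

Run it with the directory to scan as the only argument. The order of 
the three calls matters somewhat, since the first pass may warm the 
file system cache for the later ones.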

This issue does bring up the question of whether we need a method that 
returns a DirectoryStream where the elements are a pair consisting of 
the entry and its attributes. It's come up once or twice. From a 
performance point of view it helps Windows, and maybe some custom file 
systems. The other potential justification is convenience, in that the 
basic attributes will often be required when iterating over a directory. 
It's worth looking at.
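
In the meantime, the entry + attributes pairing can be approximated by 
wrapping walkFileTree with maxDepth == 1, much as John describes. The 
sketch below is not a proposed API: DirectoryEntries and 
listWithAttributes are hypothetical names, it collects everything into 
a map rather than streaming lazily as a real DirectoryStream would, and 
it is written against the java.nio.file API as eventually released.

```java
import java.io.IOException;
import java.nio.file.FileVisitOption;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.EnumSet;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper class, not part of the JDK.
class DirectoryEntries {

    // Returns each entry of a single directory together with the basic
    // attributes handed to the visitor, so no second per-entry call to
    // read attributes is made by this code.
    static Map<Path, BasicFileAttributes> listWithAttributes(Path dir)
            throws IOException {
        final Map<Path, BasicFileAttributes> result = new LinkedHashMap<>();
        Files.walkFileTree(dir, EnumSet.noneOf(FileVisitOption.class), 1,
            new SimpleFileVisitor<Path>() {
                @Override
                public FileVisitResult visitFile(Path file,
                                                 BasicFileAttributes attrs) {
                    // With maxDepth == 1, subdirectories arrive here too.
                    result.put(file, attrs);
                    return FileVisitResult.CONTINUE;
                }
            });
        return result;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        for (Map.Entry<Path, BasicFileAttributes> e
                : listWithAttributes(dir).entrySet()) {
            BasicFileAttributes attrs = e.getValue();
            System.out.println(e.getKey().getFileName()
                + (attrs.isDirectory() ? " <dir>" : " " + attrs.size()));
        }
    }
}
```

Whether the attributes really come for free depends on the platform and 
provider, as discussed above; on file systems where they are not 
returned by the directory read, this costs about the same as reading 
them entry by entry.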


More information about the nio-dev mailing list