From martinrb at google.com Sun Feb 1 13:36:58 2009
From: martinrb at google.com (Martin Buchholz)
Date: Sun, 1 Feb 2009 13:36:58 -0800
Subject: What's up with jdk/test/TestEnv.java?
Message-ID: <1ccfd1c10902011336g32f8d9f1rde7f95e9fcac4581@mail.gmail.com>

I've been trying to get the jtreg tests
currently failing with UnknownHostException
to start passing instead.

I looked at jdk/test/TestEnv.java
which looks perfectly designed for our needs.

But it's not in openjdk7...!?!?
(normally I would expect to see things the other way around)
In openjdk7 I just see the hostnames hardcoded into
jdk/test/java/nio/channels/TestUtil.java

Do y'all plan to forward-port jdk/test/TestEnv.java to openjdk7?

Thanks,

Martin

From Alan.Bateman at Sun.COM Mon Feb 2 01:37:00 2009
From: Alan.Bateman at Sun.COM (Alan Bateman)
Date: Mon, 02 Feb 2009 09:37:00 +0000
Subject: What's up with jdk/test/TestEnv.java?
In-Reply-To: <1ccfd1c10902011336g32f8d9f1rde7f95e9fcac4581@mail.gmail.com>
References: <1ccfd1c10902011336g32f8d9f1rde7f95e9fcac4581@mail.gmail.com>
Message-ID: <4986BEBC.7060108@sun.com>

Martin Buchholz wrote:
> I've been trying to get the jtreg tests
> currently failing with UnknownHostException
> to start passing instead.
>
> I looked at jdk/test/TestEnv.java
> which looks perfectly designed for our needs.
>
> But it's not in openjdk7...!?!?
> (normally I would expect to see things the other way around)
> In openjdk7 I just see the hostnames hardcoded into
> jdk/test/java/nio/channels/TestUtil.java
>
> Do y'all plan to forward-port jdk/test/TestEnv.java to openjdk7?
>
> Thanks,
>
> Martin
>
TestEnv was my attempt to centralize the host dependencies so that they
could be specified to jtreg as a properties file (and make it a bit
easier to run them outside of Sun's network). Joe was anxious we do
something about this in 6open so that is why it was pushed there first
(back in 6open-b12). It's on my list to push this to the jdk7/tl
repository.

-Alan.
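The properties-file approach Alan describes can be sketched roughly as follows. This is not the actual jdk/test/TestEnv.java, whose contents are not shown in the thread; the class name HostConfigSketch, the key names and the host names are all made up for illustration.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Properties;

// Hypothetical sketch only: NOT the real jdk/test/TestEnv.java.
final class HostConfigSketch {
    private final Properties props = new Properties();

    // Load host settings from any Reader (in practice, a properties file).
    HostConfigSketch(Reader source) throws IOException {
        props.load(source);
    }

    // Return the configured value, or the fallback when the key is absent.
    String get(String key, String fallback) {
        return props.getProperty(key, fallback);
    }

    public static void main(String[] args) throws IOException {
        // The key "host" and both host names are invented for this example.
        HostConfigSketch env =
            new HostConfigSketch(new StringReader("host=echo.example.org\n"));
        System.out.println(env.get("host", "localhost"));
        System.out.println(env.get("refusing_host", "localhost"));
    }
}
```

Keeping the lookup behind one small class means a test only names a key, and whoever runs the suite outside the original network edits a single file rather than each test.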
From Joe.Darcy at Sun.COM Mon Feb 2 18:10:10 2009
From: Joe.Darcy at Sun.COM (Joe Darcy)
Date: Mon, 02 Feb 2009 18:10:10 -0800
Subject: What's up with jdk/test/TestEnv.java?
In-Reply-To: <4986BEBC.7060108@sun.com>
References: <1ccfd1c10902011336g32f8d9f1rde7f95e9fcac4581@mail.gmail.com>
 <4986BEBC.7060108@sun.com>
Message-ID: <4987A782.3080801@sun.com>

On 02/02/09 01:37 AM, Alan Bateman wrote:
> Martin Buchholz wrote:
>> I've been trying to get the jtreg tests
>> currently failing with UnknownHostException
>> to start passing instead.
>>
>> I looked at jdk/test/TestEnv.java
>> which looks perfectly designed for our needs.
>>
>> But it's not in openjdk7...!?!?
>> (normally I would expect to see things the other way around)
>> In openjdk7 I just see the hostnames hardcoded into
>> jdk/test/java/nio/channels/TestUtil.java
>>
>> Do y'all plan to forward-port jdk/test/TestEnv.java to openjdk7?
>>
>> Thanks,
>>
>> Martin
>>
> TestEnv was my attempt to centralize the host dependencies so that
> they could be specified to jtreg as a properties file (and make it a
> bit easier to run them outside of Sun's network). Joe was anxious we
> do something about this in 6open so that is why it was pushed there
> first (back in 6open-b12). It's on my list to push this to the jdk7/tl
> repository.

For the curious, this is bug 6720349 "(ch) Channels tests depending on
hosts inside Sun."

-Joe

From Alan.Bateman at Sun.COM Fri Feb 6 12:02:23 2009
From: Alan.Bateman at Sun.COM (Alan Bateman)
Date: Fri, 06 Feb 2009 20:02:23 +0000
Subject: nio2 build 99
In-Reply-To: <492D6097.9080801@sun.com>
References: <492D6097.9080801@sun.com>
Message-ID: <498C974F.2000207@sun.com>

We have a new build available. It is mostly bug fixes and a few minor
improvements. It is synchronized with jdk7 b45. Binaries, javadoc,
summary of changes are in the usual place [1].

-Alan.
[1] http://openjdk.java.net/projects/nio/

From i30817 at gmail.com Mon Feb 16 13:16:40 2009
From: i30817 at gmail.com (Paulo Levi)
Date: Mon, 16 Feb 2009 21:16:40 +0000
Subject: ListFiles performance vs file system visitor
Message-ID: <212322090902161316i11a9e27esdc939f1c9ada4548@mail.gmail.com>

Hi. I saw recently that the nio2 was going to have a new visitor like
implementation for filesystem walking and i am wondering if it is faster
than doing a recursive list files by hand. In my application this is
the main bottleneck.

Also besides this question, i hope you don't think it too cheeky for
me to post my function here for advice. Windows takes about 6 seconds
to find the files (in the explorer given a * regex), java 49 seconds.
This is on a usb port. On a usb 2 port times become 3s windows and 21
seconds java.

public static void getFiles(int levels, File[] sum, List<File> files,
        List<File> directories) {
    int dirIndex = directories.size();
    for (File f : sum) {
        if (f.isDirectory()) {
            directories.add(f);
        } else {
            files.add(f);
        }
    }

    if (levels > 0) {
        int dirLen = directories.size();
        for (; dirIndex < dirLen; dirIndex++) {
            File current = directories.get(dirIndex);
            File[] children = listFiles(current);
            if (children == null) {
                logMissingDirectory(current);
            } else {
                getFiles(levels - 1, children, files, directories);
            }
        }
    }
}

private static final Comparator<String> comp =
        Collections.reverseOrder(Strings.getNaturalComparator());

private static File[] listFiles(File p) {
    String[] ss = p.list();
    if (ss == null) {
        return null;
    }
    Arrays.sort(ss, comp);
    int n = ss.length;
    File[] fs = new File[n];
    for (int i = 0; i < n; i++) {
        //private non-validating constructor not accessible (private)
        fs[i] = new File(p, ss[i]);
    }
    return fs;
}

From Alan.Bateman at Sun.COM Tue Feb 17 07:27:50 2009
From: Alan.Bateman at Sun.COM (Alan Bateman)
Date: Tue, 17 Feb 2009 15:27:50 +0000
Subject: ListFiles performance vs file system visitor
In-Reply-To: <212322090902161316i11a9e27esdc939f1c9ada4548@mail.gmail.com>
References:
 <212322090902161316i11a9e27esdc939f1c9ada4548@mail.gmail.com>
Message-ID: <499AD776.7080501@sun.com>

Paulo Levi wrote:
> Hi. I saw recently that the nio2 was going to have a new visitor like
> implementation for filesystem walking
Yes, the method you want is Files#walkFileTree. It's not really possible
to do recursive operations effectively with java.io.File today, esp.
when there are symbolic links.

> and i am wondering if it is faster
> than doing a recursive list files by hand. In my application this is
> the main bottleneck.
>
Probably not because a hand-rolled file tree walker is going to do the
same things that walkFileTree does. That said, there is one optimization
that we can do on Windows to partly work around the performance issues
that are often reported on FAT32 and SMB.

> Also besides this question, i hope you don't think it too cheeky for
> me to post my function here for advice. Windows takes about 6 seconds
> to find the files (in the explorer given a * regex), java 49 seconds.
> This is on a usb port. On a usb 2 port times become 3s windows and 21
> seconds java.
>
That is a significant difference and probably explained by usage of
isDirectory to check if each file is a directory. If I understand
correctly, then the following code fragment is probably close to what
you want:

    static List<Path> findMySource(Path start) {
        final List<Path> results = new ArrayList<Path>();
        final PathMatcher matcher =
            start.getFileSystem().getPathMatcher("glob:*.java");
        Files.walkFileTree(start, new SimpleFileVisitor<Path>() {
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                if (matcher.matches(file.getName()))
                    results.add(file);
                return FileVisitResult.CONTINUE;
            }
        });
        return results;
    }

-Alan.
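For readers on a released JDK, a runnable variant of Alan's fragment might look like the sketch below. It assumes the java.nio.file API as it shipped in Java 7 rather than the nio2 preview build discussed in the thread: there the no-argument getName() call is written getFileName(), and walkFileTree declares IOException. The class name WalkSketch is hypothetical.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.List;

// Hypothetical class name; the method mirrors the fragment from the thread.
final class WalkSketch {

    // Collect every file under 'start' whose file name matches *.java.
    static List<Path> findMySource(Path start) throws IOException {
        final List<Path> results = new ArrayList<Path>();
        final PathMatcher matcher =
            start.getFileSystem().getPathMatcher("glob:*.java");
        Files.walkFileTree(start, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                // Match on the last path element only, not the whole path.
                Path name = file.getFileName();
                if (name != null && matcher.matches(name)) {
                    results.add(file);
                }
                return FileVisitResult.CONTINUE;
            }
        });
        return results;
    }

    public static void main(String[] args) throws IOException {
        Path start = Paths.get(args.length > 0 ? args[0] : ".");
        for (Path p : findMySource(start)) {
            System.out.println(p);
        }
    }
}
```

The visitor receives each file's BasicFileAttributes along with the path, so a caller that needs to know whether an entry is a directory does not have to make a separate isDirectory call per file.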
From i30817 at gmail.com Tue Feb 17 10:01:38 2009
From: i30817 at gmail.com (Paulo Levi)
Date: Tue, 17 Feb 2009 18:01:38 +0000
Subject: ListFiles performance vs file system visitor
In-Reply-To: <499AD776.7080501@sun.com>
References: <212322090902161316i11a9e27esdc939f1c9ada4548@mail.gmail.com>
 <499AD776.7080501@sun.com>
Message-ID: <212322090902171001k5712b6e9rdcf16dcfeddfa10d@mail.gmail.com>

How about adding an iterator-like method (to remove the array creation)?

Also if it wasn't for the isDirectory i could make a lazy File
extension/delegate (no normalization - if the private constructor was
protected that would work too), at least for the files that aren't
directories (which in common usage are most of them). Any
alternative to isDirectory?

Nio2 can't be used as a separate jar since it introduces new native api right?

On Tue, Feb 17, 2009 at 3:27 PM, Alan Bateman wrote:
> Paulo Levi wrote:
>>
>> Hi. I saw recently that the nio2 was going to have a new visitor like
>> implementation for filesystem walking
>
> Yes, the method you want is Files#walkFileTree. It's not really possible to
> do recursive operations effectively with java.io.File today, esp. when there
> are symbolic links.
>
>> and i am wondering if it is faster
>> than doing a recursive list files by hand. In my application this is
>> the main bottleneck.
>>
>
> Probably not because a hand-rolled file tree walker is going to do the same
> things that walkFileTree does. That said, there is one optimization that we
> can do on Windows to partly work around the performance issues that are often
> reported on FAT32 and SMB.
>
>> Also besides this question, i hope you don't think it too cheeky for
>> me to post my function here for advice. Windows takes about 6 seconds
>> to find the files (in the explorer given a * regex), java 49 seconds.
>> This is on a usb port. On a usb 2 port times become 3s windows and 21
>> seconds java.
>>
>
> That is a significant difference and probably explained by usage of
> isDirectory to check if each file is a directory. If I understand correctly,
> then the following code fragment is probably close to what you want:
>
>    static List<Path> findMySource(Path start) {
>        final List<Path> results = new ArrayList<Path>();
>        final PathMatcher matcher =
>            start.getFileSystem().getPathMatcher("glob:*.java");
>        Files.walkFileTree(start, new SimpleFileVisitor<Path>() {
>            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
>                if (matcher.matches(file.getName()))
>                    results.add(file);
>                return FileVisitResult.CONTINUE;
>            }
>        });
>        return results;
>    }
>
> -Alan.
>

From Alan.Bateman at Sun.COM Tue Feb 17 12:55:14 2009
From: Alan.Bateman at Sun.COM (Alan Bateman)
Date: Tue, 17 Feb 2009 20:55:14 +0000
Subject: ListFiles performance vs file system visitor
In-Reply-To: <212322090902171001k5712b6e9rdcf16dcfeddfa10d@mail.gmail.com>
References: <212322090902161316i11a9e27esdc939f1c9ada4548@mail.gmail.com>
 <499AD776.7080501@sun.com>
 <212322090902171001k5712b6e9rdcf16dcfeddfa10d@mail.gmail.com>
Message-ID: <499B2432.6080207@sun.com>

Paulo Levi wrote:
> How about adding an iterator-like method (to remove the array creation)?
>
Yes, the new API provides an iterator.

> Also if it wasn't for the isDirectory i could make a lazy File
> extension/delegate (no normalization - if the private constructor was
> protected that would work too), at least for the files that aren't
> directories (which in common usage are most of them). Any
> alternative to isDirectory?
>
I don't completely understand what you are saying here but you can't
tell the non-directories by the names and so you will have to continue
to use isDirectory with your solution.

> Nio2 can't be used as a separate jar since it introduces new native api right?
>
It's not a standalone API, sorry!

-Alan.
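Alan does not name the iterator he has in mind. In the java.nio.file API as released in Java 7 the closest match is DirectoryStream, so the sketch below assumes that is what is meant; the class name DirStreamSketch and the counting logic are illustrative only.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical class name; DirectoryStream is assumed to be the
// "iterator" referred to in the thread.
final class DirStreamSketch {

    // Iterate over one directory without building an array of entries.
    // Returns {number of non-directories, number of directories}.
    static int[] countEntries(Path dir) throws IOException {
        int files = 0;
        int dirs = 0;
        // try-with-resources closes the underlying directory handle.
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path entry : stream) {
                // Note: this still asks the file system about each entry.
                if (Files.isDirectory(entry)) {
                    dirs++;
                } else {
                    files++;
                }
            }
        }
        return new int[] { files, dirs };
    }

    public static void main(String[] args) throws IOException {
        int[] counts = countEntries(Paths.get(args.length > 0 ? args[0] : "."));
        System.out.println("files=" + counts[0] + " directories=" + counts[1]);
    }
}
```

This removes the array allocation Paulo asks about, but not the per-entry directory check; for that, the attributes handed to a walkFileTree visitor are the better fit.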
From i30817 at gmail.com Tue Feb 17 20:27:14 2009
From: i30817 at gmail.com (Paulo Levi)
Date: Wed, 18 Feb 2009 04:27:14 +0000
Subject: ListFiles performance vs file system visitor
In-Reply-To: <499B2432.6080207@sun.com>
References: <212322090902161316i11a9e27esdc939f1c9ada4548@mail.gmail.com>
 <499AD776.7080501@sun.com>
 <212322090902171001k5712b6e9rdcf16dcfeddfa10d@mail.gmail.com>
 <499B2432.6080207@sun.com>
Message-ID: <212322090902172027m283c006cka13547ec58b9e2a2@mail.gmail.com>

>> Also if it wasn't for the isDirectory i could make a lazy File
>> extension/delegate (no normalization - if the private constructor was
>> protected that would work too), at least for the files that aren't
>> directories (which in common usage are most of them). Any
>> alternative to isDirectory?
>>
>
> I don't completely understand what you are saying here but you can't tell
> the non-directories by the names and so you will have to continue to use
> isDirectory with your solution.

I mean that the current File class has a private constructor
File(String child, File parent) that doesn't do file normalization.

If isDirectory() doesn't require filename normalization i would have
the opportunity to normalize lazily the first time the files were
used, since File is not final.

But if the problem is not listFiles() it is obviously isDirectory().

If i call listFiles on a normal file it should return null right? I
wonder if that is faster than the isDirectory call itself... no reason
to but doesn't hurt to check.

Also isDirectory() performance fixes? Make File(String child, File
parent) protected?

Pleeeeease?

Also i just noticed. Disregard the sort there, the times i gave were
from before i added that.
From Alan.Bateman at Sun.COM Wed Feb 18 03:24:13 2009
From: Alan.Bateman at Sun.COM (Alan Bateman)
Date: Wed, 18 Feb 2009 11:24:13 +0000
Subject: ListFiles performance vs file system visitor
In-Reply-To: <212322090902172027m283c006cka13547ec58b9e2a2@mail.gmail.com>
References: <212322090902161316i11a9e27esdc939f1c9ada4548@mail.gmail.com>
 <499AD776.7080501@sun.com>
 <212322090902171001k5712b6e9rdcf16dcfeddfa10d@mail.gmail.com>
 <499B2432.6080207@sun.com>
 <212322090902172027m283c006cka13547ec58b9e2a2@mail.gmail.com>
Message-ID: <499BEFDD.8060007@sun.com>

Paulo Levi wrote:
> :
> I mean that the current File class has a private constructor
> File(String child, File parent) that doesn't do file normalization.
>
> If isDirectory() doesn't require filename normalization i would have
> the opportunity to normalize lazily the first time the files were
> used, since File is not final.
>
> But if the problem is not listFiles() it is obviously isDirectory().
>
> If i call listFiles on a normal file it should return null right? I
> wonder if that is faster than the isDirectory call itself... no reason
> to but doesn't hurt to check.
>
> Also isDirectory() performance fixes? Make File(String child, File
> parent) protected?
>
> Pleeeeease?
>
> Also i just noticed. Disregard the sort there, the times i gave were
> from before i added that.
>
Sorry, I don't understand what you are proposing here. The File class
was never designed to be extended and should have been a final class.

I did a quick test with a FAT32 drive (5400rpm, <13ms seek) connected to
a Windows XP machine via USB (2.0, not 1.1 as I couldn't find anything
that old). The drive has 28,000 files spread over 1850 directories and
the test simply searches the entire disk for ".java" files.

Using Files.walkFileTree my build takes ~11.9s for the cold-start case
(drive just connected, nothing in the file cache) and 320-340ms for
second and subsequent runs.
Using listFiles + isDirectory (like the code fragment you posted) it
takes 15.0-15.4s for the cold-start case and 2890-2920ms for second and
subsequent runs.

Using only listFiles (ie: assume every file is a directory and check for
null as you suggest) then the test takes 15.0-15.4s for the cold-start
case (same as previous) and 2845-2860ms for second and subsequent runs
(marginally better than previous).

Your initial mail was comparing the performance against Windows search.
I'm assuming this is the Search tool that can be started from the Start
menu - is that right? If so, then I tried it and the wall-clock time was
26-28s for the cold start case. This is very different to the 6s you
mentioned so perhaps you might be doing something else or pilot error on
my part.

-Alan.

From i30817 at gmail.com Wed Feb 18 11:15:46 2009
From: i30817 at gmail.com (Paulo Levi)
Date: Wed, 18 Feb 2009 19:15:46 +0000
Subject: ListFiles performance vs file system visitor
In-Reply-To: <499BEFDD.8060007@sun.com>
References: <212322090902161316i11a9e27esdc939f1c9ada4548@mail.gmail.com>
 <499AD776.7080501@sun.com>
 <212322090902171001k5712b6e9rdcf16dcfeddfa10d@mail.gmail.com>
 <499B2432.6080207@sun.com>
 <212322090902172027m283c006cka13547ec58b9e2a2@mail.gmail.com>
 <499BEFDD.8060007@sun.com>
Message-ID: <212322090902181115k3b011a2exf456db57c3a478d8@mail.gmail.com>

I managed a massive improvement by refactoring the code somewhat. On the
older usb 1.1 machine with a usb drive it cut the java time from 41 to
25 - fairly awesome - and on another recent dual core machine i can no
longer tell the difference.
Testing 2,306 files and 417 folders i just got a massive win by removing
the null check and just using list files:

public static void getFiles(int levels, File[] sum, List<File> files,
        List<File> directories) {
    Comparator<String> orderFiles = Strings.getNaturalComparator();
    getFiles(levels, sum, files, directories, orderFiles);
}

private static void getFiles(int levels, File[] sum, List<File> files,
        List<File> directories, Comparator<String> comp) {
    int dirIndex = directories.size();
    List<String[]> subFilesList = new ArrayList<String[]>(50);
    for (File f : sum) {
        String[] subFiles = f.list();
        if (subFiles == null) {
            files.add(f);
        } else {
            directories.add(f);
            Arrays.sort(subFiles, comp);
            subFilesList.add(subFiles);
        }
    }

    if (levels > 0) {
        for (int dirLen = directories.size(), subCounter = 0;
                dirIndex < dirLen; dirIndex++, subCounter++) {
            File current = directories.get(dirIndex);
            String[] childs = subFilesList.get(subCounter);
            File[] children = new File[childs.length];
            createFiles(current, childs, children);
            getFiles(levels - 1, children, files, directories, comp);
        }
    }
}

private static void createFiles(File parent, String[] childStrings,
        File[] childsOut) {
    for (int i = 0; i < childStrings.length; i++) {
        childsOut[i] = new File(parent, childStrings[i]);
    }
}

This is my junit test:

@Test
public void testGetFiles() {
    List<File> files = new ArrayList<File>();
    List<File> directories = new ArrayList<File>();
    File[] arr = {new File("e:\\\\LargeDir")};
    long time = System.currentTimeMillis();
    IoUtils.getFiles(5, arr, files, directories);
    System.out.println("Time indexing : " + (System.currentTimeMillis() - time));
}

Output :
Time indexing : 2859

Don't know how to record the windows time, but it's about the same 2-4s.
I guess the bottleneck is in usb 1.1 ...

Also strange that this listfiles is faster ... i would think it is doing
more work - being executed both for files and directories, and saving
the results on a list (to break the time complexity up) versus just
using isDirectory in all files and using listFiles only on the
directory.
Ok, maybe not so strange now that i write it out.
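The list()-only idea from this message can be boiled down to a few lines. The sketch below is a simplified depth-first variant, not Paulo's exact code: it drops the sorting and the level-by-level ordering, and the class name ListOnlyWalkSketch is made up. It relies on the documented behaviour that File.list() returns null when the path is not a directory.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified variant of the approach described above.
final class ListOnlyWalkSketch {

    // Classify 'entry' with a single list() call and recurse while
    // 'levels' allows. A null result is treated as "not a directory".
    // Caveat: list() also returns null on an I/O error, so an unreadable
    // directory ends up in 'files'.
    static void collect(File entry, int levels,
                        List<File> files, List<File> directories) {
        String[] names = entry.list();
        if (names == null) {
            files.add(entry);
            return;
        }
        directories.add(entry);
        if (levels <= 0) {
            return; // recorded, but not descended into
        }
        for (String name : names) {
            collect(new File(entry, name), levels - 1, files, directories);
        }
    }

    public static void main(String[] args) {
        List<File> files = new ArrayList<File>();
        List<File> directories = new ArrayList<File>();
        long start = System.currentTimeMillis();
        collect(new File(args.length > 0 ? args[0] : "."), 5, files, directories);
        System.out.println(files.size() + " files, " + directories.size()
            + " directories in " + (System.currentTimeMillis() - start) + " ms");
    }
}
```

Each entry costs one list() call here, where the first version posted in the thread spent an isDirectory() call on every entry plus a list() call on every directory.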