Recursive file listing in Java
Walking the tree the better way.
When the first version of this post was published, a web search for “recursive file java” returned only horrible examples of how to implement directory traversal. I published a version based on anonymous classes to improve on that. More than ten years later, it’s time to re-evaluate.
Using Streams
With Files.walk
Java 8 added a stream-based API, Files.walk, that visits the tree below the given path depth-first. The most commonly found example is simple and straightforward:
Files.walk(Paths.get(path))
    .filter(Files::isRegularFile)
    .forEach(System.out::println);
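Unfortunately, a “hello world” example like this hides the shortcomings you run into in real-world usage. While you can filter the paths the stream emits, there is no way to skip directories during traversal: a filter only drops entries after they have been visited, and the walk still descends into every directory. Even worse, there is no way to gracefully handle the exceptions that are bound to happen. An I/O error such as an unreadable directory is thrown as an UncheckedIOException from the middle of the stream and ends the whole traversal. With Files.find you can pass in a predicate that also receives the entry’s BasicFileAttributes, which might help to avoid redundant retrieval of file attributes, but other than that it suffers from the very same problems. I’d recommend avoiding both API methods because of that.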
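For comparison, a minimal sketch of the Files.find variant. The returned stream holds open directory handles, so it is closed with try-with-resources, and an I/O error during iteration arrives as an UncheckedIOException that ends the stream. The class name FindExample and the helper regularFiles are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper class illustrating Files.find.
class FindExample {

    // Collects all regular files below root.
    static List<Path> regularFiles(Path root) throws IOException {
        // The matcher receives the attributes the walk has already read,
        // so no extra Files.isRegularFile() call per path is needed.
        try (Stream<Path> paths = Files.find(root, Integer.MAX_VALUE,
                (path, attrs) -> attrs.isRegularFile())) {
            return paths.collect(Collectors.toList());
        } catch (UncheckedIOException e) {
            // An error anywhere in the tree aborts the whole stream;
            // all we can do here is rethrow the underlying IOException.
            throw e.getCause();
        }
    }
}
```

Note that the filter cannot prevent the walk from entering a directory, and a single unreadable entry still costs you the entire result.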
Using the Visitor
This leaves us with Files.walkFileTree, which accepts a FileVisitor instead of returning a stream:
FileTraversal visitor = new FileTraversal();
Files.walkFileTree(Paths.get("/"), visitor);
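To illustrate what the visitor buys over the stream API, here is a sketch of a complete FileVisitor implementation: preVisitDirectory prunes a subtree by returning SKIP_SUBTREE, and visitFileFailed reports an error and carries on instead of aborting the walk. The class name PruningVisitor, skipping directories named ".git", and printing errors to stderr are illustrative assumptions.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.FileVisitor;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.List;

// Hypothetical visitor: collects visited non-directory entries,
// skips ".git" directories (an assumption) and tolerates I/O errors.
class PruningVisitor implements FileVisitor<Path> {
    final List<Path> files = new ArrayList<>();

    @Override
    public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs) {
        // Prune: nothing below a ".git" directory is visited.
        Path name = dir.getFileName(); // null for a file system root
        if (name != null && name.toString().equals(".git")) {
            return FileVisitResult.SKIP_SUBTREE;
        }
        return FileVisitResult.CONTINUE;
    }

    @Override
    public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
        files.add(file);
        return FileVisitResult.CONTINUE;
    }

    @Override
    public FileVisitResult visitFileFailed(Path file, IOException exc) {
        // e.g. a directory that could not be opened: report it and keep walking
        System.err.println("skipped " + file + ": " + exc);
        return FileVisitResult.CONTINUE;
    }

    @Override
    public FileVisitResult postVisitDirectory(Path dir, IOException exc) {
        // exc is non-null if iterating the directory's entries failed part-way
        if (exc != null) {
            System.err.println("incomplete " + dir + ": " + exc);
        }
        return FileVisitResult.CONTINUE;
    }

    static List<Path> collect(Path root) throws IOException {
        PruningVisitor visitor = new PruningVisitor();
        Files.walkFileTree(root, visitor);
        return visitor.files;
    }
}
```

All four callbacks must be implemented when using the interface directly, which is exactly the cruft that SimpleFileVisitor tries to hide.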
I’ve never been a fan of the formal implementation of the Visitor pattern, and even extending SimpleFileVisitor only helps to reduce some of the cruft of the interface.
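import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;

public class FileTraversal extends SimpleFileVisitor<Path> {
    @Override
    public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs)
            throws IOException {
        // descend only into directories accepted by onDirectory()
        return onDirectory(dir, attrs)
            ? FileVisitResult.CONTINUE
            : FileVisitResult.SKIP_SUBTREE;
    }

    @Override
    public FileVisitResult visitFile(Path file, BasicFileAttributes attrs)
            throws IOException {
        onFile(file, attrs);
        return FileVisitResult.CONTINUE;
    }

    // hook: return false to skip a directory and everything below it
    public boolean onDirectory(Path dir, BasicFileAttributes attrs) {
        return true;
    }

    // hook: called for each non-directory entry that is visited
    public void onFile(Path file, BasicFileAttributes attrs) {
    }

    // note: the inherited visitFileFailed() rethrows, so an unreadable entry still aborts the walk
    public void traverse(Path dir) throws IOException {
        Files.walkFileTree(dir, this);
    }
}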
new FileTraversal() {
    // the directory hook is onDirectory(); it decides whether to descend
    @Override
    public boolean onDirectory(Path dir, BasicFileAttributes attrs) {
        System.out.println("dir: " + dir);
        return true;
    }

    @Override
    public void onFile(Path file, BasicFileAttributes attrs) {
        System.out.println("file: " + file);
    }
}.traverse(Paths.get("/"));
By building on SimpleFileVisitor, this comes quite close to the original version using the old java.io.File API. Still, I find the old code much easier to read and understand. In comparison, I do believe the old code has aged gracefully.
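import java.io.File;
import java.io.IOException;

public class FileTraversal {
    public final void traverse(File f) throws IOException {
        if (f.isDirectory()) {
            if (onDirectory(f)) {
                // listFiles() returns null on an I/O error, e.g. missing permissions
                File[] children = f.listFiles();
                if (children != null) {
                    for (File child : children) {
                        traverse(child);
                    }
                }
                return;
            }
        }
        // files, and directories rejected by onDirectory(), end up here
        onFile(f);
    }

    public boolean onDirectory(File d) {
        return true;
    }

    public void onFile(File f) {
    }
}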
While the anonymous class approach never felt great, it was quite effective. Despite being a believer in delegation over inheritance, I could not come up with a concise version that matches its clear elegance.
Conclusion
What should you use today? A fluent API implementation of the FileVisitor interface in combination with lambdas is probably the most compact version as of today, as long as you think about the usage and don’t look at the bloated implementation, that is.
new FileTraversal()
    .onDir((path, attrs) -> true)
    .onFile((path, attrs) -> System.out.println(path))
    .traverse(path);
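For completeness, a minimal sketch of what such an implementation could look like. It assumes onDir takes a BiPredicate<Path, BasicFileAttributes>, onFile a BiConsumer<Path, BasicFileAttributes>, and that traverse declares IOException; these signatures are guesses derived from the usage, not an existing library API.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.function.BiConsumer;
import java.util.function.BiPredicate;

// Sketch of a fluent, lambda-friendly traversal.
// The callback types (BiPredicate/BiConsumer) are assumptions inferred from the usage.
class FileTraversal {
    // defaults: descend into every directory, ignore files
    private BiPredicate<Path, BasicFileAttributes> dirFilter = (dir, attrs) -> true;
    private BiConsumer<Path, BasicFileAttributes> fileHandler = (file, attrs) -> { };

    FileTraversal onDir(BiPredicate<Path, BasicFileAttributes> dirFilter) {
        this.dirFilter = dirFilter;
        return this;
    }

    FileTraversal onFile(BiConsumer<Path, BasicFileAttributes> fileHandler) {
        this.fileHandler = fileHandler;
        return this;
    }

    void traverse(Path root) throws IOException {
        // the anonymous visitor only adapts the callbacks; pruning uses SKIP_SUBTREE
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs) {
                return dirFilter.test(dir, attrs)
                    ? FileVisitResult.CONTINUE
                    : FileVisitResult.SKIP_SUBTREE;
            }

            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                fileHandler.accept(file, attrs);
                return FileVisitResult.CONTINUE;
            }
        });
    }
}
```

Keeping the visitor inside traverse hides SimpleFileVisitor from callers, and the defaults make both onDir and onFile optional.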
At least in theory, the visitor approach should also allow for some easy parallelisation. If you value simplicity, I’d argue there is no need to replace the old code unless you have a good reason to. If performance is a major concern, it might be worth considering the upgrade.
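To make the parallelisation idea a little more concrete, here is a sketch that drives callbacks of the same shape from a fork/join pool, with one task per directory. This is an assumption about how one might do it, not something Files.walkFileTree provides (it walks sequentially). The class name ParallelTraversal is hypothetical, the callbacks must be thread-safe, and whether it is actually faster depends on the file system and should be measured.

```java
import java.io.IOException;
import java.nio.file.DirectoryIteratorException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;
import java.util.function.Consumer;
import java.util.function.Predicate;

// Hypothetical parallel traversal: one fork/join task per directory.
class ParallelTraversal extends RecursiveAction {
    private final Path dir;
    private final Predicate<Path> onDirectory; // must be thread-safe
    private final Consumer<Path> onFile;       // must be thread-safe

    ParallelTraversal(Path dir, Predicate<Path> onDirectory, Consumer<Path> onFile) {
        this.dir = dir;
        this.onDirectory = onDirectory;
        this.onFile = onFile;
    }

    @Override
    protected void compute() {
        List<ParallelTraversal> subtasks = new ArrayList<>();
        try (DirectoryStream<Path> children = Files.newDirectoryStream(dir)) {
            for (Path child : children) {
                // like walkFileTree's default, symbolic links are not followed
                if (Files.isDirectory(child, LinkOption.NOFOLLOW_LINKS)) {
                    if (onDirectory.test(child)) {
                        subtasks.add(new ParallelTraversal(child, onDirectory, onFile));
                    }
                } else {
                    onFile.accept(child);
                }
            }
        } catch (IOException | DirectoryIteratorException e) {
            // report and skip the rest of this directory; other tasks continue
            System.err.println("skipped " + dir + ": " + e);
        }
        // process the accepted subdirectories concurrently and wait for all of them
        invokeAll(subtasks);
    }

    static void traverse(Path root, Predicate<Path> onDirectory, Consumer<Path> onFile) {
        ForkJoinPool.commonPool().invoke(new ParallelTraversal(root, onDirectory, onFile));
    }
}
```

Unlike the old java.io.File version, a directory rejected by onDirectory is simply skipped here rather than handed to onFile.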