Recursive file listing in Java

Walking the tree the better way.

When the first version of this post was published, a search on the web for “recursive file java” returned only horrible examples on how to implement directory traversal. I published a version based on anonymous classes to improve on that. More than 10 years later it’s time to re-evaluate.

Using Streams

With Files.walk, Java gained a streaming API that visits the given path in a depth-first manner. The most commonly found example is simple and straightforward.

Files.walk(Paths.get(path))
  .filter(Files::isRegularFile)
  .forEach(System.out::println);

Unfortunately, a “hello world” example like this hides the shortcomings in real-world usage scenarios. While you can filter the emitted paths of the stream, there is no way to skip directories during traversal. Even worse - there is no way to gracefully handle the exceptions that are bound to happen: an unreadable directory surfaces as an UncheckedIOException in the middle of the stream and ends the traversal. With Files.find you can pass in a predicate that also receives the file attributes, which might help to reduce redundant retrieval of file attributes - but other than that it suffers from the very same problems. I’d recommend avoiding both API methods because of that.
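To make the exception problem concrete, here is a minimal sketch of what a more careful Files.walk call tends to look like. The class StreamListing and its method regularFiles are hypothetical names chosen for illustration. The stream is closed with try-with-resources, and the UncheckedIOException thrown during iteration is unwrapped, but note that a single failure still discards the whole result.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper class, not part of the JDK or the original code.
class StreamListing {

  // Returns all regular files below root, or fails as a whole.
  static List<Path> regularFiles(Path root) throws IOException {
    // The stream keeps directories open, so it has to be closed after use.
    try (Stream<Path> paths = Files.walk(root)) {
      return paths
        .filter(Files::isRegularFile)
        .collect(Collectors.toList());
    } catch (UncheckedIOException e) {
      // An I/O error while iterating ends up here. There is no hook to
      // skip the offending entry and continue with the rest of the tree.
      throw e.getCause();
    }
  }
}
```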

Using the Visitor

This leaves us with Files.walkFileTree which accepts a FileVisitor instead of returning a stream.

FileTraversal visitor = new FileTraversal();
Files.walkFileTree(Paths.get("/"), visitor);

I’ve never been a fan of the formal implementation of the Visitor pattern, so I would use a base class FileTraversal to get rid of some of the cruft of the interface. But even with that, the minimal code becomes quite long.

public class FileTraversal extends SimpleFileVisitor<Path> {

  public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs)
    throws IOException {
     return onDirectory(dir, attrs)
     ? FileVisitResult.CONTINUE
     : FileVisitResult.SKIP_SUBTREE;
  }

  public FileVisitResult visitFile(Path file, BasicFileAttributes attrs)
    throws IOException {
      onFile(file, attrs);
      return FileVisitResult.CONTINUE;
  }

  public boolean onDirectory(Path dir, BasicFileAttributes attrs) {
    return true;
  }

  public void onFile(Path file, BasicFileAttributes attrs) {
  }

  public void traverse( Path dir ) throws IOException {
    Files.walkFileTree(dir, this);
  }

}

new FileTraversal() {
  public boolean onDirectory(Path dir, BasicFileAttributes attrs) {
    System.out.println("dir: " + dir);
    return true;
  }
  public void onFile(Path file, BasicFileAttributes attrs) {
    System.out.println("file: " + file);
  }
}.traverse(Paths.get("/"));

If you just extend the SimpleFileVisitor it comes quite close to the original version using the old File interface. Except that I find the old code still much easier to read and understand. I do believe this code has aged gracefully.

public class FileTraversal {
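  public final void traverse( File f ) throws IOException {
    if (f.isDirectory()) {
      if (onDirectory(f)) {
        // listFiles() returns null if the directory cannot be read
        File[] children = f.listFiles();
        if (children != null) {
          for( File child : children ) {
            traverse(child);
          }
        }
      }
      // a skipped directory must not be reported as a file
      return;
    }
    onFile(f);
  }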

  public boolean onDirectory( File d ) {
    return true;
  }

  public void onFile( File f ) {
  }
}

While the anonymous class approach never felt great, it was quite effective. Despite being a believer in delegation over inheritance, even now with lambdas, I could not come up with a concise version that mirrors the elegance. This mostly stems from the way lambdas are implemented in Java, though.

new FileTraversal()
  .onDir((path, attrs) -> true)
  .onFile((path, attrs) -> System.out.println(path))
  .traverse(path);

With a fluent API, it would be less verbose but bloat the visitor implementation. I am not sure how much better this is when you transition to multi-line lambdas though.
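To give an idea of the bloat, here is one possible sketch of a visitor that supports such a fluent style. The class name FluentFileTraversal and the choice of BiPredicate and BiConsumer as callback types are assumptions made for this example. It is a sketch, not a definitive implementation.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.function.BiConsumer;
import java.util.function.BiPredicate;

// Hypothetical fluent variant; the class name is an assumption.
class FluentFileTraversal {

  // Defaults: descend into every directory, ignore every file.
  private BiPredicate<Path, BasicFileAttributes> onDir = (dir, attrs) -> true;
  private BiConsumer<Path, BasicFileAttributes> onFile = (file, attrs) -> { };

  FluentFileTraversal onDir(BiPredicate<Path, BasicFileAttributes> callback) {
    this.onDir = callback;
    return this;
  }

  FluentFileTraversal onFile(BiConsumer<Path, BasicFileAttributes> callback) {
    this.onFile = callback;
    return this;
  }

  void traverse(Path root) throws IOException {
    // Delegate to the callbacks instead of relying on subclassing.
    Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
      @Override
      public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs) {
        return onDir.test(dir, attrs)
          ? FileVisitResult.CONTINUE
          : FileVisitResult.SKIP_SUBTREE;
      }

      @Override
      public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
        onFile.accept(file, attrs);
        return FileVisitResult.CONTINUE;
      }
    });
  }
}
```

The two setter methods and the two fields exist only to carry the lambdas around, which is the price for the shorter call site.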

Conclusion

What should you use today? To be completely honest - work on a visitor class that works for your use case. Base it off the SimpleFileVisitor. At least in theory, the visitor would also allow for some parallelisation. If you hide away the code, you could even use lambdas. But if you don’t have a reason - no need to replace the old code.
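As a starting point, a use-case-specific visitor might look like the following sketch. The class TolerantFileCollector, its fields, and the idea of skipping a directory by name are assumptions for illustration. It collects regular files, skips one named directory, and records unreadable entries instead of aborting, which is done by overriding visitFileFailed (the SimpleFileVisitor default rethrows the exception).

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.List;

// Hypothetical example visitor; class and field names are assumptions.
class TolerantFileCollector extends SimpleFileVisitor<Path> {

  final List<Path> files = new ArrayList<>();
  final List<Path> unreadable = new ArrayList<>();
  private final String skippedDirName;

  TolerantFileCollector(String skippedDirName) {
    this.skippedDirName = skippedDirName;
  }

  @Override
  public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs) {
    Path name = dir.getFileName(); // null for a root such as "/"
    boolean skip = name != null && name.toString().equals(skippedDirName);
    return skip ? FileVisitResult.SKIP_SUBTREE : FileVisitResult.CONTINUE;
  }

  @Override
  public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
    if (attrs.isRegularFile()) {
      files.add(file);
    }
    return FileVisitResult.CONTINUE;
  }

  @Override
  public FileVisitResult visitFileFailed(Path file, IOException exc) {
    // Remember the entry that could not be visited and keep going.
    unreadable.add(file);
    return FileVisitResult.CONTINUE;
  }

  // Convenience entry point that runs the traversal and returns the visitor.
  static TolerantFileCollector collect(Path root, String skippedDirName)
      throws IOException {
    TolerantFileCollector collector = new TolerantFileCollector(skippedDirName);
    Files.walkFileTree(root, collector);
    return collector;
  }
}
```

After a call such as TolerantFileCollector.collect(root, ".git"), the files list holds the result and the unreadable list shows what was skipped because of errors.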