Chapter 7. File I/O

The nonblocking (or “new”) input/output package, referred to as NIO, has been part of Java since the J2SE 1.4 release.1 The NIO.2 extension, added in Java 7, brought in new classes for manipulating files and directories. The additions included the java.nio.file package, which is the subject of this chapter. Several of the classes in that package, like java.nio.file.Files, were enhanced in Java 8 with methods that use streams.

Unfortunately, here is where the stream metaphor from functional programming conflicts with the same term from input/output, leading to potential confusion. For example, the java.nio.file.DirectoryStream interface has nothing to do with functional streams. It is implemented by classes that iterate over the entries in a directory using the traditional for-each construct.2

This chapter focuses on capabilities in I/O that support functional streams. In Java 8, several methods were added to the java.nio.file.Files class to support functional streams. Those methods are shown in Table 7-1. Note that all the methods in the Files class are static.

Table 7-1. Methods in java.nio.file.Files that return streams

Method    Return type
lines     Stream<String>
list      Stream<Path>
walk      Stream<Path>
find      Stream<Path>

The recipes in this chapter deal with each of these methods.

7.1 Process Files

Problem

You want to process the contents of a text file using streams.

Solution

Use the static lines method in java.nio.file.Files, or the lines instance method in java.io.BufferedReader, to return the contents of a file as a stream.

Discussion

All FreeBSD-based Unix systems (including macOS) include a version of Webster’s Second International Dictionary in the /usr/share/dict/ folder. The file web2 includes approximately 230,000 words. Each word appears on its own line.

Say you wanted to find the 10 longest words in that dictionary. You can use the Files.lines method to retrieve the words as a stream of strings, and then do normal stream processing like map and filter. An example is shown in Example 7-1.

Example 7-1. Finding the 10 longest words in the web2 dictionary
try (Stream<String> lines = Files.lines(Paths.get("/usr/share/dict/web2"))) {
    lines.filter(s -> s.length() > 20)
         .sorted(Comparator.comparingInt(String::length).reversed())
         .limit(10)
         .forEach(w -> System.out.printf("%s (%d)%n", w, w.length()));
} catch (IOException e) {
    e.printStackTrace();
}

The predicate in the filter passes only words longer than 20 characters. The sorted method then sorts the words by length in descending order. The limit method truncates the stream after the first 10 words, which are then printed. Because the stream is opened in a try-with-resources block, the system automatically closes it, and the dictionary file, when the try block completes.

The results from executing the code in Example 7-1 are shown in Example 7-2.

Example 7-2. Longest words in dictionary
formaldehydesulphoxylate (24)
pathologicopsychological (24)
scientificophilosophical (24)
tetraiodophenolphthalein (24)
thyroparathyroidectomize (24)
anthropomorphologically (23)
blepharosphincterectomy (23)
epididymodeferentectomy (23)
formaldehydesulphoxylic (23)
gastroenteroanastomosis (23)

There are five words in the dictionary that are 24 characters in length. The results show them in alphabetical order, only because the original file was in alphabetical order. If you add a thenComparing clause to the Comparator argument to sorted, you can choose how you want the equal-length words to be sorted.
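As a minimal sketch of that tie-breaking idea, the following uses a small in-memory list rather than the dictionary file. The class and method names are hypothetical, and reverse alphabetical order is only one possible choice for the secondary sort.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical class name, used only for illustration.
class LongestWordsTieBreak {

    // Longest words first; words of equal length in reverse alphabetical order.
    static List<String> longestFirst(List<String> words, int max) {
        return words.stream()
                .sorted(Comparator.comparingInt(String::length).reversed()
                        .thenComparing(Comparator.reverseOrder()))
                .limit(max)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList("pear", "fig", "kiwi", "banana", "plum");
        System.out.println(longestFirst(sample, 4)); // prints [banana, plum, pear, kiwi]
    }
}
```

Replacing Comparator.reverseOrder() with Comparator.naturalOrder() would keep the equal-length words alphabetical regardless of the order of the input.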

Following the list of 24-character words are five 23-character words, many of which are from the medical field.3

By applying Collectors.counting as a downstream collector, you can determine how many words of each length exist in the dictionary, as shown in Example 7-3.

Example 7-3. Determining number of words of each length
try (Stream<String> lines = Files.lines(Paths.get("/usr/share/dict/web2"))) {
    lines.filter(s -> s.length() > 20)
         .collect(Collectors.groupingBy(String::length, Collectors.counting()))
         .forEach((len, num) -> System.out.println(len + ": " + num));
}

This snippet uses the groupingBy collector to create a Map where the keys are the word lengths and the values are the number of words of each length. The result is:

21: 82
22: 41
23: 17
24: 5

The output contains the counts, but the bare numbers are not labeled. The entries also appear in ascending order of length, which may not be what you want.

As an alternative, the Map.Entry interface now has static methods comparingByKey and comparingByValue, each of which has an overload that takes a Comparator, as discussed in Recipe 4.4. In this case, sorting by the reverseOrder comparator gives the reverse of the natural order. See Example 7-4.

Example 7-4. Number of words of each length, in descending order
try (Stream<String> lines = Files.lines(Paths.get("/usr/share/dict/web2"))) {
    Map<Integer, Long> map = lines.filter(s -> s.length() > 20)
        .collect(Collectors.groupingBy(String::length, Collectors.counting()));

    map.entrySet().stream()
       .sorted(Map.Entry.comparingByKey(Comparator.reverseOrder()))
       .forEach(e -> System.out.printf("Length %d: %d words%n",
            e.getKey(), e.getValue()));
}

The result now is:

Length 24: 5 words
Length 23: 17 words
Length 22: 41 words
Length 21: 82 words
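The comparingByValue method works the same way when the counts, rather than the lengths, should drive the order. The following is a sketch only, using a small in-memory list in place of the dictionary; the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical class name, used only for illustration.
class WordLengthCounts {

    // Returns "length=count" strings, with the most common length first.
    static List<String> byFrequency(List<String> words) {
        Map<Integer, Long> counts = words.stream()
                .collect(Collectors.groupingBy(String::length, Collectors.counting()));

        return counts.entrySet().stream()
                .sorted(Map.Entry.comparingByValue(Comparator.reverseOrder()))
                .map(e -> e.getKey() + "=" + e.getValue())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList("a", "bb", "cc", "ddd", "eee", "fff");
        System.out.println(byFrequency(sample)); // prints [3=3, 2=2, 1=1]
    }
}
```

If two lengths have the same count, their relative order depends on the map's iteration order, so a secondary comparator would be needed for a fully deterministic result.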

If your source of data is not a File, the BufferedReader class also has a lines method, though in this case it is an instance method. The equivalent version of Example 7-4 using BufferedReader is shown in Example 7-5.

Example 7-5. Using BufferedReader.lines method
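try (BufferedReader reader =
        new BufferedReader(
            new FileReader("/usr/share/dict/words"))) {
    Stream<String> lines = reader.lines();

    // ... same as previous example ...
}

Here the BufferedReader itself is the resource managed by the try-with-resources block. Although Stream implements AutoCloseable, the stream returned by BufferedReader.lines does not close the reader it came from, so closing only the stream would leave the underlying reader open.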
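When the data does not come from a file at all, the same instance method applies to any Reader. The following minimal sketch assumes an in-memory string as the source; the class and method names are hypothetical. The reader is declared as the try resource so that it is closed directly.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical class name, used only for illustration.
class ReaderLines {

    // Returns the lines of the text that have at least minLength characters.
    static List<String> longLines(String text, int minLength) {
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            return reader.lines()
                    .filter(line -> line.length() >= minLength)
                    .collect(Collectors.toList());
        } catch (IOException e) {
            // close() on the reader can throw IOException
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(longLines("alpha\nbe\ngamma\n", 5)); // prints [alpha, gamma]
    }
}
```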

See Also

Sorting maps is discussed in Recipe 4.4.

7.2 Retrieving Files as a Stream

Problem

You want to process all the files in a directory as a Stream.

Solution

Use the static Files.list method.

Discussion

The static list method in the java.nio.file.Files class takes a Path as an argument and returns a Stream that wraps a DirectoryStream.4 The DirectoryStream interface extends AutoCloseable, so using the list method is best done using a try-with-resources construct, as in Example 7-6.

Example 7-6. Using Files.list(path)
try (Stream<Path> list = Files.list(Paths.get("src/main/java"))) {
    list.forEach(System.out::println);
} catch (IOException e) {
    e.printStackTrace();
}

Assuming this is executed in the root of a project that has the standard Maven or Gradle structure, this will print the names of all files and folders in the src/main/java directory. Using the try-with-resources block means that when the try block completes, the system will invoke close on the stream, which will then invoke close on the underlying DirectoryStream. The listing is not recursive.

When run on the source code for this book, the result includes both directories and individual files:

src/main/java/collectors
src/main/java/concurrency
src/main/java/datetime
...
src/main/java/Summarizing.java
src/main/java/tasks
src/main/java/UseFilenameFilter.java
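Because the listing mixes directories and files, a filter is often the next step. The following is a sketch under stated assumptions: the class name, method names, and the temporary directory layout are all hypothetical, and the demo builds its own small tree so that it can run anywhere. It also illustrates that entries inside subdirectories are not returned.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical class name, used only for illustration.
class DirectChildren {

    // Names of the regular files directly inside dir; subdirectories are not entered.
    static List<String> regularFileNames(Path dir) {
        try (Stream<Path> entries = Files.list(dir)) {
            return entries.filter(Files::isRegularFile)
                    .map(p -> p.getFileName().toString())
                    .sorted()
                    .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Builds a throwaway tree: Top.java at the top level, Nested.java inside pkg.
    static List<String> demo() {
        try {
            Path root = Files.createTempDirectory("list-demo");
            Files.createFile(root.resolve("Top.java"));
            Path sub = Files.createDirectory(root.resolve("pkg"));
            Files.createFile(sub.resolve("Nested.java"));
            return regularFileNames(root);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [Top.java]
    }
}
```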

The signature for the list method shows that the return type is a Stream<Path> and its argument a directory:

public static Stream<Path> list(Path dir) throws IOException

Executing the method on a non-directory resource results in a NotDirectoryException.
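One way to handle that case is to catch the exception explicitly. The sketch below uses hypothetical class and method names and a temporary directory of its own. Note that the Javadocs describe NotDirectoryException as an optional specific exception, so the sketch also handles a plain IOException.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.NotDirectoryException;
import java.nio.file.Path;
import java.util.stream.Stream;

// Hypothetical class name, used only for illustration.
class ListRequiresDirectory {

    // Returns true if Files.list can open the path, false if it is not a directory.
    static boolean canList(Path path) {
        try (Stream<Path> entries = Files.list(path)) {
            return true;
        } catch (NotDirectoryException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Checks a temporary directory and a regular file inside it.
    static boolean[] demo() {
        try {
            Path dir = Files.createTempDirectory("list-check");
            Path file = Files.createFile(dir.resolve("plain.txt"));
            return new boolean[] { canList(dir), canList(file) };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        boolean[] result = demo();
        System.out.println("directory: " + result[0] + ", file: " + result[1]);
    }
}
```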

The Javadocs make a point of saying that the resulting stream is weakly consistent, meaning “it is thread safe but does not freeze the directory while iterating, so it may (or may not) reflect updates to the directory that occur after returning from this method.”

See Also

To navigate a filesystem using a depth-first search, see Recipe 7.3.

7.3 Walking the Filesystem

Problem

You need to perform a depth-first traversal of the filesystem.

Solution

Use the static Files.walk method.

Discussion

The signature of the static Files.walk method in the java.nio.file package is:

public static Stream<Path> walk(Path start,
                                FileVisitOption... options)
                         throws IOException

The arguments are the starting Path and a variable argument list of FileVisitOption values. The return type is a lazily populated Stream of Path instances obtained by walking the filesystem from the starting path, performing a depth-first traversal.

The returned Stream encapsulates a DirectoryStream, so again it is recommended that you invoke the method using a try-with-resources block, as in Example 7-7.

Example 7-7. Walking the tree
try (Stream<Path> paths = Files.walk(Paths.get("src/main/java"))) {
    paths.forEach(System.out::println);
} catch (IOException e) {
    e.printStackTrace();
}

The walk method takes zero or more FileVisitOption values as the second and subsequent arguments. This example didn’t use any. FileVisitOption is an enum, added in Java 7, whose only defined value is FileVisitOption.FOLLOW_LINKS. Following links means that, at least in principle, the tree can involve a cycle, so the stream keeps track of directories visited. If a cycle is detected, a FileSystemLoopException is thrown.

The results of this example on the book source code are similar to:

src/main/java
src/main/java/collectors
src/main/java/collectors/Actor.java
src/main/java/collectors/AddCollectionToMap.java
src/main/java/collectors/Book.java
src/main/java/collectors/CollectorsDemo.java
src/main/java/collectors/ImmutableCollections.java
src/main/java/collectors/Movie.java
src/main/java/collectors/MysteryMen.java
src/main/java/concurrency
src/main/java/concurrency/CommonPoolSize.java
src/main/java/concurrency/CompletableFutureDemos.java
src/main/java/concurrency/FutureDemo.java
src/main/java/concurrency/ParallelDemo.java
src/main/java/concurrency/SequentialToParallel.java
src/main/java/concurrency/Timer.java
src/main/java/datetime
...

The paths are traversed lazily. The resulting stream is guaranteed to have at least one element: the starting argument. As each path is encountered, the system determines whether it is a directory; if it is, that directory is traversed before moving on to the next sibling. The result is a depth-first traversal. Each directory is closed after all of its entries have been visited.

There is also an overload of this method available:

public static Stream<Path> walk(Path start,
                                int maxDepth,
                                FileVisitOption... options)
                         throws IOException

The maxDepth argument is the maximum number of levels of directories to visit. Zero means only use the starting level. The version of this method without a maxDepth parameter uses a value of Integer.MAX_VALUE, meaning all levels should be visited.
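The effect of maxDepth can be seen by counting the paths returned at different depths. The following is a minimal sketch; the class name, method names, and the temporary directory layout are hypothetical and exist only to make the example runnable.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.stream.Stream;

// Hypothetical class name, used only for illustration.
class WalkDepth {

    // Counts the paths visited from start, descending at most maxDepth levels.
    static long countPaths(Path start, int maxDepth) {
        try (Stream<Path> paths = Files.walk(start, maxDepth)) {
            return paths.count();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Builds root/top.txt, root/a/mid.txt, and root/a/b/deep.txt,
    // then counts paths at depth 0, depth 1, and unlimited depth.
    static long[] demo() {
        try {
            Path root = Files.createTempDirectory("walk-depth");
            Path a = Files.createDirectory(root.resolve("a"));
            Path b = Files.createDirectory(a.resolve("b"));
            Files.createFile(root.resolve("top.txt"));
            Files.createFile(a.resolve("mid.txt"));
            Files.createFile(b.resolve("deep.txt"));
            return new long[] {
                countPaths(root, 0),
                countPaths(root, 1),
                countPaths(root, Integer.MAX_VALUE)
            };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(demo())); // prints [1, 3, 6]
    }
}
```

A depth of zero yields only the starting directory, a depth of one adds its direct children, and the unlimited walk reaches every entry in the tree.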

See Also

Listing files in a single directory is shown in Recipe 7.2. Searching for files is covered in Recipe 7.4.

7.4 Searching the Filesystem

Problem

You want to find files in a file tree that satisfy given properties.

Solution

Use the static Files.find method in the java.nio.file package.

Discussion

The signature of the Files.find method is:

public static Stream<Path> find(Path start,
                                int maxDepth,
                                BiPredicate<Path, BasicFileAttributes> matcher,
                                FileVisitOption... options)
                         throws IOException

This is similar to the walk method, but with an added BiPredicate to determine whether or not a particular Path should be returned. The find method starts at a given path and performs a depth-first search, up to maxDepth levels deep, evaluating each path against the BiPredicate. It follows symbolic links only if FileVisitOption.FOLLOW_LINKS is supplied.

The BiPredicate matcher needs to return a boolean based on each path element, along with its associated BasicFileAttributes object. For instance, Example 7-8 returns the paths for nondirectory files in the fileio package in the book’s source code.

Example 7-8. Finding the nondirectory files in the fileio package
try (Stream<Path> paths =
    Files.find(Paths.get("src/main/java"), Integer.MAX_VALUE,
        (path, attributes) ->
            !attributes.isDirectory() && path.toString().contains("fileio"))) {
    paths.forEach(System.out::println);
} catch (IOException e) {
    e.printStackTrace();
}

The result is:

src/main/java/fileio/FileList.java
src/main/java/fileio/ProcessDictionary.java
src/main/java/fileio/SearchForFiles.java
src/main/java/fileio/WalkTheTree.java

For each file encountered while walking the tree, the method evaluates it against the given BiPredicate. This is just like calling the walk method with a filter, but the Javadocs claim this approach may be more efficient by avoiding redundant retrieval of the BasicFileAttributes objects.
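To make the comparison concrete, the following sketch performs the same search both ways. The class name, method names, and temporary directory layout are hypothetical. The walk version passes LinkOption.NOFOLLOW_LINKS to Files.isRegularFile on the assumption that this best mirrors how find reads attributes when FOLLOW_LINKS is not supplied.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical class name, used only for illustration.
class FindVersusWalk {

    // Uses the attributes that find already supplies to the matcher.
    static List<String> viaFind(Path start, String suffix) {
        try (Stream<Path> paths = Files.find(start, Integer.MAX_VALUE,
                (path, attrs) -> attrs.isRegularFile()
                        && path.toString().endsWith(suffix))) {
            return names(paths);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Walks the tree and checks each path separately inside the filter.
    static List<String> viaWalk(Path start, String suffix) {
        try (Stream<Path> paths = Files.walk(start)) {
            return names(paths.filter(path ->
                    Files.isRegularFile(path, LinkOption.NOFOLLOW_LINKS)
                            && path.toString().endsWith(suffix)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static List<String> names(Stream<Path> paths) {
        return paths.map(p -> p.getFileName().toString())
                .sorted()
                .collect(Collectors.toList());
    }

    // Builds root/A.java, root/notes.txt, and root/pkg/B.java, then searches for .java files.
    static List<String> demo(boolean useFind) {
        try {
            Path root = Files.createTempDirectory("find-demo");
            Path pkg = Files.createDirectory(root.resolve("pkg"));
            Files.createFile(root.resolve("A.java"));
            Files.createFile(root.resolve("notes.txt"));
            Files.createFile(pkg.resolve("B.java"));
            return useFind ? viaFind(root, ".java") : viaWalk(root, ".java");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo(true));  // prints [A.java, B.java]
        System.out.println(demo(false)); // prints [A.java, B.java]
    }
}
```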

As usual, the resulting Stream encapsulates a DirectoryStream, so closing the stream closes the underlying source. Using the method in a try-with-resources block, as shown, is therefore the preferred approach.

See Also

Walking the filesystem is discussed in Recipe 7.3.

1 Most Java developers are astonished to learn that NIO was added that early.

2 Even more confusing, the interface DirectoryStream.Filter is actually a functional interface, though again it has nothing to do with functional streams. It’s used to accept only selected entries in a directory.

3 Fortunately, the word blepharosphincterectomy doesn’t mean what it sounds like. It has to do with relieving pressure of the eyelid on the cornea, which is bad enough, but it could have been worse.

4 That’s an I/O stream, not a functional one.