Java has had input/output (I/O) support since the very first version. However, due to Java’s strong desire for platform independence, the earlier versions of I/O functionality emphasized portability over functionality. As a result, they were not always easy to work with.
We’ll see later in the chapter how the original APIs have been supplemented—they are now rich, fully featured, and very easy to develop with. Let’s kick off the chapter by looking at the original, “classic” approach to Java I/O, which the more modern approaches layer on top of.
The File class is the cornerstone of Java’s original way to do
file I/O. This abstraction can represent both files and directories, but
in doing so is sometimes a bit cumbersome to deal with, and leads to
code like this:
// Get a file object to represent the user's home directory
File homedir = new File(System.getProperty("user.home"));

// Create an object to represent a config file (should
// already be present in the home directory)
File f = new File(homedir, "app.conf");

// Check the file exists, really is a file, and is readable
if (f.exists() && f.isFile() && f.canRead()) {
    // Create a file object for a new configuration directory
    File configdir = new File(f, ".configdir");
    // And create it
    configdir.mkdir();

    // Finally, move the config file to its new home
    f.renameTo(new File(configdir, ".config"));
}
This shows some of the flexibility possible with the File class, but
also demonstrates some of the problems with the abstraction. It is very
general, and so requires a lot of methods to interrogate a File
object in order to determine what it actually represents and its
capabilities.
The File class has a very large number of methods on it, but some
basic functionality (notably a way to read the actual contents of a file) is
not, and never has been, provided directly.
Here’s a quick summary of File methods:
// Permissions management
boolean canX = f.canExecute();
boolean canR = f.canRead();
boolean canW = f.canWrite();

boolean ok;
ok = f.setReadOnly();
ok = f.setExecutable(true);
ok = f.setReadable(true);
ok = f.setWritable(false);

// Different views of the file's name
File absF = f.getAbsoluteFile();
File canF = f.getCanonicalFile();
String absName = f.getAbsolutePath();
String canName = f.getCanonicalPath();
String name = f.getName();
String pName = f.getParent();
URI fileURI = f.toURI(); // Create URI for File path

// File metadata
boolean exists = f.exists();
boolean isAbs = f.isAbsolute();
boolean isDir = f.isDirectory();
boolean isFile = f.isFile();
boolean isHidden = f.isHidden();
long modTime = f.lastModified(); // milliseconds since epoch
boolean updateOK = f.setLastModified(updateTime); // milliseconds
long fileLen = f.length();

// File management operations
boolean renamed = f.renameTo(destFile);
boolean deleted = f.delete();

// Create won't overwrite existing file
boolean createdOK = f.createNewFile();

// Temporary file handling
File tmp = File.createTempFile("my-tmp", ".tmp");
tmp.deleteOnExit();

// Directory handling
boolean createdDir = dir.mkdir();
String[] fileNames = dir.list();
File[] files = dir.listFiles();
The File class also has a few methods on it that aren’t a perfect fit
for the abstraction. They largely involve interrogating the filesystem
that the file resides on (e.g., inquiring about available free space):
long free, total, usable;
free = f.getFreeSpace();
total = f.getTotalSpace();
usable = f.getUsableSpace();

File[] roots = File.listRoots(); // all available filesystem roots
The I/O stream abstraction (not to be confused with the streams that are used when dealing with the Java 8 Collection APIs) was present in Java 1.0, as a way of dealing with sequential streams of bytes from disks or other sources.
The core of this API is a pair of abstract classes, InputStream and
OutputStream. These are very widely used, and in fact the “standard”
input and output streams, which are called System.in and
System.out, are streams of this type. They are public, static fields
of the System class, and are often used in even the simplest programs:
System.out.println("Hello World!");
Specific subclasses of streams, including FileInputStream and
FileOutputStream, can be used to operate on individual bytes in a
file—for example, by counting all the times ASCII 97 (small letter a)
occurs in a file:
try (InputStream is = new FileInputStream("/Users/ben/cluster.txt")) {
    byte[] buf = new byte[4096];
    int len, count = 0;
    while ((len = is.read(buf)) > 0) {
        for (int i = 0; i < len; i++)
            if (buf[i] == 97) count++;
    }
    System.out.println("'a's seen: " + count);
} catch (IOException e) {
    e.printStackTrace();
}
This approach to dealing with on-disk data can lack some
flexibility—most developers think in terms of characters, not bytes. To
allow for this, the streams are usually combined with the higher-level
Reader and Writer classes, which provide a character-stream level of
interaction, rather than the low-level byte stream provided by
InputStream and OutputStream and their subclasses.
By moving to an abstraction that deals in characters, rather than bytes, developers are presented with an API that is much more familiar, and that hides many of the issues with character encoding, Unicode, and so on.
The Reader and Writer classes are intended to overlay the byte
stream classes, and to remove the need for low-level handling of I/O
streams. They have several subclasses that are often used to layer on
top of each other, such as:
FileReader
BufferedReader
InputStreamReader
FileWriter
PrintWriter
BufferedWriter
To read all lines in from a file and print them out, we use a
BufferedReader layered on top of a FileReader, like this:
try (BufferedReader in = new BufferedReader(new FileReader(filename))) {
    String line;
    while ((line = in.readLine()) != null) {
        System.out.println(line);
    }
} catch (IOException e) {
    // Handle FileNotFoundException, etc. here
}
If we need to read in lines from the console, rather than a file, we
will usually use an InputStreamReader applied to System.in. Let’s look at an example where we want to read in lines of input from the
console, but treat input lines that start with a special character as
special—commands (“metas”) to be processed, rather than regular text.
This is a common feature of many chat programs, including IRC. We’ll
use regular expressions from Chapter 9 to help
us:
Pattern SHELL_META_START = Pattern.compile("^#(\\w+)\\s*(\\w+)?");

try (BufferedReader console =
        new BufferedReader(new InputStreamReader(System.in))) {
    String line;
    READ: while ((line = console.readLine()) != null) {
        // Check for special commands ("metas")
        Matcher m = SHELL_META_START.matcher(line);
        if (m.find()) {
            String metaName = m.group(1);
            String arg = m.group(2);
            doMeta(metaName, arg);
            continue READ;
        }
        System.out.println(line);
    }
} catch (IOException e) {
    // Handle FileNotFoundException, etc. here
}
To output text to a file, we can use code like this:
File f = new File(System.getProperty("user.home")
        + File.separator + ".bashrc");
try (PrintWriter out = new PrintWriter(
        new BufferedWriter(new FileWriter(f)))) {
    out.println("## Automatically generated config file. DO NOT EDIT");
    // ...
} catch (IOException iox) {
    // Handle exceptions
}
This older style of Java I/O has a lot of other functionality that is
occasionally useful. For example, the FilterInputStream class is quite
often useful as a base for decorating existing streams with extra
behavior. Or for threads that want to communicate in a way similar to
the classic “piped” I/O approach, PipedInputStream, PipedReader, and
their write counterparts are provided.
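The piped classes connect a writer thread to a reader thread through an in-memory conduit. Here is a minimal sketch of the idea (the PipeExample class and its single message are invented for illustration):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.PipedReader;
import java.io.PipedWriter;

public class PipeExample {
    public static void main(String[] args) throws Exception {
        PipedWriter writer = new PipedWriter();
        PipedReader reader = new PipedReader(writer); // connect the two ends

        // Producer thread writes a line into the pipe, then closes it
        Thread producer = new Thread(() -> {
            try (writer) {
                writer.write("hello from the producer\n");
            } catch (IOException e) {
                e.printStackTrace();
            }
        });
        producer.start();

        // Consumer reads from the other end of the pipe
        try (BufferedReader in = new BufferedReader(reader)) {
            System.out.println(in.readLine());
        }
        producer.join();
    }
}
```

The pipe blocks the reader until the writer has supplied data, which makes it a simple way for two threads to hand off a stream of characters without an explicit shared queue.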
Throughout this chapter so far, we have used the language feature known as “try-with-resources” (TWR). This syntax was briefly introduced in “The try-with-resources Statement”, but it is in conjunction with operations like I/O that it comes into its fullest potential, and it has granted a new lease on life to the older I/O style.
To make the most of Java’s I/O capabilities, it is important to understand how and when to use TWR. It is very easy to understand when code should use TWR—whenever it is possible to do so.
Before TWR, resources had to be closed manually, and complex interactions between resources that could fail to close led to buggy code that could leak resources.
In fact, Oracle’s engineers estimate that 60% of the resource handling code in the initial JDK 6 release was incorrect. So, if even the platform authors can’t reliably get manual resource handling right, then all new code should definitely be using TWR.
The key to TWR is a new interface—AutoCloseable. This interface is a direct superinterface of Closeable.
It marks a resource that must be automatically closed, and for which the compiler will insert special exception-handling code.
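Any class can participate in TWR simply by implementing AutoCloseable. The following sketch (the Resource class is hypothetical) shows the compiler-inserted code closing resources in the reverse of their declaration order:

```java
public class AutoCloseExample {
    // A hypothetical resource that opts in to try-with-resources
    // by implementing AutoCloseable
    static class Resource implements AutoCloseable {
        private final String name;

        Resource(String name) {
            this.name = name;
            System.out.println("Opened " + name);
        }

        void use() {
            System.out.println("Using " + name);
        }

        @Override
        public void close() {
            // Called automatically when the try block is exited
            System.out.println("Closed " + name);
        }
    }

    public static void main(String[] args) {
        try (Resource a = new Resource("A"); Resource b = new Resource("B")) {
            a.use();
            b.use();
        }
        // Output shows that B is closed before A
    }
}
```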
Inside a TWR resource clause, only declarations of objects that
implement AutoCloseable may appear—but the developer may
declare as many as required:
try (BufferedReader in = new BufferedReader(new FileReader("profile"));
     PrintWriter out = new PrintWriter(
             new BufferedWriter(new FileWriter("profile.bak")))) {
    String line;
    while ((line = in.readLine()) != null) {
        out.println(line);
    }
} catch (IOException e) {
    // Handle FileNotFoundException, etc. here
}
The consequences of this are that resources are automatically scoped to
the try block. The resources (whether readable or writable) are
automatically closed in the correct order, and the compiler inserts
exception handling that takes dependencies between resources into
account.
TWR is related to similar concepts in other languages and environments—for example, RAII in C++.
However, as discussed in the finalization section, TWR is limited to block scope.
This minor limitation is due to the fact that the feature is implemented by the Java source code compiler—it automatically inserts bytecode that calls the resource’s close() method when the scope is exited (by whatever means).
As a result, the overall effect of TWR is more similar to C#’s using keyword, rather than the C++ version of RAII.
For Java developers, the best way to regard TWR is as “finalization done right.”
As noted in “Finalization”, new code should never directly use the finalization mechanism, and should always use TWR instead.
Older code should be refactored to use TWR as soon as is practicable, as it provides real tangible benefits to resource handling code.
Even with the welcome addition of try-with-resources, the File
class and friends have a number of problems that make them less than
ideal for extensive use when performing even standard I/O operations.
For instance:
“Missing methods” for common operations
Does not deal with filenames consistently across platforms
Fails to have a unified model for file attributes (e.g., modeling read/write access)
Difficult to traverse unknown directory structures
No platform- or OS-specific features
Nonblocking operations for filesystems not supported
To deal with these shortcomings, Java’s I/O has evolved over several major releases. It was really with the release of Java 7 that this support became truly easy and effective to use.
Java 7 brought in a brand new I/O API—usually called NIO.2—and it
should be considered almost a complete replacement for the original
File approach to I/O. The new classes are contained in the
java.nio.file package.
The new API that was brought in with Java 7 is considerably easier to
use for many use cases. It has two major parts. The first is a new
abstraction called Path (which can be thought of as representing a
file location, which may or may not have anything actually at that
location). The second piece is lots of new convenience and utility
methods to deal with files and filesystems. These are contained as
static methods in the Files class.
For example, when you are using the new Files functionality, a basic copy
operation is now as simple as:
File inputFile = new File("input.txt");
try (InputStream in = new FileInputStream(inputFile)) {
    Files.copy(in, Paths.get("output.txt"));
} catch (IOException ex) {
    ex.printStackTrace();
}
Let’s take a quick survey of some of the major methods in Files—the
operation of most of them is pretty self-explanatory. In many cases, the
methods have return values. We have omitted handling these, as they are
rarely useful except in contrived examples, and for duplicating the
behavior of the equivalent C code:
Path source, target;
FileAttribute<?> attr;
Charset cs = StandardCharsets.UTF_8;

// Creating files
//
// Example of path --> /home/ben/.profile
// Example of attributes --> rw-rw-rw-
Files.createFile(target, attr);

// Deleting files
Files.delete(target);
boolean deleted = Files.deleteIfExists(target);

// Copying/moving files
Files.copy(source, target);
Files.move(source, target);

// Utility methods to retrieve information
long size = Files.size(target);
FileTime fTime = Files.getLastModifiedTime(target);
System.out.println(fTime.to(TimeUnit.SECONDS));

Map<String, ?> attrs = Files.readAttributes(target, "*");
System.out.println(attrs);

// Methods to deal with file types
boolean isDir = Files.isDirectory(target);
boolean isSym = Files.isSymbolicLink(target);

// Methods to deal with reading and writing
List<String> lines = Files.readAllLines(target, cs);
byte[] b = Files.readAllBytes(target);

BufferedReader br = Files.newBufferedReader(target, cs);
BufferedWriter bwr = Files.newBufferedWriter(target, cs);

InputStream is = Files.newInputStream(target);
OutputStream os = Files.newOutputStream(target);
Some of the methods on Files provide the opportunity to pass optional
arguments, to provide additional (possibly implementation-specific)
behavior for the operation.
Some of the API choices here produce occasionally annoying behavior. For example, by default, a copy operation will not overwrite an existing file, so we need to specify this behavior as a copy option:
Files.copy(Paths.get("input.txt"), Paths.get("output.txt"),
           StandardCopyOption.REPLACE_EXISTING);
StandardCopyOption is an enum that implements an interface called
CopyOption. This is also implemented by LinkOption. So
Files.copy() can take any number of either LinkOption or
StandardCopyOption arguments. LinkOption is used to specify how
symbolic links should be handled (provided the underlying OS supports
symlinks, of course).
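For instance, LinkOption.NOFOLLOW_LINKS lets us interrogate a link itself rather than its target. A small sketch (the file names are invented, and the createSymbolicLink() call will fail on systems or accounts that don't permit symlinks):

```java
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;
import java.nio.file.Paths;

public class LinkExample {
    public static void main(String[] args) throws Exception {
        Path link = Paths.get("link-demo");
        Path target = Files.createTempFile("link-target", ".txt");
        try {
            // May throw on platforms where symlinks are unsupported
            Files.createSymbolicLink(link, target);

            // Default behavior follows the link to the regular file...
            System.out.println("follows: " + Files.isRegularFile(link));
            // ...while NOFOLLOW_LINKS inspects the link entry itself
            System.out.println("nofollow: "
                + Files.isRegularFile(link, LinkOption.NOFOLLOW_LINKS));
        } finally {
            Files.deleteIfExists(link);
            Files.deleteIfExists(target);
        }
    }
}
```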
Path is a type that may be used to locate a file in a filesystem.
It represents a path that is:
System dependent
Hierarchical
Composed of a sequence of path elements
Hypothetical (may not exist yet, or may have been deleted)
It is therefore fundamentally different from a File. In particular, the
system dependency is manifested by Path being an interface, not a
class, which enables different filesystem providers to each implement the
Path interface, and provide for system-specific features while
retaining the overall abstraction.
The elements of a Path consist of an optional root component, which
identifies the filesystem hierarchy that this instance belongs to. Note
that, for example, relative Path instances may not have a root
component. In addition to the root, all Path instances have zero or
more directory names and a name element.
The name element is the element farthest from the root of the directory
hierarchy and represents the name of the file or directory. The Path
can be thought of as consisting of the path elements joined together by a
special separator or delimiter.
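These components can be interrogated directly on a Path instance. A quick sketch (the path is invented, and the output shown in the comments assumes a Unix-style filesystem):

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class PathParts {
    public static void main(String[] args) {
        Path p = Paths.get("/home/ben/projects/app.conf");

        System.out.println(p.getRoot());      // the root component: /
        System.out.println(p.getFileName());  // the name element: app.conf
        System.out.println(p.getParent());    // /home/ben/projects
        System.out.println(p.getNameCount()); // 4 elements below the root

        // Paths can be combined and compared without touching the filesystem
        Path dir = Paths.get("/home/ben");
        System.out.println(dir.resolve("projects/app.conf"));
        System.out.println(dir.relativize(p)); // projects/app.conf
    }
}
```

None of these calls touch the disk; they operate purely on the hypothetical path, which is exactly the point of the abstraction.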
Path is an abstract concept; it isn’t necessarily bound to any physical file path.
This allows us to talk easily about the locations of files that don’t exist yet.
Java ships with a Paths class that provides factory methods for creating Path instances.
Paths provides two get() methods for creating Path objects.
The usual version takes a String, and uses the default filesystem provider. The URI version takes advantage of the ability of NIO.2 to plug in additional providers of bespoke filesystems.
This is an advanced usage, and interested developers should consult the primary documentation.
Let’s look at some simple examples of how to use Path:
Path p = Paths.get("/Users/ben/cluster.txt");
Path p2 = Paths.get(new URI("file:///Users/ben/cluster.txt"));
System.out.println(p2.equals(p));

File f = p.toFile();
System.out.println(f.isDirectory());
Path p3 = f.toPath();
System.out.println(p3.equals(p));
This example also shows the easy interoperation between Path and File objects.
The addition of a toFile() method to Path and a toPath() method to File allows the developer to move effortlessly between the two APIs and allows for a straightforward approach to refactoring the internals of code based on File to use Path instead.
We can also make use of some useful “bridge” methods that the Files
class also provides. These provide convenient access to the older I/O
APIs—for example, by providing convenience methods to open Writer
objects to specified Path locations:
Path logFile = Paths.get("/tmp/app.log");
try (BufferedWriter writer =
        Files.newBufferedWriter(logFile, StandardCharsets.UTF_8,
                                StandardOpenOption.WRITE)) {
    writer.write("Hello World!");
    // ...
} catch (IOException e) {
    // ...
}
We’re making use of the StandardOpenOption enum, which provides
similar capabilities to the copy options, but for the case of opening a
new file instead.
In this example use case, we have used the Path API to:
Create a Path corresponding to a new file
Use the Files class to create that new file
Open a Writer to that file
Write to that file
Automatically close it when done
In our next example, we’ll build on this to manipulate a JAR file as
a FileSystem in its own right, modifying it to add an additional file
directly into the JAR.
Recall that JAR files are actually just ZIP files, so this technique will also work for .zip archives:
Path tempJar = Paths.get("sample.jar");
try (FileSystem workingFS = FileSystems.newFileSystem(tempJar, null)) {
    Path pathForFile = workingFS.getPath("/hello.txt");
    List<String> ls = new ArrayList<>();
    ls.add("Hello World!");

    Files.write(pathForFile, ls, Charset.defaultCharset(),
                StandardOpenOption.WRITE, StandardOpenOption.CREATE);
}
This shows how we use a FileSystem to make the Path objects inside
it, via the getPath() method. This enables the developer to
effectively treat FileSystem objects as black boxes.
Files also provides methods for handling temporary files and directories, which is a surprisingly common use case (and can be a source of security bugs).
For example, let’s see how to load a resources file from within the classpath, copy it to a newly created temporary directory, and then clean up the temporary files safely (using the Reaper class we introduced in Chapter 5):
Path tmpdir = Files.createTempDirectory(Paths.get("/tmp"), "tmp-test");
try (InputStream in =
        FilesExample.class.getResourceAsStream("/res.txt")) {
    Path copied = tmpdir.resolve("copied-resource.txt");
    Files.copy(in, copied, StandardCopyOption.REPLACE_EXISTING);
    // ... work with the copy
}

// Clean up when done...
Files.walkFileTree(tmpdir, new Reaper());
One of the criticisms of Java’s original I/O APIs was the lack of support for native and high-performance I/O. A solution was initially added in Java 1.4, the Java New I/O (NIO) API, and it has been refined in later Java versions.
NIO buffers are a low-level abstraction for high-performance I/O.
They provide a container for a linear sequence of elements of a
specific primitive type. We’ll work with the ByteBuffer (the most
common case) in our examples.
This is a sequence of bytes, and can conceptually be thought of as a
performance-critical alternative to working with a byte[]. To get
the best possible performance, ByteBuffer provides support for dealing
directly with the native capabilities of the platform the JVM is running
on.
This approach is called the direct buffers case, and it bypasses the Java heap wherever possible. Direct buffers are allocated in native memory, not on the standard Java heap, and they are not subject to garbage collection in the same way as regular on-heap Java objects.
To obtain a direct ByteBuffer, call the allocateDirect() factory
method. An on-heap version, allocate(), is also provided, but in
practice this is not often used.
A third way to obtain a byte buffer is to wrap an existing byte[]—this
will give an on-heap buffer that serves to provide a more
object-oriented view of the underlying bytes:
ByteBuffer b = ByteBuffer.allocateDirect(65536);
ByteBuffer b2 = ByteBuffer.allocate(4096);

byte[] data = {1, 2, 3};
ByteBuffer b3 = ByteBuffer.wrap(data);
Byte buffers are all about low-level access to the bytes. This means that developers have to deal with the details manually—including the need to handle the endianness of the bytes and the signed nature of Java’s integral primitives:
b.order(ByteOrder.BIG_ENDIAN);

int capacity = b.capacity();
int position = b.position();
int limit = b.limit();
int remaining = b.remaining();
boolean more = b.hasRemaining();
To get data in or out of a buffer, we have two types of
operation—single value, which reads or writes a single value, and bulk,
which takes a byte[] or ByteBuffer and operates on a (potentially
large) number of values as a single operation. It is from the bulk
operations that we’d expect to realize performance gains:
b.put((byte) 42);
b.putChar('x');
b.putInt(0xcafebabe);

b.put(data);
b.put(b2);

double d = b.getDouble();
b.get(data, 0, data.length);
The single value form also supports a form used for absolute positioning within the buffer:
b.put(0, (byte) 9);
Buffers are an in-memory abstraction. To affect the outside world (e.g.,
the file or network), we need to use a Channel, from the package
java.nio.channels. Channels represent connections to entities that
can support read or write operations. Files and sockets are the usual
examples of channels, but we could consider custom implementations used
for low-latency data processing.
Channels are open when they’re created, and can subsequently be closed. Once closed, they cannot be reopened. Channels are usually either readable or writable, but not both. The key to understanding channels is that:
Reading from a channel puts bytes into a buffer
Writing to a channel takes bytes from a buffer
For example, suppose we have a large file that we want to checksum in 16M chunks:
FileInputStream fis = getSomeStream();
boolean fileOK = true;

try (FileChannel fchan = fis.getChannel()) {
    ByteBuffer buffy = ByteBuffer.allocateDirect(16 * 1024 * 1024);
    while (fileOK && fchan.read(buffy) != -1) {
        buffy.flip(); // Prepare the buffer to be drained
        fileOK = computeChecksum(buffy);
        buffy.compact(); // Preserve any bytes not yet consumed
    }
} catch (IOException e) {
    System.out.println("Exception in I/O");
}
This will use native I/O as far as possible, and will avoid a lot of
copying of bytes on and off the Java heap. If the computeChecksum()
method has been well implemented, then this could be a very performant
implementation.
Mapped byte buffers are a type of direct byte buffer that contains a
memory-mapped file (or a region of one). They are created from a FileChannel
object, but note that the File object corresponding to the
MappedByteBuffer must not be used after the memory-mapped operations,
or an exception will be thrown. To mitigate this, we again use
try-with-resources, to scope the objects tightly:
try (RandomAccessFile raf =
         new RandomAccessFile(new File("input.txt"), "rw");
     FileChannel fc = raf.getChannel()) {
    MappedByteBuffer mbf =
        fc.map(FileChannel.MapMode.READ_WRITE, 0, fc.size());

    byte[] b = new byte[(int) fc.size()];
    mbf.get(b, 0, b.length);

    for (int i = 0; i < fc.size(); i++) {
        b[i] = 0; // Won't be written back to the file, we're a copy
    }
    mbf.position(0);
    mbf.put(b); // Zeros the file
}
Even with buffers, there are limits on what can be done in Java for large I/O operations (e.g., transferring 10 GB between filesystems) performed synchronously on a single thread. Before Java 7, these types of operations would typically be done by writing custom multithreaded code, and managing a separate thread for performing a background copy. Let’s move on to look at the new asynchronous I/O features that were added with JDK 7.
The key to the new asynchronous functionality is a set of new subclasses
of Channel that can deal with I/O operations that need to be handed
off to a background thread. The same functionality can be applied to
large, long-running operations, and to several other use cases.
In this section, we’ll deal exclusively with AsynchronousFileChannel
for file I/O, but there are a couple of other asynchronous channels to
be aware of. We’ll deal with asynchronous sockets at the end of the
chapter. We’ll look at:
AsynchronousFileChannel for file I/O
AsynchronousSocketChannel for client socket I/O
AsynchronousServerSocketChannel for asynchronous sockets that
accept incoming connections
There are two different ways to interact with an asynchronous
channel—Future style and callback style.
We’ll meet the Future interface in detail in
Chapter 11, but for the purpose of this
chapter, it can be thought of as an ongoing task that may or may not
have completed yet. It has two key methods:
isDone()
Returns a boolean indicating whether the task has completed.
get()
Returns the result. If finished, returns immediately. If not finished, blocks until done.
Let’s look at an example of a program that reads a large file (possibly as large as 100 MB) asynchronously:
try (AsynchronousFileChannel channel =
         AsynchronousFileChannel.open(Paths.get("input.txt"))) {
    ByteBuffer buffer = ByteBuffer.allocateDirect(1024 * 1024 * 100);
    Future<Integer> result = channel.read(buffer, 0);

    while (!result.isDone()) {
        // Do some other useful work....
    }
    System.out.println("Bytes read: " + result.get());
}
The callback style for asynchronous I/O is based on a
CompletionHandler, which defines two methods, completed() and
failed(), that will be called back when the operation either succeeds
or fails.
This style is useful if you want immediate notification of events in asynchronous I/O—for example, if there are a large number of I/O operations in flight, but failure of any single operation is not necessarily fatal:
byte[] data = {2, 3, 5, 7, 11, 13, 17, 19, 23};
ByteBuffer buffy = ByteBuffer.wrap(data);

CompletionHandler<Integer, Object> h =
    new CompletionHandler<Integer, Object>() {
        public void completed(Integer written, Object o) {
            System.out.println("Bytes written: " + written);
        }

        public void failed(Throwable x, Object o) {
            System.out.println("Asynch write failed: " + x.getMessage());
        }
    };

try (AsynchronousFileChannel channel =
         AsynchronousFileChannel.open(Paths.get("primes.txt"),
             StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
    channel.write(buffy, 0, null, h);
    Thread.sleep(1000); // Needed so we don't exit too quickly
}
The AsynchronousFileChannel object is associated with a background
thread pool, so that the I/O operation proceeds, while the original
thread can get on with other tasks.
By default, this uses a managed thread pool that is provided by the
runtime. If required, it can be created to use a thread pool that is
managed by the application (via an overloaded form of
AsynchronousFileChannel.open()), but this is not often necessary.
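For illustration, here is a sketch of the overloaded open() call that accepts an application-managed pool (the temporary file, its contents, and the pool size are arbitrary choices for the example):

```java
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Set;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class AsyncPoolExample {
    public static void main(String[] args) throws Exception {
        Path tmp = Files.createTempFile("async", ".txt");
        Files.write(tmp, "hello async".getBytes());

        // An application-managed pool, passed to the overloaded open()
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try (AsynchronousFileChannel channel = AsynchronousFileChannel.open(
                tmp, Set.of(StandardOpenOption.READ), pool)) {
            ByteBuffer buffer = ByteBuffer.allocate(64);
            // The read proceeds on one of our pool's threads
            Future<Integer> result = channel.read(buffer, 0);
            System.out.println("Bytes read: " + result.get());
        } finally {
            pool.shutdown();
            Files.deleteIfExists(tmp);
        }
    }
}
```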
Finally, for completeness, let’s touch upon NIO’s support for
multiplexed I/O. This enables a single thread to manage multiple
channels and to examine those channels to see which are ready for
reading or writing. The classes to support this are in the
java.nio.channels package and include SelectableChannel and
Selector.
These nonblocking multiplexed techniques can be extremely useful when you’re writing advanced applications that require high scalability, but a full discussion is outside the scope of this book. In general, the nonblocking API should only be used for advanced use cases when high performance or other NFRs are genuinely required.
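For the curious, here is a minimal sketch of the multiplexing pattern. It uses a Pipe (whose source and sink channels are selectable) purely so the example is self-contained; real applications would register socket channels instead:

```java
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;

public class SelectorSketch {
    public static void main(String[] args) throws Exception {
        Pipe pipe = Pipe.open();
        // Channels registered with a Selector must be nonblocking
        pipe.source().configureBlocking(false);

        try (Selector selector = Selector.open()) {
            pipe.source().register(selector, SelectionKey.OP_READ);

            // Write into the pipe so the source channel becomes readable
            pipe.sink().write(ByteBuffer.wrap("ping".getBytes()));

            // One thread can wait on many registered channels at once
            int ready = selector.select();
            System.out.println("Ready channels: " + ready);

            for (SelectionKey key : selector.selectedKeys()) {
                if (key.isReadable()) {
                    ByteBuffer buf = ByteBuffer.allocate(16);
                    int n = ((Pipe.SourceChannel) key.channel()).read(buf);
                    System.out.println("Read " + n + " bytes");
                }
            }
        }
    }
}
```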
The last class of asynchronous services we will consider are those that watch a directory or visit a directory (or a tree). The watch services operate by observing everything that happens within a directory—for example, the creation or modification of files:
try {
    WatchService watcher = FileSystems.getDefault().newWatchService();

    Path dir = FileSystems.getDefault().getPath("/home/ben");
    WatchKey key = dir.register(watcher,
        StandardWatchEventKinds.ENTRY_CREATE,
        StandardWatchEventKinds.ENTRY_MODIFY,
        StandardWatchEventKinds.ENTRY_DELETE);

    while (!shutdown) {
        key = watcher.take();
        for (WatchEvent<?> event : key.pollEvents()) {
            Object o = event.context();
            if (o instanceof Path) {
                System.out.println("Path altered: " + o);
            }
        }
        key.reset();
    }
} catch (IOException | InterruptedException e) {
    // Handle exceptions from the watch service
}
By contrast, the directory streams provide a view into all files currently in a single directory. For example, to list all the Java source files and their size in bytes, we can use code like:
try (DirectoryStream<Path> stream =
         Files.newDirectoryStream(Paths.get("/opt/projects"), "*.java")) {
    for (Path p : stream) {
        System.out.println(p + ": " + Files.size(p));
    }
}
One drawback of this API is that this will only return elements that
match according to glob syntax, which is sometimes insufficiently
flexible. We can go further by using the new Files.find() and
Files.walk() methods to address each element obtained by a recursive
walk through the directory:
final Pattern isJava = Pattern.compile(".*\\.java$");
final Path homeDir = Paths.get("/Users/ben/projects/");

Files.find(homeDir, 255,
        (p, attrs) -> isJava.matcher(p.toString()).find())
    .forEach(q -> {
        System.out.println(q.normalize());
    });
It is possible to go even further, and construct advanced solutions
based on the FileVisitor interface in java.nio.file, but that
requires the developer to implement all four methods on the interface,
rather than just using a single lambda expression as done here.
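In practice, the SimpleFileVisitor base class softens this requirement: it supplies no-op defaults for all four methods, so we can override just the one we care about. A small sketch (the temporary directory and file names are invented for the example):

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;

public class VisitorExample {
    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("visit-demo");
        Files.createFile(root.resolve("A.java"));
        Files.createFile(root.resolve("B.txt"));

        // Override only visitFile; the other three methods keep
        // their default CONTINUE behavior
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file,
                                             BasicFileAttributes attrs) {
                if (file.toString().endsWith(".java")) {
                    System.out.println("Found: " + file.getFileName());
                }
                return FileVisitResult.CONTINUE;
            }
        });

        // Clean up the demo files
        Files.delete(root.resolve("A.java"));
        Files.delete(root.resolve("B.txt"));
        Files.delete(root);
    }
}
```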
In the last section of this chapter, we will discuss Java’s networking support and the core JDK classes that enable it.
The Java platform provides access to a large number of standard networking protocols, and these make writing simple networked applications quite easy.
The core of Java’s network support lives in the package java.net, with additional extensibility provided by javax.net (and in particular, javax.net.ssl).
One of the easiest protocols to use for building applications is HyperText Transfer Protocol (HTTP), the protocol that is used as the basic communication protocol of the Web.
HTTP is the highest-level network protocol that Java supports out of the box. It is a very simple, text-based protocol, implemented on top of the standard TCP/IP stack. It can run on any network port, but is usually found on port 80.
Java has two separate APIs for handling HTTP—one of which dates back to the earliest days of the platform, and the other of which is a more modern API that arrived in incubator form in Java 9.
Let’s take a quick look at the older API, for the sake of completeness.
In this API, URL is the key class—it supports URLs of the form http://,
ftp://, file://, and https:// out of the box. It is very easy to
use, and the simplest example of Java HTTP support is to download a
particular URL. With Java 8, this is just:
URL url = new URL("http://www.google.com/");
try (InputStream in = url.openStream()) {
    Files.copy(in, Paths.get("output.txt"));
} catch (IOException ex) {
    ex.printStackTrace();
}
For more low-level control, including metadata about the request and
response, we can use URLConnection to achieve something like:
try {
    URLConnection conn = url.openConnection();

    String type = conn.getContentType();
    String encoding = conn.getContentEncoding();
    Date lastModified = new Date(conn.getLastModified());
    int len = conn.getContentLength();
    InputStream in = conn.getInputStream();
} catch (IOException e) {
    // Handle exception
}
HTTP defines “request methods,” which are the operations that a client can make on a remote resource. These methods are called GET, POST, HEAD, PUT, DELETE, OPTIONS, and TRACE.
Each has slightly different usages, for example:
GET should only be used to retrieve a document and never should perform any side effects.
HEAD is equivalent to GET except the body is not returned—useful if a program wants to quickly check whether a URL has changed.
POST is used when we want to send data to a server for processing.
By default, Java always uses GET, but it does provide a way to use other methods for building more complex applications; however, doing so is a bit involved. In this next example, we’re using the search function provided by the BBC website to search for news articles about Java:
var url = new URL("http://www.bbc.co.uk/search");
// Encode only the value—encoding the whole "q=java" string
// would also escape the = separator
var encodedData = "q=" + URLEncoder.encode("java", "UTF-8");
var contentType = "application/x-www-form-urlencoded";

HttpURLConnection conn = (HttpURLConnection) url.openConnection();
conn.setInstanceFollowRedirects(false);
conn.setRequestMethod("POST");
conn.setRequestProperty("Content-Type", contentType);
conn.setRequestProperty("Content-Length",
    String.valueOf(encodedData.length()));
conn.setDoOutput(true);

OutputStream os = conn.getOutputStream();
os.write(encodedData.getBytes());

int response = conn.getResponseCode();
if (response == HttpURLConnection.HTTP_MOVED_PERM
        || response == HttpURLConnection.HTTP_MOVED_TEMP) {
    System.out.println("Moved to: " + conn.getHeaderField("Location"));
} else {
    try (InputStream in = conn.getInputStream()) {
        Files.copy(in, Paths.get("bbc.txt"),
                   StandardCopyOption.REPLACE_EXISTING);
    }
}
Notice that we needed to send our query parameters in the body of the
request, and to encode them before sending. We also had to disable
automatic following of HTTP redirects and handle any redirection from
the server manually. This is due to a limitation of the HttpURLConnection
class, which does not deal well with redirection of POST requests.
The older API definitely shows its age, and in fact only implements version 1.0 of the HTTP standard, which is very inefficient and considered archaic. As an alternative, modern Java programs can use the new API, which was added as a result of Java needing to support the new HTTP/2 protocol.
It was added as an incubator module in Java 9, but has been made into a fully supported module, java.net.http, in Java 11.
Let’s see a simple example of using the new API:
import static java.net.http.HttpResponse.BodyHandlers.ofString;

var client = HttpClient.newBuilder().build();
var uri = new URI("https://www.oreilly.com");
var request = HttpRequest.newBuilder(uri).build();
var response = client.send(request, ofString(Charset.defaultCharset()));
var body = response.body();
System.out.println(body);
Note that this API is designed to be extensible, with interfaces such as HttpResponse.BodySubscriber being available to be implemented for custom handling.
The interface also seamlessly hides the differences between HTTP/2 and the older HTTP/1.1 protocol, meaning that Java applications will be able to migrate gracefully as web servers adopt the new version.
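For instance, a client can state a preference for HTTP/2, and the library will quietly fall back to HTTP/1.1 when the server cannot negotiate the newer protocol. A minimal sketch of configuring that preference (the URL is just an illustrative choice):

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;

public class Http2Preference {
    public static void main(String[] args) {
        // Prefer HTTP/2; the client silently downgrades to HTTP/1.1
        // if the server does not support it
        var client = HttpClient.newBuilder()
                .version(HttpClient.Version.HTTP_2)
                .build();
        var request = HttpRequest.newBuilder(
                URI.create("https://www.oreilly.com")).build();
        System.out.println(client.version());   // preferred protocol version
        System.out.println(request.method());   // GET by default
    }
}
```

After a request is actually sent, `HttpResponse.version()` reports which protocol version was negotiated with the server.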
Let’s move on to look at the next layer down the networking stack, the Transmission Control Protocol (TCP).
TCP is the basis of reliable network transport over the internet. It ensures that web pages and other internet traffic are delivered in a complete and comprehensible state. From a networking theory standpoint, the protocol properties that allow TCP to function as this “reliability layer” for internet traffic are:
Data belongs to a single logical stream (a connection).
Data packets will be resent until they arrive.
Damage caused by network transit will be detected and fixed automatically.
TCP is a two-way (or bidirectional) communication channel, and uses a special numbering scheme (TCP sequence numbers) for data chunks to ensure that both sides of a communication stream stay in sync. In order to support many different services on the same network host, TCP uses port numbers to identify services, and ensures that traffic intended for one port does not go to a different one.
In Java, TCP is represented by the classes Socket and
ServerSocket. They are used to provide the capability to be the client
and server side of the connection, respectively—meaning that Java can be
used both to connect to network services and as a language for
implementing new services.
As an example, let’s consider reimplementing HTTP. This is a relatively simple, text-based protocol. We’ll need to implement both sides of the connection, so let’s start with an HTTP client on top of a TCP socket. To accomplish this, we will actually need to implement the details of the HTTP protocol, but we do have the advantage that we have complete control over the TCP socket.
We will need to both read and write from the client socket, and we’ll construct the actual request line in accordance with the HTTP standard (which is known as RFC 2616, and uses explicit line-ending syntax). The resulting code will look something like this:
String hostname = "www.example.com";
int port = 80;
String filename = "/index.html";

try (Socket sock = new Socket(hostname, port);
     BufferedReader from = new BufferedReader(
             new InputStreamReader(sock.getInputStream()));
     PrintWriter to = new PrintWriter(
             new OutputStreamWriter(sock.getOutputStream()));) {

    // The HTTP protocol
    to.print("GET " + filename + " HTTP/1.1\r\nHost: "
            + hostname + "\r\n\r\n");
    to.flush();

    // Read the response and print it to the console
    for (String l = null; (l = from.readLine()) != null; )
        System.out.println(l);
}
On the server side, we’ll need to receive possibly multiple incoming
connections. To handle this, we’ll need to kick off a main server loop,
then use accept() to take a new connection from the operating system.
The new connection then will need to be quickly passed to a separate
handler class, so that the main server loop can get back to listening
for new connections. The code for this is a bit more involved than the
client case:
// Handler class
private static class HttpHandler implements Runnable {
    private final Socket sock;

    HttpHandler(Socket client) {
        this.sock = client;
    }

    public void run() {
        try (BufferedReader in = new BufferedReader(
                     new InputStreamReader(sock.getInputStream()));
             PrintWriter out = new PrintWriter(
                     new OutputStreamWriter(sock.getOutputStream()));) {
            out.print("HTTP/1.0 200\r\nContent-Type: text/plain\r\n\r\n");
            String line;
            while ((line = in.readLine()) != null) {
                if (line.length() == 0)
                    break;
                out.println(line);
            }
        } catch (Exception e) {
            // Handle exception
        }
    }
}

// Main server loop
public static void main(String[] args) {
    try {
        int port = Integer.parseInt(args[0]);
        ServerSocket ss = new ServerSocket(port);
        for (;;) {
            Socket client = ss.accept();
            HttpHandler handler = new HttpHandler(client);
            new Thread(handler).start();
        }
    } catch (Exception e) {
        // Handle exception
    }
}
When designing a protocol for applications to communicate over TCP, there’s a simple and profound network architecture principle, known as Postel’s Law (after Jon Postel, one of the fathers of the internet) that you should always keep in mind. It is sometimes stated as follows: “Be strict about what you send, and liberal about what you will accept.” This simple principle means that communication can remain broadly possible in a network system, even in the event of quite imperfect implementations.
Postel’s Law, when combined with the general principle that the protocol should be as simple as possible (sometimes called the KISS principle), will make the developer’s job of implementing TCP-based communication much easier than it otherwise would be.
Below TCP is the internet’s general-purpose haulage protocol—the Internet Protocol (IP) itself.
IP is the “lowest common denominator” transport, and provides a useful abstraction over the physical network technologies that are used to actually move bytes from A to B.
Unlike TCP, delivery of an IP packet is not guaranteed, and a packet can be dropped by any overloaded system along the path. IP packets do have a destination, but usually no routing data—it’s the responsibility of the (possibly many different) physical transports along the route to actually deliver the data.
It is possible to create “datagram” services in Java that are based
around single IP packets (or those with a UDP header, instead of TCP),
but this is not often required except for extremely low-latency
applications. Java uses the class DatagramSocket to implement this
functionality, although few developers should ever need to venture this
far down the network stack.
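For completeness, here is a minimal sketch of sending and receiving a single UDP datagram over the loopback interface (the port number 9876 is an arbitrary choice):

```java
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;

public class DatagramExample {
    public static void main(String[] args) throws Exception {
        try (DatagramSocket receiver = new DatagramSocket(9876);
             DatagramSocket sender = new DatagramSocket()) {
            receiver.setSoTimeout(5000);   // don't block forever

            // Send a single datagram -- no connection, no delivery guarantee
            byte[] payload = "ping".getBytes();
            sender.send(new DatagramPacket(payload, payload.length,
                    InetAddress.getLoopbackAddress(), 9876));

            // Receive it; blocks until a packet arrives (or timeout)
            byte[] buf = new byte[1024];
            DatagramPacket incoming = new DatagramPacket(buf, buf.length);
            receiver.receive(incoming);
            String msg = new String(incoming.getData(), 0,
                    incoming.getLength());
            System.out.println(msg);
        }
    }
}
```

Unlike the Socket examples earlier, there is no stream and no connection here: each DatagramPacket stands alone, which is exactly the unreliable, packet-at-a-time model that IP itself provides.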
Finally, it’s worth noting some changes that are currently in-flight in the addressing schemes that are used across the internet. The current dominant version of IP in use is IPv4, which has a 32-bit space of possible network addresses. This space is now very badly squeezed, and various mitigation techniques have been deployed to handle the depletion.
The next version of IP (IPv6) is being rolled out, but it is not fully accepted and has yet to displace IPv4, although steady progress toward it becoming the standard continues. In the next 10 years, IPv6 is likely to overtake IPv4 in terms of traffic volume, and low-level networking will need to adapt to this radically new version. However, for Java programmers, the good news is that the language and platform have been working for many years on good support for IPv6 and the changes that it introduces. The transition between IPv4 and IPv6 is likely to be much smoother and less problematic for Java applications than in many other languages.
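As one small illustration of that support, java.net.InetAddress parses both address families transparently (the loopback literals below are just examples, and numeric literals are resolved without any DNS lookup):

```java
import java.net.Inet4Address;
import java.net.Inet6Address;
import java.net.InetAddress;

public class AddressFamilies {
    public static void main(String[] args) throws Exception {
        InetAddress v4 = InetAddress.getByName("127.0.0.1");  // IPv4 loopback
        InetAddress v6 = InetAddress.getByName("::1");        // IPv6 loopback
        // The same abstract type covers both families
        System.out.println(v4 instanceof Inet4Address);
        System.out.println(v6 instanceof Inet6Address);
    }
}
```

Because Socket, ServerSocket, and DatagramSocket all accept InetAddress, most networking code never needs to care which family an address belongs to.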