Appendix 2
The CAP Theorem

Understanding the five database genres is important when deciding on which database to use in a particular case, but there’s one other major thing that you should always bear in mind. A recurring theme in this book has been the CAP theorem, which lays bare an unsettling truth about how distributed database systems behave in the face of network instability.

CAP proves that a distributed database can have one or more of the following qualities: it can be consistent (writes are atomic and all subsequent requests retrieve the new value), available (the database will always return a value as long as a single server is running), and/or partition tolerant (the system will still function even if communication between servers is temporarily lost, that is, during a network partition). The catch is that any given system can have at most two of these qualities at once, never all three.

In other words, you can create a distributed database system that is consistent and partition tolerant (a “CP” system), a system that is available and partition tolerant (an “AP” system), or a system that is consistent and available (the much rarer “CA” system, which is not partition tolerant and so, in practice, is not distributed at all). Or a system can have only one of those qualities (this book doesn’t cover any databases like that, and you’re unlikely to encounter such a database in wide use). But at the end of the day, it simply isn’t possible to create a distributed database that is consistent, available, and partition tolerant at the same time, and anyone who says that they’ve “solved CAP” is saying that they’ve defied the laws of physics and thus should not be trusted.

The CAP theorem is pertinent when considering a distributed database because it forces you to decide what you are willing to give up. When a network partition occurs, the database you choose will sacrifice either availability or consistency. Partition tolerance is strictly an architectural decision (it depends on whether you want a distributed database at all). It’s important to understand the CAP theorem to fully grok your options; the trade-offs made by the database implementations in this book are largely influenced by it.
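To make the definition of consistency used here concrete (writes are atomic and all subsequent requests retrieve the new value), the following is a minimal Python sketch that checks a history of operations against it: every read must return the value of the most recent write. It assumes a single stored value and a strictly sequential history in which each operation finishes before the next one begins; `Op` and `is_consistent` are hypothetical names for illustration. Checking real distributed histories, where operations overlap in time, is considerably harder.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Op:
    kind: str   # "write" or "read" (hypothetical encoding)
    value: int  # the value written, or the value the read returned


def is_consistent(history: List[Op], initial: Optional[int] = None) -> bool:
    """Return True if every read returns the value of the most recent write.

    Assumes one register and a strictly sequential history
    (no two operations overlap in time).
    """
    latest = initial
    for op in history:
        if op.kind == "write":
            latest = op.value
        elif op.kind == "read":
            if op.value != latest:
                return False  # a read saw something other than the newest write
        else:
            raise ValueError(f"unknown operation kind: {op.kind!r}")
    return True


# A read that sees the newest write: consistent.
ok = [Op("write", 1), Op("read", 1)]
# A read that returns an older value after a newer write: not consistent.
stale = [Op("write", 1), Op("write", 2), Op("read", 1)]

print(is_consistent(ok))     # True
print(is_consistent(stale))  # False
```

The second history is exactly what a partitioned node produces if it keeps answering from an outdated copy.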

Imagine the world as a giant distributed database system. All of the land in the world contains information about certain topics, and as long as you’re somewhere near people or technology, you can find an answer to your questions.

Now, for the sake of argument, imagine you are a passionate Beyoncé fan and the date is September 5, 2016. Suddenly, while at your friend’s beach house party celebrating the release of Beyoncé’s hot new studio album, a freak tidal wave sweeps across the dock and drags you out to sea. You fashion a makeshift raft and wash up on a desert island days later. Without any means of communication, you are effectively partitioned from the rest of the system (the world). There you wait for five long years...

One morning in 2021, after five years alone, you are awakened by shouts from the sea. A salty old schooner captain has discovered you! He leans over the bow and bellows: “How many studio albums does Beyoncé have?”

You now have a decision to make. You can answer the question with the most recent value you have (which is now five years old). If you answer his query, you are available. Or you can decline to answer the question, knowing that because you are partitioned, your answer may not be consistent with the current state of the world. The captain won’t get his answer, but the state of the world remains consistent (if he sails back home, he can get the correct answer). In your role as the queried node, you can either help keep the world’s data consistent or be available, but not both.
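The island scenario maps onto the read path of a replicated store. The following minimal Python sketch assumes two copies of one value (a “mainland” replica and an “island” replica) and a partition that the replica can detect directly, modeled as a `partitioned` flag; real systems usually infer a partition from timeouts. `Replica`, `Mode`, `Unavailable`, and `write_everywhere` are hypothetical names for illustration. Only reads are modeled: a CP replica declines to answer while partitioned, and an AP replica answers with its last known value. A real CP system would typically also reject writes it cannot replicate to enough nodes.

```python
from enum import Enum
from typing import List


class Mode(Enum):
    CP = "consistent + partition tolerant"
    AP = "available + partition tolerant"


class Unavailable(Exception):
    """Raised when a CP replica declines to answer during a partition."""


class Replica:
    def __init__(self, mode: Mode, value: int) -> None:
        self.mode = mode
        self.value = value        # last value this replica has seen
        self.partitioned = False  # True while cut off from its peers (assumed detectable)

    def read(self) -> int:
        if self.partitioned and self.mode is Mode.CP:
            # Give up availability: refuse rather than risk a stale answer.
            raise Unavailable("partitioned: cannot confirm latest value")
        # Connected replicas, and AP replicas even while partitioned,
        # answer from their local copy, which may be out of date.
        return self.value


def write_everywhere(replicas: List[Replica], new_value: int) -> None:
    """Apply a write only to replicas that are reachable (not partitioned)."""
    for r in replicas:
        if not r.partitioned:
            r.value = new_value


for mode in (Mode.CP, Mode.AP):
    mainland, island = Replica(mode, 1), Replica(mode, 1)
    island.partitioned = True                # washed up on the island
    write_everywhere([mainland, island], 2)  # the world moves on without you
    try:
        print(mode.name, "answers", island.read())
    except Unavailable as e:
        print(mode.name, "declines:", e)
```

Running it prints a refusal for the CP replica and the stale value `1` for the AP replica. Neither answer is a bug; each reflects which property the system chose to keep while the partition lasts.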

Table of Contents for Seven Databases in Seven Weeks, 2nd Edition
