Index

aborts (transactions), Transactions, Atomicity
- in two-phase commit, Introduction to two-phase commit
- performance of optimistic concurrency control, Performance of serializable snapshot isolation
- retrying aborted transactions, Handling errors and aborts
abstraction, Simplicity: Managing Complexity, Data Models and Query Languages, Transactions, Summary, Consistency and Consensus
access path (in network model), The network model, The SPARQL query language
accidental complexity, removing, Simplicity: Managing Complexity
accountability, Responsibility and accountability
ACID properties (transactions), Transaction Processing or Analytics?, The Meaning of ACID
- atomicity, Atomicity, Single-Object and Multi-Object Operations
- consistency, Consistency, Maintaining integrity in the face of software bugs
- durability, Durability
- isolation, Isolation, Single-Object and Multi-Object Operations
acknowledgements (messaging), Acknowledgments and redelivery
active/active replication (see multi-leader replication)
active/passive replication (see leader-based replication)
ActiveMQ (messaging), Message brokers, Message brokers compared to databases
- distributed transaction support, XA transactions
ActiveRecord (object-relational mapper), The Object-Relational Mismatch, Handling errors and aborts
actor model, Distributed actor frameworks
- (see also message-passing)
- comparison to Pregel model, The Pregel processing model
- comparison to stream processing, Message passing and RPC
Advanced Message Queuing Protocol (see AMQP)
aerospace systems, Reliability, Human Errors, Byzantine Faults, Membership services
aggregation
- data cubes and materialized views, Aggregation: Data Cubes and Materialized Views
- in batch processes, GROUP BY
- in stream processes, Stream analytics
aggregation pipeline query language, MapReduce Querying
Agile, Evolvability: Making Change Easy
- minimizing irreversibility, Philosophy of batch process outputs, Reprocessing data for application evolution
- moving faster with confidence, The end-to-end argument again
- Unix philosophy, The Unix Philosophy
agreement, Fault-Tolerant Consensus
- (see also consensus)
Airflow (workflow scheduler), MapReduce workflows
Ajax, Dataflow Through Services: REST and RPC
Akka (actor framework), Distributed actor frameworks
algorithms
- algorithm correctness, Correctness of an algorithm
- B-trees, B-Trees-B-tree optimizations
- for distributed systems, System Model and Reality
- hash indexes, Hash Indexes
- mergesort, SSTables and LSM-Trees, Distributed execution of MapReduce, Sort-merge joins
- red-black trees, Constructing and maintaining SSTables
- SSTables and LSM-trees, SSTables and LSM-Trees-Performance optimizations
all-to-all replication topologies, Multi-Leader Replication Topologies
AllegroGraph (database), Graph-Like Data Models
ALTER TABLE statement (SQL), Schema flexibility in the document model, Encoding and Evolution
Amazon
- Dynamo (database), Leaderless Replication
Amazon Web Services (AWS), Hardware Faults
- Kinesis Streams (messaging), Using logs for message storage
- network reliability, Network Faults in Practice
- postmortems, Software Errors
- RedShift (database), The divergence between OLTP databases and data warehouses
- S3 (object storage), MapReduce and Distributed Filesystems
  - checking data integrity, Don’t just blindly trust what they promise
amplification
- of bias, Bias and discrimination
- of failures, Limitations of distributed transactions, Maintaining derived state
- of tail latency, Describing Performance, Partitioning Secondary Indexes by Document
- write amplification, Advantages of LSM-trees
AMQP (Advanced Message Queuing Protocol), Message brokers compared to databases
- (see also messaging systems)
- comparison to log-based messaging, Logs compared to traditional messaging, Replaying old messages
- message ordering, Acknowledgments and redelivery
analytics, Transaction Processing or Analytics?
- comparison to transaction processing, Transaction Processing or Analytics?
- data warehousing (see data warehousing)
- parallel query execution in MPP databases, Comparing Hadoop to Distributed Databases
- predictive (see predictive analytics)
- relation to batch processing, The Output of Batch Workflows
- schemas for, Stars and Snowflakes: Schemas for Analytics
- snapshot isolation for queries, Snapshot Isolation and Repeatable Read
- stream analytics, Stream analytics
- using MapReduce, analysis of user activity events (example), Example: analysis of user activity events
anti-caching (in-memory databases), Keeping everything in memory
anti-entropy, Read repair and anti-entropy
Apache ActiveMQ (see ActiveMQ)
Apache Avro (see Avro)
Apache Beam (see Beam)
Apache BookKeeper (see BookKeeper)
Apache Cassandra (see Cassandra)
Apache CouchDB (see CouchDB)
Apache Curator (see Curator)
Apache Drill (see Drill)
Apache Flink (see Flink)
Apache Giraph (see Giraph)
Apache Hadoop (see Hadoop)
Apache HAWQ (see HAWQ)
Apache HBase (see HBase)
Apache Helix (see Helix)
Apache Hive (see Hive)
Apache Impala (see Impala)
Apache Jena (see Jena)
Apache Kafka (see Kafka)
Apache Lucene (see Lucene)
Apache MADlib (see MADlib)
Apache Mahout (see Mahout)
Apache Oozie (see Oozie)
Apache Parquet (see Parquet)
Apache Qpid (see Qpid)
Apache Samza (see Samza)
Apache Solr (see Solr)
Apache Spark (see Spark)
Apache Storm (see Storm)
Apache Tajo (see Tajo)
Apache Tez (see Tez)
Apache Thrift (see Thrift)
Apache ZooKeeper (see ZooKeeper)
Apama (stream analytics), Complex event processing
append-only B-trees, B-tree optimizations, Indexes and snapshot isolation
append-only files (see logs)
Application Programming Interfaces (APIs), Thinking About Data Systems, Data Models and Query Languages
- for batch processing, MapReduce workflows
- for change streams, API support for change streams
- for distributed transactions, XA transactions
- for graph processing, The Pregel processing model
- for services, Dataflow Through Services: REST and RPC-Data encoding and evolution for RPC
  - (see also services)
  - evolvability, Data encoding and evolution for RPC
  - RESTful, Web services
  - SOAP, Web services
application state (see state)
approximate search (see similarity search)
archival storage, data from databases, Archival storage
arcs (see edges)
arithmetic mean, Describing Performance
ASCII text, Thrift and Protocol Buffers, A uniform interface
ASN.1 (schema language), The Merits of Schemas
asynchronous networks, Unreliable Networks, Glossary
- comparison to synchronous networks, Synchronous Versus Asynchronous Networks
- formal model, System Model and Reality
asynchronous replication, Synchronous Versus Asynchronous Replication, Glossary
- conflict detection, Synchronous versus asynchronous conflict detection
- data loss on failover, Leader failure: Failover
- reads from asynchronous follower, Problems with Replication Lag
Asynchronous Transfer Mode (ATM), Can we not simply make network delays predictable?
atomic broadcast (see total order broadcast)
atomic clocks (caesium clocks), Clock readings have a confidence interval, Synchronized clocks for global snapshots
- (see also clocks)
atomicity (concurrency), Glossary
- atomic increment-and-get, Implementing total order broadcast using linearizable storage
- compare-and-set, Compare-and-set, What Makes a System Linearizable?
  - (see also compare-and-set operations)
- replicated operations, Conflict resolution and replication
- write operations, Atomic write operations
atomicity (transactions), Atomicity, Single-Object and Multi-Object Operations, Glossary
- atomic commit, Distributed Transactions and Consensus
  - avoiding, Multi-partition request processing, Coordination-avoiding data systems
  - blocking and nonblocking, Three-phase commit
  - in stream processing, Exactly-once message processing, Atomic commit revisited
  - maintaining derived data, Keeping Systems in Sync
- for multi-object transactions, Single-Object and Multi-Object Operations
- for single-object writes, Single-object writes
auditability, Trust, but Verify-Tools for auditable data systems
- designing for, Designing for auditability
- self-auditing systems, A culture of verification
- through immutability, Advantages of immutable events
- tools for auditable data systems, Tools for auditable data systems
availability, Hardware Faults
- (see also fault tolerance)
- in CAP theorem, The CAP theorem
- in service level agreements (SLAs), Describing Performance
Avro (data format), Avro-Code generation and dynamically typed languages
- code generation, Code generation and dynamically typed languages
- dynamically generated schemas, Dynamically generated schemas
- object container files, But what is the writer’s schema?, Archival storage, Philosophy of batch process outputs
- reader determining writer’s schema, But what is the writer’s schema?
- schema evolution, The writer’s schema and the reader’s schema
- use in Hadoop, Philosophy of batch process outputs
awk (Unix tool), Simple Log Analysis
AWS (see Amazon Web Services)
Azure (see Microsoft)

nanomsg (messaging library), Direct messaging from producers to consumers
Narayana (transaction coordinator), Introduction to two-phase commit
NATS (messaging), Message brokers
near-real-time (nearline) processing, Batch Processing
- (see also stream processing)
Neo4j (database)
- Cypher query language, The Cypher Query Language
- graph data model, Graph-Like Data Models
Nephele (dataflow engine), Dataflow engines
netcat (Unix tool), Separation of logic and wiring
Netflix Chaos Monkey, Reliability, Network Faults in Practice
Network Attached Storage (NAS), Distributed Data, MapReduce and Distributed Filesystems
network model, The network model
- graph databases versus, The SPARQL query language
- imperative query APIs, Declarative Queries on the Web
Network Time Protocol (see NTP)
networks
- congestion and queueing, Network congestion and queueing
- datacenter network topologies, Cloud Computing and Supercomputing
- faults (see faults)
- linearizability and network delays, Linearizability and network delays
- network partitions, Network Faults in Practice, The CAP theorem
- timeouts and unbounded delays, Timeouts and Unbounded Delays
next-key locking, Index-range locks
nodes (in graphs) (see vertices)
nodes (processes), Glossary
- handling outages in leader-based replication, Handling Node Outages
- system models for failure, System Model and Reality
noisy neighbors, Network congestion and queueing
nonblocking atomic commit, Three-phase commit
nondeterministic operations
- accidental nondeterminism, Fault tolerance
- partial failures in distributed systems, Faults and Partial Failures
nonfunctional requirements, Summary
nonrepeatable reads, Snapshot Isolation and Repeatable Read
- (see also read skew)
normalization (data representation), Many-to-One and Many-to-Many Relationships, Glossary
- executing joins, Which data model leads to simpler application code?, Convergence of document and relational databases, Reduce-Side Joins and Grouping
- foreign key references, The need for multi-object transactions
- in systems of record, Derived Data
- versus denormalization, Deriving several views from the same event log
NoSQL, The Birth of NoSQL, Unbundling Databases
- transactions and, The Slippery Concept of a Transaction
Notation3 (N3), Triple-Stores and SPARQL
npm (package manager), The move toward declarative query languages
NTP (Network Time Protocol), Unreliable Clocks
- accuracy, Clock Synchronization and Accuracy, Timestamps for ordering events
- adjustments to monotonic clocks, Monotonic clocks
- multiple server addresses, Weak forms of lying
numbers, in XML and JSON encodings, JSON, XML, and Binary Variants

Table of Contents for Designing Data-Intensive Applications
