Table of Contents for
Cassandra: The Definitive Guide, 2nd Edition
by Eben Hewitt
Published by O'Reilly Media, Inc., 2016
Cover
Cassandra: The Definitive Guide
Dedication
Foreword
Foreword
Preface
1. Beyond Relational Databases
2. Introducing Cassandra
3. Installing Cassandra
4. The Cassandra Query Language
5. Data Modeling
6. The Cassandra Architecture
7. Configuring Cassandra
8. Clients
9. Reading and Writing Data
10. Monitoring
11. Maintenance
12. Performance Tuning
13. Security
14. Deploying and Integrating
Index
About the Authors
Colophon
Foreword
Foreword
Preface
Why Apache Cassandra?
Is This Book for You?
What’s in This Book?
New for the Second Edition
Conventions Used in This Book
Using Code Examples
O’Reilly Safari
How to Contact Us
Acknowledgments
1. Beyond Relational Databases
What’s Wrong with Relational Databases?
A Quick Review of Relational Databases
RDBMSs: The Awesome and the Not-So-Much
Web Scale
The Rise of NoSQL
Summary
2. Introducing Cassandra
The Cassandra Elevator Pitch
Cassandra in 50 Words or Less
Distributed and Decentralized
Elastic Scalability
High Availability and Fault Tolerance
Tuneable Consistency
Brewer’s CAP Theorem
Row-Oriented
High Performance
Where Did Cassandra Come From?
Release History
Is Cassandra a Good Fit for My Project?
Large Deployments
Lots of Writes, Statistics, and Analysis
Geographical Distribution
Evolving Applications
Getting Involved
Summary
3. Installing Cassandra
Installing the Apache Distribution
Extracting the Download
What’s In There?
Building from Source
Additional Build Targets
Running Cassandra
On Windows
On Linux
Starting the Server
Stopping Cassandra
Other Cassandra Distributions
Running the CQL Shell
Basic cqlsh Commands
cqlsh Help
Describing the Environment in cqlsh
Creating a Keyspace and Table in cqlsh
Writing and Reading Data in cqlsh
Summary
4. The Cassandra Query Language
The Relational Data Model
Cassandra’s Data Model
Clusters
Keyspaces
Tables
Columns
CQL Types
Numeric Data Types
Textual Data Types
Time and Identity Data Types
Other Simple Data Types
Collections
User-Defined Types
Secondary Indexes
Summary
5. Data Modeling
Conceptual Data Modeling
RDBMS Design
Design Differences Between RDBMS and Cassandra
Defining Application Queries
Logical Data Modeling
Hotel Logical Data Model
Reservation Logical Data Model
Physical Data Modeling
Hotel Physical Data Model
Reservation Physical Data Model
Materialized Views
Evaluating and Refining
Calculating Partition Size
Calculating Size on Disk
Breaking Up Large Partitions
Defining Database Schema
DataStax DevCenter
Summary
6. The Cassandra Architecture
Data Centers and Racks
Gossip and Failure Detection
Snitches
Rings and Tokens
Virtual Nodes
Partitioners
Replication Strategies
Consistency Levels
Queries and Coordinator Nodes
Memtables, SSTables, and Commit Logs
Caching
Hinted Handoff
Lightweight Transactions and Paxos
Tombstones
Bloom Filters
Compaction
Anti-Entropy, Repair, and Merkle Trees
Staged Event-Driven Architecture (SEDA)
Managers and Services
Cassandra Daemon
Storage Engine
Storage Service
Storage Proxy
Messaging Service
Stream Manager
CQL Native Transport Server
System Keyspaces
Summary
7. Configuring Cassandra
Cassandra Cluster Manager
Creating a Cluster
Seed Nodes
Partitioners
Murmur3 Partitioner
Random Partitioner
Order-Preserving Partitioner
ByteOrderedPartitioner
Snitches
Simple Snitch
Property File Snitch
Gossiping Property File Snitch
Rack Inferring Snitch
Cloud Snitches
Dynamic Snitch
Node Configuration
Tokens and Virtual Nodes
Network Interfaces
Data Storage
Startup and JVM Settings
Adding Nodes to a Cluster
Dynamic Ring Participation
Replication Strategies
SimpleStrategy
NetworkTopologyStrategy
Changing the Replication Factor
Summary
8. Clients
Hector, Astyanax, and Other Legacy Clients
DataStax Java Driver
Development Environment Configuration
Clusters and Contact Points
Sessions and Connection Pooling
Statements
Policies
Metadata
Debugging and Monitoring
DataStax Python Driver
DataStax Node.js Driver
DataStax Ruby Driver
DataStax C# Driver
DataStax C/C++ Driver
DataStax PHP Driver
Summary
9. Reading and Writing Data
Writing
Write Consistency Levels
The Cassandra Write Path
Writing Files to Disk
Lightweight Transactions
Batches
Reading
Read Consistency Levels
The Cassandra Read Path
Read Repair
Range Queries, Ordering and Filtering
Functions and Aggregates
Paging
Speculative Retry
Deleting
Summary
10. Monitoring
Logging
Tailing
Examining Log Files
Monitoring Cassandra with JMX
Connecting to Cassandra via JConsole
Overview of MBeans
Cassandra’s MBeans
Database MBeans
Networking MBeans
Metrics MBeans
Threading MBeans
Service MBeans
Security MBeans
Monitoring with nodetool
Getting Cluster Information
Getting Statistics
Summary
11. Maintenance
Health Check
Basic Maintenance
Flush
Cleanup
Repair
Rebuilding Indexes
Moving Tokens
Adding Nodes
Adding Nodes to an Existing Data Center
Adding a Data Center to a Cluster
Handling Node Failure
Repairing Nodes
Replacing Nodes
Removing Nodes
Upgrading Cassandra
Backup and Recovery
Taking a Snapshot
Clearing a Snapshot
Enabling Incremental Backup
Restoring from Snapshot
SSTable Utilities
Maintenance Tools
DataStax OpsCenter
Netflix Priam
Summary
12. Performance Tuning
Managing Performance
Setting Performance Goals
Monitoring Performance
Analyzing Performance Issues
Tracing
Tuning Methodology
Caching
Key Cache
Row Cache
Counter Cache
Saved Cache Settings
Memtables
Commit Logs
SSTables
Hinted Handoff
Compaction
Concurrency and Threading
Networking and Timeouts
JVM Settings
Memory
Garbage Collection
Using cassandra-stress
Summary
13. Security
Authentication and Authorization
Password Authenticator
Using CassandraAuthorizer
Role-Based Access Control
Encryption
SSL, TLS, and Certificates
Node-to-Node Encryption
Client-to-Node Encryption
JMX Security
Securing JMX Access
Security MBeans
Summary
14. Deploying and Integrating
Planning a Cluster Deployment
Sizing Your Cluster
Selecting Instances
Storage
Network
Cloud Deployment
Amazon Web Services
Microsoft Azure
Google Cloud Platform
Integrations
Apache Lucene, SOLR, and Elasticsearch
Apache Hadoop
Apache Spark
Summary
Index