In a multi-data center, high-availability deployment, taking backups of your Cassandra nodes may seem unnecessary. After all, if a node crashes and doesn't come back, other nodes in the data center usually contain replicas of the lost data. If an entire data center is lost, a new data center can be deployed into the existing Cassandra cluster, with data streamed from another data center.
Even if a copy of a node's data directory were taken and stored off-site, those files would be of limited use in a large-scale deployment. This is because a node only stores data for specific token ranges. With vnodes (virtual nodes), token ranges are numerous and non-contiguous, making the backed-up data valid only for that specific node. Additionally, if you add or remove nodes from your cluster, the token ranges are recalculated, essentially invalidating any off-site copies of node-specific data.
Despite these challenges, having a solid backup and restore strategy for your cluster(s) is essential. While deploying multiple nodes across multiple data centers helps with disaster recovery, it does not help when data has been corrupted, tampered with, or accidentally deleted: replication faithfully propagates the bad writes to every replica.
Cassandra does allow its operators to take snapshots. A snapshot is a point-in-time copy of your data, and can be taken on a table, keyspace, or system-wide basis. Additionally, you can enable incremental backups, which back up data that has changed since the last snapshot. To enable incremental backups, edit the incremental_backups property in your cassandra.yaml configuration file, and set it to true:
incremental_backups: true
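Note that edits to cassandra.yaml only take effect after the node is restarted (recent Cassandra versions also let you toggle this at runtime with nodetool enablebackup / disablebackup). A minimal sketch of flipping the property with sed follows; it operates on a temporary stand-in file, since the real path (commonly /etc/cassandra/cassandra.yaml in package installs) varies by installation:

```shell
# Sketch: flip incremental_backups in a copy of cassandra.yaml.
# The temp file stands in for the real config; adapt the path to your install.
cfg=$(mktemp)
printf 'incremental_backups: false\n' > "$cfg"

# Rewrite the property in place, keeping a .bak copy of the original.
sed -i.bak 's/^incremental_backups:.*/incremental_backups: true/' "$cfg"

# Confirm the change took effect.
grep '^incremental_backups:' "$cfg"
```

Remember to make the same change on every node in the cluster; incremental backups are a per-node setting.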
Taking a snapshot in Cassandra is very easy. Simply invoke it through nodetool, scoping it as broadly or as narrowly as you require, such as nodetool snapshot [keyspace_name].[table_name]:
nodetool snapshot packt.logins_by_user
Requested creating snapshot(s) for [packt.logins_by_user] with snapshot name [1504986577085] and options {skipFlush=false}
Snapshot directory: 1504986577085
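Under the hood, a snapshot is a set of hard links to the live SSTable files, created under a snapshots/<tag> directory inside the table's data directory, which is why taking one is nearly instantaneous and initially consumes almost no extra disk space. The sketch below simulates that behavior with plain files; the file and directory names are illustrative stand-ins, not Cassandra's exact on-disk layout:

```shell
# Simulate how a snapshot hard-links SSTables rather than copying them.
tmp=$(mktemp -d)
mkdir -p "$tmp/snapshots/1504986577085"

# Stand-in for a live SSTable data file.
echo "sstable contents" > "$tmp/mc-1-big-Data.db"

# "Take a snapshot": hard-link the SSTable into the snapshot directory.
ln "$tmp/mc-1-big-Data.db" "$tmp/snapshots/1504986577085/mc-1-big-Data.db"

# Both paths now reference the same inode, so no data was duplicated.
[ "$tmp/mc-1-big-Data.db" -ef "$tmp/snapshots/1504986577085/mc-1-big-Data.db" ] \
  && echo "snapshot is a hard link"
```

Because the links pin the underlying files even after compaction replaces them, old snapshots do accumulate disk usage over time; they are not cleaned up automatically, so remove ones you no longer need with nodetool clearsnapshot.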