Seven NoSQL Databases in a Week

Let's say that after taking a snapshot, one of our tables ends up with corrupted data for a particular user. The application team fixes the problem on their end, and asks us to restore the logins_by_user table to the last snapshot.

First of all, let's take a look at the data in question:

cassdba@cqlsh> use packt ; 
cassdba@cqlsh:packt> SELECT * FROM logins_by_user WHERE user_id='avery' LIMIT 1; 
 
 user_id | login_datetime                  | origin_ip 
---------+---------------------------------+----------- 
   avery | 1970-01-01 19:48:33.945000+0000 | 10.0.15.2 
 
(1 rows)

Obviously, this user did not log in on January 1, 1970, so this is the corrupted data. To ensure that we are starting from a clean slate, truncate the table:

cassdba@cqlsh:packt> truncate table packt.logins_by_user;

Assuming the data for our keyspace is in /var/lib/cassandra/data/packt, let's take a look at it:

    cd /var/lib/cassandra/data/packt
    ls -al
    total 20
    drwxrwxr-x  5 aploetz aploetz 4096 Jul 18 09:23 .
    drwxr-xr-x 18 aploetz aploetz 4096 Jun 10 09:06 ..
    drwxrwxr-x  3 aploetz aploetz 4096 Jul 18 14:05 astronauts-b27b5a406bc411e7b609c123c0f29bf4
    drwxrwxr-x  3 aploetz aploetz 4096 Jul 18 14:05 astronauts_by_group-b2c163f06bc411e7b609c123c0f29bf4
    drwxrwxr-x  4 aploetz aploetz 4096 Sep  9 14:51 logins_by_user-fdd9fa204de511e7a2e6f3d179351473

We see a directory for the logins_by_user table. Once a snapshot has been taken, each table directory should also have a "snapshots" directory, so let's cd into that and list it out:

    cd logins_by_user-fdd9fa204de511e7a2e6f3d179351473/snapshots
    ls -al
    total 16
    drwxrwxr-x 4 aploetz aploetz 4096 Sep  9 14:58 .
    drwxrwxr-x 4 aploetz aploetz 4096 Sep  9 14:58 ..
    drwxrwxr-x 2 aploetz aploetz 4096 Sep  9 14:55 1504986577085
    drwxrwxr-x 2 aploetz aploetz 4096 Sep  9 14:58 truncated-1504987099599-logins_by_user

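Recall from the output of our earlier nodetool snapshot command that 1504986577085 is the name of the snapshot we took. The truncated-1504987099599-logins_by_user directory is a snapshot that Cassandra created automatically when we truncated the table. Enter the 1504986577085 directory, and list it out:

    cd 1504986577085
    ls -al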
    total 52
    drwxrwxr-x 2 aploetz aploetz 4096 Sep  9 14:49 .
    drwxrwxr-x 3 aploetz aploetz 4096 Sep  9 14:49 ..
    -rw-rw-r-- 1 aploetz aploetz   31 Sep  9 14:49 manifest.json
    -rw-rw-r-- 2 aploetz aploetz   43 Jun 10 10:53 mc-1-big-CompressionInfo.db
    -rw-rw-r-- 2 aploetz aploetz  264 Jun 10 10:53 mc-1-big-Data.db
    -rw-rw-r-- 2 aploetz aploetz    9 Jun 10 10:53 mc-1-big-Digest.crc32
    -rw-rw-r-- 2 aploetz aploetz   16 Jun 10 10:53 mc-1-big-Filter.db
    -rw-rw-r-- 2 aploetz aploetz   11 Jun 10 10:53 mc-1-big-Index.db
    -rw-rw-r-- 2 aploetz aploetz 4722 Jun 10 10:53 mc-1-big-Statistics.db
    -rw-rw-r-- 2 aploetz aploetz   65 Jun 10 10:53 mc-1-big-Summary.db
    -rw-rw-r-- 2 aploetz aploetz   92 Jun 10 10:53 mc-1-big-TOC.txt
    -rw-rw-r-- 1 aploetz aploetz  947 Sep  9 14:49 schema.cql

All of these files need to be copied into the logins_by_user-fdd9fa204de511e7a2e6f3d179351473 directory. As we have navigated our way down to the directory containing the snapshot files, we can do this with a simple command:

    cp * ../../

This copies all files from the current directory into the directory two levels up, which is /var/lib/cassandra/data/packt/logins_by_user-fdd9fa204de511e7a2e6f3d179351473. Next, bounce (stop and restart) the node so that Cassandra loads the restored SSTable files. Then go back into cqlsh, and rerun the prior query:

cassdba@cqlsh> use packt ; 
cassdba@cqlsh:packt> SELECT * FROM logins_by_user WHERE user_id='avery' LIMIT 1; 
 
 user_id | login_datetime                  | origin_ip 
---------+---------------------------------+----------- 
   avery | 2017-09-09 19:48:33.945000+0000 | 10.0.15.2 
 
(1 rows)
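
The manual copy step used in this restore could be wrapped in a small shell function so that it does not depend on the current working directory. The following is only a minimal sketch: restore_snapshot is a hypothetical helper name, not part of Cassandra, and it assumes the table directory layout shown earlier (table_dir/snapshots/snapshot_name). It does not truncate the table or restart the node; those steps remain manual.

```shell
# restore_snapshot TABLE_DIR SNAPSHOT_NAME
# Hypothetical helper (not a Cassandra tool): copies every regular file
# from TABLE_DIR/snapshots/SNAPSHOT_NAME into TABLE_DIR.
restore_snapshot() {
    table_dir=$1
    snapshot_name=$2
    snapshot_dir="$table_dir/snapshots/$snapshot_name"

    # Refuse to continue if the snapshot does not exist.
    if [ ! -d "$snapshot_dir" ]; then
        echo "snapshot directory not found: $snapshot_dir" >&2
        return 1
    fi

    # Copy each snapshot file, preserving mode and timestamps.
    for f in "$snapshot_dir"/*; do
        [ -f "$f" ] || continue
        cp -p "$f" "$table_dir"/ || return 1
    done
}

# Example call, using the paths from this section:
# restore_snapshot \
#   /var/lib/cassandra/data/packt/logins_by_user-fdd9fa204de511e7a2e6f3d179351473 \
#   1504986577085
```

Passing the table directory and snapshot name explicitly avoids the risk of running cp * ../../ from the wrong directory.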

It is important to note that snapshots and incremental backups are essentially hard links to SSTable files on disk. Because of these hard links, the disk space used by an SSTable file is not released when compaction removes the original file. Therefore, it is recommended to build a process to remove old snapshots and backups that are no longer needed.
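
As a starting point for such a cleanup process, the sketch below lists snapshot directories that have not been modified for more than a given number of days. list_old_snapshots is a hypothetical helper name, and the function assumes the keyspace/table/snapshots/name layout seen in this section, a find implementation that supports -mindepth and -maxdepth, and that a snapshot directory's modification time is a reasonable proxy for its age. It only prints candidates; it deletes nothing.

```shell
# list_old_snapshots DATA_DIR DAYS
# Hypothetical helper (not a Cassandra tool): prints snapshot directories
# under DATA_DIR (layout: keyspace/table/snapshots/name) whose modification
# time is more than DAYS days in the past.
list_old_snapshots() {
    data_dir=$1
    days=$2
    # Depth 4 below DATA_DIR is keyspace/table/snapshots/name.
    find "$data_dir" -mindepth 4 -maxdepth 4 -type d \
        -path '*/snapshots/*' -mtime +"$days"
}

# Example call, using the data directory from this section:
# list_old_snapshots /var/lib/cassandra/data 30
```

Once the candidates have been reviewed, a snapshot can typically be removed by name with nodetool clearsnapshot -t followed by the snapshot name; check the nodetool help for your Cassandra version before scripting this.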
