Seven NoSQL Databases in a Week

Cassandra installs with the Cassandra Query Language Shell (CQLSH) tool. This command-line interface allows you to make schema changes, maintain users, and run queries.

CQLSH requires Python 2.7.

To run cqlsh, invoke it from the command line. If you have authentication and authorization enabled (and you should), you can start it like this:

cqlsh 192.168.0.100 -u cassandra -p cassandra

You should see output similar to the following, and be at a cqlsh prompt:

Connected to PermanentWaves at 192.168.0.100:9042.
[cqlsh 5.0.1 | Cassandra 3.10 | CQL spec 3.4.4 | Native protocol v4]
Use HELP for help.
cassandra@cqlsh>

The first administrative task to complete is tightening up security. You'll want to create a new administrative user (so that you don't need the default cassandra user), and change the password on the default cassandra/cassandra user. The recommendation is to change it to something long and indecipherable, because you shouldn't need to use it again:

cassandra@cqlsh> CREATE ROLE cassdba
  WITH PASSWORD='flynnLives' AND SUPERUSER=true AND LOGIN=true;
cassandra@cqlsh> ALTER ROLE cassandra 
  WITH PASSWORD='2f394084e98a4bec92405f73e2e634ea';
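The replacement password above has the shape of 32 hexadecimal characters. The source does not say how it was generated; one way to produce a value of that shape is sketched below, assuming Python 3.6 or later is available (separate from the Python 2.7 that cqlsh itself needs). The helper name `generate_password` is hypothetical.

```python
import secrets


def generate_password(num_bytes=16):
    """Return a random hex string; 16 random bytes give 32 hex characters."""
    # secrets draws from the operating system's cryptographic random source.
    return secrets.token_hex(num_bytes)


if __name__ == "__main__":
    # Paste the printed value into the ALTER ROLE statement by hand.
    print(generate_password())
```

Because the default account should not be needed again, the generated value only has to be hard to guess, not easy to remember.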

Now, log in with your newly created user. You should see that the cqlsh prompt changes:

cqlsh 192.168.0.100 -u cassdba -p flynnLives
Connected to PermanentWaves at 192.168.0.100:9042.
[cqlsh 5.0.1 | Cassandra 3.10 | CQL spec 3.4.4 | Native protocol v4]
Use HELP for help.
cassdba@cqlsh>

Now that we are logged in as a non-default user, let's go ahead and create a keyspace (database) to work in. As we want our cluster to be data center aware, and we are already using GossipingPropertyFileSnitch, we will create our new keyspace using the NetworkTopologyStrategy replication strategy. Creating a keyspace using the default SimpleStrategy does not work properly with a configuration of multiple data centers, and it offers no benefits over NetworkTopologyStrategy. Therefore, it is best not to get into the habit of using it at all. For the purposes of this exercise, I suggest the keyspace name of packt:

CREATE KEYSPACE packt WITH replication = {
  'class':'NetworkTopologyStrategy', 'LakesidePark':'1'}
  AND durable_writes = true;

As previously discussed, this command creates a new keyspace named packt using NetworkTopologyStrategy. Our data center is named LakesidePark, and we are specifying a replication factor (RF) of one to store only a single replica of all data in that data center. The durable_writes property is set to true. This is the default setting for durable writes (so our setting is superfluous). Note that disabling durable writes (setting it to false) prevents write operations for this keyspace from being written to the commit log. Disabling it is not advised unless you have a specific use case for it, so use with caution!
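To make the pieces of that statement explicit, here is a minimal sketch that renders the same CREATE KEYSPACE text from a mapping of data center names to replication factors. It assumes Python 3.7 or later, and the function name `create_keyspace_cql` is hypothetical. It performs no quoting or validation of identifiers, so treat it as an illustration rather than a tool for untrusted input.

```python
def create_keyspace_cql(keyspace, dc_replication, durable_writes=True):
    """Render a CREATE KEYSPACE statement using NetworkTopologyStrategy.

    dc_replication maps each data center name to its replication factor.
    """
    if not dc_replication:
        raise ValueError("at least one data center is required")
    options = ["'class':'NetworkTopologyStrategy'"]
    for dc_name, rf in dc_replication.items():
        options.append("'{}':'{}'".format(dc_name, int(rf)))
    return (
        "CREATE KEYSPACE {} WITH replication = {{{}}} "
        "AND durable_writes = {};"
    ).format(keyspace, ", ".join(options), str(durable_writes).lower())


if __name__ == "__main__":
    # Same keyspace, data center, and RF as the statement shown above.
    print(create_keyspace_cql("packt", {"LakesidePark": 1}))
```

Each additional data center would be one more entry in the mapping, with its own replication factor.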

An RF of one is only recommended for single-node clusters. If you joined additional nodes, as detailed in the preceding instructions, you will want to increase it. The accepted practice is to set the RF equal to the number of nodes (never greater), up to a maximum of three. I have seen RFs set higher than three in two cases:

A specific use case for a higher level of consistency
The keyspace was created by someone who didn't really understand (or care about) the extra overhead involved in maintaining additional replicas
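The rule of thumb above (RF equal to the node count, capped at three) can be written as a one-line calculation. This is only a sketch of that guidance; the function name `recommended_rf` and its `max_rf` parameter are hypothetical, not part of any Cassandra tooling.

```python
def recommended_rf(node_count, max_rf=3):
    """Return the node count, capped at max_rf (three, per the guidance above)."""
    if node_count < 1:
        raise ValueError("a cluster needs at least one node")
    return min(node_count, max_rf)


if __name__ == "__main__":
    for nodes in (1, 2, 3, 6):
        print(nodes, "node(s) -> RF", recommended_rf(nodes))
```

Raising `max_rf` would correspond to the first of the two cases listed above, and should be a deliberate choice.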

Now, let's go ahead and create a table to demonstrate a few examples. We will create a table called astronauts_by_group, and design it to work with a query to select NASA astronauts by selection group (group):

CREATE TABLE packt.astronauts_by_group ( 
  name text, year int, group int, status text, dob text, 
  birthplace text, gender text, alma_mater text, spaceflights int, 
  spaceflight_hours int, spacewalks int, spacewalk_hours int, 
  missions text, 
  PRIMARY KEY (group,name)) 
WITH CLUSTERING ORDER BY (name asc);
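In this definition, group is the partition key and name is the clustering key, so all astronauts from one selection group are stored together and kept sorted by name. The sketch below models that layout in plain Python as an illustration only; it is not how Cassandra is implemented. The group 1 names come from the query result shown later in this section, the group 2 row is a made-up placeholder, and `build_partitions` is a hypothetical helper.

```python
from collections import defaultdict


def build_partitions(rows):
    """Group rows by partition key (group), sorting each partition by name."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row["group"]].append(row)
    for group_rows in partitions.values():
        # Ascending order, mirroring CLUSTERING ORDER BY (name asc).
        group_rows.sort(key=lambda r: r["name"])
    return dict(partitions)


sample_rows = [
    {"group": 1, "name": "John H. Glenn Jr."},
    {"group": 1, "name": "Alan B. Shepard Jr."},
    {"group": 2, "name": "Example Astronaut"},  # placeholder, not real data
    {"group": 1, "name": "Virgil I. Grissom"},
]

if __name__ == "__main__":
    # Reading one partition corresponds to WHERE group=1.
    for row in build_partitions(sample_rows)[1]:
        print(row["name"])
```

A query that filters on group only has to read a single partition, which is why the table is keyed this way.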

Next, we will use the CQLSH copy command to import a comma-separated values (CSV) file containing data on more than 350 NASA astronauts:

COPY packt.astronauts_by_group (name, year, group, status, dob, birthplace,
  gender, alma_mater, spaceflights, spaceflight_hours, spacewalks,
  spacewalk_hours, missions)
  FROM '~/Documents/Packt/astronauts.csv' WITH HEADER=true;

The CSV file can be found in the GitHub repo at https://github.com/aploetz/packt.

It is also important to remember that the COPY command is not a part of CQL. It is specific to CQLSH, and only works from within CQLSH.
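Before running an import like this, it can help to confirm that every line of the CSV has as many fields as the column list in the COPY statement. The following is a minimal sketch of such a check using Python's standard csv module; the helper name `find_bad_rows` is hypothetical, and the check only counts fields without validating types or header names.

```python
import csv

# Column list copied from the COPY statement above.
COLUMNS = [
    "name", "year", "group", "status", "dob", "birthplace", "gender",
    "alma_mater", "spaceflights", "spaceflight_hours", "spacewalks",
    "spacewalk_hours", "missions",
]


def find_bad_rows(csv_file, expected_fields=len(COLUMNS)):
    """Return (line_number, field_count) for rows with the wrong field count."""
    bad = []
    reader = csv.reader(csv_file)
    for row in reader:
        if not row:
            continue  # skip blank lines
        if len(row) != expected_fields:
            bad.append((reader.line_num, len(row)))
    return bad


# Usage, assuming the file location from the COPY statement:
#   import os
#   path = os.path.expanduser("~/Documents/Packt/astronauts.csv")
#   with open(path, newline="") as f:
#       print(find_bad_rows(f))
```

An empty list means every non-blank line, including the header, has the expected number of fields; quoted values that contain commas, such as a birthplace, count as one field.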

Now, querying the table for the rows where group is equal to 1 yields this result set:

    cassdba@cqlsh:packt> SELECT name, alma_mater, birthplace FROM astronauts_by_group WHERE group=1;
    
     name                   | alma_mater                        | birthplace
    ------------------------+-----------------------------------+----------------
        Alan B. Shepard Jr. |                  US Naval Academy | East Derry, NH
          Donald K. Slayton |           University of Minnesota |     Sparta, WI
          John H. Glenn Jr. |                 Muskingum College |  Cambridge, OH
       L. Gordon Cooper Jr. | Air Force Institute of Technology |    Shawnee, OK
         M. Scott Carpenter |            University of Colorado |    Boulder, CO
          Virgil I. Grissom |                 Purdue University |   Mitchell, IN
      Walter M. Schirra Jr. |                  US Naval Academy | Hackensack, NJ
    
    (7 rows)

This particular query shows data for the famous Mercury Seven astronauts. To exit cqlsh and return to your Linux prompt, simply type exit:

cassdba@cqlsh:packt> exit
