Now, let's try working with Cassandra from Python. First, make sure to install the DataStax Cassandra Python driver:
pip install cassandra-driver
Let's write a simple script to query the system.local table, and call it cassHelloWorld.py.
First, we will add our imports. We will need the Cluster class and (since we have enabled auth) the PlainTextAuthProvider class. Additionally, we will need the sys module to read command-line arguments:
from cassandra.cluster import Cluster
from cassandra.auth import PlainTextAuthProvider
import sys
Now we will pull in our hostname, username, and password from the command-line arguments. The driver takes a list of endpoints to connect to, so we'll also create a new list and add our hostname to it:
hostname = sys.argv[1]
username = sys.argv[2]
password = sys.argv[3]
nodes = []
nodes.append(hostname)
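Reading sys.argv by position works, but it fails with a bare IndexError when an argument is missing. As a minimal sketch of an alternative, the standard argparse module can name the three arguments and print a usage message instead. The function name parse_connection_args is hypothetical and not part of the script above:

```python
import argparse


def parse_connection_args(argv):
    """Parse hostname, username, and password from a list of arguments."""
    parser = argparse.ArgumentParser(
        description="Connect to a Cassandra node and query system.local.")
    parser.add_argument("hostname", help="address of one Cassandra node")
    parser.add_argument("username", help="user to authenticate as")
    parser.add_argument("password", help="password for that user")
    return parser.parse_args(argv)


# In the script itself, sys.argv[1:] would be passed instead of this list.
args = parse_connection_args(["192.168.0.100", "cassdba", "flynnLives"])

# The driver still expects a list of endpoints, so wrap the hostname.
nodes = [args.hostname]
```

The rest of the script would then read args.username and args.password in place of the positional variables.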
Now we will use PlainTextAuthProvider to pass along our username and password to authenticate with the cluster. Then we will create a session object to hold our connection, passing it the system keyspace to connect to:
auth = PlainTextAuthProvider(username=username, password=password)
cluster = Cluster(nodes,auth_provider=auth)
session = cluster.connect("system")
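If a query raises an exception, a script written this way exits without closing its connection. One possible way to guard against that is a small context manager that always shuts the cluster down. This is a sketch only: the helper name open_session is hypothetical, and it assumes the object passed in behaves like the driver's Cluster, with connect() and shutdown() methods:

```python
from contextlib import contextmanager


@contextmanager
def open_session(cluster, keyspace):
    """Yield a session on `keyspace` and always shut the cluster down."""
    try:
        session = cluster.connect(keyspace)
        yield session
    finally:
        # Shutting down the cluster also closes the sessions it created.
        cluster.shutdown()
```

With the cluster object defined above, the queries would then run inside a block such as: with open_session(cluster, "system") as session.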
Our CQL query will pull down a few columns from the system.local table. This particular data will reveal some information about the cluster that we are connecting to:
strCQL = """ SELECT cluster_name,data_center,listen_address,release_version
FROM local WHERE key='local'
"""
Next, we'll execute our query and process the result set. The system.local table will only ever contain a single row, but it is still a good idea to get into the habit of processing a complete result set. Once the result set has been printed, we will close our connection to Cassandra:
rows = session.execute(strCQL)
print("Hello world from:")
for row in rows:
    print(row[0] + " " + row[1] + " " + row[2] + " " + row[3])
#closing Cassandra connection
session.shutdown()
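Indexing columns by position ties the print statement to the column order in the SELECT. Assuming the driver's default row factory, which returns each row as a named tuple, the same line can be built from column names instead. The sketch below uses a hand-made named tuple as a stand-in for a driver row, with sample values copied from this chapter's output; describe_node is a hypothetical helper name:

```python
from collections import namedtuple


def describe_node(row):
    """Build the 'hello world' line from one system.local row."""
    return " ".join([row.cluster_name, row.data_center,
                     str(row.listen_address), row.release_version])


# Stand-in for a driver row (illustrative values, not queried from a cluster).
LocalRow = namedtuple(
    "LocalRow",
    ["cluster_name", "data_center", "listen_address", "release_version"])
sample = LocalRow("PermanentWaves", "'LakesidePark'", "192.168.0.100", "3.10")

print(describe_node(sample))
```

In the script, describe_node(row) could replace the concatenation inside the for loop.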
Running this from the command line yields the following output:
python cassHelloWorld.py 192.168.0.100 cassdba flynnLives
Hello world from:
PermanentWaves 'LakesidePark' 192.168.0.100 3.10
We can also use Python to interact with the existing tables in our packt keyspace (which we created in the preceding sections). We will name this script queryUser.py. It requires the logins_by_user table, which was introduced earlier in the chapter. If you have not created it, go ahead and do that now:
CREATE TABLE packt.logins_by_user (
user_id text,
login_datetime timestamp,
origin_ip text,
PRIMARY KEY ((user_id), login_datetime)
) WITH CLUSTERING ORDER BY (login_datetime DESC);
The imports and command-line arguments will be similar to the previous ones, except that we will read a fourth command-line argument to use as the user_id. We will also define our keyspace as a variable:
from cassandra.cluster import Cluster
from cassandra.auth import PlainTextAuthProvider
import sys
hostname = sys.argv[1]
username = sys.argv[2]
password = sys.argv[3]
userid = sys.argv[4]
keyspace = "packt"
nodes = []
nodes.append(hostname)
The code to define the connection to Cassandra remains the same, except that we now connect to the keyspace held in our keyspace variable:
auth = PlainTextAuthProvider(username=username, password=password)
cluster = Cluster(nodes,auth_provider=auth)
session = cluster.connect(keyspace)
We will prepare and execute an INSERT to our logins_by_user table to record a new entry. For login_datetime, we will use the nested function call dateof(now()), which has Cassandra generate the current time as a timestamp on the server side:
strINSERT = """
INSERT INTO logins_by_user (user_id,login_datetime,origin_ip)
VALUES (?,dateof(now()),?)
"""
pINSERTStatement = session.prepare(strINSERT)
session.execute(pINSERTStatement,['aploetz','192.168.0.114'])
Then we will prepare a query for the last three entries for that user:
strSELECT = """
SELECT * FROM logins_by_user WHERE user_id=? LIMIT 3;
"""
pSELECTStatement = session.prepare(strSELECT)
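The query needs no ORDER BY to return the most recent logins, because the table was created with CLUSTERING ORDER BY (login_datetime DESC). Rows within a user_id partition are therefore kept newest first, and LIMIT 3 simply reads the first three. The sketch below imitates that behavior in plain Python with made-up login times. It illustrates the ordering only and does not talk to Cassandra:

```python
from datetime import datetime

# Hypothetical login times for a single user_id partition.
logins = [
    datetime(2017, 6, 2, 18, 23, 11),
    datetime(2017, 6, 10, 15, 26, 23),
    datetime(2017, 5, 28, 9, 0, 0),
    datetime(2017, 6, 3, 14, 4, 55),
]

# CLUSTERING ORDER BY (login_datetime DESC): newest first in the partition.
stored_order = sorted(logins, reverse=True)

# LIMIT 3 then returns the first three rows in that stored order.
latest_three = stored_order[:3]
for login_time in latest_three:
    print(login_time)
```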
Finally, we'll process the result set and close our connection:
rows = session.execute(pSELECTStatement,[userid])
print("Data for user %s:" % userid)
for row in rows:
    # one row per login, newest first
    print(row[0] + " " +
          str(row[1]) + " " +
          row[2])
#closing Cassandra connection
session.shutdown()
Running this from the command line yields the following output:
python queryUser.py 192.168.0.100 cassdba flynnLives aploetz
Data for user aploetz:
aploetz 2017-06-10 15:26:23.329000 192.168.0.114
aploetz 2017-06-03 14:04:55 192.168.0.101
aploetz 2017-06-02 18:23:11 192.168.0.105
Notice the difference between the older rows, which were inserted with timestamps that had no milliseconds, and the newest row, whose timestamp came from dateof(now()) and includes them.
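Part of what shows up in the output is simply how Python prints these values. As a minimal sketch, assuming the driver returns timestamp columns as datetime.datetime objects, str() omits the fractional part when it is zero and otherwise prints it as six digits:

```python
from datetime import datetime

# A timestamp with milliseconds, like the one written by dateof(now()).
with_millis = datetime(2017, 6, 10, 15, 26, 23, 329000)

# A timestamp that was inserted with whole seconds only.
whole_second = datetime(2017, 6, 3, 14, 4, 55)

print(str(with_millis))   # 2017-06-10 15:26:23.329000
print(str(whole_second))  # 2017-06-03 14:04:55
```

For a uniform format, with_millis.strftime("%Y-%m-%d %H:%M:%S") would print both kinds of value to the second.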