Chapter 9. Backup

As discussed in Chapter 8, storage space on OpenShift gears is subject to a quota. Over time, your application may well generate more data than you have space for on the platform. When you manipulate data on your gears, there is the possibility of accidental data corruption or deletion. When you deploy a new version of your application code, there is a chance that pesky bugs are lurking between your test cases. In order to be able to respond promptly to any application issues and protect your app against unexpected data loss, you should have a backup strategy. In this chapter, we will showcase the application backup tools included in RHC and demonstrate how to use Cron to back up your database or files.

Managing Deployments and Rollbacks

When you create a new OpenShift application, it is configured out of the box to automatically deploy any changes pushed to the application Git repository. By default, only the latest version of your code is kept on your OpenShift gear. Both of these behaviors can be altered to give you more control over your deployments.

Manual Deployments

To disable the automatic deployment of pushed Git commits, use the command rhc app-configure --no-auto-deploy. You can change back to automatic deployment with rhc app-configure --auto-deploy. To deploy the latest version of the Git repository manually, use the command rhc app deploy ref. This command accepts the flags --hot-deploy, --no-hot-deploy, --force-clean-build, and --no-force-clean-build, which you can use as an alternative to or override for marker files to trigger or disable these actions (see Using Marker Files for more on marker files).

Here is an example of manually deploying a new commit to Insult App:

[me@localhost ~/insultapp]$ rhc app-configure --no-auto-deploy
Configuring application 'insultapp' ... done

insultapp @ http://insultapp-osbeginnerbook.rhcloud.com/
(uuid: 6e7672676e61676976757570)
---------------------------------------------------------------------------
  Deployment:        manual (use 'rhc deploy')
  Keep Deployments:  1
  Deployment Type:   git
  Deployment Branch: master

Your application 'insultapp' is now configured as listed above.

Use 'rhc show-app insultapp --configuration' to check your configuration values
any time.
[me@localhost ~/insultapp]$ git status
# On branch master
# Your branch is ahead of 'origin/master' by 1 commit.
#   (use "git push" to publish your local commits)
#
nothing to commit, working directory clean
[me@localhost ~/insultapp]$ git push
Counting objects: 5, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (3/3), done.
Writing objects: 100% (3/3), 290 bytes | 0 bytes/s, done.
Total 3 (delta 2), reused 0 (delta 0)
To ssh://6e7672676e61676976757570@insultapp-osbeginnerbook.rhcloud.com/~/git/
insultapp.git/
   a38c5e3..c100ed9 master -> master
[me@localhost ~/insultapp]$ rhc app deploy c100ed9
Deployment of git ref 'c100ed9' in progress for application insultapp ...

The reference supplied as an argument to rhc app deploy can be the identifier for a Git commit, tag, or branch. In this case, we have just used the latest Git commit, as shown in the git push command output.

Tip

We have shown deployments with Git, but it is also possible to deploy binaries with RHC. To switch to binary deployment, use the command rhc app-configure --deployment-type binary. You can then save a snapshot of your active deployment, as detailed in Application Snapshots with RHC, and deploy an altered version with rhc deploy /path/to/app.tar.gz -a appname.

Keeping and Utilizing Deployment History

Another feature you may wish to configure in RHC is the number of saved deployments. By default, this is set to one, which means only the current deployment is stored. Setting this to a higher figure will tell OpenShift to keep a copy of the application repository and dependencies used for each recent deployment, up to the given number of deployments. This enables you to use RHC to quickly roll back to a previous deployment if something goes wrong, without having to fiddle with your Git history. The deployment files are stored in the app-deployments directory on the gear. They do contribute to your storage quota, so you probably do not want to keep any more history than you really need.

To configure the number of deployments stored, use the command rhc app-configure --keep-deployments number. You can list the saved deployments with rhc deployment list and show more information on a given one with rhc deployment show id. To activate a particular deployment, use the command rhc deployment activate id.

Here is an example of configuring the Insult App to keep the current and previous two deployments:

[me@localhost ~/insultapp]$ rhc app-configure --keep-deployments 3
Configuring application 'insultapp' ... done

insultapp @ http://insultapp-osbeginnerbook.rhcloud.com/
(uuid: 6e7672676e61676976757570)
---------------------------------------------------------------------------
  Deployment:        auto (on git push)
  Keep Deployments:  3
  Deployment Type:   git
  Deployment Branch: master

Your application 'insultapp' is now configured as listed above.

Use 'rhc show-app insultapp --configuration' to check your configuration values
any time.

When we deploy our next commit, we might find we accidentally included something undesirable in our newly deployed code. We know the previously deployed version was OK, though, so we can fix that in a jiffy:

[me@localhost ~/insultapp]$ rhc deployment list
3:14 PM, deployment 70692b2b
6:28 PM, deployment 7461752d
[me@localhost ~/insultapp]$ rhc deployment activate 70692b2b
Activating deployment '70692b2b' on application insultapp ...

Once this command completes, the application code and dependencies will be as they were before the most recent deployment.

Application Snapshots with RHC

While it is useful to be able to keep a record of your OpenShift deployments with RHC, this mechanism only keeps track of repository code and its dependencies. If you wish to take a snapshot of the entire application and its state, you should use the rhc snapshot command. This command exports the current state of your application, including the repository code, SQL dumps of any database cartridges, $OPENSHIFT_DATA_DIR files, and anything else the cartridges used are configured to export. The gzipped TAR file created can be used to later restore the state of the application.

Warning

Both taking application snapshots and restoring an application to a saved state require the application to be stopped and restarted.

To take a snapshot of your application, use the command rhc snapshot save. You can add the --filepath path option to specify the location and filename of the archive file. Add the --deployment option if you wish to save the snapshot as a deployable file suitable for use with the rhc deploy command. To restore the application from the archive file, use the command rhc snapshot restore --filepath /path/to/tarball. Note that not everything included in the archive is necessarily re-created; log files are not restored.

Here is an example of saving our Insult App application:

[me@localhost ~/insultapp]$ rhc snapshot save
Pulling down a snapshot to insultapp.tar.gz...
Creating and sending tar.gz

RESULT:
Success
[me@localhost ~/insultapp]$ ls
app.py.disabled  data  import.sql  insultapp.tar.gz  libs  README.md  setup.py
setup.pyc  setup.pyo  wsgi

The command produced a tarball called insultapp.tar.gz. Now we can make some changes, pushing a new commit and connecting to the application via SSH to delete files from the persistent data directory and add content to the database. We then decide we want to restore the previous state, which we accomplish as shown here:

[me@localhost ~/insultapp]$ rhc snapshot restore
Restoring from snapshot insultapp.tar.gz...
Removing old git repo: ~/git/insultapp.git/
Removing old data dir: ~/app-root/data/*
Restoring ~/git/insultapp.git and ~/app-root/data
Activation status: success

RESULT:
Success

The Git repository, database, and data directory are now as they were when we took the snapshot.
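If you take snapshots regularly, the --filepath option described earlier can be used to give each archive a dated name so that older snapshots are not overwritten. The following is a minimal sketch rather than part of RHC itself: the function names snapshot_path and save_dated_snapshot and the target directory are our own assumptions, and the only RHC features used are rhc snapshot save with the -a and --filepath options.

```shell
#!/bin/bash
# Build a dated archive path such as /backups/insultapp-2014-03-14.tar.gz.
# Arguments: app name, target directory, optional date (defaults to today).
# (Hypothetical helper, not an RHC command.)
snapshot_path() {
  local app="$1" dir="$2" day="${3:-$(date +%Y-%m-%d)}"
  # ${dir%/} strips one trailing slash so the path has no double slash
  printf '%s/%s-%s.tar.gz\n' "${dir%/}" "$app" "$day"
}

# Save a snapshot of the app to a dated file using the --filepath option.
# (Hypothetical helper; the target directory is assumed to exist.)
save_dated_snapshot() {
  local app="$1" dir="$2"
  rhc snapshot save -a "$app" --filepath "$(snapshot_path "$app" "$dir")"
}

# Example (hypothetical directory):
# save_dated_snapshot insultapp "$HOME/snapshots"
```

To restore from one of these archives, pass the same path to rhc snapshot restore with the --filepath option.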

You can also use the rhc snapshot command to create a clone of an OpenShift application. To do this, create a new application of the same type (scalable/nonscalable) with the same cartridges and run the restore command, supplying the location of the archive file.

Backing Up Your Database

In addition to keeping deployment history and taking application snapshots with RHC, you may well want to make backups of your database. These do not require application downtime and can be performed regularly with the aid of the Cron utility. We discussed how to connect to your database using SSH in Chapter 5 and showed how to use port forwarding to interact with your database in Chapter 7. We demonstrated how to add the Cron cartridge to your application in Cron. In this section, we will give an example of a Cron script to create regular data dumps on the database gear and then show two approaches to moving those backups off the gear. You could write similar Cron scripts to back up other files, such as anything your application persists in $OPENSHIFT_DATA_DIR.

Writing a Cron Script

Our Insult App demo application uses a PostgreSQL database, so the command we will use to create SQL dumps is pg_dump. We want to create a backup every day, so we create the following file in our Git repository at .openshift/cron/daily/backupdb (if you need a refresher on OpenShift environment variables, see Chapter 5):

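#!/bin/bash
# Daily Cron job: dump the application database to the persistent data directory

DATE=$(date +"%Y-%m-%d")
FILE="$OPENSHIFT_APP_NAME-$DATE.sql.gz"
INIT_PATH="$OPENSHIFT_DATA_DIR/$FILE"
BACKUP_DIR="$OPENSHIFT_DATA_DIR/sqlbackup"

# Create the backup directory on the first run
mkdir -p "$BACKUP_DIR"

# Write the dump to a staging path, then move it into place once it is complete
pg_dump "$OPENSHIFT_APP_NAME" | gzip > "$INIT_PATH"
mv "$INIT_PATH" "$BACKUP_DIR/$FILE"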

This Cron job will create a SQL dump every day in the persistent data directory on our gear. We have chosen to create the file in one directory and then move it to another when the SQL dump is completed to avoid any issues with partially created files when copying the backups elsewhere. To use the SQL dump to re-create the database from scratch, we could issue commands such as the following in an SSH session on the database gear:

[insultapp-osbeginnerbook.rhcloud.com 6e7672676e61676976757570]\> dropdb
$OPENSHIFT_APP_NAME
[insultapp-osbeginnerbook.rhcloud.com 6e7672676e61676976757570]\> createdb
$OPENSHIFT_APP_NAME
[insultapp-osbeginnerbook.rhcloud.com 6e7672676e61676976757570]\> gunzip -c
$OPENSHIFT_DATA_DIR/sqlbackup/insultapp-2014-03-14.sql.gz | psql $OPENSHIFT_APP_NAME

We could also simply use the final command to run the SQL from insultapp-2014-03-14.sql.gz on the existing database.
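The write-then-move approach used in the backupdb script can also be expressed as a small reusable function that moves the file into place only when the dump actually succeeded. This is a sketch under our own assumptions, not part of OpenShift: the function name dump_then_move and its argument order are hypothetical, and it relies on bash's pipefail option so that a failing dump command is not masked by a successful gzip.

```shell
#!/bin/bash
# Run a dump command, compress its output into a staging directory, and move
# the result into the backup directory only if every step succeeded.
# Arguments: staging dir, backup dir, file name, then the dump command.
# (Hypothetical helper for illustration.)
dump_then_move() {
  local staging="${1%/}" backup_dir="${2%/}" name="$3"
  shift 3
  mkdir -p "$staging" "$backup_dir" || return 1
  # pipefail is set inside the subshell only, so the caller's shell options
  # are untouched; it makes the pipeline fail if the dump command fails.
  if ( set -o pipefail; "$@" | gzip > "$staging/$name" ); then
    mv "$staging/$name" "$backup_dir/$name"
  else
    # Discard the partial file so it is never mistaken for a good backup
    rm -f "$staging/$name"
    return 1
  fi
}

# Example, mirroring the Cron script above:
# dump_then_move "$OPENSHIFT_DATA_DIR" "$OPENSHIFT_DATA_DIR/sqlbackup" \
#   "$OPENSHIFT_APP_NAME-$(date +%Y-%m-%d).sql.gz" pg_dump "$OPENSHIFT_APP_NAME"
```

Because a failed dump never reaches the backup directory, anything that later copies files out of sqlbackup only ever sees complete archives.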

Tip

If we were creating a similar Cron job for a scalable application, it would run on every gear in the app, so we would want to add logic to check which gear the commands were being executed on before attempting the SQL dump. We could use environment variables such as OPENSHIFT_GEAR_UUID to determine this, or check for a particular file on the target gear.
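As a rough illustration of such a check, the following sketch combines both ideas from the tip: comparing OPENSHIFT_GEAR_UUID with an expected value and looking for a marker file. The function name is_backup_gear, the BACKUP_GEAR_UUID variable, and the marker file name are all hypothetical; you would need to choose values that identify the right gear in your own application.

```shell
#!/bin/bash
# Return success only when this gear should perform the backup.
# Arguments: expected gear UUID (may be empty), marker file path (may be empty).
# (Hypothetical helper; neither argument is defined by OpenShift.)
is_backup_gear() {
  local expected_uuid="$1" marker_file="$2"
  # First check: does this gear's UUID match the one we expect?
  if [ -n "$expected_uuid" ] && [ "$OPENSHIFT_GEAR_UUID" = "$expected_uuid" ]; then
    return 0
  fi
  # Second check: does the marker file exist on this gear?
  [ -n "$marker_file" ] && [ -e "$marker_file" ]
}

# Example use at the top of a Cron script (names are assumptions):
# is_backup_gear "$BACKUP_GEAR_UUID" "$OPENSHIFT_DATA_DIR/.run_backup_here" || exit 0
```

Exiting with status 0 on the other gears keeps the Cron job quiet there instead of reporting a failure.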

Moving Data off the Gear

In the previous section we created a Cron job to create daily database backups on our OpenShift application gear. You may also want to move or copy these backups to another location, to save storage space or in case something goes wrong with the gear. There are two approaches we can take to shifting the files: push-based or pull-based. The approach you choose will depend on your systems and situation.

If you decide to take a pull-based approach, you could create a Cron job on the system to which you want to copy the backups. This script would connect to the OpenShift gear at regular intervals and copy the backup files; one way to accomplish this would be using the rsync tool. Here is an example of a pull-based daily Cron script for Insult App:

#!/bin/bash
rsync -avz --remove-source-files -e ssh \
  6e7672676e61676976757570@insultapp-osbeginnerbook.rhcloud.com:~/app-root/data/sqlbackup \
  /backup

This job copies the sqlbackup directory and its files from the gear to the local directory at /backup. It also deletes the files from the gear after they have been successfully copied. For this to work, the system running the job must be able to access the OpenShift gear via SSH; its public key should have been added to the related OpenShift account (see Chapter 5 for more on accessing gears via SSH).

The second alternative is a push-based approach. You may wish to push files to an external service such as Amazon S3 or Dropbox, or your own server. In order to use SSH and associated tools such as scp or rsync to send files from your gear to elsewhere, you will need access to a public/private key pair on the gear. The .ssh directory within an OpenShift user’s home directory is not writable, so we will need to create a new key set within the persistent data directory. You can do so by issuing the following commands on your gear. In this example, we do not set a passphrase for the key pair; if you wish to set a passphrase you will need to modify the Cron script shown to deal with this:

[insultapp-osbeginnerbook.rhcloud.com 6e7672676e61676976757570]\> mkdir
$OPENSHIFT_DATA_DIR/.ssh
[insultapp-osbeginnerbook.rhcloud.com 6e7672676e61676976757570]\> ssh-keygen -f
$OPENSHIFT_DATA_DIR/.ssh/id_rsa
Generating public/private rsa key pair.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in ...openshift/var/lib/openshift/
6e7672676e61676976757570/app-root/data//.ssh/id_rsa.
Your public key has been saved in /var/lib/openshift/6e7672676e61676976757570/
app-root/data//.ssh/id_rsa.pub.
The key fingerprint is:
4e:79:61:6e:79:61:6e:79:61:6e:79:61:6e:79:61:21 6e7672676e61676976757570@ex-std-
node710.prod.rhcloud.com
The key's randomart image is:
+--[ RSA 2048]----+
|            . ==@|
|             *oXo|
|            +.+.=|
|           +...  |
|        S = +    |
|         + .     |
|        E        |
|                 |
|                 |
+-----------------+
[insultapp-osbeginnerbook.rhcloud.com 6e7672676e61676976757570]\> ls
$OPENSHIFT_DATA_DIR/.ssh
id_rsa  id_rsa.pub

This command has created two files for us: the private SSH key contained in id_rsa and the public key in id_rsa.pub. We will need to add id_rsa.pub to the SSH configuration for the target user on our backup server, which we are going to call mybackupserver.com because we are breathtakingly original. One way to do this would be to copy the entire contents of the new id_rsa.pub file and append it to ~/.ssh/authorized_keys in the target user's home directory on the backup server.
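If you prefer to script that step on the backup server, something along these lines should work once a copy of id_rsa.pub is available there. It is only a sketch: the function name add_authorized_key is our own, and it assumes a standard OpenSSH setup in which the server reads keys from ~/.ssh/authorized_keys and expects that file and its directory to be private to the user.

```shell
#!/bin/bash
# Append a public key to an authorized_keys file unless it is already there.
# Arguments: public key file, authorized_keys file
# (the second defaults to ~/.ssh/authorized_keys).
# (Hypothetical helper, intended to be run as the target user.)
add_authorized_key() {
  local pub_file="$1" auth_file="${2:-$HOME/.ssh/authorized_keys}"
  [ -r "$pub_file" ] || return 1
  mkdir -p "$(dirname "$auth_file")" || return 1
  touch "$auth_file" || return 1
  # -F treats the key as a fixed string, -x requires a whole-line match,
  # and -f reads the key to look for from the public key file.
  if ! grep -qxFf "$pub_file" "$auth_file"; then
    cat "$pub_file" >> "$auth_file"
  fi
  # The SSH server typically ignores key files that other users can write to
  chmod 700 "$(dirname "$auth_file")"
  chmod 600 "$auth_file"
}

# Example, run on the backup server as the target user:
# add_authorized_key id_rsa.pub
```

Running the function a second time with the same key leaves the file unchanged, so it is safe to repeat.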

Now that we have access to a private SSH key on the gear, here is a revised version of the Insult App .openshift/cron/daily/backupdb script from the previous section that both creates the SQL dump and copies it to another server using the scp (secure copy) command:

#!/bin/bash

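DATE=$(date +"%Y-%m-%d")
FILEPATH="$OPENSHIFT_DATA_DIR/$OPENSHIFT_APP_NAME-$DATE.sql.gz"

# Create the compressed SQL dump in the persistent data directory
pg_dump "$OPENSHIFT_APP_NAME" | gzip > "$FILEPATH"

# Copy the dump to the backup server; delete the local copy only if the copy succeeds
scp -i "$OPENSHIFT_DATA_DIR/.ssh/id_rsa" -o StrictHostKeyChecking=no "$FILEPATH" \
  user@mybackupserver.com:~/backup && rm "$FILEPATH"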

This script creates the SQL dump in the persistent data directory on the OpenShift gear, copies the archive to backup within user’s home directory on mybackupserver.com (the backup directory should already exist), and, if this action is successful, deletes the file on the gear. The secure copy command references the private SSH key we created on the OpenShift gear with the -i (identity file) option. We have disabled StrictHostKeyChecking as we do not want the Cron job to wait for someone to type “yes” to approve the connection; we want to be off doing something way more entertaining while our backup script works its magic each day.

In this chapter, we have shown three different ways to keep backup copies of aspects of your OpenShift application: deployment history, snapshots, and database dumps. We have also discussed how to do manual and binary deployments on the platform and how to move files from OpenShift to elsewhere. The way that you utilize these techniques to form your application backup strategy is dependent on your individual needs, so we cannot prescribe a one-size-fits-all approach. However, at a minimum we would recommend that you back up your database periodically, as well as any files persisted elsewhere. Carefully consider the impact of any data loss or downtime when deciding how frequently your backups should be performed and how much deployment history to store.