Getting Started with OpenShift

Chapter 8. Disk Usage

Almost all applications want to either read or store files directly on the servers they are hosted upon. These files could be images, text files, documents, or the files that the database uses to store the data. Sometimes you (or your application) just need a temporary space to output a file or store a file before processing. When you move to “the cloud,” file storage has different properties than storing files on your laptop or even your own server in a rack. OpenShift has ways of handling all these application needs.

Where You Can Write “to Disk”

As an OpenShift application developer, you are given specific locations on “disk” where you are allowed to create or modify files and directories. We use disk in quotes because, as a developer, you cannot be sure what the space is actually located on: it could be disk drives, solid state drives, a network-attached storage (NAS) device, or any other storage location. There are only two locations where you should write files: /tmp and the gear’s data directory.

As on all Linux systems, you have read, write, and execute permissions for the /tmp directory. However, unlike a typical Linux machine, where everyone on the machine shares those permissions on /tmp, OpenShift uses pluggable authentication module (PAM) namespaces to give you your very own /tmp. This means nobody else on the machine can see or use the /tmp that you use. The problem with putting files here is that the space is ephemeral, meaning there is no guarantee how long a file or directory will remain there. Furthermore, any data in /tmp will not survive an application restart.
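
Because /tmp is ephemeral, it is best treated as scratch space that your code creates and cleans up itself. A minimal sketch of that pattern follows; the myapp prefix and the report.txt filename are made up for illustration.

```shell
#!/bin/sh
# Create a private scratch directory under /tmp (or TMPDIR, if it is set).
scratch_dir=$(mktemp -d "${TMPDIR:-/tmp}/myapp.XXXXXX")

# Remove the scratch directory when the script exits.
trap 'rm -rf "$scratch_dir"' EXIT

# Write intermediate output there and process it.
printf 'line one\nline two\n' > "$scratch_dir/report.txt"
line_count=$(wc -l < "$scratch_dir/report.txt")
echo "processed $line_count lines"
```

Anything you need to keep after the script finishes should be moved to the data directory before the scratch directory is removed.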

The other directory available to you is the OpenShift data directory, which is currently at $OPENSHIFT_HOMEDIR/app-root/data. We use the environment variable OPENSHIFT_DATA_DIR to point to this location. By using the environment variable you increase the portability and maintainability of your application, so we highly recommend using it in your code and configuration (see Chapter 5 for more on environment variables). In this private data directory, you also have full read, write, and execute permissions. However, this directory is persistent, allowing it to survive application stops and starts.
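
As a small illustration of using the environment variable rather than a hardcoded path, a script might resolve the data directory like this. The ./data fallback and the uploads/example.txt names are assumptions made for the example, so that the same script also runs on your local machine.

```shell
#!/bin/sh
# Resolve the persistent data directory from the environment.
# The ./data fallback is an assumption for running off-gear.
data_dir="${OPENSHIFT_DATA_DIR:-./data}"
data_dir="${data_dir%/}"    # tolerate a trailing slash in the variable

# Create a subdirectory for the application's files and write to it.
upload_dir="$data_dir/uploads"
mkdir -p "$upload_dir"
echo "saved at $(date)" > "$upload_dir/example.txt"
```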

The data directory is where your application should store its files and put configuration settings, download themes, or generally anything you want to survive restarts and Git pushes. You can’t store anything in your Git directory unless you use the Git tools. Anything written there outside of the Git lifecycle will be overwritten on the next Git push.
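
One common way to reconcile the two directories is to have each deployment link a path inside the Git directory to a directory under the data directory. The sketch below assumes it runs from a deploy action hook (for example .openshift/action_hooks/deploy), that OPENSHIFT_REPO_DIR points at the deployed repository, and that no directory named uploads is committed to your repository. The ./repo and ./data fallbacks are only there so you can try it locally.

```shell
#!/bin/sh
# Sketch of a deploy hook body: keep uploads in the persistent data
# directory and expose them inside the freshly deployed repository.
repo_dir="${OPENSHIFT_REPO_DIR:-./repo}"
data_dir="${OPENSHIFT_DATA_DIR:-./data}"

mkdir -p "$repo_dir" "$data_dir/uploads"

# Use an absolute target so the link resolves from any working directory.
target=$(cd "$data_dir/uploads" && pwd)

# -n replaces an existing link instead of creating a new one inside it.
ln -sfn "$target" "${repo_dir%/}/uploads"
```

Files your application writes to uploads inside the repository then land in the data directory, so they are still there after the next Git push re-creates the link.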

To clarify, the “you” we have been talking about here is actually the user ID of your application, which is a 24-character hexadecimal number. The permissions are granted to this user ID, which is the same user you identify yourself as when you SSH into the gear or use secure FTP (SFTP) with it. It is also the user who owns the processes for your application servers, databases, or any other binaries you execute on the gear.

Warning

A current limitation in OpenShift is that the data directory is not on a shared disk space for all the gears in a scalable application. This means that when a new gear spins up in a scalable application, its data directory will be empty. There is also no default method to automatically synchronize the contents of the data directories. As this book goes to print, the preferred solution for shared storage is to either use a database to store the shared entities or place them in external storage, such as Amazon’s S3. We discuss the use of an external service at the end of the chapter.

Determining How Much Disk Space Is Used

At the time of writing, each gear in the OpenShift free plan was given 1 GB of disk space. If you moved into the paid tier, your gear could be up to 6 GB. The locations that count against that quota are:

Your gear’s data directory ($OPENSHIFT_DATA_DIR)
/tmp
Your Git repository on the gear
The log files for your application and database servers
The data files for your database server
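
If you are logged in to a gear and want a rough breakdown by location, a small helper around du can report each directory in turn. This is only a sketch: the function name is made up, and the paths in the usage comment (~/git for the repository, OPENSHIFT_LOG_DIR for logs) are assumptions about the gear layout that you should verify on your own gear.

```shell
#!/bin/sh
# Print a human-readable usage total for each directory that exists.
report_usage() {
    for dir in "$@"; do
        if [ -d "$dir" ]; then
            du -sh "$dir" 2>/dev/null
        fi
    done
}

# Possible call on a gear (paths are assumptions, check them first):
# report_usage "$OPENSHIFT_DATA_DIR" /tmp "$HOME/git" "$OPENSHIFT_LOG_DIR"
```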

The easiest way to check your disk usage is by using the RHC command-line tools:

rhc app show appname --gears quota

If you are executing the command from within the Git repository for your application, then you can omit the appname from the command. This will give you output that shows one line per gear in your application.

Here is an example:

Gear                     Cartridges                 Used Limit
------------------------ ------------------------ ------ -----
6861736b656c6c72756c6573 postgresql-9.2           75 MB  1 GB
6c616d626461733465766572 jbossews-2.0 haproxy-1.4 363 MB 1 GB

Here you can see we have two gears in this application. The gear with PostgreSQL on it is using 75 MB and the gear with JBoss is using 363 MB.

If you want to see how much disk space is used and you are comfortable with the Linux quota command, you can always SSH into a gear and use it to check your space.

To see all your gears and their SSH URLs, you can execute the command rhc app show appname --gears and then SSH into each gear to run quota.
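
Rather than logging in to each gear by hand, you could loop over the SSH URLs. The helper below is a sketch: it reads one SSH URL per line from standard input and runs the given command on each gear. The function name and the SSH_CMD override are made up for this example, and the usage line assumes your version of rhc accepts --gears ssh to print only the SSH URLs; if it does not, paste the URLs from the normal --gears output instead.

```shell
#!/bin/sh
# Run a command on every gear whose SSH URL arrives on stdin.
run_on_gears() {
    while IFS= read -r gear; do
        [ -n "$gear" ] || continue
        echo "== $gear =="
        # -n stops ssh from consuming the rest of the gear list on stdin.
        ${SSH_CMD:-ssh} -n "$gear" "$@"
    done
}

# Possible usage (assumes rhc supports "--gears ssh"):
# rhc app show appname --gears ssh | run_on_gears quota -s
```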

OpenShift will also start to warn you both on git push and when you SSH into your gears if you exceed 90% of your quota.
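
If you would rather check the threshold yourself, for instance from a monitoring script, the arithmetic is simple. The function below is a sketch with a made-up name; it takes the used and limit values in the same unit and uses 90 as the default threshold to match the warning described above.

```shell
#!/bin/sh
# Print the percentage of quota in use (rounded down).
# Returns nonzero when usage exceeds the threshold (default 90).
check_quota() {
    used=$1
    limit=$2
    threshold=${3:-90}
    echo "$(( used * 100 / limit ))%"
    # Compare without rounding: used/limit > threshold/100
    [ $(( used * 100 )) -le $(( limit * threshold )) ]
}

check_quota 363 1024    # the JBoss gear from the example: prints 35%
check_quota 950 1024 || echo "over 90% of quota"    # prints 92%, then the warning
```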

Tip

If you are not familiar with *nix-style terminal commands, especially if you are a Microsoft Windows user, please see Appendix A.

Copying Files to or from Your Local Machine

Since OpenShift uses SSH for all communication with the server, the two main ways to transfer files up to your gears are SFTP (secure FTP) and SCP (secure copy). SCP is only for moving files back and forth, while SFTP lets you do things like listing directories and removing files. You can also use any tool that can use SSH, such as rsync, but we are just going to cover SFTP and SCP here.

For people who prefer using a GUI for their file transfers, we highly recommend FileZilla, a FOSS file transfer tool that can communicate over SFTP. Please be aware, though, that FileZilla uses PuTTY-format SSH keys while OpenShift uses OpenSSH keys. You will need to convert your private SSH key to PuTTY’s key format. There is a blog post on the OpenShift website covering the details on how to convert your key and use FileZilla.

The syntax for using scp to copy to your gear is fairly straightforward:

$ scp localFile 6e7672676e61676976757570@insultapp-osbeginnerbook.rhcloud.com:~/app-root/data/

The file localFile can also be replaced with a directory, and you can use -r to copy directories recursively. Please remember that due to file permissions, you can’t write to your home directory and instead need to write to $OPENSHIFT_DATA_DIR or /tmp.

The syntax to copy a file down to your machine is just as straightforward:

$ scp 6e7672676e61676976757570@insultapp-osbeginnerbook.rhcloud.com:/tmp/dbbackup.tgz /data/databaseBackups/

Finally, you can also move files between two gears in the same application:

# Assumes you are in the /tmp directory on a gear
$ scp dbBackup.tgz 6e7672676e61676976757570@6e7672676e61676976757570-osbeginnerbook.rhcloud.com:/tmp

Other Storage Options

The final way to create storage space for your OpenShift application is to use an external storage service such as S3 or Dropbox. You can use these services the same way you would from your local machine: you can access them programmatically, but you cannot use them directly as a backup service. You could also create a cron job (see “Writing a Cron Script”) to copy contents from your gears to one of these services.
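
As a rough sketch of the cron approach, the script below packs a directory into a dated archive under /tmp and prints the archive path, leaving the upload to whatever client your storage service provides. The function name, the archive naming scheme, and the cron path in the comment are assumptions for illustration, and your-upload-command is a placeholder, not a real tool.

```shell
#!/bin/sh
# Sketch of a backup step for a cron script
# (for example .openshift/cron/daily/backup_data).
make_backup() {
    src=$1
    archive="${TMPDIR:-/tmp}/data-backup-$(date +%Y%m%d).tgz"
    # -C switches into the source directory so the archive holds relative paths.
    tar -czf "$archive" -C "$src" . && echo "$archive"
}

# Possible usage on a gear; remove the archive afterward because
# /tmp counts against your quota:
# archive=$(make_backup "$OPENSHIFT_DATA_DIR") \
#     && your-upload-command "$archive" \
#     && rm -f "$archive"
```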

Tip

If there is a specific application you want to use on OpenShift, such as WordPress, we recommend doing a search for an S3 or Dropbox plug-in, such as wp-tantan-3 or Updraft.

No matter how you look at it, there are a lot of different options for storage on OpenShift, including putting assets in your database.