Chapter 6. Library Dependencies

All but the simplest applications need to use outside libraries, or dependencies. Examples of dependencies include libraries for database connectivity, converting images to text (optical character recognition, or OCR), calculating statistics, or web templating. In this chapter we will show you how to use libraries in your OpenShift applications. We will add the database driver to our Insult App and then use it to access the insults stored in the database.

Where to Declare Dependencies

Most modern programming languages have a standard “build” or dependency management process, and OpenShift takes advantage of this to pull in your application’s dependencies. At the time of this writing, OpenShift uses the mechanisms listed in Table 6-1 to resolve external library dependencies.

Table 6-1. Dependency mechanisms used by OpenShift, by language
Language                 Dependency mechanism
Java                     Maven
Python                   Pip
Ruby                     Gem
Node.js (JavaScript)     NPM
PHP                      Pear
Perl                     CPAN

We have tried to make the process as close to development on your local machine as possible. So, for example, with Python, if you wanted to install the “default” PostgreSQL driver (psycopg2) on your local machine, you would use Pip:

$ pip install psycopg2

This would install the psycopg2 driver to a location where Python can see it on your local machine. The way to reproduce this functionality on OpenShift is to include the dependency in the appropriate “application metadata” file. When you declare your dependencies in this file, OpenShift will notice them during the build process, download the files, and put them where your language runtime can see them. Table 6-2 lists these files for a variety of languages.

Table 6-2. Files used for dependency declaration
Language                 Dependency file
Java                     pom.xml
Python                   setup.py or requirements.txt
Ruby                     Gemfile.lock
Node.js (JavaScript)     package.json
PHP                      deplist.txt
Perl                     deplist.txt

Let’s go ahead and add Psycopg2 to our project so we can use the library to connect to our database of insults. Go into your local Git repository and edit the setup.py file. We already have a dependency declaration for Flask (see Modifying Application Code), and now we are going to add one for Psycopg2. Your install_requires entry in setup.py should now look like this:

install_requires=['Flask==0.10.1', 'psycopg2==2.5.2'],
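
If your project declares its dependencies in requirements.txt rather than (or in addition to) setup.py, the equivalent pinned entries would look something like this:

Flask==0.10.1
psycopg2==2.5.2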

Warning

The best practice on OpenShift is to always specify an exact version number for your dependencies. There are two reasons for this:

  1. If you use >=, the build process always has to check whether there is a newer version of the library available than what is currently installed. This slows down your build.
  2. A newer version of the library may be incompatible with your code. Not pinning a specific version can lead to your application breaking when you don’t expect it (see the example after this list).
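
For instance, a floating declaration like the first install_requires line below (shown only for contrast, not something we recommend) triggers both problems, while the pinned form is what we actually use:

# Discouraged: floating ranges force a check for newer releases on every build
install_requires=['Flask>=0.10', 'psycopg2>=2.5'],

# Preferred: exact pins keep builds fast and predictable
install_requires=['Flask==0.10.1', 'psycopg2==2.5.2'],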

The first time you git push with this new dependency, the build will take longer because the new dependency has to be downloaded and built. After that, OpenShift will use the cached version. This is particularly noticeable for Java developers with Maven builds, since the default pom.xml pulls in the full Java EE dependency set.

When you do your git push, you should see something like the following in the output:

...
remote: Processing dependencies for Insult-App==1.0
remote: Searching for psycopg2==2.5.2
remote: Best match: psycopg2 2.5.2
remote: Processing psycopg2-2.5.2-py2.7-linux-x86_64.egg
...

These are the lines where the OpenShift build process is adding the Psycopg2 library to the virtual environment for your application.

Warning

A common problem we see in the forums goes something like: “The application works fine on my local machine but when I deploy to OpenShift I get an error that LibraryX is not available.” This is usually a sign that you have not declared your dependency in the proper file or with the proper syntax for OpenShift to download it and make it available. Unless it is in your Git repository or declared as a dependency in the proper file, it will not be available to your application code.

Incorporating Your Own Binary Dependencies

For each programming language, there is a designated location in the Git repository where you can place your own binaries for your application and have the build pick them up (Table 6-3). For example, you would do this if you have a binary library that you use within your company that you do not want to put in a public repository or in the code base. This way you can reuse the library without exposing the code.

Table 6-3. Location to place your own libraries
Language                 Location in repository for binaries
Java                     More complicated, as the binaries have to be part of Maven; please see OpenShift knowledge base article E1040. The other option is to bundle all the libs in your WAR file and just deploy the WAR.
Python                   libs
Ruby                     vendor/cache/{myfile}.gem
Node.js (JavaScript)     node_modules
PHP                      libs
Perl                     libs

Placing your libraries in these locations means you can use your own libraries, ensure a certain version of a library is used, or include nonpublic libraries.

Some of these languages also let you point to a library in a different Git repository or in other locations “on disk.” For example, in your Ruby application you can specify the location of a gem in your Gemfile. This is a much more flexible method than placing the binaries in the locations listed earlier. The same holds for setup.py or requirements.txt in Python; your metadata file can point to a GitHub repository or other publicly accessible location.
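
For example, pip can install a dependency straight from a Git repository with a requirements.txt entry like the one below; the repository URL and package name are placeholders, not a real project:

git+https://github.com/example/somelibrary.git@v1.0.0#egg=somelibrary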

Modifying Your Application to Use the Database

Now that we know how to pull in dependencies, let’s go ahead and modify our code to take advantage of the database. We are going to design the application so that our insult propagation crew can search out new insults, add them to the database, and have them appear without any code changes. We designed the database tables so we can pick a word from each adjective table and the noun table separately, and add new words to each group independently.

We did such a nice job with the separation of concerns between our classes in our original application that we only have to modify insulter.py. We are going to replace the static lists of adjectives and nouns with calls to the database, but nothing else in the application has to change. Even within insulter.py we only have to modify one method. Hooray for clean code!

One quick note before we dig in: as much as we would like to believe Insult App will be hugely successful and allow us to retire early, this app will probably have only one or two users at a time (if we are lucky). Therefore, we are not going to add the overhead of a connection pool for the database connections. Given that database connections take a relatively long time to establish, in any real production application you would want to use a connection pool; a rough sketch of what that could look like follows.
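
For reference only (Insult App does not use it), here is a minimal sketch of pooling with psycopg2’s built-in SimpleConnectionPool; the pool sizes are arbitrary and the with_connection helper is hypothetical, but the environment variables are the same OpenShift ones used later in this chapter:

import os
import psycopg2.pool

# Create the pool once at application startup; min 1, max 5 connections (arbitrary sizes)
pool = psycopg2.pool.SimpleConnectionPool(
    1, 5,
    database=os.environ['OPENSHIFT_APP_NAME'],
    user=os.environ['OPENSHIFT_POSTGRESQL_DB_USERNAME'],
    password=os.environ['OPENSHIFT_POSTGRESQL_DB_PASSWORD'],
    host=os.environ['OPENSHIFT_POSTGRESQL_DB_HOST'],
    port=os.environ['OPENSHIFT_POSTGRESQL_DB_PORT'])

def with_connection(work):
    # Borrow a connection, run the caller's function, and always return the connection to the pool
    conn = pool.getconn()
    try:
        return work(conn)
    finally:
        pool.putconn(conn)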

All right, on to the code!

Code to Connect to the Database

Since WSGI acts like CGI, where each class is spun up and run each time there is a request, we are just going to create a method to open a database connection and then call it in the function where we retrieve the words to be used. Using Psycopg2 is incredibly easy, and the environment variables set by OpenShift allow us to establish the connection in a portable way. First, we define a method to get a cursor (the basic object that does all the database interaction). Here is the excerpt from insulter.py:

import os
import psycopg2
...

def get_cursor():
    # open a connection using the environment variables OpenShift provides
    conn = psycopg2.connect(database=os.environ['OPENSHIFT_APP_NAME'],
                            user=os.environ['OPENSHIFT_POSTGRESQL_DB_USERNAME'],
                            password=os.environ['OPENSHIFT_POSTGRESQL_DB_PASSWORD'],
                            host=os.environ['OPENSHIFT_POSTGRESQL_DB_HOST'],
                            port=os.environ['OPENSHIFT_POSTGRESQL_DB_PORT'])
    # get a cursor from the connection
    cursor = conn.cursor()
    return cursor

While it is bad form even in noncloud applications to hardcode database connection parameters, in cloud applications it also has the potential to break your application. If, for some reason, operations needs to migrate your gear to a different set of servers and the IP addresses change, your application will still work if you use environment variables. The other benefit of using environment variables is that you can give your Git repository to another developer, who can push the code into their own version of the application, and it will just work because the environment variables in their version will point to their own database information.

Code to Close the Database Connection

Whenever you open a database connection you eventually have to close it, or your application will ultimately stop working because you have used up all the available connections. A connection pool can help with connection exhaustion, but as noted earlier, we are not using one. Here is the code from insulter.py to close the cursor:

def close_cursor(cursor):
    # close both the cursor and its underlying connection
    conn = cursor.connection
    cursor.close()
    conn.close()

Code to Query the Terms for the Insult

Now that we have a connection to the database, we need to query it for the words we want. Since we want to pick a word at random from each table, we need to use a little bit of fancy SQL. We found an interesting solution to the problem for PostgreSQL on Stack Overflow. The basic idea is to use the OFFSET modifier in the SQL query. Here is the description of the OFFSET keyword in the PostgreSQL manual:

OFFSET says to skip that many rows before beginning to return rows… If both OFFSET and LIMIT appear, then OFFSET rows are skipped before starting to count the LIMIT rows that are returned.

We are basically just telling PostgreSQL to pick a random number between 0 and the total number of rows, use that as the offset for where to start returning results, and then give us just one result.

In the function, we pass in the cursor and the name of the table we want to execute the query against. Psycopg2 returns a Python tuple, so we just grab the first element in the tuple:

def get_word(cursor, table):
    # table is always one of our own table names, never user input,
    # so building the SQL with string concatenation is safe here
    sql = ("select string from " + table +
           " offset random() * (select count(*) from " + table + ") limit 1;")
    cursor.execute(sql)
    result = cursor.fetchone()
    return result[0]

Now that we have that function in place, we can basically replace all the lists and the random calls with just a simple set of calls to the get_word function. The flow now becomes open a cursor, make the calls, and then finally close the cursor—nice and simple:

def generate_insult():
    local_cursor = get_cursor()
    final_insult = (get_word(local_cursor, "short_adjective") + " " +
                    get_word(local_cursor, "long_adjective") + " " +
                    get_word(local_cursor, "noun"))
    close_cursor(local_cursor)
    return final_insult

What We Have Gained by Adding a Database

Now that we have changed our application over to use a database, we can add new terms without having to touch the code, build, and deploy. As a matter of fact, we could write a separate web page for people to add new terms, and the insult page would pick them up on the fly; a quick sketch of what that might look like appears below. In this chapter, we have also learned how to add library dependencies to our projects on OpenShift and how to access a database from an OpenShift application. At this point our application is finished. From here on, we are going to talk more about how to interact with and monitor the application behind the scenes.
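
To give a taste of what that separate page could look like, here is a minimal, standalone sketch of a Flask route that inserts a new noun; the /add_noun route, the word form field, and the standalone app object are hypothetical and are not part of the finished Insult App:

from flask import Flask, request
from insulter import get_cursor, close_cursor   # the helpers defined earlier

app = Flask(__name__)

@app.route('/add_noun', methods=['POST'])
def add_noun():
    cursor = get_cursor()
    # use a parameterized query so the submitted word is safely escaped
    cursor.execute("insert into noun (string) values (%s);",
                   (request.form['word'],))
    cursor.connection.commit()   # psycopg2 does not autocommit by default
    close_cursor(cursor)
    return "Added!"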