Appendix D. Advanced Python Setup

Early in the book, we set up system Python. Why? Because it is quick and easy to use. When you start using more complex libraries and tools, you will likely need a more advanced setup. An advanced setup helps you keep your projects organized, and it lets you run both Python 2.7 and Python 3+ on the same machine.

Warning

In this appendix, we walk you through setting up your Python environment in Expert mode. Because there are a lot of dependencies involved, some parts of these instructions may not match what you see on your machine. If you get stuck, search the Web for a solution or ask for help online.

We’ll start by installing a couple of core tools, then install Python (2.7, but you could install 3+ at this point). Lastly, we’ll install and set up some virtual environments, which isolate projects so you can have different versions of a Python library for each project.

These instructions cover Mac, Windows, and Linux setups. As you read through each step, carefully follow the instructions for your particular operating system.

Step 1: Install GCC

GCC (the GNU Compiler Collection) compiles the C extensions that some Python libraries include, turning them into machine code your computer can execute.

On a Mac, GCC is included in both Xcode and Command Line Tools, so you will need to download one of them. In both cases, you will need an Apple ID for the download. Xcode can take a while to download depending on your Internet connection (for me it took 20 minutes), so plan to take a break. If you are concerned about time or disk space, opt for Command Line Tools instead; it is a much smaller download and will not take as long to install. Make sure Xcode or Command Line Tools is installed before moving on to the installation of Homebrew.

If you are using Windows, Jeff Preshing has this helpful tutorial for installing GCC. If you are using Linux, GCC is installed on most Debian-based systems, or you can install it by simply running sudo apt-get install build-essential.

Step 2: (Mac Only) Install Homebrew

Homebrew is a package manager for your Mac: you type a single command, and Homebrew downloads and installs the software for you.

Caution

Make sure either Xcode or Command Line Tools is done installing before you install Homebrew. Otherwise, you will have errors in your Homebrew installation.

To install Homebrew, open Terminal, and enter this line (follow any prompts that come up, including the one asking your permission to install Homebrew):

$ /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

Pay attention to the output. Homebrew recommends running brew doctor, which tests your installation and warns you about any issues. Depending on the state of your system, you might have various items to address. If no warnings are returned, continue to the next step.

Step 3: (Mac Only) Tell Your System Where to Find Homebrew

To use Homebrew, you need to tell your system where it’s located. To do this, you add Homebrew’s location to your .bashrc file (or, if you use a different shell, to that shell’s configuration file). The .bashrc file may not exist yet on your system; if it does exist, it will be hidden in your home directory.

All files that have a . at the beginning of their names do not appear when you type ls unless you explicitly request to see all of them. The purpose of this is twofold. First, if the files are not visible you are less likely to delete or edit them inappropriately. Second, these file types are not used regularly, so hiding them gives the system a cleaner appearance.
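To see the same rule in action from Python, here is a minimal sketch (it runs under Python 2.7 or 3) that lists only the hidden entries in a directory, that is, the names starting with a dot that ls leaves out unless you ask for them. The function name hidden_entries is our own, chosen for illustration.

```python
import os


def hidden_entries(directory):
    """Return the sorted names in `directory` that start with a dot.

    os.listdir() never includes the special "." and ".." entries,
    so only real hidden files and folders are returned.
    """
    return sorted(name for name in os.listdir(directory) if name.startswith("."))


if __name__ == "__main__":
    # Print the hidden files and folders in your home directory.
    for name in hidden_entries(os.path.expanduser("~")):
        print(name)
```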

Let’s see what our directory might look like if we show all the files by adding some extra flags to ls. Make sure you’re in your home directory, and then enter the following command:

$ ls -ag

Your output will look something like this:

total 56
drwxr-xr-x+ 17 staff    578 Jun 22 00:08 .
drwxr-xr-x   5 admin    170 May 29 09:49 ..
-rw-------   1 staff      3 May 29 09:49 .CFUserTextEncoding
-rw-r--r--@  1 staff  12292 May 29 09:44 .DS_Store
drwx------   8 staff    272 Jun 10 00:45 .Trash
-rw-------   1 staff    389 Jun 22 00:07 .bash_history
drwx------   4 staff    136 Jun 10 00:35 Applications
drwx------+  5 staff    170 Jun 22 00:08 Desktop
drwx------+  3 staff    102 May 29 09:49 Documents
drwx------+ 10 staff    340 Jun 11 23:47 Downloads
drwx------@ 43 staff   1462 Jun 10 00:29 Library
drwx------+  3 staff    102 May 29 09:49 Movies
drwx------+  3 staff    102 May 29 09:49 Music
drwx------+  3 staff    102 May 29 09:49 Pictures
drwxr-xr-x+  5 staff    170 May 29 09:49 Public

We do not have a .bashrc file, so we will have to create one.

Tip

If you do have a .bashrc file, you should back it up in case you have any issues. Making a copy of your .bashrc is easiest on your command line. Simply run the following command to copy .bashrc to a new file called .bashrc_bkup:

$ cp .bashrc .bashrc_bkup

To create a .bashrc, we first need to make sure we have a .bash_profile file, which is the file that will load the .bashrc file. On a Mac, Terminal starts each new window as a login shell, which reads .bash_profile but not .bashrc, so a .bashrc without a .bash_profile that loads it would simply be ignored.

Before starting, check whether you have a .bash_profile file. If you do, it will appear in the directory listing produced by ls -ag. If you don’t, you will need to create it.

Tip

If you have a .bash_profile file, you should back it up so that if you have any issues you can restore your original settings. Run the following command to copy your .bash_profile file to a new file called .bash_profile_bkup:

$ cp ~/.bash_profile ~/.bash_profile_bkup

Then run this command to copy it to your desktop and rename it at the same time:

$ cp ~/.bash_profile ~/Desktop/bash_profile

If you are working with an existing .bash_profile, launch your editor and open the copy you placed on your desktop. Add the following code to the bottom of the file. The code just says, “if there is a .bashrc file, then use it”:

# Get the aliases and functions
if [ -f ~/.bashrc ]; then
    . ~/.bashrc
fi

If you don’t already have a .bash_profile file, you’ll need to create a new file with these contents in your editor. Save the file to your desktop as bash_profile, without the dot in front.

Warning

Make sure you checked that .bash_profile and .bashrc didn’t already exist in your home directory. If they did, make sure you followed the instructions to create backups of the original files before continuing. If you don’t do this, when you execute the following code you could end up overwriting your original files, which could cause problems.

Now go back to Terminal and run the following command to rename the file and move it from the desktop to your home directory:

$ mv ~/Desktop/bash_profile ~/.bash_profile

Now, if you run ls -al ~/, you will see that you have a .bash_profile file in your home directory. If you run more ~/.bash_profile, you will see the code you added that loads the .bashrc.
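If you would rather script this step than edit a copy on your desktop, the sketch below appends the same snippet to a profile file in place, making a _bkup copy first as the tips above suggest. It is a minimal illustration under our own assumptions, not part of the procedure above: the helper name ensure_bashrc_loaded is hypothetical, it only recognizes the snippet if it appears exactly as written, and it never overwrites a backup that already exists.

```python
import os
import shutil

# The same snippet shown above: load ~/.bashrc if it exists.
SNIPPET = (
    "# Get the aliases and functions\n"
    "if [ -f ~/.bashrc ]; then\n"
    "    . ~/.bashrc\n"
    "fi\n"
)


def ensure_bashrc_loaded(profile_path):
    """Append SNIPPET to profile_path unless it is already there.

    Returns True if the file was changed. An existing file is first copied
    to <profile_path>_bkup, unless that backup already exists.
    """
    existing = ""
    if os.path.exists(profile_path):
        with open(profile_path) as f:
            existing = f.read()
        if SNIPPET in existing:
            return False  # already set up; nothing to do
        backup_path = profile_path + "_bkup"
        if not os.path.exists(backup_path):
            shutil.copy2(profile_path, backup_path)
    with open(profile_path, "a") as f:
        # Start the snippet on its own line if the file lacks a final newline.
        if existing and not existing.endswith("\n"):
            f.write("\n")
        f.write(SNIPPET)
    return True
```

On a Mac you would call it as ensure_bashrc_loaded(os.path.expanduser("~/.bash_profile")).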

Now that we have a .bash_profile file referring to the .bashrc, let’s edit the .bashrc file. Start by opening your current .bashrc or a new file in your text editor. Add the following line to the bottom of your .bashrc file. This adds the location of Homebrew to your $PATH variable. Because the new directories come before the old $PATH, programs installed by Homebrew will be found before any system programs with the same name:

export PATH=/usr/local/bin:/usr/local/sbin:$PATH

Now, save that file to your desktop with the name bashrc, without the dot.

Use a Command-Line Shortcut for Your Code Editor

While we are updating settings in our .bashrc, let’s also create a shortcut to launch our code editor from the command line. This is not required, but it will make your life easier when you are navigating file directories and want to open a file in your code editor. Using your GUI to navigate the file structure will not be as efficient.

If you are using Atom, installing Atom’s shell commands gives you the atom command, so you already have a shortcut available. Sublime Text also provides a command-line tool for OS X, called subl.

If you are using another code editor, you can try typing the program name to see if it launches, or the program name followed by --help to see if it has any command-line help. We also recommend searching for “<your_program_name> command-line tools” to see if there are any helpful results.

Back in Terminal, run the following command to rename the file and move it from the desktop to your home directory:

$ mv ~/Desktop/bashrc ~/.bashrc

At this point, if you run ls -al ~/, you will see that you have a .bashrc file and a .bash_profile in your home directory. Let’s confirm it worked by opening a new window in Terminal and checking out our $PATH variable. To check the variable, run the following command:

$ echo $PATH

You should get an output something like this:

/usr/local/bin:/usr/local/sbin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin

Whatever your output is, you should see that the directories you added to your .bashrc (/usr/local/bin:/usr/local/sbin) now appear at the beginning of the returned value.

If you do not see the new value in the variable, make sure you opened a new window. Settings changes do not load in your current Terminal window unless you explicitly source the file into it (for example, source ~/.bashrc; see the bash source command for more information).
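The effect of that export line can be sketched in Python 3.3+ as follows. The first function builds the same kind of PATH string, and the second uses shutil.which, which searches PATH entries from left to right just as the shell does, to show that the earliest directory containing a command wins. Both helper names are ours, for illustration.

```python
import os
import shutil


def prepend_to_path(current_path, new_dirs):
    """Mimic `export PATH=/usr/local/bin:/usr/local/sbin:$PATH`.

    Returns a PATH string with new_dirs placed in front of current_path,
    joined with the platform's separator (":" on Mac and Linux).
    """
    return os.pathsep.join(list(new_dirs) + [current_path])


def first_match(command, path):
    """Return the first executable named `command` found on `path`.

    Entries are searched left to right, so the earliest directory that
    contains the command wins.
    """
    return shutil.which(command, path=path)


if __name__ == "__main__":
    # Which python does your current PATH resolve to?
    print(first_match("python", os.environ.get("PATH", "")))
```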

Step 4: Install Python 2.7

To install Python 2.7 on a Mac, run the following command:

$ brew install python

If you would like to push forward with Python 3+, you can install that instead. To install Python 3+ on a Mac, run:

$ brew install python3

For Windows, you will need to follow the instructions in Chapter 1 to properly install from the Windows installer package. For Linux, you likely already have Python installed. It’s a good idea to also install your distribution’s Python developer packages (for example, python-dev on Debian-based systems), which provide the header files needed to build libraries with C extensions.

After the process is complete, you will want to test that it worked properly.

Launch your Python interpreter in Terminal:

$ python

Then, run the following:

import sys
import pprint
pprint.pprint(sys.path)

Mac output looks similar to this:

>>> pprint.pprint(sys.path)
['',
 '/usr/local/lib/python2.7/site-packages/setuptools-4.0.1-py2.7.egg',
 '/usr/local/lib/python2.7/site-packages/pip-1.5.6-py2.7.egg',
 '/usr/local/Cellar/python/2.7.7_1/Frameworks/Python.framework/Versions/2.7/lib/python27.zip',
 '/usr/local/Cellar/python/2.7.7_1/Frameworks/Python.framework/Versions/2.7/lib/python2.7',
 '/Library/Python/2.7/site-packages',
 '/usr/local/lib/python2.7/site-packages']

If you are using a Mac, the output you received should include several file paths that start with /usr/local/Cellar/. If you do not see these, you may not have reloaded your settings in your Terminal window. Close your window, and then open a new one and try again. If that doesn’t solve the problem, return to the beginning of the setup and retrace your steps.
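If you’d like to automate that check, this short sketch (it runs under Python 2.7 or 3) filters sys.path for entries under /usr/local/Cellar/. It assumes Homebrew’s /usr/local prefix, as used throughout this appendix; Homebrew on Apple Silicon Macs installs under /opt/homebrew instead, so adjust the marker argument if that applies to you. The helper name homebrew_paths is hypothetical.

```python
import sys


def homebrew_paths(paths, marker="/usr/local/Cellar/"):
    """Return the entries of `paths` that live under Homebrew's Cellar."""
    return [p for p in paths if p.startswith(marker)]


if __name__ == "__main__":
    found = homebrew_paths(sys.path)
    if found:
        print("Homebrew Python is in use:")
        for p in found:
            print("  " + p)
    else:
        print("No Cellar paths found; open a new Terminal window and retry.")
    print("Interpreter: " + sys.executable)
```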

Debugging installation errors is a learning experience. If you have errors not documented in this section, open up your favorite search engine in your browser and search for the error. You are probably not the first one to experience the issue.

If you successfully completed this section, you can move on to the next step.

Step 5: Install virtualenv (Windows, Mac, Linux)

We’ve set up a second instance of Python, but now we want to set up a way of creating individual Python environments. This is where virtualenv helps, by isolating projects and dependencies from one another. If we have multiple projects, we can make sure individual requirements do not conflict.
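To make “isolated environment” concrete: an environment is just a directory with its own Python interpreter and its own site-packages folder. The sketch below demonstrates this with venv, a module in the Python 3.4+ standard library that does much the same job as virtualenv; note that it uses venv, not the virtualenv tool we install in this step. The names make_env and demo_env are arbitrary.

```python
import os
import tempfile
import venv


def make_env(parent_dir, name):
    """Create a bare virtual environment at parent_dir/name and return its path.

    with_pip=False skips installing pip, which keeps the demo quick;
    symlinks matches what `python -m venv` does by default on each platform.
    """
    env_dir = os.path.join(parent_dir, name)
    venv.create(env_dir, with_pip=False, symlinks=(os.name != "nt"))
    return env_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as parent:
        env = make_env(parent, "demo_env")
        # Expect a pyvenv.cfg file plus folders for scripts and libraries.
        print(sorted(os.listdir(env)))
```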

To get started, we need Setuptools and pip. When we installed Python with Homebrew, both came with it (you can see them in the sys.path output above). pip is the command-line tool we are going to use to install Python packages.

To install virtualenv, you will want to run the following command on your command line:

$ pip install virtualenv

After you run that command, part of the output should be the following: Successfully installed virtualenv. If you got that, then everything went well. If not, something else went wrong; search online for the error message you received.
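You can also confirm from Python that a package is importable. This minimal sketch (Python 3.4+) uses importlib.util.find_spec; run it with the same interpreter pip installed into (running pip as python -m pip is one way to guarantee that). The helper name is_installed is our own.

```python
import importlib.util


def is_installed(module_name):
    """Return True if the top-level module `module_name` can be imported."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    print("virtualenv installed: " + str(is_installed("virtualenv")))
```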

Step 6: Set Up a New Directory

Before we continue, let’s create a directory in which to keep our project-related content. The exact location is a personal preference. Most people create a folder in their user home directory, for easy access and backups. You can put the directory anywhere you like that is both useful and memorable. On a Mac, to make a Projects folder in your home directory, run the following command in Terminal:

$ mkdir ~/Projects/

or for Windows:

> mkdir C:\Users\_your_name_\Projects

Then we are going to create folders inside that folder to store the data-wrangling-specific code we will write. On a Mac, you can do that by running these commands:

$ mkdir ~/Projects/data_wrangling
$ mkdir ~/Projects/data_wrangling/code

or for Windows:

> mkdir C:\Users\_your_name_\Projects\data_wrangling
> mkdir C:\Users\_your_name_\Projects\data_wrangling\code

Lastly, add a hidden folder in your home directory to use for virtualenv environments. Use this command on a Mac:

$ mkdir ~/.envs

or for Windows:

> mkdir C:\Users\_your_name_\Envs

If you’d like to hide your folder on Windows, you can do so by editing the attributes via the command line:

> attrib +s +h C:\Users\_your_name_\Envs

To unhide it, simply remove the attributes:

> attrib -s -h C:\Users\_your_name_\Envs

At this point, we have our code folder set up inside our Projects folder and our virtual environment folder set up in our home directory.
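The same folders can be created from Python in a way that works on Mac, Windows, and Linux. This sketch (Python 3.5+, for exist_ok) mirrors the commands above, using a hidden .envs folder on Mac and Linux and a visible Envs folder on Windows, as this step does. The function name create_layout is hypothetical.

```python
import os
from pathlib import Path


def create_layout(home):
    """Create the project and environment folders under `home`.

    Returns (code_dir, envs_dir). Mirrors the mkdir commands above:
    Projects/data_wrangling/code for code, plus .envs (Mac/Linux) or
    Envs (Windows) for virtual environments.
    """
    home = Path(home)
    code_dir = home / "Projects" / "data_wrangling" / "code"
    envs_dir = home / ("Envs" if os.name == "nt" else ".envs")
    for folder in (code_dir, envs_dir):
        # parents=True also creates Projects and data_wrangling if needed;
        # exist_ok=True makes it safe to run more than once.
        folder.mkdir(parents=True, exist_ok=True)
    return code_dir, envs_dir
```

To create the folders in your real home directory, call create_layout(Path.home()).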

Step 7: Install virtualenvwrapper

virtualenv is a great tool, but virtualenvwrapper makes virtualenv easier to access and use. While it has many features not mentioned in this appendix, the most powerful feature is one of the simplest.

It takes a command like this:

source $PATH_TO_ENVS/example/bin/activate

And turns it into this:

workon example
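Part of what workon saves you is building that activation path by hand. The sketch below shows how such a path is put together from WORKON_HOME and an environment name. It is an illustration only: the helper activate_script is hypothetical, the real workon command also runs virtualenvwrapper’s hook scripts, and the ~/.envs fallback reflects the folder this appendix uses rather than a virtualenvwrapper default.

```python
import os


def activate_script(env_name, workon_home=None):
    """Return the activation script path for a named environment.

    workon_home defaults to the WORKON_HOME environment variable, falling
    back to ~/.envs (this appendix's folder, assumed for illustration).
    """
    if workon_home is None:
        workon_home = os.environ.get("WORKON_HOME", os.path.expanduser("~/.envs"))
    if os.name == "nt":
        # Windows environments keep their scripts in Scripts\activate.bat.
        return os.path.join(workon_home, env_name, "Scripts", "activate.bat")
    return os.path.join(workon_home, env_name, "bin", "activate")
```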

Installing virtualenvwrapper (Mac and Linux)

To install virtualenvwrapper on Mac and Linux, run the following:

$ pip install virtualenvwrapper

Check the second-to-last line of the output to make sure everything installed correctly. For me that line says, Successfully installed virtualenvwrapper virtualenv-clone stevedore.

Updating your .bashrc

You also need to add some settings to your .bashrc. We are going to back up the file first, then edit it.

First, make a backup of your .bashrc. If you already have an up-to-date backup, you can skip this step. If you started with a new file, you will be creating your first backup of your .bashrc. To do so, type this command:

$ cp ~/.bashrc ~/.bashrc_bkup

Tip

I store my settings files on GitHub, so I always have a backup available. That way, if I make a mistake or my computer dies, I can always recover them. Make sure your home folder doesn’t get cluttered with 20 backups as you adjust this file over time. You will rarely edit the .bashrc file, but when you do, it is the kind of file you want to back up before editing.

Open your .bashrc file using your code editor and add these three lines to the end of the file. If you did not use the same location for your Projects folder, then you will want to adjust the file paths accordingly:

export WORKON_HOME=$HOME/.envs
export PROJECT_HOME=$HOME/Projects/
source /usr/local/bin/virtualenvwrapper.sh

The first line defines the WORKON_HOME variable. This is where your Python environments are stored, and it should match the environment folder you created earlier.
The second line defines the PROJECT_HOME variable. This is where you store your code, and it should match the Projects folder (or, on Linux, projects) you created earlier.
The third line initializes virtualenvwrapper, which makes virtualenv easier to use. If pip installed virtualenvwrapper.sh somewhere other than /usr/local/bin, run which virtualenvwrapper.sh and use that path instead.

When you’re done, save the file and open a new Terminal window where you will load the new settings. Now you will have an easy-to-use set of commands to work with virtual environments.
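Once the new settings are loaded, virtualenvwrapper keeps each environment as a subfolder of WORKON_HOME (its own lsvirtualenv command lists them). As a rough sketch of that idea, the Python function below treats any subfolder containing an activation script as an environment; the name list_environments is ours, and the detection rule is a simplification.

```python
import os


def list_environments(workon_home):
    """Return the names of virtual environments inside workon_home.

    A folder counts as an environment if it contains an activation
    script: bin/activate (Mac/Linux) or Scripts/activate.bat (Windows).
    """
    names = []
    for name in sorted(os.listdir(workon_home)):
        env_dir = os.path.join(workon_home, name)
        if (os.path.isfile(os.path.join(env_dir, "bin", "activate"))
                or os.path.isfile(os.path.join(env_dir, "Scripts", "activate.bat"))):
            names.append(name)
    return names


if __name__ == "__main__":
    home = os.environ.get("WORKON_HOME")
    if home:
        print(list_environments(home))
    else:
        print("WORKON_HOME is not set; open a new terminal window.")
```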

Installing virtualenvwrapper-win (Windows)

For Windows, there are some extra optional steps to make your life easier. First, you should install the Windows version of virtualenvwrapper. You can do so by running:

> pip install virtualenvwrapper-win

You should also add a WORKON_HOME environment variable. By default, virtualenvwrapper will expect you to have a folder named Envs in your User folder. If you’d rather set up your own folder for your virtual environments, do that and then add the WORKON_HOME environment variable set to the proper file path. If you haven’t set up environment variables before and want a quick how-to, there’s a nice walkthrough on Stack Overflow.

Tip

In order to work with more than one version of Python in Windows, it’s also a good idea to install pywin; this allows you to easily switch between Python versions.

Testing Your Virtual Environment (Windows, Mac, Linux)

Before we wrap up this section, let’s run a few tests to make sure everything is working. In a new terminal window, create a new virtual environment called test:

mkvirtualenv test

Your output should look something like this:

New python executable in test/bin/python2.7
Not overwriting existing python script test/bin/python (you must use
  test/bin/python2.7)
Installing setuptools, pip...done.

Note

If you want to create an environment with Python 3+ instead of Python 2.7, use the --python option to point the new environment at your Python 3 interpreter. First, identify where your instance of Python 3 is located:

which python3

Your output should look something like this:

/usr/local/bin/python3

Now, use that in your mkvirtualenv command to define a Python 3+ environment. Options such as --python go before the environment name, which must come last:

mkvirtualenv --python=/usr/local/bin/python3 test

You should see “(test)” prepended to the beginning of your terminal prompt. That means the environment is currently activated.
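You can also check from inside Python whether an environment is active. This sketch relies on two conventions: older releases of virtualenv set sys.real_prefix, while venv and newer virtualenv releases make sys.prefix differ from sys.base_prefix. The helper name in_virtualenv is ours.

```python
import sys


def in_virtualenv():
    """Return True when this interpreter runs inside a virtual environment."""
    if hasattr(sys, "real_prefix"):
        # Set by older virtualenv releases.
        return True
    # venv and newer virtualenv: prefix points at the environment,
    # base_prefix at the Python it was created from (Python 3.3+).
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    print("inside a virtual environment: " + str(in_virtualenv()))
    print("environment prefix: " + sys.prefix)
```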

Caution

If you got -bash: mkvirtualenv: command not found as your output instead, then your terminal is not recognizing virtualenvwrapper. First, check to make sure you opened a new Terminal or cmd window before running this code, which ensures the new settings are applied. If that’s not the issue, then go through the setup and confirm you followed all the steps.

If you were able to successfully create a virtual environment, then you are done with your setup!

Let’s deactivate our virtual environment and destroy it, as it was only a test. Run the following commands to remove the test environment:

deactivate
rmvirtualenv test

By this point, you’ve set up a second Python instance on your machine. You also have an environment where you can create isolated Python environments to protect one project from another. Now we are going to run through some exercises to make you familiar with your shiny new Python environment.

Learning About Our New Environment (Windows, Mac, Linux)

The examples shown here are for a Mac, but the process is the same on Windows and Linux. In this section, we are going to learn a little about how to use our setup and make sure all the components work together.

Let’s begin by creating a new environment called testprojects. We will activate and use this any time we need a quick environment to run a test or try something out. To create it, run this command:

$ mkvirtualenv testprojects

After you create the environment, you should see that your Terminal prompt is now prepended with the name of the environment. For me, that looks like this:

(testprojects)Jacquelines-MacBook-Pro:~ jacquelinekazil$

Let’s install a Python library into our environment. The first library we will install is called ipython. In your active environment, run the following command:

(testprojects) $ pip install ipython

If this command is successful, then the last couple of lines of your output should look like this:

Installing collected packages: ipython, gnureadline
Successfully installed ipython gnureadline
Cleaning up...

Now, if you type pip freeze into your Terminal, you will see the libraries in your current environment along with the version number of each installation. The output should look like this:

gnureadline==6.3.3
ipython==2.1.0
wsgiref==0.1.2

This output tells us that, in the testprojects environment, we have three libraries installed: gnureadline, ipython, and wsgiref. ipython is what we just installed. gnureadline was installed when we installed ipython, because it is a dependency library. (This saves you from having to install dependent packages directly. Nice, right?) The third library is wsgiref. It was there by default (it ships with Python 2.7), but it isn’t required by IPython.
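Because pip freeze prints one name==version pair per line, its output is easy to work with in Python. This sketch (Python 2.7 or 3) turns it into a dictionary; parse_freeze is a hypothetical helper, and it deliberately skips any line that isn’t a simple name==version pin.

```python
def parse_freeze(text):
    """Turn `pip freeze` output into a {package: version} dict.

    Only pinned "name==version" lines are kept; blank lines, comments,
    and other formats (such as editable "-e ..." installs) are skipped.
    """
    packages = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, version = line.split("==", 1)
        packages[name] = version
    return packages


# The testprojects output shown above.
freeze_output = """gnureadline==6.3.3
ipython==2.1.0
wsgiref==0.1.2"""

print(parse_freeze(freeze_output))
```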

So, we’ve installed a library called ipython, but what can we do with it? IPython is an easy-to-use alternative to the default Python interpreter (you can read even more about IPython in Appendix F). To launch IPython, simply type ipython.

You should see a prompt similar to this:

IPython 3.1.0 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.

In [1]:

To test it out, type the following:

In [1]: import sys

In [2]: import pprint

In [3]: pprint.pprint(sys.path)

You should see output similar to what you saw earlier when we confirmed that Python was working, except that some of the paths now point inside your testprojects environment. sys and pprint are what are called standard library modules, which come prepackaged with Python.

Let’s exit out of IPython. There are two ways to do this. You can either press Ctrl+D, then type y for yes when prompted, or just type quit(). This works just like the default Python shell.

Once you have exited, you will be back on the command line. Now we have an environment called testprojects with three libraries installed. But what if we want to have another environment, because we are going to work on another project? First, type the following to deactivate the current environment:

$ deactivate

Then create a new one called sandbox:

$ mkvirtualenv sandbox

After you do this, you’ll be in your new environment. If you type pip freeze, you will see that you do not have IPython installed in this environment. This is because this is a fresh environment, which is completely separate from the testprojects environment. If we install IPython in this environment, it will install a second instance on our computer. This ensures anything we do in one environment doesn’t affect the others.
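One way to see this isolation for yourself is to compare the pip freeze output of two environments. In this sketch, the helpers package_names and compare_environments are hypothetical, and the sandbox output passed to them is made up for illustration (a fresh environment listing only wsgiref).

```python
def package_names(freeze_output):
    """Return the set of package names in `pip freeze` output."""
    return {line.split("==", 1)[0].strip()
            for line in freeze_output.splitlines() if "==" in line}


def compare_environments(freeze_a, freeze_b):
    """Return (only_in_a, only_in_b): sorted names unique to each environment."""
    names_a = package_names(freeze_a)
    names_b = package_names(freeze_b)
    return sorted(names_a - names_b), sorted(names_b - names_a)


# testprojects as shown earlier, plus a hypothetical fresh sandbox.
testprojects = "gnureadline==6.3.3\nipython==2.1.0\nwsgiref==0.1.2"
sandbox = "wsgiref==0.1.2"
print(compare_environments(testprojects, sandbox))  # (['gnureadline', 'ipython'], [])
```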

Why is this important? As you work on new projects, you will likely want different libraries and different versions of libraries installed. We recommend setting up one virtual environment for this book, but if you start on a new project, you’ll want to start a new virtual environment. As you can see, it’s easy to switch between environments as you change projects.

You may sometimes come across a repository with all of its requirements stored in a file called requirements.txt. The project’s authors used pip freeze inside a virtual environment to save that list, so users can install the same libraries and versions. To install from a requirements file, run pip install -r requirements.txt.
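If you want to create or use a requirements file from a Python script, one option is to call pip through the current interpreter with python -m pip, which makes sure you’re using the pip that belongs to the active environment. This is a sketch; the helper names pip_command, save_requirements, and install_requirements are hypothetical.

```python
import subprocess
import sys


def pip_command(*args):
    """Build a pip command that targets the interpreter running this script."""
    return [sys.executable, "-m", "pip"] + list(args)


def save_requirements(path="requirements.txt"):
    """Write this environment's `pip freeze` output to `path`."""
    output = subprocess.check_output(pip_command("freeze"))
    with open(path, "wb") as f:
        f.write(output)


def install_requirements(path="requirements.txt"):
    """Equivalent of `pip install -r requirements.txt` for this interpreter."""
    subprocess.check_call(pip_command("install", "-r", path))
```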

We know how to create an environment and deactivate an environment, but we don’t know how to activate one that already exists. To activate our sample environment called sandbox, type the following command (if you are already in it, you may have to deactivate first to see the difference):

$ workon sandbox

Lastly, how do you destroy an environment? First, make sure you are not in the environment you want to remove. If you just typed in workon sandbox then you should be in the sandbox environment. To destroy it, you will want to first deactivate, then remove it:

$ deactivate
$ rmvirtualenv sandbox

Now, the only environment you should have is testprojects.

Advanced Setup Review

Your computer is now set up with an advanced Python environment. You should feel more comfortable interacting with your command line and working with installing packages. If you haven’t already, we also recommend you take a look at Appendix C to learn more about working with the command line.

Table D-1 lists the commands you will use most often with virtual environments.

Table D-1. Commands to review
Command	Action
`mkvirtualenv`	Creates an environment
`rmvirtualenv`	Destroys an environment
`workon`	Activates an environment
`deactivate`	Deactivates the environment that is currently active
`pip install`	Installs in the active environment^a
`pip uninstall`	Uninstalls in the active environment^b
`pip freeze`	Returns a list of installed libraries in the active environment
^a If no environment is active, the library will be installed on the secondary copy of Python on your system, which was installed using Homebrew. Your system Python should not be affected. ^b See previous footnote.
