Appendix G. Using Amazon Web Services

If you want to use Amazon Web Services (AWS) for your data wrangling needs, you’ll first need to set up a server. We’ll review how to get your first server up and running here.

We covered some alternatives to AWS in Chapter 10, including DigitalOcean, Heroku, GitHub Pages, and using a hosting provider. Depending on your level of interest in different deployment and server environments, we encourage you to use several and see what works best for you.

AWS is popular as a first cloud platform, but it can also be quite confusing. We wanted to include a walkthrough to help you navigate the process. We can also highly recommend using DigitalOcean as a start into the cloud; their tutorials and walkthroughs are quite helpful.

Spinning Up an AWS Server

To spin up (or “launch”) a server, from the AWS console, select “EC2” under “Compute” (you’ll need to sign in or create an account to access the console). This will take you to the EC2 landing page. There, click the “Launch Instance” button.

At this point, you’ll be taken to a walkthrough to set up your instance. Whatever you select here can be edited, so don’t worry if you don’t know what to choose. This book provides suggestions to get a server up and running cheaply and quickly, but this doesn’t mean it will be the solution you need. If you run into an issue such as running out of disk space, you may need a larger, and therefore more expensive, instance.

That said, in the following sections we’ll walk you through our recommendations for this setup.

AWS Step 1: Choose an Amazon Machine Image (AMI)

A machine image is basically an operating system image (or snapshot). The most common desktop operating systems are Windows and OS X. However, Linux-based systems are usually used for servers. We recommend the latest Ubuntu system, which at the time of writing is “Ubuntu Server 14.04 LTS (HVM), SSD Volume Type - ami-d05e75b8.”

AWS Step 2: Choose an Instance Type

The instance type is the size of the server you spin up. Select “t2.micro (Free tier eligible).” Do not size up until you know you need to, as you will be wasting money. To learn more about instances, check out the AWS articles on instance types and pricing.

Select “Review and Launch,” which takes you to Step 7.

AWS Step 7: Review Instance Launch

At the top of the page that appears, you will notice a message that says, “Improve your instances’ security. Your security group, launch-wizard-4, is open to the world.” This means any IP address can attempt to connect to your instance. For true production instances or instances with sensitive data, restricting the security group to known IP addresses is highly recommended, along with taking other security precautions. Check out the AWS article “Tips for Securing Your EC2 Instance.”
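Once the instance has launched, the security group can also be tightened from the command line. The following is only a sketch under several assumptions that are not part of this walkthrough: the AWS CLI is installed and configured, you know the security group's ID and your own public IP address, and restrict_ssh is a hypothetical helper name.

```shell
# restrict_ssh is a hypothetical helper name; the aws subcommands and
# flags come from the AWS CLI, which is assumed to be installed.
# Usage: restrict_ssh security_group_id your_public_ip
restrict_ssh() {
    group_id="$1"
    my_ip="$2"

    # Allow SSH (TCP port 22) from your own address first, and only
    # if that succeeds remove the rule that lets any address connect.
    aws ec2 authorize-security-group-ingress \
        --group-id "$group_id" --protocol tcp --port 22 --cidr "$my_ip/32" &&
    aws ec2 revoke-security-group-ingress \
        --group-id "$group_id" --protocol tcp --port 22 --cidr 0.0.0.0/0
}

# Example (both values are placeholders you would supply):
# restrict_ssh "$SECURITY_GROUP_ID" "$MY_PUBLIC_IP"
```

Adding your own address before removing the open rule avoids a window in which nobody can connect. If your IP address changes later, the rule will need updating.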

AWS Extra Question: Select an Existing Key Pair or Create a New One

A key pair is like a set of keys for the server, so the server knows who to let in. Select “Create a new key pair,” and name it. We have named ours data-wrangling-test, but you can give it any name you will recognize. When you are done, download the key pair to a place where you will be able to find it later.

Lastly, click “Launch Instances.” When the instance launches, you will have an instance ID provided onscreen.
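For readers who prefer a terminal, the launch can be approximated with the AWS CLI. This is a hedged sketch, not a replacement for the wizard: it assumes the AWS CLI is installed and configured, that the key pair already exists in your account, and launch_instance is a hypothetical helper name. It does not create a security group, so inbound SSH access may need to be opened separately.

```shell
# launch_instance is a hypothetical helper name; the aws subcommand and
# flags come from the AWS CLI, which is assumed to be installed.
# Usage: launch_instance ami_id key_name
launch_instance() {
    ami_id="$1"
    key_name="$2"

    # t2.micro matches the free-tier choice made in Step 2.
    # The --query expression prints only the new instance ID.
    aws ec2 run-instances \
        --image-id "$ami_id" \
        --instance-type t2.micro \
        --key-name "$key_name" \
        --count 1 \
        --query 'Instances[0].InstanceId' \
        --output text
}

# Example, with the AMI and key pair named in this appendix
# (AMI IDs are region-specific and are retired over time):
# launch_instance ami-d05e75b8 data-wrangling-test
```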

Note

If you are worried about your server costs, create billing alerts in your AWS preferences.

Logging into an AWS Server

To log into the server, you need to navigate to the instance in the AWS console to get more information. From the console, select EC2, then select “1 Running Instances” (if you have more than one, the number will be larger). You’ll see a list of your servers. Unless you provided one earlier, your server won’t have a name. Give your instance a name by clicking on the blank box in the list. We named ours data-wrangling-test for consistency.

To log into our server, we are going to follow the instructions in the AWS article about connecting to a Linux instance.

Get the Public DNS Name of the Instance

The public DNS name is the web address of your instance. If you have a value there that looks like a web address, continue to the next section. If the value is “--”, then you need to follow these additional steps (from Stack Overflow):

  1. Go to console.aws.amazon.com.

  2. Go to Services (top nav) → VPC (near the end of the list).

  3. Open your VPCs (lefthand column).

  4. Select the VPC connected to your EC2.

  5. From the “Actions” drop-down, select “Edit DNS Hostnames.”

  6. Change the setting for “Edit DNS Hostnames” to “Yes.”

If you return to the EC2 instance, you should see it now has a public DNS name.
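The console steps above have a command-line counterpart. The sketch below assumes the AWS CLI is installed and configured, which this appendix does not cover. Both function names are hypothetical, and the instance and VPC IDs are values you would look up in your own account.

```shell
# Both function names are hypothetical; the aws subcommands and flags
# come from the AWS CLI, which is assumed to be installed.

# Print the public DNS name recorded for an instance.
# Usage: public_dns_name instance_id
public_dns_name() {
    aws ec2 describe-instances \
        --instance-ids "$1" \
        --query 'Reservations[0].Instances[0].PublicDnsName' \
        --output text
}

# Turn on DNS hostnames for a VPC (the counterpart of steps 2 through 6).
# Usage: enable_dns_hostnames vpc_id
enable_dns_hostnames() {
    aws ec2 modify-vpc-attribute \
        --vpc-id "$1" \
        --enable-dns-hostnames '{"Value":true}'
}

# Example (both IDs are placeholders you would supply):
# enable_dns_hostnames "$VPC_ID"
# public_dns_name "$INSTANCE_ID"
```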

Prepare Your Private Key

Your private key is the .pem file you downloaded. It’s a good idea to move it to a folder you know and remember. For Unix-based systems, your keys should be in a folder in your home folder called .ssh. For Windows, the default is either C:\Documents and Settings\<username>\.ssh\ or C:\Users\<username>\.ssh. You should copy your .pem file to that folder.

Next, you need to run the chmod command to change the .pem permissions to 400. Permissions of 400 mean that only the file’s owner can read the file, and no one can modify it. This keeps the file secure in a multiaccount computer environment, and ssh may refuse to use a private key that other users can read:

chmod 400 ~/.ssh/data-wrangling-test.pem
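The copy and chmod steps can be combined into one small shell function. This is a minimal sketch for Unix-like systems. The name install_key is hypothetical, and the example path assumes your browser saved the key to ~/Downloads.

```shell
# install_key is a hypothetical helper name, not part of AWS or OpenSSH.
# Usage: install_key /path/to/key.pem [destination_dir]
install_key() {
    key_file="$1"
    dest_dir="${2:-$HOME/.ssh}"
    key_name="$(basename "$key_file")"

    # Create the folder if it is missing; 700 keeps it private to the owner.
    mkdir -p "$dest_dir"
    chmod 700 "$dest_dir"

    # Copy the key, then make it read-only for the owner (400).
    # -f lets cp replace an earlier read-only copy of the same key.
    cp -f "$key_file" "$dest_dir/"
    chmod 400 "$dest_dir/$key_name"

    # Show the result; the mode column should begin with -r--------
    ls -l "$dest_dir/$key_name"
}

# Example (assumes the key was downloaded to ~/Downloads):
# install_key ~/Downloads/data-wrangling-test.pem
```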

Log into Your Server

At this point, you have all the pieces you need to log into the server. Run the following command, but replace my-key-pair.pem with the name of your key pair and public_dns_name with your public web address:

ssh -i ~/.ssh/my-key-pair.pem ubuntu@public_dns_name

For example:

ssh -i ~/.ssh/data-wrangling-test.pem ubuntu@ec2-12-34-56-128.compute-1.amazonaws.com

When prompted with “Are you sure you want to continue connecting (yes/no)?” type yes.

At this point, your prompt will change slightly, showing you are in the console of the server you set up. You can now continue getting your server set up by getting your code onto the server and setting up automation to run on your machine. You can read more about deploying code to your new server in Chapter 14.

To exit your server, type exit or press Ctrl-D.
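If you expect to log in often, an entry in your SSH client configuration can shorten the command. The sketch below appends such an entry. It is one possible convenience rather than a required step: the name add_ssh_host is hypothetical, and the user name ubuntu assumes the Ubuntu image chosen earlier.

```shell
# add_ssh_host is a hypothetical helper name. Host, HostName, User, and
# IdentityFile are standard OpenSSH client configuration keywords.
# Usage: add_ssh_host alias public_dns_name key_path [config_file]
add_ssh_host() {
    host_alias="$1"
    dns_name="$2"
    key_path="$3"
    config_file="${4:-$HOME/.ssh/config}"

    mkdir -p "$(dirname "$config_file")"

    # Append an entry so that "ssh <alias>" expands to the full command.
    cat >> "$config_file" <<EOF

Host $host_alias
    HostName $dns_name
    User ubuntu
    IdentityFile $key_path
EOF

    # Keep the configuration file writable only by its owner.
    chmod 600 "$config_file"
}

# Example, using the sample values from this appendix:
# add_ssh_host data-wrangling-test ec2-12-34-56-128.compute-1.amazonaws.com ~/.ssh/data-wrangling-test.pem
# ssh data-wrangling-test
```

The public DNS name can change when an instance is stopped and started again, so the HostName line may need updating afterward.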

Summary

Now you have your first AWS server up and running. Use the lessons learned in Chapter 14 to deploy code to your server and run your data wrangling in no time!