Chapter 2. Python Basics

Now that you are all set up to run Python on your computer, let’s go over some basics. We will build on these initial concepts as we move through the book, but we need to learn a few things before we are able to continue.

In the previous chapter, you tested your installation with a couple of lines of code:

import sys
import pprint
pprint.pprint(sys.path)

By the end of this chapter, you will understand what is happening in each of those lines and will have the vocabulary to describe what the code is doing. You will also learn about different Python data types and have a basic understanding of introductory Python concepts.

We will move quickly through this material, focusing on what you need to know to move on to the next chapters. New concepts will come up in future chapters as we need them. We hope this approach allows you to learn by applying these new concepts to datasets and problems that interest you.

Before we continue, let’s launch our Python interpreter. We will be using it to run our Python code throughout this chapter. It is easy to skim over an introductory chapter like this one, but we cannot emphasize enough the importance of physically typing what you see in the book. Similar to learning a spoken language, it is most useful to learn by doing. As you type the exercises in this book and run the code, you will encounter numerous errors, and debugging (working through these errors) will help you gain knowledge.

Basic Data Types

In this section, we will go over simple data types in Python. These are some of the essential building blocks for handling information in Python. The data types we will learn are strings, integers, floats, and other non–whole number types.

Strings

The first data type we will learn about is the string. You may not have heard the word string used in this context before, but a string is basically text and it is denoted by using quotes. Strings can contain numbers, letters, and symbols.

These are all strings:

'cat'
'This is a string.'
'5'
'walking'
'$GOObarBaz340   '

If you enter each of those values into your Python interpreter, the interpreter will return them back to you. The program is saying, “Hey, I heard you. You said, 'cat' (or whatever you entered).”

The content of a string doesn’t matter as long as it is between matching quotes, which can be either single or double quotes. You must begin and end the string with the same quote (either single or double):

'cat'
"cat"

Both of these examples mean the same thing to Python. In both cases, Python will return 'cat', with single quotes. Some folks use single quotes by convention in their code, and others prefer double quotes. Whichever you use, the main thing is to be consistent in your style. Personally, we prefer single quotes because double quotes require us to hold down the Shift key. Single quotes let us be lazy.

Integers and Floats

The second and third data types we are going to learn about are integers and floats, which are how you handle numbers in Python. Let’s begin with integers.

Integers

You may remember integers from math class, but just in case you don’t, an integer is a whole number. Here are some examples:

10
1
0
-1
-10

If you enter those into your Python interpreter, the interpreter will return them back to you.

Notice in the string example in the previous section, we had a '5'. If a number is entered within quotes, Python will process the value as a string. In the following example, the first value and second value are not equal:

5
'5'

To test this, enter the following into your interpreter:

5 == '5'

The == tests to see if the two values are equal. The return from this test will be true or false. The return value is another Python data type, called a Boolean. We will work with Booleans later, but let’s briefly review them. A Boolean tells us whether a statement is True or False. In the previous statement, we asked Python whether 5 the integer was the same as '5' the string. What did Python return? How could you make the statement return True? (Hint: try testing with both as integers or both as strings!)
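The comparison above can be tried in a few variations. The following is a minimal sketch; the variable names are our own, chosen only for illustration, and int() and str() are Python’s built-in conversion functions.

```python
# Comparing an integer with a string is False, because the types differ.
mixed = (5 == '5')

# Comparing like with like is True.
both_integers = (5 == 5)
both_strings = ('5' == '5')

# Converting one side first also makes the comparison True.
converted_to_integer = (5 == int('5'))
converted_to_string = (str(5) == '5')

print(mixed)                 # False
print(both_integers)         # True
print(converted_to_integer)  # True
```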

You might be asking yourself why anyone would store a number as a string. Sometimes this is an example of improper use—for example, the code is storing '5' when the number should have been stored as 5, without quotes. Another case is when fields are manually populated, and may contain either strings or numbers (e.g., a survey where people can type five or 5 or V). These are all numbers, but they are different representations of numbers. In this case, you might store them as strings until you process them.

One of the most common reasons for storing numbers as strings is a purposeful action, such as storing US postal codes. Postal codes in the United States consist of five numbers. In New England and other parts of the northeast, the zip codes begin with a zero. Try entering one of Boston’s zip codes into your Python interpreter as a string and as an integer. What happens?

'02108'
02108

Python will throw a SyntaxError in the second example (with the message invalid token and a pointer at the leading zero). In Python, and in numerous other languages, “tokens” are special words, symbols, and identifiers. In this case, Python does not know how to process a normal (non-octal) number beginning with zero, meaning it is an invalid token.
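To see why a postal code is safer as a string, here is a short sketch. The variable names are hypothetical; int() and the string method zfill are built in. Converting the string to an integer silently drops the leading zero, and we have to pad it back by hand.

```python
zip_code = '02108'

# Converting to an integer loses the leading zero.
as_number = int(zip_code)

# zfill pads a string with zeros on the left up to the given width.
restored = str(as_number).zfill(5)

print(as_number)  # 2108
print(restored)   # 02108
```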

Floats, decimals, and other non–whole number types

There are multiple ways to tell Python to handle non–whole number math. This can be very confusing and appear to cause rounding errors if you are not aware how each non–whole number data type behaves.

When a non–whole number is used in Python, Python defaults to turning the value into a float. A float uses the built-in floating-point data type for your Python version. This means Python stores an approximation of the numeric value—an approximation that reflects only a certain level of precision.

Notice the difference between the following two numbers when you enter them into your Python interpreter:

2
2.0

The first one is an integer. The second one is a float. Let’s do some math to learn a little more about how these numbers work and how Python evaluates them. Enter the following into your Python interpreter:

2/3

What happened? You got a zero value returned, but you were likely expecting 0.6666666666666666 or 0.6666666666666667 or something along those lines. The problem was that those numbers are both integers and integers do not handle fractions. Let’s try turning one of those numbers into a float:

2.0/3

Now we get a more accurate answer of 0.6666666666666666. When one of the numbers entered is a float, the answer is also a float.
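A hedged note on versions: the result of 2/3 depends on which Python you run. Python 2, which the examples in this chapter assume, returns 0, while Python 3 returns a float. The sketch below uses only forms that behave the same in both: the // operator (floor division) and division with at least one float. The variable names are ours.

```python
# Floor division throws away the fractional part.
whole_part = 2 // 3

# Division with at least one float keeps the fraction.
fraction = 2.0 / 3
also_fraction = float(2) / 3

print(whole_part)  # 0
print(fraction)    # 0.6666666666666666
```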

As mentioned previously, Python floats can cause accuracy issues. Floats allow for quick processing, but that speed comes at the cost of precision.

Computationally, Python does not see numbers the way you or your calculator would. Try the following two examples in your Python interpreter:

0.3
0.1 + 0.2

With the first line, Python returns 0.3. On the second line, you would expect to see 0.3 returned, but instead you get 0.30000000000000004. The two values 0.3 and 0.30000000000000004 are not equal. If you are interested in the nuances of this, you can read more in the Python docs.
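One common way to live with this behavior, sketched below under our own choice of names and tolerance, is to avoid testing floats for exact equality. Instead, check that the difference is very small, or round to the number of places you care about.

```python
total = 0.1 + 0.2

# An exact comparison fails because of the tiny stored error.
exactly_equal = (total == 0.3)

# Comparing within a small tolerance succeeds (1e-9 is an arbitrary choice).
close_enough = abs(total - 0.3) < 1e-9

# Rounding to two places before comparing also succeeds.
equal_after_rounding = (round(total, 2) == 0.3)

print(exactly_equal)         # False
print(close_enough)          # True
print(equal_after_rounding)  # True
```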

Throughout this book, we will use the decimal module (or library) when accuracy matters. A module is a section or library of code you import for your use. The decimal module makes your numbers (integers or floats) act in predictable ways (following the concepts you learned in math class).

In the next example, the first line imports getcontext and Decimal from the decimal module, so we have them in our environment. The following two lines use getcontext and Decimal to perform the math we already tested using floats:

from decimal import getcontext, Decimal
getcontext().prec = 1
Decimal(0.1) + Decimal(0.2)

When you run this code, Python returns Decimal('0.3'). Now when you enter print Decimal('0.3'), Python will return 0.3, which is the response we originally expected (as opposed to 0.30000000000000004).

Let’s step through each of those lines of code:

from decimal import getcontext, Decimal
getcontext().prec = 1
Decimal(0.1) + Decimal(0.2)

Line 1: Imports getcontext and Decimal from the decimal module.
Line 2: Sets the precision to one significant digit. The decimal module stores most rounding and precision settings in a default context. This line changes that context to use only one digit of precision.
Line 3: Sums two decimals (one with value 0.1 and one with value 0.2) together.

What happens if you change the value of getcontext().prec? Try it and rerun the final line. You should see a different answer depending on how many digits of precision you told the library to use.
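As a rough illustration of that experiment, the sketch below reruns the sum at two precision settings. In the decimal module, prec counts significant digits. The variable names are ours, and the last lines show an alternative we are adding for comparison: building each Decimal from a string, which avoids the float approximation from the start.

```python
from decimal import getcontext, Decimal

getcontext().prec = 1
one_digit = Decimal(0.1) + Decimal(0.2)

getcontext().prec = 3
three_digits = Decimal(0.1) + Decimal(0.2)

# Put the precision back to the module's default of 28 digits.
getcontext().prec = 28

# Decimals built from strings hold exactly the value that was typed.
from_strings = Decimal('0.1') + Decimal('0.2')

print(one_digit)     # 0.3
print(three_digits)  # 0.300
print(from_strings)  # 0.3
```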

As stated earlier, there are many mathematical specifics you will encounter as you wrangle your data. There are many different approaches to the math you might need to perform, but the decimal type allows us greater accuracy when using non–whole numbers.

We’ve learned about strings, integers, and floats/decimals. Let’s use these basic data types as building blocks for some more complex ones.

Data Containers

In this section, we’ll explain data containers, which hold multiple data points. It should be noted, however, that these containers are data types as well. Python has a few common containers: variables, lists, and dictionaries.

Variables

Variables give us a way to store values, such as strings or numbers or other data containers. A variable is made of a string of characters, which is often a lowercase word (or words connected with underscores) that describes what is contained.

Let’s try creating a simple variable. In your Python interpreter, try the following:

filename = 'budget.csv'

If you entered this correctly, your interpreter should return nothing. This is different from when we entered a string into the Python interpreter. If you simply entered 'budget.csv' into your Python interpreter, it would output 'budget.csv'.

When you create a variable, you are assigning what the program would normally output to the variable as its value. That is why nothing is returned when you create a new variable. In our example, our variable is called filename and it holds the string we typed ('budget.csv') as its value.

Object-Oriented Programming

You may have heard of object-oriented programming, or OOP for short. Python is an object-oriented programming language. The “object” in OOP can be any of the data types we learned about in this chapter such as strings, variables, numbers or floats.

In the example given in the text, our object is a string and it is stored now in filename. Every variable we define is a Python object. In Python, we use objects to store data we need later. These objects often have different qualities and actions they can perform, but they are all objects.

For example, each integer object can be added to another integer using a + symbol (the addition operator). As you continue learning Python, you will learn more of the qualities and actions of these objects and their underlying types—and come to appreciate object-oriented programming as a result!

When we created a string of letters and assigned it to the variable called filename, we followed some general variable naming principles. Don’t worry about memorizing these rules, but do keep them in mind if you receive an error in your code after defining a new variable:

Underscores are OK, hyphens are not.
Numbers are OK, but variable names cannot start with a number.
For reading ease, use lowercase letters with words separated by underscores.

Try the following code:

1example = 'This is going to break.'

What happened? What kind of error did you get? You should have gotten a syntax error, because you violated the second rule.

As long as you do not break Python’s rules around naming variables, you can name the variable almost anything. To illustrate:

horriblevariablenamesarenotdescriptiveandtoolong = 'budget.csv'

As you can tell, this variable name is too long and not descriptive. Also, the lack of underscores makes it hard to read. What makes a good variable name? Ask yourself: What is something that will make sense to me six months from now and help me understand the code when I have forgotten everything?

Let’s move on to a more reasonable variable name—cats. The value doesn’t have to be a filename, as in our previous example. Your variables can have a variety of values and names. Let’s pretend we are counting our cats, so we want to assign an integer to the variable cats:

cats = 42

If our Python script keeps track of how many cats we have, we don’t need to know the exact value at any one point in time. All we need to know is that the value is stored in the variable cats, so if we call cats in our interpreter or use it in another part of our code, it will always return the current number of cats.

To call a variable is to ask Python for its value. Let’s call cats. Type cats into your interpreter. You should get 42 in return. When you type filename, you should get the string 'budget.csv' in return. Try this on your machine:

>>> cats
42
>>> filename
'budget.csv'
>>>

If you type in a variable name that does not exist (or if you misspelled either of those) you will see the following error:

>>> dogs
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'dogs' is not defined

As stated earlier, it is important to learn how to read errors, so you can understand what you did wrong and how to fix it. In this example, the error says name 'dogs' is not defined, which means we did not define a variable named dogs. Python doesn’t know what we are asking it to call because we have not defined that variable.

You would get the same error if you forgot to include the quotes in 'budget.csv' in our first example. Try this in your Python interpreter:

filename = budget.csv

The error returned is NameError: name 'budget' is not defined. This is because Python does not know budget.csv is supposed to be a string. Remember, a string is always denoted using quotes. Without those quotes, Python tries to interpret it as another variable. The main takeaway from this exercise is to note which line the error is on and ask yourself, what might be incorrect? In our dogs example, the error message tells us it’s on line 1. If we had many lines of code, the error might show line 87.

All of the examples presented so far have been short strings or integers. Variables can also hold long strings—even ones spanning multiple lines. We chose to use short strings in our examples because long strings are not fun for you (or us) to type.

Try out a variable that holds a long string. Note that our string also has a single quote, meaning we must use double quotes to store it:

recipe = "A recipe isn't just a list of ingredients."

Now if you type recipe, you will get the long string that was stored:

>>> recipe
"A recipe isn't just a list of ingredients."

Strings or integers are not required data types for variables. Variables can hold all sorts of different Python data types, which we will learn more about in the following sections.

Lists

A list is a group of values that have some relationship in common. You use a list in Python similarly to how you would use it in normal language. In Python, you can create a list of items by placing them within square brackets ([]) and separating them with commas.

Let’s make a list of groceries in Python:

['milk', 'lettuce', 'eggs']

Note

This list is made of strings, not variables. This is recognizable, because the words are enclosed in quotes. If these were variables, they would not have the quotes around them.

If you press Return, Python will return the following:

['milk', 'lettuce', 'eggs']

You have made your first Python list: a list of strings. You can make lists of any Python data type, or any mixture of data types (e.g., floats and strings). Let’s make a list of integers and floats:

[0, 1.0, 5, 10.0]

Now, let’s store our list in a variable, so we can use it later in our code. Variables are helpful because they prevent us from having to type out our data time and time again. Typing out data by hand is error-prone and isn’t very efficient if your list is, say, 5,000 items long. Just as we mentioned earlier, variables are a way to store values in an aptly named container.

Try the following in your Python interpreter:

shopping_list = ['milk', 'lettuce', 'eggs']

When you press Return, you should see a new line. It will appear as if nothing happened. Remember earlier when we had the list echoed back to us? Now, Python is storing the list in the shopping_list variable. If you call your variable by typing shopping_list into your Python prompt, you should get the following returned:

shopping_list
['milk', 'lettuce', 'eggs']

Lists can also store variables. Let’s say we have variables representing counts of animals we are tracking in an animal shelter:

cats = 2
dogs = 5
horses = 1

Now, we can take the counts of those animals and put them in a list:

animal_counts = [cats, dogs, horses]

If you enter animal_counts into your Python interpreter, Python will return the following value:

[2, 5, 1]

The variables hold the information for us. When we type the variables, Python returns the underlying values stored in our variables.
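One detail worth seeing in code: the list receives the values the variables held at the moment the list was created. The sketch below, using the same shelter counts plus one hypothetical variable (updated_counts), shows that reassigning cats afterward does not change the list we already built.

```python
cats = 2
dogs = 5
horses = 1
animal_counts = [cats, dogs, horses]

# A new cat arrives, so we reassign the variable.
cats = 3

# The existing list still holds the old value.
print(animal_counts)   # [2, 5, 1]

# Building the list again picks up the new value.
updated_counts = [cats, dogs, horses]
print(updated_counts)  # [3, 5, 1]
```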

You can also create lists of lists. Let’s say we have a list of names for our animals:

cat_names = ['Walter', 'Ra']
dog_names = ['Joker', 'Simon', 'Ellie', 'Lishka', 'Fido']
horse_names = ['Mr. Ed']
animal_names = [cat_names, dog_names, horse_names]

If you enter animal_names into your Python interpreter, Python will return the following value:

[['Walter', 'Ra'], ['Joker', 'Simon', 'Ellie', 'Lishka', 'Fido'], ['Mr. Ed']]

You didn’t have to type out all those names to create a list of lists. The original variables (cat_names, dog_names, horse_names), which are lists, are still accessible. For example, if you type cat_names, you will get ['Walter', 'Ra'] in return.
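To pull a single item back out of a list of lists, Python uses a position number in square brackets, counting from zero. This chapter has not covered that notation, so treat the following as a small preview rather than something you need to memorize. The names first_list, first_dog, and number_of_lists are ours.

```python
cat_names = ['Walter', 'Ra']
dog_names = ['Joker', 'Simon', 'Ellie', 'Lishka', 'Fido']
horse_names = ['Mr. Ed']
animal_names = [cat_names, dog_names, horse_names]

# Position 0 is the first inner list (the cat names).
first_list = animal_names[0]

# Position 1 is the dog names; position 0 within that is the first dog.
first_dog = animal_names[1][0]

# len counts the items in the outer list.
number_of_lists = len(animal_names)

print(first_list)       # ['Walter', 'Ra']
print(first_dog)        # Joker
print(number_of_lists)  # 3
```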

Now that we’ve explored lists, we’ll move on to a slightly more complex data container, called a dictionary.

Dictionaries

A dictionary is more complex than a variable or list, and it is aptly named. Think of a Python dictionary like a dictionary in the traditional sense of the word—as a resource you can use to look up words to get their definitions. In a Python dictionary, the words you look up are called the keys and the definitions of these words are called the values. In Python, the key is something that points to a value.

Let’s go back to our animals. animal_counts holds a list of the different numbers of animals we have, but we don’t know which number belongs to which animal type. A dictionary is a great way to store that information.

In the following example, we are using the animal types as the keys, and the counts of each animal type as the values:

animal_counts = {'cats': 2, 'dogs': 5, 'horses': 1}

If we want to access one of the values using a key, we can do so by accessing the key from the dictionary (like looking up a word in a normal dictionary). To perform this lookup in Python—say, for the number of dogs we have—we can type the following:

animal_counts['dogs']

You should see 5 returned, because we set the key 'dogs' equal to the value 5 in our dictionary ('dogs': 5). As you can see, a dictionary is very useful when you have matching keys and values you want to store. Dictionaries can be very powerful depending on what your needs are, so let’s take a longer look at using lists with dictionaries.
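Here is a brief sketch of a few lookups. Looking up a key that is not in the dictionary raises a KeyError, so the example also shows two built-in ways to check first: the in keyword and the dictionary’s get method, which returns a fallback value of our choosing. The variable names are illustrative.

```python
animal_counts = {'cats': 2, 'dogs': 5, 'horses': 1}

# Look up a value by its key.
dog_count = animal_counts['dogs']

# Check whether a key exists before looking it up.
has_dogs = 'dogs' in animal_counts
has_snakes = 'snakes' in animal_counts

# get returns the second argument when the key is missing.
snake_count = animal_counts.get('snakes', 0)

print(dog_count)    # 5
print(has_snakes)   # False
print(snake_count)  # 0
```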

With our earlier list of animal names, it was hard to tell which list of names belonged to which type of animal. It was not clear which list contained the names of the cats, which one had the names of the dogs, and which held the names of the horses. With a dictionary, however, we can make this distinction clearer:

animal_names = {
    'cats': ['Walter', 'Ra'],
    'dogs': ['Joker', 'Simon', 'Ellie', 'Lishka', 'Fido'],
    'horses': ['Mr. Ed']
    }

Here is another way to write the same underlying values using more variables:

cat_names = ['Walter', 'Ra']                                    
dog_names = ['Joker', 'Simon', 'Ellie', 'Lishka', 'Fido']
horse_names = ['Mr. Ed']

animal_names = {
    'cats': cat_names,                                          
    'dogs': dog_names,
    'horses': horse_names
    }

The first line defines the variable cat_names as a list of cat names (a list of strings).
Inside the dictionary, the line 'cats': cat_names uses the variable cat_names to pass that list of names as the value for the key 'cats'.

Both versions give us the same dictionary, although in slightly different ways.¹ As you learn more Python, you will be better able to determine when defining more variables makes sense and when it is not useful. For now, you can see it is easy to use many different defined variables (like cat_names and dog_names) to create new variables (like animal_names).

Tip

While Python does have spacing and formatting rules, you do not have to format a dictionary as we have done here. However, your code should be as easy to read as possible. Making sure your code is readable is something for which you and other developers you work with will be thankful.

What Can the Various Data Types Do?

Each of the basic data types can do a variety of things. Here is a list of the data types we’ve learned about so far, followed by examples of the kinds of actions you can tell them to do:

Strings
- Change case
- Strip space off the end of a string
- Split a string
Integers and decimals
- Add and subtract
- Simple math
Lists
- Add to or subtract from the list
- Remove the last item of the list
- Reorder the list
- Sort the list
Dictionaries
- Add a key/value pair
- Set a new value to the corresponding key
- Look up a value by the key

Note

We’ve purposely not mentioned variables in this list. The things a variable can do depend on the item it contains. For example, if a variable is a string, then it can do everything a string can do. If a variable is a list, then it can do different things only lists can do.

Think of the data types as nouns and the things they can do as verbs. For the most part, the things data types can do are called methods. To access a data type’s method, or make the data type do something, you use dot notation (.). For example, if you have a string assigned to a variable you named foo, you can call the strip method of that string by typing foo.strip(). Let’s look at a few of these methods in action.

Note

When we call a string’s methods, these actions are part of the default Python libraries every Python installation shares (similar to the default applications that come preinstalled on your phone). These methods will be there on every computer running Python, so every Python string can share the same methods (just like every phone can make a phone call and every Apple phone can send an iMessage). A vast assortment of built-in methods and basic data types are included in the Python standard library (also known as stdlib), including the Python data types you are now using.

String Methods: Things Strings Can Do

Let’s use our initial variable, filename. Originally, we defined the variable using filename = 'budget.csv'. That was pretty convenient. Sometimes, though, things are not so convenient. Let’s go through a few examples:

filename = 'budget.csv        '

You’ll notice our filename string now has a lot of extra spaces we probably need to strip off. We can use the Python string’s strip method, a built-in function that removes unnecessary whitespace from the beginning and end of a string:

filename = 'budget.csv        '
filename = filename.strip()

Warning

If you do not reassign the variable (set filename equal to the output of filename.strip()), then the modifications you made to filename will not be stored.

If you enter filename in your Python interpreter, you should now see the spaces have been stripped off.

Let’s say our filename needs to be in all capital letters. We can transform all the letters to uppercase using the Python string’s built-in upper method:

filename = 'budget.csv'
filename.upper()

Your output should now show that we have properly uppercased the filename:

'BUDGET.CSV'

In this case, we did not reassign the uppercase string to the variable filename. What happens when you call filename in your interpreter again? The output should still read 'budget.csv'. If you don’t want to modify your variable but want to transform it for one-time use, you can call methods like upper, as they will return the modified string without changing the underlying variable.

What if we wanted to reassign the variable by storing the return value using the same variable name? In the following, we are changing the value of the filename variable to be uppercase:

filename = 'budget.csv'         
filename = filename.upper()

After the first line, calling filename returns 'budget.csv'.
After the second line, calling filename returns 'BUDGET.CSV'.

We could condense this code to run on one line:

filename = 'budget.csv'.upper()

The number of lines you use for your code is sometimes a matter of personal style or preference. Make choices that make sense to you but keep your code clear, easy to read, and obvious.

We only covered two string methods in these examples, strip and upper, but there are many other built-in string methods. We will learn more about these methods as we work with strings in our data wrangling.
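Because each string method returns a new string, methods can be chained one after another. The sketch below combines strip and upper and also tries two related built-in methods, lstrip and lower. The variable names cleaned, left_stripped, and lowered are ours.

```python
filename = 'budget.csv        '

# strip runs first, then upper runs on the stripped result.
cleaned = filename.strip().upper()

# The original variable is untouched because we did not reassign it.
print(filename == 'budget.csv        ')  # True
print(cleaned)                           # BUDGET.CSV

# lstrip removes whitespace from the left side only.
left_stripped = '   budget.csv'.lstrip()

# lower is the counterpart of upper.
lowered = 'BUDGET.CSV'.lower()
```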

Numerical Methods: Things Numbers Can Do

Integers and floats/decimals are mathematical objects. If you enter 40 + 2, Python returns 42. If you want to store the answer in a variable, you assign it to a variable just as we did in the string examples:

answer = 40 + 2

Now, if you type answer, you will get 42 in return. Most of the things you can do with integers are pretty predictable, but there may be some special formatting you need to use so your Python interpreter understands the math you want to perform. For example, if you wanted to square 42, then you would enter 42**2.

Integers, floats, and decimals also have many other methods, some of which we will encounter as we learn about data wrangling.
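For reference, here is a small sketch of the most common arithmetic operators applied to the stored answer. The variable names are illustrative.

```python
answer = 40 + 2

squared = answer ** 2     # exponent: 42 to the power of 2
difference = answer - 2   # subtraction
product = answer * 2      # multiplication
remainder = answer % 5    # remainder after dividing by 5

print(squared)     # 1764
print(difference)  # 40
print(product)     # 84
print(remainder)   # 2
```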

Addition and Subtraction

You can also apply addition to other Python data types, such as strings and lists. Try the following:

'This is ' + 'awesome.'

and:

['Joker', 'Simon', 'Ellie'] + ['Lishka', 'Turtle']

What happens if you try to use subtraction? What does the error produced by the following line tell you?

['Joker', 'Simon', 'Ellie', 'Lishka', 'Turtle'] - ['Turtle']

You should receive an error saying TypeError: unsupported operand type(s) for -: 'list' and 'list'. This tells us Python lists support addition, but not subtraction. This is because of choices Python’s developers have made in what methods each type should support. If you want to read about how to perform subtraction on a list, check out the Python list’s remove method.
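If you do need something like list subtraction, one possible approach (not the only one) is to build a new list that keeps every item not found in a second list. The sketch below uses a list comprehension, a piece of syntax this chapter has not introduced; the names to_drop and kept are hypothetical.

```python
dog_names = ['Joker', 'Simon', 'Ellie', 'Lishka', 'Turtle']
to_drop = ['Turtle']

# Keep each name only if it does not appear in to_drop.
kept = [name for name in dog_names if name not in to_drop]

print(kept)       # ['Joker', 'Simon', 'Ellie', 'Lishka']
print(dog_names)  # the original list is unchanged
```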

List Methods: Things Lists Can Do

There are a few must-know methods for lists. Let’s start with an empty list and use a method to add values to it.

First, define an empty list like so:

dog_names = []

If you enter dog_names into your interpreter, it will return [], Python’s way of showing an empty list. Earlier in the chapter, we had a bunch of names stored in that variable, but we redefined it in the last line so now it is an empty list. The built-in append method adds items to the list. Let’s use it now and add “Joker” to the list:

dog_names.append('Joker')

Now, if you enter dog_names, your list will return one item: ['Joker'].

On your own, build out the list using the append method until you have a list that looks like this:

['Joker', 'Simon', 'Ellie', 'Lishka', 'Turtle']

Let’s say you accidentally added 'Walter', one of the cat names:

dog_names.append('Walter')

You can remove it with the Python list’s built-in remove method:

dog_names.remove('Walter')

While there are many more built-in methods for lists, append and remove are among the most commonly used.
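Putting those steps together gives the sketch below. One caution we are adding: remove raises a ValueError if the value is not in the list, so the example checks with the in keyword first. The if statement is a preview of material not yet covered in this chapter.

```python
dog_names = []
dog_names.append('Joker')
dog_names.append('Simon')
dog_names.append('Ellie')
dog_names.append('Lishka')
dog_names.append('Turtle')

# Oops: a cat name ends up in the dog list.
dog_names.append('Walter')
count_with_mistake = len(dog_names)

# Only remove the name if it is actually present.
if 'Walter' in dog_names:
    dog_names.remove('Walter')

print(count_with_mistake)  # 6
print(dog_names)           # ['Joker', 'Simon', 'Ellie', 'Lishka', 'Turtle']
```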

Dictionary Methods: Things Dictionaries Can Do

To learn some useful dictionary methods, let’s build our dictionary of animal counts from scratch.

In the next example, we create an empty dictionary. Then, we add a key and define the value of that key:

animal_counts = {}
animal_counts['horses'] = 1

Adding an object to a dictionary (animal_counts['horses']) is a little different from adding an object to a list. This is because a dictionary has both a key and a value. The key in this case is 'horses' and the value is 1.

Let’s define the rest of the dictionary with our animal counts:

animal_counts['cats'] = 2
animal_counts['dogs'] = 5
animal_counts['snakes'] = 0

Now when you type animal_counts in your Python interpreter, you should get the following dictionary: {'horses': 1, 'cats': 2, 'dogs': 5, 'snakes': 0}. (Since Python dictionaries don’t store order, your output might look different but should contain the same keys and values.)

We are working with a very small example, but programming is not always so convenient. Imagine a dictionary of animal counts for all domesticated animals in the world. As the programmer, we might not know all of the different types of animal this animal_counts dictionary holds. When handling a large and unknown dictionary, we can use dictionary methods to tell us more about it. The following command returns all the keys the dictionary holds:

animal_counts.keys()

If you have been following along with the exercises, typing this into your interpreter will return a list of keys that looks like this:

['horses', 'cats', 'dogs', 'snakes']
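Since the order of the keys may differ from what is printed here, and since in Python 3 keys() returns a view object rather than a plain list, a hedged way to get a predictable result is to pass the keys to the built-in sorted function. The sketch below assumes the dictionary built earlier in this section; the names animal_types, counts, and number_of_types are ours.

```python
animal_counts = {}
animal_counts['horses'] = 1
animal_counts['cats'] = 2
animal_counts['dogs'] = 5
animal_counts['snakes'] = 0

# sorted always returns a new list, in alphabetical order for strings.
animal_types = sorted(animal_counts.keys())

# values works like keys, but returns the values instead.
counts = sorted(animal_counts.values())

# len tells us how many key/value pairs the dictionary holds.
number_of_types = len(animal_counts)

print(animal_types)     # ['cats', 'dogs', 'horses', 'snakes']
print(counts)           # [0, 1, 2, 5]
print(number_of_types)  # 4
```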

You can take any of those keys and retrieve the value associated with it from the dictionary. The following lookup will return the number of dogs:

animal_counts['dogs']

The output for this line is 5.

If you wanted to, you could save that value in a new variable so you don’t have to look it up again:

dogs = animal_counts['dogs']

Now, if you enter the variable dogs directly, Python will return 5.

Those are some of the basic things you can do with a dictionary. Just like with strings and lists, we will learn more about dictionaries as we apply more complex problems to our code.

Helpful Tools: type, dir, and help

There are a few built-in tools in the Python standard library that can help you identify what data types or objects you have and what things you can do with them (i.e., what their methods are). In this section, we will learn about three tools that come as part of the Python standard library.

type

type will help you identify what kind of data type your object is. To do this in your Python code, wrap the variable in type()—for example, if the variable name is dogs, then you would enter type(dogs) into the Python prompt. This is extremely helpful when you are using a variable to hold data and need to know what type of data is in the variable. Consider the zip code example from earlier in the chapter.

Here, we have two different uses for the value 20011. In the first case, it is a zip code stored as a string. In the second case, it is an integer:

'20011'
20011

If those values were stored in variables, they would be further obscured and we might not know or remember whether we used a string or an integer.

If we pass the value to the built-in method type, then Python will tell us what kind of data type the object is. Try it:

type('20011')
type(20011)

The first line returns str. The second line returns int. What happens when you pass a list to type? And a variable?
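The sketch below stores the two values in variables (with names we made up) and then inspects them. Note that the interpreter displays the result slightly differently depending on version, for example as <type 'str'> in Python 2 and <class 'str'> in Python 3. The built-in isinstance function is a related tool that answers a yes-or-no question about the type.

```python
zip_as_string = '20011'
zip_as_integer = 20011

string_type = type(zip_as_string)
integer_type = type(zip_as_integer)
list_type = type(['milk', 'eggs'])

# isinstance returns True when the object is of the given type.
is_string = isinstance(zip_as_string, str)
is_integer = isinstance(zip_as_string, int)

print(string_type is str)   # True
print(integer_type is int)  # True
print(is_integer)           # False
```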

Identifying the type of an object can be very helpful when you are trying to troubleshoot an error or work with someone else’s code. Remember when we tried to subtract a list from another list (in “Addition and Subtraction”)? Well, you cannot subtract a string from a string either. So, the string '20011' has very different possible methods and use cases than the integer 20011.

dir

dir will help you identify all the things a particular data type can do, by returning a list of built-in methods and properties. Let’s try it out with the string 'cat,dog,horse':

dir('cat,dog,horse')

For now, ignore everything at the beginning of the returned list (the strings starting with double underscores). These are internal or private methods Python uses.

The methods that are most useful are contained in the second part of the returned list output. Many of these methods are obvious, or self-documenting. You should see some of the methods we used on strings earlier in this chapter:

 [...,
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 '_formatter_field_name_split',
 '_formatter_parser',
 'capitalize',
 'center',
 'count',
 'decode',
 'encode',
 'endswith',
 'expandtabs',
 'find',
 'format',
 'index',
 'isalnum',
 'isalpha',
 'isdigit',
 'islower',
 'isspace',
 'istitle',
 'isupper',
 'join',
 'ljust',
 'lower',
 'lstrip',
 'partition',
 'replace',
 'rfind',
 'rindex',
 'rjust',
 'rpartition',
 'rsplit',
 'rstrip',
 'split',
 'splitlines',
 'startswith',
 'strip',
 'swapcase',
 'title',
 'translate',
 'upper',
 'zfill']
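If the underscore-prefixed names get in the way, one possible approach is to filter them out. This sketch uses a list comprehension, which has not been covered in this chapter, and the variable names are made up for illustration. The exact names returned depend on your Python version.

```python
# Everything dir reports for a string
all_names = dir('cat,dog,horse')

# Keep only the names that do not start with an underscore
public_names = [name for name in all_names if not name.startswith('_')]

print(public_names)
```

The filtered list should still contain familiar methods such as split, strip, and upper.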

If you look at the string 'cat,dog,horse', it looks like it is a list saved in a string. It’s actually a single value, but with the Python string’s built-in split method we can divide the string into smaller pieces by splitting it on the comma character, like so:

'cat,dog,horse'.split(',')

Python will return a list:

['cat', 'dog', 'horse']

Now let’s call the dir function on our list:

dir(['cat', 'dog', 'horse'])

There are not as many options as there are for strings, but let’s try a few. First, let’s store the list in a variable. You should know how to assign a list to a variable by now, but here’s an example:

animals = ['cat', 'dog', 'horse']

Now, let’s try some new methods we found using dir on a list with our variable animals:

animals.reverse()
animals.sort()

After you run each of those lines, print out the value of animals so you can see how the method has modified the list. What output did you expect? Was it the same as what you saw? Try using the dir function on integers and floats. (Hint: dir expects you to pass only one object, so try dir(1) or dir(3.0).) Are there methods you didn’t expect?
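For the list part of this exercise, a run might look like the sketch below. Both methods change the list in place and return None, which is easy to miss the first time.

```python
animals = ['cat', 'dog', 'horse']

# reverse flips the order of the list in place
animals.reverse()
print(animals)   # ['horse', 'dog', 'cat']

# sort puts the items in ascending (here, alphabetical) order in place
animals.sort()
print(animals)   # ['cat', 'dog', 'horse']

# Neither method returns the list; the return value is None
result = animals.sort()
print(result)    # None
```

Because the methods return None, writing animals = animals.sort() would replace your list with None.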

As you can see, dir gives you insight into the built-in methods for each Python data type; these methods can prove valuable when wrangling data using Python. We recommend taking time to experiment with the listed methods that interest you and testing more methods with different data types.

help

The third helpful built-in tool we will review in this chapter is the help function. It returns the documentation for an object, method, or module, although that documentation is often written in a very technical (sometimes cryptic) manner. Let’s review the help for the split method we used in the previous section. If you didn’t know you needed to put the character you wanted to split the string on inside the parentheses, how would you find out what the Python string’s split method expected? Let’s pretend we didn’t know how to use split and called it without passing ',':

animals = 'cat,dog,horse'
animals.split()

This code returns the following:

['cat,dog,horse']

Looks good, right? Not upon closer inspection. As we can see, Python took our string and put it into a list, but didn’t split the words into pieces using the commas. This is because the built-in split method defaults to splitting the string on whitespace (such as spaces), not commas. We have to tell Python to split on the commas by passing a comma string (',') into the method.
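The two calls can be compared side by side, as in this small sketch (the variable names no_separator and comma_separator are made up for illustration):

```python
animals = 'cat,dog,horse'

# No separator given: the string has no whitespace, so it stays in one piece
no_separator = animals.split()
print(no_separator)        # ['cat,dog,horse']
print(len(no_separator))   # 1

# Comma given as the separator: the string is divided at each comma
comma_separator = animals.split(',')
print(comma_separator)       # ['cat', 'dog', 'horse']
print(len(comma_separator))  # 3

# split returns a new list; the original string is unchanged
print(animals)             # cat,dog,horse
```

Checking the length of the result is a quick way to confirm whether a split did what you intended.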

To help us understand how the method works, let’s pass it to help. Calling split returned a new list but left animals itself unchanged, so the variable still holds a string. We define it again here so the example stands on its own, then look up how split works:

animals = 'cat,dog,horse'
help(animals.split)

The second line passes animals.split (without the parentheses) to help. You can pass any object, method, or module to help, but as seen here, you should not include the trailing parentheses when passing a method; otherwise, Python would call the method and pass its result to help instead.

Python returns the following:

split(...)
    S.split([sep [,maxsplit]]) -> list of strings

    Return a list of the words in the string S, using sep as the
    delimiter string.  If maxsplit is given, at most maxsplit
    splits are done. If sep is not specified or is None, any
    whitespace string is a separator and empty strings are removed
    from the result.

The first line of the description reads: S.split([sep [,maxsplit]]) -> list of strings. In English, this tells us that for a string (S) we have a method (split) with a first possible argument (a.k.a. thing we can pass), sep, and a second possible argument, maxsplit. The square brackets ([]) around the argument names indicate that they are optional, not mandatory. This method returns (->) a list of strings.

The following line reads: "Return a list of the words in the string S, using sep as the delimiter string." sep is an argument passed to the split method that is used as a separator. A delimiter is a character or series of characters used to separate fields. For example, in a comma-delimited file, the comma is the delimiter. The comma is also the delimiter in the string we created, as it separates the words we want in our list.
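The second optional argument, maxsplit, is easiest to understand by trying it. The following sketch passes both arguments by position, in the order the help text lists them; the variable names are made up for illustration.

```python
animals = 'cat,dog,horse'

# sep=',' and maxsplit=1: split at the first comma only
first_split_only = animals.split(',', 1)
print(first_split_only)   # ['cat', 'dog,horse']

# Without maxsplit, every comma is used as a split point
all_splits = animals.split(',')
print(all_splits)         # ['cat', 'dog', 'horse']
```

With maxsplit set to 1, the rest of the string after the first comma is left intact as the last item.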

Tip

Once you have finished reading documentation (using arrows to scroll up and down), you can exit help by typing q.

The help description also tells us that spaces, or whitespace, are the default delimiter if no other delimiter is specified. This tells us that if we had a string 'cat dog horse', the split method would not require us to pass a delimiter inside the (). As you can see, the built-in help function can teach you a lot about what a method expects you to use and whether it’s a good fit for the problem you are solving.
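Here is a short sketch of that default behavior. The second string is a made-up example with extra spaces, a tab, and a trailing newline, to show what the help text means by "any whitespace string is a separator and empty strings are removed."

```python
# Single spaces between words
spaced = 'cat dog horse'
print(spaced.split())          # ['cat', 'dog', 'horse']

# Mixed and repeated whitespace is treated the same way
messy = '  cat   dog\thorse\n'
print(messy.split())           # ['cat', 'dog', 'horse']

# Passing ' ' explicitly behaves differently: repeated spaces produce empty strings
print('cat  dog'.split(' '))   # ['cat', '', 'dog']
```

For whitespace-separated text, calling split with no arguments is usually the more forgiving choice.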

Putting It All Together

Let’s test our new skills. Try the following:

Create a string, a list, and a dictionary.
Look up the possible methods of each of these data types using the dir function.
Try applying some of the built-in methods you discovered, until one throws an error.
Look up the method documentation using help. Try to figure out what the method does and what you might need to do differently to make it work.

Congrats! You just learned how to program. Programming is not about memorizing everything; rather, it is about troubleshooting when things go awry.
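As one possible run through these steps, the sketch below triggers an error with the list method remove and then adjusts the code. The try/except and if statements have not been covered in this chapter; they are used here only so the example runs from start to finish. In the interpreter, you would simply see the error and then call help(animals.remove) to read about the method.

```python
animals = ['cat', 'dog', 'horse']

# remove raises a ValueError when the item is not in the list
try:
    animals.remove('bird')
except ValueError as error:
    print(error)

# One possible fix: only remove the item if it is actually in the list
if 'bird' in animals:
    animals.remove('bird')

print(animals)
```

The list is unchanged at the end, because 'bird' was never in it.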

What Does It All Mean?

At the beginning of the chapter, we promised that by the end you would understand these three lines:

import sys
import pprint
pprint.pprint(sys.path)

Knowing what we now know, let’s break it down. In “Floats, decimals, and other non–whole number types”, we imported the decimal library. It looks like we are importing some modules from the Python standard library here—sys and pprint.

Let’s get some help on these. (Make sure you import them, or help will throw an error!) Because pprint is an easier read, let’s look at that one first:

>>> import pprint
>>> help(pprint.pprint)

Help on function pprint in module pprint:

pprint(object, stream=None, indent=1, width=80, depth=None)
    Pretty-print a Python object to a stream [default is sys.stdout].

Excellent. According to the pprint.pprint() documentation, the function outputs an easy-to-read display of whatever was passed to it.

As we learned in the previous chapter, sys.path shows where Python looks to find modules. What type is sys.path?

import sys
type(sys.path)

A list. We know how to use lists! We now also know that if we pass a list to pprint.pprint, the output looks really nice. Let’s try to apply this to our list of lists holding animal names. First, let’s add a few more names to make it really messy:

animal_names = [
    ['Walter', 'Ra', 'Fluffy', 'Killer'],
    ['Joker', 'Simon', 'Ellie', 'Lishka', 'Fido'],
    ['Mr. Ed', 'Peter', 'Rocket', 'Star']
]

Now, let’s pprint the variable animal_names:

pprint.pprint(animal_names)

What we get in return is the following:

[['Walter', 'Ra', 'Fluffy', 'Killer'],
 ['Joker', 'Simon', 'Ellie', 'Lishka', 'Fido'],
 ['Mr. Ed', 'Peter', 'Rocket', 'Star']]

To summarize, here is what each of those original lines of code does:

import sys      
import pprint   
pprint.pprint(sys.path)

import sys: Imports Python’s sys module
import pprint: Imports Python’s pprint module
pprint.pprint(sys.path): Passes sys.path, a list, to pprint.pprint so the list is displayed in a way that’s clear and easy to read

If you pass a dictionary to pprint.pprint, what happens? You should see a well-formatted dictionary output.
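A short dictionary fits on one line, so the effect is easier to see with a narrow width. The sketch below assumes the example dictionary from this chapter (with the keys deliberately out of order) and uses the width argument listed in the help output above; pformat is a companion function in the same module that returns the formatted text as a string instead of printing it.

```python
import pprint

# Assumed example dictionary, with keys out of alphabetical order
animal_counts = {'snakes': 0, 'horses': 1, 'dogs': 5, 'cats': 2}

# width=20 is narrower than the one-line form, which forces one entry per line
pprint.pprint(animal_counts, width=20)

# pformat returns the same formatted text as a string
formatted = pprint.pformat(animal_counts, width=20)
print(len(formatted.splitlines()))
```

By default, pprint also sorts dictionary keys, so 'cats' should appear first in the output.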

Summary

Data types and containers are how Python understands and stores data. Table 2-1 shows the core types we learned about in this chapter; Python has more types than these few.

Table 2-1. Data types
Name	Example
String	`'Joker'`
Integer	`2`
Float	`2.0`
Variable	`animal_names`
List	`['Joker', 'Simon', 'Ellie', 'Lishka', 'Fido']`
Dictionary	`{'cats': 2, 'dogs': 5, 'horses': 1, 'snakes': 0}`

As you know, some data types can be contained within others. A list can hold a bunch of strings or integers or a mixture of the two. A variable can hold a list or a dictionary or a string or a decimal. As we saw with our variable animal_names, a list can also be a list of lists. As we gain more Python knowledge, we will learn more about these data types, how they work, and how we can use them for our data wrangling needs.
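The nesting described above can be sketched as follows. The names mixed and names_by_animal are made up for illustration, and which names belong to which animal is an assumption.

```python
# A list can mix strings, integers, and floats
mixed = ['Joker', 2, 2.0]

# A list of lists: each inner list is reached by its index
animal_names = [
    ['Walter', 'Ra', 'Fluffy', 'Killer'],
    ['Joker', 'Simon', 'Ellie', 'Lishka', 'Fido'],
]
print(animal_names[1][0])           # Joker

# A dictionary whose values are lists (hypothetical grouping)
names_by_animal = {
    'cats': ['Walter', 'Ra'],
    'dogs': ['Joker', 'Simon'],
}
print(names_by_animal['dogs'][0])   # Joker
```

In both cases, the first lookup returns the inner list and the second index picks an item from it.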

In this chapter, we also learned about built-in methods and things we can do with objects in Python. Additionally, we learned some simple built-in Python tools we can use to help figure out what kind of data type an object is and what we can do with it. Table 2-2 summarizes these tools.

Table 2-2. Helper tools
Example	What it does
`type('Joker')`	Returns what kind of object 'Joker' is.
`dir('Joker')`	Returns a list of all the things the object 'Joker' can do (methods and properties).
`help('Joker'.strip)`	Returns a description of a specific method (in this case, `strip`) so we can better understand how to use it.

In the next chapter, we will learn how to open various file types and store data in the Python data types we learned in this chapter. By converting our data from files into Python objects, we can unleash the power of Python and data wrangling can soon become an easy task.

¹ They are not exactly the same dictionary, since the second example uses objects that could be modified. For more reading on the differences, check out Appendix E.
