Appendix E. Python Gotchas

Python, like any other language you learn, has its quirks and idiosyncrasies. Some of them are shared among scripting languages, so they may not seem surprising if you have scripting experience. Other quirks are unique to Python. We’ve assembled a list of some of them (by no means all of them) so you can familiarize yourself. We hope this appendix serves as an aid for debugging and also gives you a bit of insight into why Python does things the way it does.

Hail the Whitespace

As you have probably already noticed, Python uses whitespace as an integral part of code structure. Indentation defines the bodies of functions, methods, classes, loops, and if-else statements, and it is also used to format continuation lines readably. Where other languages mark the start and end of a block with braces or keywords, Python uses indentation, so whitespace is part of the syntax rather than just a matter of style.

There are a few best practices for whitespace in your Python files:

  • Don’t use tabs. Use spaces.

  • Use four spaces for each indentation block.

  • Choose a good indentation for hanging indents (it can align with a delimiter, an extra indentation, or a single indentation, but should be chosen based on what is most readable and usable; see PEP-8).

Tip

PEP-8 (Python Enhancement Proposal 8) is the Python style guide. It outlines good practices for indentation and gives advice on how to name variables, continue lines, and format your code so it is readable, easy to use, and easy to share.

If your code is improperly indented and Python cannot parse your file, you’ll get an IndentationError. The error message shows you which line is improperly indented. It’s also fairly easy to set up a Python linter in your favorite text editor to check your code automatically as you work. For example, a nice PEP-8 linter is available for Atom.
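To see this error without creating a file, you can hand a badly indented snippet to the built-in compile function, which raises the same IndentationError Python raises when it loads a script. This is a minimal sketch; the snippet and the check_indentation helper are hypothetical, made up for illustration:

```python
# A deliberately mis-indented snippet: the return line uses 3 spaces
# instead of the 4 used by the line above it.
BAD_SOURCE = (
    "def greet(name):\n"
    "    message = 'Hello, ' + name\n"
    "   return message\n"
)


def check_indentation(source):
    """Return None if source compiles, else a (line_number, message) tuple."""
    try:
        compile(source, "<example>", "exec")
    except IndentationError as err:
        # err.lineno points at the offending line; err.msg describes the problem
        return err.lineno, err.msg
    return None


print(check_indentation(BAD_SOURCE))
print(check_indentation("x = 1\n"))
```

A linter does essentially this (and much more) every time you save, which is why it catches indentation mistakes before you ever run the script.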

The Dreaded GIL

The Global Interpreter Lock (GIL) is a mechanism in the standard Python interpreter (CPython) that allows only one thread to execute Python bytecode at a time. This means that when you run a multithreaded Python script, even on a multicore machine, your threads take turns instead of running Python code simultaneously. This design decision keeps CPython’s memory management, and the C extensions built on it, simple, fast, and thread-safe.

The constraint the GIL puts on Python means that with the standard interpreter, Python threads never truly run in parallel. This is mainly a disadvantage for CPU-bound programs that rely heavily on multithreading, and it can also hurt responsiveness when I/O-bound threads compete with CPU-bound ones.1 There are some Python libraries that circumvent these issues by using multiprocessing or asynchronous services,2 but they don’t change the fact that the GIL still exists.
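A thread releases the GIL while it waits on blocking I/O, which is why threads still help I/O-bound programs. The sketch below assumes time.sleep is an acceptable stand-in for a slow network call (fake_download is a hypothetical function; real I/O behaves similarly but less predictably): five simulated 0.2-second requests run in a thread pool finish in roughly 0.2 seconds rather than 1 second. For CPU-bound work, the usual workaround is multiprocessing, for example concurrent.futures.ProcessPoolExecutor, which runs separate interpreter processes, each with its own GIL.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_download(url):
    """Hypothetical stand-in for a network request: sleeps instead of doing I/O."""
    time.sleep(0.2)  # sleeping, like waiting on a socket, releases the GIL
    return url.upper()


def fetch_all(urls, workers=5):
    """Run fake_download for each URL in a thread pool; return results and elapsed seconds."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(fake_download, urls))
    return results, time.perf_counter() - start


urls = ["page-%d" % i for i in range(5)]
results, elapsed = fetch_all(urls)
print(results)
print("took %.2f seconds" % elapsed)
```

If fake_download did pure computation instead of sleeping, the same thread pool would give little or no speedup, because only one thread at a time could hold the GIL.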

That said, many Python core developers are aware of both the problems and the benefits of the GIL. There are often good workarounds for situations where the GIL is a pain point, and depending on your needs, alternative interpreters are available, some of them written in languages other than C (Jython, for example, is written in Java and has no GIL). If you find the GIL is becoming a problem for your code, you can likely either rearchitect your code or use a different language or runtime (e.g., Node.js) to meet your needs.

= Versus == Versus is, and When to Just Copy

In Python, there are some serious distinctions between seemingly similar operators. We know some of these already, but let’s review with some code and output (using IPython):

In [1]: a = 1  # (1)

In [2]: 1 == 1  # (2)
Out[2]: True

In [3]: 1 is 1  # (3)
Out[3]: True

In [4]: a is 1  # (4)
Out[4]: True

In [5]: b = []

In [6]: [] == []
Out[6]: True

In [7]: [] is []
Out[7]: False

In [8]: b is []
Out[8]: False

(1) Sets variable a equal to 1
(2) Tests if 1 is equal to 1
(3) Tests if 1 is the same object as 1
(4) Tests if a is the same object as 1

If you execute these lines in IPython, you will notice some interesting and possibly unexpected results. With integers, == and is both return True here. That is because CPython caches small integers, so every 1 in your program refers to the same object. This is an implementation detail you should not rely on, and recent Python versions even emit a SyntaxWarning when you use is with a literal. With lists, however, is behaves differently from ==: two empty lists are equal, but each [] creates a new list object, so they are not the same object. In Python, memory management operates differently than in some other languages. There’s a great writeup with visualizations on Sreejith Kesavan’s blog about how Python manages objects in memory.
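Because small-integer caching is an implementation detail, the cases worth internalizing are the ones with lists. This short sketch (describe is a hypothetical helper written only for illustration) contrasts equality and identity, and shows the one place where is is the idiomatic choice: comparing with None, which is a singleton.

```python
def describe(x, y):
    """Report whether two objects are equal (==) and whether they are identical (is)."""
    return {"equal": x == y, "same_object": x is y}


first = [1, 2, 3]
second = [1, 2, 3]   # a different list that happens to hold the same values
alias = first        # another name for the very same list

print(describe(first, second))  # equal, but not the same object
print(describe(first, alias))   # equal and the same object

# Checking for None is where `is` is the right tool: there is only one None.
result = None
print(result is None)
```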

To see this from another perspective, let’s take a look at where the object’s memory is held:

In [9]: a = 1

In [10]: id(a)
Out[10]: 14119256

In [11]: b = a  # (1)

In [12]: id(b)  # (2)
Out[12]: 14119256

In [13]: a = 2

In [14]: id(a)  # (3)
Out[14]: 14119232

In [15]: c = []

In [16]: id(c)
Out[16]: 140491313323544

In [17]: b = c

In [18]: id(b)  # (4)
Out[18]: 140491313323544

In [19]: c.append(45)

In [20]: id(c)  # (5)
Out[20]: 140491313323544

(1) Sets b equal to a.
(2) When we test the id here, we find that b and a hold the same place in memory; that is, they are the same object.
(3) When we test the id here, we find that a now refers to a different place in memory, which holds the value 2.
(4) With a list, we see the same id after we assign the list to another variable: b and c are the same object.
(5) When we change the list, its place in memory does not change. Lists are mutable, so they change in place; integers and strings are immutable, so “changing” one actually produces a new object.

What we want to take away from this is not a deep understanding of memory allocation in Python, but that an assignment might not do what we think it does. When we assign a list or dictionary to a new variable, we don’t create a new object: both variables refer to the same object in memory, so if we alter one, we alter the other. If we want to alter only one of them, or if we need a new object that starts out as a copy of an existing one, we need to make a copy, for example with the copy method.

Let’s look at one final example to explain copy versus assignment:

In [21]: a = {}

In [22]: id(a)
Out[22]: 140491293143120

In [23]: b = a

In [24]: id(b)
Out[24]: 140491293143120

In [25]: a['test'] = 1

In [26]: b  # (1)
Out[26]: {'test': 1}

In [27]: c = b.copy()  # (2)

In [28]: id(c)  # (3)
Out[28]: 140491293140144

In [29]: c['test_2'] = 2

In [30]: c  # (4)
Out[30]: {'test': 1, 'test_2': 2}

In [31]: b  # (5)
Out[31]: {'test': 1}

(1) With this line, we see that when we modify a we also modify b, as they are stored in the same place in memory.
(2) Using copy, we create a new variable, c, which is a copy of the first dictionary.
(3) With this line, we see that copy created a new object. It has a new id.
(4) After we modify c, we see it now holds two keys and values.
(5) Even after c is modified, we see that b remains the same.

With this last example, it should be clear that if you actually want a copy of a dictionary or list, you need to make one explicitly, for example with the copy method (lists gained a copy method in Python 3.3; in earlier versions you can use my_list[:] or list(my_list)). If you want the same object, then you can use =. Likewise, if you want to test whether two objects are equal, use ==, but if you want to see whether they are the same object, use is.
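One caveat worth knowing: dict.copy (and list.copy) makes a shallow copy. The new container holds references to the same nested objects, so a list stored inside a copied dictionary is still shared. The standard library’s copy.deepcopy copies the nested objects too. A small sketch with made-up data:

```python
import copy

original = {"name": "survey", "answers": [1, 2]}

shallow = original.copy()        # new outer dict, but the inner list is shared
deep = copy.deepcopy(original)   # new outer dict AND new copies of everything inside

original["answers"].append(3)    # mutate the nested list through the original

print(shallow["answers"])  # the shallow copy sees the change
print(deep["answers"])     # the deep copy does not

# For lists, slicing (or list()) also makes a shallow copy
numbers = [1, 2, 3]
numbers_copy = numbers[:]
print(numbers_copy == numbers, numbers_copy is numbers)
```

Reach for deepcopy only when you actually need independent nested data; for flat lists and dictionaries, a shallow copy is cheaper and sufficient.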

Default Function Arguments

Sometimes you will want to give the parameters of your Python functions and methods default values. To do so safely, you need to understand when and how Python evaluates those defaults. Let’s take a look:

def add_one(default_list=[]):
    default_list.append(1)
    return default_list

Now let’s investigate with IPython:

In [2]: add_one()
Out[2]: [1]

In [3]: add_one()
Out[3]: [1, 1]

You might have expected each function call to return a new list with only one item, 1. Instead, both calls modified the same list object. This happens because a default value is evaluated only once, when the def statement runs (usually when the script or module is first loaded), and that same object is reused by every call that doesn’t pass its own list. If you want a new list every time, you can rewrite the function like so:

def add_one(default_list=None):
    if default_list is None:
        default_list = []
    default_list.append(1)
    return default_list

Now our code behaves as we would expect:

In [6]: add_one()
Out[6]: [1]

In [7]: add_one()
Out[7]: [1]

In [8]: add_one(default_list=[3])
Out[8]: [3, 1]
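You can see where the shared list lives: Python stores default values on the function object, in its __defaults__ attribute. This sketch renames the two versions of add_one (add_one_shared and add_one_fresh are hypothetical names) so both can exist side by side:

```python
def add_one_shared(default_list=[]):
    """The problematic version: the default list is created once, at def time."""
    default_list.append(1)
    return default_list


def add_one_fresh(default_list=None):
    """The fixed version: a new list is created on every call that needs one."""
    if default_list is None:
        default_list = []
    default_list.append(1)
    return default_list


add_one_shared()
add_one_shared()
# The default value is stored on the function itself, so the mutation persists
print(add_one_shared.__defaults__)

add_one_fresh()
add_one_fresh()
# Here the stored default is None, which never changes
print(add_one_fresh.__defaults__)
```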

Now that you understand a bit about memory management and default variables, you can use your knowledge to determine when to test and set variables in your functions and executable code. With a deeper understanding of how and when Python defines objects, we can ensure these types of “gotchas” don’t end up adding bugs into our code.

Python Scope and Built-Ins: The Importance of Variable Names

In Python, scope operates slightly differently than you might expect. If you define a variable in the scope of a function, that variable is not known outside of the function. Let’s take a look:

In [10]: def foo():
   ....:     x = "test"

In [11]: x
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-94-009520053b00> in <module>()
----> 1 x
NameError: name 'x' is not defined

However, if we have already defined x outside the function, calling foo does not change it, because the x assigned inside foo is a separate, local variable:

In [12]: x = 1

In [13]: foo()

In [14]: x
Out[14]: 1
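If you really do want a function to change a module-level variable, you have two options: declare the name global inside the function or, usually clearer, return the new value and let the caller assign it. A sketch with hypothetical function names:

```python
counter = 0


def increment_global():
    """Rebinds the module-level counter; without `global`, counter would be local."""
    global counter
    counter += 1


def incremented(value):
    """Preferred style: take the value in and hand the new value back."""
    return value + 1


increment_global()
increment_global()
print(counter)

counter = incremented(counter)
print(counter)
```

The second style keeps the function’s effect visible at the call site, which makes scope surprises much less likely.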

The same rules about names apply to built-in functions and to anything you import. If you accidentally reuse such a name, you can no longer reach the original through that name. So, if you assign something to list (a built-in) or to date (imported from datetime), the original objects will no longer work under those names for the rest of your code:

In [17]: from datetime import date

In [19]: date(2015, 2, 5)
Out[19]: datetime.date(2015, 2, 5)

In [20]: date = 'my date obj'

In [21]: date(2015, 2, 5)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-105-7f129d4341d0> in <module>()
----> 1 date(2015, 2, 5)

TypeError: 'str' object is not callable

As you can see, using variable names that clash with built-ins, with the standard library, or with anything else you have imported can be a debugging nightmare. If you choose specific, descriptive names and stay aware of the variable and module names already in use, you won’t end up spending hours debugging namespace issues.
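If you do shadow a built-in at the module level, deleting your name un-shadows it, because name lookup falls back to the built-ins once the module-level name is gone. The builtins module (Python 3; it was called __builtin__ in Python 2) always holds the originals. A minimal sketch:

```python
import builtins

# Shadowing a built-in: after this line, `list` refers to our data, not the type
list = ["a", "b"]

try:
    list("xyz")
except TypeError as err:
    print("Oops:", err)

# The original is still reachable through the builtins module
print(builtins.list("xyz"))

# Deleting the module-level name makes `list` resolve to the built-in again
del list
print(list("xyz"))
print(builtins.list is list)
```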

Defining Objects Versus Modifying Objects

Defining a new object works differently from modifying an existing object in Python. Let’s say you have a function that adds one to an integer:

def add_one_int():
    x += 1
    return x

If you try to run that function, you should receive an UnboundLocalError. In older Python versions the message reads local variable 'x' referenced before assignment; newer versions word it differently. However, if you define x in your function, you’ll see a different result:

def add_one_int():
    x = 0
    x += 1
    return x

This code is a bit convoluted (why can’t we just return 1?), but the takeaway is that we must assign a value to a variable before we modify it, even when the modification is an augmented assignment like +=, which reads the current value before storing the new one. Because add_one_int assigns to x, Python treats x as a local variable throughout the function, so even a global x defined elsewhere would not be used. It’s especially important to keep this in mind when working with objects like lists and dictionaries (where we know modifying an object affects every variable that refers to it).
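The sketch below shows that a global x does not rescue the broken version, and that passing the value in as a parameter does. Both function names are hypothetical, chosen for illustration:

```python
x = 10  # a module-level (global) x


def add_one_broken():
    # Because x is assigned in this function, Python treats x as local
    # everywhere in it, so reading x in `x += 1` fails.
    x += 1
    return x


def add_one_to(value):
    """Fixed: the value to modify is passed in explicitly."""
    value += 1
    return value


try:
    add_one_broken()
except UnboundLocalError as err:
    print("UnboundLocalError:", err)

print(add_one_to(x))  # the global x is read, not changed
print(x)
```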

The important thing to remember is to always be clear and concise about when you intend to modify an object and when you want to create or return a new object. How you name variables and how you write and implement functions is key to writing scripts that are clear and behave predictably.

Changing Immutable Objects

When you want to modify or change immutable objects, you’ll need to create new objects. Python will not allow you to modify immutable objects, like tuples, in place. As you learned when we discussed Python memory management, several variables can refer to the same object. An operation that seems to change an immutable object actually builds a new object and reassigns the name to it. Let’s take a look:

In [1]: my_tuple = (1,)

In [2]: new_tuple = my_tuple

In [3]: my_tuple
Out[3]: (1,)

In [4]: new_tuple
Out[4]: (1,)

In [5]: my_tuple += (4, 5)

In [6]: new_tuple
Out[6]: (1,)

In [7]: my_tuple
Out[7]: (1, 4, 5)

Here we used the += operator on the original tuple, and it appeared to work: my_tuple now holds (1, 4, 5). What actually happened, however, is that += created a new tuple containing the original values plus the tuple we appended, (4, 5), and rebound my_tuple to it. The new_tuple variable did not change, because it still refers to the original tuple. If you were to look at id(my_tuple) before and after the +=, you would see that it changed.

The main thing to remember about immutable objects is that they never change in place: any “modification” creates a completely new object in a new place in memory. This is especially important to remember if you are using methods or attributes of a class with immutable objects, as you want to be sure you understand when you are modifying an object and when you are creating a new one.
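The contrast is easiest to see side by side: for a mutable list, += extends the same object in place, so every name that refers to it sees the change. A short sketch mirroring the IPython session above:

```python
my_tuple = (1,)
same_tuple = my_tuple           # second name for the original tuple
my_tuple += (4, 5)              # builds a NEW tuple and rebinds my_tuple to it
print(my_tuple is same_tuple)   # False: two different objects now
print(same_tuple)               # the original tuple is untouched

my_list = [1]
same_list = my_list             # second name for the original list
my_list += [4, 5]               # extends the SAME list in place
print(my_list is same_list)     # True: still one object
print(same_list)                # the change shows through both names
```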

Type Checking

Python makes type conversion easy, meaning you can change strings to integers or lists to tuples, and so on. Python is also dynamically typed: it doesn’t check what type of object a function receives until the code actually uses it. That flexibility means issues can arise, especially in large code bases or when you are using new libraries. A common issue is that a particular function, class, or method expects to see a certain type of object, and you mistakenly pass it the wrong type.

This becomes increasingly problematic as your code becomes more advanced and complex. As your code becomes more abstract, more of your objects will be passed around in variables. If a function or method returns an unexpected type (say, None instead of a list), that object may be passed along to another function, possibly one that doesn’t accept None and then throws an error. Maybe that error is even caught, but the code assumes the exception was caused by another problem and continues. It can very quickly get out of hand and become quite a mess to debug.

The best advice for handling these issues is to write very concise and clear code. You should ensure your functions always return what is expected by actively testing your code (to ensure there are no bugs) and keeping an eye on your scripts and any odd behavior. You should also add logging to help determine what your objects contain. In addition, being very clear about what exceptions you catch and not just catching all exceptions will help make these issues easier to find and fix.
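One way to apply this advice is to check the type you expect at the boundary of a function, raise a specific exception when it’s wrong, and log what the object actually contained. This is a sketch; the average function and its error messages are hypothetical, not from any library:

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


def average(values):
    """Return the mean of a non-empty list or tuple of numbers."""
    if not isinstance(values, (list, tuple)):
        # Fail loudly and specifically instead of letting a None travel further
        raise TypeError("expected a list or tuple, got %s" % type(values).__name__)
    if not values:
        raise ValueError("cannot average an empty sequence")
    return sum(values) / len(values)


for data in ([2, 4, 6], None, []):
    try:
        logger.info("average(%r) = %s", data, average(data))
    except (TypeError, ValueError) as err:
        # Catch only the errors we expect; anything else still surfaces
        logger.warning("skipping %r: %s", data, err)
```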

Finally, PEP-484 introduced type hints, which let you annotate the types a function expects and returns. It was accepted for Python 3.5, which added the typing module. Python itself does not enforce hints at runtime, but third-party checkers such as mypy can use them to catch these issues before your code runs, so you can expect to see more structure around type checking in Python code over time.
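A small sketch of what hints look like (first_word is a hypothetical function; this requires Python 3.5 or later). Note that the mistyped call still runs, since enforcement is left to external tools:

```python
from typing import List, Optional


def first_word(words: List[str]) -> Optional[str]:
    """Return the first word, or None if the list is empty."""
    return words[0] if words else None


print(first_word(["data", "wrangling"]))
print(first_word([]))

# Hints are not enforced at runtime: this call still runs and returns 1.
# A static checker such as mypy would flag it before the code ever runs.
print(first_word([1, 2]))

# The hints are stored on the function for tools to inspect
print(first_word.__annotations__)
```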

Catching Multiple Exceptions

As your code advances, you might want to catch more than one exception in the same except clause. For example, you might want to catch a TypeError along with an AttributeError. This might be the case if you believe you are passing a dictionary but you are actually passing a list: it might have some of the same attributes, but not all. If you need to catch more than one type of error in a single clause, you must write the exceptions as a tuple. Let’s take a look:

my_dict = {'foo': {}, 'bar': None, 'baz': []}

for k, v in my_dict.items():
    try:
        v.items()
    except (TypeError, AttributeError) as e:
        print("We had an issue!")
        print(e)

You should see the following output (possibly in a different order):

We had an issue!
'list' object has no attribute 'items'
We had an issue!
'NoneType' object has no attribute 'items'

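Our except clause successfully caught both errors and executed its block. As you can see, being aware of the types of errors you might need to catch and understanding the syntax (putting them in a tuple) is essential. If you put them in a list instead, Python 3 raises a TypeError when it tries to match the exception, because only exception classes or tuples of them are allowed. If you simply separate them with commas and no parentheses, Python 2 treats the second name as the variable to store the exception in, so only the first exception is caught. Writing them as a parenthesized tuple works in every version.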
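When the two errors call for different handling, you can instead write one except clause per exception type; Python tries them in order and runs the first one that matches. The describe_items function below is a hypothetical example and assumes Python 3, where keys of mixed types can’t be sorted against each other:

```python
def describe_items(value):
    """Try to return value's items sorted; report what went wrong instead of crashing."""
    try:
        return sorted(value.items())
    except AttributeError as err:
        # Raised when the object has no .items() method (lists, None, ...)
        return "no items(): %s" % err
    except TypeError as err:
        # Raised, e.g., when the items can't be compared with each other
        return "unsortable: %s" % err


print(describe_items({"b": 2, "a": 1}))
print(describe_items([1, 2]))
print(describe_items(None))
print(describe_items({1: "a", "b": 2}))
```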

The Power of Debugging

As you become a more advanced developer and data wrangler, you will come across many more issues and errors to debug. We wish we could tell you it gets easier, but it’s likely your debugging will become a bit more intense and rigorous before it becomes easier. This is because you will be working with more advanced code and libraries, and tackling more difficult problems.

That said, you have many skills and tools at your disposal to help you get unstuck. You can execute code in IPython to get more feedback during development. You can add logging to your scripts to better understand what is happening. You can have your web scrapers take screenshots and save them to files if you are having issues parsing a page. You can share your code with others in an IPython notebook or on many helpful sites to get feedback.

There are also some great tools for debugging with Python, including pdb, which allows you to step through your code (or other code in the module) and see exactly what each object holds immediately before and after any errors. There’s a great, quick introduction to pdb on YouTube, showing some ways to use pdb in your code.
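Besides running a whole script under the debugger with python -m pdb your_script.py, or pausing at a line with breakpoint() (Python 3.7 and later), you can open pdb only when something goes wrong. This sketch uses pdb.post_mortem inside an except block; safe_ratio and its debug flag are hypothetical, and the flag defaults to False so the function never blocks waiting for input unless you ask it to:

```python
import pdb


def safe_ratio(numerator, denominator, debug=False):
    """Divide two numbers; optionally open pdb at the point of failure."""
    try:
        return numerator / denominator
    except ZeroDivisionError:
        if debug:
            # Opens an interactive pdb session in the frame that raised, so you
            # can inspect variables (e.g., `p denominator`) before continuing
            pdb.post_mortem()
        raise  # re-raise so the caller still sees the error


print(safe_ratio(6, 3))
```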

Additionally, you should be reading and writing both documentation and tests. We’ve covered some basics in this book, but we highly recommend you use this as a starting point and investigate both documentation and testing further. Ned Batchelder’s recent PyCon talk on getting started with testing is a great place to begin. Jacob Kaplan-Moss also gave a great talk on getting started with documentation at PyCon 2011. Reading and writing documentation helps you avoid errors that come from misunderstanding how your code (or someone else’s) works, and writing and running tests helps you catch the errors you would otherwise miss.
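As a starting point, here is a minimal test sketch using the standard library’s unittest module. It checks the fixed add_one from the Default Function Arguments section, the kind of gotcha a test catches immediately. Saved as a file such as test_add_one.py (a hypothetical name), it can be run with python -m unittest test_add_one:

```python
import unittest


def add_one(default_list=None):
    """The fixed add_one from the Default Function Arguments section."""
    if default_list is None:
        default_list = []
    default_list.append(1)
    return default_list


class AddOneTests(unittest.TestCase):
    def test_fresh_list_each_call(self):
        self.assertEqual(add_one(), [1])
        # With a mutable default argument, this second call would return [1, 1]
        self.assertEqual(add_one(), [1])

    def test_appends_to_given_list(self):
        self.assertEqual(add_one([3]), [3, 1])
```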

We hope this book is a good first introduction to these concepts, but we encourage you to continue your reading and development by seeking out more Python learning and continuing to excel as a Python developer.

1 For some further reading on how the GIL performs with some visualization, check out “A Zoomable Interactive Python Thread Visualization” by David Beazley.

2 For some great reading on what these packages do, check out Jeff Knupp’s writeup on how to go about alleviating GIL issues.