In Chapter 2 we looked at the basics of using the IPython shell and Jupyter notebooks. In this chapter, we explore some deeper functionality in the IPython system that can be used either from the console or within Jupyter.
IPython maintains a small on-disk database containing the text of each command that you execute. This serves various purposes:
Searching, completing, and executing previously executed commands with minimal typing
Persisting the command history between sessions
Logging the input/output history to a file
These features are more useful in the shell than in the notebook, since the notebook by design keeps a log of the input and output in each code cell.
The IPython shell lets you search and execute previous code or other commands.
This is useful, as you may often find yourself repeating the same
commands, such as a %run command or some
other code snippet. Suppose you had run:
In [7]: %run first/second/third/data_script.py
and then explored the results of the script (assuming it ran
successfully) only to find that you made an incorrect calculation. After
figuring out the problem and modifying
data_script.py, you can start typing a few letters
of the %run command and then press either
the Ctrl-P key combination or the up arrow key. This will
search the command history for the first prior command matching the
letters you typed. Pressing either Ctrl-P or the up arrow key multiple
times will continue to search through the history. If you pass over the
command you wish to execute, fear not. You can move
forward through the command history by pressing
either Ctrl-N or the down arrow key. After doing this a few times,
you may start pressing these keys without thinking!
Using Ctrl-R gives you the same partial incremental searching capability
provided by the readline library used in
Unix-style shells, such as the bash shell. On Windows, readline functionality is emulated by IPython. To use this, press Ctrl-R and then
type a few characters contained in the input line you want to search
for:
In [1]: a_command = foo(x, y, z)

(reverse-i-search)`com': a_command = foo(x, y, z)
Pressing Ctrl-R will cycle through the history for each line matching the characters you’ve typed.
Forgetting to assign the result of a function call to a variable can be very
annoying. An IPython session stores references to
both the input commands and output Python objects
in special variables. The previous two outputs are stored in the _ (one underscore)
and __ (two underscores) variables,
respectively:
In [24]: 2 ** 27
Out[24]: 134217728

In [25]: _
Out[25]: 134217728
Input variables are stored in variables named like _iX, where X is the input line number. For each input
variable there is a corresponding output variable _X. So after input line 27, say, there will be
two new variables _27 (for the
output) and _i27 for the
input:
In [26]: foo = 'bar'

In [27]: foo
Out[27]: 'bar'

In [28]: _i27
Out[28]: 'foo'

In [29]: _27
Out[29]: 'bar'
Since the input variables are strings, they can be executed again with the Python exec function:
In [30]: exec(_i27)
Here _i27 refers to the code input in
In [27].
Several magic functions allow you to work with the input and
output history. %hist is capable of
printing all or part of the input history, with or without line numbers.
%reset is for clearing the
interactive namespace and optionally the input and output caches. The
%xdel magic function is intended for
removing all references to a particular object from
the IPython machinery. See the documentation for these magics for more details.
When working with very large datasets, keep in mind that IPython's input and output history prevents any object referenced there from being garbage-collected (which would free the memory), even if you delete the variables from the interactive namespace using the del keyword. In such cases, careful usage of %xdel and %reset can help you avoid running into memory problems.
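To make this concrete, here is a short sketch of how these magics might be used together in a session; the exact output formatting of %hist varies across IPython versions:

In [1]: big = list(range(1000000))

In [2]: %hist -n
   1: big = list(range(1000000))
   2: %hist -n

In [3]: %xdel big

In [4]: %reset -f

Here %hist -n prints the input history with line numbers, %xdel big removes IPython's cached references to the object bound to big, and %reset -f clears the interactive namespace without prompting for confirmation.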
Another feature of IPython is that it allows you to seamlessly access the filesystem and operating system shell. This means, among other things, that you can perform most standard command-line actions as you would in the Windows or Unix (Linux, macOS) shell without having to exit IPython. This includes shell commands, changing directories, and storing the results of a command in a Python object (list or string). There are also simple command aliasing and directory bookmarking features.
See Table B-1 for a summary of magic functions and syntax for calling shell commands. I’ll briefly visit these features in the next few sections.
Starting a line in IPython with an exclamation point !, or bang, tells IPython to execute
everything after the bang in the system shell. This means that you can
delete files (using rm or del, depending on your OS), change
directories, or execute any other process.
You can store the console output of a shell command in a
variable by assigning the expression escaped with
! to a variable. For example, on my Linux-based machine, I can get my IP address as a Python variable:
In [1]: ip_info = !ifconfig wlan0 | grep "inet "

In [2]: ip_info[0].strip()
Out[2]: 'inet addr:10.0.0.11 Bcast:10.0.0.255 Mask:255.255.255.0'
The returned Python object ip_info is actually a custom list type
containing various versions of the console output.
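Concretely, the object is IPython's SList type (IPython.utils.text.SList in recent versions, though the exact module location is an implementation detail); a brief sketch of a few of its conveniences:

In [3]: type(ip_info)
Out[3]: IPython.utils.text.SList

In [4]: ip_info.s      # the lines joined into a single space-separated string

In [5]: ip_info.n      # the lines joined with newlines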
IPython can also substitute in Python values defined in the
current environment when using !. To
do this, preface the variable name with the dollar sign $:
In [3]: foo = 'test*'

In [4]: !ls $foo
test4.py  test.py  test.xml
The %alias magic function can define custom shortcuts for shell commands.
As a simple example:
In [1]: %alias ll ls -l

In [2]: ll /usr
total 332
drwxr-xr-x   2 root root  69632 2012-01-29 20:36 bin/
drwxr-xr-x   2 root root   4096 2010-08-23 12:05 games/
drwxr-xr-x 123 root root  20480 2011-12-26 18:08 include/
drwxr-xr-x 265 root root 126976 2012-01-29 20:36 lib/
drwxr-xr-x  44 root root  69632 2011-12-26 18:08 lib32/
lrwxrwxrwx   1 root root      3 2010-08-23 16:02 lib64 -> lib/
drwxr-xr-x  15 root root   4096 2011-10-13 19:03 local/
drwxr-xr-x   2 root root  12288 2012-01-12 09:32 sbin/
drwxr-xr-x 387 root root  12288 2011-11-04 22:53 share/
drwxrwsr-x  24 root src    4096 2011-07-17 18:38 src/
You can execute multiple commands just as on the command line by separating them with semicolons:
In [558]: %alias test_alias (cd examples; ls; cd ..)

In [559]: test_alias
macrodata.csv  spx.csv  tips.csv
You’ll notice that IPython “forgets” any aliases you define interactively as soon as the session is closed. To create permanent aliases, you will need to use the configuration system.
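As a sketch, one way to persist an alias is via the AliasManager options in your profile's ipython_config.py (the configuration system is described later in this chapter):

# in your profile's ipython_config.py; c is provided by get_config()
c = get_config()

# register ll as a permanent alias for ls -l in every session
c.AliasManager.user_aliases = [('ll', 'ls -l')]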
IPython has a simple directory bookmarking system to enable you to save aliases for common directories so that you can jump around very easily. For example, suppose you wanted to create a bookmark that points to the supplementary materials for this book:
In [6]: %bookmark py4da /home/wesm/code/pydata-book
Once you've done this, you can use the %cd magic with any bookmarks you've defined:
In [7]: cd py4da
(bookmark:py4da) -> /home/wesm/code/pydata-book
/home/wesm/code/pydata-book
If a bookmark name conflicts with a directory name in your current
working directory, you can use the -b
flag to override and use the bookmark location. Using the -l option with %bookmark lists
all of your bookmarks:
In [8]: %bookmark -l
Current bookmarks:
py4da -> /home/wesm/code/pydata-book-source
Bookmarks, unlike aliases, are automatically persisted between IPython sessions.
In addition to being a comfortable environment for interactive computing and data
exploration, IPython can also be a useful companion for general Python
software development. In data analysis applications, it’s important first
to have correct code. Fortunately, IPython has
closely integrated and enhanced the built-in Python pdb debugger. Second, you want your code to be fast. For this, IPython has easy-to-use code timing and profiling tools. I will give an overview of these tools here.
IPython’s debugger enhances pdb with tab
completion, syntax highlighting, and context for each line in exception
tracebacks. One of the best times to debug code is right after an error
has occurred. The %debug command,
when entered immediately after an exception, invokes the “post-mortem”
debugger and drops you into the stack frame where the exception was
raised:
In [2]: run examples/ipython_bug.py
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
/home/wesm/code/pydata-book/examples/ipython_bug.py in <module>()
     13     throws_an_exception()
     14
---> 15 calling_things()

/home/wesm/code/pydata-book/examples/ipython_bug.py in calling_things()
     11 def calling_things():
     12     works_fine()
---> 13     throws_an_exception()
     14
     15 calling_things()

/home/wesm/code/pydata-book/examples/ipython_bug.py in throws_an_exception()
      7     a = 5
      8     b = 6
----> 9     assert(a + b == 10)
     10
     11 def calling_things():

AssertionError:

In [3]: %debug
> /home/wesm/code/pydata-book/examples/ipython_bug.py(9)throws_an_exception()
      8     b = 6
----> 9     assert(a + b == 10)
     10

ipdb>
Once inside the debugger, you can execute arbitrary Python code
and explore all of the objects and data (which have been “kept alive” by
the interpreter) inside each stack frame. By default you start in the
lowest level, where the error occurred. By pressing u (up) and d (down), you can switch between the levels of
the stack trace:
ipdb> u
> /home/wesm/code/pydata-book/examples/ipython_bug.py(13)calling_things()
     12     works_fine()
---> 13     throws_an_exception()
     14
Executing the %pdb command makes IPython automatically invoke the debugger after any exception, a mode that many users will find especially useful.
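For example (the confirmation message below is what recent IPython versions print):

In [4]: %pdb
Automatic pdb calling has been turned ON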
It’s also easy to use the debugger to help develop code,
especially when you wish to set breakpoints or step through the
execution of a function or script to examine the state at each stage.
There are several ways to accomplish this. The first is by using %run with the
-d flag, which invokes the debugger
before executing any code in the passed script. You must immediately
press s (step) to enter the
script:
In [5]: run -d examples/ipython_bug.py
Breakpoint 1 at /home/wesm/code/pydata-book/examples/ipython_bug.py:1
NOTE: Enter 'c' at the ipdb> prompt to start your script.
> <string>(1)<module>()

ipdb> s
--Call--
> /home/wesm/code/pydata-book/examples/ipython_bug.py(1)<module>()
1---> 1 def works_fine():
      2     a = 5
      3     b = 6
After this point, it’s up to you how you want to work your way
through the file. For example, in the preceding exception, we could set
a breakpoint right before calling the works_fine function and run the script until we
reach the breakpoint by pressing c
(continue):
ipdb> b 12
ipdb> c
> /home/wesm/code/pydata-book/examples/ipython_bug.py(12)calling_things()
     11 def calling_things():
2--> 12     works_fine()
     13     throws_an_exception()
At this point, you can step
into works_fine() or execute works_fine() by pressing n (next) to advance to the next line:
ipdb> n
> /home/wesm/code/pydata-book/examples/ipython_bug.py(13)calling_things()
2    12     works_fine()
---> 13     throws_an_exception()
     14
Then, we could step into throws_an_exception and advance to the line
where the error occurs and look at the variables in the scope. Note that
debugger commands take precedence over variable names; in such cases,
preface the variables with ! to
examine their contents:
ipdb> s
--Call--
> /home/wesm/code/pydata-book/examples/ipython_bug.py(6)throws_an_exception()
      5
----> 6 def throws_an_exception():
      7     a = 5

ipdb> n
> /home/wesm/code/pydata-book/examples/ipython_bug.py(7)throws_an_exception()
      6 def throws_an_exception():
----> 7     a = 5
      8     b = 6

ipdb> n
> /home/wesm/code/pydata-book/examples/ipython_bug.py(8)throws_an_exception()
      7     a = 5
----> 8     b = 6
      9     assert(a + b == 10)

ipdb> n
> /home/wesm/code/pydata-book/examples/ipython_bug.py(9)throws_an_exception()
      8     b = 6
----> 9     assert(a + b == 10)
     10

ipdb> !a
5
ipdb> !b
6
Developing proficiency with the interactive debugger is largely a matter of practice and experience. See Table B-2 for a full catalog of the debugger commands. If you are accustomed to using an IDE, you might find the terminal-driven debugger to be a bit unforgiving at first, but that will improve in time. Some of the Python IDEs have excellent GUI debuggers, so most users can find something that works for them.
There are a couple of other useful ways to invoke the debugger.
The first is by using a special set_trace function (named after pdb.set_trace), which is basically a “poor
man’s breakpoint.” Here are two small recipes you might want to put
somewhere for your general use (potentially adding them to your
IPython profile as I do):
import sys

from IPython.core.debugger import Pdb

def set_trace():
    Pdb(color_scheme='Linux').set_trace(sys._getframe().f_back)

def debug(f, *args, **kwargs):
    pdb = Pdb(color_scheme='Linux')
    return pdb.runcall(f, *args, **kwargs)
The first function, set_trace, is very simple. You can use a
set_trace in any part of your
code that you want to temporarily stop in order to more closely
examine it (e.g., right before an exception occurs):
In [7]: run examples/ipython_bug.py
> /home/wesm/code/pydata-book/examples/ipython_bug.py(16)calling_things()
     15     set_trace()
---> 16     throws_an_exception()
     17
Pressing c (continue) will
cause the code to resume normally with no harm done.
The debug function we just looked at enables you to invoke the interactive
debugger easily on an arbitrary function call. Suppose we had written
a function like the following and we wished to step through its
logic:
def f(x, y, z=1):
    tmp = x + y
    return tmp / z
Ordinarily using f would look
like f(1, 2, z=3). To instead step
into f, pass f as the first argument to debug followed by the positional and keyword
arguments to be passed to f:
In [6]: debug(f, 1, 2, z=3)
> <ipython-input>(2)f()
      1 def f(x, y, z):
----> 2     tmp = x + y
      3     return tmp / z

ipdb>
I find that these two simple recipes save me a lot of time on a day-to-day basis.
Lastly, the debugger can be used in conjunction with %run. By running
a script with %run -d, you will be
dropped directly into the debugger, ready to set any breakpoints and
start the script:
In [1]: %run -d examples/ipython_bug.py
Breakpoint 1 at /home/wesm/code/pydata-book/examples/ipython_bug.py:1
NOTE: Enter 'c' at the ipdb> prompt to start your script.
> <string>(1)<module>()

ipdb>
Adding -b with a line number
starts the debugger with a breakpoint set already:
In [2]: %run -d -b2 examples/ipython_bug.py
Breakpoint 1 at /home/wesm/code/pydata-book/examples/ipython_bug.py:2
NOTE: Enter 'c' at the ipdb> prompt to start your script.
> <string>(1)<module>()

ipdb> c
> /home/wesm/code/pydata-book/examples/ipython_bug.py(2)works_fine()
      1 def works_fine():
1---> 2     a = 5
      3     b = 6

ipdb>
For larger-scale or longer-running data analysis applications, you may wish to measure the execution time of various components or of individual statements or function calls. You may want a report of which functions are taking up the most time in a complex process. Fortunately, IPython enables you to get this information very easily while you are developing and testing your code.
Timing code by hand using the built-in time module and its functions time.time and time.perf_counter is often tedious and repetitive, as you must write the same uninteresting boilerplate code:
import time

start = time.time()
for i in range(iterations):
    pass  # some code to run here
elapsed_per = (time.time() - start) / iterations
Since this is such a common operation, IPython has two magic
functions, %time and
%timeit, to automate this process for you.
%time runs a statement once,
reporting the total execution time. Suppose we had a large list of
strings and we wanted to compare different methods of selecting all
strings starting with a particular prefix. Here is a simple list of
600,000 strings and two functionally equivalent methods of selecting only the ones that start with 'foo':
# a very large list of strings
strings = ['foo', 'foobar', 'baz', 'qux',
           'python', 'Guido Van Rossum'] * 100000

method1 = [x for x in strings if x.startswith('foo')]

method2 = [x for x in strings if x[:3] == 'foo']
It looks like they should be about the same performance-wise,
right? We can check for sure using %time:
In [561]: %time method1 = [x for x in strings if x.startswith('foo')]
CPU times: user 0.19 s, sys: 0.00 s, total: 0.19 s
Wall time: 0.19 s

In [562]: %time method2 = [x for x in strings if x[:3] == 'foo']
CPU times: user 0.09 s, sys: 0.00 s, total: 0.09 s
Wall time: 0.09 s
The Wall time (short for
“wall-clock time”) is the main number of interest. So, it looks like the
first method takes more than twice as long, but it’s not a very precise
measurement. If you try %time-ing
those statements multiple times yourself, you’ll find that the results
are somewhat variable. To get a more precise measurement, use the
%timeit magic function. Given an
arbitrary statement, it has a heuristic to run a statement multiple
times to produce a more accurate average runtime:
In [563]: %timeit [x for x in strings if x.startswith('foo')]
10 loops, best of 3: 159 ms per loop

In [564]: %timeit [x for x in strings if x[:3] == 'foo']
10 loops, best of 3: 59.3 ms per loop
This seemingly innocuous example illustrates that it is worth understanding the performance characteristics of the Python standard library, NumPy, pandas, and other libraries used in this book. In larger-scale data analysis applications, those milliseconds will start to add up!
%timeit is especially useful
for analyzing statements and functions with very short execution times,
even at the level of microseconds (millionths of a second) or
nanoseconds (billionths of a second). These may seem like insignificant
amounts of time, but of course a 20 microsecond function invoked 1
million times takes 15 seconds longer than a 5 microsecond function. In
the preceding example, we could very directly compare the two string
operations to understand their performance characteristics:
In [565]: x = 'foobar'

In [566]: y = 'foo'

In [567]: %timeit x.startswith(y)
1000000 loops, best of 3: 267 ns per loop

In [568]: %timeit x[:3] == y
10000000 loops, best of 3: 147 ns per loop
Profiling code is closely related to timing code, except it is concerned
with determining where time is spent. The main
Python profiling tool is the cProfile
module, which is not specific to IPython at all. cProfile executes a program or any arbitrary
block of code while keeping track of how much time is spent in each
function.
A common way to use cProfile is
on the command line, running an entire program and outputting the
aggregated time per function. Suppose we had a simple script that does
some linear algebra in a loop (computing the maximum absolute
eigenvalues of a series of 100 × 100 matrices):
import numpy as np
from numpy.linalg import eigvals

def run_experiment(niter=100):
    K = 100
    results = []
    for _ in range(niter):
        mat = np.random.randn(K, K)
        max_eigenvalue = np.abs(eigvals(mat)).max()
        results.append(max_eigenvalue)
    return results

some_results = run_experiment()
print('Largest one we saw: %s' % np.max(some_results))
You can run this script through cProfile using the following in the command
line:
python -m cProfile cprof_example.py
If you try that, you’ll find that the output is sorted by function
name. This makes it a bit hard to get an idea of where the most time is
spent, so it’s very common to specify a sort order
using the -s flag:
$ python -m cProfile -s cumulative cprof_example.py
Largest one we saw: 11.923204422
         15116 function calls (14927 primitive calls) in 0.720 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.001    0.001    0.721    0.721 cprof_example.py:1(<module>)
      100    0.003    0.000    0.586    0.006 linalg.py:702(eigvals)
      200    0.572    0.003    0.572    0.003 {numpy.linalg.lapack_lite.dgeev}
        1    0.002    0.002    0.075    0.075 __init__.py:106(<module>)
      100    0.059    0.001    0.059    0.001 {method 'randn'}
        1    0.000    0.000    0.044    0.044 add_newdocs.py:9(<module>)
        2    0.001    0.001    0.037    0.019 __init__.py:1(<module>)
        2    0.003    0.002    0.030    0.015 __init__.py:2(<module>)
        1    0.000    0.000    0.030    0.030 type_check.py:3(<module>)
        1    0.001    0.001    0.021    0.021 __init__.py:15(<module>)
        1    0.013    0.013    0.013    0.013 numeric.py:1(<module>)
        1    0.000    0.000    0.009    0.009 __init__.py:6(<module>)
        1    0.001    0.001    0.008    0.008 __init__.py:45(<module>)
      262    0.005    0.000    0.007    0.000 function_base.py:3178(add_newdoc)
      100    0.003    0.000    0.005    0.000 linalg.py:162(_assertFinite)
      ...
Only the first 15 rows of the output are shown. It’s easiest to
read by scanning down the cumtime
column to see how much total time was spent inside
each function. Note that if a function calls some other function,
the clock does not stop running. cProfile records the start and end time of
each function call and uses that to produce the timing.
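cProfile can also write its raw results to disk with the -o flag, after which you can explore them with the standard library pstats module; a brief sketch (the output filename cprof_results is arbitrary):

# after running: python -m cProfile -o cprof_results cprof_example.py
import pstats

# load the saved profile, sort by cumulative time, and show the top 10 rows
p = pstats.Stats('cprof_results')
p.sort_stats('cumulative').print_stats(10)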
In addition to the command-line usage, cProfile can also be used programmatically to
profile arbitrary blocks of code without having to run a new process.
IPython has a convenient interface to this capability using the %prun command
and the -p option to %run. %prun
takes the same “command-line options” as cProfile but will profile an arbitrary Python
statement instead of a whole .py file:
In [4]: %prun -l 7 -s cumulative run_experiment()
         4203 function calls in 0.643 seconds

   Ordered by: cumulative time
   List reduced from 32 to 7 due to restriction <7>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.643    0.643 <string>:1(<module>)
        1    0.001    0.001    0.643    0.643 cprof_example.py:4(run_experiment)
      100    0.003    0.000    0.583    0.006 linalg.py:702(eigvals)
      200    0.569    0.003    0.569    0.003 {numpy.linalg.lapack_lite.dgeev}
      100    0.058    0.001    0.058    0.001 {method 'randn'}
      100    0.003    0.000    0.005    0.000 linalg.py:162(_assertFinite)
      200    0.002    0.000    0.002    0.000 {method 'all' of 'numpy.ndarray'}
Similarly, calling %run -p -s cumulative
cprof_example.py has the same effect as the command-line
approach, except you never have to leave IPython.
In the Jupyter notebook, you can use the %%prun magic (two
% signs) to profile an entire code block. This pops
up a separate window with the profile output. This can be useful for getting quick answers to questions like, “Why did that code block take so long to run?”
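A notebook cell using %%prun might look something like the following sketch, which re-profiles the earlier experiment (assuming numpy has been imported as np and eigvals from numpy.linalg, as in the earlier script):

%%prun -s cumulative
# the entire cell body is profiled as one block
results = []
for _ in range(100):
    mat = np.random.randn(100, 100)
    results.append(np.abs(eigvals(mat)).max())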
There are other tools available that help make profiles easier to understand when you are using IPython or Jupyter. One of these is SnakeViz, which produces an interactive visualization of the profile results using d3.js.
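After installing SnakeViz from PyPI, usage is roughly as follows (a sketch; see the SnakeViz documentation for details on your version):

In [5]: %load_ext snakeviz

In [6]: %snakeviz run_experiment()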
In some cases the information you obtain from %prun (or another cProfile-based profile method) may not tell
the whole story about a function’s execution time, or it may be so
complex that the results, aggregated by function name, are hard to
interpret. For this case, there is a small library called line_profiler (obtainable via PyPI or one of
the package management tools). It contains an IPython extension enabling
a new magic function %lprun that
computes a line-by-line profiling of one or more functions. You can
enable this extension by modifying your IPython configuration (see the
IPython documentation or the section on configuration later in this
chapter) to include the following line:
# A list of dotted module names of IPython extensions to load.
c.TerminalIPythonApp.extensions = ['line_profiler']
You can also run the command:
%load_ext line_profiler
line_profiler can be used
programmatically (see the full documentation), but it is perhaps most
powerful when used interactively in IPython. Suppose you had a module
prof_mod with the following code
doing some NumPy array operations:
from numpy.random import randn

def add_and_sum(x, y):
    added = x + y
    summed = added.sum(axis=1)
    return summed

def call_function():
    x = randn(1000, 1000)
    y = randn(1000, 1000)
    return add_and_sum(x, y)
If we wanted to understand the performance of the add_and_sum function, %prun gives us the following:
In [569]: %run prof_mod

In [570]: x = randn(3000, 3000)

In [571]: y = randn(3000, 3000)

In [572]: %prun add_and_sum(x, y)
         4 function calls in 0.049 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.036    0.036    0.046    0.046 prof_mod.py:3(add_and_sum)
        1    0.009    0.009    0.009    0.009 {method 'sum' of 'numpy.ndarray'}
        1    0.003    0.003    0.049    0.049 <string>:1(<module>)
This is not especially enlightening. With the line_profiler IPython extension activated, a
new command %lprun is available. The
only difference in usage is that we must instruct %lprun which function or functions we wish to
profile. The general syntax is:
%lprun -f func1 -f func2 statement_to_profile

In this case, we want to profile add_and_sum, so we run:
In [573]: %lprun -f add_and_sum add_and_sum(x, y)
Timer unit: 1e-06 s

File: prof_mod.py
Function: add_and_sum at line 3
Total time: 0.045936 s

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     3                                           def add_and_sum(x, y):
     4         1        36510  36510.0     79.5      added = x + y
     5         1         9425   9425.0     20.5      summed = added.sum(axis=1)
     6         1            1      1.0      0.0      return summed
This can be much easier to interpret. In this case we profiled the
same function we used in the statement. Looking at the preceding module
code, we could call call_function and
profile that as well as add_and_sum,
thus getting a full picture of the performance of the code:
In [574]: %lprun -f add_and_sum -f call_function call_function()
Timer unit: 1e-06 s

File: prof_mod.py
Function: add_and_sum at line 3
Total time: 0.005526 s

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     3                                           def add_and_sum(x, y):
     4         1         4375   4375.0     79.2      added = x + y
     5         1         1149   1149.0     20.8      summed = added.sum(axis=1)
     6         1            2      2.0      0.0      return summed

File: prof_mod.py
Function: call_function at line 8
Total time: 0.121016 s

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     8                                           def call_function():
     9         1        57169  57169.0     47.2      x = randn(1000, 1000)
    10         1        58304  58304.0     48.2      y = randn(1000, 1000)
    11         1         5543   5543.0      4.6      return add_and_sum(x, y)
As a general rule of thumb, I tend to prefer %prun (cProfile) for “macro” profiling and %lprun (line_profiler) for “micro” profiling. It’s
worthwhile to have a good understanding of both tools.
The reason that you must explicitly specify the names of the
functions you want to profile with %lprun is that the overhead of “tracing” the
execution time of each line is substantial. Tracing functions that are
not of interest has the potential to significantly alter the profile
results.
Writing code in a way that makes it easy to develop, debug, and ultimately use interactively may be a paradigm shift for many users. There are procedural details like code reloading that may require some adjustment as well as coding style concerns.
Therefore, implementing most of the strategies described in this section is more of an art than a science and will require some experimentation on your part to determine a way to write your Python code that is effective for you. Ultimately you want to structure your code in a way that makes it easy to use iteratively and to be able to explore the results of running a program or function as effortlessly as possible. I have found software designed with IPython in mind to be easier to work with than code intended only to be run as a standalone command-line application. This becomes especially important when something goes wrong and you have to diagnose an error in code that you or someone else might have written months or years beforehand.
In Python, when you type import
some_lib, the code in some_lib is executed and all the variables,
functions, and imports defined within are stored in the newly created
some_lib module namespace. The next
time you type import some_lib, you
will get a reference to the existing module namespace. The potential
difficulty in interactive IPython code development comes when you, say,
%run a script that depends on some
other module where you may have made changes. Suppose I had the
following code in test_script.py:
import some_lib

x = 5
y = [1, 2, 3, 4]
result = some_lib.get_answer(x, y)
If you were to execute %run
test_script.py then modify some_lib.py,
the next time you execute %run
test_script.py you will still get the old
version of some_lib.py because of
Python’s “load-once” module system. This behavior differs from some
other data analysis environments, like MATLAB, which automatically
propagate code changes.[1] To cope with this, you have a couple of options. The first
way is to use the reload function in
the importlib module in the standard library:
import some_lib
import importlib

importlib.reload(some_lib)
This guarantees that you will get a fresh copy of
some_lib.py every time you run
test_script.py. Obviously, if the dependencies go
deeper, it might be a bit tricky to be inserting usages of reload all over the place. For this problem,
IPython has a special dreload
function (not a magic function) for “deep”
(recursive) reloading of modules. If I run some_lib.py and then type dreload(some_lib), IPython will attempt to reload
some_lib as well as all of its
dependencies. This will not work in all cases, unfortunately, but when
it does it beats having to restart IPython.
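Depending on your IPython version, dreload may not be placed in the interactive namespace automatically; it can be imported from IPython.lib.deepreload, as in this sketch:

# dreload lives here in recent IPython versions
from IPython.lib.deepreload import reload as dreload

import some_lib
dreload(some_lib)  # reload some_lib and, recursively, its dependencies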
There’s no simple recipe for this, but here are some high-level principles I have found effective in my own work.
It’s not unusual to see a program written for the command line with a structure somewhat like the following trivial example:
from my_functions import g

def f(x, y):
    return g(x + y)

def main():
    x = 6
    y = 7.5
    result = x + y

if __name__ == '__main__':
    main()
Do you see what might go wrong if we were to run this program in
IPython? After it’s done, none of the results or objects defined in
the main function will be
accessible in the IPython shell. A better way is to have whatever code
is in main execute directly in the
module’s global namespace (or in the if
__name__ == '__main__': block, if you want the module to
also be importable). That way, when you %run the code, you’ll be able to look at all
of the variables defined in main.
This is equivalent to defining top-level variables in cells in the
Jupyter notebook.
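As a sketch, the earlier script could be restructured like this (my_functions and g are the same hypothetical names as before):

from my_functions import g

def f(x, y):
    return g(x + y)

# executed at the top level of the module, so x, y, and result
# all remain visible in IPython after %run
x = 6
y = 7.5
result = x + y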
Deeply nested code makes me think about the many layers of an onion. When testing or debugging a function, how many layers of the onion must you peel back in order to reach the code of interest? The idea that “flat is better than nested” is a part of the Zen of Python, and it applies generally to developing code for interactive use as well. Making functions and classes as decoupled and modular as possible makes them easier to test (if you are writing unit tests), debug, and use interactively.
If you come from a Java (or another such language) background, you may have been told to keep files short. In many languages, this is sound advice; long length is usually a bad “code smell,” indicating refactoring or reorganization may be necessary. However, while developing code using IPython, working with 10 small but interconnected files (under, say, 100 lines each) is likely to cause you more headaches in general than two or three longer files. Fewer files means fewer modules to reload and less jumping between files while editing, too. I have found maintaining larger modules, each with high internal cohesion, to be much more useful and Pythonic. After iterating toward a solution, it sometimes will make sense to refactor larger files into smaller ones.
Obviously, I don’t support taking this argument to the extreme, which would be to put all of your code in a single monstrous file. Finding a sensible and intuitive module and package structure for a large codebase often takes a bit of work, but it is especially important to get right in teams. Each module should be internally cohesive, and it should be as obvious as possible where to find functions and classes responsible for each area of functionality.
Making full use of the IPython system may lead you to write your code in a slightly different way, or to dig into the configuration.
IPython makes every effort to display a console-friendly string
representation of any object that you inspect. For many objects, like
dicts, lists, and tuples, the built-in pprint module
is used to do the nice formatting. In user-defined classes, however, you
have to generate the desired string output yourself. Suppose we had the
following simple class:
class Message:
    def __init__(self, msg):
        self.msg = msg
If you wrote this, you would be disappointed to discover that the default output for your class isn’t very nice:
In [576]: x = Message('I have a secret')

In [577]: x
Out[577]: <__main__.Message object at 0x60ebbd8>
IPython takes the string returned by the __repr__ magic method (by doing output = repr(obj)) and prints that to the
console. Thus, we can add a simple __repr__ method to the preceding class to get
a more helpful output:
class Message:
    def __init__(self, msg):
        self.msg = msg

    def __repr__(self):
        return 'Message: %s' % self.msg
In [579]: x = Message('I have a secret')

In [580]: x
Out[580]: Message: I have a secret
Most aspects of the appearance (colors, prompt, spacing between lines, etc.) and behavior of the IPython and Jupyter environments are configurable through an extensive configuration system. Here are some things you can do via configuration:
Change the color scheme
Change how the input and output prompts look, or remove the
blank line after Out and before
the next In prompt
Execute an arbitrary list of Python statements (e.g., imports that you use all the time or anything else you want to happen each time you launch IPython)
Enable always-on IPython extensions, like the %lprun magic in line_profiler
Enable Jupyter extensions
Define your own magics or system aliases
Configurations for the IPython shell are specified in special ipython_config.py files, which are usually found in the .ipython/ directory in your user home directory. Configuration is performed based on a particular profile. When you start IPython normally, you load up, by default, the default profile, stored in the profile_default directory. Thus, on my Linux OS the full path to my default IPython configuration file is:
/home/wesm/.ipython/profile_default/ipython_config.py
To initialize this file on your system, run in the terminal:
ipython profile create
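To give a flavor, here is a minimal sketch of the kinds of options the file can contain; the option names below come from IPython's configuration system, but treat them as illustrative and check the generated file's comments for your version:

# sketch of ipython_config.py contents
c = get_config()

# Python statements to execute every time IPython starts
c.InteractiveShellApp.exec_lines = [
    'import numpy as np',
]

# IPython extensions to load at startup, e.g. for %lprun
c.InteractiveShellApp.extensions = ['line_profiler']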
I’ll spare you the gory details of what’s in this file. Fortunately it has comments describing what each configuration option is for, so I will leave it to the reader to tinker and customize. One additional useful feature is that it’s possible to have multiple profiles. Suppose you wanted to have an alternative IPython configuration tailored for a particular application or project. Creating a new profile is as simple as typing something like the following:
ipython profile create secret_project
Once you’ve done this, edit the config files in the newly created profile_secret_project directory and then launch IPython like so:
$ ipython --profile=secret_project Python 3.5.1 | packaged by conda-forge | (default, May 20 2016, 05:22:56) Type "copyright", "credits" or "license" for more information. IPython 5.1.0 -- An enhanced Interactive Python. ? -> Introduction and overview of IPython's features. %quickref -> Quick reference. help -> Python's own help system. object? -> Details about 'object', use 'object??' for extra details. IPython profile: secret_project
As always, the online IPython documentation is an excellent resource for more on profiles and configuration.
Configuration for Jupyter works a little differently because you can use its notebooks with languages other than Python. To create an analogous Jupyter config file, run:
jupyter notebook --generate-config
This writes a default config file to .jupyter/jupyter_notebook_config.py in your home directory. After editing this to suit your needs, you may rename it to a different file, like:
$ mv ~/.jupyter/jupyter_notebook_config.py ~/.jupyter/my_custom_config.py
When launching Jupyter, you can then add the --config argument:
jupyter notebook --config=~/.jupyter/my_custom_config.py
As you work through the code examples in this book and grow your skills as a Python programmer, I encourage you to keep learning about the IPython and Jupyter ecosystems. Since these projects have been designed to assist user productivity, you may discover tools that enable you to do your work more easily than using the Python language and its computational libraries by themselves.
You can also find a wealth of interesting Jupyter notebooks on the nbviewer website.
[1] Since a module or package may be imported in many different places in a particular program, Python caches a module's code the first time it is imported rather than executing the code in the module every time. Otherwise, modularity and good code organization could potentially cause inefficiency in an application.