Appendix C. Learning the Command Line

One powerful development tool is the ability to navigate your computer using only the command line. Whatever operating system you use, knowing how to directly interact with your computer will pay off in your data wrangling and coding career. We aren’t saying you need to become a systems administrator, but it’s good to be somewhat adept at maneuvering via the command line.

One of the greatest feelings as a developer is being able to debug both systems and code problems you encounter. Understanding and working with your computer via the command line will give you some insight into those problems. If you encounter system errors and use the debugging tips you’ve learned throughout this book, you’ll likely learn more about your own computer, the operating system you use, and how to better interact via the command line. Then, when you encounter system errors in your Python code, you’ll be one step ahead when debugging and fixing those issues.

In this appendix, we’ll cover the basics of bash (used on Macs and many Linux installations) as well as the Windows cmd and PowerShell utilities. We will only provide an introduction here, but we encourage you to continue your learning and engagement. We’ve included suggestions for further reading in each section.

Bash

If you’re using a bash-based command line, what you learn as you navigate it will be applicable to any bash-based client, regardless of the operating system you are currently using…cool! Bash is a shell (or command-line) language with a lot of functionality. Let’s get started learning bash by covering how to navigate files on your computer.

Navigation

Navigating your computer from the command line will help you understand how to do this with Python, and remaining in your terminal or text editor will keep you focused.

Let’s start with the basics. Open up your terminal. It will likely open in ~, which signifies your home directory. If you are using Linux, this is likely /home/<your_name>. If you are using a Mac, it is likely /Users/<your_name>. To see what folder you are in, type the following:

pwd

You should see a response like this:

/Users/katharine

or:

/home/katharine

pwd stands for “print working directory.” You are asking bash to tell you what folder (or directory) you are in currently. This can be very helpful when first learning to navigate via the command line, especially if you want to double-check you are in the proper folder.

Another useful command lists the files in a folder. To see what files are in your current working directory, type:

ls

You should see a response similar to this:

Desktop/
Documents/
Downloads/
my_doc.docx
...

Depending on your operating system, the contents will vary, and they may have different colors. ls means “list.” You can also call ls with additional arguments, called flags. These arguments will change the output. Try this:

ls -l

The output should be a list of columns, the final one being the same list you saw using only ls. The -l flag shows a detailed (long) view of your directory contents. For each entry, it shows the permissions, the number of hard links, the owner’s name, the group ownership, the size in bytes, and the last-modified date. Here’s an example:

drwxr-xr-x  2 katharine katharine  4096 Aug 20  2014 Desktop
drwxr-xr-x 22 katharine katharine 12288 Jul 20 18:19 Documents
drwxr-xr-x 26 katharine katharine 24576 Sep 16 11:39 Downloads

This level of detail can help you determine any problems you are having with permissions, and it allows you to see file sizes and other information. ls can also list any directory you pass to it. Try checking what’s in your downloads folder (type the following):

ls -l ~/Downloads

You’ll see a long output similar to the previous output example but listing all of the files and directories in the Downloads folder.
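As a minimal sketch of how flags change what ls prints, the following builds a throwaway folder and lists it three ways. The file and folder names are made up for illustration, and the -a flag (which also shows hidden entries whose names start with a dot) is an extra we have not covered above.

```shell
# Build a scratch folder so the example doesn't touch your real files.
work=$(mktemp -d)
touch "$work/notes.txt" "$work/.hidden_config"
mkdir "$work/reports"

ls "$work"       # names only; entries starting with a dot are skipped
ls -l "$work"    # long view: permissions, links, owner, group, size, date, name
ls -a "$work"    # also shows dot entries, including . and ..

# Keep the listings in variables so they can be inspected afterward.
plain_listing=$(ls "$work")
all_listing=$(ls -a "$work")
```

Comparing the first and third listings is a quick way to spot configuration files that a plain ls hides.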

Now that you know how to list files from different folders, let’s investigate how to change your current folder. We use the command cd to “change directory.” Try this:

cd ~/Downloads

Now when you test what folder you are in using pwd and check the files in the folder using ls you should notice you are in your downloads folder. What if you wanted to move back to your home folder? We know the home folder is the parent folder. You can navigate to a parent folder using two dots (..). Try the following:

cd ..

Now you are back in your home folder. In bash, .. means “go up/back one directory.” You can also chain these together with a slash; for example, cd ../.. goes back two directories.
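Here is a small sketch that puts cd, pwd, and .. together. It assumes a scratch folder with two made-up nested folders (projects and data) so you can try it without moving around your real files.

```shell
# Scratch folder with two nested levels (hypothetical names).
work=$(mktemp -d)
mkdir -p "$work/projects/data"

cd "$work/projects/data"
pwd                      # path ends in /projects/data

cd ..                    # up one level
after_one_up=$(pwd)      # path now ends in /projects

cd data                  # back down into data
cd ../..                 # up two levels in one step
after_two_up=$(pwd)      # back in the scratch folder
```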

Tip

When moving around or selecting files in your command line, you should be able to use Tab to autocomplete file and folder names. Simply press Tab after you have typed the first letter or two of the name of the file or folder you want to select, and you should see different matching options (helping you spell and complete it), or, if there are no other files with similar names, the command line will autocomplete it for you. This helps you save time and typing!

You should now feel more comfortable moving around using your command line. Next, we’ll learn about how to move and change files using the command line.

Modifying Files

Moving, copying, and creating files with bash is easy. Let’s begin by creating a new file. First, navigate to your home directory (cd ~). Then, type the following:

touch test_file.txt

After that, go ahead and type ls. You should see that there is a new file called test_file.txt. touch can be used to create files that don’t already exist. The command will look for a file of that name; if that file exists, it will update the last-modified timestamp but make no changes to the contents; if it doesn’t exist, it will create an empty file.
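To see both behaviors of touch side by side, here is a minimal sketch run in a scratch folder. The text written into the file is just a placeholder.

```shell
work=$(mktemp -d)
cd "$work"

touch test_file.txt                # the file doesn't exist yet, so it is created empty
size_when_created=$(wc -c < test_file.txt | tr -d ' ')   # size in bytes

echo "some text" > test_file.txt   # put some content in the file
touch test_file.txt                # the file exists: only its timestamp is updated
contents_after_touch=$(cat test_file.txt)

echo "$size_when_created"
echo "$contents_after_touch"
```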

Now that we have a file to use, let’s try copying it into our downloads folder:

cp test_file.txt ~/Downloads

Here we are saying, “copy test_file.txt into ~/Downloads.” Because ~/Downloads is an existing folder, cp will automatically copy the file into the folder under its original name. If we wanted to copy the file and change its name, we could write something like this:

cp test_file.txt ~/Downloads/my_test_file.txt

What we are doing with this command is telling bash to copy the test file into the downloads folder and call the copy my_test_file.txt. Now your downloads folder should have two copies of this test file: one with the original name, and one with this new name.
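The two forms of cp can be tried safely in a scratch folder. In this sketch, downloads is a stand-in folder we create ourselves, not your real downloads folder.

```shell
work=$(mktemp -d)
cd "$work"
mkdir downloads
touch test_file.txt

cp test_file.txt downloads                    # destination is a folder: the name is kept
cp test_file.txt downloads/my_test_file.txt   # destination is a path: the copy gets a new name

ls downloads
copies=$(ls downloads)
```

The original test_file.txt stays where it was; cp only ever adds files.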

Tip

If you need to run a command more than once, you can go through your command-line history by simply pressing the up arrow key. If you want to see all recent command-line history, type history.

Sometimes you don’t want to copy files, but instead want to move them or rename them. Using bash, we can move and rename files using the same command: mv. Let’s begin by renaming the file we created in our home folder:

mv test_file.txt empty_file.txt

What we are telling bash to do here is “move the file named test_file.txt to a file named empty_file.txt.” If you use ls you should no longer see a test_file.txt, but you should now see an empty_file.txt. We have renamed our file by simply “moving” it. We can also use mv to move files between folders:

mv ~/Downloads/test_file.txt .

Here we are saying, “move the downloads folder’s test_file.txt into here.” In bash, . stands for your working directory (just like .. stands for the folder “above” your current directory). Now, if you use ls you will notice you have a test_file.txt file in your home folder again. You can also use ls ~/Downloads to see your downloads folder no longer has the file.
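Renaming and moving with mv look like this in a scratch folder. The names downloads and other_file.txt are made up for the sketch.

```shell
work=$(mktemp -d)
cd "$work"
mkdir downloads
touch test_file.txt downloads/other_file.txt

mv test_file.txt empty_file.txt   # rename within the same folder
mv downloads/other_file.txt .     # move into the working directory (.)

ls
here=$(ls)
left_behind=$(ls downloads)       # empty: the file was moved, not copied
```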

Finally, you might want to delete files using the command line. To do so, you can use the rm, or remove, command. Try the following:

rm test_file.txt

Now when you ls you’ll see you have removed the test_file.txt from the folder.

Warning

Unlike deleting files with your mouse, deleting files with the command line really deletes them. There is no “Trash” you can go into to recover them, so use rm with care and set up regular scheduled backups for your computer and code.
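Because rm is permanent, one cautious habit is to preview what a pattern matches before deleting. This is only a sketch of that habit, using made-up file names in a scratch folder:

```shell
work=$(mktemp -d)
cd "$work"
touch draft_1.tmp draft_2.tmp keep_me.txt

ls *.tmp     # preview: shows exactly which files the pattern matches
rm *.tmp     # remove those same files; there is no undo

remaining=$(ls)
echo "$remaining"
```

When working interactively, rm -i asks for confirmation before each deletion, which adds another safety net.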

Now that you know how to move, rename, copy, and delete files using bash, we’ll move on to executing files from the command line.

Executing Files

Executing files using bash is fairly straightforward. As you might have already learned in Chapter 3, to execute a Python file, you simply need to run:

python my_file.py

Where my_file.py is a Python file.

Note

For most languages you use and program, simply typing the name of the language (python, ruby, R) and then the filename (with the proper file path, or file location on your computer) will work. If you are having trouble executing files using a particular language, we recommend searching the Web for “command-line options” along with the language name.

There are other execution commands you will come across as a Python developer. Table C-1 shows some of them, so you can familiarize yourself with commands you may need to install and run extra libraries.

Table C-1. Bash for execution
| Command | Use case | More documentation |
|---|---|---|
| sudo | Running the command that follows as the superuser (root). Usually necessary if you are modifying core pieces of the filesystem or installing packages. | https://en.wikipedia.org/wiki/Sudo |
| bash | Executing a bash file or moving back into a bash shell. | http://ss64.com/bash/ |
| ./configure | Running configuration setup on a package (first step when installing a package from source). | https://en.wikipedia.org/wiki/GNU_build_system#GNU_Autoconf |
| make | Executing a makefile after configuration to compile the code and prepare for installation (second step when installing a package from source). | http://www.computerhope.com/unix/umake.htm |
| make install | Installing the package compiled with make on your computer (final step when installing a package from source). | http://www.codecoffee.com/tipsforlinux/articles/27.html |
| wget | Executing a call to a URL and downloading the file located at that URL (good for downloading packages or files). | http://www.gnu.org/software/wget/manual/wget.html |
| chown | Changing ownership of a file or folder. Often used with chgrp to change the group of a file. This can be useful if you need to move files so a different user can execute them. | http://linux.die.net/man/1/chown |
| chmod | Changing the permissions of a file or folder, often to make it executable or available for a different type of user or group. | http://ss64.com/bash/chmod.html |
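The chmod entry in Table C-1 is easiest to see with a small script. This sketch writes a two-line script called hello.sh (a name chosen for illustration) into a scratch folder, makes it executable, and runs it directly.

```shell
work=$(mktemp -d)
cd "$work"

# Write a two-line script; the first line tells the system to run it with sh.
printf '#!/bin/sh\necho "hello from a script"\n' > hello.sh

sh hello.sh          # works without execute permission: sh reads the file itself
chmod u+x hello.sh   # give the file's owner permission to execute it
./hello.sh           # now the file can be run directly

script_output=$(./hello.sh)
```

The u+x argument means "add execute permission for the user who owns the file"; other users' permissions are left as they were.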

As you use your command line, you’ll likely come across a variety of other commands and documentation. We recommend you take time to learn, use, and ask questions; bash is another language, and it will take time to learn its quirks and uses. Before we finish our command-line introduction, we’d like to introduce you to using bash to search for files or file contents.

Searching with the Command Line

Searching for files and searching inside files is relatively easy in bash and can be done in numerous ways. We’ll show you a few options to get started. First, we will use a command to search for text in a file. Let’s start by downloading a file using wget:

wget http://corpus.byu.edu/glowbetext/samples/text.zip

This should download a text corpus we can use to search. To unzip the text into a new folder, simply type:

mkdir text_samples
unzip text.zip -d text_samples/

You should now have a bunch of text corpus files in your new folder, text_samples. Change into that directory using cd text_samples. Let’s now search inside those files using a tool called grep:

grep snake *.txt

What you are telling bash to do here is search for the string snake in any file in this folder whose name ends with .txt. You can learn more about wildcard characters in “RegEx Matching”; however, * almost always stands for a wildcard and can be used to mean “any matching string.”

When you ran that command you should have seen some matching text fly by. grep will return any lines from any matching file containing the search string. This is incredibly useful if you have a large repository and you want to find which files contain the function you need to update or change, for example. grep also has some extra arguments and options you can pass if you want to print surrounding lines.
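Here is a short sketch of a few of those extra grep options, run against a four-line sample file we create ourselves (the text is made up). The -C option for context lines is not part of the oldest grep standards, but it is available in the GNU and BSD versions found on most Linux and Mac systems.

```shell
work=$(mktemp -d)
cd "$work"
printf 'a garden snake\nno reptiles here\nSnake charmers\nthe end\n' > sample.txt

grep snake sample.txt           # lines containing "snake" (case-sensitive)
grep -i snake sample.txt        # -i ignores case, so "Snake" matches too
grep -n snake sample.txt        # -n prefixes each match with its line number
grep -c -i snake sample.txt     # -c prints the count of matching lines
grep -C 1 charmers sample.txt   # -C 1 adds one line of context on each side

numbered=$(grep -n snake sample.txt)
match_count=$(grep -c -i snake sample.txt)
context=$(grep -C 1 charmers sample.txt)
```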

Tip

To see options for most bash commands, simply type the command followed by a space and then --help. Type grep --help and read about some of grep’s extra options and features.

Another neat tool is cat. It simply prints out the contents of whatever file you identify. This can be useful especially if you need to “pipe” output somewhere else. In bash, the | character can be used to string together a series of actions you wish to perform with your files or text. For example, let’s cat the contents of one of our files and then use grep to search the output:

cat w_gh_b.txt | grep network

What we did was first return the full text of the file w_gh_b.txt and then “pipe” that output to grep, which then searched for the word network and returned the lines containing it to our command line.
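Pipes can be chained as many times as you like. This sketch uses a small made-up sample file to show that the piped and direct forms of grep return the same lines, and then adds a third step, wc -l, to count them.

```shell
work=$(mktemp -d)
cd "$work"
printf 'social network\nroad map\nneural network\n' > sample.txt

cat sample.txt | grep network           # cat prints the file; grep filters it
grep network sample.txt                 # same result: grep can read the file itself
cat sample.txt | grep network | wc -l   # a third step counts the matching lines

piped=$(cat sample.txt | grep network)
direct=$(grep network sample.txt)
line_count=$(cat sample.txt | grep network | wc -l | tr -d ' ')
```

The cat form is mostly useful as a habit for building longer chains; for a single file, passing the filename to grep is shorter.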

We can do the same type of pipe using our bash history. Try this:

history | grep mv

This command lets you find and reuse commands you may have forgotten as you learn bash.

Let’s take our search a step further and look for files. First, we are going to use a command called find, which looks for matching filenames and can be used to traverse child directories and search for matching files there as well. Let’s search for any text files in our current folder or child folders:

find . -name "*.txt" -type f

What we are saying here is find (starting in this folder and then going through all child folders) files with a filename that matches any string but ends in .txt and that are file type f (meaning a normal file, rather than a directory, signified by type d). You should see a list of matching filenames as output. Now, let’s pipe those files so we can grep them:

find . -name "*.txt" -type f | xargs grep neat

What we are telling bash to do here is, “find those same text files, but this time search those files for the word neat.” We use xargs so we can properly pipe the find output to grep. xargs isn’t needed for all piping, but it’s needed here: on its own, grep would search the piped list of filenames as if it were text, whereas xargs turns each filename that find prints into an argument to grep, so grep opens and searches the files themselves.
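One caveat with this pattern is that filenames containing spaces get split into separate arguments. A common workaround, sketched here with made-up files in a scratch folder, is to pair find’s -print0 with xargs -0. Both options are available in the GNU and BSD tools on most Linux and Mac systems, though they are extensions rather than long-standing standard flags.

```shell
work=$(mktemp -d)
cd "$work"
mkdir -p "sub folder"
echo "a neat trick" > top.txt
echo "also neat"    > "sub folder/nested file.txt"
echo "nothing here" > "sub folder/other.txt"

# -print0 and -0 separate names with a NUL byte instead of whitespace,
# so "nested file.txt" reaches grep as a single argument.
# grep -l prints only the names of files that contain a match.
find . -name "*.txt" -type f -print0 | xargs -0 grep -l neat

# find can also run grep itself, without xargs.
find . -name "*.txt" -type f -exec grep -l neat {} +

matches=$(find . -name "*.txt" -type f -print0 | xargs -0 grep -l neat | sort)
```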

You’ve learned a few neat tricks for searching and finding, which can be useful, especially as the code and projects you are working with grow larger and more involved. We’ll leave you with some more resources and reading on the topic.

More Resources

There are a lot of great bash resources on the Internet. The Linux Documentation Project has a great guide for beginners which takes you through some more advanced bash programming. O’Reilly also has a great bash Cookbook that can jumpstart your learning process.

Windows CMD/PowerShell

The Windows command line (now also supplemented with PowerShell), or cmd, is a powerful DOS-based utility. You can use the same syntax across Windows versions and server instances, and learning to navigate that syntax can help you be a more powerful programmer for Python and any other languages you choose to learn.

Navigation

Navigating files with cmd is very straightforward. Let’s begin by opening the cmd utility and taking a look at our present directory. Type:

echo %cd%

What this is telling cmd is that you want to echo (or print out) %cd%, which is the current directory. You should see a response similar to this:

C:\Users\Katharine

To list all of the files in your present directory, type the following:

dir

You should see output similar to this:

13.03.2015  16:07    <DIR>          .ipython
11.09.2015  19:05    <DIR>          Contacts
11.09.2015  19:05    <DIR>          Desktop
11.09.2015  19:05    <DIR>          Documents
11.09.2015  19:05    <DIR>          Downloads
11.09.2015  19:05    <DIR>          Favorites
10.02.2014  15:15    <DIR>          Intel
11.09.2015  19:05    <DIR>          Links
11.09.2015  19:05    <DIR>          Music
11.09.2015  19:05    <DIR>          Pictures
13.03.2015  16:26    <DIR>          pip
11.09.2015  19:05    <DIR>          Saved Games

dir also has many options you can use to sort, group, or show more information. Let’s take a look at our Desktop folder.

dir Desktop /Q

We are asking cmd to show the files in the Desktop directory, and to show the owners for those files. You should see the owner of each file as the first part of the filename (e.g., MY-LAPTOP\Katharine\backupPDF). This can be tremendously useful in seeing your folders and files. There are also some great options for showing subfolders and sorting by last-modified timestamps.

Let’s navigate to our Desktop folder. Type the following:

chdir Desktop

Now when you check your current directory by typing echo %cd%, you should see a change. To navigate to a parent folder, simply use two dots (..). For example, if we wanted to navigate to the parent folder of our current directory, we could type:

chdir ..

You can also string together these “parent folder” symbols (chdir ..\.. to go into the grandparent folder, etc.). If there is no parent to your current folder (i.e., you are at the root of your filesystem), the command leaves you where you are.

To get back to your home directory, simply type:

chdir %HOMEPATH%

You should end up back in the first folder we used. Now that we can navigate using cmd, let’s move on to creating, copying, and modifying files.

Modifying Files

To start with, let’s create a new file we can use to modify:

echo "my awesome file" > my_new_file.txt

If you use dir to take a look at the files in your folder, you should now see my_new_file.txt. If you open that file up in your text editor, you can see we wrote “my awesome file” in the file. If you are using Atom, you can launch Atom directly from your cmd (see “Atom Shell Commands”).

Now that we have that file, let’s try copying it to a new folder:

copy my_new_file.txt Documents

Now if we list our documents using:

dir Documents

we should see my_new_file.txt was successfully copied there.

Tip

For easy typing, you can use Tab to autocomplete filenames and paths. Try it out by typing copy my and then hitting Tab. cmd should be able to guess you meant the my_new_file.txt file and fill in the file name for you.

We might also want to move or rename files. To move a file using cmd, we can use the move command. Try the following:

move Documents\my_new_file.txt Documents\my_newer_file.txt

Now, if you list the files in your Documents directory, you should see there is no longer a my_new_file.txt and now just a my_newer_file.txt. The move command is useful for renaming files (as we have done here) or moving a file or folder.

Finally, you may want to remove or delete files you don’t need anymore. To do so with cmd, you can use the del command. Try the following:

del my_new_file.txt

Now, when you check your current files, you should no longer see my_new_file.txt. Note that this will completely remove the file. You want to make sure you only do this if you absolutely do not need the file. It’s also a great idea to make regular hard drive backups in case of any issues.

Now that we can modify files, let’s take a look at how to execute files from cmd.

Executing Files

To execute files from your Windows cmd, you usually need to type the language name and then the path to the file. For example, to execute a Python file, you’ll need to type:

python my_file.py

This will execute the file my_file.py as long as it’s located in the same folder. You can execute a .exe file simply by typing the full filename and path into your cmd and hitting Enter.

Note

As you did when you installed Python, you’ll need to make sure the folders containing your installed executables are listed in your Path variable (for details, refer to “Setting Up Python on Your Machine”). This variable holds the list of folders that cmd searches for executables when you type a command.

For more powerful command-line execution, we recommend learning Windows PowerShell, a scripting language used to write scripts and execute them via a simple command line. Computerworld has a great introduction to PowerShell to get started.

To run installed programs from the command line, you can use the start command. Try the following:

start "" "http://google.com"

This should open up your default browser and navigate to Google’s home page. See the start command documentation for more information.

Now that we know how to execute using the command line, let’s investigate how to search for and find files and folders on our machine.

Searching with the Command Line

Let’s begin by downloading a corpus we can use. If you have Windows Vista or newer, you should be able to execute PowerShell commands. Try loading PowerShell by typing the following:

powershell

You should see a new prompt that looks similar to this:

Windows PowerShell
...
PS C:\Users\Katharine>

Now that we’re in PowerShell, let’s try downloading a file we want to use for searching (note that this and the following command should be entered on a single line; the commands are wrapped here to fit page constraints):

Invoke-WebRequest -OutFile "$HOME\Downloads\text.zip"
 http://corpus.byu.edu/glowbetext/samples/text.zip

If you don’t have PowerShell version 3.0 or above, the command will throw an error. If you receive an error, try the following command, which will work for older versions of PowerShell:

(new-object System.Net.WebClient).DownloadFile(
'http://corpus.byu.edu/glowbetext/samples/text.zip',"$HOME\Downloads\text.zip")

These commands use PowerShell to download a word corpus file to your computer. Let’s create a new directory to unzip the files:

mkdir Downloads\text_examples

Now we are going to add a new function to PowerShell to extract our zipped file. Type the following:

Add-Type -AssemblyName System.IO.Compression.FileSystem
function Unzip
{
    param([string]$zipfile, [string]$outpath)

    [System.IO.Compression.ZipFile]::ExtractToDirectory($zipfile, $outpath)
}

This function is now defined and we can use it to unzip files. Try unzipping the downloaded content into the new folder:

Unzip Downloads\text.zip Downloads\text_examples

To exit PowerShell, simply type exit. Your prompt should return to the normal cmd prompt. If you use dir Downloads\text_examples, you should have a list of text files from the corpus download. Let’s use findstr to search within those files:

findstr "neat" Downloads\text_examples\*.txt

You should see a bunch of text output fly by in your console. Those are the matching lines of the text files that have the word neat in them.

Sometimes you want to search for a particular filename, not a string in a file. To do that, you can use the dir command with a filename pattern, adding /s to search subfolders and /b to print only the matching paths:

dir /s /b *.txt

This should find all of your .txt files in folders contained within your home folder. If you need to search within those files, you can use piping. A | character “pipes” your output from the first command into the next command. We can use this to, say, find all Python files with a particular function name in them, or find all CSV files containing a particular country name. Let’s pipe our findstr output into our find command to try it out:

findstr /s "snake" *.txt | find /i "snake" /c

This code searches all text files in the current folder and its subfolders (/s) for the word snake, and then uses find with /c to count the matching lines that findstr returned (/i makes that match case-insensitive). As you can see, learning more cmd commands and usage will help greatly in simplifying tasks like searching for files, executing code, and managing your work as a data wrangler and developer. This appendix has helped introduce you to some of these topics, and is a good stepping stone to learn more.

More Resources

There are some great online resources for looking up cmd commands; reading through them is a good way to learn how to use cmd for your daily programming and data wrangling needs.

If you’d like to learn more about PowerShell and how to use it to create powerful scripts for your Windows servers and computers, take a look at some tutorials, like Microsoft’s Getting Started with PowerShell 3.0. There is also an O’Reilly Windows PowerShell Cookbook to get you started on writing your first scripts.