Index
Symbols
- $ (Mac/Linux prompt), Test Driving Python
- %logstart command, Why Clean Data?
- %save, Why Clean Data?
- .bashrc, Updating your .bashrc
- .gitignore files, Scripting Your Cleanup
- .pem file, Prepare Your Private Key
- =, = Versus == Versus is, and When to Just Copy-= Versus == Versus is, and When to Just Copy
- ==, How to Import XML Data, = Versus == Versus is, and When to Just Copy-= Versus == Versus is, and When to Just Copy
- > (Windows prompt), Test Driving Python
- >>> (Python prompt), Test Driving Python
- \ (escape), Converting PDF to Text
A
- ActionChains, Screen Reading with Selenium
- addition, Numerical Methods: Things Numbers Can Do
- Africa, data sources from, Africa
- agate library, Exploring Your Data-Further Exploration
- aggregate method, Creating Groupings
- Airbrake, Logging and exceptions
- Amazon Machine Image (AMI), AWS Step 1: Choose an Amazon Machine Image (AMI)
- Ansible, Ansible: Operations Automation
- APIs (application programming interfaces), APIs-Summary
- arguments, How to Import CSV Data, Parsing PDFs Using pdfminer
- Asia, data sources from, Asia
- Atom, Install a Code Editor
- Atom Shell commands, Modifying Files
- attrib method, How to Import XML Data
- audience, identifying, Know Your Audience
- autocompletion, Tab key for, Converting PDF to Text
- automation, Automation and Scaling-Summary
  - basic steps for, Steps to Automate-Steps to Automate
  - command-line arguments for, Command-line arguments
  - config files for, Config files-Config files
  - distributed processing for, Using Distributed Processing
  - email, Email-Email
  - errors and issues, What Could Go Wrong?-What Could Go Wrong?
  - large-scale, Large-Scale Automation-Ansible: Operations Automation
  - local files, Local files-Local files
  - logging, Python Logging-Python Logging
  - logging as a service, Logging and Monitoring as a Service
  - messaging, Adding Automated Messaging-Chat integration
  - monitoring of, Monitoring Your Automation-Logging and monitoring
  - of operations with Ansible, Ansible: Operations Automation
  - parallel processing for, Using Parallel Processing-Using Parallel Processing
  - Python logging, Python Logging-Python Logging
  - questions to clarify process, Steps to Automate
  - queue-based (Celery), Celery: Queue-Based Automation-Celery: Queue-Based Automation
  - reasons for, Why Automate?-Why Automate?
  - script location, Where to Automate
  - sharing code with Jupyter notebooks, Jupyter Notebooks
  - simple, Simple Automation-Jupyter Notebooks
  - special tools for, Special Tools for Automation-Using Distributed Processing
  - uploading, Uploading and Other Reporting
  - using cloud for data processing, Using the Cloud for Data Processing-Using Git to deploy Python
  - when not to automate, No System Is Foolproof
  - with cron, CronJobs-CronJobs
  - with web interfaces, Web Interfaces
- AWS (Amazon Web Services), Using the Cloud for Data Processing, Web Interfaces, Using Amazon Web Services-Summary
B
- backup strategies, Storing Your Data: When, Why, and How?
- bad data, Finding Outliers and Bad Data-Finding Outliers and Bad Data
- bar chart, Charts
- bash, Bash-More Resources
- Beautiful Soup, Reading a Web Page with Beautiful Soup-Reading a Web Page with Beautiful Soup
- beginners, Python resources for, What to Do If You Get Stuck, Why Python, Python Resources for Beginners
- best practices, Scripting Your Cleanup
- bias, Avoiding Storytelling Pitfalls
- binary mode, How to Import CSV Data
- blocks, indented, How to Import CSV Data
- blogs, Your own blog
- Bokeh, Charting with Bokeh-Charting with Bokeh
- Booleans, Integers
- Boston Python, In-Person Groups
- Bottle, Web Interfaces
- browser-based parsing, Browser-Based Parsing-Screen Reading with Ghost.Py
- built-in functions/methods, Python Scope and Built-Ins: The Importance of Variable Names
- built-in tools, Helpful Tools: type, dir, and help-help
C
- C++, Python vs., C, C++, and Java Versus Python
- C, Python vs., C, C++, and Java Versus Python
- calling variables, Variables
- Canada, data sources from, South America and Canada
- capitalization, Saving the Code to a File; Running from Command Line-Saving the Code to a File; Running from Command Line
- case sensitivity, Saving the Code to a File; Running from Command Line-Saving the Code to a File; Running from Command Line
- cat command, Searching with the Command Line
- cd command, Install pip, Saving the Code to a File; Running from Command Line, Converting PDF to Text, Navigation
- Celery, Celery: Queue-Based Automation-Celery: Queue-Based Automation
- Central Asia, data sources from, Non-EU Europe, Central Asia, India, the Middle East, and Russia
- charts/charting, Charts-Charting with Bokeh
- chat, automated messaging with, SMS and voice
- chdir command, Navigation
- chmod command, Executing Files, Prepare Your Private Key
- chown command, Executing Files
- cloud
- cmd, Windows CMD/Power Shell-More Resources
- code
- code blocks, indented, How to Import CSV Data
- code editor, Install a Code Editor
- coding best practices, Scripting Your Cleanup
- command line
- command-line arguments, automation with, Command-line arguments
- command-line shortcuts, Step 3: (Mac Only) Tell Your System Where to Find Homebrew
- commands, Bash-Searching with the Command Line
  - cat, Searching with the Command Line
  - cd, Install pip, Saving the Code to a File; Running from Command Line, Converting PDF to Text, Navigation
  - chdir, Navigation
  - chmod, Executing Files, Prepare Your Private Key
  - chown, Executing Files
  - cp, Modifying Files
  - del, Modifying Files
  - dir, dir-dir, Navigation, Searching with the Command Line
  - echo, Modifying Files-Searching with the Command Line
  - find, How to Import XML Data, Searching with the Command Line
  - history, Modifying Files, Searching with the Command Line
  - if and fi, Step 3: (Mac Only) Tell Your System Where to Find Homebrew
  - ls, Saving the Code to a File; Running from Command Line, Navigation-Modifying Files, Step 3: (Mac Only) Tell Your System Where to Find Homebrew
  - make and make install, Executing Files
  - move, Modifying Files
  - pwd, Install pip, Saving the Code to a File; Running from Command Line, Navigation, Modifying Files
  - rm, Modifying Files
  - sudo, Install pip, Executing Files
  - touch, Modifying Files
  - unzip, Searching with the Command Line
  - wget, Executing Files
- comments, Getting Started with Parsing
- communications officials, Using a Telephone
- comparison operators, = Versus == Versus is, and When to Just Copy-= Versus == Versus is, and When to Just Copy
- config files, Config files-Config files
- containers, What to Scrape and How
- copy method, = Versus == Versus is, and When to Just Copy
- copyrights, What to Scrape and How
- correlations, Identifying Correlations
- counters, Getting Started with Parsing
- cp command, Modifying Files
- Crawl Spider, Building a Spider with Scrapy
- cron, CronJobs-CronJobs
- crowdsourced data, Crowdsourced Data and APIs
- CSS (Cascading Style Sheets), Style basics-Style basics, A Case for XPath-A Case for XPath
- CSV data, CSV Data-Saving the Code to a File; Running from Command Line
- csv library, How to Import CSV Data
- cursor (class name), Advanced Data Collection from Twitter’s REST API
D
- data
  - CSV, CSV Data-Saving the Code to a File; Running from Command Line
  - Excel, Working with Excel Files-Summary
  - formatting, Formatting Data-Formatting Data
  - importing, Importing Data-Importing Data
  - JSON, JSON Data-How to Import JSON Data
  - machine-readable, Data Meant to Be Read by Machines-Summary
  - manual cleanup exercise, Exercise: Clean the Data Manually
  - from PDFs, PDFs and Problem Solving in Python-Summary
  - publishing, Publishing Your Data-Shared Jupyter notebooks
  - saving, Saving Your Data-Saving Your Data
  - XML, XML Data-How to Import XML Data
- data acquisition, Acquiring and Storing Data-Child Labor
  - and fact checking, Fact Checking
  - case studies, Case Studies: Example Data Investigation-Child Labor
  - checking for readability, cleanliness, and longevity, Readability, Cleanliness, and Longevity
  - determining quality of data, Not All Data Is Created Equal
  - from US government, US Government Data
  - locating sources for, Where to Find Data-Crowdsourced Data and APIs
  - locating via telephone, Using a Telephone
  - smell test for new data, Not All Data Is Created Equal
- data analysis, Analyzing Your Data-Separating and Focusing Your Data
- data checking
- data cleanup, Data Cleanup: Investigation, Matching, and Formatting-Summary
  - basics, Data Cleanup Basics-Summary
  - determining right type of, Determining What Data Cleanup Is Right for Your Project
  - finding duplicates, Finding Duplicates-What to Do with Duplicate Records
  - finding outliers/bad data, Finding Outliers and Bad Data-Finding Outliers and Bad Data
  - fuzzy matching, Fuzzy Matching-Fuzzy Matching
  - identifying values for, Identifying Values for Data Cleanup-Zipping questions and answers
  - normalizing, Normalizing and Standardizing Your Data-Normalizing and Standardizing Your Data
  - reasons for, Data Cleanup: Investigation, Matching, and Formatting-Summary
  - regex matching, RegEx Matching-RegEx Matching
  - replacing headers, Replacing headers-Replacing headers
  - saving cleaned data, Saving Your Data-Saving Your Data
  - scripting, Scripting Your Cleanup-Scripting Your Cleanup
  - standardizing, Normalizing and Standardizing Your Data-Normalizing and Standardizing Your Data
  - testing with new data, Testing with New Data
  - working with duplicate records, What to Do with Duplicate Records-What to Do with Duplicate Records
  - zip method, Zipping questions and answers-Zipping questions and answers
- data containers, Data Containers-Dictionaries
- data exploration, Data Exploration and Analysis-Documenting Your Conclusions
- data presentation, Presenting Your Data-Summary
  - avoiding storytelling pitfalls, Avoiding Storytelling Pitfalls-Know Your Audience
  - charts, Charts-Charting with Bokeh
  - images, Images, Video, and Illustrations
  - interactives, Interactives
  - maps, Maps-Maps
  - publishing your data, Publishing Your Data-Shared Jupyter notebooks
  - time-related data, Time-Related Data
  - tools for, Presentation Tools
  - video, Images, Video, and Illustrations
  - visualization, Visualizing Your Data-Images, Video, and Illustrations
  - with illustrations, Images, Video, and Illustrations
  - with Jupyter, Jupyter (Formerly Known as IPython Notebooks)-Shared Jupyter notebooks
  - with words, Words
- data processing, cloud-based, Using the Cloud for Data Processing-Using Git to deploy Python
- data storage, Storing Your Data: When, Why, and How?-Alternative Data Storage
- data types, Basic Data Types-Floats, decimals, and other non–whole number types
  - and methods, What Can the Various Data Types Do?-Dictionary Methods: Things Dictionaries Can Do
  - capabilities of, What Can the Various Data Types Do?-Dictionary Methods: Things Dictionaries Can Do
  - decimals, Floats, decimals, and other non–whole number types
  - dictionary methods, Dictionary Methods: Things Dictionaries Can Do
  - floats, Floats, decimals, and other non–whole number types
  - integers, Integers
  - list methods, List Methods: Things Lists Can Do
  - non-whole number types, Floats, decimals, and other non–whole number types-Floats, decimals, and other non–whole number types
  - numerical methods, Numerical Methods: Things Numbers Can Do
  - string methods, String Methods: Things Strings Can Do
  - strings, Strings
- data wrangling
- databases, Databases: A Brief Introduction-Setting Up Your Local Database with Python
- Datadog, Logging and monitoring
- Dataset (wrapper library), Setting Up Your Local Database with Python
- datasets
- datetime module, Formatting Data
- debugging, Test Driving Python, Catching Multiple Exceptions
- decimal module, Floats, decimals, and other non–whole number types
- decimals, Floats, decimals, and other non–whole number types
- default function arguments, Default Function Arguments
- default values, arguments with, Parsing PDFs Using pdfminer
- del command, Modifying Files
- delimiters, help
- deprecation, How to Import XML Data
- dictionaries, Dictionaries
- dictionary methods, Dictionary Methods: Things Dictionaries Can Do
- dictionary values method, Replacing headers
- DigitalOcean, Web Interfaces
- dir command, dir-dir, Navigation, Searching with the Command Line
- directory, for project-related content, Step 6: Set Up a New Directory
- distributed processing, Using Distributed Processing
- Django, Web Interfaces
- DNS name, public, Get the Public DNS Name of the Instance
- documentation
- DOM (Document Object Model), Inspection: Markup Structure
- Dropbox, Cloud-Storage and Python
- duplicate records, Finding Duplicates-Finding Duplicates
E
- echo command, Modifying Files-Searching with the Command Line
- Element objects, How to Import XML Data
- ElementTree, How to Import XML Data
- Emacs, Install a Code Editor
- email, automation of, Email-Email
- emojis, Reading a Web Page with LXML
- enumerate function, Zipping questions and answers
- errors, Joining Numerous Datasets
- escaping characters (\), Converting PDF to Text
- etree objects, Reading a Web Page with LXML
- European Union, data sources from, EU and UK
- Excel
- except block, Joining Numerous Datasets
- exception handling, Joining Numerous Datasets
- exception method, Python Logging
- extract method, Fuzzy Matching
F
- Fabric, Large-Scale Automation
- Facebook chat, Chat integration
- fact checking, Fact Checking
- files
- find command, How to Import XML Data, Searching with the Command Line
- findall method, How to Import XML Data, RegEx Matching
- Flask, Web Interfaces
- floats, Floats, decimals, and other non–whole number types
- FOIA (Freedom of Information Act) requests, US Government Data
- folders, Data Meant to Be Read by Machines
- for loops, How to Import CSV Data
- format method, Formatting Data
- formatting data, Formatting Data-Formatting Data
- Freedom of Information Act (FOIA) requests, US Government Data
- functions, How to Import CSV Data
- fuzzy matching, Fuzzy Matching-Fuzzy Matching
G
- GCC (GNU Compiler Collection), Step 1: Install GCC
- get_config function, Email
- get_tables function, Exercise: Use Table Extraction, Try a Different Library, Joining Numerous Datasets
- Ghost, Ghost
- Ghost.py, Screen Reading with Ghost.Py-Screen Reading with Ghost.Py
- GhostDriver, Selenium and headless browsers
- GIL (Global Interpreter Lock), The Dreaded GIL
- Git, Scripting Your Cleanup, Using Git to deploy Python-Using Git to deploy Python
- GitHub Pages, GitHub Pages and Jekyll
- global private variables, Scripting Your Cleanup
- Google API, APIs
- Google Chat, Chat integration
- Google Drive, Cloud-Storage and Python
- Google Slides, Presentation Tools
- government data
- groupings, creating, Creating Groupings-Creating Groupings
H
- Hadoop, Alternative Data Storage
- Haiku Deck, Presentation Tools
- hashable values, Finding Duplicates
- HDF (Hierarchical Data Format), Alternative Data Storage
- headers
- headless browsers
- help method, help
- Heroku, One-click deploys, Web Interfaces
- Hexo, GitHub Pages and Jekyll
- Hierarchical Data Format (HDF), Alternative Data Storage
- HipChat, Chat integration
- HipLogging, Chat integration
- history command, Modifying Files, Searching with the Command Line
- Homebrew
- HTML, Python vs., HTML Versus Python
- HypChat, Chat integration
I
- if and fi commands, Step 3: (Mac Only) Tell Your System Where to Find Homebrew
- if not statements, Finding Outliers and Bad Data
- if statements, How to Import XML Data
- if-else statements, How to Import XML Data
- illustrations (visual data presentation), Images, Video, and Illustrations
- images, Images, Video, and Illustrations
- immutable objects, Changing Immutable Objects
- implicitly_wait method, Screen Reading with Selenium
- import errors, Test Driving Python
- import statements, How to Import XML Data
- importing data, Importing Data-Importing Data
- in method, Replacing headers
- indented code blocks, closing, How to Import CSV Data
- index method, Zipping questions and answers
- indexing
- India, data sources from, Non-EU Europe, Central Asia, India, the Middle East, and Russia
- inheritance, Building a Spider with Scrapy-Building a Spider with Scrapy
- innerHTML attribute, Screen Reading with Selenium
- installation (see setup)
- instance type, AWS, AWS Step 2: Choose an Instance Type
- integers, Integers
- interactives, Interactives
- internal methods, dir
- IPython, IPython Hints-Final Thoughts: A Simpler Terminal
- is (comparison operator), = Versus == Versus is, and When to Just Copy
- iterators, Importing Data
- itersiblings method, Reading a Web Page with LXML
J
- Java, Python vs., C, C++, and Java Versus Python
- JavaScript console
- JavaScript, Python vs., JavaScript Versus Python
- Jekyll, GitHub Pages and Jekyll
- join method, Joining Numerous Datasets
- jQuery, jQuery and JavaScript-jQuery and JavaScript
- JSON data, JSON Data-How to Import JSON Data
- Jupyter, Jupyter (Formerly Known as IPython Notebooks)-Shared Jupyter notebooks
L
- lambda function, Exploring Table Functions
- latency, Networks: How the Internet Works and Why It’s Breaking Your Script
- legal issues, What to Scrape and How
- libraries (packages), IPython Hints
- line chart, Charts
- LinkedIn API, APIs
- Linux
  - installing Python on, Setting Up Python on Your Machine
  - learning about new environment, Learning About Our New Environment (Windows, Mac, Linux)-Learning About Our New Environment (Windows, Mac, Linux)
  - virtual environment testing, Testing Your Virtual Environment (Windows, Mac, Linux)
  - virtualenv installation, Step 5: Install virtualenv (Windows, Mac, Linux)
  - virtualenvwrapper installation, Installing virtualenvwrapper (Mac and Linux)
- list generators, Replacing headers
- list indexes, How to Import XML Data
- list methods, List Methods: Things Lists Can Do
- lists, Lists-Lists
- local files, automation with, Local files-Local files
- logging
- logging module, Python Logging
- Loggly, Logging and exceptions
- Logstash, Logging and exceptions
- ls command, Saving the Code to a File; Running from Command Line, Navigation-Modifying Files, Step 3: (Mac Only) Tell Your System Where to Find Homebrew
- Luigi, Large-Scale Automation
- LXML
M
- Mac OS X
  - Homebrew installation, Step 2: (Mac Only) Install Homebrew
  - installing Python on, Mac OS X
  - learning about new environment, Learning About Our New Environment (Windows, Mac, Linux)-Learning About Our New Environment (Windows, Mac, Linux)
  - Python 2.7 installation, Step 4: Install Python 2.7
  - telling system where to find Homebrew, Step 3: (Mac Only) Tell Your System Where to Find Homebrew-Step 3: (Mac Only) Tell Your System Where to Find Homebrew
  - virtual environment testing, Testing Your Virtual Environment (Windows, Mac, Linux)
  - virtualenv installation, Step 5: Install virtualenv (Windows, Mac, Linux)
  - virtualenvwrapper installation, Installing virtualenvwrapper (Mac and Linux)
- Mac prompt ($), Test Driving Python
- machine-readable data, Data Meant to Be Read by Machines-Summary
- magic commands, Why Clean Data?
- magic functions, Magic Functions-Magic Functions
- main function, Scripting Your Cleanup
- make and make install commands, Executing Files
- markup patterns, A Case for XPath-A Case for XPath
- match method (regex library), RegEx Matching
- math libraries, Floats, decimals, and other non–whole number types
- MATLAB, Python vs., R or MATLAB Versus Python
- matplotlib, Charting with matplotlib-Charting with matplotlib
- medical datasets, Medical and Scientific Data
- Medium.com, Medium
- Meetup (website), In-Person Groups
- messaging, automation of, Adding Automated Messaging-Chat integration
- methods, How to Import CSV Data
- Middle East, data sources from, Non-EU Europe, Central Asia, India, the Middle East, and Russia
- modules (term), Floats, decimals, and other non–whole number types
- MongoDB, MongoDB with Python
- monitoring, logging and, Logging and monitoring
- move command, Modifying Files
- moving files, Modifying Files
- MySQL, Relational Databases: MySQL and PostgreSQL-MySQL and Python
N
- NA responses, Finding Outliers and Bad Data
- nested for loop, Getting Started with Parsing
- Network tabs, Network/Timeline: How the Page Loads-Network/Timeline: How the Page Loads
- networks, Internet, Networks: How the Internet Works and Why It’s Breaking Your Script-Networks: How the Internet Works and Why It’s Breaking Your Script
- New Relic, Logging and monitoring
- newline characters, Parsing PDFs Using pdfminer
- Node.js, Python vs., Node.js Versus Python
- non-governmental organizations (NGOs), datasets from, Organization and Non-Government Organization (NGO) Data
- nonrelational databases, Non-Relational Databases: NoSQL-Setting Up Your Local Database with Python
- nose, Testing with New Data
- NoSQL, Non-Relational Databases: NoSQL
- numbers, Integers, Floats, decimals, and other non–whole number types
- numpy library, Finding Duplicates, Further Exploration
P
- packages (see libraries)
- parallel processing, Using Parallel Processing-Using Parallel Processing
- pdfminer, Parsing PDFs Using pdfminer-Parsing PDFs Using pdfminer
- PDFs, PDFs and Problem Solving in Python-Summary
  - converting to text, Converting PDF to Text
  - opening/reading with slate, Programmatic Approaches to PDF Parsing-Opening and Reading Using slate
  - parsing tools, Programmatic Approaches to PDF Parsing
  - parsing with pdfminer, Parsing PDFs Using pdfminer-Parsing PDFs Using pdfminer
  - parsing with Tabula, Exercise: Try Another Tool-Exercise: Try Another Tool
  - problem-solving exercises, Learning How to Solve Problems-Exercise: Try Another Tool
  - programmatic approaches to parsing, Programmatic Approaches to PDF Parsing-Converting PDF to Text
  - table extraction exercise, Exercise: Use Table Extraction, Try a Different Library-Exercise: Use Table Extraction, Try a Different Library
  - things to consider before using data from, Avoid Using PDFs!
- Pelican, GitHub Pages and Jekyll
- PhantomJS, Selenium and headless browsers
- pip, Test Driving Python, Installing Python Packages
- PostgreSQL, PostgreSQL and Python
- PowerShell, Searching with the Command Line-Searching with the Command Line
- Prezi, Presentation Tools
- private key, AWS, Prepare Your Private Key
- private methods, dir
- process module, Fuzzy Matching
- prompt, Python vs. system, Test Driving Python
- public DNS name, Get the Public DNS Name of the Instance
- publishing data, Publishing Your Data-Shared Jupyter notebooks
  - creating a site for, Open Source Platforms: Starting a New Site
  - on Medium, Medium
  - on pre-existing sites, Using Available Sites-Your own blog
  - on Squarespace, Easy-to-start sites: WordPress, Squarespace
  - on WordPress, Easy-to-start sites: WordPress, Squarespace
  - on your own blog, Your own blog
  - one-click deploys for, One-click deploys
  - open source platforms for, Open Source Platforms: Starting a New Site
  - with Ghost, Ghost
  - with GitHub Pages, GitHub Pages and Jekyll
  - with Jekyll, GitHub Pages and Jekyll
  - with Jupyter, Jupyter (Formerly Known as IPython Notebooks)-Shared Jupyter notebooks
- pwd command, Install pip, Saving the Code to a File; Running from Command Line, Navigation, Modifying Files
- PyData, In-Person Groups
- pygal, Maps
- pylab charts, Charting with matplotlib
- PyLadies, In-Person Groups
- PyPI, Installing Python Packages
- pyplot, Charting with matplotlib
- pytest, Testing with New Data
- Python
  - advanced setup, Advanced Python Setup-Advanced Setup Review
  - basics, Python Basics-Summary
  - beginner's resources, What to Do If You Get Stuck, Why Python, Python Resources for Beginners
  - choosing version of, Which Python Version
  - getting started with, Getting Started with Python-Optional: Install IPython
  - idiosyncrasies, Python Gotchas-The Power of Debugging
  - installation, Step 4: Install Python 2.7
  - launching, Python Basics
  - reasons for using, Preface, Why Python
  - setup, Setting Up Python on Your Machine-Windows 8 and 10
  - test driving, Test Driving Python-Test Driving Python
  - version 2.7 vs. 3.4, Which Python Version
- Python prompt (>>>), system prompt vs., Test Driving Python
R
- R, Python vs., R or MATLAB Versus Python
- range() function, Getting Started with Parsing
- rate limits, Rate Limits
- ratio function, Fuzzy Matching
- Read the Docs (website), Online Resources
- read-only files, How to Import CSV Data
- reader function, How to Import JSON Data
- regular expressions (regex), Opening and Reading Using slate, RegEx Matching-RegEx Matching
- relational databases, Relational Databases: MySQL and PostgreSQL-PostgreSQL and Python
- remove method, Zipping questions and answers
- removing files, Modifying Files
- renaming files, Modifying Files
- reports, automated uploading of, Uploading and Other Reporting
- requests, web page, Getting Pages: How to Request on the Internet-Getting Pages: How to Request on the Internet
- REST APIs
- return statement, Parsing PDFs Using pdfminer
- rm command, Modifying Files
- robots.txt file, In-Depth Analysis of a Page, A (Few) Word(s) of Caution
- Rollbar, Logging and exceptions
- round-trip latency, Networks: How the Internet Works and Why It’s Breaking Your Script
- Ruby/Ruby on Rails, Python vs., Ruby and Ruby on Rails Versus Python
- Russia, data sources from, Non-EU Europe, Central Asia, India, the Middle East, and Russia
S
- SaltStack, Large-Scale Automation
- scatter charts, Charting with Bokeh
- scatter method, Charting with Bokeh
- scientific datasets, Medical and Scientific Data
- scope, Python Scope and Built-Ins: The Importance of Variable Names
- Scrapely, Crawling Whole Websites with Scrapy
- Scrapy, Building a Spider with Scrapy-Crawling Whole Websites with Scrapy
- screen reading, Browser-Based Parsing
- scripting
- search method, RegEx Matching
- Selenium
- Selenium ActionChains, Screen Reading with Selenium
- Sentry, Logging and exceptions
- separators, help
- setup
  - advanced, Advanced Python Setup-Advanced Setup Review
  - code editor, Install a Code Editor
  - directory for project-related content, Step 6: Set Up a New Directory
  - GCC installation, Step 1: Install GCC
  - Homebrew, Step 2: (Mac Only) Install Homebrew-Step 3: (Mac Only) Tell Your System Where to Find Homebrew
  - IPython, Optional: Install IPython, Getting Started with IPython
  - learning about new environment, Learning About Our New Environment (Windows, Mac, Linux)-Learning About Our New Environment (Windows, Mac, Linux)
  - libraries (packages), Step 4: Install Python 2.7
  - Mac, Mac OS X
  - pip, Test Driving Python
  - Python, Setting Up Python on Your Machine-Windows 8 and 10, Step 4: Install Python 2.7
  - Python 2.7 installation, Step 4: Install Python 2.7
  - sudo, Install pip
  - virtual environment testing, Testing Your Virtual Environment (Windows, Mac, Linux)
  - virtualenv installation, Step 5: Install virtualenv (Windows, Mac, Linux)
  - virtualenvwrapper installation, Step 7: Install virtualenvwrapper
  - virtualenvwrapper-win installation, Installing virtualenvwrapper-win (Windows)
  - Windows, Setting Up Python on Your Machine, Windows 8 and 10-Windows 8 and 10
- set_field_value method, Screen Reading with Ghost.Py
- shortcuts, command-line, Step 3: (Mac Only) Tell Your System Where to Find Homebrew
- slate library, Programmatic Approaches to PDF Parsing-Opening and Reading Using slate
- SleekXMPP, Chat integration
- slicing, Getting Started with Parsing
- smell test, Not All Data Is Created Equal
- SMS automation, SMS and voice
- South America, data sources from, South America and Canada
- Spark, Using Distributed Processing
- Spider class, Building a Spider with Scrapy
- spiders, Spidering the Web-Crawling Whole Websites with Scrapy
- SQLite, Setting Up Your Local Database with Python-Setting Up Your Local Database with Python
- Squarespace, Easy-to-start sites: WordPress, Squarespace
- Stack Overflow (website), Online Resources
- stacked chart, Charts
- startproject command, Building a Spider with Scrapy
- statistical libraries, Further Exploration
- storytelling
- streaming APIs
- strftime method, Formatting Data
- string methods, String Methods: Things Strings Can Do
- strings
- strip method, What Can the Various Data Types Do?
- strptime method, Formatting Data
- Sublime Text, Install a Code Editor
- subtraction, Numerical Methods: Things Numbers Can Do
- sudo command, Install pip, Executing Files
- syntax errors, Test Driving Python
- sys module, Command-line arguments
- system prompt, Python prompt vs., Test Driving Python
T
- Tab key, autocompletion with, Converting PDF to Text
- table extraction exercise, Exercise: Use Table Extraction, Try a Different Library-Exercise: Use Table Extraction, Try a Different Library
- table functions (agate), Exploring Table Functions-Exploring Table Functions
- table joins, Joining Numerous Datasets
- Tabula, Exercise: Try Another Tool-Exercise: Try Another Tool
- tag attributes, XML Data, Reading a Web Page with LXML
- tags, XML Data
- target audience, identifying, Know Your Audience
- telephone messages, automating, SMS and voice
- telephone, locating data via, Using a Telephone
- terminal development
- text messages, automation for, SMS and voice
- text, converting PDFs to, Converting PDF to Text
- time series data, Time series data
- time-related data, Time-Related Data
- timeline data, Timeline data
- Timeline tabs, Network/Timeline: How the Page Loads-Network/Timeline: How the Page Loads
- token, API, API Keys and Tokens-Creating a Twitter API key and access token
- tools
- touch command, Modifying Files
- trademarks, What to Scrape and How
- try block, Joining Numerous Datasets
- TSV, CSV Data
- tuples, Parsing PDFs Using pdfminer
- Twilio, SMS and voice
- Twitter, Introduction to Python
- type checking, Type Checking
- type method, type
U
- United Kingdom, data sources from, EU and UK
- unittest, Testing with New Data
- universities, datasets from, Education and University Data
- unsupported code, Exercise: Use Table Extraction, Try a Different Library
- unzip command, Searching with the Command Line
- upper method, String Methods: Things Strings Can Do
V
- Vagrant, Large-Scale Automation
- values, Python dictionary, Dictionaries
- variables, Variables-Variables, Type Checking
- version (Python), choosing, Which Python Version
- Vi, Install a Code Editor
- video, Images, Video, and Illustrations
- Vim, Install a Code Editor
- virtual environment
- virtualenv, Step 5: Install virtualenv (Windows, Mac, Linux)
- virtualenvwrapper
- virtualenvwrapper-win, Installing virtualenvwrapper-win (Windows)
- visualization of data, Visualizing Your Data-Images, Video, and Illustrations
  - charts, Charts-Charting with Bokeh
  - images, Images, Video, and Illustrations
  - interactives, Interactives
  - maps, Maps-Maps
  - time-related data, Time-Related Data
  - video, Images, Video, and Illustrations
  - with illustrations, Images, Video, and Illustrations
  - with words, Words
- voice message automation, SMS and voice
W
- web interfaces, Web Interfaces
- web page analysis, Analyzing a Web Page-In-Depth Analysis of a Page
- web pages
- web scraping
  - advanced techniques, Advanced Web Scraping: Screen Scrapers and Spiders-The Changing Web (or Why Your Script Broke)
  - and network problems, Networks: How the Internet Works and Why It’s Breaking Your Script-Networks: How the Internet Works and Why It’s Breaking Your Script
  - basics, Web Scraping: Acquiring and Storing Data from the Web-Summary
  - browser-based parsing, Browser-Based Parsing-Screen Reading with Ghost.Py
  - ethical issues, A (Few) Word(s) of Caution
  - legal issues, What to Scrape and How, A (Few) Word(s) of Caution
  - reading web pages with Beautiful Soup, Reading a Web Page with Beautiful Soup-Reading a Web Page with Beautiful Soup
  - reading web pages with LXML, Reading a Web Page with LXML-A Case for XPath
  - screen reading with Ghost.py, Screen Reading with Ghost.Py-Screen Reading with Ghost.Py
  - screen reading with Selenium, Screen Reading with Selenium-Selenium and headless browsers
  - simple text scraping, What to Scrape and How-What to Scrape and How
  - web page analysis, Analyzing a Web Page-In-Depth Analysis of a Page
  - web page requests, Getting Pages: How to Request on the Internet-Getting Pages: How to Request on the Internet
  - with Scrapy, Building a Spider with Scrapy-Crawling Whole Websites with Scrapy
  - with spiders, Spidering the Web-Crawling Whole Websites with Scrapy
  - with XPath, A Case for XPath-A Case for XPath
- wget command, Executing Files
- where function, Exploring Table Functions
- whitespace, help, Saving the Code to a File; Running from Command Line-Saving the Code to a File; Running from Command Line, Hail the Whitespace
- Windows
  - installing Python on, Setting Up Python on Your Machine, Windows 8 and 10-Windows 8 and 10
  - learning about new environment, Learning About Our New Environment (Windows, Mac, Linux)-Learning About Our New Environment (Windows, Mac, Linux)
  - virtual environment testing, Testing Your Virtual Environment (Windows, Mac, Linux)
  - virtualenv installation, Step 5: Install virtualenv (Windows, Mac, Linux)
  - virtualenvwrapper-win installation, Installing virtualenvwrapper-win (Windows)
- Windows 8, Windows 8 and 10-Windows 8 and 10
- Windows command line, Windows CMD/Power Shell-More Resources
- Windows PowerShell, Searching with the Command Line-Searching with the Command Line
- Windows prompt (>), Test Driving Python
- WordPress, Easy-to-start sites: WordPress, Squarespace
- wrapper libraries, Setting Up Your Local Database with Python