Index

A note on the digital index

A link in an index entry is displayed as the title of the section in which that entry appears. Because some sections have multiple index markers, it is not unusual for an entry to have several links to the same section. Clicking any link takes you directly to the place in the text where the marker appears.

Symbols

%*% (percent, asterisk, percent), for matrix multiplication, A Brief Introduction to Distance Metrics and Multidirectional Scaling
? (question mark) syntax, for R help, Loading libraries and the data
?? (question mark, double) syntax, for R help, Loading libraries and the data

B

baseline model, for linear regression, The Baseline ModelThe Baseline Model
Bayesian classifier, Writing Our First Bayesian Spam Classifier (see Naive Bayes classifier)
bell curve, Exploratory Data VisualizationExploratory Data Visualization, Exploratory Data VisualizationExploratory Data Visualization, Exploratory Data VisualizationExploratory Data Visualization, Exploratory Data VisualizationExploratory Data Visualization, Exploratory Data VisualizationExploratory Data Visualization
modes in, Exploratory Data VisualizationExploratory Data Visualization
types of, Exploratory Data VisualizationExploratory Data Visualization
verifying with density plots, Exploratory Data VisualizationExploratory Data Visualization
verifying with different binwidths, Exploratory Data VisualizationExploratory Data Visualization
bimodal, Exploratory Data Visualization
binary classification, This or That: Binary ClassificationThis or That: Binary Classification, Logistic Regression to the RescueLogistic Regression to the Rescue
(see also spam detection case study)
book popularity prediction case study, Text RegressionLogistic Regression to the Rescue
books and publications, How This Book Is Organized, Further Reading on R, Social Network Analysis, Works CitedWorks Cited
bibliography of, Works CitedWorks Cited
machine learning, How This Book Is Organized
R language, Further Reading on R
social network analysis, Social Network Analysis
boot package, Logistic Regression to the Rescue

C

Caesar cipher, Code Breaking as Optimization
case studies, How This Book Is OrganizedHow This Book Is Organized, R Basics for Machine LearningAnalyzing the data, Loading libraries and the dataLoading libraries and the data, Converting date strings and dealing with malformed dataConverting date strings and dealing with malformed data, Organizing location dataDealing with data outside our scope, Aggregating and organizing the dataAggregating and organizing the data, Analyzing the dataAnalyzing the data, This or That: Binary ClassificationImproving the Results, Writing Our First Bayesian Spam ClassifierDefining the Classifier and Testing It with Hard Ham, Writing Our First Bayesian Spam ClassifierWriting Our First Bayesian Spam Classifier, Defining the Classifier and Testing It with Hard HamTesting the Classifier Against All Email Types, Improving the ResultsImproving the Results, How Do You Sort Something When You Don’t Know the Order?Training and Testing the Ranker, Priority Features of EmailPriority Features of Email, Functions for Extracting the Feature SetFunctions for Extracting the Feature Set, Creating a Weighting Scheme for RankingWeighting from Email Thread Activity, Training and Testing the RankerTraining and Testing the Ranker, Training and Testing the RankerTraining and Testing the Ranker, Predicting Web TrafficPredicting Web Traffic, Text RegressionLogistic Regression to the Rescue, Code Breaking as OptimizationCode Breaking as Optimization, Unsupervised LearningUnsupervised Learning, How Do US Senators Cluster?Exploring senator MDS clustering by Congress, R Package Installation DataR Package Installation Data, Social Network AnalysisVisualizing the Clustered Twitter Network with Gephi, Hacking Twitter Social Graph DataWorking with the Google SocialGraph API, Analyzing Twitter NetworksAnalyzing Twitter Networks, Local Community StructureLocal Community Structure, Local Community StructureLocal Community Structure, Visualizing the Clustered Twitter Network with GephiVisualizing the Clustered Twitter Network with Gephi, Building Your Own “Who to Follow” EngineBuilding Your Own “Who to Follow” Engine
book popularity prediction, Text RegressionLogistic Regression to the Rescue
code breaking, Code Breaking as OptimizationCode Breaking as Optimization
list of, How This Book Is OrganizedHow This Book Is Organized
priority inbox, How Do You Sort Something When You Don’t Know the Order?Training and Testing the Ranker, Priority Features of EmailPriority Features of Email, Functions for Extracting the Feature SetFunctions for Extracting the Feature Set, Creating a Weighting Scheme for RankingWeighting from Email Thread Activity, Training and Testing the RankerTraining and Testing the Ranker, Training and Testing the RankerTraining and Testing the Ranker
feature generation, Priority Features of EmailPriority Features of Email, Functions for Extracting the Feature SetFunctions for Extracting the Feature Set
testing, Training and Testing the RankerTraining and Testing the Ranker
training, Training and Testing the RankerTraining and Testing the Ranker
weighting scheme for, Creating a Weighting Scheme for RankingWeighting from Email Thread Activity
R package installation, R Package Installation DataR Package Installation Data
spam detection, This or That: Binary ClassificationImproving the Results, Writing Our First Bayesian Spam ClassifierDefining the Classifier and Testing It with Hard Ham, Writing Our First Bayesian Spam ClassifierWriting Our First Bayesian Spam Classifier, Defining the Classifier and Testing It with Hard HamTesting the Classifier Against All Email Types, Improving the ResultsImproving the Results
improving results of classifier, Improving the ResultsImproving the Results
testing classifier, Defining the Classifier and Testing It with Hard HamTesting the Classifier Against All Email Types
training classifier, Writing Our First Bayesian Spam ClassifierWriting Our First Bayesian Spam Classifier
writing classifier, Writing Our First Bayesian Spam ClassifierDefining the Classifier and Testing It with Hard Ham
stock market index, Unsupervised LearningUnsupervised Learning
Twitter follower recommendations, Building Your Own “Who to Follow” EngineBuilding Your Own “Who to Follow” Engine
Twitter network analysis, Social Network AnalysisVisualizing the Clustered Twitter Network with Gephi, Hacking Twitter Social Graph DataWorking with the Google SocialGraph API, Analyzing Twitter NetworksAnalyzing Twitter Networks, Local Community StructureLocal Community Structure, Local Community StructureLocal Community Structure, Visualizing the Clustered Twitter Network with GephiVisualizing the Clustered Twitter Network with Gephi
building networks, Analyzing Twitter NetworksAnalyzing Twitter Networks
data for, obtaining, Hacking Twitter Social Graph DataWorking with the Google SocialGraph API
ego-network analysis, Local Community StructureLocal Community Structure
k-core analysis, Local Community StructureLocal Community Structure
visualizations for, Visualizing the Clustered Twitter Network with GephiVisualizing the Clustered Twitter Network with Gephi
UFO sightings, R Basics for Machine LearningAnalyzing the data, Loading libraries and the dataLoading libraries and the data, Converting date strings and dealing with malformed dataConverting date strings and dealing with malformed data, Organizing location dataDealing with data outside our scope, Aggregating and organizing the dataAggregating and organizing the data, Analyzing the dataAnalyzing the data
aggregating data, Aggregating and organizing the dataAggregating and organizing the data
analyzing data, Analyzing the dataAnalyzing the data
cleaning data, Organizing location dataDealing with data outside our scope
loading data, Loading libraries and the dataLoading libraries and the data
malformed data, handling, Converting date strings and dealing with malformed dataConverting date strings and dealing with malformed data
US Senate clustering, How Do US Senators Cluster?Exploring senator MDS clustering by Congress
web traffic predictions, Predicting Web TrafficPredicting Web Traffic
cast function, Unsupervised Learning, Unsupervised Learning, R Package Installation Data, R Package Installation Data
categorical variables, Loading libraries and the data
Cauchy distribution, Exploratory Data VisualizationExploratory Data Visualization
class function, The k-Nearest Neighbors Algorithm
classification, This or That: Binary ClassificationThis or That: Binary Classification, This or That: Binary ClassificationMoving Gently into Conditional Probability, How Do You Sort Something When You Don’t Know the Order?Ordering Email Messages by Priority, Priority Features of EmailPriority Features of Email, Functions for Extracting the Feature SetFunctions for Extracting the Feature Set, Creating a Weighting Scheme for RankingWeighting from Email Thread Activity, Training and Testing the RankerTraining and Testing the Ranker, Training and Testing the RankerTraining and Testing the Ranker, Logistic Regression to the RescueLogistic Regression to the Rescue, SVMs: The Support Vector MachineSVMs: The Support Vector Machine
binary classification, This or That: Binary ClassificationThis or That: Binary Classification, Logistic Regression to the RescueLogistic Regression to the Rescue
ranking classes, How Do You Sort Something When You Don’t Know the Order?Ordering Email Messages by Priority, Priority Features of EmailPriority Features of Email, Functions for Extracting the Feature SetFunctions for Extracting the Feature Set, Creating a Weighting Scheme for RankingWeighting from Email Thread Activity, Training and Testing the RankerTraining and Testing the Ranker, Training and Testing the RankerTraining and Testing the Ranker
(see also priority inbox case study)
feature generation, Priority Features of EmailPriority Features of Email, Functions for Extracting the Feature SetFunctions for Extracting the Feature Set
testing, Training and Testing the RankerTraining and Testing the Ranker
training, Training and Testing the RankerTraining and Testing the Ranker
weighting scheme for, Creating a Weighting Scheme for RankingWeighting from Email Thread Activity
SVM (support vector machine) for, SVMs: The Support Vector MachineSVMs: The Support Vector Machine
text classification, This or That: Binary ClassificationMoving Gently into Conditional Probability
(see also spam detection case study)
classification picture, Visualizing the Relationships Between ColumnsVisualizing the Relationships Between Columns
cleaning data, Organizing location data (see data, cleaning)
clustering, Clustering Based on SimilarityClustering Based on Similarity, Clustering Based on SimilarityA Brief Introduction to Distance Metrics and Multidirectional Scaling, How Do US Senators Cluster?Exploring senator MDS clustering by Congress, Local Community StructureVisualizing the Clustered Twitter Network with Gephi
hierarchical clustering of node distances, Local Community StructureVisualizing the Clustered Twitter Network with Gephi
MDS (multidimensional scaling) for, Clustering Based on SimilarityA Brief Introduction to Distance Metrics and Multidirectional Scaling
of US Senate, How Do US Senators Cluster?Exploring senator MDS clustering by Congress
cmdscale function, A Brief Introduction to Distance Metrics and Multidirectional Scaling, Exploring senator MDS clustering by Congress
code breaking case study, Code Breaking as OptimizationCode Breaking as Optimization
code examples, using, Using Code Examples
coef function, Linear Regression in a NutshellLinear Regression in a Nutshell, Preventing Overfitting with Regularization
columns, Loading libraries and the data, Inferring the Types of Columns in Your DataInferring the Types of Columns in Your Data, Inferring Meaning, Visualizing the Relationships Between ColumnsVisualizing the Relationships Between Columns
meaning of, Inferring Meaning
names for, assigning, Loading libraries and the data
relationships between, visualizations for, Visualizing the Relationships Between ColumnsVisualizing the Relationships Between Columns
types of data in, determining, Inferring the Types of Columns in Your DataInferring the Types of Columns in Your Data
Comprehensive R Archive Network (CRAN), How This Book Is Organized
computer hacker, Machine Learning for Hackers (see hacker)
conditional probability, Moving Gently into Conditional ProbabilityMoving Gently into Conditional Probability
confirmatory data analysis, Exploration versus ConfirmationExploration versus Confirmation
Congress, US, Clustering Based on Similarity (see US Senate clustering case study)
contact information for this book, How to Contact Us
content features, of email, Priority Features of Email
control list, Writing Our First Bayesian Spam Classifier
conventions used in this book, Conventions Used in This Book
convergence, Introduction to Optimization
cor function, Defining Correlation, R Package Installation Data
Corpus function, Writing Our First Bayesian Spam Classifier
correlation, What Is Data?, Defining CorrelationDefining Correlation
Cowgill, Bo (Google, Inc.), R for Machine Learning, R for Machine Learning
regarding R language, R for Machine Learning, R for Machine Learning
CRAN (Comprehensive R Archive Network), How This Book Is Organized
Criswell, Joan (anthropologist), Social Network Analysis
sociometry used by, Social Network Analysis
cross-validation, Methods for Preventing OverfittingMethods for Preventing Overfitting
curve function, Introduction to Optimization, Ridge RegressionRidge Regression
curve, in scatterplot, Linear Regression in a Nutshell
cutree function, Local Community Structure

D

data, Loading libraries and the dataLoading libraries and the data, Loading libraries and the data, Converting date strings and dealing with malformed dataConverting date strings and dealing with malformed data, Organizing location dataDealing with data outside our scope, Aggregating and organizing the dataAggregating and organizing the data, What Is Data?What Is Data?, What Is Data?, What Is Data?, Inferring the Types of Columns in Your DataInferring the Types of Columns in Your Data, Inferring Meaning, Visualizing the Relationships Between ColumnsVisualizing the Relationships Between Columns
aggregating, Aggregating and organizing the dataAggregating and organizing the data
cleaning, Organizing location dataDealing with data outside our scope
columns in, Loading libraries and the data, Inferring the Types of Columns in Your DataInferring the Types of Columns in Your Data, Inferring Meaning, Visualizing the Relationships Between ColumnsVisualizing the Relationships Between Columns
meaning of, Inferring Meaning
names for, assigning, Loading libraries and the data
relationships between, visualizations for, Visualizing the Relationships Between ColumnsVisualizing the Relationships Between Columns
types of data in, determining, Inferring the Types of Columns in Your DataInferring the Types of Columns in Your Data
loading, Loading libraries and the dataLoading libraries and the data
malformed, handling, Converting date strings and dealing with malformed dataConverting date strings and dealing with malformed data
“as rectangles” model of, What Is Data?
source of, What Is Data?
data analysis, Aggregating and organizing the dataAggregating and organizing the data, Analyzing the dataAnalyzing the data, Exploration versus ConfirmationExploration versus Confirmation, Exploration versus ConfirmationExploration versus Confirmation, What Is Data?, What Is Data?What Is Data?, What Is Data?, What Is Data?, Numeric SummariesNumeric Summaries, Exploratory Data VisualizationExploratory Data Visualization, Exploratory Data VisualizationExploratory Data Visualization, Exploratory Data Visualization, Exploratory Data Visualization, Visualizing the Relationships Between ColumnsVisualizing the Relationships Between Columns, Defining CorrelationDefining Correlation, Visualizing the Clustered Twitter Network with GephiVisualizing the Clustered Twitter Network with Gephi
confirmatory, Exploration versus ConfirmationExploration versus Confirmation
correlation, What Is Data?, Defining CorrelationDefining Correlation
dimensionality reduction, What Is Data?
exploratory, Exploration versus ConfirmationExploration versus Confirmation
numeric summary, What Is Data?, Numeric SummariesNumeric Summaries
visualizations, Aggregating and organizing the dataAggregating and organizing the data, Analyzing the dataAnalyzing the data, What Is Data?What Is Data?, Exploratory Data VisualizationExploratory Data Visualization, Exploratory Data VisualizationExploratory Data Visualization, Exploratory Data Visualization, Exploratory Data Visualization, Visualizing the Relationships Between ColumnsVisualizing the Relationships Between Columns, Visualizing the Clustered Twitter Network with GephiVisualizing the Clustered Twitter Network with Gephi
density plot, Exploratory Data Visualization (see density plot)
histogram, Aggregating and organizing the dataAggregating and organizing the data, Exploratory Data VisualizationExploratory Data Visualization
line plot, Analyzing the dataAnalyzing the data
network graphs, Visualizing the Clustered Twitter Network with GephiVisualizing the Clustered Twitter Network with Gephi
relationships between columns, Visualizing the Relationships Between ColumnsVisualizing the Relationships Between Columns
scatterplot, Exploratory Data Visualization (see scatterplot)
data dictionary, Inferring the Types of Columns in Your Data
data frame structure, R for Machine Learning, Loading libraries and the data
data types and structures, R for Machine Learning, R for Machine Learning, Loading libraries and the data, Loading libraries and the data, Loading libraries and the dataConverting date strings and dealing with malformed data, Organizing location data, Aggregating and organizing the data, Inferring the Types of Columns in Your DataInferring the Types of Columns in Your Data
data frame structure, R for Machine Learning, Loading libraries and the data
dates, Loading libraries and the dataConverting date strings and dealing with malformed data, Aggregating and organizing the data
conversions to, Loading libraries and the dataConverting date strings and dealing with malformed data
sequence of, creating, Aggregating and organizing the data
determining for data columns, Inferring the Types of Columns in Your DataInferring the Types of Columns in Your Data
factor data type, Loading libraries and the data
list structure, Organizing location data
vector data type, R for Machine Learning
data.frame function, Writing Our First Bayesian Spam Classifier
database, data set compared to, What Is Data?
(see also matrices)
dates, Loading libraries and the dataConverting date strings and dealing with malformed data, Aggregating and organizing the data
conversions to, Loading libraries and the dataConverting date strings and dealing with malformed data
sequence of, creating, Aggregating and organizing the data
ddply function, Aggregating and organizing the data, Creating a Weighting Scheme for Ranking
decision boundary, This or That: Binary ClassificationThis or That: Binary Classification, Nonlinear Relationships Between Columns: Beyond Straight LinesNonlinear Relationships Between Columns: Beyond Straight Lines, SVMs: The Support Vector MachineSVMs: The Support Vector Machine
linear, This or That: Binary ClassificationThis or That: Binary Classification
nonlinear, handling, Nonlinear Relationships Between Columns: Beyond Straight LinesNonlinear Relationships Between Columns: Beyond Straight Lines, SVMs: The Support Vector MachineSVMs: The Support Vector Machine
dendrogram, Local Community Structure
density plot, Exploratory Data VisualizationExploratory Data Visualization, The Baseline ModelThe Baseline Model, Predicting Web TrafficPredicting Web Traffic, Unsupervised LearningUnsupervised Learning
dimensionality reduction, What Is Data?
dir function, Writing Our First Bayesian Spam Classifier
directed graph, Thinking Graphically
dist function, A Brief Introduction to Distance Metrics and Multidirectional Scaling, Local Community Structure
distance matrix, A Brief Introduction to Distance Metrics and Multidirectional Scaling
distance metrics, A Brief Introduction to Distance Metrics and Multidirectional ScalingA Brief Introduction to Distance Metrics and Multidirectional Scaling, The k-Nearest Neighbors AlgorithmThe k-Nearest Neighbors Algorithm
distributions, Exploratory Data Visualization, Exploratory Data VisualizationExploratory Data Visualization, Exploratory Data VisualizationExploratory Data Visualization, Exploratory Data VisualizationExploratory Data Visualization, Exploratory Data VisualizationExploratory Data Visualization, Exploratory Data VisualizationExploratory Data Visualization, Exploratory Data VisualizationExploratory Data Visualization, Exploratory Data VisualizationExploratory Data Visualization, Exploratory Data VisualizationExploratory Data Visualization, Exploratory Data VisualizationExploratory Data Visualization, Exploratory Data VisualizationExploratory Data Visualization, Exploratory Data VisualizationExploratory Data Visualization
bell curve (normal distribution), Exploratory Data VisualizationExploratory Data Visualization, Exploratory Data VisualizationExploratory Data Visualization, Exploratory Data VisualizationExploratory Data Visualization, Exploratory Data VisualizationExploratory Data Visualization
modes in, Exploratory Data VisualizationExploratory Data Visualization
verifying with density plots, Exploratory Data VisualizationExploratory Data Visualization
verifying with different binwidths, Exploratory Data VisualizationExploratory Data Visualization
Cauchy distribution, Exploratory Data VisualizationExploratory Data Visualization
exponential distribution, Exploratory Data VisualizationExploratory Data Visualization
gamma distribution, Exploratory Data VisualizationExploratory Data Visualization
heavy-tailed distribution, Exploratory Data VisualizationExploratory Data Visualization
skewed distribution, Exploratory Data VisualizationExploratory Data Visualization
symmetric distribution, Exploratory Data VisualizationExploratory Data Visualization
thin-tailed distribution, Exploratory Data VisualizationExploratory Data Visualization
do.call function, Organizing location data, Aggregating and organizing the data, Functions for Extracting the Feature Set
.dta file extension, Analyzing US Senator Roll Call Data (101st–111th Congresses)
dummy coding, Inferring the Types of Columns in Your Data
dummy variables, regression using, Regression Using Dummy VariablesRegression Using Dummy Variables

G

Galston, William A. (Senior Fellow, Brookings Institution), How Do US Senators Cluster?
regarding polarization in US Congress, How Do US Senators Cluster?
gamma distribution, Exploratory Data VisualizationExploratory Data Visualization
Gaussian distribution, Exploratory Data Visualization (see bell curve)
geom_density function, Exploratory Data Visualization
(see also density plot)
geom_histogram function, Aggregating and organizing the data, Exploratory Data Visualization
(see also histogram)
geom_line function, Analyzing the data
geom_point function, Visualizing the Relationships Between Columns
(see also scatterplot)
geom_smooth function, Visualizing the Relationships Between Columns, Linear Regression in a Nutshell, Predicting Web Traffic, Defining Correlation, Nonlinear Relationships Between Columns: Beyond Straight Lines, Introducing Polynomial Regression
geom_text function, Exploring senator MDS clustering by Congress
Gephi software, Analyzing Twitter Networks, Visualizing the Clustered Twitter Network with GephiVisualizing the Clustered Twitter Network with Gephi
get.edgelist function, Building Your Own “Who to Follow” Engine
getURL function, Working with the Google SocialGraph API
ggplot object, Aggregating and organizing the data
ggplot2 package, Loading and Installing R Packages, Loading libraries and the data, Aggregating and organizing the data, Analyzing the data, Analyzing the data, Further Reading on R, Analyzing US Senator Roll Call Data (101st–111th Congresses), Exploring senator MDS clustering by Congress
(see also specific functions)
MDS results using, Analyzing US Senator Roll Call Data (101st–111th Congresses)
plotting themes of, Analyzing the data
resources for, Further Reading on R
two plots using, Exploring senator MDS clustering by Congress
ggsave function, Aggregating and organizing the data, Analyzing the data
glm function, SVMs: The Support Vector MachineSVMs: The Support Vector Machine
glmnet function, Preventing Overfitting with RegularizationPreventing Overfitting with Regularization, Text Regression, Logistic Regression to the Rescue, Comparing Algorithms
glmnet package, Loading and Installing R Packages, Preventing Overfitting with Regularization
global optimum, Introduction to Optimization
Goffman, Erving (social scientist), Social Network Analysis
regarding nature of human interaction, Social Network Analysis
Google, Priority Features of EmailPriority Features of Email, Hacking Twitter Social Graph Data
priority inbox by, Priority Features of EmailPriority Features of Email
SocialGraph API, Hacking Twitter Social Graph Data (see SGA)
gradient, Introduction to Optimization
graph.coreness function, Local Community Structure
GraphML files, Local Community Structure
grepl function, Functions for Extracting the Feature Set, Functions for Extracting the Feature Set, Functions for Extracting the Feature Set, Working with the Google SocialGraph API
grid search, Introduction to OptimizationIntroduction to Optimization
gsub function, Organizing location data, Functions for Extracting the Feature Set

K

k-core analysis, Local Community StructureLocal Community Structure
k-nearest neighbors algorithm, The k-Nearest Neighbors Algorithm (see kNN algorithm)
KDE (kernel density estimate), Exploratory Data Visualization (see density plot)
kernel trick, SVMs: The Support Vector Machine (see SVM (support vector machine))
kNN (k-nearest neighbors) algorithm, The k-Nearest Neighbors AlgorithmR Package Installation Data, R Package Installation DataR Package Installation Data, Comparing Algorithms
comparing to other algorithms, Comparing Algorithms
R package installation case study using, R Package Installation DataR Package Installation Data
knn function, The k-Nearest Neighbors Algorithm
Königsberg Bridge problem, Social Network Analysis

L

L1 norm, Preventing Overfitting with Regularization
L2 norm, Preventing Overfitting with Regularization
label features, of email, Priority Features of Email
labels, compared to factors, Inferring the Types of Columns in Your Data
Lambda, for regularization, Preventing Overfitting with RegularizationPreventing Overfitting with Regularization, Text Regression, Ridge Regression
lapply function, Organizing location data, Aggregating and organizing the data, Functions for Extracting the Feature Set, Analyzing US Senator Roll Call Data (101st–111th Congresses), Exploring senator MDS clustering by Congress
length function, Converting date strings and dealing with malformed data
library function, Loading and Installing R Packages
line plot, Analyzing the data
line, in scatterplot, Linear Regression in a Nutshell
linear kernel SVM, SVMs: The Support Vector Machine, Comparing Algorithms
linear regression, R for Machine Learning, Introducing RegressionLinear Regression in a Nutshell, The Baseline ModelThe Baseline Model, Regression Using Dummy VariablesRegression Using Dummy Variables, Linear Regression in a NutshellLinear Regression in a Nutshell, Linear Regression in a NutshellLinear Regression in a Nutshell, Predicting Web TrafficPredicting Web Traffic, Predicting Web Traffic, Defining CorrelationDefining Correlation, Defining Correlation, Nonlinear Relationships Between Columns: Beyond Straight LinesNonlinear Relationships Between Columns: Beyond Straight Lines, Introduction to Optimization, Introduction to Optimization
adapting for nonlinear relationships, Nonlinear Relationships Between Columns: Beyond Straight LinesNonlinear Relationships Between Columns: Beyond Straight Lines
assumptions in, Linear Regression in a NutshellLinear Regression in a Nutshell
baseline model for, The Baseline ModelThe Baseline Model
correlation as indicator of, Defining CorrelationDefining Correlation
dummy variables for, Regression Using Dummy VariablesRegression Using Dummy Variables
lm function for, R for Machine Learning, Linear Regression in a NutshellLinear Regression in a Nutshell, Predicting Web Traffic, Defining Correlation, Introduction to Optimization
optimizing, Introduction to Optimization
web traffic predictions case study using, Predicting Web TrafficPredicting Web Traffic
linearity assumption, Linear Regression in a NutshellLinear Regression in a Nutshell
Linux, installing R language on, Linux
list structure, Organizing location data
list.files function, Analyzing US Senator Roll Call Data (101st–111th Congresses)
lm function, R for Machine Learning, Linear Regression in a NutshellLinear Regression in a Nutshell, Predicting Web Traffic, Defining Correlation, Introduction to Optimization
load function, Comparing Algorithms
loading data, Loading libraries and the data (see data, loading)
log base-10 transformation, A log-weighting scheme
log function, A log-weighting scheme
log-transformations, A log-weighting scheme
log-weighting scheme, A log-weighting schemeA log-weighting scheme
log1p function, A log-weighting scheme
logarithms, A log-weighting scheme
logistic regression, Logistic Regression to the RescueLogistic Regression to the Rescue, The k-Nearest Neighbors Algorithm, SVMs: The Support Vector MachineSVMs: The Support Vector Machine, SVMs: The Support Vector MachineSVMs: The Support Vector Machine, Comparing Algorithms
comparing to other algorithms, Comparing Algorithms
glm function for, SVMs: The Support Vector MachineSVMs: The Support Vector Machine
when not to use, The k-Nearest Neighbors Algorithm, SVMs: The Support Vector MachineSVMs: The Support Vector Machine
lubridate package, Unsupervised Learning

M

Mac OS X, installing R language on, Mac OS X
machine learning, Machine Learning for Hackers, How This Book Is Organized, How This Book Is Organized, Using R, Using R, Works CitedWorks Cited
compared to statistics, Using R
as pattern recognition algorithms, Using R
resources for, How This Book Is Organized, Works CitedWorks Cited
malformed data, Converting date strings and dealing with malformed dataConverting date strings and dealing with malformed data
match function, Dealing with data outside our scope
matrices, What Is Data?, Writing Our First Bayesian Spam Classifier, A Brief Introduction to Distance Metrics and Multidirectional Scaling, A Brief Introduction to Distance Metrics and Multidirectional ScalingA Brief Introduction to Distance Metrics and Multidirectional Scaling
conversions to, Writing Our First Bayesian Spam Classifier
data as, What Is Data?
multiplication of, A Brief Introduction to Distance Metrics and Multidirectional ScalingA Brief Introduction to Distance Metrics and Multidirectional Scaling
transposition of, A Brief Introduction to Distance Metrics and Multidirectional Scaling
max function, Quantiles
maximum value, Numeric Summaries, Quantiles
MDS (multidimensional scaling), Clustering Based on Similarity–A Brief Introduction to Distance Metrics and Multidirectional Scaling, A Brief Introduction to Distance Metrics and Multidirectional Scaling, A Brief Introduction to Distance Metrics and Multidirectional Scaling, How Do US Senators Cluster?–Exploring senator MDS clustering by Congress
cmdscale function for, A Brief Introduction to Distance Metrics and Multidirectional Scaling
dimensions of, A Brief Introduction to Distance Metrics and Multidirectional Scaling
for US Senate clustering, How Do US Senators Cluster?–Exploring senator MDS clustering by Congress
mean, Means, Medians, and Modes
mean function, Means, Medians, and Modes
mean squared error (MSE), The Baseline Model, Linear Regression in a Nutshell
median, Means, Medians, and Modes
median function, Means, Medians, and Modes
melt function, Unsupervised Learning, SVMs: The Support Vector Machine
merge function, Aggregating and organizing the data
Metropolis method, Code Breaking as Optimization
min function, Quantiles
minimum value, Numeric Summaries, Quantiles
mode, Means, Medians, and Modes, Exploratory Data Visualization
monotonicity, Linear Regression in a Nutshell
Moreno, Jacob L. (psychologist), Social Network Analysis
sociometry developed by, Social Network Analysis
MSE (mean squared error), The Baseline Model, Linear Regression in a Nutshell
multidimensional scaling, Clustering Based on Similarity (see MDS)
multimodal, Exploratory Data Visualization

N

Naive Bayes classifier, Moving Gently into Conditional Probability, Writing Our First Bayesian Spam Classifier–Defining the Classifier and Testing It with Hard Ham, Writing Our First Bayesian Spam Classifier, Defining the Classifier and Testing It with Hard Ham–Testing the Classifier Against All Email Types, Improving the Results
improving results of, Improving the Results
testing, Defining the Classifier and Testing It with Hard Ham–Testing the Classifier Against All Email Types
training, Writing Our First Bayesian Spam Classifier
writing, Writing Our First Bayesian Spam Classifier–Defining the Classifier and Testing It with Hard Ham
names function, Loading libraries and the data
natural log, A log-weighting scheme
nchar function, Converting date strings and dealing with malformed data
neighbors function, Local Community Structure, Building Your Own “Who to Follow” Engine
Netflix, How Do You Sort Something When You Don’t Know the Order?
recommendation system used by, How Do You Sort Something When You Don’t Know the Order?
network graphs, Thinking Graphically, Visualizing the Clustered Twitter Network with Gephi
network hairball, Visualizing the Clustered Twitter Network with Gephi
noise, This or That: Binary Classification (see jittering)
normal distribution, Exploratory Data Visualization (see bell curve)
nrow function, Aggregating and organizing the data
numbers, determining whether column contains, Inferring the Types of Columns in Your Data
numeric summary, What Is Data?, Numeric Summaries

O

objective function, Introduction to Optimization
online resources, R for Machine Learning (see website resources)
optim function, Introduction to Optimization, Ridge Regression
optimization, Introduction to Optimization, Introduction to Optimization, Introduction to Optimization, Ridge Regression, Code Breaking as Optimization, Code Breaking as Optimization, Code Breaking as Optimization
code breaking case study using, Code Breaking as Optimization
grid search for, Introduction to Optimization
Metropolis method for, Code Breaking as Optimization
optim function for, Introduction to Optimization
ridge regression for, Ridge Regression
stochastic optimization, Code Breaking as Optimization
optimum, Introduction to Optimization, Introduction to Optimization
opts function, Analyzing the data
orthogonal polynomials, Introducing Polynomial Regression
overfitting, Introducing Polynomial Regression–Preventing Overfitting with Regularization, Methods for Preventing Overfitting, Preventing Overfitting with Regularization
cross-validation preventing, Methods for Preventing Overfitting
regularization preventing, Preventing Overfitting with Regularization

P

p-value, Predicting Web Traffic, Predicting Web Traffic, Predicting Web Traffic
packages for R, Loading and Installing R Packages, Loading and Installing R Packages, Loading and Installing R Packages, Loading and Installing R Packages, R Package Installation Data
(see also specific packages)
case study involving, R Package Installation Data
installing, Loading and Installing R Packages
list of, Loading and Installing R Packages
loading, Loading and Installing R Packages
paste function, Writing Our First Bayesian Spam Classifier
pattern matching, in expressions, Using R (see regular expressions)
patterns in data, Using R, Using R, Exploration versus Confirmation
(see also classification; distributions; regression)
confirming, Exploration versus Confirmation
pattern recognition algorithms for, Using R
PCA (principal components analysis), Unsupervised Learning
percent, asterisk, percent (%*%), for matrix multiplication, A Brief Introduction to Distance Metrics and Multidirectional Scaling
plot function, R for Machine Learning, Local Community Structure
plotting results, R for Machine Learning (see visualizations of data)
plyr package, Loading libraries and the data, Aggregating and organizing the data, Creating a Weighting Scheme for Ranking
poly function, Introducing Polynomial Regression
polynomial kernel SVM, SVMs: The Support Vector Machine, SVMs: The Support Vector Machine
polynomial regression, Introducing Polynomial Regression, Introducing Polynomial Regression–Preventing Overfitting with Regularization, Methods for Preventing Overfitting
overfitting with, preventing, Introducing Polynomial Regression–Preventing Overfitting with Regularization
underfitting with, Methods for Preventing Overfitting
Poole, Keith (political scientist), How Do US Senators Cluster?
roll call data repository by, How Do US Senators Cluster?
predict function, Linear Regression in a Nutshell, Logistic Regression to the Rescue, Unsupervised Learning
predictions, improving, Introduction to Optimization (see optimization)
principal components analysis (PCA), Unsupervised Learning
princomp function, Unsupervised Learning
print function, Loading and Installing R Packages
priority inbox case study, How Do You Sort Something When You Don’t Know the Order?–Training and Testing the Ranker, Priority Features of Email, Functions for Extracting the Feature Set, Creating a Weighting Scheme for Ranking–Weighting from Email Thread Activity, Training and Testing the Ranker, Training and Testing the Ranker
feature generation, Priority Features of Email, Functions for Extracting the Feature Set
testing, Training and Testing the Ranker
training, Training and Testing the Ranker
weighting scheme for, Creating a Weighting Scheme for Ranking–Weighting from Email Thread Activity

Q

quantile function, Quantiles, Standard Deviations and Variances, Predicting Web Traffic
quantiles, Numeric Summaries, Quantiles
question mark (?) syntax, for R help, Loading libraries and the data
question mark, double (??) syntax, for R help, Loading libraries and the data

R

R console, Windows, Mac OS X, Linux
Linux, Linux
Mac OS X, Mac OS X
Windows, Windows
R programming language, How This Book Is Organized, Using R, R for Machine Learning, R for Machine Learning, R for Machine Learning, R for Machine Learning, Downloading and Installing R, Downloading and Installing R–Linux, IDEs and Text Editors, IDEs and Text Editors, Loading and Installing R Packages, Loading and Installing R Packages, Loading and Installing R Packages, Loading and Installing R Packages, Loading libraries and the data, Loading libraries and the data, Loading libraries and the data, Loading libraries and the data–Converting date strings and dealing with malformed data, Organizing location data, Aggregating and organizing the data, Further Reading on R, R Package Installation Data, Works Cited
data types and structures, R for Machine Learning, R for Machine Learning, Loading libraries and the data, Loading libraries and the data, Loading libraries and the data–Converting date strings and dealing with malformed data, Organizing location data, Aggregating and organizing the data
data frame structure, R for Machine Learning, Loading libraries and the data
dates, Loading libraries and the data–Converting date strings and dealing with malformed data, Aggregating and organizing the data
factor data type, Loading libraries and the data
list structure, Organizing location data
vector data type, R for Machine Learning
disadvantages of, R for Machine Learning
downloading, Downloading and Installing R
help for, Loading libraries and the data
IDEs for, IDEs and Text Editors
installing, Downloading and Installing R–Linux
packages for, Loading and Installing R Packages, Loading and Installing R Packages, Loading and Installing R Packages, Loading and Installing R Packages, R Package Installation Data
case study involving, R Package Installation Data
checking for, Loading and Installing R Packages
installing, Loading and Installing R Packages
list of, Loading and Installing R Packages
loading, Loading and Installing R Packages
resources for, Further Reading on R, Works Cited
text editors for, IDEs and Text Editors
R Project for Statistical Computing, R for Machine Learning
R-Bloggers website, R for Machine Learning
R² (R squared), Linear Regression in a Nutshell, Predicting Web Traffic, Predicting Web Traffic, Nonlinear Relationships Between Columns: Beyond Straight Lines
radial kernel SVM, SVMs: The Support Vector Machine, SVMs: The Support Vector Machine, Comparing Algorithms
random number generation, Code Breaking as Optimization, A Brief Introduction to Distance Metrics and Multidirectional Scaling
range function, Quantiles, Standard Deviations and Variances
ranking classes, How Do You Sort Something When You Don’t Know the Order?–Ordering Email Messages by Priority, Priority Features of Email, Functions for Extracting the Feature Set, Creating a Weighting Scheme for Ranking–Weighting from Email Thread Activity, Training and Testing the Ranker, Training and Testing the Ranker
feature generation, Priority Features of Email, Functions for Extracting the Feature Set
testing, Training and Testing the Ranker
training, Training and Testing the Ranker
weighting scheme for, Creating a Weighting Scheme for Ranking–Weighting from Email Thread Activity
rbind function, Organizing location data, Functions for Extracting the Feature Set, Methods for Preventing Overfitting
RCurl package, Loading and Installing R Packages, Working with the Google SocialGraph API
read.* functions, Loading libraries and the data
read.delim function, Loading libraries and the data, Loading libraries and the data
read.dta function, Analyzing US Senator Roll Call Data (101st–111th Congresses)
readKH function, Analyzing US Senator Roll Call Data (101st–111th Congresses)
readLines function, Writing Our First Bayesian Spam Classifier, Functions for Extracting the Feature Set
recommendation system, How Do You Sort Something When You Don’t Know the Order?, The k-Nearest Neighbors Algorithm–R Package Installation Data, R Package Installation Data, Building Your Own “Who to Follow” Engine
(see also ranking classes)
k-nearest neighbors algorithm for, The k-Nearest Neighbors Algorithm–R Package Installation Data
R package installation case study using, R Package Installation Data
of Twitter followers, Building Your Own “Who to Follow” Engine
rectangles, data as, What Is Data?
(see also matrices)
regression, R for Machine Learning, Introducing Regression, Introducing Regression–Linear Regression in a Nutshell, The Baseline Model, Regression Using Dummy Variables, Linear Regression in a Nutshell, Linear Regression in a Nutshell, Predicting Web Traffic, Defining Correlation, Defining Correlation, Nonlinear Relationships Between Columns: Beyond Straight Lines, Introducing Polynomial Regression, Introducing Polynomial Regression–Preventing Overfitting with Regularization, Methods for Preventing Overfitting, Text Regression–Logistic Regression to the Rescue, Logistic Regression to the Rescue, Introduction to Optimization, Introduction to Optimization, Ridge Regression, The k-Nearest Neighbors Algorithm, SVMs: The Support Vector Machine, SVMs: The Support Vector Machine, Comparing Algorithms
linear regression, R for Machine Learning, Introducing Regression–Linear Regression in a Nutshell, The Baseline Model, Regression Using Dummy Variables, Linear Regression in a Nutshell, Linear Regression in a Nutshell, Predicting Web Traffic, Defining Correlation, Defining Correlation, Nonlinear Relationships Between Columns: Beyond Straight Lines, Introduction to Optimization, Introduction to Optimization
adapting for nonlinear relationships, Nonlinear Relationships Between Columns: Beyond Straight Lines
assumptions in, Linear Regression in a Nutshell
baseline model for, The Baseline Model
correlation as indicator of, Defining Correlation
dummy variables for, Regression Using Dummy Variables
lm function for, R for Machine Learning, Linear Regression in a Nutshell, Predicting Web Traffic, Defining Correlation, Introduction to Optimization
optimizing, Introduction to Optimization
logistic regression, Logistic Regression to the Rescue, The k-Nearest Neighbors Algorithm, SVMs: The Support Vector Machine, SVMs: The Support Vector Machine, Comparing Algorithms
comparing to other algorithms, Comparing Algorithms
glm function for, SVMs: The Support Vector Machine
when not to use, The k-Nearest Neighbors Algorithm, SVMs: The Support Vector Machine
polynomial regression, Introducing Polynomial Regression, Introducing Polynomial Regression–Preventing Overfitting with Regularization, Methods for Preventing Overfitting
overfitting with, preventing, Introducing Polynomial Regression–Preventing Overfitting with Regularization
underfitting with, Methods for Preventing Overfitting
ridge regression, Ridge Regression
text regression, Text Regression–Logistic Regression to the Rescue
regression picture, Visualizing the Relationships Between Columns
regular expressions, Organizing location data, Functions for Extracting the Feature Set, Functions for Extracting the Feature Set
grepl function for, Functions for Extracting the Feature Set, Functions for Extracting the Feature Set
gsub function for, Organizing location data
regularization, Preventing Overfitting with Regularization, Text Regression, Logistic Regression to the Rescue, Ridge Regression, SVMs: The Support Vector Machine, Comparing Algorithms, Comparing Algorithms
logistic regression using, Logistic Regression to the Rescue, Comparing Algorithms, Comparing Algorithms
preventing overfitting using, Preventing Overfitting with Regularization
ridge regression using, Ridge Regression
SVM using, SVMs: The Support Vector Machine
text regression using, Text Regression
rep function, Aggregating and organizing the data
require function, Loading and Installing R Packages
reshape package, Loading and Installing R Packages, Loading libraries and the data, Unsupervised Learning, R Package Installation Data
residuals function, Linear Regression in a Nutshell
resources, Works Cited (see books and publications; website resources)
RGui and R Console, Windows
ridge regression, Ridge Regression
RJSONIO package, Loading and Installing R Packages, Working with the Google SocialGraph API
rm function, Comparing Algorithms
RMSE (root mean squared error), Regression Using Dummy Variables, Linear Regression in a Nutshell, Predicting Web Traffic, Methods for Preventing Overfitting
root mean squared error (RMSE), Regression Using Dummy Variables, Linear Regression in a Nutshell, Predicting Web Traffic, Methods for Preventing Overfitting
Rosenthal, Howard (political scientist), How Do US Senators Cluster?
roll call data repository by, How Do US Senators Cluster?
ROT13 cipher, Code Breaking as Optimization
rowSums function, Writing Our First Bayesian Spam Classifier
Rscript utility, Analyzing Twitter Networks
RSeek website, R for Machine Learning
RSiteSearch function, Loading libraries and the data
#rstats Twitter community, R for Machine Learning

S

sample function, Methods for Preventing Overfitting, A Brief Introduction to Distance Metrics and Multidirectional Scaling
sapply function, Writing Our First Bayesian Spam Classifier, Writing Our First Bayesian Spam Classifier, Introduction to Optimization
scale function, Unsupervised Learning
scale_color_manual function, Analyzing the data
scale_x_date function, Aggregating and organizing the data, Analyzing the data
scale_x_log function, Predicting Web Traffic
scale_y_log function, Predicting Web Traffic
scatterplot, Visualizing the Relationships Between Columns, Testing the Classifier Against All Email Types, Linear Regression in a Nutshell, Predicting Web Traffic, Predicting Web Traffic, Nonlinear Relationships Between Columns: Beyond Straight Lines
second quartile, Numeric Summaries, Means, Medians, and Modes
seed, Code Breaking as Optimization (see ego-network; random number generation)
Senators, US, Clustering Based on Similarity (see US Senate clustering case study)
separability, Linear Regression in a Nutshell
separating hyperplane, Visualizing the Relationships Between Columns, This or That: Binary Classification
(see also decision boundary)
seq function, Quantiles
seq.Date function, Aggregating and organizing the data
set.seed function, A Brief Introduction to Distance Metrics and Multidirectional Scaling
set.vertex.attributes function, Local Community Structure
setwd function, Loading and Installing R Packages
SGA (SocialGraph API), Hacking Twitter Social Graph Data–Working with the Google SocialGraph API
sgeom_point function, Exploring senator MDS clustering by Congress
shortest.paths function, Local Community Structure
sigmoid kernel SVM, SVMs: The Support Vector Machine, SVMs: The Support Vector Machine
simulated annealing, Code Breaking as Optimization
singularity, Introducing Polynomial Regression
skewed distribution, Exploratory Data Visualization
social balance theory, Building Your Own “Who to Follow” Engine
social features, of email, Priority Features of Email
social network analysis, Social Network Analysis–Thinking Graphically
(see also Twitter network analysis case study)
SocialGraph API, Hacking Twitter Social Graph Data (see SGA)
sociometry, Social Network Analysis
source function, Loading and Installing R Packages, Analyzing Twitter Networks
spam detection case study, This or That: Binary Classification–Improving the Results, Writing Our First Bayesian Spam Classifier–Defining the Classifier and Testing It with Hard Ham, Writing Our First Bayesian Spam Classifier, Defining the Classifier and Testing It with Hard Ham–Testing the Classifier Against All Email Types, Improving the Results
improving results of classifier, Improving the Results
testing classifier, Defining the Classifier and Testing It with Hard Ham–Testing the Classifier Against All Email Types
training classifier, Writing Our First Bayesian Spam Classifier
writing classifier, Writing Our First Bayesian Spam Classifier–Defining the Classifier and Testing It with Hard Ham
SpamAssassin public corpus, This or That: Binary Classification, Priority Features of Email
spread, Standard Deviations and Variances
squared error, The Baseline Model, Linear Regression in a Nutshell, Introduction to Optimization
StackOverflow website, R for Machine Learning
standard deviation, Standard Deviations and Variances
statistics, How This Book Is Organized, Using R, R for Machine Learning, Works Cited
compared to machine learning, Using R
R language for, R for Machine Learning (see R programming language)
resources for, How This Book Is Organized, Works Cited
stochastic optimization, Code Breaking as Optimization
stock market index case study, Unsupervised Learning
strftime function, Aggregating and organizing the data
strings, determining whether column contains, Inferring the Types of Columns in Your Data
strptime function, Functions for Extracting the Feature Set
strsplit function, Organizing location data, Functions for Extracting the Feature Set
subgraph function, Local Community Structure
subset function, Dealing with data outside our scope, Aggregating and organizing the data
substitution cipher, Code Breaking as Optimization
summary function, Aggregating and organizing the data, Numeric Summaries, Predicting Web Traffic
summary, numeric, What Is Data? (see numeric summary)
supervised learning, How Do You Sort Something When You Don’t Know the Order?
SVM (support vector machine), SVMs: The Support Vector Machine
svm function, SVMs: The Support Vector Machine
symmetric distribution, Exploratory Data Visualization

T

t function, A Brief Introduction to Distance Metrics and Multidirectional Scaling
t value, Predicting Web Traffic, Predicting Web Traffic
tab-delimited files, Loading libraries and the data
table function, Building Your Own “Who to Follow” Engine
tables, What Is Data? (see matrices)
tail function, Loading libraries and the data
TDM (term document matrix), Writing Our First Bayesian Spam Classifier
Temple, Duncan (developer), Working with the Google SocialGraph API
packages developed by, Working with the Google SocialGraph API
term document matrix (TDM), Writing Our First Bayesian Spam Classifier
text classification, This or That: Binary Classification–Moving Gently into Conditional Probability
(see also spam detection case study)
text editors, for R, IDEs and Text Editors
text mining package, Writing Our First Bayesian Spam Classifier (see tm package)
text regression, Text Regression–Logistic Regression to the Rescue
thin-tailed distribution, Exploratory Data Visualization
third quartile, Numeric Summaries
thread features, for email, Priority Features of Email
tm package, Loading and Installing R Packages, Loading and Installing R Packages, Writing Our First Bayesian Spam Classifier, Text Regression
tolower function, Organizing location data
traffic order, Social Network Analysis
(see also social network analysis)
training set, Using R, Writing Our First Bayesian Spam Classifier, Methods for Preventing Overfitting
transform function, Organizing location data, Writing Our First Bayesian Spam Classifier, Exploring senator MDS clustering by Congress
Traveling Salesman problem, Social Network Analysis
tryCatch function, Organizing location data
.tsv file extension, Loading libraries and the data
(see also tab-delimited files)
Tukey, John (statistician), Exploration versus Confirmation, Text Regression
regarding data not always having an answer, Text Regression
regarding exploratory data analysis, Exploration versus Confirmation
Twitter follower recommendations case study, Building Your Own “Who to Follow” Engine
Twitter network analysis case study, Social Network Analysis–Visualizing the Clustered Twitter Network with Gephi, Hacking Twitter Social Graph Data–Working with the Google SocialGraph API, Analyzing Twitter Networks, Local Community Structure, Local Community Structure, Visualizing the Clustered Twitter Network with Gephi
building networks, Analyzing Twitter Networks
data for, obtaining, Hacking Twitter Social Graph Data–Working with the Google SocialGraph API
ego-network analysis, Local Community Structure
k-core analysis, Local Community Structure
visualizations for, Visualizing the Clustered Twitter Network with Gephi

V

var function, Standard Deviations and Variances
variables, Loading libraries and the data, Regression Using Dummy Variables
categorical, Loading libraries and the data
dummy, for linear regression, Regression Using Dummy Variables
variance, Standard Deviations and Variances
vector data type, R for Machine Learning
VectorSource function, Writing Our First Bayesian Spam Classifier
Video Rchive website, R for Machine Learning
visualizations of data, Aggregating and organizing the data, Analyzing the data, What Is Data?, Exploratory Data Visualization, Exploratory Data Visualization, Exploratory Data Visualization, Exploratory Data Visualization, Visualizing the Relationships Between Columns, Visualizing the Clustered Twitter Network with Gephi
density plot, Exploratory Data Visualization (see density plot)
histogram, Aggregating and organizing the data, Exploratory Data Visualization
line plot, Analyzing the data
network graphs, Visualizing the Clustered Twitter Network with Gephi
relationships between columns, Visualizing the Relationships Between Columns
scatterplot, Exploratory Data Visualization (see scatterplot)

X

xlab function, Analyzing the data
XML package, for R, Loading and Installing R Packages
XML-based file formats, Local Community Structure (see GraphML files)

Y

ylab function, Analyzing the data
ymd function, Unsupervised Learning