Index

A note on the digital index

Each link in an index entry is displayed as the title of the section in which that entry appears. Because some sections contain multiple index markers, an entry may have several links to the same section. Clicking any link takes you directly to the place in the text where the marker appears.

A

absolute value function, Other Functions
accuracy
defined, How Good Are Those Numbers?
displaying, Before You Get Started: Feasibility and Cost
advanced indexing (NumPy), NumPy in Detail
agglomerative hierarchical clustering, Center Seekers
algebra
about, On Reading Formulas
linear algebra, Linear Algebra, On Math
algorithms, A Horror Story
for classification, Some Classification Terminology
allometric scaling, Logarithmic Plots
alternate hypotheses, Statistics Defined
ANOVA (analysis of variance), Design of Experiments
Anscombe’s Quartet, Linear Regression and All That
approximations, function approximation with least squares, Least Squares, Function Approximation
Taylor expansions, Derivatives
apriori algorithm, A Special Case: Market Basket Analysis
arguments, scaling, Using and Misusing Models
artificial neural networks, Decision Trees and Rule-Based Classifiers
aspect ratios, banking, Banking
association analysis, Other Thoughts
autocorrelation function, Don’t Overlook the Obvious!, Examples
averaging averages, Intermezzo: Mythbusting—Bigfoot, Least Squares, and All That

B

back-of-the-envelope calculations, Guesstimation and the Back of the Envelope, More Examples
backups, data files, The Care and Feeding of Your Data Zoo
bagging, Other Classifiers
bandwidth selection, KDEs, Histograms, Kernel Density Estimates
banking, Banking
base of the natural logarithm, Miscellaneous symbols
Bayesian classifiers, Algorithms for Classification
Bayesian networks, Bayesian Classifiers
Bayesian statistics, Perspective
Bayesian interpretation of probability, The Frequentist Interpretation of Probability
data analysis example, The Bayesian Interpretation of Probability
frequentist interpretation of probability, Perspective
inference, Bayesian Data Analysis: A Worked Example
Berkeley DB, Data Consistency
Bernoulli trials, Exact Results
binary relationships, notation for, Sets, Sequences, and Series
Binomial distribution and Bernoulli trials, Arguments from Probability Models
binomial theorem, Power Series and Taylor Expansion
biplots, PCA, Practical Points, Workshop: PCA with R
birth processes, Unconstrained Growth and Decay Phenomena
bivariate analysis, Two Variables: Establishing Relationships
(see also time-series analysis)
banking, Banking
linear regression, Banking
logarithmic plots, Additional Ideas and Warnings
noise and smoothing, Scatter Plots
scatter plots, Two Variables: Establishing Relationships
bivariate data sets, Types of Data Sets
blind experiments, Design of Experiments
blocking, Controlled Experiments Versus Observational Studies
boosting, Other Classifiers
bootstrap, Resampling Methods
box-and-whisker plots
about, Box-and-Whisker Plots
Quintus Curtius Snodgrass example, Example: Formal Tests Versus Graphical Methods
broadcasting (NumPy), NumPy in Action
brushing and linking, multivariate analysis, Querying and Zooming
business intelligence, Business Intelligence
(see also financial calculations)

C

C Clustering Library, A Word of Warning
C Library (see GSL)
calculus, Results from Calculus
absolute value function, Other Functions
binomial theorem, Power Series and Taylor Expansion
derivatives, The Inverse of a Function
dividing by zero, The Linear Transformation
exponential functions, Polynomials and Rational Functions
factorial function, Other Functions
Gaussian function and the Normal distribution, Trigonometric Functions
hyperbolic tangent function, Other Functions
integrals, Finding Minima and Maxima
inverse of a function, Other Functions
limits, sequences and series, Limits, Sequences, and Series
linear transformation, The Binomial Theorem
logarithms, Polynomials and Rational Functions
mathematical notation, Dividing by Zero
minima and maxima, Derivatives
on math, Where to Go from Here
polynomials, Powers
power series and Taylor expansion, Limits, Sequences, and Series
powers, Results from Calculus
rational functions, Powers
trigonometric functions, Exponential Function and Logarithm
capital expenditures (CapEx), Fixed and Variable Costs
carrying capacity (logistic equation), Constrained Growth: The Logistic Equation
cash-flow analysis, A Single Payment: Future and Present Value, Calculational Tricks with Compounding
categorical data
about, Skills
clustering, Numerical data
CDF (cumulative distribution function), Optional: Optimal Bandwidth Selection
Central Limit Theorem
Gaussian distribution, The Gaussian Distribution and the Central Limit Theorem, Trigonometric Functions
power-law distributions, Power-Law Distributions and Non-Normal Statistics
centroids, clusters, Clustering Methods, Tree Builders
Chaco library (Python), R
chi-square (χ²) distribution, Statistics Defined
class imbalance problems, Estimating Prediction Error
classical statistics (see statistics)
classification, Predictive Analytics
(see also predictive analytics)
about, Predictive Analytics
terminology, Topics in Predictive Analytics
cleaning and conditioning data, Sources for Data
clustering, Finding Clusters
about, Finding Clusters, Predictive Analytics
distance and similarity measures, A Different Point of View
market basket analysis, Other Thoughts
methods, Special-purpose metrics
pre- and postprocessing, Pre- and Postprocessing
Pycluster and the C Clustering Library, A Word of Warning
CO2 measurements above Mauna Loa on Hawaii, Examples, Intermezzo: A Data Analysis Session
cohesion, clusters, Cluster Properties and Evaluation
color, false-color plots, False-Color Plots
combinatorial problems, Monte Carlo Simulations
complete clustering, Cluster Properties and Evaluation
composition, multivariate analysis, Variations
compounding, A Single Payment: Future and Present Value
compression, data files, The Care and Feeding of Your Data Zoo
conditional probability, The Frequentist Interpretation of Probability
confidence intervals
bootstrap, Resampling Methods
example, Statistics Explained
least squares, Statistical Parameter Estimation
confidence, association rules, A Special Case: Market Basket Analysis
confounding variables, Controlled Experiments Versus Observational Studies
confusion matrix, Topics in Predictive Analytics
conservation laws, Optional: Scaling Arguments Versus Dimensional Analysis
consistency, data consistency, Data Availability
contingency tables, Multidimensional Composition: Tree and Mosaic Plots
continuous time simulations, When Does Bootstrapping Work?
contour plots, False-Color Plots
convex clusters, What Constitutes a Cluster?
convolution, Implementation Issues
coplots, The Scatter-Plot Matrix
correlation coefficient
clustering, Numerical data
PCA, Optional: Theory
correlation function, Don’t Overlook the Obvious!
correlations, clustering, Numerical data
costs
cost concepts and depreciation, Cost Concepts and Depreciation
cost model example, Example: An Optimization Problem
direct and indirect costs, Cost Concepts and Depreciation
fixed and variable costs, Direct and Indirect Costs
opportunity costs, Using Expectation Values to Account for Uncertainty
CPU (cost per unit), Direct and Indirect Costs
cross-validation, Ensemble Methods: Bagging and Boosting
cumulative distribution function (CDF), Optional: Optimal Bandwidth Selection
curse of dimensionality, The Secret Sauce

D

dashboards, Reporting
data, Working with Data
cleaning and conditioning, Sources for Data
file formats, Sampling
maintenance, The Care and Feeding of Your Data Zoo
quality issues, Recommendations for a Metrics Program
sampling, Cleaning and Conditioning
skills, The Care and Feeding of Your Data Zoo
sources and availability, Recommendations for a Metrics Program, Working with Data
terminology and data types, Skills
data analysis
bivariate analysis, Two Variables: Establishing Relationships
calculus, Results from Calculus
clustering, Finding Clusters
data, Working with Data
dimensionality reduction, Seeing the Forest for the Trees: Finding Important Attributes
financial calculations, Financial Calculations and Modeling
guesstimation, Guesstimation and the Back of the Envelope
multivariate analysis, More Than Two Variables: Graphical Multivariate Analysis
predictive analytics, Predictive Analytics
probability models, Arguments from Probability Models
reporting, business intelligence and dashboards, Reporting, Business Intelligence, and Dashboards
scaling, Models from Scaling Arguments
session example, Intermezzo: A Data Analysis Session
simulations, Simulations
software, Programming Environments for Scientific Computation and Data Analysis
statistics, What You Really Need to Know About Classical Statistics
time-series analysis, Time As a Variable: Time-Series Analysis
univariate analysis, A Single Variable: Shape and Distribution
data frames (R), Workshop: R
data warehouses, Business Intelligence
data-driven decision making, Epilogue: Facts Are Not Reality
databases
about, Working with Data
browsing, Skills
DBSCAN algorithm, Neighborhood Growers, Cluster Properties and Evaluation
death processes, Background and Further Examples
decision boundaries, Regression
decision trees, Support Vector Machines
delimiter-separated text files, Data File Formats
delimiters, Binary relationships
dendrograms, Tree Builders
density, clusters, Cluster Properties and Evaluation
depreciation, Fixed and Variable Costs
derivatives, The Inverse of a Function
differencing, time-series, Implementation Issues
digital signal processing (DSP), Implementation Issues
dimensional analysis versus scaling arguments, Example: A Cost Model
dimensional argument example, Scaling Arguments
dimensionality reduction, Seeing the Forest for the Trees: Finding Important Attributes
Kohonen maps, Multidimensional Scaling
principal component analysis (PCA), Seeing the Forest for the Trees: Finding Important Attributes, Kohonen Maps
R statistical analysis package, Kohonen Maps
visual techniques, Biplots
dimensionality, curse of, The Secret Sauce
direct costs, Cost Concepts and Depreciation
discrete event simulations, Workshop: Discrete Event Simulations with SimPy
distance matrices, Common Distance and Similarity Measures, Other Thoughts
distance measures, clustering, A Different Point of View, Center Seekers
distributions, Arguments from Probability Models
(see also Gaussian distribution)
Binomial distribution and Bernoulli trials, Arguments from Probability Models
chi-square (χ²) distribution, Statistics Defined
Fisher’s F distribution, Statistics Explained
geometric distribution, Geometric Distribution
log-normal distribution, Poisson Distribution
Monte Carlo simulation for outcome distributions, Combinatorial Problems
Poisson distribution, Geometric Distribution
posterior probability distribution, Bayesian Data Analysis: A Worked Example, Bayesian Data Analysis: A Worked Example
power-law distributions, Beware: The World Is Not Normal!, Optional: Case Study—Unique Visitors over Time
sampling distributions, Statistics Defined
special-purpose distributions, Log-Normal Distribution
statistics, Statistics Defined
Student t distribution, Statistics Explained
dividing by zero, The Linear Transformation
document vectors, Special-purpose metrics
dot plots, A Single Variable: Shape and Distribution
dot product, Numerical data
double exponential smoothing, Exponential Smoothing
double logarithmic plots, Additional Ideas and Warnings, Logarithmic Plots
double-blind experiments, Design of Experiments
draft lottery, LOESS, Examples
DSP (digital signal processing), Implementation Issues
duplicate records, Cleaning and Conditioning

E

e (base of the natural logarithm), Miscellaneous symbols
edit distance, Categorical data
Ehrenberg’s rule, Before You Get Started: Feasibility and Cost
eigenvectors, Optional: Theory, Optional: Theory, Workshop: PCA with R
embedded databases, Workshop: Berkeley DB and SQLite
ensemble methods, Other Classifiers
error propagation, After You Finish: Quoting and Displaying Numbers
estimation, parameter estimation, Genesis
ethics, Epilogue: Facts Are Not Reality
Euclidean distance, Common Distance and Similarity Measures
Euler’s number, Miscellaneous symbols
expectation values
accounting for uncertainty, The Whole Picture: Cash-Flow Analysis and Net Present Value
distributions with infinite expectation values, Working with Power-Law Distributions
experiments, versus observational studies, Example: Formal Tests Versus Graphical Methods
exponential distribution, Optional: Queueing Theory
exponential function, Polynomials and Rational Functions
exponential growth or decay, Background and Further Examples
exponential smoothing, Running Averages
exporting files from gnuplot, Workshop: gnuplot
extrema, Finding Minima and Maxima
extreme-value considerations, Optional: Scaling Arguments Versus Dimensional Analysis

F

factorial function, Other Functions
factorization, Controlled Experiments Versus Observational Studies
false-color plots, False-Color Plots
feasibility, numerical correctness, How Good Are Those Numbers?
feature selection (see dimensionality reduction)
Feynman, R. P., Estimating Sizes, Background and Further Examples
files
formats, Workshop: gnuplot, Sampling
maintenance, The Care and Feeding of Your Data Zoo
filters, time-series analysis, Implementation Issues
financial calculations, Financial Calculations and Modeling
cost concepts and depreciation, Cost Concepts and Depreciation
newsvendor problem, Is This All That Matters?
time value of money, Financial Calculations and Modeling
uncertainty and opportunity costs, The Whole Picture: Cash-Flow Analysis and Net Present Value
Fisher’s F distribution, Statistics Explained
Fisher’s Iris data set, The Nature of Statistical Learning
Fisher’s LDA (linear discriminant analysis), Decision Trees and Rule-Based Classifiers
fixed costs, Direct and Indirect Costs
floating averages, Smoothing
force-based algorithms, Multidimensional Scaling
format
data, Sources for Data
file formats, Sampling
Fourier series, Oscillations
FP-Growth Algorithm, A Special Case: Market Basket Analysis
fractions
about, Elementary Algebra
division by zero, The Linear Transformation
frequentist interpretation of probability, Perspective
function approximation, Function Approximation
functions (see calculus, Gaussian distribution)
future value, Financial Calculations and Modeling
fuzzy clustering, Center Seekers

G

gain ratio, Decision Trees and Rule-Based Classifiers
Gaussian distribution (Gaussian function)
about, Statistics Defined, Trigonometric Functions
Central Limit Theorem, The Gaussian Distribution and the Central Limit Theorem
histograms, Histograms
KDEs, Kernel Density Estimates
moving averages, Running Averages
Gaussian distribution function, Gaussian Function and the Normal Distribution
Gaussian kernel, LOESS, Examples
generalization errors, Topics in Predictive Analytics
geometric distribution, Geometric Distribution
ggobi, R
glyphs, Multidimensional Composition: Tree and Mosaic Plots
Gnu Scientific Library (GSL), Workshop: The Gnu Scientific Library (GSL), Other Players
gnuplot, A Data Analysis Session
grand tours and projection pursuits, multivariate analysis, Querying and Zooming
graphical analysis
defined, Showing What’s Important
interpretation, Additional Ideas and Warnings
process, Linear Regression and All That
versus statistical tests, Example: Formal Tests Versus Graphical Methods
Greek alphabet, Miscellaneous symbols
growth and decay phenomena, unconstrained, Background and Further Examples
growth, the logistic equation, Constrained Growth: The Logistic Equation
GSL (Gnu Scientific Library), Workshop: The Gnu Scientific Library (GSL), Other Players
guesstimation, Guesstimation and the Back of the Envelope
numerical correctness, More Examples
perturbation theory and error propagation, After You Finish: Quoting and Displaying Numbers
principles, Guesstimation and the Back of the Envelope

H

Hamming distance, Numerical data
HCL (hue–chroma–luminance) space, False-Color Plots
hidden variables, Controlled Experiments Versus Observational Studies
histograms
about, Histograms
bandwidth selection, Kernel Density Estimates
scatter-plot matrices, Variations
homoscedasticity, LOESS, Residuals
Hunt’s algorithm, Support Vector Machines
hyperbolic tangent function, Other Functions
hypothesis testing, Genesis

I

indirect costs, Cost Concepts and Depreciation
infinite expectation values, distributions, Working with Power-Law Distributions
instance-based classifiers, Algorithms for Classification
integrals
about, Finding Minima and Maxima
Gaussian integrals, Why Is the Gaussian so Useful?
interpolation, least squares, Least Squares
inverse of a function, Other Functions
item sets, A Special Case: Market Basket Analysis

K

k-means algorithm, Clustering Methods
k-medoids algorithm, Center Seekers
kernel density estimate (KDE), Histograms, Other Thoughts
kernelization, Support Vector Machines
Kohonen maps, Multidimensional Scaling

L

LDA (linear discriminant analysis), Decision Trees and Rule-Based Classifiers
least squares, Optional: The Standard Error
function approximation, Function Approximation
statistical parameter estimation, Least Squares
Levenshtein distance, Categorical data
lift charts, Rank-Order Plots and Lift Charts
(see also ROC)
likelihood function, The Bayesian Interpretation of Probability, Bayesian Data Analysis: A Worked Example
limits, calculus, Limits, Sequences, and Series
linear algebra, Linear Algebra
linear discriminant analysis (LDA), Decision Trees and Rule-Based Classifiers
linear functions, Results from Calculus
linear regression
about, Linear Regression and All That
LOESS, LOESS, Using matplotlib Interactively
linear transformation, calculus, The Binomial Theorem
linking and brushing, multivariate analysis, Querying and Zooming
Linux, Skills
location, clusters, Cluster Properties and Evaluation
LOESS
about, LOESS
matplotlib case study, Using matplotlib Interactively
log-log plots, Additional Ideas and Warnings
log-normal distribution, Poisson Distribution
logarithmic plots, Additional Ideas and Warnings
logarithms
about, Small perturbations
calculus, Polynomials and Rational Functions
logfiles, Working with Data
logistic equation, constrained growth, Constrained Growth: The Logistic Equation
longest common subsequence, Categorical data
lurking variables, Controlled Experiments Versus Observational Studies

M

Manhattan distance, Common Distance and Similarity Measures
map/reduce techniques, Some Suggestions
margin of error, How Good Are Those Numbers?
market basket analysis, Other Thoughts
mass, clusters, Cluster Properties and Evaluation
math, What’s with the Workshops?, Results from Calculus
absolute value function, Other Functions
binomial theorem, Power Series and Taylor Expansion
derivatives, The Inverse of a Function
dividing by zero, The Linear Transformation
exponential functions, Polynomials and Rational Functions
factorial function, Other Functions
Gaussian function and the Normal distribution, Trigonometric Functions
hyperbolic tangent function, Other Functions
integrals, Finding Minima and Maxima
inverse of a function, Other Functions
limits, sequences and series, Limits, Sequences, and Series
linear transformation, The Binomial Theorem
logarithms, Polynomials and Rational Functions
mathematical notation, Dividing by Zero
minima and maxima, Derivatives
on math, What’s with the Workshops?, Where to Go from Here
polynomials, Powers
power series and Taylor expansion, Limits, Sequences, and Series
powers, Results from Calculus
rational functions, Powers
trigonometric functions, Exponential Function and Logarithm
mathematics, Results from Calculus
(see also calculus, distributions, financial calculations, Gaussian distribution)
about, What’s with the Workshops?, Where to Go from Here
notation, Dividing by Zero
Matlab, Scientific Software Is Different, Other Players
matplotlib, Graphical Analysis and Presentation Graphics
LOESS case study, Using matplotlib Interactively
object model and architecture, Managing Properties
properties, Case Study: LOESS with matplotlib
using interactively, Workshop: matplotlib
matrix operations, Interpretation, Intermezzo: When More Is Different
maximum distance, Common Distance and Similarity Measures
maximum margin hyperplanes, Regression
MDS (multidimensional scaling), Visual Techniques
mean
about, Rank-Order Plots and Lift Charts
exponential distribution, Optional: Queueing Theory
mean-field approximations, Mean-Field Approximations
mean-field models, Exact Results
mean-square error, KDE bandwidth, Kernel Density Estimates
median, Rank-Order Plots and Lift Charts, Summary Statistics
merging data sets, Cleaning and Conditioning
metrics programs, Reporting
minima and maxima, functions, Derivatives
Minkowski distance, Common Distance and Similarity Measures
missing values, Sources for Data, Cleaning and Conditioning
modeling, Financial Calculations and Modeling
(see also financial calculations, probability models, scaling, simulations)
about, Introduction
and data analysis, Case Study: How Many Servers Are Best?
principles, Models from Scaling Arguments
Mondrian, R
money (see time value of money)
Monte Carlo simulations, Monte Carlo Simulations
combinatorial problems, Monte Carlo Simulations
outcome distributions, Combinatorial Problems
mosaic plots, multidimensional composition, Changes in Composition
moving averages, Smoothing
multidimensional scaling (MDS), Visual Techniques
multiplots, False-Color Plots
coplots, The Scatter-Plot Matrix
scatter-plot matrices, False-Color Plots
multivariate analysis, More Than Two Variables: Graphical Multivariate Analysis
(see also dimensionality reduction)
composition problems, Variations
false-color plots, False-Color Plots
glyphs, Multidimensional Composition: Tree and Mosaic Plots
interactive explorations, Parallel Coordinate Plots
multiplots, False-Color Plots
parallel coordinate plots, Glyphs
tools, Grand Tours and Projection Pursuits
multivariate data sets, Types of Data Sets

N

naive Bayesian classifier, Bayesian Classifiers
nearest-neighbor methods, Algorithms for Classification
neighborhood growers clustering algorithms, Neighborhood Growers
nested clusters, What Constitutes a Cluster?
net present value (NPV), Calculational Tricks with Compounding
network graphs, Multidimensional Scaling
neural networks, artificial, Decision Trees and Rule-Based Classifiers
noise, Scatter Plots
examples, Examples
ideas and warnings, Residuals
LOESS, LOESS
residuals, Residuals
splines, Conquering Noise: Smoothing
time-series, The Task
nominal data, Skills
non-normal statistics and power-law distributions, Beware: The World Is Not Normal!
nonmetric classifiers, Support Vector Machines
nonnumerical data, Skills
nonparametric bootstrap, When Does Bootstrapping Work?
Normal distribution function, Gaussian Function and the Normal Distribution
normalization
about, Sources for Data
scale normalization: clustering, Pre- and Postprocessing
normalized histograms, Histograms
NPV (net present value), Calculational Tricks with Compounding
null hypotheses, Statistics Defined
numarray (Python), R
Numeric (Python), R
numerical data
about, Skills
clustering, Common Distance and Similarity Measures
NumPy (Python), Workshop: NumPy, The matplotlib Object Model and Architecture, R, Python, Recommendations

O

object model, matplotlib, Managing Properties
OLAP (Online Analytical Processing) cubes, Business Intelligence
operating costs, Fixed and Variable Costs
opportunity costs, Using Expectation Values to Account for Uncertainty
optimization problems
extrema, Finding Minima and Maxima
scaling, Example: A Dimensional Argument
order-of-magnitude estimates, Working with Numbers
ordinal data, Skills
outliers, Sources for Data
overfitting, Some Classification Terminology

P

p-distance, Common Distance and Similarity Measures
p-values, Statistics Explained
parallel coordinate plots, Glyphs
parallelization, Some Suggestions
parameter estimation, Genesis
parametric bootstrap, When Does Bootstrapping Work?
parentheses and other delimiters, Binary relationships
Pareto charts, Rank-Order Plots and Lift Charts
Pareto distribution, standard form, Workshop: Power-Law Distributions
PCA (principal component analysis), Seeing the Forest for the Trees: Finding Important Attributes
about, Seeing the Forest for the Trees: Finding Important Attributes
biplots, Practical Points
computation, Interpretation
interpretation, Optional: Theory
issues, Computation
R statistical analysis package, Kohonen Maps
theory, Motivation
percentiles, Rank-Order Plots and Lift Charts, Summary Statistics
performance
matrix operations and other computational applications, Interpretation, Intermezzo: When More Is Different
permutations, What About Map/Reduce?
perturbation theory, Powers of ten, After You Finish: Quoting and Displaying Numbers
plot command (matplotlib), Case Study: LOESS with matplotlib
plot function (gnuplot), Intermezzo: A Data Analysis Session
plot function (R), Workshop: PCA with R
point estimates, Genesis
least squares, Statistical Parameter Estimation
Poisson distribution, Geometric Distribution
polynomials
about, Powers, Power Series and Taylor Expansion
LOESS, LOESS
splines, Conquering Noise: Smoothing
posterior probability (posterior probability distribution), The Bayesian Interpretation of Probability, Bayesian Data Analysis: A Worked Example, Bayesian Data Analysis: A Worked Example
power series and Taylor expansion, Limits, Sequences, and Series
power-law distributions
example, Optional: Case Study—Unique Visitors over Time
non-normal statistics, Beware: The World Is Not Normal!
powers of ten, Working with Numbers
powers, calculus, Results from Calculus
precision
defined, How Good Are Those Numbers?
metrics, Estimating Prediction Error
predictive analytics, Predictive Analytics
about, Predictive Analytics
algorithms for classification, Some Classification Terminology
class imbalance problems, Estimating Prediction Error
classification terminology, Topics in Predictive Analytics
do-it-yourself classifiers, The Nature of Statistical Learning
ensemble methods, Other Classifiers
prediction error, Ensemble Methods: Bagging and Boosting
statistical learning, The Secret Sauce
present value, Financial Calculations and Modeling
presentation graphics, defined, Showing What’s Important
prewhitening, Pre- and Postprocessing
principal components analysis (see PCA)
prior probability, The Bayesian Interpretation of Probability
probability
Bayesian interpretation, The Frequentist Interpretation of Probability
frequentist interpretation, Perspective
probability models, Arguments from Probability Models
Binomial distribution and Bernoulli trials, Arguments from Probability Models
Gaussian Distribution and the Central Limit Theorem, The Gaussian Distribution and the Central Limit Theorem
geometric distribution, Geometric Distribution
log-normal distribution, Poisson Distribution
Poisson distribution, Geometric Distribution
power-law distributions, Beware: The World Is Not Normal!, Optional: Case Study—Unique Visitors over Time
special-purpose distributions, Log-Normal Distribution
unique visitors over time case study, Log-Normal Distribution
probability plots, comparing with distributions, The Cumulative Distribution Function
projection pursuits and grand tours, multivariate analysis, Querying and Zooming
pseudo-randomization, Design of Experiments
pseudo-replication, Design of Experiments
Pycluster and the C Clustering Library, A Word of Warning
pyplot, The matplotlib Object Model and Architecture
Python
about, R
matplotlib, The matplotlib Object Model and Architecture
NumPy, Box-and-Whisker Plots
SciPy, R
scipy.signal, Optional: Filters and Convolutions
SimPy, When Does Bootstrapping Work?

Q

QQ plots
comparing with distributions, The Cumulative Distribution Function
LOESS, Residuals
QT algorithm, Other Thoughts
quality, data quality issues, Recommendations for a Metrics Program
quantile plots, The Cumulative Distribution Function
quantiles, Summary Statistics
querying and zooming, multivariate analysis, Querying and Zooming
queueing problems, When Does Bootstrapping Work?

R

R statistical analysis package, Tools, Bayesian Inference: Summary and Discussion, Kohonen Maps, Matlab, Recommendations
radius, clusters, Cluster Properties and Evaluation
random forests, Other Classifiers
randomization, Controlled Experiments Versus Observational Studies
rank-order plots, Rank-Order Plots and Lift Charts
rational functions, Powers
recall, Estimating Prediction Error
recommendations, Predictive Analytics
recurrence relations, exponential smoothing, Exponential Smoothing
regression, Banking
(see also linear regression)
using for classification, Bayesian Classifiers
regular expressions, The Care and Feeding of Your Data Zoo
relationships, establishing, Estimating Sizes
replication, Controlled Experiments Versus Observational Studies
reports, Business Intelligence
resampling methods, simulations, Pro and Con
residuals, smoothing, Residuals
reversal of association, Simpson’s Paradox
ROC (receiver operating characteristic) curve, Class Imbalance Problems
rule-based classifiers, Support Vector Machines
running averages, Smoothing

S

Sage, Case Study: How Many Servers Are Best?
sampling distributions, Statistics Defined
sampling, data, Cleaning and Conditioning
SAS, Other Players
scale normalization, clustering, Pre- and Postprocessing
scaling, Models from Scaling Arguments
arguments, Using and Misusing Models
mean-field approximations, Mean-Field Approximations
modeling principles, Models from Scaling Arguments
time-evolution scenarios, Background and Further Examples
scatter plots, Two Variables: Establishing Relationships
scatter-plot matrices, False-Color Plots
ScientificPython, R
SciLab, Other Players
SciPy, R, Python, Recommendations
scipy.signal, Optional: Filters and Convolutions
scree plots, Workshop: PCA with R
scripting languages, The Care and Feeding of Your Data Zoo
seasonality
CO2 measurements above Mauna Loa on Hawaii, A Data Analysis Session
time-series, Time As a Variable: Time-Series Analysis, The Task
self-organizing maps (SOMs), Multidimensional Scaling
semi-logarithmic plots, Additional Ideas and Warnings
sensitivity analysis, perturbation theory, Small perturbations
separation, clusters, Cluster Properties and Evaluation
sequences, calculus, Limits, Sequences, and Series
series, calculus, Limits, Sequences, and Series
servers case study, Oscillations
sets, sequences and series, Sets, Sequences, and Series
sigmoid function, Other Functions
signals, DSP, Implementation Issues
significance, statistical significance, Statistics Defined
silhouette coefficient, Cluster Properties and Evaluation
similarity measures, clustering, A Different Point of View
Simpson’s paradox, How to Average Averages
SimPy, When Does Bootstrapping Work?
about, Workshop: Discrete Event Simulations with SimPy
queueing, Introducing SimPy
running simulations, Running SimPy Simulations
simulations, Simulations
about, Simulations
discrete event simulations with SimPy, When Does Bootstrapping Work?
Monte Carlo simulations, Monte Carlo Simulations
resampling methods, Pro and Con
single logarithmic plots, Additional Ideas and Warnings
singular value decomposition (SVD), Computation
size, estimating, Principles of Guesstimation
slicing (NumPy), NumPy in Detail
smoothing, Scatter Plots
examples, Examples
ideas and warnings, Residuals
least squares, Least Squares
LOESS, LOESS
residuals, Residuals
splines, Conquering Noise: Smoothing
time-series analysis, The Task
smoothness, clustering, Distance and Similarity Measures
SNN (shared nearest neighbor) similarity, Special-purpose metrics
software, Programming Environments for Scientific Computation and Data Analysis
about, Workshop: Two Do-It-Yourself Classifiers
Berkeley DB, Data Consistency
Chaco, R
ggobi, R
GSL, Workshop: The Gnu Scientific Library (GSL), Other Players
Java, NumPy/SciPy
libSVM, Workshop: Two Do-It-Yourself Classifiers
manyeyes, R
Matlab, Scientific Software Is Different, Other Players
Mondrian, R
NumPy, Workshop: NumPy, The matplotlib Object Model and Architecture, R, Python, Recommendations
Python, The matplotlib Object Model and Architecture, R, Workshop: Sage, A Word of Warning, R
R statistical analysis package, Tools, Bayesian Inference: Summary and Discussion, Kohonen Maps, Matlab, Recommendations
RapidMiner, Workshop: Two Do-It-Yourself Classifiers
Sage, Case Study: How Many Servers Are Best?
SAS, Other Players
ScientificPython, R
SciLab, Other Players
SciPy, R, Python, Recommendations
Shogun, Workshop: Two Do-It-Yourself Classifiers
SimPy, When Does Bootstrapping Work?
skills, The Care and Feeding of Your Data Zoo
SQLite, Berkeley DB
Tulip, R
WEKA, Workshop: Two Do-It-Yourself Classifiers
SOMs (self-organizing maps), Multidimensional Scaling
special symbols, Sets, Sequences, and Series
spectral clustering, Other Thoughts
spectral decomposition theorem, Optional: Theory
splines
about, Conquering Noise: Smoothing
weighted splines, Conquering Noise: Smoothing
SQLite, Berkeley DB
stacked plots, Variations
standard deviation, Rank-Order Plots and Lift Charts, Exact Results, The Standard Deviation
standard error
about, How to Calculate
bootstrap estimate, Resampling Methods
star convex clusters, What Constitutes a Cluster?
statistical parameter estimation, Least Squares
statistical significance, Statistics Defined
statistics, What You Really Need to Know About Classical Statistics
about, Genesis
Bayesian statistics, Perspective
controlled experiments versus observational studies, Example: Formal Tests Versus Graphical Methods
distributions, Statistics Defined
historical development, What You Really Need to Know About Classical Statistics
R statistical analysis package, Bayesian Inference: Summary and Discussion
stochastic processes, The Simplest Queueing Process
string data, clustering, Categorical data
Student t distribution, Statistics Explained
subspace clustering, Other Thoughts
summary statistics, Rank-Order Plots and Lift Charts
supervised learning, Predictive Analytics
support count, A Special Case: Market Basket Analysis
support vector machines (SVM), Regression
supremum distance, Common Distance and Similarity Measures
surface plots, False-Color Plots
SVD (singular value decomposition), Computation
symbols, Sets, Sequences, and Series
symmetry
clustering, Distance and Similarity Measures
models, Optional: Scaling Arguments Versus Dimensional Analysis

T

t distribution, Statistics Explained
taxicab distance, Common Distance and Similarity Measures
Taylor expansion, Limits, Sequences, and Series
test sets, Topics in Predictive Analytics
tests
hypothesis testing, Genesis
versus graphical methods, Example: Formal Tests Versus Graphical Methods
text files, Data File Formats
time value of money, Financial Calculations and Modeling
cash-flow analysis and net present value, Calculational Tricks with Compounding
compounding, A Single Payment: Future and Present Value
future and present value, Financial Calculations and Modeling
time-evolution scenarios, Background and Further Examples
constrained growth: the logistic equation, Constrained Growth: The Logistic Equation
oscillations, Constrained Growth: The Logistic Equation
unconstrained growth and decay phenomena, Background and Further Examples
time-series analysis, Time As a Variable: Time-Series Analysis
components of, The Task
correlation function, Don’t Overlook the Obvious!
examples, Time As a Variable: Time-Series Analysis
filters and convolutions, Implementation Issues
scipy.signal, Optional: Filters and Convolutions
smoothing, The Task
tools (see software)
topology, Bayesian networks, Bayesian Classifiers
training errors, Ensemble Methods: Bagging and Boosting
training sets, Topics in Predictive Analytics
transcendental functions, Polynomials and Rational Functions
tree plots, multidimensional composition, Changes in Composition
trends
CO2 measurements above Mauna Loa on Hawaii, A Data Analysis Session
time-series, Time As a Variable: Time-Series Analysis, The Task, Examples
versus variations, Recommendations for a Metrics Program
trigonometric functions, Exponential Function and Logarithm
triple exponential smoothing, Exponential Smoothing

U

ufuncs (NumPy), NumPy in Action
uncertainty in planning, The Whole Picture: Cash-Flow Analysis and Net Present Value
underfitting, Some Classification Terminology
unique visitors over time case study, Log-Normal Distribution
univariate analysis, A Single Variable: Shape and Distribution
cumulative distribution function, Optional: Optimal Bandwidth Selection
dot and jitter plots, A Single Variable: Shape and Distribution
histograms and kernel density estimates, Dot and Jitter Plots
rank-order plots and lift charts, Rank-Order Plots and Lift Charts
summary statistics and box plots, Rank-Order Plots and Lift Charts
univariate data sets, Types of Data Sets
Unix, Skills
unnormalized histograms, Histograms
unsupervised learning, Finding Clusters, Predictive Analytics

V

variable costs, Direct and Indirect Costs
vectors
document vectors, Special-purpose metrics
eigenvectors, Optional: Theory, Optional: Theory, Workshop: PCA with R
visual uniformity, False-Color Plots

W

Ward’s method, Tree Builders
weight functions, Other Functions
weighted moving averages, Running Averages
weighted splines, Conquering Noise: Smoothing
whitening, Pre- and Postprocessing

X

XML data file format, Data File Formats

Z

zero, dividing by, The Linear Transformation
zooming and querying, multivariate analysis, Querying and Zooming