Index
A
- accuracy, Solution-Discussion
- AdaBoostClassifier, Problem-Discussion
- AdaBoostRegressor, Problem-Discussion
- adaptiveThreshold, Problem-Discussion
- affinity, Discussion
- Agglomerative clustering, Problem-Discussion
- algorithm, Discussion
- algorithms for faster model selection, Solution-Discussion
- ANOVA F-value statistic, Solution, Discussion
- append, Discussion
- apply, Solution, Solution
- area under the ROC curve (AUCROC), Discussion
- arrays
- augment_images, Discussion
- average, Problem, Solution
- axis, Discussion, Solution, Discussion
B
- back-filling, Discussion
- backpropagation, Introduction
- Bag of Words model, Problem-Problem
- balanced, Problem, Discussion
- bandwidth, Discussion
- baseline classification model, Problem-Discussion
- baseline regression model, Problem-Discussion
- base_estimator, Discussion
- batch_size, Discussion, Discussion
- Bayes' theorem, Introduction-Discussion
  - (see also naive Bayes classifiers)
- Beautiful Soup, Solution
- Bernoulli naive Bayes, Problem-Discussion
- best_estimator_, Discussion
- best_params, Discussion
- BigramTagger, Discussion
- Binarizer, Solution
- binarizing images, Problem-Discussion
- binary classifier thresholds, Problem-Discussion
- binary classifiers, Problem-Discussion, Discussion, Problem-Discussion
- binary feature variance thresholding, Problem
- bins, Solution
- blurring images, Problem-Discussion
- boolean conditions, Solution
- bootstrap, Discussion, Discussion
- Brown Corpus text tagger, Discussion
C
- C, Discussion
- calibrating predicted probabilities, Problem-Discussion
- callbacks, Discussion, Solution
- Canny edge detector, Solution-Discussion
- categorical data, Introduction-Discussion
- chi-squared statistics, Solution-Discussion
- class separability maximization, Problem-Discussion
- classes, Solution
- classification_report, Problem
- classifier prediction evaluation, Problem-Discussion
- classifier__, Discussion
- cleaning and parsing HTML, Problem
- cleaning text, Problem-Discussion
- clustering, Problem-Discussion, Introduction-Discussion
- color isolation, Problem-Discussion
- compile, Discussion
- concat, Solution
- confusion matrix, Problem-Discussion
- contamination, Solution
- contrast in images, Problem-Discussion
- convolutional neural networks (ConvNets), Problem-Discussion
- corner detection, Problem-Problem
- correlation matrix, Solution
- count, Solution
- CountVectorizer, Solution-Problem
- criterion, Discussion
- cropping images, Problem-Discussion
- cross-validation
- cross-validation (CV), Discussion, Problem, Solution
- CSR matrices, Discussion
- CSV files, Problem
- custom evaluation metric, Problem-Discussion
- cv, Discussion
D
- data
- data loading, Introduction-Discussion
- data wrangling, Introduction-Discussion
- DataFrames, Introduction-Discussion
  - applying function over all elements, Problem
  - applying function to groups, Problem
  - concatenating, Problem-Problem
  - conditional statements, Problem
  - creating, Solution
  - deleting columns, Problem-Discussion
  - deleting rows, Problem
  - describing data in, Problem-Discussion
  - descriptive statistics and, Problem-Problem
  - dropping duplicate rows, Problem-Discussion
  - grouping rows, Problem-See Also
  - index values, Discussion
  - looping over a column, Problem
  - merging, Problem-Discussion
  - missing values selection, Problem-Problem
  - navigating, Problem-Discussion
  - renaming columns, Problem-Problem
  - replacing values, Problem-Discussion
  - unique values in, Problem
- dates and times (datetimes) (see time series data)
- DBSCAN clustering, Problem-Discussion
- decision trees
- deep learning, Introduction
- describe, Solution, Discussion
- determinants, Problem
- diagonal, Problem
- dictionaries of features, Problem-Discussion
- dictionary of candidate learning algorithms, Problem-Discussion
- DictVectorizer, Solution-Discussion
- dimensionality reduction, Introduction-Discussion
  - (see also feature extraction; feature selection)
- discretization, Problem
- distance, Discussion
- distance metrics, Discussion
- document frequency, Discussion
- DOT format, Problem-Discussion
- dot products, Problem
- downsampling, Solution, Discussion
- drop/drop_duplicates, Solution, Discussion-Discussion
- dropout, Problem
- DummyClassifier, Problem-Discussion
- dummying, Discussion
- DummyRegressor, Solution-Discussion
E
- early stopping, Solution-Discussion
- edge detection, Problem-Discussion
- Eigenvalues/Eigenvectors, Problem
- elements, applying operations to, Problem
- epochs, Discussion
- eps, Discussion
- Euclidean distance, Discussion
- evaluating models (see model evaluation)
- Excel files, Problem
- explained_variance_ratio_, Solution
F
- false positive rate (FPR), Discussion
- feature creation, Problem-Discussion
- feature extraction, Introduction-Discussion
- feature reduction, Problem-Discussion
- feature selection, Introduction, Introduction-Discussion
- features_pca__n_components, Discussion
- FeatureUnion, Discussion
- feature_importances_, Problem-Discussion
- feedforward neural networks, Introduction, Discussion
- filepath, Discussion
- filter2D, Solution
- fit, Terminology Used in This Book, Discussion
- fit_generator, Discussion
- fit_transform, Discussion
- flatten, Problem
- flow_from_directory, Discussion
- for loops, Discussion
- forests (see random forests; tree-based models)
- forward propagation, Introduction
- forward-filling, Discussion
- FunctionTransformer, Solution, Discussion
G
- Gaussian naive Bayes, Problem-Discussion
- get_feature_names, Solution
- goodFeaturesToTrack, Discussion
- GrabCut, Solution-Discussion
- gridsearch.fit, Discussion
- GridSearchCV, Problem-Discussion, Discussion, Discussion, Problem-Discussion, Problem-Discussion
- groupby, Solution-Discussion, Solution
H
- Harris corner detector, Solution-Problem
- HDF5 model, Problem-Discussion
- head, Solution, Discussion
- hierarchical merging, Problem-Discussion
- highly correlated features, Problem-Discussion
- histogram equalization, Solution-Discussion
- histograms, Problem-Discussion
- hold-out, Discussion
- HTML, parsing and cleaning, Problem
- hyperparameters, Terminology Used in This Book
- hyperplanes, Introduction, Problem-Discussion
I
- iloc, Solution, Discussion
- image augmentation, Problem
- image classification, Introduction-Discussion
  - background removal, Problem-Discussion
  - binarizing, Problem-Discussion
  - blurring, Problem-Discussion
  - color histograms as features, Problem-Discussion
  - contrast, Problem-Discussion
  - corner detection, Problem-Problem
  - cropping, Problem-Discussion
  - edge detection, Problem-Discussion
  - feature creation, Problem-Discussion
  - isolating colors, Problem-Discussion
  - loading, Problem-Discussion
  - mean color as a feature, Problem
  - resizing, Problem
  - saving, Problem
  - sharpening, Problem-Discussion
- ImageDataGenerator, Discussion
- imbalanced classes, Problem, Problem
- imputation, Solution, Problem-Discussion
- Imputer, Solution
- imwrite, Solution
- index slicing, Solution
- interaction features, Problem
- interactive effects, Solution-Discussion
- interpolation, Discussion
- IQR, Solution
- irrelevant classification features removal, Problem-Discussion
- isnull, Solution
K
- k-fold cross-validation (KFCV), Discussion-Discussion, Problem-Discussion
- k-means clustering, Solution, Solution
- k-nearest neighbors (KNN), Solution-Discussion, Solution-Discussion, Introduction-Discussion
- Keras, Introduction
  - convolutional neural networks with, Problem-Discussion
  - Dropout, Problem-Discussion
  - EarlyStopping, Solution-Discussion
  - fit method of training, Discussion
  - GridSearchCV, Problem-Discussion
  - input_shape, Discussion
  - model saving and loading, Problem-Discussion
  - ModelCheckpoint, Problem-Discussion
  - model_to_dot, Solution-Discussion
  - plot_model, Solution-Discussion
  - predict, Problem-Discussion
  - scikit-learn wrapper, Problem-Discussion
  - Sequential, Problem-Discussion
  - softmax activation, Problem-Discussion
  - training for regression, Problem-Discussion
  - weight regularization, Problem-Discussion
- KerasClassifier, Discussion
- KernelPCA, Solution-Discussion
- kernels, Solution-Discussion, Solution
- KNeighborsClassifier, Solution-Discussion
- kneighbors_graph, Discussion
- kurtosis, Discussion
L
- L1/L2 norm, Discussion
- LabelBinarizer, Solution
- lagged time features, Problem
- lasso regression, Solution-Discussion
- learning algorithms, Terminology Used in This Book
- learning curves, Problem-Discussion
- learning_rate, Discussion
- limit_direction, Discussion
- linear discriminant analysis, Problem-Discussion
- linear regression, Introduction-Discussion
- LinearDiscriminantAnalysis, Discussion
- LinearSVC, Problem-Discussion
- linkage, Discussion
- loading images, Problem-Discussion
- loc, Solution, Discussion
- logistic regression, Introduction-Discussion
- LogisticRegressionCV, Solution-Discussion, Discussion
- long short-term memory (LSTM) recurrent neural network, Solution-Discussion
- loops, Discussion
- loss/loss functions, Terminology Used in This Book, Introduction, Discussion-Discussion
M
- make_blobs, Solution-See Also
- make_circles, Discussion-Discussion
- make_classification, Solution-See Also
- make_regression, Solution-Discussion
- make_scorer, Solution-Discussion
- Manhattan distance, Discussion
- Matplotlib, Solution, Problem-Discussion
- matrices
  - adding/subtracting, Problem
  - calculating trace, Problem
  - compressed sparse row (CSR), Discussion
  - confusion matrix, Problem-Discussion
  - creating, Problem-See Also
  - describing, Problem
  - determinants, Problem
  - diagonal elements, Problem
  - factorization, Solution-Discussion
  - finding Eigenvalues/Eigenvectors, Problem
  - flattening, Discussion
  - inverting, Problem
  - multiplying, Problem
  - rank, Problem
  - selecting elements in, Problem
  - sparse, Solution, Discussion
- max pooling, Discussion
- maximum/minimum values, Problem, Solution
- max_depth, Discussion
- max_features, Discussion, Discussion
- max_output_value, Discussion
- mean, Solution
- mean color as feature, Problem
- mean squared error (MSE), Problem-Discussion
- meanshift clustering, Problem
- median, Discussion
- merge, Solution-Discussion
- metric, Discussion, Discussion, Discussion
- mini-batch k-means clustering, Problem
- Minkowski distance, Discussion
- MinMaxScaler, Solution-Discussion
- min_impurity_split, Discussion
- min_samples, Discussion
- Missing At Random (MAR), Discussion
- Missing Completely At Random (MCAR), Discussion
- missing data, Problem-Discussion
- missing data in time series, Problem-Discussion
- Missing Not At Random (MNAR), Discussion
- mode, Discussion
- model evaluation, Introduction-Discussion
  - baseline classification model, Problem-Discussion
  - baseline regression model, Problem-Discussion
  - binary classifier prediction evaluation, Problem-Discussion
  - binary classifier thresholds, Problem-Discussion
  - classification report, Problem
  - classifier performance visualization, Problem-Discussion
  - clustering models, Problem-Discussion
  - cross-validation (CV), Problem
  - custom evaluation metric, Problem-Discussion
  - hyperparameter value effects, Problem-Discussion
  - multiclass classifier predictions, Problem-Discussion
  - regression models, Problem-Discussion
  - training set size visualization, Problem-Discussion
- model selection, Introduction-Discussion
- model.pkl, Discussion, Discussion
- ModelCheckpoint, Discussion, Problem-Discussion
- models, Terminology Used in This Book
- model_to_dot, Solution-Discussion
- moving time windows, Problem-Discussion
- multiclass classifier predictions, Problem-Discussion
- multiclass classifiers, Discussion, Problem-Discussion
- multinomial logistic regression (MLR), Problem-Discussion
- multinomial naive Bayes, Problem-Discussion
N
- n-grams, Discussion
- naive Bayes classifiers, Introduction-Discussion
- Natural Language Toolkit (NLTK), Solution
- neg_mean_squared_error, Discussion
- nested cross-validation (CV), Solution-Discussion
- neural networks, Introduction-Discussion
  - binary classification, Problem-Discussion
  - convolutional, Problem-Discussion
  - deep, Introduction
  - designing, Problem
  - dropout, Problem
  - early stopping, Solution-Discussion
  - feedforward, Introduction, Discussion
  - hyperparameter selection, Problem-Discussion
  - image augmentation, Problem
  - image classification, Problem-Discussion
  - k-fold cross-validation (KFCV), Problem-Discussion
  - making predictions, Problem-Discussion
  - multiclass classifiers, Problem-Discussion
  - overfitting, reducing, Problem-Discussion
  - preprocessing data for, Problem-Discussion
  - recurrent, Solution-Discussion
  - regression training, Problem-Discussion
  - saving model training process, Problem-Discussion
  - text data classification, Solution-Discussion
  - training history visualization, Problem-Discussion
  - visualizing, Solution-Discussion
  - weight regularization, Problem-Discussion
- nominal categorical data, Introduction
- non-negative matrix factorization (NMF), Solution-Discussion
- nonlinear decision boundaries, Problem-Discussion
- Normalizer, Problem-Discussion
- normalizing observations, Problem-Discussion
- notnull, Solution
- numerical data
  - clustering observations, Problem-Discussion
  - discretization, Problem
  - imputing missing values, Problem-Discussion
  - observations with missing values, Problem-Discussion
  - observations, normalizing, Problem-Discussion
  - outliers, detecting, Problem-Discussion
  - outliers, handling, Problem-Discussion
  - polynomial and interaction features, Problem-Discussion
  - rescaling, Solution-Discussion
  - standardizing, Solution-Discussion
  - transforming features, Problem
- NumPy
  - add/subtract, Problem
  - creating matrices in, Problem-See Also
  - creating vectors in, Problem
  - for deleting missing values, Solution
  - describing matrices in, Problem
  - det, Problem
  - diagonal, Problem
  - dot, Problem, Problem
  - flatten, Problem, Solution
  - inv, Problem
  - linalg.eig, Solution
  - matrix_rank, Discussion
  - max and min, Problem
  - mean, var, and std, Solution
  - NaN, Discussion, Discussion
  - offset, Discussion
  - random, Problem
  - reshape, Problem-Problem, Discussion
  - selecting elements in, Problem-Discussion
  - trace, Problem
  - transpose, Problem
  - vectorize function, Problem
- n_clusters, Discussion, Discussion
- n_components, Discussion, Discussion, Discussion, Discussion
- n_estimators, Discussion, Discussion, Discussion
- n_iter, Discussion
- n_jobs, Discussion, Discussion
- n_jobs=-1, Discussion
- n_support_, Discussion
O
- observations, Terminology Used in This Book
- offset, Discussion
- one-hot encoding, Solution-Discussion
- one-vs-rest logistic regression (OVR), Problem-Discussion
- Open Source Computer Vision Library (OpenCV), Introduction
- optimizers, Discussion
- ordinal categorical data, Introduction, Problem-Discussion
- out-of-bag (OOB) observations, Problem
- outliers
- outlier_label, Discussion
- overfitting, Problem-Discussion
P
- pad_sequences, Discussion
- pandas
  - apply, Solution, Discussion
  - create_engine, Discussion
  - DataFrame object (see DataFrames)
  - descriptive statistics, Solution-Problem
  - for deleting missing values, Solution
  - json_normalize, Discussion
  - read_csv, Solution
  - read_excel, Solution
  - read_json, Problem
  - read_sql_query, Solution, Discussion
  - rolling, Problem
  - Series.dt, Solution, Solution
  - shift, Solution
  - TimeDelta, Solution
  - to_datetime, Solution
  - transformation in, Solution
  - tz_localize, Solution
  - weekday_name, Solution
- parallelization, Problem-Discussion
- parameters, Terminology Used in This Book
- parsing and cleaning HTML, Problem
- Penn Treebank tags, Solution
- performance, Terminology Used in This Book
- performance boosting, Problem-Discussion
- performance evaluation, Problem-Discussion
- pickle model, Problem-Discussion
- Platt scaling, Discussion
- plot_model, Solution-Discussion
- polynomial regression, Problem-Discussion
- PolynomialFeatures, Problem-Discussion, Discussion-Discussion
- pooling layers, Discussion
- PorterStemmer, Solution
- pos_tag, Discussion
- precision, Solution
- predicted probabilities, Problem-Discussion, Problem-Discussion
- predictions, Problem-Discussion
- preprocess, Discussion
- preprocessing steps in model selection, Problem-Discussion
- principal component analysis (PCA), Solution-Discussion
- punctuation, removing, Problem
R
- Radius-based (RNN) classifier, Problem-Discussion
- random forests, Introduction
- random variables, Problem
- RandomizedSearchCV, Problem-Discussion
- rank, Problem
- recall, Solution
- Receiver Operating Characteristic (ROC) curve, Solution-Discussion
- rectified linear unit (ReLU), Discussion
- recurrent neural network, Solution-Discussion
- recursive feature elimination (RFE), Problem-Discussion
- regression function, Discussion
- regression model evaluation, Problem-Discussion
- regression training, Problem-Discussion
- regularization, Problem-Discussion, Problem-Discussion, Discussion
- regularization penalty, Discussion
- rename, Solution
- resample, Solution-See Also
- rescaling, Solution-Discussion
- reshape, Problem-Problem, Discussion
- resize, Problem
- resizing images, Problem
- RFECV, Problem-Discussion
- RGB versus BGR, See Also
- ridge regression, Solution-Discussion
- RobustScaler, Discussion
- rolling time windows, Problem-Discussion
S
- saving images, Problem
- score, Discussion
- scoring, Discussion
- search, exhaustive, Problem-Discussion
- SelectFromModel, Problem-Discussion
- Series.dt, Solution
- shape, Discussion, Discussion
- sharpening images, Problem-Discussion
- Shi-Tomasi corner detector, Discussion
- show_shapes, Discussion
- shrinkage penalty (see regularization)
- silhouette coefficient, Discussion
- silhouette_score, Discussion, Discussion
- skewness, Discussion
- slicing arrays, Problem-Discussion
- softmax activation, Problem-Discussion
- sparse data feature reduction, Problem-Discussion
- sparse matrices, Discussion
- SQL queries, Problem
- sqrt, Discussion
- standard deviation, Problem, Discussion, Solution
- standard error of the mean, Discussion
- standardization, Discussion
- standardizer, Discussion
- StandardScaler, Solution, Problem-Discussion
- stochastic average gradient (SAG) solver, Problem
- stopwords, Problem
- strategy, Discussion, Discussion
- stratified, Discussion
- stratified k-fold, Discussion
- strings, converting to dates, Problem
- sum, Solution
- support vector classifiers (SVCs), Problem-Discussion
- support vector machines, Introduction-Discussion
- support_vectors_, Discussion
- svd_solver="randomized", Discussion
T
- tail, Discussion
- term frequency, Discussion
- term frequency-inverse document frequency (tf-idf), Solution
- text handling, Introduction-Discussion
- TfidfVectorizer, Solution
- thresholding, Solution
- time series data, Introduction-Discussion
  - calculating difference between dates, Problem
  - converting strings to dates, Problem
  - encoding days of week, Problem
  - lagged features, Problem
  - missing data, Problem-Discussion
  - multiple date feature creation, Problem, Discussion
  - rolling time windows, Problem-Discussion
  - selecting dates and times, Problem
  - time zones, Problem-Discussion
- TimeDelta, Solution
- tokenization, Problem
- toy datasets, Discussion
- to_datetime, Solution
- trace, Problem
- train, Terminology Used in This Book
- training set size effects, Problem-Discussion
- transform, Discussion
- transforming features, Problem
- translate, Solution
- transposing, Problem
- tree-based models, Introduction-Discussion
  - boosting, Problem-Discussion
  - controlling tree size, Problem-Discussion
  - decision tree classifier, Problem-Discussion
  - decision tree regressor, Problem-Discussion
  - imbalanced classes, Problem
  - out-of-bag (OOB) observations, Problem
  - random forest classifier, Problem-Discussion
  - random forest feature importance, Problem-Discussion
  - random forest feature selection, Problem-Discussion
  - random forest regressor, Problem-Discussion
  - visualizing, Problem-Discussion
- TrigramTagger, Discussion
- true positive rate (TPR), Discussion
- Truncated Singular Value Decomposition (TSVD), Problem-Discussion
- tz_localize, Solution
V
- validation, Discussion
- validation curve, Problem-Discussion
- validation_curve, Discussion
- validation_data, Discussion
- validation_split, Discussion
- value_counts, Solution
- variance, Problem, Discussion
- variance reduction, Problem-Discussion
- variance thresholding, Problem-Discussion
- vectorize, Problem
- vectors
- verbose, Discussion, Discussion
- visualization of classifier performance, Problem-Discussion