Index

A

AdaBoosting process
1. dataset
2. iteration 1
3. iteration 2
4. iteration 3
5. vs. stand-alone decision tree model
6. steps
7. weak classification models
Agglomerative clustering

See Hierarchical cluster technique

Analytics
1. categorization
2. descriptive analytics
3. diagnostics
4. predictions/estimations
5. prescriptive
6. types
Artificial general intelligence (AGI)
Artificial intelligence (AI)
1. analytics
2. data mining
3. data science
4. evolution
  1. AGI
  2. ANI
  3. ASI
  4. data analytics
  5. data mining
  6. data science
  7. definition
  8. statistics
5. statistics
  1. Bayesian
  2. frequentist
  3. regression
6. statistics vs. data mining vs. data analytics vs. data science
Artificial narrow intelligence (ANI)
Artificial neural network (ANN)
1. activation function
2. autoencoders
3. biological neurons
4. CNN
5. CNN and MNIST dataset
6. corresponding array
7. deep learning
8. handwritten digit (zero) image
9. hidden_layer_sizes
10. image classification
11. learning_rate_init
12. load MNIST data
13. LSTM
14. max_iter
15. MLP and Keras
16. MLP (feedforward network)
17. multilayer perceptron representation
18. multilayer perceptrons (feedforward neural network)
19. perceptron
20. RBM algorithm
21. reinforcement learning
22. scikit-learn MLP
23. solver
24. transfer learning
25. visual challenges
26. visualization
27. visual pathway
Artificial super intelligence (ASI)
Autocorrelation function (ACF)
Autoencoders
1. de-noise image
2. dimension reduction
3. elements
Autoregressive integrated moving average (ARIMA)
1. AM and MA
2. autocorrelation
3. build model and evaluate
4. check stationary
5. decompose time series
6. model
7. predict function
8. predictors
Autoregressive model (AM)
Average silhouette method

B

Bagging
Bag of words (BoW)
Bayesian statistics
Biological vs. artificial neuron
Bootstrap aggregation

See Bagging

C

Clustering
1. hierarchical cluster

See Hierarchical cluster technique

K-means
1. accuracy
2. average silhouette method
3. elbow method
4. expectation maximization
5. limitations
6. methods
text
1. k-means
2. LSA
3. singular value decomposition
4. source code
5. SVD

Collaborative filtering (CF)
Command-line installer
Convolution neural network (CNN)
Cross-industry standard process for data mining (CRISP-DM)
1. business
2. data gaps/relevance
3. deployment
4. evaluation
5. framework and phases
6. modeling
7. preparation
8. process diagram

D

Data assemble (text)
1. dataframe
2. get access key
3. pdf, jpg, and audio file
4. social media
5. textract formats
6. twitter authentication
DataFrame
Data mining
1. KDD
2. techniques
Data preprocessing (text)
1. bag of words
2. lemmatization
3. lower() function
4. n-grams
5. PoS tagging
6. removing noise
7. sentence tokenization
8. stemming
9. TF-IDF
10. word tokenization
Data science
Deep learning
1. ANN

See Artificial neural network (ANN)

Caffe
Keras
Lasagne
libraries
MXNet
Pylearn2
TensorFlow
Theano

Deep natural language processing (DNLP)
1. sopex package
2. Word2Vec
Descriptive analytics
Diagnostic analytics
dir() operation code
Document term matrix (DTM)

E

Elbow method
Ensemble methods
1. bagging
2. decision boundaries
3. ExtraTree
4. feature importance function
5. RandomForest
6. types of
Enterprise resource planning (ERP) systems
Exception handling
1. code flow
2. file operations
3. Python built-in
4. source code
5. try clause
Exploration (text)
1. co-occurrence matrix
2. frequency chart
3. lexical dispersion plot
4. word cloud
Exploratory data analysis (EDA)
1. Iris dataset
2. multivariate analysis
  1. code creation
  2. correlation matrix
  3. findings values
  4. pair plot
3. pandas dataframe visualization
4. univariate analysis
Extremely randomized trees (ExtraTree)

F

Feature engineering
1. construction/generation
2. handling categorical data
  1. dummy variable creation
  2. number conversion
3. logical flow
4. missing data
5. normalization and scaling
6. raw data
7. summarization methods

G

Generalized linear models (GLM)
Global positioning system (GPS)
Gradient boosting
GridSearch

H

Hard voting vs. soft voting
Hierarchical cluster technique
1. key parameters
2. maximum linkage
3. source code
Hyperparameter tuning
1. approach
2. GridSearch
3. RandomSearch

I

Identity operators
Input/output file
1. opening mode
2. operations
3. sequence

J

Join statement
1. inner
2. left
3. outer
4. right

K

K-folds cross-validation
1. classification model
2. holdout/single fold method
3. stratification
k nearest neighbors (kNN)
Knowledge discovery databases (KDD)
1. data mining
2. data mining process flow
3. interpretation/evaluation
4. preprocessing and cleaning
5. selection
6. stages
7. transformation techniques

L

Latent Dirichlet Allocation (LDA)
Latent semantic analysis (LSA)
Lemmatization
Linear regression vs. logistic regression
Logistic regression
1. GLM distribution
2. load data
3. model training and evaluation
4. multi-classes
5. normalize data
6. split data
Long short-term memory (LSTM)

M

Machine learning
1. AI

See Artificial intelligence (AI)

AI evolution
1. areas
categorization
CRISP-DM

See Cross-industry standard process for data mining (CRISP-DM)

data
1. attributes
2. comparison
3. continuous or quantitative
4. discrete/qualitative
5. fact and figures
6. interval scale
7. measurement scales
8. nominal level
9. ordinal scale
10. ratio scale
definitions
EDA

See Exploratory data analysis (EDA)

feature engineering

See Feature engineering

frameworks
history
KDD

See Knowledge discovery databases (KDD)

libraries
ML imposters
overview
pattern recognition
process loop
prospect customer identification
Python

See Python packages

questions/hypothesis
recommendation system
regression

See Supervised learning

reinforcement
resources
robotic intelligent agent
scikit-learn
SEMMA

See Sample, explore, modify, model, assess (SEMMA)

simple models
spam detection
statsmodels
supervised learning
1. classification
2. regression
Turing test
unsupervised learning
1. clustering
2. dimension reduction
wheels from scratch

Matplotlib
Mean absolute error
Model building, text similarity
Model diagnosis and tuning
1. attributes
2. bias and variance
3. boosting
  1. AdaBoosting process
  2. ensemble voting
  3. essential tuning parameters
  4. gradient boosting
  5. illustration
  6. sklearn wrapper
  7. stacking
  8. xgboost
4. ensemble methods
  1. bagging
  2. decision boundaries
  3. ExtraTree
  4. feature importance function
  5. RandomForest
  6. types of
5. hyperparameter

See Hyperparameter tuning

k-fold cross-validation
probability cutoff point
1. class distribution
2. error message
3. functions
4. logistic regression model
5. optimal cutoff point
rare event/imbalanced dataset
1. disadvantages
2. handling techniques
3. make_classification function
4. re-sampling
variance

Moving average (MA)
Multivariate linear regression model
Multivariate regression
1. housing dataset (RDatasets)
2. multicollinearity and VIF
3. regression diagnostics
  1. homoscedasticity test
  2. linearity check
  3. model fittings
  4. outliers
  5. over-fitting
  6. under-fitting

N

Natural language processing (NLP)

See Text mining

N-grams
Nonlinear regression
Non-negative matrix factorization (NMF)
NumPy
1. arrays
2. broadcasting
3. built-in functions
4. indexing
  1. boolean
  2. field access
  3. integer
  4. slice syntax
  5. types
5. mathematical functions
  1. array math
  2. sum function
  3. transpose function
6. types

O

Object-oriented
1. bar plots–ax.bar() and ax.barh()
2. colormaps reference
3. customization
4. grid creation
5. horizontal bar charts
6. line plots–ax.plot()
7. line style and marker style
8. marker reference
9. matplotlib line style reference
10. multiple lines-different axis
11. multiple lines-same axis
12. pie chart–ax.pie()
13. plotting defaults
14. side-by-side bar chart
15. stacked bar charts

P, Q

Pandas
1. DataFrame
2. data structures
3. grouping operation
4. join
5. merge/join
6. operations
7. pivot tables
8. reading and writing data
9. SQL/excel/R data frames
10. statistics
11. view function
Partial autocorrelation function (PACF)
Part of speech (PoS) tagging
Polynomial regression
Predictive analytics
Prescriptive analytics
Principal component analysis (PCA)
Problem types vs. potential ML algorithms
Python
1. code blocks
  1. correct indentation
  2. incorrect indentation
  3. indentation
  4. suites
2. control structure
  1. iteration
  2. loop control statement
  3. selection statements
3. definition
4. dictionary
5. exception handling
6. file input/output
7. identifier
8. interactive
9. keywords
10. lists
11. module
12. mottos
13. multiline statements
14. NumPy and Pandas
15. object types
  1. comments
  2. list vs. tuple vs. set vs. dictionary
  3. multiline comments
  4. single line
16. operators
  1. arithmetic operators
  2. assignment operators
  3. bitwise operators
  4. comparison/relational operators
  5. identity operators
  6. logical operators
  7. membership operators
  8. types
17. vs. others
18. popular coding language
19. sets
20. tuple
  1. accessing tuple
  2. deleting items
  3. operations
21. user-defined functions
22. 2.7/3.4.x
  1. Anaconda
  2. graphical installer
  3. Linux installation
  4. official website
  5. OSX installation
  6. run command line
  7. version
  8. Windows installation
Python packages
1. customizing labels
2. data analysis
  1. global functions
  2. key packages
  3. Matplotlib
  4. NumPy
  5. Pandas
3. libraries
4. object oriented

R

RandomForest
RandomSearch
Recommender systems
1. collaborative filtering (CF)
2. content-based filtering
3. types
Recurrent neural network (RNN)
Regression analysis
Regularization
Reinforcement learning
Restricted Boltzmann Machines (RBM)
Robotic intelligent agent
1. components
2. definition
3. sensors and effectors
4. Turing test
Root mean squared error (RMSE)

S

Sample, explore, modify, model, assess (SEMMA)
1. assess
2. CRISP-DM and KDD
3. explore
4. frameworks
5. modeling/data mining
6. modify
7. sample
Sentiment analysis
Sets
1. accessing set elements
2. changing elements
3. code creation
4. difference
5. discard()/remove() method
6. intersection
7. key characteristics
8. operations
9. symmetric_difference() method
10. union
Stacking
Stochastic gradient descent algorithm
Supervised learning
1. cases
2. classification
3. confusion matrix
4. correlation and causation
5. decision trees
  1. key parameters
  2. model
  3. nodes
  4. splits and grows
  5. stopping partition
6. fitting line
7. k nearest neighbors (kNN)
8. linear regression model
  1. mean absolute error
  2. metrics
  3. RMSE
  4. R-squared metrics
9. logistic regression
10. model performance classification
11. multiclass logistic regression
12. multivariate regression
  1. coefficient
  2. Durbin-Watson statistics
  3. housing dataset (RDatasets)
  4. hypothesis testing steps
  5. multicollinearity and VIF
  6. normal distribution
  7. OLS regression results
  8. regression diagnostics
  9. results
  10. R-squared value
  11. standard error
  12. t and p-value
13. nonlinear regression
14. plot sigmoid function
15. polynomial regression
16. process flow
17. regularization
18. ROC curve
19. scatter plot
20. slope line fitting
21. stochastic gradient descent
22. students score vs. hours
23. SVM

See Support vector machine (SVM)

time-series forecasting
1. ARIMA
2. components
3. stationary time series
under-fitting, right-fitting, and over-fitting

Supervised learning algorithms
1. classification
2. regression
Support vector machine (SVM)
1. decision boundaries
2. equation
3. key objective
4. key parameters
symmetric_difference() method

T

Term document matrix (TDM)
Term frequency-inverse document frequency (TF-IDF)
Text mining
1. data assemble
2. data preprocessing

See Data preprocessing (text)

DNLP

See Deep natural language processing (DNLP)

exploration
1. co-occurrence matrix
2. frequency chart
3. lexical dispersion plot
4. word cloud
libraries
model building
1. classification
2. clustering
3. document term matrix
4. Euclidean vs. cosine
5. sentiment analysis
6. text similarity
7. topic modeling
phases
process overview

Time-series forecasting
1. ARIMA
2. components
3. stationary time series
Transfer learning
Tuple, operations
Turing test

U, V

Unsupervised learning
1. clustering

See Clustering

PCA
process flow

User-defined functions
1. default argument
2. definition
3. functions with arguments
4. functions without argument
5. **kwargs
6. length arguments
7. passing arguments (*args)
8. variable/identifier

W

Word2Vec

X, Y, Z

Xgboost (eXtreme gradient boosting)

Table of Contents for Mastering Machine Learning with Python in Six Steps: A Practical Implementation Guide to Predictive Data Analytics Using Python
