Index

A

  1. AdaBoosting process

    1. dataset

    2. iteration 1

    3. iteration 2

    4. iteration 3

    5. vs. stand-alone decision tree model

    6. steps

    7. weak classification models

  2. Agglomerative clustering

See Hierarchical cluster technique
  1. Analytics

    1. categorization

    2. descriptive analytics

    3. diagnostics

    4. predictions/estimations

    5. prescriptive

    6. types

  2. Artificial general intelligence (AGI)

  3. Artificial intelligence (AI)

    1. analytics

    2. data mining

    3. data science

    4. evolution

      1. AGI

      2. ANI

      3. ASI

      4. data analytics

      5. data mining

      6. data science

      7. definition

      8. statistics

    5. statistics

      1. Bayesian

      2. frequentist

      3. regression

    6. statistics vs. data mining vs. data analytics vs. data science

  4. Artificial narrow intelligence (ANI)

  5. Artificial neural network (ANN)

    1. activation function

    2. autoencoders

    3. biological neurons

    4. CNN

    5. CNN and MNIST dataset

    6. corresponding array

    7. deep learning

    8. handwritten digit (zero) image

    9. hidden_layer_sizes

    10. image classification

    11. learning_rate_init

    12. load MNIST data

    13. LSTM

    14. max_iter

    15. MLP and Keras

    16. MLP (feedforward network)

    17. multilayer perceptron representation

    18. multilayer perceptrons (feedforward neural network)

    19. perceptron

    20. RBM algorithm

    21. reinforcement learning

    22. scikit-learn MLP

    23. solver

    24. transfer learning

    25. visual challenges

    26. visualization

    27. visual pathway

  6. Artificial super intelligence (ASI)

  7. Autocorrelation function (ACF)

  8. Autoencoders

    1. de-noise image

    2. dimension reduction

    3. elements

  9. Autoregressive integrated moving average (ARIMA)

    1. AM and MA

    2. autocorrelation

    3. build model and evaluate

    4. check stationary

    5. decompose time series

    6. model

    7. predict function

    8. predictors

  10. Autoregressive model (AM)

  11. Average silhouette method

B

  1. Bagging

  2. Bag of words (BoW)

  3. Bayesian statistics

  4. Biological vs. artificial neuron

  5. Bootstrap aggregation

See Bagging

C

  1. Clustering

    1. hierarchical cluster

See Hierarchical cluster technique
  1. K-means

    1. accuracy

    2. average silhouette method

    3. elbow method

    4. expectation maximization

    5. limitations

    6. methods

  2. text

    1. k-means

    2. LSA

    3. singular value decomposition

    4. source code

    5. SVD

  1. Collaborative filtering (CF)

  2. Command-line installer

  3. Convolutional neural network (CNN)

  4. Cross-industry standard process for data mining (CRISP-DM)

    1. business

    2. data gaps/relevance

    3. deployment

    4. evaluation

    5. framework and phases

    6. modeling

    7. preparation

    8. process diagram

D

  1. Data assemble (text)

    1. dataframe

    2. get access key

    3. pdf, jpg, and audio file

    4. social media

    5. textract formats

    6. twitter authentication

  2. DataFrame

  3. Data mining

    1. KDD

    2. techniques

  4. Data preprocessing (text)

    1. bag of words

    2. lemmatization

    3. lower() function

    4. n-grams

    5. PoS tagging

    6. removing noise

    7. sentence tokenization

    8. stemming

    9. TF-IDF

    10. word tokenization

  5. Data science

  6. Deep learning

    1. ANN

See Artificial neural network (ANN)
  1. Caffe

  2. Keras

  3. Lasagne

  4. libraries

  5. MXNet

  6. Pylearn2

  7. TensorFlow

  8. Theano

  1. Deep natural language processing (DNLP)

    1. sopex package

    2. Word2Vec

  2. Descriptive analytics

  3. Diagnostic analytics

  4. dir() operation code

  5. Document term matrix (DTM)

E

  1. Elbow method

  2. Ensemble methods

    1. bagging

    2. decision boundaries

    3. ExtraTree

    4. feature importance function

    5. RandomForest

    6. types of

  3. Enterprise resource planning (ERP) systems

  4. Exception handling

    1. code flow

    2. file operations

    3. Python built-in

    4. source code

    5. try clause

  5. Exploration (text)

    1. co-occurrence matrix

    2. frequency chart

    3. lexical dispersion plot

    4. word cloud

  6. Exploratory data analysis (EDA)

    1. Iris dataset

    2. multivariate analysis

      1. code creation

      2. correlation matrix

      3. findings values

      4. pair plot

    3. pandas dataframe visualization

    4. univariate analysis

  7. Extremely randomized trees (ExtraTree)

F

  1. Feature engineering

    1. construction/generation

    2. handling categorical data

      1. dummy variable creation

      2. number conversion

    3. logical flow

    4. missing data

    5. normalization and scaling

    6. raw data

    7. summarization methods

G

  1. Generalized linear models (GLM)

  2. Global positioning system (GPS)

  3. Gradient boosting

  4. GridSearch

H

  1. Hard voting vs. soft voting

  2. Hierarchical cluster technique

    1. key parameters

    2. maximum linkage

    3. source code

  3. Hyperparameter tuning

    1. approach

    2. GridSearch

    3. RandomSearch

I

  1. Identity operators

  2. Input/output file

    1. opening mode

    2. operations

    3. sequence

J

  1. Join statement

    1. inner

    2. left

    3. outer

    4. right

K

  1. K-folds cross-validation

    1. classification model

    2. holdout/single fold method

    3. stratification

  2. k nearest neighbors (kNN)

  3. Knowledge discovery databases (KDD)

    1. data mining

    2. data mining process flow

    3. interpretation/evaluation

    4. preprocessing and cleaning

    5. selection

    6. stages

    7. transformation techniques

L

  1. Latent Dirichlet Allocation (LDA)

  2. Latent semantic analysis (LSA)

  3. Lemmatization

  4. Linear regression vs. logistic regression

  5. Logistic regression

    1. GLM distribution

    2. load data

    3. model training and evaluation

    4. multi-classes

    5. normalize data

    6. split data

  6. Long short-term memory (LSTM)

M

  1. Machine learning

    1. AI

See Artificial intelligence (AI)
  1. AI evolution

    1. areas

  2. categorization

  3. CRISP-DM

See Cross-industry standard process for data mining (CRISP-DM)
  1. data

    1. attributes

    2. comparison

    3. continuous or quantitative

    4. discrete/qualitative

    5. fact and figures

    6. interval scale

    7. measurement scales

    8. nominal level

    9. ordinal scale

    10. ratio scale

  2. definitions

  3. EDA

See Exploratory data analysis (EDA)
  1. feature engineering

See Feature engineering
  1. frameworks

  2. history

  3. KDD

See Knowledge discovery databases (KDD)
  1. libraries

  2. ML imposters

  3. overview

  4. pattern recognition

  5. process loop

  6. prospect customer identification

  7. Python

See Python packages
  1. questions/hypothesis

  2. recommendation system

  3. regression

See Supervised learning
  1. reinforcement

  2. resources

  3. robotic intelligent agent

  4. scikit-learn

  5. SEMMA

See Sample, explore, modify, model, assess (SEMMA)
  1. simple models

  2. spam detection

  3. statsmodels

  4. supervised learning

    1. classification

    2. regression

  5. Turing test

  6. unsupervised learning

    1. clustering

    2. dimension reduction

  7. wheels from scratch

  1. Matplotlib

  2. Mean absolute error

  3. Model building, text similarity

  4. Model diagnosis and tuning

    1. attributes

    2. bias and variance

    3. boosting

      1. AdaBoosting process

      2. ensemble voting

      3. essential tuning parameters

      4. gradient boosting

      5. illustration

      6. sklearn wrapper

      7. stacking

      8. xgboost

    4. ensemble methods

      1. bagging

      2. decision boundaries

      3. ExtraTree

      4. feature importance function

      5. RandomForest

      6. types of

    5. hyperparameter

See Hyperparameter tuning
  1. k-fold cross-validation

  2. probability cutoff point

    1. class distribution

    2. error message

    3. functions

    4. logistic regression model

    5. optimal cutoff point

  3. rare event/imbalanced dataset

    1. disadvantages

    2. handling techniques

    3. make_classification function

    4. re-sampling

  4. variance

  1. Moving average (MA)

  2. Multivariate linear regression model

  3. Multivariate regression

    1. housing dataset (RDatasets)

    2. multicollinearity and VIF

    3. regression diagnostics

      1. homoscedasticity test

      2. linearity check

      3. model fittings

      4. outliers

      5. over-fitting

      6. under-fitting

N

  1. Natural language processing (NLP)

See Text mining
  1. N-grams

  2. Nonlinear regression

  3. Non-negative matrix factorization (NMF)

  4. NumPy

    1. arrays

    2. broadcasting

    3. built-in functions

    4. indexing

      1. boolean

      2. field access

      3. integer

      4. slice syntax

      5. types

    5. mathematical functions

      1. array math

      2. sum function

      3. transpose function

    6. types

O

  1. Object-oriented

    1. bar plots–ax.bar() and ax.barh()

    2. colormaps reference

    3. customization

    4. grid creation

    5. horizontal bar charts

    6. line plots–ax.plot()

    7. line style and marker style

    8. marker reference

    9. matplotlib line style reference

    10. multiple lines-different axis

    11. multiple lines-same axis

    12. pie chart–ax.pie()

    13. plotting defaults

    14. side-by-side bar chart

    15. stacked bar charts

P, Q

  1. Pandas

    1. DataFrame

    2. data structures

    3. grouping operation

    4. join

    5. merge/join

    6. operations

    7. pivot tables

    8. reading and writing data

    9. SQL/excel/R data frames

    10. statistics

    11. view function

  2. Partial autocorrelation function (PACF)

  3. Part of speech (PoS) tagging

  4. Polynomial regression

  5. Predictive analytics

  6. Prescriptive analytics

  7. Principal component analysis (PCA)

  8. Problem types vs. potential ML algorithms

  9. Python

    1. code blocks

      1. correct indentation

      2. incorrect indentation

      3. indentation

      4. suites

    2. control structure

      1. iteration

      2. loop control statement

      3. selection statements

    3. definition

    4. dictionary

    5. exception handling

    6. file input/output

    7. identifier

    8. interactive

    9. keywords

    10. lists

    11. module

    12. mottos

    13. multiline statements

    14. NumPy and Pandas

    15. object types

      1. comments

      2. list vs. tuple vs. set vs. dictionary

      3. multiline comments

      4. single line

    16. operators

      1. arithmetic operators

      2. assignment operators

      3. bitwise operators

      4. comparison/relational operators

      5. identity operators

      6. logical operators

      7. membership operators

      8. types

    17. vs. others

    18. popular coding language

    19. sets

    20. tuple

      1. accessing tuple

      2. deleting items

      3. operations

    21. user-defined functions

    22. 2.7/3.4.x

      1. Anaconda

      2. graphical installer

      3. Linux installation

      4. official website

      5. OSX installation

      6. run command line

      7. version

      8. Windows installation

  10. Python packages

    1. customizing labels

    2. data analysis

      1. global functions

      2. key packages

      3. Matplotlib

      4. NumPy

      5. Pandas

    3. libraries

    4. object oriented

R

  1. RandomForest

  2. RandomSearch

  3. Recommender systems

    1. collaborative filtering (CF)

    2. content-based filtering

    3. types

  4. Recurrent neural network (RNN)

  5. Regression analysis

  6. Regularization

  7. Reinforcement learning

  8. Restricted Boltzmann Machines (RBM)

  9. Robotic intelligent agent

    1. components

    2. definition

    3. sensors and effectors

    4. Turing test

  10. Root mean squared error (RMSE)

S

  1. Sample, explore, modify, model, assess (SEMMA)

    1. assess

    2. CRISP-DM and KDD

    3. explore

    4. frameworks

    5. modeling/data mining

    6. modify

    7. sample

  2. Sentiment analysis

  3. Sets

    1. accessing set elements

    2. changing elements

    3. code creation

    4. difference

    5. discard()/remove() method

    6. intersection

    7. key characteristics

    8. operations

    9. symmetric_difference() method

    10. union

  4. Stacking

  5. Stochastic gradient descent algorithm

  6. Supervised learning

    1. cases

    2. classification

    3. confusion matrix

    4. correlation and causation

    5. decision trees

      1. key parameters

      2. model

      3. nodes

      4. splits and grows

      5. stopping partition

    6. fitting line

    7. k nearest neighbors (kNN)

    8. linear regression model

      1. mean absolute error

      2. metrics

      3. RMSE

      4. R-squared metrics

    9. logistic regression

    10. model performance classification

    11. multiclass logistic regression

    12. multivariate regression

      1. coefficient

      2. Durbin-Watson statistics

      3. housing dataset (RDatasets)

      4. hypothesis testing steps

      5. multicollinearity and VIF

      6. normal distribution

      7. OLS regression results

      8. regression diagnostics

      9. results

      10. R-squared value

      11. standard error

      12. t and p-value

    13. nonlinear regression

    14. plot sigmoid function

    15. polynomial regression

    16. process flow

    17. regularization

    18. ROC curve

    19. scatter plot

    20. slope line fitting

    21. stochastic gradient descent

    22. students score vs. hours

    23. SVM

See Support vector machine (SVM)
  1. time-series forecasting

    1. ARIMA

    2. components

    3. stationary time series

  2. under-fitting, right-fitting, and over-fitting

  1. Supervised learning algorithms

    1. classification

    2. regression

  2. Support vector machine (SVM)

    1. decision boundaries

    2. equation

    3. key objective

    4. key parameters

  3. symmetric_difference() method

T

  1. Term document matrix (TDM)

  2. Term frequency-inverse document frequency (TF-IDF)

  3. Text mining

    1. data assemble

    2. data preprocessing

See Data preprocessing (text)
  1. DNLP

See Deep natural language processing (DNLP)
  1. exploration

    1. co-occurrence matrix

    2. frequency chart

    3. lexical dispersion plot

    4. word cloud

  2. libraries

  3. model building

    1. classification

    2. clustering

    3. document term matrix

    4. Euclidean vs. cosine

    5. sentiment analysis

    6. text similarity

    7. topic modeling

  4. phases

  5. process overview

  1. Time-series forecasting

    1. ARIMA

    2. components

    3. stationary time series

  2. Transfer learning

  3. Tuple, operations

  4. Turing test

U, V

  1. Unsupervised learning

    1. clustering

See Clustering
  1. PCA

  2. process flow

  1. User-defined functions

    1. default argument

    2. definition

    3. functions with arguments

    4. functions without argument

    5. **kwargs

    6. length arguments

    7. passing arguments (*args)

    8. variable/identifier

W

  1. Word2Vec

X, Y, Z

  1. Xgboost (eXtreme gradient boosting)