Table of Contents
Hands-On Unsupervised Learning Using Python
by Ankur A. Patel
Published by O'Reilly Media, Inc., 2019
Preface
A Brief History of Machine Learning
AI Is Back, but Why Now?
The Emergence of Applied AI
Major Milestones in Applied AI over the Past 20 Years
From Narrow AI to General AI
Objective and Approach
Prerequisites
Roadmap
Other Resources
Conventions Used in This Book
Using Code Examples
O’Reilly Safari
How to Contact Us
I. Fundamentals of Unsupervised Learning
1. Unsupervised Learning in the Machine Learning Ecosystem
Basic Machine Learning Terminology
Rules-Based Versus Machine Learning
Supervised Versus Unsupervised
The Strengths and Weaknesses of Supervised Learning
The Strengths and Weaknesses of Unsupervised Learning
Using Unsupervised Learning to Improve Machine Learning Solutions
A Closer Look at Supervised Algorithms
Linear Methods
Neighborhood-Based Methods
Tree-Based Methods
Support Vector Machines
Neural Networks
A Closer Look at Unsupervised Algorithms
Dimensionality Reduction
Clustering
Feature Extraction
Unsupervised Deep Learning
Sequential Data Problems Using Unsupervised Learning
Reinforcement Learning Using Unsupervised Learning
Semi-Supervised Learning
Successful Applications of Unsupervised Learning
Conclusion
2. End-to-End Machine Learning Project
Environment Setup
Version Control: Git
Clone the Hands-On Unsupervised Learning Git Repository
Scientific Libraries: Anaconda Distribution of Python
Neural Networks: TensorFlow and Keras
Gradient Boosting, Version One: XGBoost
Gradient Boosting, Version Two: LightGBM
Clustering Algorithms
Interactive Computing Environment: Jupyter Notebook
Overview of the Data
Data Preparation
Data Acquisition
Data Exploration
Generate Feature Matrix and Labels Array
Feature Engineering and Feature Selection
Data Visualization
Model Preparation
Split into Training and Test Sets
Select Cost Function
Create k-Fold Cross-Validation Sets
Machine Learning Models - Part One
Model One - Logistic Regression
Evaluation Metrics
Confusion Matrix
Precision-Recall Curve
Receiver Operating Characteristic
Machine Learning Models - Part Two
Model Two - Random Forests
Model Three - Gradient Boosting Machine (XGBoost)
Model Four - Gradient Boosting Machine (LightGBM)
Evaluation of the Four Models Using the Test Set
Ensembles
Stacking
Final Model Selection
Production Pipeline
Conclusion
II. Unsupervised Learning Using Scikit-Learn
3. Dimensionality Reduction
The Motivation for Dimensionality Reduction
The MNIST Digits Database
Dimensionality Reduction Algorithms
Linear Projection Versus Manifold Learning
Principal Component Analysis (PCA)
PCA, the Concept
PCA in Practice
Incremental PCA
Sparse PCA
Kernel PCA
Singular Value Decomposition
Random Projection
Gaussian Random Projection
Sparse Random Projection
Isomap
Multidimensional Scaling (MDS)
Locally Linear Embedding
t-Distributed Stochastic Neighbor Embedding (t-SNE)
Other Dimensionality Reduction Methods
Dictionary Learning
Independent Component Analysis
Conclusion
4. Anomaly Detection
Credit Card Fraud Detection
Prepare the Data
Define Anomaly Score Function
Define Evaluation Metrics
Define Plotting Function
Normal PCA Anomaly Detection
PCA Components Equal Number of Original Dimensions
Search for the Optimal Number of Principal Components
Sparse PCA Anomaly Detection
Kernel PCA Anomaly Detection
Gaussian Random Projection Anomaly Detection
Sparse Random Projection Anomaly Detection
Non-Linear Anomaly Detection
Dictionary Learning Anomaly Detection
Independent Component Analysis Anomaly Detection
Fraud Detection on the Test Set
Normal PCA Anomaly Detection on the Test Set
Independent Component Analysis Anomaly Detection on the Test Set
Dictionary Learning Anomaly Detection on the Test Set
Conclusion
5. Clustering
MNIST Digits Dataset
Data Preparation
Clustering Algorithms
K-Means
K-Means Inertia
Evaluating the Clustering Results
K-Means Accuracy
K-Means and the Number of Principal Components
K-Means on the Original Dataset
Hierarchical Clustering
Agglomerative Hierarchical Clustering
The Dendrogram
Evaluating the Clustering Results
DBSCAN
DBSCAN Algorithm
Applying DBSCAN to Our Dataset
HDBSCAN
Conclusion
6. Group Segmentation
Lending Club Data
Data Preparation
Transform String to Numerical
Impute Missing Values
Engineer Features
Select Final Set of Features and Perform Scaling
Designate Labels for Evaluation
Goodness of the Clusters
K-Means Application
Hierarchical Clustering Application
HDBSCAN Application
Conclusion
III. Unsupervised Learning Using TensorFlow and Keras
7. Autoencoders
Neural Networks
TensorFlow
Keras
Autoencoder - The Encoder and the Decoder
Undercomplete Autoencoders
Overcomplete Autoencoders
Dense Versus Sparse Autoencoders
Denoising Autoencoder (DAE)
Variational Autoencoder (VAE)
Conclusion
8. Hands-On Autoencoder
Data Preparation
The Components of an Autoencoder
Activation Functions
Our First Autoencoder
Loss Function
Optimizer
Training the Model
Evaluating on the Test Set
Two-Layer Undercomplete Autoencoder with Linear Activation Function
Increasing the Number of Nodes
Adding More Hidden Layers
Non-Linear Autoencoder
Overcomplete Autoencoder with Linear Activation
Overcomplete Autoencoder with Linear Activation and Dropout
Sparse Overcomplete Autoencoder with Linear Activation
Sparse Overcomplete Autoencoder with Linear Activation and Dropout
Working with Noisy Datasets
Denoising Autoencoder
Two-Layer Denoising Undercomplete Autoencoder with Linear Activation
Two-Layer Denoising Overcomplete Autoencoder with Linear Activation
Two-Layer Denoising Overcomplete Autoencoder with ReLU Activation
Conclusion
9. Semi-Supervised Learning
Data Preparation
Supervised Model
Unsupervised Model
Semi-Supervised Model
The Power of Supervised and Unsupervised
Conclusion
IV. Deep Unsupervised Learning Using TensorFlow and Keras
10. Recommender Systems Using Restricted Boltzmann Machines
Boltzmann Machines
Restricted Boltzmann Machines
Recommender Systems
Collaborative Filtering
The Netflix Prize
MovieLens Dataset
Data Preparation
Define the Cost Function: Mean Squared Error
Perform Baseline Experiments
Matrix Factorization
One Latent Factor
Three Latent Factors
Five Latent Factors
Collaborative Filtering Using RBMs
RBM Neural Network Architecture
Build the Components of the RBM Class
Train RBM Recommender System
Conclusion
11. Feature Detection Using Deep Belief Networks
Deep Belief Networks in Detail
MNIST Image Classification
Restricted Boltzmann Machines
Build the Components of the RBM Class
Generate Images Using the RBM Model
View the Intermediate Feature Detectors
Train the Three RBMs for the DBN
Examine Feature Detectors
View Generated Images
The Full DBN
How Training of a DBN Works
Train the DBN
How Unsupervised Learning Helps Supervised Learning
Generate Images to Build a Better Image Classifier
Image Classifier Using LightGBM
Supervised Only
Unsupervised and Supervised Solution
Conclusion
12. Generative Adversarial Networks
GANs, the Concept
The Power of GANs
Deep Convolutional GANs (DCGANs)
Convolutional Neural Networks (CNNs)
DCGANs Revisited
Generator of the DCGAN
Discriminator of the DCGAN
Discriminator and Adversarial Models
DCGAN for the MNIST Dataset
MNIST DCGAN in Action
Synthetic Image Generation
Conclusion
13. Time Series Clustering
ECG Data
Approach to Time Series Clustering
k-Shape
Time Series Clustering Using k-Shape on ECG Five Days
Data Preparation
Training and Evaluation
Time Series Clustering Using k-Shape on ECG 5000
Data Preparation
Training and Evaluation
Time Series Clustering Using k-Means on ECG 5000
Time Series Clustering Using Hierarchical DBSCAN on ECG 5000
Comparing the Time Series Clustering Algorithms
Full Run with k-Shape
Full Run with k-Means
Full Run with HDBSCAN
Compare All Three Time Series Clustering Approaches
Conclusion
14. Conclusion
Supervised Learning
Unsupervised Learning
Scikit-Learn
TensorFlow and Keras
Reinforcement Learning
Most Promising Areas of Unsupervised Learning Today
The Future of Unsupervised Learning
Final Words
Index
About the Author