Table of Contents for
Hands-On Machine Learning for Cybersecurity

Hands-On Machine Learning for Cybersecurity by Sinan Ozdemir Published by Packt Publishing, 2018

Hands-On Machine Learning for Cybersecurity
Title Page
Copyright and Credits
Hands-On Machine Learning for Cybersecurity
About Packt
Why subscribe?
Packt.com
Contributors
About the authors
About the reviewers
Packt is searching for authors like you
Table of Contents
Preface
Who this book is for
What this book covers
To get the most out of this book
Download the example code files
Download the color images
Conventions used
Get in touch
Reviews
Basics of Machine Learning in Cybersecurity
What is machine learning?
Problems that machine learning solves
Why use machine learning in cybersecurity?
Current cybersecurity solutions
Data in machine learning
Structured versus unstructured data
Labelled versus unlabelled data
Machine learning phases
Inconsistencies in data
Overfitting
Underfitting
Different types of machine learning algorithm
Supervised learning algorithms
Unsupervised learning algorithms
Reinforcement learning
Another categorization of machine learning
Classification problems
Clustering problems
Regression problems
Dimensionality reduction problems
Density estimation problems
Deep learning
Algorithms in machine learning
Support vector machines
Bayesian networks
Decision trees
Random forests
Hierarchical algorithms
Genetic algorithms
Similarity algorithms
ANNs
The machine learning architecture
Data ingestion
Data store
The model engine
Data preparation
Feature generation
Training
Testing
Performance tuning
Mean squared error
Mean absolute error
Precision, recall, and accuracy
How can model performance be improved?
Fetching the data to improve performance
Switching machine learning algorithms
Ensemble learning to improve performance
Hands-on machine learning
Python for machine learning
Comparing Python 2.x with 3.x
Python installation
Python interactive development environment
Jupyter Notebook installation
Python packages
NumPy
SciPy
Scikit-learn
pandas
Matplotlib
MongoDB with Python
Installing MongoDB
PyMongo
Setting up the development and testing environment
Use case
Data
Code
Summary
Time Series Analysis and Ensemble Modeling
What is a time series?
Time series analysis
Stationarity of time series models
Strictly stationary process
Correlation in time series
Autocorrelation
Partial autocorrelation function
Classes of time series models
Stochastic time series model
Artificial neural network time series model
Support vector time series models
Time series components
Systematic models
Non-systematic models
Time series decomposition
Level
Trend
Seasonality
Noise
Use cases for time series
Signal processing
Stock market predictions
Weather forecasting
Reconnaissance detection
Time series analysis in cybersecurity
Time series trends and seasonal spikes
Detecting distributed denial of service with time series
Dealing with the time element in time series
Tackling the use case
Importing packages
Importing data in pandas
Data cleansing and transformation
Feature computation
Predicting DDoS attacks
ARMA
ARIMA
ARFIMA
Ensemble learning methods
Types of ensembling
Averaging
Majority vote
Weighted average
Types of ensemble algorithm
Bagging
Boosting
Stacking
Bayesian parameter averaging
Bayesian model combination
Bucket of models
Cybersecurity with ensemble techniques
Voting ensemble method to detect cyber attacks
Summary
Segregating Legitimate and Lousy URLs
Introduction to the types of abnormalities in URLs
URL blacklisting
Drive-by download URLs
Command and control URLs
Phishing URLs
Using heuristics to detect malicious pages
Data for the analysis
Feature extraction
Lexical features
Web-content-based features
Host-based features
Site-popularity features
Using machine learning to detect malicious URLs
Logistic regression to detect malicious URLs
Dataset
Model
TF-IDF
SVM to detect malicious URLs
Multiclass classification for URL classification
One-versus-rest
Summary
Knocking Down CAPTCHAs
Characteristics of CAPTCHA
Using artificial intelligence to crack CAPTCHA
Types of CAPTCHA
reCAPTCHA
No CAPTCHA reCAPTCHA
Breaking a CAPTCHA
Solving CAPTCHAs with a neural network
Dataset
Packages
Theory of CNN
Model
Code
Training the model
Testing the model
Summary
Using Data Science to Catch Email Fraud and Spam
Email spoofing
Bogus offers
Requests for help
Types of spam emails
Deceptive emails
CEO fraud
Pharming
Dropbox phishing
Google Docs phishing
Spam detection
Types of mail servers
Data collection from mail servers
Using the Naive Bayes theorem to detect spam
Laplace smoothing
Featurization techniques that convert text-based emails into numeric values
Log-space
TF-IDF
N-grams
Tokenization
Logistic regression spam filters
Logistic regression
Dataset
Python
Results
Summary
Efficient Network Anomaly Detection Using k-means
Stages of a network attack
Phase 1 – Reconnaissance
Phase 2 – Initial compromise
Phase 3 – Command and control
Phase 4 – Lateral movement
Phase 5 – Target attainment
Phase 6 – Exfiltration, corruption, and disruption
Dealing with lateral movement in networks
Using Windows event logs to detect network anomalies
Logon/Logoff events
Account logon events
Object access events
Account management events
Active directory events
Ingesting active directory data
Data parsing
Modeling
Detecting anomalies in a network with k-means
Network intrusion data
Coding the network intrusion attack
Model evaluation
Sum of squared errors
Choosing k for k-means
Normalizing features
Manual verification
Summary
Decision Tree and Context-Based Malicious Event Detection
Adware
Bots
Bugs
Ransomware
Rootkit
Spyware
Trojan horses
Viruses
Worms
Malicious data injection within databases
Malicious injections in wireless sensors
Use case
The dataset
Importing packages
Features of the data
Model
Decision tree
Types of decision trees
Categorical variable decision tree
Continuous variable decision tree
Gini coefficient
Random forest
Anomaly detection
Isolation forest
Supervised and outlier detection with Knowledge Discovery in Databases (KDD)
Revisiting malicious URL detection with decision trees
Summary
Catching Impersonators and Hackers Red Handed
Understanding impersonation
Different types of impersonation fraud
Impersonators gathering information
How an impersonation attack is constructed
Using data science to detect domains that are impersonations
Levenshtein distance
Finding domain similarity between malicious URLs
Authorship attribution
AA detection for tweets
Difference between test and validation datasets
Sklearn pipeline
Naive Bayes classifier for multinomial models
Identifying impersonation as a means of intrusion detection
Summary
Changing the Game with TensorFlow
Introduction to TensorFlow
Installation of TensorFlow
TensorFlow for Windows users
Hello world in TensorFlow
Importing the MNIST dataset
Computation graphs
What is a computation graph?
Tensor processing unit
Using TensorFlow for intrusion detection
Summary
Financial Fraud and How Deep Learning Can Mitigate It
Machine learning to detect financial fraud
Imbalanced data
Handling imbalanced datasets
Random under-sampling
Random oversampling
Cluster-based oversampling
Synthetic minority oversampling technique
Modified synthetic minority oversampling technique
Detecting credit card fraud
Logistic regression
Loading the dataset
Approach
Logistic regression classifier – under-sampled data
Tuning hyperparameters
Detailed classification reports
Predictions on test sets and plotting a confusion matrix
Logistic regression classifier – skewed data
Investigating precision-recall curve and area
Deep learning time
Adam gradient optimizer
Summary
Case Studies
Introduction to our password dataset
Text feature extraction
Feature extraction with scikit-learn
Using the cosine similarity to quantify bad passwords
Putting it all together
Summary
Other Books You May Enjoy
Leave a review - let other readers know what you think

Data for the analysis

We gather URLs from several sources to build a dataset of approximately 1,000 URLs. Each URL is prelabelled with its class: benign, spam, or malicious. The following screenshot is a snippet from our URL dataset:
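
As a rough illustration of the dataset's shape, the sketch below loads a handful of labelled URLs and counts how many fall into each class. The column names `url` and `label`, the sample rows, and the helper `load_labelled_urls` are assumptions made for illustration; they are not taken from the book's actual data, which may be laid out differently.

```python
import csv
import io
from collections import Counter

# Hypothetical sample rows for illustration only; the dataset described
# in the text holds roughly 1,000 URLs.
SAMPLE_CSV = """url,label
http://example.com/index.html,benign
http://example.org/free-prizes?id=123,spam
http://example.net/update/flash_player.exe,malicious
http://example.com/docs/about,benign
"""

# The three classes named in the text.
VALID_LABELS = {"benign", "spam", "malicious"}


def load_labelled_urls(handle):
    """Read (url, label) pairs and reject rows with an unknown label."""
    rows = []
    for row in csv.DictReader(handle):
        label = row["label"].strip().lower()
        if label not in VALID_LABELS:
            raise ValueError(f"unexpected label: {label!r}")
        rows.append((row["url"].strip(), label))
    return rows


rows = load_labelled_urls(io.StringIO(SAMPLE_CSV))

# Count the URLs per class to see how balanced the dataset is.
class_counts = Counter(label for _, label in rows)
print(class_counts)
```

Checking the class counts early is worthwhile because a heavily skewed split between benign, spam, and malicious URLs affects how a classifier trained on the data should be evaluated.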
