Title Page
Copyright and Credits
  Hands-On Machine Learning for Cybersecurity
About Packt
  Why subscribe?
  Packt.com
Contributors
  About the authors
  About the reviewers
  Packt is searching for authors like you
Preface
  Who this book is for
  What this book covers
  To get the most out of this book
  Download the example code files
  Download the color images
  Conventions used
  Get in touch
  Reviews
Basics of Machine Learning in Cybersecurity
  What is machine learning?
  Problems that machine learning solves
  Why use machine learning in cybersecurity?
  Current cybersecurity solutions
  Data in machine learning
  Structured versus unstructured data
  Labelled versus unlabelled data
  Machine learning phases
  Inconsistencies in data
  Overfitting
  Underfitting
  Different types of machine learning algorithms
  Supervised learning algorithms
  Unsupervised learning algorithms
  Reinforcement learning
  Another categorization of machine learning
  Classification problems
  Clustering problems
  Regression problems
  Dimensionality reduction problems
  Density estimation problems
  Deep learning
  Algorithms in machine learning
  Support vector machines
  Bayesian networks
  Decision trees
  Random forests
  Hierarchical algorithms
  Genetic algorithms
  Similarity algorithms
  ANNs
  The machine learning architecture
  Data ingestion
  Data store
  The model engine
  Data preparation
  Feature generation
  Training
  Testing
  Performance tuning
  Mean squared error
  Mean absolute error
  Precision, recall, and accuracy
  How can model performance be improved?
  Fetching the data to improve performance
  Switching machine learning algorithms
  Ensemble learning to improve performance
  Hands-on machine learning
  Python for machine learning
  Comparing Python 2.x with 3.x
  Python installation
  Python interactive development environment
  Jupyter Notebook installation
  Python packages
  NumPy
  SciPy
  Scikit-learn
  pandas
  Matplotlib
  MongoDB with Python
  Installing MongoDB
  PyMongo
  Setting up the development and testing environment
  Use case
  Data
  Code
  Summary
Time Series Analysis and Ensemble Modeling
  What is a time series?
  Time series analysis
  Stationarity of time series models
  Strictly stationary process
  Correlation in time series
  Autocorrelation
  Partial autocorrelation function
  Classes of time series models
  Stochastic time series model
  Artificial neural network time series model
  Support vector time series models
  Time series components
  Systematic models
  Non-systematic models
  Time series decomposition
  Level
  Trend
  Seasonality
  Noise
  Use cases for time series
  Signal processing
  Stock market predictions
  Weather forecasting
  Reconnaissance detection
  Time series analysis in cybersecurity
  Time series trends and seasonal spikes
  Detecting distributed denial of service with time series
  Dealing with the time element in time series
  Tackling the use case
  Importing packages
  Importing data in pandas
  Data cleansing and transformation
  Feature computation
  Predicting DDoS attacks
  ARMA
  ARIMA
  ARFIMA
  Ensemble learning methods
  Types of ensembling
  Averaging
  Majority vote
  Weighted average
  Types of ensemble algorithms
  Bagging
  Boosting
  Stacking
  Bayesian parameter averaging
  Bayesian model combination
  Bucket of models
  Cybersecurity with ensemble techniques
  Voting ensemble method to detect cyber attacks
  Summary
Segregating Legitimate and Lousy URLs
  Introduction to the types of abnormalities in URLs
  URL blacklisting
  Drive-by download URLs
  Command and control URLs
  Phishing URLs
  Using heuristics to detect malicious pages
  Data for the analysis
  Feature extraction
  Lexical features
  Web-content-based features
  Host-based features
  Site-popularity features
  Using machine learning to detect malicious URLs
  Logistic regression to detect malicious URLs
  Dataset
  Model
  TF-IDF
  SVM to detect malicious URLs
  Multiclass classification for URL classification
  One-versus-rest
  Summary
Knocking Down CAPTCHAs
  Characteristics of CAPTCHA
  Using artificial intelligence to crack CAPTCHA
  Types of CAPTCHA
  reCAPTCHA
  No CAPTCHA reCAPTCHA
  Breaking a CAPTCHA
  Solving CAPTCHAs with a neural network
  Dataset
  Packages
  Theory of CNN
  Model
  Code
  Training the model
  Testing the model
  Summary
Using Data Science to Catch Email Fraud and Spam
  Email spoofing
  Bogus offers
  Requests for help
  Types of spam emails
  Deceptive emails
  CEO fraud
  Pharming
  Dropbox phishing
  Google Docs phishing
  Spam detection
  Types of mail servers
  Data collection from mail servers
  Using the Naive Bayes theorem to detect spam
  Laplace smoothing
  Featurization techniques that convert text-based emails into numeric values
  Log-space
  TF-IDF
  N-grams
  Tokenization
  Logistic regression spam filters
  Logistic regression
  Dataset
  Python
  Results
  Summary
Efficient Network Anomaly Detection Using k-means
  Stages of a network attack
  Phase 1 – Reconnaissance
  Phase 2 – Initial compromise
  Phase 3 – Command and control
  Phase 4 – Lateral movement
  Phase 5 – Target attainment
  Phase 6 – Exfiltration, corruption, and disruption
  Dealing with lateral movement in networks
  Using Windows event logs to detect network anomalies
  Logon/Logoff events
  Account logon events
  Object access events
  Account management events
  Active Directory events
  Ingesting Active Directory data
  Data parsing
  Modeling
  Detecting anomalies in a network with k-means
  Network intrusion data
  Coding the network intrusion attack
  Model evaluation
  Sum of squared errors
  Choosing k for k-means
  Normalizing features
  Manual verification
  Summary
Decision Tree and Context-Based Malicious Event Detection
  Adware
  Bots
  Bugs
  Ransomware
  Rootkit
  Spyware
  Trojan horses
  Viruses
  Worms
  Malicious data injection within databases
  Malicious injections in wireless sensors
  Use case
  The dataset
  Importing packages
  Features of the data
  Model
  Decision tree
  Types of decision trees
  Categorical variable decision tree
  Continuous variable decision tree
  Gini coefficient
  Random forest
  Anomaly detection
  Isolation forest
  Supervised and outlier detection with Knowledge Discovery in Databases (KDD)
  Revisiting malicious URL detection with decision trees
  Summary
Catching Impersonators and Hackers Red Handed
  Understanding impersonation
  Different types of impersonation fraud
  Impersonators gathering information
  How an impersonation attack is constructed
  Using data science to detect domains that are impersonations
  Levenshtein distance
  Finding domain similarity between malicious URLs
  Authorship attribution
  AA detection for tweets
  Difference between test and validation datasets
  Sklearn pipeline
  Naive Bayes classifier for multinomial models
  Identifying impersonation as a means of intrusion detection
  Summary
Changing the Game with TensorFlow
  Introduction to TensorFlow
  Installation of TensorFlow
  TensorFlow for Windows users
  Hello world in TensorFlow
  Importing the MNIST dataset
  Computation graphs
  What is a computation graph?
  Tensor processing unit
  Using TensorFlow for intrusion detection
  Summary
Financial Fraud and How Deep Learning Can Mitigate It
  Machine learning to detect financial fraud
  Imbalanced data
  Handling imbalanced datasets
  Random under-sampling
  Random oversampling
  Cluster-based oversampling
  Synthetic minority oversampling technique
  Modified synthetic minority oversampling technique
  Detecting credit card fraud
  Logistic regression
  Loading the dataset
  Approach
  Logistic regression classifier – under-sampled data
  Tuning hyperparameters
  Detailed classification reports
  Predictions on test sets and plotting a confusion matrix
  Logistic regression classifier – skewed data
  Investigating precision-recall curve and area
  Deep learning time
  Adam gradient optimizer
  Summary
Case Studies
  Introduction to our password dataset
  Text feature extraction
  Feature extraction with scikit-learn
  Using the cosine similarity to quantify bad passwords
  Putting it all together
  Summary
Other Books You May Enjoy
  Leave a review - let other readers know what you think