© Manohar Swamynathan 2017

Manohar Swamynathan, Mastering Machine Learning with Python in Six Steps, 10.1007/978-1-4842-2866-1_7

7. Conclusion

Manohar Swamynathan

(1)Bangalore, Karnataka, India

Summary

I hope you have enjoyed this simplified, six-step machine learning expedition. You started your learning journey with step 1, getting started in Python, where you learned the core philosophy and key concepts of the Python programming language. In step 2, you learned about the history of machine learning, its high-level categories (supervised, unsupervised, and reinforcement learning), three important frameworks for building ML systems (SEMMA, CRISP-DM, and the KDD data mining process), the primary data analysis packages (NumPy, Pandas, Matplotlib) and their key concepts, and a comparison of the core machine learning libraries. In step 3, fundamentals of machine learning, you learned about different data types, key data quality issues and how to handle them, exploratory analysis, and the core methods of supervised and unsupervised learning, each implemented with an example.

In step 4, model diagnosis and tuning, you learned various techniques for model diagnosis, bagging to counter over-fitting, boosting to counter under-fitting, other ensemble techniques, and hyperparameter tuning (grid/random search) for building efficient models. In step 5, text mining and recommender systems, you learned about the text mining process: data assembly, data preprocessing, data exploration or visualization, and the various models that can be built. You also learned how to build collaborative and content-based recommender systems to personalize the user experience. In step 6, deep and reinforcement learning, you learned about artificial neural networks through the perceptron, Convolutional Neural Networks (CNN) for image analytics, Recurrent Neural Networks (RNN) for text analytics, and a simple toy example for learning the reinforcement learning concept. These are advanced topics that have seen great development in the last few years.

Overall, you have learned a broad range of commonly used machine learning topics, each of which comes with a number of parameters to control and tune model performance. To keep things simple throughout the book, I have either used the default parameters or introduced only the key parameters (in some places). The default values have been carefully chosen by the creators of the packages to give decent results and get you started, so you can begin with them. However, I recommend that you explore the other parameters and experiment with them using manual, grid, or random searches to ensure a robust model. Table 7-1 summarizes various possible problem types, example use cases, and the potential machine learning algorithms that you can use. Note that this is a sample list, not an exhaustive one.
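To make the advice about exploring parameters concrete, here is a minimal sketch, assuming scikit-learn is installed. The Iris dataset, the decision tree model, and the parameter ranges are illustrative assumptions, not recommendations. It compares the default parameters against a grid search and a random search over the same small search space.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Baseline: the library's default parameters, scored with 5-fold cross-validation.
default_model = DecisionTreeClassifier(random_state=42)
default_score = cross_val_score(default_model, X, y, cv=5).mean()

# Illustrative search space (an assumption; adapt it to your own model and data).
param_grid = {
    "max_depth": [2, 3, 4, 5, None],
    "min_samples_leaf": [1, 2, 5, 10],
}

# Grid search tries every combination (5 x 4 = 20 candidates).
grid = GridSearchCV(DecisionTreeClassifier(random_state=42), param_grid, cv=5)
grid.fit(X, y)

# Random search tries only n_iter randomly sampled combinations.
rand = RandomizedSearchCV(
    DecisionTreeClassifier(random_state=42),
    param_grid,
    n_iter=8,
    cv=5,
    random_state=42,
)
rand.fit(X, y)

print("Default parameters:", round(default_score, 3))
print("Grid search:", grid.best_params_, round(grid.best_score_, 3))
print("Random search:", rand.best_params_, round(rand.best_score_, 3))
```

Grid search is exhaustive but grows quickly with every parameter you add, whereas random search trades completeness for a fixed budget of candidates, which is often a reasonable compromise for larger search spaces.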

Table 7-1. Problem types vs. potential ML algorithms

| Problem Type | Example Use Case(s) | Potential ML Algorithms |
| --- | --- | --- |
| Predicting a continuous number | What will the store’s daily/weekly sales be? | Linear regression or polynomial regression |
| Predicting a count (a non-negative whole number) | How many staff members are required for a shift? How many car parking spaces are required for a new store? | Generalized Linear Model with Poisson distribution |
| Predicting the probability of an event (True/False) | What is the probability of a transaction being fraudulent? | Binary classification models (Logistic regression, Decision tree models, Boosting models, KNN, etc.) |
| Predicting the probability of one event out of many possible events (multiclass) | What is the probability of a transaction being high risk/medium risk/low risk? | Multiclass classification models (Logistic regression, Decision tree models, Boosting models, KNN, etc.) |
| Grouping contents based on similarity | Group similar customers? Group similar categories? | K-means clustering, Hierarchical clustering |
| Dimension reduction | What are the important dimensions that hold the maximum percentage of information? | Principal Component Analysis (PCA), Singular Value Decomposition (SVD) |
| Topic modeling | Group documents based on topics or thematic structure? | Latent Dirichlet Allocation, Non-negative matrix factorization |
| Opinion mining | Predict the sentiment associated with text? | Natural Language Toolkit (NLTK) |
| Recommender systems | What products/items should be marketed to a user? | Content-based filtering, Collaborative filtering |
| Text classification | Predict the probability of a document being part of a known class? | Recurrent Neural Network (RNN), Binary or Multiclass classification models |
| Image classification | Predict the probability of an image being part of a known class? | Convolutional Neural Network (CNN), Binary or Multiclass classification models |

Tips

Building an efficient model can be a challenging task for a beginner. Now that you have learned which algorithm to use, here are my two cents: a list of things to remember as you get started on model building.

Start with Questions/Hypothesis Then Move to Data!

Figure 7-1. Questions/Hypothesis to Data

Don’t jump into understanding the data before formulating the objective to be achieved using the data. It is good practice to start with a good list of questions and to work closely with domain experts to understand the core issues and frame the problem statement. This will help you choose the right kind of machine learning algorithm (supervised vs. unsupervised). Only then move on to understanding the different data sources.

Don’t Reinvent the Wheel from Scratch

Figure 7-2. Don’t reinvent the wheel

The machine learning open source community is very active: plenty of efficient tools are available, and many more are being developed and released all the time. So do not try to reinvent the wheel in terms of solutions, algorithms, or tools unless required. Try to understand what solutions already exist in the market before venturing into building something from scratch.

Start with Simple Models

Figure 7-3. Start with a simple model

Always start with simple models (such as regressions), as these can be explained easily in layman’s terms to non-technical people. This will help you and the subject matter experts understand the variable relationships and gain confidence in the model. Further, it will significantly help you create the right features. Move to complex models only if you see a noteworthy increase in model performance.
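One way to put this tip into practice is sketched below, assuming scikit-learn is installed. The diabetes dataset, the gradient boosting model, and the `min_gain` threshold are all illustrative assumptions. What counts as a "noteworthy increase" depends on your problem, so treat the threshold as a placeholder.

```python
from sklearn.datasets import load_diabetes
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

data = load_diabetes()
X, y = data.data, data.target

simple = LinearRegression()
complex_model = GradientBoostingRegressor(random_state=42)

# Score both models the same way: 5-fold cross-validated R-squared.
simple_r2 = cross_val_score(simple, X, y, cv=5, scoring="r2").mean()
complex_r2 = cross_val_score(complex_model, X, y, cv=5, scoring="r2").mean()

# Hypothetical threshold for a "noteworthy" gain (an assumption, tune per project).
min_gain = 0.02
chosen = "complex" if complex_r2 - simple_r2 > min_gain else "simple"

print("Simple model R2: ", round(simple_r2, 3))
print("Complex model R2:", round(complex_r2, 3))
print("Model to keep:   ", chosen)

# The simple model's coefficients can be read off and discussed with domain experts.
coefficients = dict(zip(data.feature_names, simple.fit(X, y).coef_))
for name, value in coefficients.items():
    print(f"{name:>4}: {value:9.2f}")
```

The last few lines show the practical benefit of the simple model: each feature has a single coefficient that you can explain and sanity-check, which is much harder with a boosted ensemble.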

Focus on Feature Engineering

Figure 7-4. Feature engineering is an art

Relevant features lead to efficient models, not more features! Note that including a large number of features might lead to an over-fitting problem, so including the relevant features is the key to building an efficient model. Remember that feature engineering is often described as an art form and is the key differentiator in competitive machine learning. Just as the right ingredients mixed in the right quantities are the secret to tasty food, passing the relevant features to the machine learning algorithm is the secret to an efficient model.
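The effect of irrelevant features can be explored with a small experiment. The sketch below assumes scikit-learn is installed and uses synthetic data in which only 5 of 100 features carry signal. The dataset sizes, the univariate `SelectKBest` filter, and `k=5` are illustrative assumptions, and automated selection is only one small part of feature engineering.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Synthetic data (an assumption): 5 informative features hidden among 100.
X, y = make_classification(
    n_samples=200,
    n_features=100,
    n_informative=5,
    n_redundant=0,
    random_state=42,
)

# Model 1: feed every feature to the algorithm.
all_features = LogisticRegression(max_iter=1000)

# Model 2: keep only the 5 features with the strongest univariate signal.
# Selection sits inside the pipeline so it is refit on each training fold,
# which avoids leaking information from the validation fold.
selected = make_pipeline(
    SelectKBest(f_classif, k=5),
    LogisticRegression(max_iter=1000),
)

score_all = cross_val_score(all_features, X, y, cv=5).mean()
score_selected = cross_val_score(selected, X, y, cv=5).mean()

print("All 100 features:   ", round(score_all, 3))
print("5 selected features:", round(score_selected, 3))
```

Try changing the number of samples or noisy features to see how the gap between the two scores moves on your own data.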

Beware of Common ML Imposters

Carefully handle the common machine learning imposters: data quality issues (such as missing data, outliers, categorical data, and scaling), imbalanced datasets for classification, over-fitting, and under-fitting. Use the appropriate techniques discussed in Chapter 3 for handling data quality issues, and the techniques discussed in Chapter 4, model diagnosis and tuning, such as ensemble techniques and hyperparameter tuning, to improve model performance.

To get started on real-life use cases, I encourage you to try the datasets or problem statements provided by various online forums such as the UCI Machine Learning Repository, Kaggle, etc.

As the saying goes, “if you want to go fast, go alone; if you want to go far, go together.” Using a single machine learning algorithm, you can solve a given problem quickly; however, using ensemble or stacking techniques will give you the edge in achieving the best possible results.
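As a rough illustration of going "together", the sketch below stacks two different models and compares the result with a single model. It assumes scikit-learn 0.22 or later, which provides `StackingClassifier`; this built-in class is used here for convenience and is not necessarily how stacking was implemented in Chapter 4. The dataset and the choice of base models are illustrative assumptions.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# Going alone: a single model.
single = RandomForestClassifier(n_estimators=50, random_state=42)

# Going together: base models whose out-of-fold predictions feed a final model.
stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=50, random_state=42)),
        # KNN is distance based, so scale the features first.
        ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier())),
    ],
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)

single_score = cross_val_score(single, X, y, cv=5).mean()
stack_score = cross_val_score(stack, X, y, cv=5).mean()

print("Single model accuracy: ", round(single_score, 3))
print("Stacked model accuracy:", round(stack_score, 3))
```

Stacking does not guarantee an improvement on every dataset, and it costs more training time, so compare the scores before committing to the more complex setup.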

Happy Machine Learning

I hope this simplified, six-step machine learning expedition has been worthwhile, and that it helps you start a new journey of applying what you have learned to real-world problems. I wish you all the very best and every success in your future quests.