Chapter 14. Conclusion

Artificial intelligence is in the midst of a hype cycle not seen in the tech world since the advent of the internet age 20 years ago.1 However, that does not mean the hype is unwarranted; to some degree, it is justified.

While the artificial intelligence and machine learning work in prior decades was mostly theoretical and academic in nature - with few successful commercial applications - the work in this space over the past decade has been much more applied and industry-focused, led by the likes of Google, Facebook, Amazon, Microsoft, and Apple.

The focus on developing machine learning applications for narrowly defined tasks (i.e., weak or narrow AI) rather than on more ambitious tasks (i.e., strong or general AI) has made the field much more attractive to investors who want to achieve good returns over a shorter 7- to 10-year time frame.

More attention and capital from investors, in turn, has made the field more successful, both in progress towards narrow AI as well as in laying the building blocks for strong AI.

Of course, capital is not the only catalyst. The rise of big data, the advancements in computer hardware (especially the rise of GPUs, led by Nvidia, for training deep neural networks), and the breakthroughs in algorithm research and development have played equally meaningful roles in contributing to the recent successes of artificial intelligence.

Like all hype cycles, the current cycle may lead to some disappointment eventually, but so far the progress in the field has astonished many in the science community and has captured the imagination of an increasingly mainstream audience.

Supervised Learning

To date, supervised learning has been responsible for the majority of the commercial successes in machine learning. These successes can be broken down by data type.

With images, we have optical character recognition, image classification, and facial recognition, to name a few. For example, Facebook automatically tags faces in new photographs based on how similar the faces look to previously labeled faces, leveraging Facebook’s database of existing photographs.

With video, we have self-driving cars, which are already operating on roads across the United States today. Major players such as Google, Tesla, and Uber have invested heavily in autonomous vehicles.

With speech, we have speech recognition, fueled by assistants such as Siri, Alexa, Google Assistant, and Cortana.

With text, we have the classic example of email spam filtering but also machine translation (i.e., Google Translate), sentiment analysis, syntax analysis, entity recognition, language detection, and question answering. On the back of these successes, we have seen a proliferation of chatbots in the past few years.

Supervised learning also performs well at time series prediction, which has many applications in fields such as finance, healthcare, and ad tech.

Of course, supervised learning applications are not restricted to working with only one data type at a time. For example, video captioning systems apply machine learning on videos to generate text captions.

Unsupervised Learning

Unsupervised learning has not had nearly as many successes to date as supervised learning has, but its potential is immense. Most of the world’s data is unlabeled. To apply machine learning at scale to tasks that are more ambitious in scope than the ones supervised learning has already solved, we will need to work with both labeled and unlabeled data.

Unsupervised learning is very good at finding hidden patterns by learning the underlying structure in unlabeled data. Once these hidden patterns are uncovered, unsupervised learning can group them based on similarity, so that similar patterns end up together.

Once the patterns are grouped in this way, humans can sample a few patterns per group and provide meaningful labels. If the groups are well-defined (i.e., the members are homogeneous and distinctly different from members of other groups), the few labels that humans provide by hand can be applied to the other, still unlabeled, members of the group. This process leads to very fast and efficient labeling of previously unlabeled data.
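The cluster-then-label workflow above can be sketched in a few lines with scikit-learn. The blob dataset, the cluster count, and the use of ground-truth labels as stand-ins for human-provided labels are all illustrative assumptions:

```python
# A minimal sketch of cluster-then-label: group unlabeled points, hand-label
# one point per group, and propagate that label to the rest of the group.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Pretend we have 300 unlabeled points that actually form 3 groups.
X, true_labels = make_blobs(n_samples=300, centers=3, random_state=42)

# Step 1: group similar points together without using any labels.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X)

# Step 2: a human inspects one sampled point per cluster and names it.
propagated = np.empty(len(X), dtype=int)
for cluster_id in range(3):
    members = np.where(kmeans.labels_ == cluster_id)[0]
    hand_label = true_labels[members[0]]  # stand-in for a human-provided label
    # Step 3: propagate that single hand label to every member of the cluster.
    propagated[members] = hand_label

# With well-separated groups, a handful of hand labels covers the whole set.
accuracy = (propagated == true_labels).mean()
print(f"labels propagated with accuracy {accuracy:.2f}")
```

With only three hand labels, nearly the entire dataset ends up labeled; in practice, the payoff depends on how well-defined the clusters are.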

In other words, unsupervised learning enables the successful application of supervised learning methods. This synergy between unsupervised learning and supervised learning - also known as semi-supervised learning - may fuel the next wave of successful machine learning applications.

Scikit-Learn

These themes from unsupervised learning should be very familiar to you by now. But, let’s review everything we’ve covered so far.

In Chapter 3, we explored how to use dimensionality reduction algorithms to reduce the dimensionality of data by learning the underlying structure, keeping only the most salient features, and mapping the features into a lower dimensional space.
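As a reminder of what this looks like in practice, here is a minimal sketch using PCA (one of the Chapter 3 algorithms) on scikit-learn's bundled digits dataset; the choice of dataset and component count are illustrative:

```python
# A short sketch of dimensionality reduction: map 64 pixel features per
# digit image into a 10-dimensional space that keeps most of the variance.
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X, _ = load_digits(return_X_y=True)   # 1,797 images, 64 features each
pca = PCA(n_components=10, random_state=42)
X_reduced = pca.fit_transform(X)      # map into a 10-dimensional space

print(X.shape, "->", X_reduced.shape)
print(f"variance retained: {pca.explained_variance_ratio_.sum():.2f}")
```

Even after discarding most of the dimensions, the reduced representation retains a large share of the original variance, which is what makes the downstream pattern finding tractable.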

Once the data is mapped to a lower dimensional space, it becomes much easier to uncover the hidden patterns in the data. In Chapter 4, we demonstrated this by building an anomaly detection system, separating normal credit card transactions from abnormal ones.
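The core idea can be sketched as follows: learn a compressed representation from normal points, then flag the points it fails to reconstruct. The synthetic "transactions" below are an illustrative assumption, not the book's credit card dataset:

```python
# Anomaly detection via reconstruction error: points the compressed
# representation cannot rebuild well are flagged as anomalous.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
normal = rng.normal(0, 1, size=(500, 20))    # stand-in "normal transactions"
anomalies = rng.normal(6, 1, size=(5, 20))   # a few very different points
X = np.vstack([normal, anomalies])           # anomalies are rows 500-504

# Learn the structure of normal data in a lower-dimensional space...
pca = PCA(n_components=5).fit(normal)

# ...then score every point by how poorly it is reconstructed.
reconstruction = pca.inverse_transform(pca.transform(X))
errors = ((X - reconstruction) ** 2).sum(axis=1)
flagged = np.argsort(errors)[-5:]            # five highest-error points
print(sorted(flagged.tolist()))
```

Because the model was fit only on normal points, the injected anomalies stand out with far larger reconstruction errors than any normal transaction.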

In this lower dimensional space, it is also easier to group similar points together; this is known as clustering, which we explored in Chapter 5. A successful application of clustering is group segmentation, separating items based on how similar they are to one another and how different they are from items in other groups. We applied this to borrowers filing loan applications in Chapter 6.

In Chapter 13, we expanded clustering to time series data for the first time and explored various time series clustering methods. We performed many experiments and highlighted just how important it is to have a wide arsenal of machine learning methods available because no one method works best for all datasets.

Chapter 3 through Chapter 6, along with Chapter 13, covered the unsupervised learning using Scikit-Learn portion of the book.

TensorFlow and Keras

Chapter 7 through Chapter 12 explored unsupervised learning using TensorFlow and Keras.

First, we introduced neural networks and the concept of representation learning. In Chapter 7, we used autoencoders to learn new, more condensed representations from original data - this is yet another way unsupervised learning learns the underlying structure in data to extract insight.
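The book builds its autoencoders in TensorFlow and Keras; as a lightweight, framework-free sketch of the same idea, scikit-learn's MLPRegressor can be trained to reproduce its own input through a narrow bottleneck layer. The toy data and layer sizes here are illustrative assumptions:

```python
# A linear autoencoder sketch: input -> 2-unit bottleneck -> input. The
# network must find a condensed representation good enough to rebuild the
# original features.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(1)
# 200 points that really live on a 2-D surface embedded in 10 dimensions.
latent = rng.normal(size=(200, 2))
X = latent @ rng.normal(size=(2, 10))

# Training the network to predict its own input is what makes it an
# autoencoder; the 2-unit hidden layer forces compression.
autoencoder = MLPRegressor(hidden_layer_sizes=(2,), activation="identity",
                           max_iter=5000, random_state=1)
autoencoder.fit(X, X)

# The bottleneck activations are the learned, condensed representation.
condensed = X @ autoencoder.coefs_[0] + autoencoder.intercepts_[0]
reconstruction_error = np.mean((autoencoder.predict(X) - X) ** 2)
print(condensed.shape)
print(f"mean reconstruction error: {reconstruction_error:.4f}")
```

Because the data truly lies on a 2-D surface, the bottleneck can reconstruct it far better than simply predicting the feature means, which is the signature of a useful learned representation.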

In Chapter 8, we applied autoencoders to the credit card transaction dataset to build a fraud detection solution. And, very importantly, in Chapter 9, we combined an unsupervised approach with a supervised approach to improve the standalone unsupervised learning-based credit card fraud detection solution we built in Chapter 8, highlighting the potential synergy between unsupervised and supervised learning models.

In Chapter 10, we introduced generative models for the first time, starting with the restricted Boltzmann machine. We used these to build a movie recommender system, a very light version of the type of recommender systems used by the likes of Netflix and Amazon.

In Chapter 11, we moved from shallow to deep neural networks and built a more advanced generative model by stacking multiple restricted Boltzmann machines together. With this so-called deep belief network, we generated synthetic images of digits to augment the existing MNIST dataset and build a better image classification system. Again, this highlights the potential of using unsupervised learning to improve a supervised solution.

In Chapter 12, we moved to another class of generative models - the one most in vogue today - called generative adversarial networks. We used these to generate more synthetic images of digits similar to those in the MNIST image dataset.

Reinforcement Learning

In this book, we did not cover reinforcement learning in any detail, but it is yet another area of machine learning that is receiving increased attention, especially after its recent successes in fields such as board and video game playing.

Most notably, Google DeepMind introduced its Go software AlphaGo to the world a few years ago. AlphaGo’s historic victory against then-world champion Go player Lee Sedol in March 2016 - a feat many expected would take AI another entire decade to achieve - showed the world just how much progress had been made in the field of artificial intelligence.

More recently, Google DeepMind has blended reinforcement learning with unsupervised learning to develop an even better version of its AlphaGo software. Called AlphaGo Zero, this software does not use data from human games at all.

Such successes from marrying different branches of machine learning corroborate a major theme of this book - the next wave of successes in machine learning will be led by finding ways to work with unlabeled data to improve existing machine learning solutions that today rely heavily on labeled datasets.

Most Promising Areas of Unsupervised Learning Today

We will conclude this book with the present and possible future state of unsupervised learning. Today, unsupervised learning has several successful applications in industry; at the top of this list are anomaly detection, dimensionality reduction, clustering, efficient labeling of unlabeled datasets, and data augmentation.

Unsupervised learning excels at identifying newly emerging patterns, especially when future patterns look very different from past patterns; in some fields, labels of past patterns have limited value in catching future patterns of interest.

For example, anomaly detection is used for identifying fraud of all types - credit card, debit card, wire, online, insurance, etc. - and for flagging suspicious transactions related to money laundering, terrorist financing, and human trafficking.

Anomaly detection is also used in cybersecurity solutions to identify and stop cyber attacks. Rules-based systems struggle to catch new types of cyber attacks, so unsupervised learning is becoming a staple in this field.

Anomaly detection also excels at highlighting data quality issues; with anomaly detection, data analysts are able to pinpoint and address bad data capture much more efficiently.

Unsupervised learning also helps address one of the major challenges in machine learning: the curse of dimensionality. Data scientists typically have to select a subset of features to use in analyzing data and in building machine learning models because the full set of features is too large, making computation difficult if not intractable.

Unsupervised learning enables data scientists to not only work with the original feature set but also to supplement it with additional feature engineering - without fear of running into major computational challenges during model building.

Once the original plus engineered feature set is ready, data scientists apply dimensionality reduction to remove redundant features and keep the most salient, uncorrelated ones for analysis and model building. This type of data compression is also useful as a pre-processing step in supervised machine learning systems (especially with video and images).

Unsupervised learning also helps data scientists and business people surface patterns to answer questions such as which customers are behaving in the most uncommon ways (i.e., in a way that is very different from the majority of the customers).

This insight comes from clustering similar points together, helping analysts perform group segmentation. Once distinct groups are identified, humans can explore what makes the groups special and distinctly different from other groups. Insight from this exercise could be applied to gain a deeper business understanding of what is happening and improve corporate strategy.

Clustering makes labeling unlabeled data considerably more efficient. Because similar data is grouped together, a human needs to label only a few of the points per cluster. Once a few points within each cluster are labeled, the other, not-yet-labeled points can adopt the labels from the labeled points.

Finally, generative models can generate synthetic data to supplement existing datasets. We demonstrated this with our work on the MNIST dataset. The ability to create lots of new synthetic data - of many different data types such as images and text - is very powerful and is just beginning to be explored in earnest.
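The RBMs and GANs from Chapters 10 through 12 are framework-heavy; as a minimal illustration of the same generate-to-augment idea, a Gaussian mixture model (a much simpler generative model, used here purely as a stand-in) can be fit on real points and then sampled for synthetic ones:

```python
# Fit a generative model of the data distribution, then draw brand-new
# synthetic samples from it to augment the original dataset.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

X_real, _ = make_blobs(n_samples=300, centers=3, random_state=0)

# Learn the distribution underlying the real points...
gmm = GaussianMixture(n_components=3, random_state=0).fit(X_real)

# ...then sample 200 synthetic points from that learned distribution.
X_synthetic, _ = gmm.sample(200)
X_augmented = np.vstack([X_real, X_synthetic])
print(X_augmented.shape)
```

The same pattern - learn the distribution, then sample from it - is what the deep generative models in the book do, just with far richer distributions such as images of digits.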

The Future of Unsupervised Learning

We are still very early in the current AI wave. Of course there have been major successes to date, but a lot of the AI world is built on hype and promise. There is a lot of potential that is yet to be realized.

The successes to date have come mostly in narrowly defined tasks led by supervised learning. As the current wave of AI matures, the hope is that we move from narrow AI tasks (such as image classification, machine translation, speech recognition, and question-answering bots) to more ambitious strong AI: chatbots that can understand meaning in human language and converse naturally the way a human would, robots that make sense of the physical world around them and operate in it without relying heavily on labeled data, self-driving cars that develop superhuman driving performance, and AI that can exhibit human-level reasoning and creativity.

Many regard unsupervised learning as the key to developing strong AI. Otherwise, AI will be shackled by the limits of how much labeled data we have.

One thing humans excel at - from birth - is learning to perform tasks without requiring many examples. For instance, a toddler can differentiate a cat from a dog after seeing just a handful of examples. Today’s AI needs many more examples/labels.

Ideally, the AI could learn to separate images of different classes (i.e., a cat vs. a dog) with as few labels as possible - perhaps as few as one, or even none. Performing this type of one-shot or zero-shot learning requires more progress in the realm of unsupervised learning.

Also, most AI today is not creative. It is merely optimizing pattern recognition based on labels it has trained on. To build AI that is intuitive and creative, researchers will need to build AI that can make sense of lots of unlabeled data to find patterns that even humans would not have previously found.

Fortunately, there are some promising signs that AI is advancing, ever so gradually, toward stronger AI.

Google DeepMind’s AlphaGo software is a case in point. The first version of AlphaGo to beat a human professional Go player - in October 2015 - relied on data from past Go games played by humans and machine learning methods that include reinforcement learning (such as the ability to look many, many moves ahead and determine which move improves the odds of winning most significantly).

This version of AlphaGo was very impressive, beating one of the world’s best Go players, Lee Sedol, in a high-profile best-of-five series in Seoul, South Korea, in March 2016. But the latest version of AlphaGo is even more remarkable.

The original AlphaGo relied on data and human expertise. The latest version of AlphaGo, called AlphaGo Zero, learned how to play and win Go from scratch, purely through self play.2 In other words, AlphaGo Zero did not rely on any human knowledge and achieved superhuman performance, beating the previous AlphaGo version one hundred to zero.3

Starting from knowing nothing about Go, AlphaGo Zero accumulated thousands of years of human knowledge in Go play in a matter of days. But, then it progressed further, beyond the realm of human-level proficiency. AlphaGo Zero discovered new knowledge and developed new unconventional winning strategies.

In other words, AlphaGo Zero exercised creativity.

If AI continues to advance, fueled by the ability to learn from little to no prior knowledge (i.e., little to no labeled data), we will be able to develop AI that is capable of creativity, reasoning, and complex decision-making - areas that until now have been the sole domain of humans.4

Final Words

We have just scratched the surface of unsupervised learning and its potential, but I hope you have a better appreciation of what unsupervised learning is capable of and how it can be applied to the machine learning systems you design.

At the very least, you should have a conceptual understanding of and hands-on experience using unsupervised learning to uncover hidden patterns, gain deeper business insight, detect anomalies, cluster groups based on similarity, perform automatic feature extraction, and generate synthetic datasets from unlabeled datasets.

The future of AI is full of promise. Go build it.

1 According to PitchBook, venture capital investors invested over $10.8 billion in AI and machine learning companies in 2017, up from $500 million in 2010 and nearly double the $5.7 billion invested in 2016.

2 Here is more on AlphaGo Zero.

3 Please view the paper in Nature for more.

4 OpenAI has also had some notable successes in applying unsupervised learning for sentiment analysis and language understanding, both of which are essential building blocks for strong AI.