The Awakening of Artificial Intelligence (AI)
Faster, cheaper and more powerful computers have helped renew progress in AI. Progress also came with Big Data: the exponential growth and availability of data, and a growing understanding of the potential value of such data (images, text, mapping data, and so on). With these computing breakthroughs, neural networks were revisited and could be made far larger.
The ImageNet Large Scale Visual Recognition Challenge (ILSVRC), set up to encourage computer-vision breakthroughs, is the world’s top computer vision contest. To compare models, the challenge measures how often a model fails to include the correct answer among its five most confident guesses (the top-5 error rate). ILSVRC 2012 brought a small research group led by Geoffrey Hinton at the University of Toronto to everyone’s attention. The group had parallelised its convolutional neural network, AlexNet, on Nvidia GPUs, and won the contest with a top-5 error rate of 16.4% on the given data, while the error rate of the second-best system was around 26%. ILSVRC 2012 marks the turning point for neural network research, heralding the abandonment of feature engineering and the adoption of feature learning in the form of deep learning. Since then, the community has made remarkable progress, and a high point came in 2015 when Microsoft achieved a top-5 error rate of 3.57%.
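The top-5 error rate described above can be sketched in a few lines of Python. This is a minimal illustration, not the official ILSVRC evaluation code: the function name `top5_error_rate`, the class names and the confidence values are all assumptions made up for the example.

```python
def top5_error_rate(scores_per_image, true_labels, k=5):
    """Fraction of images whose true label is absent from the k most confident guesses."""
    misses = 0
    for scores, truth in zip(scores_per_image, true_labels):
        # Rank class names by confidence, highest first, and keep the top k.
        top_k = sorted(scores, key=scores.get, reverse=True)[:k]
        if truth not in top_k:
            misses += 1
    return misses / len(true_labels)


# Hypothetical confidences for two images over seven classes (illustrative only).
predictions = [
    {"cat": 0.40, "dog": 0.25, "fox": 0.15, "wolf": 0.10, "lynx": 0.05, "car": 0.03, "boat": 0.02},
    {"car": 0.50, "boat": 0.20, "cat": 0.10, "dog": 0.08, "fox": 0.07, "wolf": 0.03, "lynx": 0.02},
]
labels = ["lynx", "lynx"]

error = top5_error_rate(predictions, labels)  # second image misses, so 1 of 2 images fails
```

Here the first image counts as correct even though "lynx" is only the fifth guess, which is why the top-5 figure is more forgiving than a plain top-1 error rate.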
Deep Learning & Neural Networks
Deep learning is essentially a rebranded name for a family of deep neural networks: complex mathematical systems that can learn tasks by analysing vast amounts of data. Deep learning is thus a class of learning procedures based on the neural network model. It has facilitated image recognition, object detection, video labelling, and activity recognition, and is making significant progress in other areas of perception, such as audio, speech, language translation and natural language processing (NLP). According to Schmidhuber (2015), Rina Dechter (1986) introduced the expression ‘Deep Learning’ to the machine learning community at the conference of the Association for the Advancement of Artificial Intelligence (AAAI). Later, it became widely accepted when Igor Aizenberg et al. (2000) introduced it to artificial neural networks (ANNs). (see
Deep learning can circumvent the challenges of feature engineering that are critical for symbolic machine learning. The remarkable thing about deep learning is that no human has to hand-code the features, because the models are capable of learning them automatically from the data. Programmers therefore only need to give the computer a learning algorithm, expose it to terabytes of input data to train it, and then let it work out for itself how to recognise the desired objects. In this sense, such computers can now learn by themselves. This makes deep learning an extremely powerful tool for modern machine learning, and deep learning methods now outperform traditional symbolic machine learning approaches on many perception benchmarks.
Further evidence that deep learning is on the rise is the amount of capital being invested, the number of people who are choosing it as their area of study, and the number of leading technology companies that are making AI the core of their strategic plans. It is revolutionising many areas of machine perception, with the potential to impact people’s everyday experiences. Some even believe that AI could be used to mimic human common sense someday.
Great Achievements in Deep Learning
One of the landmark achievements of deep learning is Google DeepMind’s AlphaGo beating the legendary Go world champion Lee Sedol four games to one in 2016. The game of Go was invented in China more than 2,500 years ago and is believed to be the oldest board game still played today. Its simple rules and deep strategies have intrigued everyone from emperors to peasants for generations. The goal is to gain more territory than the opponent, but the game is very complex, with more possible board configurations than there are atoms in the observable universe. The AlphaGo computer program uses deep neural networks, reinforcement learning and a Monte Carlo tree search to find its moves, based on knowledge previously learned by an artificial neural network through extensive training on both human and computer play.
Deep learning models can be trained with supervised learning, unsupervised learning, or reinforcement learning. Unsupervised learning is the ultimate goal for the future and reinforcement learning is gaining more ground, especially in gaming and robotics. But supervised learning is the champion today: it works by showing the network many examples with labels saying what they are, so that the network learns to classify future examples that arrive without labels.
Deep neural networks generally work as a two-stage process (see figure below).
- A neural network is trained with its parameters determined using labelled examples of inputs and desired output.
- The network is deployed to run inference, using its previously trained parameters, to classify, recognise, and process new inputs.
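The two stages above can be sketched with a single artificial neuron. This is a minimal sketch under stated assumptions, not how a production deep network is built: the function names `train` and `infer`, the toy AND-gate data, the learning rate and the number of epochs are all hypothetical choices made for the illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(examples, epochs=2000, learning_rate=0.5):
    """Stage 1: fit the weights and bias of one neuron to labelled examples."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for inputs, target in examples:
            output = sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)
            error = output - target  # gradient of the cross-entropy loss at the neuron's input
            weights = [w - learning_rate * error * x for w, x in zip(weights, inputs)]
            bias -= learning_rate * error
    return weights, bias


def infer(parameters, inputs):
    """Stage 2: apply the frozen parameters to an input, with no further learning."""
    weights, bias = parameters
    return sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)


# Toy labelled data (an assumption): the desired output is 1 only when both inputs are 1.
labelled = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]

parameters = train(labelled)             # training: parameters are determined once
confidence = infer(parameters, [1, 1])   # inference: parameters are only read, never changed
```

The point of the split is that `infer` never touches the labels or updates the weights, so the trained parameters can be shipped and reused on any number of new inputs.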
When it receives an input image, the network translates it into a hierarchy of features, and the neurons in each layer of the network are tuned to recognise certain patterns in those features. Low-level neurons recognise things like edges or basic shapes, then pass the data to the next layer. This layer of neurons does its own task, and passes the processed data on. Neurons in high-level layers can ‘see’ objects, say a cat or a dog. Each layer communicates forward with the one next to it, and as information travels through the network, feature extraction takes place automatically. At the end, the network comes up with an output: a prediction of what is in the image.
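The layer-by-layer flow described above can be sketched as a forward pass through two small layers. The weights here are hand-picked assumptions chosen to make the behaviour easy to follow; in a real network they would be learned, and there would be many more layers and neurons.

```python
def relu(values):
    """Keep positive activations and silence negative ones."""
    return [max(0.0, v) for v in values]


def dense(inputs, weights, biases):
    """One layer: every neuron takes a weighted sum of all inputs plus a bias."""
    return [sum(w * x for w, x in zip(row, inputs)) + b for row, b in zip(weights, biases)]


def forward(pixels, layers):
    """Pass the input through each layer in turn, keeping every intermediate result."""
    activations = [pixels]
    for weights, biases in layers:
        activations.append(relu(dense(activations[-1], weights, biases)))
    return activations


# Hand-picked illustrative weights (assumptions, not learned values).
# Input: a 2x2 image flattened as [top-left, top-right, bottom-left, bottom-right].
layers = [
    # Layer 1: two low-level detectors, "top row brighter than bottom row"
    # and "left column brighter than right column".
    ([[1, 1, -1, -1], [1, -1, 1, -1]], [0, 0]),
    # Layer 2: one higher-level neuron that sums both detectors
    # into a rough "bright towards the top-left" score.
    ([[1, 1]], [-1]),
]

corner_image = [1, 0, 0, 0]  # only the top-left pixel is lit
activations = forward(corner_image, layers)
```

For `corner_image` both low-level detectors respond and the second layer produces a positive score, whereas an image lit only in the bottom-right corner leaves every neuron silent.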
The Wrong Way Road Sign
Return to the Wrong Way sign example in Part I: the image is split into a number of tiles that are fed into the first layer of the neural network. The neurons examine each tile’s attributes: for the Wrong Way sign, its rectangular shape, red colour, eight letters, and its size. Each neuron assigns a weighting to its input, where the weight indicates how correct or incorrect that input is relative to the task being performed. As data are passed forwards, the neural network’s task is to predict with some probability, based on the total of the weightings, whether the sign is Wrong Way or not. Perhaps the network is 90% confident the image is a Wrong Way sign, 7% confident it is a Bicycle Wrong Way sign, 2% confident it is a Danish flag on a flagpole, and so on.
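One common way to turn a network's final totals into confidence percentages like those above is the softmax function. The text does not say which function is used, so softmax is an assumption here, and the raw scores below are hypothetical values picked so that the result lands near the 90%, 7% and 2% in the example.

```python
import math


def softmax(scores):
    """Turn raw class scores into confidences that are positive and sum to 1."""
    peak = max(scores.values())  # subtracting the peak avoids overflow in exp()
    exps = {name: math.exp(s - peak) for name, s in scores.items()}
    total = sum(exps.values())
    return {name: e / total for name, e in exps.items()}


# Hypothetical raw scores from the final layer (illustrative values only).
raw_scores = {
    "Wrong Way": 4.5,
    "Bicycle Wrong Way": 2.0,
    "Danish flag": 0.7,
    "Other": 0.0,
}

confidences = softmax(raw_scores)
best_guess = max(confidences, key=confidences.get)

for name, value in confidences.items():
    print(f"{name}: {value:.1%}")
```

Because the confidences always sum to 1, raising the score for one class necessarily lowers the confidence assigned to the others.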
While the neural network is being trained, the odds are in favour of it predicting incorrectly. Therefore, it needs lots of training, using millions of images, until the weightings of the neuron inputs are tuned so precisely that it gets the answer right almost every time. At that point the neural network has taught itself what a Wrong Way sign looks like.
Image Classification, Localisation, Detection, and Segmentation
For humans, a quick glance at an image is sufficient to point out and describe an immense amount of detail about the visual scene. When we look at an image we are immediately able to characterise the objects and give each a label. These skills of quickly recognising patterns, generalising from prior knowledge, and adapting to different image environments are ones that computers do not easily share with us. However, the success of AlexNet in 2012 spurred research and great achievements in object localisation, detection and segmentation.
- Classification: The examples in the introductory image address image classification, which is the task of taking an input image and outputting its class (cat, dog, etc.) or a probability distribution over the classes that best describe the image. It works well when the image contains only one object.
- Localisation: Object localisation not only produces a class label but also a bounding box that describes where the object is in the image.
- Detection: In the task of object detection, localisation needs to be done on all of the objects in the picture, resulting in multiple bounding boxes and multiple class labels.
- Segmentation: Finally, we also have object segmentation where the task is to output a class label as well as an outline of every object in the input image.
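The four tasks above differ mainly in the shape of their output, which can be sketched as simple data structures. The class names, field names and pixel coordinates below are all hypothetical, chosen only to make the contrast visible; real systems use their own formats.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class BoundingBox:
    x_min: int
    y_min: int
    x_max: int
    y_max: int


@dataclass
class Classification:   # one label for the whole image
    label: str
    confidence: float


@dataclass
class Localisation:     # one label plus where the object is
    label: str
    box: BoundingBox


@dataclass
class Detection:        # many objects, each with a label and a box
    objects: List[Localisation]


@dataclass
class Segmentation:     # many objects, each with a label and an outline
    objects: List[Tuple[str, List[Tuple[int, int]]]]  # (label, outline as pixel coordinates)


# Hypothetical outputs for one image (all values are made up for illustration).
classification = Classification(label="dog", confidence=0.92)
localisation = Localisation(label="dog", box=BoundingBox(40, 60, 180, 220))
detection = Detection(objects=[
    localisation,
    Localisation(label="cat", box=BoundingBox(200, 80, 300, 210)),
])
segmentation = Segmentation(objects=[
    ("dog", [(40, 60), (180, 60), (180, 220), (40, 220)]),
    ("cat", [(200, 80), (300, 80), (300, 210), (200, 210)]),
])
```

Read from top to bottom, each structure carries strictly more information than the one before it: a label, then a label with a box, then several of those, then outlines in place of boxes.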
We end by referring the reader to Karpathy and Li (2015), who combine Convolutional Neural Networks (CNNs) and bidirectional Recurrent Neural Networks (RNNs) to generate natural language descriptions of different image regions. Basically, their model is able to take in an image, and output a concept, as demonstrated in the previous image.
References: Part I and Part II
Copeland M 2016 What’s the Difference Between Artificial Intelligence, Machine Learning, and Deep Learning? https://blogs.nvidia.com/blog/2016/07/29/whats-difference-artificial-intelligence-machine-learning-deep-learning-ai/. Accessed April 23, 2017.
Dechter R 1986 Learning while searching in constraint-satisfaction-problems: AAAI-86 Proceedings 178-183.
Dorrington K P and C A Link 2004 Genetic algorithm/neural-network approach to seismic attribute selection for well-log prediction: Geophysics 69 212-221.
Hof R D 2017 Deep Learning: MIT Technology Review https://www.technologyreview.com/s/513696/deep-learning/. Accessed May 4, 2017.
Karpathy A and F-F Li 2015 Deep Visual-Semantic Alignments for Generating Image Descriptions: https://arxiv.org/pdf/1412.2306v2.pdf
Krizhevsky A, I Sutskever and G E Hinton 2012 ImageNet Classification with Deep Convolutional Neural Networks: https://pdfs.semanticscholar.org/8abe/0abc1a549504f4002b3e66b5f821de820abb.pdf?_ga=1.22625915.1555563582.1493141855
Roden R and D Sacrey 2016 Seismic Interpretation with Machine Learning: GeoExpro 13 (6).
Schmidhuber J 2015 Critique of Paper by "Deep Learning Conspiracy" (Nature 521 p 436): https://plus.google.com/100849856540000067209/posts/9BDtGwCDL7D.
Silver D et al 2016 Mastering the game of Go with deep neural networks and tree search: Nature 529 484-489.