A.2 Artificial Intelligence
Artificial Intelligence research began shortly after World War II [22]. Early work was based on knowledge of the structure of the brain, propositional logic, and Turing's theory of computation. Warren McCulloch and Walter Pitts created a mathematical formulation for neural networks based on threshold logic. Neural network research subsequently split into two approaches: one centered on biological processes in the brain, the other on the application of neural networks to artificial intelligence. It was demonstrated that any computable function could be implemented through a network of such neurons and that a neural net could learn. In 1948, Wiener's book, "Cybernetics," was published, which described concepts in control, communications, and statistical signal processing. The next major step in neural networks was Hebb's 1949 book, "The Organization of Behavior," which connected neural connectivity with learning in the brain. His book became a source of inspiration for work on learning and adaptive systems. Marvin Minsky and Dean Edmonds built the first neural computer in 1950.
In 1956, Allen Newell and Herbert Simon designed a reasoning program, the Logic Theorist (LT), which worked non-numerically. The first version was hand simulated using index cards. It could prove mathematical theorems and even improve on human derivations. It solved 38 of the first 52 theorems in Principia Mathematica. LT employed a search tree with heuristics to limit the search. LT was implemented on a computer using IPL, a programming language that was a precursor to Lisp, which is discussed below.
Blocks World was one of the first attempts to demonstrate general computer reasoning. The blocks world was a micro world. A set of blocks would sit on a table, some sitting on other blocks. The AI systems could rearrange blocks in certain ways. Blocks under other blocks could not be moved until the block on top was moved. This is not unlike the Towers of Hanoi problem. Blocks World was a spectacular advancement as it showed that a machine could reason at least in a limited environment. Blocks World was an early example of the use of machine vision. The computer had to process an image of Blocks World and determine what was a block and where they were located.
Blocks World and Newell and Simon's LT were followed by the General Problem Solver (GPS). It was designed to imitate human problem-solving methods. Within its limited class of puzzles, it could solve them much as a human would. Although GPS solved simple problems such as the Towers of Hanoi (Figure A.1), it could not solve real-world problems because the search was lost in a combinatorial explosion that represented the enumeration of all choices in a vast decision space.
Figure A.1 Towers of Hanoi. The disks must be moved from the first peg to the last without ever putting a bigger diameter disk on top of a smaller diameter disk.
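The Towers of Hanoi puzzle that GPS could solve has a compact recursive solution. A minimal sketch in Python (the function name and move format are my own, for illustration):

```python
def hanoi(n, source, target, spare, moves):
    """Move n disks from source to target, never placing a larger disk on a smaller one."""
    if n == 0:
        return
    hanoi(n - 1, source, spare, target, moves)   # clear the n-1 smaller disks out of the way
    moves.append((source, target))               # move the largest free disk
    hanoi(n - 1, spare, target, source, moves)   # restack the smaller disks on top of it

moves = []
hanoi(3, "A", "C", "B", moves)
print(len(moves))  # 2^3 - 1 = 7 moves
```

The recursion mirrors the constraint in the figure: a disk can only move when everything above it has been moved first, so an n-disk puzzle always takes 2^n - 1 moves.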
In 1959, Herbert Gelernter wrote the Geometry Theorem Prover, which could prove theorems that were quite tricky. The first game-playing programs were written at this time. In 1958, John McCarthy invented the language Lisp (LISt Processing), which was to become the main AI language. It lives on today as Scheme and Common Lisp. Lisp was introduced only one year after FORTRAN. A typical Lisp expression is:

(defun sqrt-iter (guess x)
  (if (good-enough-p guess x)
      guess
      (sqrt-iter (improve guess x) x)))
This computes a square root through recursion. Eventually, dedicated Lisp machines were built, but they went out of favor when general purpose processors became faster.
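The helper functions good-enough-p and improve are not defined in the excerpt above; a plausible completion uses Newton's method. The same recursion sketched in Python (the tolerance and the update rule are assumptions, not from the text):

```python
def sqrt_iter(guess, x):
    """Recursive square root in the style of the Lisp sqrt-iter example."""
    if good_enough(guess, x):
        return guess
    return sqrt_iter(improve(guess, x), x)

def good_enough(guess, x, tol=1e-9):
    # Stop when guess^2 is close enough to x.
    return abs(guess * guess - x) < tol

def improve(guess, x):
    # Newton's method update: average the guess with x/guess.
    return (guess + x / guess) / 2.0

print(sqrt_iter(1.0, 2.0))  # close to 1.41421356...
```

Newton's method roughly doubles the number of correct digits per call, so the recursion terminates after only a handful of steps.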
Time sharing was invented at MIT to facilitate AI research. Professor McCarthy proposed a hypothetical computer program, Advice Taker, a complete AI system that could embody general world knowledge. It would have used a formal language such as predicate calculus. For example, it could come up with a route to the airport from simple rules. Marvin Minsky arrived at MIT in 1958 and began working on micro-worlds. Within these limited domains AI could solve problems, such as closed-form integrals in calculus. Minsky wrote the book "Perceptrons" (with Seymour Papert), which was fundamental in the analysis of artificial neural networks. The book contributed to the movement toward symbolic processing in AI. The book noted that a single-layer perceptron could not implement some logical functions, such as exclusive-or, and erroneously implied that multi-layer networks would have the same issue. It was later found that three-layer networks can implement such functions.
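The exclusive-or fix with a hidden layer is concrete enough to write down. A minimal sketch using threshold units with hand-chosen weights (the particular weights are illustrative, not from the book):

```python
def step(z):
    # McCulloch-Pitts style threshold unit.
    return 1 if z > 0 else 0

def xor_net(x1, x2):
    # Hidden layer: an OR unit and an AND unit.
    h_or = step(x1 + x2 - 0.5)
    h_and = step(x1 + x2 - 1.5)
    # Output unit fires for OR but not AND, i.e., exclusive-or.
    return step(h_or - h_and - 0.5)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_net(a, b))
```

No single threshold unit can compute this function, because the four input points are not linearly separable; the hidden layer makes them separable.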
More challenging problems were tried in the 1960s, and the limitations of the AI techniques became evident. The first language translation programs had mixed results. Trying to solve problems by working through massive numbers of possibilities (such as in chess) ran into computational limits. Human chess play takes many forms. Some involve memorization of patterns, including openings, where the board positions are well-defined, and endgames, where the number of pieces is relatively low. Positional play involves seeing patterns on the board through the human brain's ability to process patterns. A good positional player will arrange her pieces on the board so that the other player's options are restricted. Localized pattern recognition is seen in mate-in-n problems. Human approaches are not really used in computer chess. Computer chess programs have become very capable primarily because of faster processors and the ability to store openings and endgames. Multi-layer neural networks were discovered in the 1960s, but not seriously studied until the 1980s.
In the 1970s, self-organizing maps using competitive learning were introduced [12]. A resurgence in neural networks happened in the 1980s. Knowledge-based systems were also introduced in the 1980s. From Jackson [14],
An expert system is a computer program that represents and reasons with knowledge of some specialized subject with a view to solving problems or giving advice.
This included expert systems that could store massive amounts of domain knowledge. These could also incorporate uncertainty in their processing. Expert systems have been applied to medical diagnosis and other problems. Unlike earlier AI techniques, expert systems could deal with problems of realistic complexity and attain high performance. They can also explain their reasoning, a feature that is critical in operational use. Sometimes these are called knowledge-based systems. A well-known open source expert system is CLIPS (C Language Integrated Production System).
Back propagation for neural networks was re-invented in the 1980s leading to renewed progress in this field. Studies began both of human neural networks (i.e., the human brain) and the creation of algorithms for effective computational neural networks. This eventually led to deep learning networks in machine learning applications.
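Back propagation itself fits in a few lines. A minimal sketch that trains a two-input, two-hidden-unit network on exclusive-or by gradient descent (the architecture, learning rate, and epoch count are illustrative choices, and convergence to an exact solution is not guaranteed for every initialization):

```python
import math
import random

random.seed(0)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# 2-input, 2-hidden-unit, 1-output network; squared-error loss.
W1 = [[random.uniform(-1, 1), random.uniform(-1, 1)] for _ in range(2)]
b1 = [0.0, 0.0]
W2 = [random.uniform(-1, 1), random.uniform(-1, 1)]
b2 = 0.0

data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]  # exclusive-or
lr = 0.5

def forward(x):
    h = [sigmoid(W1[j][0] * x[0] + W1[j][1] * x[1] + b1[j]) for j in range(2)]
    y = sigmoid(W2[0] * h[0] + W2[1] * h[1] + b2)
    return h, y

def loss():
    return sum((forward(x)[1] - t) ** 2 for x, t in data)

loss_before = loss()
for _ in range(5000):
    for x, t in data:
        h, y = forward(x)
        dy = (y - t) * y * (1 - y)                               # output-layer error term
        dh = [dy * W2[j] * h[j] * (1 - h[j]) for j in range(2)]  # error propagated to hidden layer
        for j in range(2):
            W2[j] -= lr * dy * h[j]
            W1[j][0] -= lr * dh[j] * x[0]
            W1[j][1] -= lr * dh[j] * x[1]
            b1[j] -= lr * dh[j]
        b2 -= lr * dy
loss_after = loss()
```

The key idea is the second line inside the loop: the output error is pushed backward through the output weights to assign blame to the hidden units, which single-layer learning rules could not do.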
Advances were made in the 1980s as AI researchers began to apply rigorous mathematical and statistical analysis to develop algorithms. Hidden Markov Models were applied to speech. A Hidden Markov Model is a model with unobserved (i.e., hidden) states. Combined with massive databases, they have resulted in vastly more robust speech recognition. Machine translation has also improved. Data mining, the first form of machine learning as it is known today, was developed. Chess programs improved initially through the use of specialized computers, such as IBM’s Deep Blue. With the increase in processing power, powerful chess programs that are better than most human players are now available on personal computers.
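The core Hidden Markov Model computation mentioned above, the probability of an observation sequence given the model, is performed by the forward algorithm. A sketch with illustrative two-state numbers (none of these values come from a real speech model):

```python
states = [0, 1]          # two hidden states
pi = [0.6, 0.4]          # initial state probabilities
A = [[0.7, 0.3],         # state transition probabilities A[from][to]
     [0.4, 0.6]]
B = [[0.9, 0.1],         # observation probabilities B[state][observation]
     [0.2, 0.8]]

def forward(obs):
    # alpha[s] = P(observations so far, current state = s)
    alpha = [pi[s] * B[s][obs[0]] for s in states]
    for o in obs[1:]:
        alpha = [sum(alpha[sp] * A[sp][s] for sp in states) * B[s][o]
                 for s in states]
    return sum(alpha)

print(forward([0, 1, 0]))  # about 0.109
```

The hidden state is never observed directly; the algorithm sums over all possible state paths in time linear in the sequence length, which is what makes HMM-based speech recognition tractable.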
The Bayesian network formalism was invented to allow for the rigorous application of uncertainty in reasoning problems. In the late 1990s, intelligent agents were introduced. Search engines, bots, and web site aggregators are examples of intelligent agents used on the Internet. Figure A.2 gives a timeline of selected events in the history of autonomous systems.
Figure A.2 Artificial intelligence timeline.
Today, the state of the art in AI includes autonomous cars, speech recognition, planning and scheduling, game playing, robotics, and machine translation. All of these are based on AI technology and are in constant use today. You can take a PDF document and translate it into almost any language using Google Translate. The translations are not perfect; one certainly would not use them to translate literature.
Recent advances in AI include IBM’s Watson. Watson is a question-answering computing system with advanced natural language processing and information retrieval from massive databases. It defeated champion Jeopardy players in 2011. It is currently being applied to medical problems and many other complex problems.
A.3 Learning Control
Adaptive or intelligent control was motivated in the 1950s [2] by the problems of aircraft control. Control systems of that time worked very well for linear systems. Aircraft dynamics could be linearized about a particular speed. For example, a simple equation for total velocity in level flight is:

m \dot{v} = T - \frac{1}{2} \rho C_D S v^2

This says the mass m times the change in velocity per time, \dot{v}, equals the thrust T, the force from the aircraft engine, minus the drag. C_D is the aerodynamic drag coefficient and S is the wetted area (i.e., the area that causes drag, such as the wings and fuselage). The thrust is used for control. This is a nonlinear equation in the velocity v because of the v^2 term. We can linearize it around a particular velocity v_s, so that v = v_\delta + v_s and T = T_s + T_\delta, where T_s = \frac{1}{2} \rho C_D S v_s^2 is the trim thrust that balances drag at v_s, and get:

m \dot{v}_\delta = T_\delta - \rho C_D S v_s v_\delta

This equation is linear in v_\delta. We can control velocity with a simple thrust control law:

T_\delta = -K v_\delta

Substituting gives the closed-loop dynamics:

\dot{v}_\delta = -\left( \frac{K + \rho C_D S v_s}{m} \right) v_\delta = -c \, v_\delta

c is the damping coefficient. \rho is the atmospheric density and is a nonlinear function of altitude. For the linear control to work, the control must be adaptive. If we want to guarantee a certain damping value c, which is the quantity in parentheses, we need to know \rho, C_D, S, and v_s. This approach leads to a gain scheduling control system, where we measure the flight condition (altitude and velocity) and schedule the linear gains based on where the aircraft is in the gain schedule.
In the 1960s, progress was made on adaptive control. State space theory was developed, which made it easier to design multi-loop control systems, that is, control systems that controlled more than one state at a time with different control loops. The general state space model is:

\dot{x} = A x + B u
y = C x + D u

where A, B, C, and D are matrices, x is the state, y is the measurement, and u is the control input. A state is a quantity that changes with time and is needed to define what the system is doing. For a point mass that can only move in one direction, the position and velocity make up the two states. If A completely models the system and y contains all of the information about the state vector x, then the system can be stabilized with feedback. Full state feedback would be u = -Kx, where K can be computed to have guaranteed phase and gain margins (that is, tolerance to delays and tolerance to amplification errors). This was a major advance in control theory. Before this, multi-loop systems had to be designed separately and combined very carefully.
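Full state feedback is easy to demonstrate on a double integrator, the point mass with position and velocity states mentioned above. The gains below are hand-picked for illustration to place both closed-loop poles at s = -1:

```python
# Double integrator: xdot = [[0, 1], [0, 0]] x + [[0], [1]] u, with u = -K x.
# K = [k1, k2] = [1, 2] gives the closed-loop characteristic equation
# s^2 + 2s + 1 = (s + 1)^2, i.e., both poles at s = -1 (critically damped).
k1, k2 = 1.0, 2.0

pos, vel, dt = 1.0, 0.0, 0.01   # start 1 m from the target, at rest
for _ in range(2000):            # simulate 20 seconds with Euler integration
    u = -k1 * pos - k2 * vel     # full state feedback
    pos += dt * vel
    vel += dt * u
# pos and vel have decayed to near zero
```

Designing both loops at once through a single gain matrix K is exactly the advantage the text describes; with classical methods the position and velocity loops would have been closed one at a time.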
Learning control and adaptive control were found to be realizable from a common framework. The Kalman Filter, also known as linear quadratic estimation, was introduced.
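In its simplest scalar form the Kalman Filter is only a few lines. A sketch that estimates a constant from noisy measurements (the model, noise variance, and initial values are all illustrative):

```python
import random

random.seed(1)

# Scalar Kalman filter: x_{k+1} = x_k (no dynamics), z_k = x_k + noise with variance R.
x_true = 5.0
R = 1.0                  # measurement noise variance
x_hat, P = 0.0, 100.0    # initial estimate and its variance (deliberately poor)

for _ in range(200):
    z = x_true + random.gauss(0.0, R ** 0.5)
    K = P / (P + R)              # Kalman gain: how much to trust the new measurement
    x_hat += K * (z - x_hat)     # correct the estimate with the innovation
    P *= (1.0 - K)               # the estimate variance shrinks after each update
```

Early on, P is large and the gain is near 1, so measurements dominate; as P shrinks, the filter weights its accumulated estimate more and new measurements less, which is the "linear quadratic estimation" behavior in miniature.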
Spacecraft required autonomous control, since they were often out of contact with the ground or the time delays were too long for effective ground supervision. The first digital autopilots were on the Apollo spacecraft, which first flew in 1968 on Apollo 7. Don Eyles' book [9] gives the history of the Lunar Module digital autopilot. Geosynchronous communications satellites were automated to the point where one operator could fly a dozen satellites.
Advances were made in system identification, the process of determining the parameters of a system (such as the drag coefficient above). Adaptive control was applied to real problems. Autopilots have progressed from fairly simple mechanical pilot augmentation systems to sophisticated control systems that can take off, cruise, and land under computer control.
In the 1970s, proofs of adaptive control stability appeared. Stability of linear control systems was well established, but adaptive systems are inherently nonlinear. Universally stabilizing controllers were studied. Progress was made in the robustness of adaptive control. Robustness is the ability of a system to deal with changes in parameters that were assumed to be known, sometimes due to failures in the system. It was in the 1970s that digital control became widespread, replacing traditional analog circuits composed of transistors and operational amplifiers.
Adaptive controllers started to appear commercially in the 1980s. Most modern single-loop controllers have some form of adaptation. Adaptive techniques were also found to be useful for tuning controllers.
More recently, there has been a melding of artificial intelligence and control. Expert systems have been proposed that determine what algorithms (not just parameters) to use depending on the environment. For example, during a winged reentry of a glider, the control system would use one system in orbit, a second at high altitudes, a third during high Mach (Mach is the ratio of the velocity to the speed of sound) flight, and a fourth at low Mach numbers and during landing. Autonomous landing itself has a long history: an F3D Skyknight used the Automatic Carrier Landing System on 12 August 1957, the first shipboard test of a system designed to land aircraft on board autonomously. Naira Hovakimyan (University of Illinois Urbana-Champaign) and Nhan Nguyen (NASA) were pioneers in adaptive control for aircraft. Adaptive control was demonstrated on sub-scale F-18s; the controller kept the aircraft flying and landed it after most of one wing was lost!
A.4 Machine Learning
Machine learning started as a branch of artificial intelligence. However, many techniques are much older. Thomas Bayes created Bayes theorem in 1763. Bayes theorem is:

P(A_i \mid B) = \frac{P(B \mid A_i) \, P(A_i)}{P(B)}

which is just the probability of A_i given B. This assumes that P(B) \neq 0. In the Bayesian interpretation, the theorem introduces the effect of evidence on belief. One technique, regression, was discovered by Legendre in 1805 and Gauss in 1809.
As noted in the section on artificial intelligence, modern machine learning began with data mining, which is the process of getting new insights from data. In the early days of AI, there was considerable work on machines learning from data. However, this lost favor and in the 1990s it was reinvented as the field of machine learning. The goal was to solve practical problems of pattern recognition using statistics. This was greatly aided by the massive amounts of data available online along with the tremendous increase in processing power available to developers. Machine learning is closely related to statistics.
In the early 1990s, Vapnik and co-workers invented a computationally powerful class of supervised learning networks known as support vector machines (SVMs). These networks could solve problems of pattern recognition, regression, and other machine learning problems.
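The hinge-loss training at the heart of a linear SVM can be sketched with sub-gradient descent. This is a Pegasos-style toy, not Vapnik's original quadratic-programming formulation; the data, gains, and regularization value are all illustrative:

```python
import random

random.seed(2)

# Two linearly separable blobs: positives near (1.5, 1.5), negatives near (-1.5, -1.5).
data = ([((1.0 + random.random(), 1.0 + random.random()), 1) for _ in range(20)]
        + [((-1.0 - random.random(), -1.0 - random.random()), -1) for _ in range(20)])

w = [0.0, 0.0]
b = 0.0
lam, lr = 0.01, 0.1   # regularization weight and learning rate

for _ in range(200):
    for (x1, x2), y in data:
        margin = y * (w[0] * x1 + w[1] * x2 + b)
        if margin < 1:
            # Inside the margin: sub-gradient of lam/2*||w||^2 + max(0, 1 - margin).
            w[0] += lr * (y * x1 - lam * w[0])
            w[1] += lr * (y * x2 - lam * w[1])
            b += lr * y
        else:
            # Outside the margin: only the regularizer pulls on w.
            w[0] -= lr * lam * w[0]
            w[1] -= lr * lam * w[1]

correct = sum(1 for (x1, x2), y in data
              if y * (w[0] * x1 + w[1] * x2 + b) > 0)
```

The regularizer pushes for the widest separating margin while the hinge term penalizes points inside it; that maximum-margin objective is what distinguished SVMs from earlier perceptron-style learning rules.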
A growing application of machine learning is autonomous driving. Autonomous driving makes use of all aspects of autonomy, including controls, artificial intelligence, and machine learning. Machine vision is used in most systems because cameras are inexpensive and provide more information than lidar, radar, or sonar (which are also useful). It isn't possible to build really safe autonomous driving systems without learning through experience. Thus, designers of such systems put their cars on the road and collect experiences that are used to fine-tune the system.
Other applications include high-speed stock trading and algorithms to guide investments. These are under rapid development and are now available to the consumer. Data mining and machine learning are in use to predict events, both human and natural. Internet searches have been used to track disease outbreaks. Wherever a large amount of data exists, and the internet makes gathering massive data sets easy, you can be sure that machine learning techniques are being applied to mine it.