Algorithms for Walking, Running, Swimming, Flying, and Manipulation

© Russ Tedrake, 2016

How to cite these notes

**Note:** These are working notes used for a course taught at MIT and a free online course on MITx. They will be updated throughout the Spring 2016 semester.

This book is about building robots that move with speed, efficiency, and grace. I believe that this can only be achieve through a tight coupling between mechanical design, passive dynamics, and nonlinear control synthesis. Therefore, these notes contain selected material from dynamical systems theory, as well as linear and nonlinear control.

These notes also reflect a deep belief in computational algorithms playing an essential role in finding and optimizing solutions to complex dynamics and control problems. Algorithms play an increasingly central role in modern control theory; these days even rigorous mathematicians consider finding convexity in a problem (therefore making it amenable to an efficient computational solution) almost tantamount to an analytical result. Therefore, the notes necessarily also cover selected material from optimization theory, motion planning, and machine learning.

Although the material in the book comes from many sources, the presentation is targeted very specifically at a handful of robotics problems. Concepts are introduced only when and if they can help progress the capabilities we are trying to develop. Many of the disciplines that I am drawing from are traditionally very rigorous, to the point where the basic ideas can be hard to penetrate for someone that is new to the field. I've made a conscious effort in these notes to keep a very informal, conversational tone even when introducing these rigorous topics, and to reference the most powerful theorems but only to prove them when that proof would add particular insights without distracting from the mainstream presentation. I hope that the result is a broad but reasonably self-contained and readable manuscript that will be of use to any enthusiastic roboticist.

The material in these notes is organized into a few main parts. "Model Systems" introduces a series of increasingly complex dynamical systems and overviews some of the relevant results from the literature for each system. "Nonlinear Planning and Control" introduces quite general computational algorithms for reasoning about those dynamical systems, with optimization theory playing a central role. Many of these algorithms treat the dynamical system as known and deterministic until the last chapters in this part which introduce stochasticity and robustness. "Estimation and Learning" follows this up with techniques from statistics and machine learning which capitalize on this viewpoint to introduce additional algorithms which can operate with less assumptions on knowing the model or having perfect sensors. The book closes with chapters on "Applications and Extensions" and an "Appendix".

The order of the chapters was chosen to make the book valuable as a reference. When teaching the course, however, I take a spiral trajectory through the material, introducing robot dynamics and control problems one at a time, and introducing only the techniques that are required to solve that particular problem.

All of the examples and algorithms in this book, plus many more, are
now available as a part of our open-source software project:

Robots today move far too conservatively, and accomplish only a fraction of the tasks and achieve a fraction of the performance that they are mechanically capable of. In many cases, we are still fundamentally limited by control technology which matured on rigid robotic arms in structured factory environments. The study of underactuated robotics focuses on building control systems which use the natural dynamics of the machines in an attempt to achieve extraordinary performance in terms of speed, efficiency, or robustness.

Let's start with some examples, and some videos.

The world of robotics changed when, in late 1996, Honda Motor Co. announced that they had been working for nearly 15 years (behind closed doors) on walking robot technology. Their designs have continued to evolve, resulting in a humanoid robot they call ASIMO (Advanced Step in Innovative MObility). Honda's ASIMO is widely considered to be state of the art in walking robots, although there are now many robots with designs and performance very similar to ASIMO's. We will dedicate effort to understanding a few of the details of ASIMO when we discuss algorithms for walking... for now I just want you to become familiar with the look and feel of ASIMO's movements [watch the asimo video below now].

I hope that your first reaction is to be incredibly impressed with the
quality and versatility of ASIMO's movements. Now take a second look. Although
the motions are very smooth, there is something a little unnatural about ASIMO's
gait. It feels a little like an astronaut encumbered by a heavy space suit. In
fact this is a reasonable analogy... ASIMO is walking a bit like somebody that
is unfamiliar with his/her dynamics. Its control system is using high-gain
feedback, and therefore considerable joint torque, to cancel out the natural
dynamics of the machine and strictly follow a desired trajectory. This control
approach comes with a stiff penalty. ASIMO uses roughly 20 times the energy
(scaled) that a human uses to walk on the flat (measured by cost of
transport)

For contrast, let's now consider a very different type of walking robot,
called a passive dynamic walker (PDW). This "robot" has no motors, no
controllers, no computer, but is still capable of walking stably down a small
ramp, powered only by gravity [watch videos above now]. Most people will agree
that the passive gait of this machine is more natural than ASIMO's; it is
certainly more efficient. Passive walking machines have a long history - there
are patents for passively walking toys dating back to the mid 1800's. We will
discuss, in detail, what people know about the dynamics of these machines and
what has been accomplished experimentally. This most impressive passive dynamic
walker to date was built by Steve Collins in Andy Ruina's lab at
Cornell

Passive walkers demonstrate that the high-gain, dynamics-cancelling feedback approach taken on ASIMO is not a necessary one. In fact, the dynamics of walking is beautiful, and should be exploited - not cancelled out.

The story is surprisingly similar in a very different type of machine. Modern airplanes are extremely effective for steady-level flight in still air. Propellers produce thrust very efficiently, and today's cambered airfoils are highly optimized for speed and/or efficiency. It would be easy to convince yourself that we have nothing left to learn from birds. But, like ASIMO, these machines are mostly confined to a very conservative, low angle-of-attack flight regime where the aerodynamics on the wing are well understood. Birds routinely execute maneuvers outside of this flight envelope (for instance, when they are landing on a perch), and are considerably more effective than our best aircraft at exploiting energy (eg, wind) in the air.

As a consequence, birds are extremely efficient flying machines; some are
capable of migrating thousands of kilometers with incredibly small fuel
supplies. The wandering albatross can fly for hours, or even days, without
flapping its wings - these birds exploit the shear layer formed by the wind over
the ocean surface in a technique called dynamic soaring. Remarkably, the
metabolic cost of flying for these birds is indistinguishable from the baseline
metabolic cost

Birds are also incredibly maneuverable. The roll rate of a highly acrobatic
aircraft (e.g, the A-4 Skyhawk) is approximately 720 deg/sec

Although many impressive statistics about avian flight have been recorded,
our understanding is partially limited by experimental accessibility - it is
quite difficult to carefully measure birds (and the surrounding airflow) during
their most impressive maneuvers without disturbing them. The dynamics of a
swimming fish are closely related, and can be more convenient to study.
Dolphins have been known to swim gracefully through the waves alongside ships
moving at 20 knots

Classical control techniques for robotics are based on the idea that feedback can be used to override the dynamics of our machines. These examples suggest that to achieve outstanding dynamic performance (efficiency, agility, and robustness) from our robots, we need to understand how to design control system which take advantage of the dynamics, not cancel them out. That is the topic of this course.

Surprisingly, there are relatively few formal control ideas that consider "exploiting" the dynamics. In order to convince a control theorist to consider the dynamics (efficiency arguments are not enough), you have to do something drastic, like taking away his control authority - remove a motor, or enforce a torque-limit. These issues have created a formal class of systems, the underactuated systems, for which people have begun to more carefully consider the dynamics of their machines in the context of control.

According to Newton, the dynamics of mechanical systems are second order ($F = ma$). Their state is given by a vector of positions, $\bq$ (also known as the configuration vector), and a vector of velocities, $\dot{\bq}$, and (possibly) time. The general form for a second-order controllable dynamical system is: $$\ddot{\bq} = {\bf f}(\bq,\dot{\bq},\bu,t),$$ where $\bu$ is the control vector. As we will see, the dynamics for many of the robots that we care about turn out to be affine in commanded torque, so let's consider a slightly constrained form: \begin{equation}\ddot{\bq} = {\bf f}_1(\bq,\dot{\bq},t) + {\bf f}_2(\bq,\dot{\bq},t)\bu \label{eq:f1_plus_f2}.\end{equation}

Notice that whether or not a control system is underactuated may depend on
the state of the system or even on time, although for most systems (including all of the systems
in this book) underactuation is a global property of the system. We will refer to a system as
underactuated if it is underactuated in *any* state or time.

Consider the simple robot manipulator illustrated above. As described in Appendix A, the equations of motion for this system are quite simple to derive, and take the form of the standard "manipulator equations": $${\bf H}(\bq)\ddot\bq + {\bf C}(\bq,\dot\bq)\dot\bq + {\bf G}(\bq) = {\bf B}(\bq)\bu.$$ It is well known that the inertial matrix, ${\bf H}(\bq)$ is (always) uniformly symmetric and positive definite, and is therefore invertible. Putting the system into the form of equation \ref{eq:f1_plus_f2} yields: \begin{align*}\ddot{\bq} =& -{\bf H}^{-1}(\bq)\left[ {\bf C}(\bq,\dot\bq) \dot\bq + {\bf G}(\bq) \right]\\ &+ {\bf H}^{-1}(\bq) {\bf B}(\bq) \bu.\end{align*} Because ${\bf H}^{-1}(\bq)$ is always full rank, we find that a system described by the manipulator equations is fully-actuated if and only if ${\bf B}(\bq)$ is full row rank. For this particular example, $\bq = [\theta_1,\theta_2]^T$ and $\bu = [\tau_1,\tau_2]^T$, and ${\bf B}(\bq) = {\bf I}_{2 \times 2}$. The system is fully actuated.

I personally learn best when I can experiment and get some physical
intuition. The companion software for the
course should make it easy for you to see this system in action. To try
it, make sure you've installed `addpath_drake`

(possibly from your `startup.m`

) in the
`drake`

directory, then try the following lines (one at a time) from
the MATLAB command line:

```
% a platform independent way to cd into the example directory
cd(fullfile(getDrakePath,'examples','SimpleDoublePendulum'));
% first construct the "robot" or "plant" object
plant = DoublePendPlant; % check out the code in DoublePendPlant.m
% a visualizer will draw the robot
visualizer = DoublePendVisualizer(plant);
visualizer.inspector(); % (optional) simple gui to examine the joints
% simulate the robot from time=0 to time=5 seconds from random
% initial conditions taken from a Gaussian distribution (using randn).
% note that the first simulation you run will take longer because
% simulink needs to be started
trajectory = simulate(plant, [0 5], randn(4,1));
% play back the simulation using the cpu clock to get the timing right
visualizer.playback(trajectory);
% if you want to access the manipulator equations programmatically,
% you can use, e.g.:
[H,C_times_v,G,B] = plant.manipulatorEquations()
% which outputs using the abbreviations v for qdot, s1 for sin(q1), etc
```

Note that you don't have to write the dynamics yourself. The software supports a high-level description format for describing robots. We could have alternatively done:

```
cd(fullfile(getDrakePath,'examples','SimpleDoublePendulum'));
% construct the robot from a URDF high-level description file
plant = RigidBodyManipulator('SimpleDoublePendulum.urdf');
visualizer = plant.constructVisualizer();
% all of the other commands will work, e.g.:
trajectory = simulate(plant, [0 5], randn(4,1));
visualizer.playback(trajectory);
[H,C_times_v,G,B] = plant.manipulatorEquations()
```

Make sure you take a peek at the file `SimpleDoublePendulum.urdf`

.
Documentation for the input format is available on the drake wiki; it supports even quite complex
robots.

While the basic double pendulum is fully actuated, imagine the somewhat bizarre case that we have a motor to provide torque at the elbow, but no motor at the shoulder. In this case, we have $\bu = \tau_2$, and ${\bf B}(\bq) = [0,1]^T$. This system is clearly underactuated. While it may sound like a contrived example, it turns out that it is almost exactly the dynamics we will use to study as our simplest model of walking later in the class.

The matrix ${\bf f}_2$ in equation \ref{eq:f1_plus_f2} always has dim$[\bq]$ rows, and dim$[\bu]$ columns. Therefore, as in the example, one of the most common cases for underactuation, which trivially implies that ${\bf f}_2$ is not full row rank, is dim$[\bu] < $ dim$[\bq]$. This is the case when a robot has joints with no motors. But this is not the only case. The human body, for instance, has an incredible number of actuators (muscles), and in many cases has multiple muscles per joint; despite having more actuators than position variables, when I jump through the air, there is no combination of muscle inputs that can change the ballistic trajectory of my center of mass (barring aerodynamic effects). My control system is underactuated.

For completeness, let's generalize the definition of underactuation to systems beyond the second-order control affine systems.

It is easy to see that equation \ref{eq:underactuated_def} is a sufficient condition for underactuation. This definition can also be extended to discrete-time systems and/or differential inclusions.

A quick note about notation. When describing the dynamics of rigid-body systems in this class, I will use $\bq$ for configurations (positions), $\dot{\bq}$ for velocities, and use $\bx$ for the full state ($\bx = [\bq^T,\dot{\bq}^T]^T$). There is an important limitation to this convention, described in the Appendix, but it will keep the notes cleaner. Unless otherwise noted, vectors are always treated as column vectors. Vectors and matrices are bold (scalars are not).

Fully-actuated systems are dramatically easier to control than underactuated systems. The key observation is that, for fully-actuated systems with known dynamics (e.g., ${\bf f}_1$ and ${\bf f}_2$ are known for a second-order control-affine system), it is possible to use feedback to effectively change a nonlinear control problem into a linear control problem. The field of linear control is incredibly advanced, and there are many well-known solutions for controlling linear systems.

The trick is called feedback linearization. When ${\bf f}_2$ is full row
rank, it is invertible. Consider the nonlinear feedback law: $$\bu = {\bf
\pi}(\bq,\dot\bq,t) = {\bf f}_2^{-1}(\bq,\dot\bq,t) \left[ \bu' - {\bf
f}_1(\bq,\dot\bq,t) \right],$$ where $\bu'$ is some additional control input.
Applying this feedback controller to equation \ref{eq:f1_plus_f2} results in the
linear, decoupled, second-order system: $$\ddot{\bq} = \bu'.$$ In other words,
if ${\bf f}_1$ and ${\bf f}_2$ are known and ${\bf f}_2$ is invertible, then we
say that the system is "feedback equivalent" to $\ddot{\bq} = \bu'$. There are
a number of strong results which generalize this idea to the case where ${\bf
f}_1$ and ${\bf f}_2$ are estimated, rather than known (e.g,

Let's say that we would like our simple double pendulum to act like a simple
single pendulum (with damping), whose dynamics are given by: \begin{align*}
\ddot \theta_1 &= -\frac{g}{l}\cos\theta_1 -b\dot\theta_1 \\ \ddot\theta_2 &= 0.
\end{align*} This is easily achieved using

Since we are embedding a nonlinear dynamics (not a linear one), we refer to this as "feedback cancellation", or "dynamic inversion". This idea can, and does, make control look easy - for the special case of a fully-actuated deterministic system with known dynamics. For example, it would have been just as easy for me to invert gravity. Observe that the control derivations here would not have been any more difficult if the robot had 100 joints.

```
cd(fullfile(getDrakePath,'examples','SimpleDoublePendulum'));
runSimplePend; % feedback linearization to simulate a pendulum
runSimplePendInv; % and a gravity-inverted pendulum
```

The underactuated systems are not feedback linearizable. Therefore, unlike fully-actuated systems, the control designer has no choice but to reason about the nonlinear dynamics of the plant in the control design. This dramatically complicates feedback controller design.

Although the dynamic constraints due to missing actuators certainly embody the spirit of this course, many of the systems we care about could be subject to other dynamic constraints as well. For example, the actuators on our machines may only be mechanically capable of producing some limited amount of torque, or there may be a physical obstacle in the free space with which we cannot permit our robot to come into contact with.

In practice it can be useful to separate out constraints which depend only on the input, e.g. $\phi(\bu)\le0$, such as actuator limits, as they can often be easier to handle than state constraints. An obstacle in the environment might manifest itself as one or more constraints that depend only on position, e.g. $\phi(\bq)\le0$.

By our generalized definition of underactuation, we can see that input and state constraints can also cause a system to be underactuated.

Input and state constraints can complicate control design in similar ways to having an insufficient number of actuators, (i.e., further limiting the set of the feasible trajectories), and often require similar tools to find a control solution.

You might have heard of the term "nonholonomic system" (see e.g.

Contrast the wheeled robot example with a robot on train tracks. The train tracks correspond to a holonomic constraint: the track constraint can be written directly in terms of the configuration ${\bf q}$ of the system, without using the velocity ${\bf \dot{q}}$. Even though the track constraint could also be written as a differential constraint on the velocity, it would be possible to integrate this constraint to obtain a constraint on configuration. The track restrains the possible configurations of the system.

A nonholonomic constraint like the no-side-slip constraint on the wheeled vehicle certainly results in an underactuated system. The converse is not necessarily true—the double pendulum system which is missing an actuator is underactuated but would not typically be called a nonholonomic system. Note that the Lagrangian equations of motion are a constraint of the form \[{\bf \phi}({\bf q},{\bf \dot{q}},{\bf \ddot{q}},t) = 0,\] so do not qualify as a nonholonomic constraint.

The control of underactuated systems is an open and interesting problem in controls. Although there are a number of special cases where underactuated systems have been controlled, there are relatively few general principles. Now here's the rub... most of the interesting problems in robotics are underactuated:

- Legged robots are underactuated. Consider a legged machine with $N$ internal joints and $N$ actuators. If the robot is not bolted to the ground, then the degrees of freedom of the system include both the internal joints and the six degrees of freedom which define the position and orientation of the robot in space. Since $\bu \in \Re^N$ and $\bq \in \Re^{N+6}$, equation \ref{eq:underactuated_def} is satisfied.
- (Most) Swimming and flying robots are underactuated. The story is the same here as for legged machines. Each control surface adds one actuator and one DOF. And this is already a simplification, as the true state of the system should really include the (infinite-dimensional) state of the flow.
- Robot manipulation is (often) underactuated. Consider a fully-actuated robotic arm. When this arm is manipulating an object with degrees of freedom (even a brick has six), it can become underactuated. If force closure is achieved, and maintained, then we can think of the system as fully-actuated, because the degrees of freedom of the object are constrained to match the degrees of freedom of the hand. That is, of course, unless the manipulated object has extra DOFs.

Even control systems for fully-actuated and otherwise unconstrained systems can be improved using the lessons from underactuated systems, particularly if there is a need to increase the efficiency of their motions or reduce the complexity of their designs.

This course is based on the observation that there are new computational tools from optimization theory, control theory, motion planning, and even machine learning which can be used to design feedback control for underactuated systems. The goal of this class is to develop these tools in order to design robots that are more dynamic and more agile than the current state-of-the-art.

The target audience for the class includes both computer science and mechanical/aero students pursuing research in robotics. Although I assume a comfort with linear algebra, ODEs, and MATLAB, the course notes aim to provide most of the material and references required for the course.

Our goals for this chapter are modest: we'd like to understand the dynamics of a pendulum.

Why a pendulum? In part, because the dynamics of a majority of our multi-link robotics manipulators are simply the dynamics of a large number of coupled pendula. Also, the dynamics of a single pendulum are rich enough to introduce most of the concepts from nonlinear dynamics that we will use in this text, but tractable enough for us to (mostly) understand in the next few pages.

The Lagrangian derivation of the equations of motion (as described in the appendix) of the simple pendulum yields: \begin{equation*} m l^2 \ddot\theta(t) + mgl\sin{\theta(t)} = Q. \end{equation*} We'll consider the case where the generalized force, $Q$, models a damping torque (from friction) plus a control torque input, $u(t)$: $$Q = -b\dot\theta(t) + u(t).$$

Let us first consider the dynamics of the pendulum if it is driven in a
particular simple way: a torque which does not vary with time: \begin{equation}
ml^2 \ddot\theta + b\dot\theta + mgl \sin\theta = u_0. \end{equation}
You can experiment with this system in

```
cd(fullfile(getDrakePath,'examples','Pendulum'));
open_system('constantTorqueDemo');
% hit play to start the simulation, and click on the boxes labeled as
% 'slider' to control the torque and damping.
% if you get an error about not keeping up with real time, then
% double-click on the realtime block and reduce the realtime factor
% (e.g. to .5).
```

These are relatively simple differential equations, so if I give you $\theta(0)$ and $\dot\theta(0)$, then you should be able to integrate them to obtain $\theta(t)$... right? Although it is possible, integrating even the simplest case ($b = u = 0$) involves elliptic integrals of the first kind; there is relatively little intuition to be gained here.

This is in stark contrast to the case of linear systems, where much of our understanding comes from being able to explicitly integrate the equations. For instance, for a simple linear system we have $$\dot{q} = a q \quad \rightarrow \quad q(t) = q(0) e^{at},$$ and we can immediately understand that the long-term behavior of the system is a (stable) decaying exponential if $a<0$, an (unstable) growing exponential if $a>0$, and that the system does nothing if $a=0$. Here we are with certainly one of the simplest nonlinear systems we can imagine, and we can't even solve this system?

All is not lost. If what we care about is the long-term behavior of the
system, then there are a number of techniques we can apply. In this chapter, we
will start by investigating graphical solution methods. These methods are
described beautifully in a book by Steve Strogatz

Let's start by studying a special case -- intuitively when $b\dot\theta \gg ml^2\ddot\theta$ -- which via dimensional analysis (using the characteristic time constant $\sqrt{\frac{l}{g}}$ to match units) occurs when $b \sqrt\frac{g}{l} \gg ml^2$. This is the case of heavy damping, for instance if the pendulum was moving in molasses. In this case, the damping term dominates the acceleration term, and we have: $$ml^2 \ddot\theta + b\dot\theta \approx b\dot\theta = u_0 - mgl\sin\theta.$$ In other words, in the case of heavy damping, the system looks approximately first order. This is a general property of heavily damped systems, such as fluids at very low Reynolds number.

I'd like to ignore one detail for a moment: the fact that $\theta$ wraps around on itself every $2\pi$. To be clear, let's write the system without the wrap-around as: \begin{equation}b\dot{x} = u_0 - mgl\sin{x}.\label{eq:overdamped_pend_ct}\end{equation} Our goal is to understand the long-term behavior of this system: to find $x(\infty)$ given $x(0)$. Let's start by plotting $\dot{x}$ vs $x$ for the case when $u_0=0$:

The first thing to notice is that the system has a number of *fixed
points* or *steady states*, which occur whenever $\dot{x} = 0$. In
this simple example, the zero-crossings are $x^* = \{..., -\pi, 0, \pi, 2\pi, ...\}$.
When the system is in one of these states, it will never leave that state. If
the initial conditions are at a fixed point, we know that $x(\infty)$ will be at
the same fixed point.

Next let's investigate the behavior of the system in the local vicinity of
the fixed points. Examining the fixed point at $x^* = \pi$, if the system starts
just to the right of the fixed point, then $\dot{x}$ is positive, so the system
will move away from the fixed point. If it starts to the left, then $\dot{x}$
is negative, and the system will move away in the opposite direction. We'll
call fixed-points which have this property *unstable*. If we look at the
fixed point at $x^* = 0$, then the story is different: trajectories starting to
the right or to the left will move back towards the fixed point. We will call
this fixed point *locally stable*. More specifically, we'll distinguish
between three types of local stability:

- Locally stable
*in the sense of Lyapunov*(i.s.L.). A fixed point, $x^*$ is locally stable i.s.L. if for every small $\epsilon > 0$, I can produce a $\delta > 0$ such that if $\| x(0) - x^* \| < \delta$ then $\forall t$ $\| x(t) - x^*\| < \epsilon$. In words, this means that for any ball of size $\epsilon$ around the fixed point, I can create a ball of size $\delta$ which guarantees that if the system is started inside the $\delta$ ball then it will remain inside the $\epsilon$ ball for all of time. - Locally
*asymptotically stable*. A fixed point is locally asymptotically stable if $x(0) = x^* + \epsilon$ implies that $\lim_{t\rightarrow \infty} x(t) = x^*$. - Locally
*exponentially stable*. A fixed point is locally exponentially stable if $x(0) = x^* + \epsilon$ implies that $\| x(t) - x^* \| < Ce^{-\alpha t}$, for some positive constants $C$ and $\alpha$.

An initial condition near a fixed point that is stable in the sense of Lyapunov
may never reach the fixed point (but it won't diverge), near an asymptotically
stable fixed point will reach the fixed point as $t \rightarrow \infty$, and
near an exponentially stable fixed point will reach the fixed point with a
bounded rate. An exponentially stable fixed point is also an asymptotically
stable fixed point, but the converse is not true. Asymptotic stability and Lyapunov stability, however, are distinct notions --
it is actually possible to have a system that is asymptotically stable but not stable i.s.L.†*finite-time stability*; we
will see examples of this later in the book, but it is a difficult topic to
penetrate with graphical analysis. Rigorous nonlinear system analysis is rich with
subtleties and surprises.

Our graph of $\dot{x}$ vs. $x$ can be used to convince ourselves of i.s.L. and asymptotic stability by visually inspecting $\dot{x}$ in the vicinity of a fixed point. Even exponential stability can be inferred if the function can be bounded away from the origin by a negatively-sloped line through the fixed point, since it implies that the nonlinear system will converge at least as fast as the linear system represented by the straight line. I will graphically illustrate unstable fixed points with open circles and stable fixed points (i.s.L.) with filled circles.

Next, we need to consider what happens to initial conditions which begin farther from the fixed points. If we think of the dynamics of the system as a flow on the $x$-axis, then we know that anytime $\dot{x} > 0$, the flow is moving to the right, and $\dot{x} < 0$, the flow is moving to the left. If we further annotate our graph with arrows indicating the direction of the flow, then the entire (long-term) system behavior becomes clear:

For instance, we can see that any initial condition $x(0) \in (\pi,\pi)$ will
result in $\lim_{t\rightarrow \infty} x(t) = 0$. This region is called the *basin of
attraction* of the fixed point at $x^* = 0$. Basins of attraction of two
fixed points cannot overlap, and the manifold separating two basins of
attraction is called the *separatrix*. Here the unstable fixed points,
at $x^* = \{.., -\pi, \pi, 3\pi, ...\}$ form the separatrix between the basins
of attraction of the stable fixed points.

As these plots demonstrate, the behavior of a first-order one dimensional system on a line is relatively constrained. The system will either monotonically approach a fixed-point or monotonically move toward $\pm \infty$. There are no other possibilities. Oscillations, for example, are impossible. Graphical analysis is a fantastic analysis tool for many first-order nonlinear systems (not just pendula); as illustrated by the following example:

One last piece of terminology. In the neuron example, and in many dynamical
systems, the dynamics were parameterized; in this case by a single parameter,
$w$. As we varied $w$, the fixed points of the system moved around. In fact,
if we increase $w$ through $w=1$, something dramatic happens - the system goes
from having one fixed point to having three fixed points. This is called a
*bifurcation*. This particular bifurcation is called a pitchfork
bifurcation. We often draw bifurcation diagrams which plot the fixed points of
the system as a function of the parameters, with solid lines indicating stable
fixed points and dashed lines indicating unstable fixed points, as seen in the
figure:

Our pendulum equations also have a (saddle-node) bifurcation when we change
the constant torque input, $u_0$. Finally, let's return to the original
equations in $\theta$, instead of in $x$. Only one point to make: because of
the wrap-around, this system will *appear* have oscillations. In fact,
the graphical analysis reveals that the pendulum will turn forever whenever
$|u_0| > \frac{mgl}{b}$, but now you understand that this is not an oscillation, but an
instability with $\theta \rightarrow \pm \infty$.

Consider again the system $$ml^2 \ddot\theta = u_0 - mgl \sin\theta -
b\dot\theta,$$ this time with $b = 0$. This time the system dynamics are truly
second-order. We can always think of any second-order system as (coupled)
first-order system with twice as many variables. Consider a general, autonomous
(not dependent on time), second-order system, $$\ddot{q} = f(q,\dot q,u).$$ This
system is equivalent to the two-dimensional first-order system \begin{align*}
\dot x_1 =& x_2 \\ \dot x_2 =& f(x_1,x_2,u), \end{align*} where $x_1 = q$ and
$x_2 = \dot q$. Therefore, the graphical depiction of this system is not a
line, but a vector field where the vectors $[\dot x_1, \dot x_2]^T$ are plotted
over the domain $(x_1,x_2)$. This vector field is known as the *phase
portrait* of the system.

In this section we restrict ourselves to the simplest case when $u_0 = 0$. Let's sketch the phase portrait. First sketch along the $\theta$-axis. The $x$-component of the vector field here is zero, the $y$-component is $-mgl\sin\theta.$ As expected, we have fixed points at $\pm \pi, ...$ Now sketch the rest of the vector field. Can you tell me which fixed points are stable? Some of them are stable i.s.L., none are asymptotically stable.

You might wonder how we drew the black contour lines in the figure above. We could have obtained them by simulating the system numerically, but those lines can be easily obtained in closed-form. Directly integrating the equations of motion is difficult, but at least for the case when $u_0 = 0$, we have some additional physical insight for this problem that we can take advantage of. The kinetic energy, $T$, and potential energy, $U$, of the pendulum are given by $$T = \frac{1}{2}I\dot\theta^2, \quad U = -mgl\cos(\theta),$$ where $I=ml^2$ and the total energy is $E(\theta,\dot\theta) = T(\dot\theta)+U(\theta)$. The undamped pendulum is a conservative system: total energy is a constant over system trajectories. Using conservation of energy, we have: \begin{gather*} E(\theta(t),\dot\theta(t)) = E(\theta(0),\dot\theta(0)) = E_0 \\ \frac{1}{2} I \dot\theta^2(t) - mgl\cos(\theta(t)) = E_0 \\ \dot\theta(t) = \pm \sqrt{\frac{2}{I}\left[E_0 + mgl\cos\left(\theta(t)\right)\right]} \end{gather*} Using this, if you tell me $\theta$ I can determine one of two possible values for $\dot\theta$, and the solution has all of the richness of the black countour lines from the plot. This equation has a real solution when $\cos(\theta) > \cos(\theta_{max})$, where $$\theta_{max} = \begin{cases} \cos^{-1}\left( \frac{E}{mgl} \right), & E < mgl \\ \pi, & \text{otherwise}. \end{cases}$$ Of course this is just the intuitive notion that the pendulum will not swing above the height where the total energy equals the potential energy. As an exercise, you can verify that differentiating this equation with respect to time indeed results in the equations of motion.

Now what happens if we add a constant torque? If you visualize the bifurcation diagram, you'll see that the fixed points come together, towards $q = \frac{\pi}{2}, \frac{5\pi}{2}, ...$, until they disappear. One fixed-point is unstable, and one is stable.

Before we continue, let me know give you the promised example of a system that is asymptotically stable, but not stable i.s.L.. We can accomplish this with a very pendulum-like example (written here in polar coordinates):

Take a minute to draw the vector field of this (you can draw each coordinate independently, if it helps) to make sure you understand. Note that to wrap-around rotation is convenient but not essential -- we could have written the same dynamical system in cartesian coordinates without wrapping.

Now let's add damping back. You can still add torque to move the fixed points (in the same way).

With damping, the downright fixed point of the pendulum now becomes asymptotically stable (in addition to stable i.s.L.).
Is it also exponentially stable? How can we tell? One technique is to linearize the system at the fixed point. A smooth, time-invariant, nonlinear
system that is locally exponentially stable *must* have a stable linearization; we'll discuss linearization more in the next chapter.

Here's a thought exercise. If $u$ is no longer a constant, but a function
$\pi(q,\dot{q})$, then how would you choose $\pi$ to stabilize the vertical
position. Feedback linearization is the trivial solution, for example: $$u =
\pi(q,\dot{q}) = 2mgl\sin\theta.$$ But these plots we've been making
tell a different story. How would you shape the natural dynamics - at each
point pick a $u$ from the stack of phase plots - to stabilize the vertical fixed
point *with minimal torque effort*? This is exactly the way that I would
like you to think about control system design. And we'll give you your first solution techniques -- using dynamic programming -- in the next lecture.

The simple pendulum is fully actuated. Given enough torque, we can produce any number of control solutions to stabilize the originally unstable fixed point at the top (such as designing a feedback law to effectively invert gravity).

The problem begins to get interesting (a.k.a. becomes underactuated) if we impose a torque-limit constraint, $|u|\le u_{max}$. Looking at the phase portraits again, you can now visualize the control problem. Via feedback, you are allowed to change the direction of the vector field at each point, but only by a fixed amount. Clearly, if the maximum torque is small (smaller than $mgl$), then there are some states which cannot be driven directly to the goal, but must pump up energy to reach the goal. Furthermore, if the torque-limit is too severe and the system has damping, then it may be impossible to swing up to the top. The existence of a solution, and number of pumps required to reach the top, is a non-trivial function of the initial conditions and the torque-limits.

Although this system is very simple, its solution requires much of the same reasoning necessary for controlling much more complex underactuated systems; this problem will be a work-horse for us as we introduce new algorithms throughout this book.

A great deal of work in the control of underactuated systems has been done in the context of low-dimensional model systems. These model systems capture the essence of the problem without introducing all of the complexity that is often involved in more real-world examples. In this chapter we will focus on two of the most well-known and well-studied model systems--the Acrobot and the Cart-Pole. After we have developed some tools, we will see that they can be applied directly to other model systems; we will give a number of examples using Quadrotors. All of these systems are trivially underactuated, having less actuators than degrees of freedom.

The Acrobot is a planar two-link robotic arm in the vertical plane
(working against gravity), with an actuator at the elbow, but no
actuator at the shoulder. It was first
described in detail in

The Acrobot is representative of the primary challenge in underactuated robots. In order to swing up and balance the entire system, the controller must reason about and exploit the state-dependent coupling between the actuated degree of freedom and the unactuated degree of freedom. It is also an important system because, as we will see, it closely resembles one of the simplest models of a walking robot.

Figure 3.1 illustrates the model parameters used in our analysis. $\theta_1$ is the shoulder joint angle, $\theta_2$ is the elbow (relative) joint angle, and we will use $\bq = [\theta_1,\theta_2]^T$, $\bx = [\bq,\dot\bq]^T$. The zero configuration is with both links pointed directly down. The moments of inertia, $I_1,I_2$ are taken about the pivots. The task is to stabilize the unstable fixed point $\bx = [\pi,0,0,0]^T$.

We will derive the equations of motion for the Acrobot using the method of Lagrange. The kinematics are given by: \begin{equation} \bx_1 = \begin{bmatrix} l_1 s_1 \\ -l_1 c_1 \end{bmatrix}, \quad \bx_2 = \bx_1 + \begin{bmatrix} l_2 s_{1+2} \\ - l_2 c_{1+2} \end{bmatrix} . \end{equation} The energy is given by: \begin{gather} T = T_1 + T_2, \quad T_1 = \frac{1}{2} I_1 \dot{q}_1^2 \\ T_2 = \frac{1}{2} ( m_2 l_1^2 + I_2 + 2 m_2 l_1 l_{c2} c_2 ) \dot{q}_1^2 + \frac{1}{2} I_2 \dot{q}_2^2 + (I_2 + m_2 l_1 l_{c2} c_2 ) \dot{q}_1 \dot{q}_2 \\ % from expanding sum of point masses U = -m_1 g l_{c1} c_1 - m_2 g (l_1 c_1 + l_{c2} c_{1+2}) \end{gather} Entering these quantities into the Lagrangian yields the equations of motion: \begin{gather} (I_1 + I_2 + m_2 l_1^2 + 2m_2 l_1 l_{c2} c_2) \ddot{q}_1 + (I_2 + m_2 l_1 l_{c2} c_2)\ddot{q}_2 - 2m_2 l_1 l_{c2} s_2 \dot{q}_1 \dot{q}_2 \\ \quad -m_2 l_1 l_{c2} s_2 \dot{q}_2^2 + m_1 g l_{c1}s_1 + m_2 g (l_1 s_1 + l_{c2} s_{1+2}) = 0 \\ (I_2 + m_2 l_1 l_{c2} c_2) \ddot{q}_1 + I_2 \ddot{q}_2 + m_2 l_1 l_{c2} s_2 \dot{q}_1^2 + m_2 g l_{c2} s_{1+2} = \tau \end{gather} In standard, manipulator equation form, we have: \begin{gather} {\bf H}(\bq) = \begin{bmatrix} I_1 + I_2 + m_2 l_1^2 + 2m_2 l_1 l_{c2} c_2 & I_2 + m_2 l_1 l_{c2} c_2 \\ I_2 + m_2 l_1 l_{c2} c_2 & I_2 \end{bmatrix},\label{eq:Hacrobot}\\{\bf C}(\bq,\dot{\bq}) = \begin{bmatrix} -2 m_2 l_1 l_{c2} s_2 \dot{q}_2 & -m_2 l_1 l_{c2} s_2 \dot{q}_2 \\ m_2 l_1 l_{c2} s_2 \dot{q}_1 & 0 \end{bmatrix}, \\ {\bf G}(\bq) = \begin{bmatrix} m_1 g l_{c1}s_1 + m_2 g (l_1 s_1 + l_{c2}s_{1+2}) \\ m_2 g l_{c2} s_{1+2} \end{bmatrix}, \quad {\bf B} = \begin{bmatrix} 0 \\ 1 \end{bmatrix}. \end{gather}

You can experiment with the Acrobot dynamics in

```
cd(fullfile(getDrakePath,'examples','Acrobot'));
plant=AcrobotPlant();
[H,C_times_v,G,B] = plant.manipulatorEquations()
runPassive(); % simulate with torque=0 (and display the results)
```

The other model system that we will investigate here is the cart-pole system, in which the task is to balance a simple pendulum around its unstable equilibrium, using only horizontal forces on the cart. Balancing the cart-pole system is used in many introductory courses in control, including 6.003 at MIT, because it can be accomplished with simple linear control (e.g. pole placement) techniques. In this chapter we will consider the full swing-up and balance control problem, which requires a full nonlinear control treatment.

The figure shows our parameterization of the system. $x$ is the horizontal position of the cart, $\theta$ is the counter-clockwise angle of the pendulum (zero is hanging straight down). We will use $\bq = [x,\theta]^T$, and $\bx = [\bq,\dot\bq]^T$. The task is to stabilize the unstable fixed point at $\bx = [0,\pi,0,0]^T.$

The kinematics of the system are given by \begin{equation}\bx_1 = \begin{bmatrix} x \\ 0 \end{bmatrix}, \quad \bx_2 = \begin{bmatrix} x + l\sin\theta \\ -l\cos\theta \end{bmatrix}. \end{equation} The energy is given by \begin{align} T=& \frac{1}{2} (m_c + m_p)\dot{x}^2 + m_p \dot{x}\dot\theta l \cos{\theta} + \frac{1}{2}m_p l^2 \dot\theta^2 \\ U =& -m_p g l \cos\theta. \end{align} The Lagrangian yields the equations of motion: \begin{gather} (m_c + m_p)\ddot{x} + m_p l \ddot\theta \cos\theta - m_p l \dot\theta^2 \sin\theta = f \\ m_p l \ddot{x} \cos\theta + m_p l^2 \ddot\theta + m_p g l \sin\theta = 0 \end{gather} In standard form, using $\bq = [x,\theta]^T$, $\bu = f$: $${\bf H}(\bq)\ddot\bq + {\bf C}(\bq,\dot\bq)\dot\bq + {\bf G}(\bq) = {\bf B}\bu,$$ where \begin{gather*} {\bf H}(\bq) = \begin{bmatrix} m_c + m_p & m_p l \cos\theta \\ m_p l \cos\theta & m_p l^2 \end{bmatrix}, \quad {\bf C}(\bq,\dot{\bq}) = \begin{bmatrix} 0 & -m_p l \dot\theta \sin\theta \\ 0 & 0 \end{bmatrix}, \\ {\bf G}(\bq) = \begin{bmatrix} 0 \\ m_p gl \sin\theta \end{bmatrix}, \quad {\bf B} = \begin{bmatrix} 1 \\ 0 \end{bmatrix} \end{gather*} In this case, it is particularly easy to solve directly for the accelerations: \begin{align} \ddot{x} =& \frac{1}{m_c + m_p \sin^2\theta}\left[ f+m_p \sin\theta (l \dot\theta^2 + g\cos\theta)\right] \label{eq:ddot_x}\\ \ddot{\theta} =& \frac{1}{l(m_c + m_p \sin^2\theta)} \left[ -f \cos\theta - m_p l \dot\theta^2 \cos\theta \sin\theta - (m_c + m_p) g \sin\theta \right] \label{eq:ddot_theta} \end{align} In some of the analysis that follows, we will study the form of the equations of motion, ignoring the details, by arbitrarily setting all constants to 1: \begin{gather} 2\ddot{x} + \ddot\theta \cos\theta - \dot\theta^2 \sin\theta = f \label{eq:simple}\\ \ddot{x}\cos\theta + \ddot\theta + \sin\theta = 0. \label{eq:simple2} \end{gather}

You can experiment with the Cart-Pole dynamics in

```
cd(fullfile(getDrakePath,'examples','CartPole'));
plant=CartPolePlant();
[H,C_times_v,G,B] = plant.manipulatorEquations()
runPassive(); % simulate with force=0 (and display the results)
```

For both the Acrobot and the Cart-Pole systems, we will begin by designing a linear controller which can balance the system when it begins in the vicinity of the unstable fixed point. To accomplish this, we will linearize the nonlinear equations about the fixed point, examine the controllability of this linear system, then using linear quadratic regulator (LQR) theory to design our feedback controller.

Although the equations of motion of both of these model systems are relatively tractable, the forward dynamics still involve quite a few nonlinear terms that must be considered in any linearization. Let's consider the general problem of linearizing a system described by the manipulator equations.

We can perform linearization around a fixed point, $(\bx^*, \bu^*)$, using a Taylor expansion: \begin{equation} \dot\bx = {\bf f}(\bx,\bu) \approx {\bf f}(\bx^*,\bu^*) + \left[ \pd{\bf f}{\bx}\right]_{\bx=\bx^*,\bu=\bu^*} (\bx - \bx^*) + \left[ \pd{\bf f}{\bu}\right]_{\bx=\bx^*,\bu=\bu^*} (\bu - \bu^*) \end{equation} Let us consider the specific problem of linearizing the manipulator equations around a (stable or unstable) fixed point. In this case, ${\bf f}(\bx^*,\bu^*)$ is zero, and we are left with the standard linear state-space form: \begin{align} \dot\bx =& \begin{bmatrix} \dot\bq \\ {\bf H}^{-1}(\bq) \left[ {\bf B}(\bq)\bu - {\bf C}(\bq,\dot\bq)\dot\bq - {\bf G}(\bq) \right] \end{bmatrix},\\ \approx& {\bf A}_{lin} (\bx-\bx^*) + {\bf B}_{lin} (\bu - \bu^*), \end{align} where ${\bf A}_{lin}$, and ${\bf B}_{lin}$ are constant matrices. Let us define $\bar\bx = \bx - \bx^*, \bar\bu = \bu - \bu^*$, and write $$\dot{\bar\bx} = {\bf A}_{lin}\bar\bx + {\bf B}_{lin}\bar\bu.$$ Evaluation of the Taylor expansion around a fixed point yields the following, very simple equations, given in block form by: \begin{align} {\bf A}_{lin} =& \begin{bmatrix} {\bf 0} & {\bf I} \\ -{\bf H}^{-1} \pd{\bf G}{\bq} + \sum_{j} {\bf H}^{-1}\pd{{\bf B}_j}{\bq} u_j & -{\bf H}^{-1} {\bf C} \end{bmatrix}_{\bx=\bx^*,\bu=\bu^*} \\ {\bf B}_{lin} =& \begin{bmatrix} {\bf 0} \\ {\bf H}^{-1} {\bf B} \end{bmatrix}_{\bx=\bx^*, \bu=\bu^*} \end{align} where ${\bf B}_j$ is the $j$th column of ${\bf B}$. Note that the term involving $\pd{{\bf H}^{-1}}{q_i}$ disappears because ${\bf B}\bu - {\bf C}\dot{\bq} - {\bf G}$ must be zero at the fixed point. Many of the ${\bf C}\dot\bq$ derivatives drop out, too, because $\dot{\bq}^* = 0$. In many cases, including both the Acrobot and Cart-Pole systems (but not the Quadrotors), the matrix ${\bf B}(\bq)$ is a constant, so the $\pd{\bf B}{\bq}$ terms also drop out.

Linearizing around the (unstable) upright point, we have: \begin{gather} {\bf C}(\bq,\dot\bq)_{\bx=\bx^*} = {\bf 0},\\ \left[\pd{\bf G}{\bq}\right]_{\bx=\bx^*} = \begin{bmatrix} -g (m_1 l_{c1} + m_2 l_1 + m_2 l_{c2}) & -m_2 g l_{c2} \\ -m_2 g l_{c2} & -m_2 g l_{c2} \end{bmatrix} \end{gather} The linear dynamics follow directly from these equations and the manipulator form of the Acrobot equations.

Linearizing around the unstable fixed point in this system, we have: \begin{gather} {\bf C}(\bq,\dot\bq)_{\bx=\bx^*} = {\bf 0},\quad \left[\pd{\bf G}{\bq}\right]_{\bx=\bx^*} = \begin{bmatrix} 0 & 0 \\ 0 & -m_p g l \end{bmatrix} \end{gather} Again, the linear dynamics follow simply.

For the linear system $$\dot{\bx} = {\bf A}\bx + {\bf B}u,$$ we can integrate this linear system in closed form, so it is possible to derive the exact conditions of controllability. In particular, for linear systems it is sufficient to demonstrate that there exists a control input which drives any initial condition to the origin.

Let us first examine a special case, which falls short as a general tool but may be more useful for understanding the intuition of controllability. Let's perform an eigenvalue analysis of the system matrix ${\bf A}$, so that: $${\bf A}{\bf v}_i = \lambda_i {\bf v}_i,$$ where $\lambda_i$ is the $i$th eigenvalue, and ${\bf v}_i$ is the corresponding (right) eigenvector. There will be $n$ eigenvalues for the $n \times n$ matrix ${\bf A}$. Collecting the (column) eigenvectors into the matrix ${\bf V}$ and the eigenvalues into a diagonal matrix ${\bf \Lambda}$, we have $${\bf A}{\bf V} = {\bf V}{\bf \Lambda}.$$ Here comes our primary assumption: let us assume that each of these $n$ eigenvalues takes on a distinct value (no repeats). With this assumption, it can be shown that the eigenvectors ${\bf v}_i$ form a linearly independent basis set, and therefore ${\bf V}^{-1}$ is well-defined.

We can continue our eigenmodal analysis of the linear system by defining the modal coordinates, ${\bf r}$, with: $$\bx = {\bf V}{\bf r},\quad \text{or}\quad {\bf r} = {\bf V}^{-1}\bx.$$ In modal coordinates, the dynamics of the linear system are given by $$\dot{\bf r} = {\bf V}^{-1} {\bf A} {\bf V} {\bf r} + {\bf V}^{-1} {\bf B} \bu = {\bf \Lambda} {\bf r} + {\bf V}^{-1}{\bf B} \bu.$$ This illustrates the power of modal analysis; in modal coordinates, the dynamics diagonalize yielding independent linear equations: $$\dot{r}_i = \lambda_i r_i + \sum_j \beta_{ij} u_j,\quad {\bf \beta} = {\bf V}^{-1} {\bf B}.$$

Now the concept of controllability becomes clear. Input $j$ can
influence the dynamics in modal coordinate $i$ if and only if ${\bf
\beta}_{ij} \neq 0$. In the special case of non-repeated
eigenvalues, having control over each individual eigenmode is
sufficient to (in finite time) regulate all of the
eigenmodes

A more general solution to the controllability issue, which removes
our assumption about the eigenvalues, can be obtained by examining the
time-domain solution of the linear equations. The solution of this
system is $$\bx(t) = e^{{\bf A}t} \bx(0) + \int_0^{t} e^{{\bf A}(t -
\tau)} {\bf B} u(\tau) d\tau.$$ Without loss of generality, lets
consider the that the final state of the system is zero. Then we
have: $$\bx(0) = - \int_0^{t_f} e^{-{\bf A}\tau}{\bf B}u(\tau)
d\tau.$$ You might be wondering what we mean by $e^{{\bf A}t}$; a
scalar raised to the power of a matrix..? Recall that $e^{z}$ is
actually defined by a convergent infinite sum: $$e^{z} =1 + z +
\frac{1}{2} z^2 + \frac{1}{6} z^3 + ... .$$ The notation $e^{{\bf
A}t}$ uses the same definition: $$e^{{\bf A}t} = {\bf I} + {\bf A}t +
\frac{1}{2}({\bf A}t)^2 + \frac{1}{6}({\bf A}t)^3 + ... .$$ Not
surprisingly, this has many special forms. For instance, if ${\bf A}$
is diagonalizable, $e^{{\bf
A}t} = {\bf V}e^{{\bf\Lambda}t}{\bf V}^{-1},$ where ${\bf A} = {\bf V
\Lambda V}^{-1}$ is the eigenvalue decomposition of ${\bf A}$

Although we only treated the case of a scalar $u$, it is possible to
extend the analysis to a vector $\bu$ of size $m$, yielding the
condition $$\text{rank} \begin{bmatrix} {\bf B} & {\bf AB} & {\bf
A}^2{\bf B} & \cdots & {\bf A}^{n-1}{\bf B} \end{bmatrix}_{n \times
(nm)} = n.$$ In Matlab (using the Control Systems Toolbox),
you can obtain the controllability matrix using `Cm = ctrb(A,B)`

,
and evaluate its rank with `rank(Cm)`

.

Note that a
linear feedback to change the eigenvalues of the eigenmodes is not
sufficient to accomplish our goal of getting to the goal in finite
time. In fact, the open-loop control to reach the goal is easily
obtained with a final-value LQR problem, and (for ${\bf
R}={\bf I}$) is actually a simple function of the controllability
Grammian

Analysis of the controllability of both the Acrobot and Cart-Pole systems reveals that the linearized dynamics about the upright are, in fact, controllable. This implies that the linearized system, if started away from the zero state, can be returned to the zero state in finite time. This is potentially surprising - after all the systems are underactuated. For example, it is interesting and surprising that the Acrobot can balance itself in the upright position without having a shoulder motor.

The controllability of these model systems demonstrates an extremely
important, point: *An underactuated system is not necessarily
an uncontrollable system.* Underactuated systems cannot follow
arbitrary trajectories, but that does not imply that they cannot
arrive at arbitrary points in state space. However, the trajectory
required to place the system into a particular state may be arbitrarily
complex.

The controllability analysis presented here is for linear time-invariant (LTI) systems. A comparable analysis exists for linear time-varying (LTV) systems. We will even see extensions to nonlinear systems; although it will often be referred to by the synonym of "reachability" analysis.

Controllability tells us that a trajectory to the fixed point exists, but does not tell us which one we should take or what control inputs cause it to occur. Why not? There are potentially infinitely many solutions. We have to pick one.

The tools for controller design in linear systems are very advanced. In
particular, as we describe in the linear optimal control chapter, one can easily
design an optimal feedback controller for a regulation task like balancing, so
long as we are willing to linearize the system around the operating point and
define optimality in terms of a quadratic cost function: $$J(\bx_0) =
\int_0^\infty \left[ \bx^T(t) {\bf Q} \bx(t) + \bu^T(t) {\bf R} \bu(t) \right]dt,
\quad \bx(0)=\bx_0, {\bf Q}={\bf Q}^T>0, {\bf R}={\bf R}^T>0.$$ The linear
feedback matrix ${\bf K}$ used as $$\bu(t) = - {\bf K}\bx(t),$$ is the so-called
optimal linear quadratic regulator (LQR). Even without understanding the
detailed derivation, we can quickly become practitioners of LQR. Conveniently,
Matlab has a function, `K = lqr(A,B,Q,R)`

. Therefore, to use LQR, one
simply needs to obtain the linearized system dynamics and to define the
symmetric positive-definite cost matrices, ${\bf Q}$ and ${\bf R}$. In their
most common form, ${\bf Q}$ and ${\bf R}$ are positive diagonal matrices, where
the entries $Q_{ii}$ penalize the relative errors in state variable $x_i$
compared to the other state variables, and the entries $R_{ii}$ penalize actions
in $u_i$.

Take a minute to play around with the LQR controller for the Acrobot

```
cd(fullfile(getDrakePath,'examples','Acrobot'));
runLQR();
```

and Cart-Pole
```
cd(fullfile(getDrakePath,'examples','CartPole'));
runLQR();
```

Make sure that you take a minute to look at the code which runs during these
examples. Can you set the ${\bf Q}$ and ${\bf R}$ matrices differently, to improve the performance?
Simulation of the closed-loop response with LQR feedback shows that the task is indeed completed - and in an impressive manner. Oftentimes the state of the system has to move violently away from the origin in order to ultimately reach the origin. Further inspection reveals the (linearized) closed-loop dynamics are in fact non-minimum phase (acrobot has 3 right-half zeros, cart-pole has 1).

LQR works essentially out of the box for Quadrotors, if linearized around
a nominal fixed point (where the non-zero thrust from the propellers is balancing
gravity). We have a few Quadrotor models in

```
cd(fullfile(getDrakePath,'examples','Quadrotor2D'));
runLQR();
```

but you can also do this with a full 3D quadrotor model:
```
cd(fullfile(getDrakePath,'examples','Quadrotor'));
runLQR();
```

Try to change these ${\bf Q}$ and ${\bf R}$ costs here, too.
Note that LQR, although it is optimal for the linearized system, is
not necessarily the best linear control solution for maximizing basin
of attraction of the fixed-point. The theory of *robust
control*(e.g.,

In the introductory chapters, we made the point that the underactuated
systems are not feedback linearizable. At least not completely.
Although we cannot linearize the full dynamics of the system, it is
still possible to linearize a portion of the system dynamics. The
technique is called *partial feedback linearization*.

Consider the cart-pole example. The dynamics of the cart are affected by the motions of the pendulum. If we know the model, then it seems quite reasonable to think that we could create a feedback controller which would push the cart in exactly the way necessary to counter-act the dynamic contributions from the pendulum - thereby linearizing the cart dynamics. What we will see, which is potentially more surprising, is that we can also use a feedback law for the cart to feedback linearize the dynamics of the passive pendulum joint.

We'll use the term *collocated* partial feedback linearization to
describe a controller which linearizes the dynamics of the actuated
joints. What's more surprising is that it is often possible to
achieve *non-collocated* partial feedback linearization - a
controller which linearizes the dynamics of the unactuated joints.
The treatment presented here follows from

Starting from the equations \ref{eq:simple} and \ref{eq:simple2}, we have \begin{gather*} \ddot\theta = -\ddot{x}c - s \\ % \ddot\theta = -\frac{1}{l} (\ddot{x} c + g s) \ddot{x}(2-c^2) - sc - \dot\theta^2 s = f \end{gather*} Therefore, applying the feedback control law \begin{equation} f = (2 - c^2) \ddot{x}^d - sc - \dot\theta^2 s % f = (m_c + m_p) u + m_p (u c + g s) c - m_p l \dot\theta^2 s \end{equation} results in \begin{align*} \ddot{x} =& \ddot{x}^d \\ \ddot{\theta} =& -\ddot{x}^dc - s, \end{align*} which are valid globally.

Starting again from equations \ref{eq:simple} and \ref{eq:simple2}, we have \begin{gather*} \ddot{x} = -\frac{\ddot\theta + s}{c} \\ \ddot\theta(c - \frac{2}{c}) - 2 \tan\theta - \dot\theta^2 s = f \end{gather*} Applying the feedback control law \begin{equation} f = (c - \frac{2}{c}) \ddot\theta^d - 2 \tan\theta - \dot\theta^2 s \end{equation} results in \begin{align*} \ddot\theta =& \ddot\theta^d \\ \ddot{x} =& -\frac{1}{c} \ddot\theta^d - \tan\theta. \end{align*} Note that this expression is only valid when $\cos\theta \neq 0$. This is not surprising, as we know that the force cannot create a torque when the beam is perfectly horizontal.

For systems that are trivially underactuated (torques on some joints, no torques on other joints), we can, without loss of generality, reorganize the joint coordinates in any underactuated system described by the manipulator equations into the form: \begin{align} {\bf H}_{11} \ddot{\bq}_1 + {\bf H}_{12} \ddot{\bq}_2 + {\bf \phi}_1 = 0, \label{eq:passive_dyn}\\ {\bf H}_{21} \ddot{\bq}_1 + {\bf H}_{22} \ddot{\bq}_2 + {\bf \phi}_2 = {\bf \tau}, \label{eq:active_dyn} \end{align} with $\bq \in \Re^n$, $\bq_1 \in \Re^m$, $\bq_2 \in \Re^l$, $l=n-m$. $\bq_1$ represents all of the passive joints, and $\bq_2$ represents all of the actuated joints, and the $\phi$ terms capture all of the Coriolis and gravitational terms, and $${\bf H}(\bq) = \begin{bmatrix} {\bf H}_{11} & {\bf H}_{12} \\ {\bf H}_{21} & {\bf H}_{22} \end{bmatrix}.$$ Fortunately, because ${\bf H}$ is uniformly (e.g. for all $\bq$) positive definite, ${\bf H}_{11}$ and ${\bf H}_{22}$ are also positive definite.

Performing the same substitutions into the full manipulator equations,
we get:
\begin{gather}
\ddot\bq_1 = -{\bf H}_{11}^{-1} \left[ {\bf H}_{12} \ddot\bq_2 + {\bf \phi}_1
\right] \\
({\bf H}_{22} - {\bf H}_{21} {\bf H}_{11}^{-1} {\bf H}_{12})
\ddot\bq_2 + {\bf \phi_2} - {\bf H}_{21} {\bf H}_{11}^{-1} {\bf
\phi}_1 = {\bf \tau}
\end{gather}
It can be easily shown that the matrix $({\bf H}_{22} - {\bf H}_{21}
{\bf H}_{11}^{-1} {\bf H}_{12})$ is invertible

\begin{gather} \ddot\bq_2 = -{\bf H}_{12}^+ \left[ {\bf H}_{11} \ddot\bq_1 + {\bf \phi}_1 \right] \\ ({\bf H}_{21} - {\bf H}_{22} {\bf H}_{12}^+ {\bf H}_{11}) \ddot\bq_1 + {\bf \phi_2} - {\bf H}_{22} {\bf H}_{12}^+ {\bf \phi}_1 = {\bf \tau} \end{gather} where ${\bf H}_{12}^+$ is a Moore-Penrose pseudo-inverse. This inverse provides a unique solution when the rank of ${\bf H}_{12}$ equals $l$, the number of passive degrees of freedom in the system (it cannot be more, since the matrix only has $l$ rows). This rank condition is sometimes called the property of "Strong Inertial Coupling". It is state dependent. A system has Global Strong Inertial Coupling if it exhibits Strong Inertial Coupling in every state.

In general, we can define some combination of active and passive
joints that we would like to control. This combination is sometimes
called a "task space".

If the actuated joints are commanded so that \begin{equation} \ddot\bq_2 = \bar{\bf J}^+ \left [{\bf v} - \dot{\bf J}\dot\bq + {\bf J}_1 {\bf H}_{11}^{-1}{\bf \phi}_1 \right], \label{eq:q2cmd} \end{equation} where $\bar{{\bf J}} = {\bf J}_2 - {\bf J}_1 {\bf H}_{11}^{-1} {\bf H}_{12}.$ and $\bar{\bf J}^+$ is the right Moore-Penrose pseudo-inverse, $$\bar{\bf J}^+ = \bar{\bf J}^T (\bar{\bf J} \bar{\bf J}^T)^{-1},$$ then we have \begin{equation} \ddot{\bf y} = {\bf v}.\end{equation} subject to \begin{equation}\text{rank}\left(\bar{{\bf J}} \right) = p, \label{eq:rank_condition}\end{equation}

**Proof Sketch.**
Differentiating the output function we have
\begin{gather*}
\dot{\bf y} = {\bf J} \dot\bq \\
\ddot{\bf y} = \dot{\bf J} \dot\bq + {\bf J}_1 \ddot\bq_1 + {\bf J}_2 \ddot\bq_2.
\end{gather*}
Solving \ref{eq:passive_dyn} for the dynamics of the unactuated joints
we have:
\begin{equation}
\ddot\bq_1 = - {\bf H}_{11}^{-1} ({\bf H}_{12} \ddot\bq_2 + {\bf \phi}_1) \label{eq:q1cmd}
\end{equation}
Substituting, we have
\begin{align}
\ddot{\bf y} =& \dot{\bf J} \dot\bq - {\bf J}_1 {\bf H}_{11}^{-1}({\bf
H}_{12}\ddot\bq_2 + {\bf \phi}_1) + {\bf J}_2 \ddot\bq_2 \\
=& \dot{\bf J} \dot\bq + \bar{{\bf J}} \ddot\bq_2 - {\bf J}_1 {\bf H}_{11}^{-1}{\bf \phi}_1 \\
=& {\bf v}
\end{align}
Note that the last line required the rank condition
($\ref{eq:rank_condition}$) on $\bar{\bf J}$ to ensure that the rows
of $\bar{{\bf J}}$ are linearly independent, allowing $\bar{\bf J}
\bar{\bf J}^+ = {\bf I}$.

In order to execute a task space trajectory one could command $${\bf
v} = \ddot{\bf y}^d + {\bf K}_d (\dot{\bf y}^d - \dot{\bf y}) + {\bf K}_p ({\bf y}^d -{\bf y}).$$ Assuming the internal dynamics are stable, this yields
converging error dynamics, $({\bf y}_d - {\bf y})$, when ${\bf K}_p,{\bf K}_d
> 0$

Consider the task of trying to track a desired kinematic trajectory with the endpoint of pendulum in the cart-pole system. With one actuator and kinematic constraints, we might be hard-pressed to track a trajectory in both the horizontal and vertical coordinates. But we can at least try to track a trajectory in the vertical position of the end-effector.

Using the task-space PFL derivation, we have: \begin{gather*} y = f(\bq) = -l \cos\theta \\ \dot{y} = l \dot\theta \sin\theta \end{gather*} If we define a desired trajectory: $$y^d(t) = \frac{l}{2} + \frac{l}{4} \sin(t),$$ then the task-space controller is easily implemented using the derivation above.

The task space derivation above provides a convenient generalization
of the partial feedback linearization (PFL)

If we choose ${\bf y} = \bq_1$ (non-collocated), we have $${\bf J}_1 =
{\bf I}, {\bf J}_2
= 0, \dot{\bf J} = 0, \bar{{\bf J}}=-{\bf H}_{11}^{-1}{\bf H}_{12}.$$ The rank
condition ($\ref{eq:rank_condition}$) requires that
$\text{rank}({\bf H}_{12}) = l$, in which case we can write
$\bar{{\bf J}}^+=-{\bf H}_{12}^+{\bf H}_{11}$, reducing the rank condition to
precisely the "Strong Inertial Coupling" condition described in

Recall the phase portraits that we used to understand the dynamics of
the undamped, unactuated, simple pendulum ($u=b=0$). The orbits of this phase plot were defined by
contours of constant energy. One very special orbit, known as a
*homoclinic* orbit, is the orbit which passes through the
unstable fixed point. In fact, visual inspection will reveal that any
state that lies on this homoclinic orbit must pass into the unstable
fixed point. Therefore, if we seek to design a nonlinear feedback
control policy which drives the simple pendulum from any initial
condition to the unstable fixed point, a very reasonable strategy
would be to use actuation to regulate the energy of the pendulum to
place it on this homoclinic orbit, then allow the system dynamics to
carry us to the unstable fixed point.

This idea turns out to be a bit more general than just for the simple pendulum. As we will see, we can use similar concepts of `energy shaping' to produce swing-up controllers for the acrobot and cart-pole systems. It's important to note that it only takes one actuator to change the total energy of a system.

Although a large variety of swing-up controllers have been proposed
for these model
systems (c.f.

Recall the equations of motion for the undamped simple pendulum were given by $$ml^2 \ddot\theta + mgl\sin\theta = u.$$ The total energy of the simple pendulum is given by $$E = \frac{1}{2} m l^2 \dot\theta^2 - mgl\cos\theta.$$ To understand how to control the energy, observe that \begin{align*} \dot{E} =& ml^2\dot\theta \ddot\theta + \dot\theta mgl\sin\theta \\ =& \dot\theta \left[ u - mgl\sin\theta \right] + \dot\theta mgl\sin\theta \\ =& u\dot\theta. \end{align*} In words, adding energy to the system is simple - simply apply torque in the same direction as $\dot\theta$. To remove energy, simply apply torque in the opposite direction (e.g., damping).

To drive the system to the homoclinic orbit, we must regulate the energy of the system to a particular desired energy, $$E^d = mgl.$$ If we define $\tilde{E} = E - E^d$, then we have $$\dot{\tilde{E}} = \dot{E} = u\dot\theta.$$ If we apply a feedback controller of the form $$u = -k \dot\theta \tilde{E},\quad k>0,$$ then the resulting error dynamics are $$\dot{\tilde{E}} = - k \dot\theta^2 \tilde{E}.$$ These error dynamics imply an exponential convergence: $$\tilde{E} \rightarrow 0,$$ except for states where $\dot\theta=0$. The essential property is that when $E > E^d$, we should remove energy from the system (damping) and when $E < E^d$, we should add energy (negative damping). Even if the control actions are bounded, the convergence is easily preserved.

This is a nonlinear controller that will push all system trajectories
to the unstable equilibrium. But does it make the unstable
equilibrium locally stable? *No.* Small perturbations may cause
the system to drive all of the way around the circle in order to once
again return to the unstable equilibrium. For this reason, once
trajectories come into the vicinity of our swing-up controller, we
prefer to switch to our LQR balancing controller to performance to
complete the task.

Having thought about the swing-up problem for the simple pendulum,
let's try to apply the same ideas to the cart-pole system. The basic
idea, from *of the pendulum* (a unit mass, unit length, simple
pendulum in unit gravity) is given by: $$E(\bx) = \frac{1}{2}
\dot\theta^2 - \cos\theta.$$
The desired
energy, equivalent to the energy at the desired fixed-point, is
$$E^d = 1.$$
Again defining $\tilde{E}(\bx) = E(\bx) - E^d$, we now observe
that
\begin{align*}
\dot{\tilde{E}}(\bx) =& \dot{E}(\bx) = \dot\theta \ddot\theta +
\dot\theta s \\
% \dot\tilde{E} = ml^2 \dot\theta \ddot\theta + mgl s
=& \dot\theta [ -uc - s] + \dot\theta s \\
% - ml \dot\theta [ u c + g s ] + mgl s
=& - u\dot\theta \cos\theta.
% - ml u \dot\theta c
\end{align*}
Therefore, if we design a controller of the form
$$u = k \dot\theta\cos\theta \tilde{E},\quad k>0$$
the result is
$$\dot{\tilde{E}} = - k \dot\theta^2 \cos^2\theta \tilde{E}.$$
This guarantees that $| \tilde{E} |$ is
non-increasing, but isn't quite enough to guarantee that it will go
to zero. For example, if $\theta = \dot\theta = 0$, the system will
never move. However, if we have that $$\int_0^t \dot\theta^2(t')
\cos^2 \theta(t') dt' \rightarrow \infty,\quad \text{as } t
\rightarrow \infty,$$ then we have $\tilde{E}(t) \rightarrow 0$. This
condition is satisfied for all but the trivial constant trajectories at fixed points.

Now we return to the full system dynamics (which includes the
cart).

Swing-up control for the acrobot can be accomplished in much the same
way.

Use collocated PFL. ($\ddot{q}_2 = \ddot{q}_2^d$). $$E(\bx) = \frac{1}{2}\dot\bq^T{\bf H}\dot{\bq} + U(\bx).$$ $$ E_d = U(\bx^*).$$ $$\bar{u} = \dot{q}_1 \tilde{E}.$$ $$\ddot{q}_2^d = - k_1 q_2 - k_2 \dot{q}_2 + k_3 \bar{u},$$

Extra PD terms prevent proof of asymptotic convergence to homoclinic
orbit. Proof of another energy-based controller in

The energy shaping controller for swing-up presented here is a pretty faithful representative from the field of nonlinear underactuated control. Typically these control derivations require some clever tricks for simplifying or canceling out terms in the nonlinear equations, then some clever Lyapunov function to prove stability. In these cases, PFL was used to simplify the equations, and therefore the controller design.

These controllers are important, representative, and relevant. But clever tricks with nonlinear equations seem to be fundamentally limited. Most of the rest of the material presented in this book will emphasize more general computational approaches to formulating and solving these and other control problems.

The acrobot and cart-pole systems are just two of the model systems used heavily in underactuated control research. Other examples include:

- Pendubot
- Inertia wheel pendulum
- Furuta pendulum (horizontal rotation and vertical pendulum)
- Hovercraft

Practical legged locomotion is one of the fundamental unsolved problems in robotics. Many challenges are in mechanical design - a walking robot must carry all of its actuators and power, making it difficult to carry ideal force/torque - controlled actuators. But many of the unsolved problems are because walking robots are underactuated control systems.

In this chapter we'll introduce some of the simple models of walking and robots, the control problems that result, and a very brief summary of some of the control solutions described in the literature. Compared to the robots that we have studied so far, our investigations of legged locomotion will require additional tools for thinking about limit cycle dynamics and dealing with impacts.

One of the earliest models of walking was proposed by
McMahon

McGeer

The most elementary model of passive dynamic walking, first used in
the context of walking by

- Collisions with ground are inelastic and impulsive (only angular momentum is conserved around the point of collision).
- The stance foot acts as a pin joint and does not slip.
- The transfer of support at the time of contact is instantaneous (no double support phase).
- $0 \le \gamma < \frac{\pi}{2}$, $0 < \alpha < \frac{\pi}{2}$, $l > 0$.

Note that the coordinate system used here is slightly different than for the simple pendulum ($\theta=0$ is at the top, and the sign of $\theta$ has changed).

The most comprehensive analysis of the rimless wheel was done by

The dynamics of the system when one leg is on the ground are given by $$\ddot\theta = \frac{g}{l}\sin(\theta).$$ If we assume that the system is started in a configuration directly after a transfer of support ($\theta(0^+) = \gamma-\alpha$), then forward walking occurs when the system has an initial velocity, $\dot\theta(0^+) > \omega_1$, where $$\omega_1 = \sqrt{ 2 \frac{g}{l} \left[ 1 - \cos \left (\gamma-\alpha \right) \right]}.$$ $\omega_1$ is the threshold at which the system has enough kinetic energy to vault the mass over the stance leg and take a step. This threshold is zero for $\gamma = \alpha$ and does not exist for $\gamma > \alpha$. The next foot touches down when $\theta(t) = \gamma+\alpha$, at which point the conversion of potential energy into kinetic energy yields the velocity $$\dot\theta(t^-) = \sqrt{ \dot\theta^2(0^+) + 4\frac{g}{l} \sin\alpha \sin\gamma }.$$ $t^-$ denotes the time immediately before the collision.

The angular momentum around the point of collision at time $t$ just before the next foot collides with the ground is $$L(t^-) = -ml^2\dot\theta(t^-) \cos(2\alpha).$$ The angular momentum at the same point immediately after the collision is $$L(t^+) = -ml^2\dot\theta(t^+).$$ Assuming angular momentum is conserved, this collision causes an instantaneous loss of velocity: $$\dot\theta(t^+) = \dot\theta(t^-) \cos(2\alpha).$$

Given the stance dynamics, the collision detection logic ($\theta = \gamma \pm \alpha$), and the collision update, we can numerically simulate this simple model. Doing so reveals something remarkable... the wheel starts rolling, but then one of two things happens: it either runs out of energy and stands still, or it quickly falls into a stable periodic solution. Convergence to this stable periodic solution appears to happen from a huge range of initial conditions. Try it yourself.

```
cd(fullfile(getDrakePath,'examples','RimlessWheel'));
RimlessWheelPlant.run();
```

Optionally you can specify the initial conditions. Here are a few good values to try. (You can
either run the code yourself or click on the links below to see the animation).
One of the fantastic things about the rimless wheel is that the dynamics are exactly the undamped simple pendulum that we've thought so much about. So you will recognize the phase portraits of the system below - they are centered around the unstable fixed point of the simple pendulum. Phase plots of the trajectories from the various initial conditions recommended above are plotted below.

Take a minute to make sure you can follow these trajectories through state space for each of the initial conditions. It's non-trivial, but worth the effort.

A limit cycle is an asymptotically stable or unstable periodic
orbit†

$$\ddot{q} + \mu (q^2 - 1)\dot{q} + q = 0, \quad \mu>0$$ One can think of this system as almost a simple spring-mass-damper system, except that it has nonlinear damping. In particular, the velocity term dissipates energy when $|q|>1$, and adds energy when $|q|<1$. Therefore, it is not terribly surprising to see that the system settles into a stable oscillation from almost any initial conditions (the exception is the state $q=0,\dot{q}=0$). This can be seen nicely in the phase portrait below.

However, as you can see from the plot of system trajectories in the time domain, then a slightly different picture emerges. Although the phase portrait clearly reveals that all trajectories converge to the same orbit, the time domain plot reveals that these trajectories do not necessarily synchronize in time.

The Van der Pol oscillator clearly demonstrates what we would think of as a stable limit cycle, but also exposes the subtlety in defining this limit cycle stability. Neighboring trajectories do not necessarily converge on a stable limit cycle. In contrast, defining the stability of a particular trajectory (parameterized by time) is relatively easy.

Let's make a few quick points about the existence of closed-orbits.
If we can define a closed region of phase space which does not contain
any fixed points, then it must contain a
closed-orbit

One definition for the stability of a limit cycle uses the method of
Poincaré. Let's consider an $n$ dimensional dynamical system,
$\dot{\bx} = {\bf f}(\bx).$ Define an $n-1$ dimensional *surface
of section*, $S$. We will also require that $S$ is transverse to the
flow (i.e., all trajectories starting on $S$ flow through $S$, not
parallel to it). The Poincaré map (or return map) is a mapping from
$S$ to itself: $$\bx_p[n+1] = {\bf P}(\bx_p[n]),$$ where $\bx_p[n]$
is the state of the system at the $n$th crossing of the surface of
section. Note that we will use the notation $\bx_p$ to distinguish the
state of the discrete-time system from the continuous time system;
they are related by $\bx_p[n] = \bx(t_c[n])$, where $t_c[n]$ is the
time of the $n$th crossing of $S$.

Since the full system is two dimensional, the return map dynamics are one dimensional. One dimensional maps, like one dimensional flows, are amenable to graphical analysis. To define a Poincaré section for the Van der Pol oscillator, let $S$ be the line segment where $q = 0$, $\dot{q} > 0$.

If ${\bf P}(\bx_p)$ exists for all $\bx_p \in S$, then this method turns the stability analysis for a limit cycle into the stability analysis of a fixed point on a discrete map. In practice it is often difficult or impossible to find ${\bf P}$ analytically, but it can be obtained quite reasonably numerically. Once ${\bf P}$ is obtained, we can infer local limit cycle stability with an eigenvalue analysis. There will always be a single eigenvalue of 0 - corresponding to perturbations along the limit cycle which do not change the state of first return. The limit cycle is considered locally exponentially stable if all remaining eigenvalues, $\lambda_i$, have magnitude less than one, $|\lambda_i|<1$.

In fact, it is often possible to infer more global stability
properties of the return map by examining ${\bf P}$.
*unimodal* maps.

A particularly graphical method for understanding the dynamics of a one-dimensional iterated map is with the staircase method. Sketch the Poincaré map and also the line of slope one. Fixed points are the crossings with the unity line. Asymptotically stable if $|\lambda| < 1$. Unlike one dimensional flows, one dimensional maps can have oscillations (happens whenever $\lambda < 0$).

We can now derive the angular velocity at the beginning of each
stance phase as a function of the angular velocity of the previous
stance phase. First, we will handle the case where $\gamma \le
\alpha$ and $\dot\theta_n^+ > \omega_1$. The "step-to-step return
map", factoring losses from a single collision, the resulting map is:
$$\dot\theta^{+}_{n+1} = \cos(2\alpha) \sqrt{
({\dot\theta_n}^{+})^{2} + 4\frac{g}{l}\sin\alpha \sin\gamma}.$$
where the $\dot{\theta}^{+}$ indicates the velocity just
*after* the energy loss at impact has occurred.

Using the same analysis for the remaining cases, we can complete the return map. The threshold for taking a step in the opposite direction is $$\omega_2 = - \sqrt{2 \frac{g}{l} [1 - \cos(\alpha + \gamma)]}.$$ For $\omega_2 < \dot\theta_n^{+} < \omega_1,$ we have $$\dot\theta_{n+1}^{+} = -\dot\theta_n^{+} \cos(2\alpha).$$ Finally, for $\dot\theta_n^{+} < \omega_2$, we have $$\dot\theta_{n+1}^{+} = - \cos(2\alpha)\sqrt{(\dot\theta_n^{+})^2 - 4 \frac{g}{l} \sin\alpha \sin\gamma}.$$

Notice that the return map is undefined for $\dot\theta_n = \{ \omega_1, \omega_2 \}$, because from these configurations, the wheel will end up in the (unstable) equilibrium point where $\theta = 0$ and $\dot\theta = 0$, and will therefore never return to the map.

This return map blends smoothly into the case where $\gamma > \alpha$. In this regime, $$\dot\theta_{n+1}^{+} = \begin{cases} \cos(2\alpha) \sqrt{(\dot\theta_n^{+})^2 + 4 \frac{g}{l} \sin\alpha \sin\gamma}, & \text{ } 0 \le \dot\theta_n^{+} \\ -\dot\theta_n^{+} \cos(2\alpha), & \text{ } \omega_2 < \dot\theta_n^{+} < 0 \\ -\cos(2\alpha) \sqrt{(\dot\theta_n^{+})^2 - 4\frac{g}{l} \sin\alpha \sin\gamma}, & \dot\theta_n^{+} \le w_2 \end{cases}.$$ Notice that the formerly undefined points at $\{\omega_1,\omega_2\}$ are now well-defined transitions with $\omega_1 = 0$, because it is kinematically impossible to have the wheel statically balancing on a single leg.

For a fixed point, we require that $\dot\theta_{n+1}^{+} = \dot\theta_n^{+} = \omega^*$. Our system has two possible fixed points, depending on the parameters: $$ \omega_{stand}^* = 0,~~~ \omega_{roll}^* = \cot(2 \alpha) \sqrt{4 \frac{g}{l} \sin\alpha\sin\gamma}.$$ The limit cycle plotted above illustrates a state-space trajectory in the vicinity of the rolling fixed point. $\omega_{stand}^*$ is a fixed point whenever $\gamma < \alpha$. $\omega_{roll}^*$ is a fixed point whenever $\omega_{roll}^* > \omega_1$. It is interesting to view these bifurcations in terms of $\gamma$. For small $\gamma$, $\omega_{stand}$ is the only fixed point, because energy lost from collisions with the ground is not compensated for by gravity. As we increase $\gamma$, we obtain a stable rolling solution, where the collisions with the ground exactly balance the conversion of gravitational potential to kinetic energy. As we increase $\gamma$ further to $\gamma > \alpha$, it becomes impossible for the center of mass of the wheel to be inside the support polygon, making standing an unstable configuration.

The rimless wheel models only the dynamics of the stance leg, and simply assumes that there will always be a swing leg in position at the time of collision. To remove this assumption, we take away all but two of the spokes, and place a pin joint at the hip. To model the dynamics of swing, we add point masses to each of the legs. For actuation, we first consider the case where there is a torque source at the hip - resulting in swing dynamics equivalent to an Acrobot (although in a different coordinate frame).

In addition to the modeling assumptions used for the rimless wheel,
we also assume that the swing leg retracts in order to clear the
ground without disturbing the position of the mass of that leg. This
model, known as the compass gait, is well studied in the literature
using numerical methods

The state of this robot can be described by 4 variables:
$\theta_{st},\theta_{sw},\dot\theta_{st}$, and $\dot\theta_{sw}$.
The abbreviation $st$ is shorthand for the stance leg and $sw$ for
the swing leg. Using $\bq = [ \theta_{sw}, \theta_{st} ]^T$ and $\bu =
\tau$, we can
write the dynamics as $${\bf H}(\bq)\ddot\bq + {\bf
C}(\bq,\dot\bq)\dot\bq + {\bf G}(\bq) = {\bf B}\bu,$$ with \begin{gather*}
{\bf H} = \begin{bmatrix}
mb^2~~ & ~~-mlb\cos(\theta_{st}-\theta_{sw}) \\
-mlb\cos(\theta_{st}-\theta_{sw})~~ & ~~(m_h+m)l^2 + ma^2
\end{bmatrix} \\
{\bf C} = \begin{bmatrix} 0~~ & ~~mlb\sin(\theta_{st}-\theta_{sw})\dot\theta_{st} \\ mlb\sin(\theta_{st}-\theta_{sw})\dot\theta_{sw}~~ & ~~0 \end{bmatrix} \\
{\bf G} = \begin{bmatrix} mbg\sin(\theta_{sw}) \\ -(m_hl + ma +
ml)g\sin(\theta_{st}) \end{bmatrix},\\
{\bf B} = \begin{bmatrix} 1 \\ -1 \end{bmatrix}
\end{gather*}
and $l=a+b$. These equations come straight out of

The foot collision is an instantaneous change of velocity governed by the conservation of angular momentum around the point of impact: $${\bf Q}^+(\alpha) \dot{\bq}^+ = {\bf Q}^-(\alpha)\dot{\bq}^-,$$ where \begin{gather*} {\bf Q}^-(\alpha) = \begin{bmatrix} -mab~~ & ~~-mab + (m_hl^2 + 2mal) \cos(2\alpha)\\ 0~~ & ~~-mab \end{bmatrix} \\ {\bf Q}^+(\alpha) = \begin{bmatrix} mb(b - l\cos(2\alpha))~~ & ~~ml(l-b\cos(2\alpha) + ma^2 + m_hl^2 \\ mb^2~~ & ~~-mbl\cos(2\alpha) \end{bmatrix} \end{gather*} and $\alpha = \frac{\theta_{sw} - \theta_{st}}{2}$.

Numerical integration of these equations reveals a stable limit
cycle, plotted below. The cycle
is composed of a swing phase (top) and a stance phase (bottom),
punctuated by two instantaneous changes in velocity which correspond
to the ground collisions. The dependence of this limit cycle on the
system parameters has been studied extensively in

The basin of attraction of the stable limit cycle is a narrow band of states surrounding the steady state trajectory. Although the simplicity of this model makes it analytically attractive, this lack of stability makes it difficult to implement on a physical device.

To achieve a more anthropomorphic gait, as well as to acquire better
foot clearance and ability to walk on rough terrain, we want to
model a walker that includes knee

At the beginning of each step, the system is modeled as a three-link
pendulum, like the ballistic
walker

After this collision, the knee is locked and we switch to the compass gait model with a different mass distribution. In other words, the system becomes a two-link pendulum. Again, the heel-strike is modeled as an inelastic collision. After the collision, the legs switch instantaneously. After heel-strike then, we switch back to the ballistic walker's three-link pendulum dynamics. This describes a full step cycle of the kneed walker, which is shown above.

By switching between the dynamics of the continuous three-link and two-link pendulums with the two discrete collision events, we characterize a full point-feed kneed walker walking cycle. After finding appropriate parameters for masses and link lengths, a stable gait is found. As with the compass gait's limit cycle, there is a swing phase (top) and a stance phase (bottom). In addition to the two heel-strikes, there are two more instantaneous velocity changes from the knee-strikes as marked in the figure. This limit cycle is traversed clockwise.

The passive dynamic walkers and hopping robots described in the last chapter capture the fundamental dynamics of legged locomotion -- dynamics which are fundamentally nonlinear and punctuated by impulsive events due to making and breaking contact with the environment. But if you start reading the literature on humanoid robots, or many-legged robots like quadrupeds, then you will find a quite different set of ideas taking center stage: ideas like the "zero-moment point" and footstep planning. The goal of this chapter is to penetrate that world.

I'd like to start the discussion with a model that might seem quite far from the world of legged robots, but I think it's a very useful way to think about the problem.

Imagine you have a flying vehicle modeled as a single rigid body in a gravity field with some number of force "thrusters" attached. We'll describe the configuration of the vehicle by its orientation, $\theta$ and the location of its center of mass $(x,z)$. The vector from the center of mass to thruster $i$ is given by $r_i$, yielding the equations of motion: \begin{align*} m\ddot{x} =& \sum_i F_{i,x},\\ m\ddot{z} =& \sum_i F_{i,z} - mg,\\ I\ddot\theta =& \sum_i \left[ r_i \times F_i \right]_y, \end{align*} where I've used $F_{i,x}$ to represent the component of the force in the $x$ direction and have taken just the $y$-axis component of the cross product on the last line.

Our goal is to move the hovercraft to a desired position/orientation and to keep it there. If we take the inputs to be $F_i$, then the dynamics are affine (linear plus a constant term). As such, we can stabilize a stabilizable fixed point using a change of coordinates plus LQR or even plan/track a desired trajectory using time-varying LQR. If I were to add additional linear constraints, for instance constraining $F_{min} \le F_i \le F_{max}$, then I can still use linear model-predictive control (MPC) to plan and stabilize a desired motion. By most accounts, this is a relatively easy control problem. (Note that it would be considerably harder if my control input or input constraints depend on the orientation, $\theta$, but we don't need that here yet).

Now imagine that you just a small number of thrusters (let's say two) each with severe input constraints. To make things more interesting, let us say that you're allowed to move the thrusters, so $\dot{r}_i$ becomes an additional control input, but with some important limitations: you have to turn the thrusters off when they are moving (e.g. $|F_i||\dot{r}_i| = 0$) and there is a limit to how quickly you can move the thrusters $|\dot{r}|_i \le v_{max}$. This problem is now a lot more difficult, due especially to the fact that constraints of the form $u_1 u_2 = 0$ are non-convex.

I find this a very useful thought exercise; somehow our controller needs to make an effectively discrete decision to turn off a thruster and move it. The framework of optimal control should support this - you are sacrificing a short-term loss of control authority for the long-term benefit of having the thruster positioned where you would like, but we haven't developed tools yet that deal well with this discrete plus continuous decision making. We'll need to address that here.

Unfortunately, although the problem is already difficult, we need to continue to add constraints. Now let's say additionally, that the thrusters can only be turned on in certain locations, as cartooned here:

The union of these regions need not form a convex set. Furthermore, these locations could be specified in the robot frame, but may also be specified in the world frame, which I'll call ${\bf p}_i$: $${\bf p}_i = {\bf r}_i - \begin{bmatrix} x \\ 0 \\ z \end{bmatrix}.$$ This problem still feels like it should be tractable, but it's definitely getting hard.

In my view, the hovercraft problem above is a central component of the walking problem. If we consider a walking robot with massless legs, then the feet are exactly like movable thrusters. As above, they are highly constrained - they can only produce force when they are touching the ground, and (typically) they can only produce forces in certain directions, for instance as described by a "friction cone" (you can push on the ground but not pull on the ground, and with Coulomb friction the forces tangential to the ground must be smaller than the forces normal to the ground, as described by a coefficient of friction, e.g. $|F_{\parallel}| < \mu |F_{\perp}|$).

The constraints on where you can put your feet / thrusters will depend on the kinematics of your leg, and the speed at which you can move them will depend on the full dynamics of the leg -- these are difficult constraints to deal with. But the actual dynamics of the rigid body are actually still affine, and still simple!

We don't actually need to have massless legs for this discussion to make sense.
If we use the coordinates $x,z$ to describe the location of the center of mass (CoM) of the
entire robot, and $m$ to represent the entire mass of the robot, then the first two equations
remain unchanged. The center of mass is a configuration dependent point, and does not have an orientation, but one important generalization of the orientation dynamics is given by the centroidal momentum matrix, $A(\bq)$, where $A(\bq)\dot{\bq}$ captures the linear and angular momentum of the robot around the center of mass

In the previous chapter we devoted relatively a lot of attention to dynamics of impact, characterized for instance by a guard that resets dynamics in a hybrid dynamical system. In those models we used impulsive ground-reaction forces (these forces are instantaneously infinite, but doing finite work) in order to explain the discontinuous change in velocity required to avoid penetration with the ground. This story can be extended naturally to the dynamics of the center of mass.

For an articulated robot, though, there are a number of possible strategies for the legs which can effect the dynamics of the center of mass. For example, if the robot hits the ground with a stiff leg like the rimless wheel, then the angular momentum about the point of collision will be conserved, but any momentum going into the ground will be lost. However, if the robot has a spring leg and a massless toe like the SLIP model, then no energy need be lost.

One strategy that many highly-articulated legged robots use is to keep their center of mass at a constant height, $$z = c \quad \Rightarrow \quad \dot{z} = \ddot{z} = 0,$$ and minimize their angular momentum about the center of mass (here $\ddot\theta=0$). Using this strategy, if the swinging foot lands roughly below the center of mass, then even with a stiff leg there is no energy dissipated in the collision - all of the momentum is conserved. This often (but does not always) justify ignoring the impacts in the center of mass dynamics of the system.

While not the only important case, it is extremely common for our robots to be walking over flat, or nearly flat terrain. In this situation, even if the robot is touching the ground in a number of places (e.g., two heels and two toes), the constraints on the center of mass dynamics can be summarized very efficiently.

First, on flat terrain $F_{i,z}$ represents the force that is normal to the surface at contact point $i$. If we assume that the robot can only push on the ground (and not pull), then this implies $$\forall i, F_{i,z} \ge 0 \Rightarrow \sum_i F_{i,z} \ge 0 \Rightarrow \ddot{z} \ge -g.$$ In other words, if I cannot pull on the ground, then my center of mass cannot accelerate towards the ground faster than gravity.

Furthermore, if we use a Coulomb model of friction on the ground, with friction coefficient $\mu$, then $$\forall i, |F_{i,x}| \le \mu F_{i,z} \Rightarrow \sum_i |F_{i,x}| \le \mu \sum_i F_z \Rightarrow |\ddot{x}| \le \mu (\ddot{z} + g).$$ For instance, if I keep my center of mass at a constant height, then $\ddot{z}=0$ and $|\ddot{x}| \le \mu g$; this is a nice reminder of just how important friction is if you want to be able to move around in the world.

Even better, let us define the "center of pressure" (CoP) as the point on the ground where $$x_{cop} = \frac{\sum_i p_{i,x} F_{i,z}}{\sum_i F_{i,z}},$$ and since all $p_{i,z}$ are equal on flat terrain, $z_{cop}$ is just the height of the terrain. It turns out that the center of pressure is a "zero-moment point" (ZMP) -- a property that we will demonstrate below -- and moment-balance equation gives us a very important relationship between the location of the CoP and the dynamics of the CoM: \[ (m\ddot{z} + mg) (x_{cop} - x) = (z_{cop} - z) m\ddot{x} - I\ddot\theta. \] If we use the strategy proposed above for ignoring collision dynamics, $\ddot{z} = \ddot{\theta} = 0$, then we have $z_{cop} - z$ is a constant height $h$, and the result is the famous "ZMP equations": \[ \ddot{x} = -\frac{g}{h} (x_{cop}-x). \] So the location of the center of pressure completely determines the acceleration of the center of mass, and vice versa! What's more, this relationship is affine -- a property that we can exploit in a number of ways.

As an example, we can easily relate constraints on the CoP to constraints on $\ddot{x}$. It is easy to verify from the definition that the CoP must live inside the convex hull of the ground contact points. Therefore, if we use the $\ddot{z}=\ddot\theta=0$ strategy, then this directly implies strict bounds on the possible accelerations of the center of mass given the locations of the ground contacts.

The zero-moment point (ZMP) is discussed very frequently in the current literature
on legged robots. It also has an unfortunate tendency to be surrounded by some confusion; a number of people have defined the ZMP is slightly different ways (see e.g.

First let us recall that for rigid body systems I can always summarize the
contributions from many external forces as a single *wrench* (force and torque) on the object -- this is simply because the $F_i$ terms enter our equations of motion linearly. Specifically, given any point in space, $r$, in coordinates relative to $(x,z)$:

I can re-write the equations of motion as \begin{align*} m\ddot{x} =& \sum_i F_{i,x} = F_{net,x},\\ m\ddot{z} =& \sum_i F_{i,z} - mg = F_{net,z},\\ I\ddot\theta =& \sum_i \left[ r_i \times F_i \right]_y = ({\bf r} \times {\bf F}_{net})_y + \tau_{net}, \end{align*} where ${\bf F}_{net} = \sum_i {\bf F}_i$ and the value of $\tau_{net}$ depends on the location ${\bf r}$. For some choices of ${\bf r}$, we can make $\tau_{net}=0$. Examining \[ ({\bf r} \times {\bf F}_{net})_y = r_z F_{net,x} - r_x F_{net,z} = \left[ r_i \times F_i \right]_y, \] we can see that when $F_{net,z} > 0$ there is an entire line of solutions, $r_x = a r_z + b$, including one that will intercept the terrain. For walking robots, it is this point on the ground from which the external wrench can be described by a single force vector (and no moment) that is the famous "zero-moment point" (ZMP). Substituting the back in to replace $F_{net}$, we can see that \[ r_x = \frac{r_z m \ddot{x} - I \ddot\theta}{m\ddot{z} + mg}. \] If we assume that $\ddot{z}=\ddot{theta}=0$ and replace the relative coordinates with the global coordinates ${\bf r} = {\bf p} - [x,0,z]^T$, then we arrive at precisely the equation presented above.

Furthermore, since
\[\left[ r_i \times F_i \right]_y = \sum_i \left( r_{i,z} F_{i,x} - r_{i,x} F_{i,z} \right), \] and for *flat terrain* we have \[ r_z F_{net,x} = \sum_i r_{i,z} F_{i,x}, \] then we can see that this ZMP is exactly the CoP: \[ r_x = \frac{\sum_i r_{i,x} F_{i,z}}{ F_{net,z} }. \]

In three dimensions, we solve for the point on the ground where $\tau_{net,y} = \tau_{net,x} = 0$, but allow $\tau_{net,z} \ne 0$ to extract the analogous equations in the $y$-axis: \[ r_y = \frac{r_z m \ddot{y} - I \ddot\theta}{m\ddot{z} + mg}. \]

Coming soon. For a description of our approach with Atlas, see

Coming soon. For a description of our approach with Atlas, see

Coming soon. For a description of our approach with Atlas, see

My goals for this chapter are to build intuition for the beautiful and rich behavior of nonlinear dynamical system that are subjected to random (noise/disturbance) inputs. So far we have focused primarily on systems described by \[ \dot{\bx}(t) = f(\bx(t),\bu(t)) \quad \text{or} \quad \bx[n+1] = f(\bx[n],\bu[n]). \] In this chapter, I would like to broaden the scope to think about \[ \dot{\bx}(t) = f(\bx(t),\bu(t),\bw(t)) \quad \text{or} \quad \bx[n+1] = f(\bx[n],\bu[n],\bw[n]), \] where this additional input $\bw$ is the (vector) output of some random process. In other words, we can begin thinking about stochastic systems by simply understanding the dynamics of our existing ODEs subjected to an additional random input.

This form is extremely general as written. $\bw(t)$ can represent time-varying random disturbances (e.g. gusts of wind), or even constant model errors/uncertainty. One thing that we are not adding, yet, is measurement uncertainty. There is a great deal of work on observability and state estimation that study the question of how you can infer the true state of the system given noise sensor readings. For this chapter we are assuming perfect measurements of the full state, and are focused instead on the way that "process noise" shapes the long-term dynamics of the system.

I will also stick primarily to discrete time dynamics for this chapter, simply because it is
easier to think about the output of a discrete-time random process, $\bw[n]$, than a $\bw(t)$.
But you should know that all of the ideas work in continuous time, too. Also, most of our examples
will take the form of *additive noise*:
\[ \bx[n+1] = f(\bx[n],\bu[n]) + \bw[n], \]
which is a particular useful and common specialization of our general form. And this form
doesn't give up much -- the disturbance on step $k$ can pass through the nonlinear function $f$ on step $k+1$
giving rich results -- but is often much easier to work with.

Let's start by looking at some simple examples.

Let's consider (a time-reversed version of) one of my favorite one-dimensional systems:
\[ \dot{x} = x - x^3. \]

A reasonable discrete-time approximation of these dynamics, now with additive noise, is
\[ x[n+1] = x[n] + h (x[n] - x[n]^3 + w[n]). \]
When $w[n]=0$, this system has the same fixed points and stability properties as the continuous time system.
But let's examine the system when $w[n]$ is instead the result of a *zero-mean Gaussian white noise process*, defined by:
\begin{gather*}
\forall n, E\left[ w[n] \right] = 0,\\
E\left[ w[i]w[j] \right] = \begin{cases} \sigma^2, & \text{ if } i=j,\\ 0, & \text{ otherwise.} \end{cases}
\end{gather*}
Here $\sigma$ is the standard deviation of the Gaussian.

When you simulate this system for small values of $\sigma$, you will see trajectories move roughly towards one of the two fixed points (for the deterministic system), but every step is modified by the noise. In fact, even if the trajectory were to arrive exactly at what was once a fixed point, it is almost surely going to move again on the very next step. In fact, if we plot many runs of the simulation from different initial conditions all on the same plot, we will see something like the figure below.

You can run this example yourself

```
cd(fullfile(getDrakePath,'examples','test'));
options.plot_histogram = false;
options.time_reversed_system = true;
simpleCTParticleDemo(options);
```

During any individual simulation, the state jumps around randomly for all time, and could even transition from the vicinity of one fixed point to the other fixed point. Another way to visualize this output is to animate a histogram of the particles over time.

```
cd(fullfile(getDrakePath,'examples','test'));
options.plot_histogram = true;
options.time_reversed_system = true;
simpleCTParticleDemo(options);
```

Let's take a moment to appreciate the implications of what we've just observed.
Every time that we've analyzed a system to date, we've asked questions like "given x[0], what is the long-term behavior of the system, $\lim_{n\rightarrow\infty} x[n]$?",
but now $x[n]$ is a *random variable*. The trajectories of this system do not converge, and the system does not exhibit any form of stability that we've introduced so far.

All is not lost. If you watch the animation closely, you might notice the *distribution* of this random variable is actually very well behaved. This is the key idea for this chapter.

Let us use $p_n(x)$ to denote the probability density function over the random variable $x$ at time $n$.
Note that this density function must satisfy \[ \int_{-\infty}^\infty p_n(x) dx = 1.\]
It is actually possible to write the *dynamics of the probability density* with the simple relation
\[ p_{n+1}(x) = \int_{-\infty}^\infty p(x|x') p_n(x') dx', \]
where $p(x|x')$ encodes the stochastic dynamics as a conditional distribution of the next state (here $x$) as a function
of the current state (here $x'$). Dynamical systems that can be encoded in this way are known as *continuous-state Markov Processes*,
and the governing equation above is often referred to as the "master equation" for the stochastic process. In fact this update is even linear(!) ; a fact that can enable
closed-form solutions to some impressive long-term statistics, like mean time to failure or first passage times

The slightly more general form of the master equation, which works for multivariable distributions with state-domain ${\cal X}$, and systems with control inputs $\bu$, is \[ p_{n+1}(\bx) = \int_{\cal X} p(\bx|\bx',\bu) p_n(\bx') d\bx'. \] This is a (continuous-state) Markov Decision Process.

In the example above, the histogram is our numerical approximation of the probability density. If you simulate
it a few times, you will probably believe that, although the individual trajectories of the system
*do not* converge, the probability distribution actually *does* converge to what's known
as a *stationary distribution* -- a fixed point of the master equation. Instead of thinking about
the dynamics of the trajectories, we need to start thinking about the dynamics of the distribution.

Let's consider the one-dimensional linear system with additive noise: \[ x[n+1] = a x[n] + w[n]. \] When $w[n]=0$, the system is stable (to the origin) for $-1 < a < 1$. Let's make sure that we can understand the dynamics of the master equation for the case when $w[n]$ is Gaussian white noise with standard deviation $\sigma$.

First, recall that the probability density function of a Gaussian with mean $\mu$ is given by \[ p(w) = \frac{1}{\sqrt{2 \pi \sigma^2}} e^{-\frac{(w-\mu)^2}{2\sigma^2}}. \] When conditioned on $x[n]$, the distribution given by the dynamics subjected to mean-zero Gaussian white noise is simply another Gaussian, with the mean given by $ax[n]$: \[ p(x[n+1]|x[n]) = \frac{1}{\sqrt{2 \pi \sigma^2}} e^{-\frac{(x[n+1]-ax[n])^2}{2\sigma^2}}, \] yielding the master equation \[ p_{n+1}(x) = \frac{1}{\sqrt{2 \pi \sigma^2}} \int_{-\infty}^\infty e^{-\frac{(x-ax')^2}{2\sigma^2}} p_n(x') dx'. \]

Now here's the magic. Let's push a distribution, $p_n(x)$, which is zero-mean, with standard deviation $\sigma_n$, through the master equation: \begin{align*} p_{n+1}(x) =& \frac{1}{\sqrt{2 \pi \sigma^2}}\frac{1}{\sqrt{2 \pi \sigma_n^2}} \int_{-\infty}^\infty e^{-\frac{(x-ax')^2}{2\sigma^2}} e^{-\frac{(x')^2}{2\sigma_n^2}} dx',\\ =& \frac{1}{\sqrt{2 \pi (\sigma^2+a^2 \sigma_n^2)}} e^{-\frac{x^2}{2(\sigma^2 + a^2 \sigma_n^2)}}. \end{align*} The result is another mean-zero Gaussian with $\sigma_{n+1}^2 = \sigma^2 + a^2 \sigma_n^2$. This result generalizes to the multi-variate case, too, and might be familiar to you e.g. from the process update of the Kalman filter.

Taking it a step further, we can see that a stationary distribution for this system is given by a mean-zero Gaussian with \[ \sigma_*^2 = \frac{\sigma}{1-a^2}. \] Note that this distribution is well defined when $-1 < a < 1$ (only when the system is stable).

The stationary distribution of the linear Gaussian system reveals the fundamental and general balance between two terms in the governing equations of any stochastic dynamical system: the stability of the deterministic system is bringing trajectories together (smaller $a$ means faster convergence of the deterministic system and results in a more narrow distribution), but the noise in the system is forcing trajectories apart (larger $\sigma$ means larger noise and results in a wider distribution).

Given how rich the dynamics can be for deterministic nonlinear systems, you can probably imagine that the possible long-term dynamics of the probability are also extremely rich. If we simply flip the signs in the dynamics we examined above, we'll get our next example:

Now let's consider the discrete-time approximation of \[ \dot{x} = -x + x^3, \] again with additive noise: \[ x[n+1] = x[n] + h (-x[n] + x[n]^3 + w[n]). \] With $w[n]=0$, the system has only a single stable fixed point (at the origin), whose deterministic region of attraction is given by $x \in (-1,1)$. If we again simulate the system from a set of random initial conditions and plot the histogram, we will see something like the figure below.

Be sure to watch the animation (or run the simulation):

```
cd(fullfile(getDrakePath,'examples','test'));
options.plot_histogram = true;
options.time_reversed_system = false;
simpleCTParticleDemo(options);
```

Now try increasing the noise covariance:

```
cd(fullfile(getDrakePath,'examples','test'));
options.plot_histogram = true;
options.time_reversed_system = false;
options.noise_covariance = 2;
simpleCTParticleDemo(options);
```

Click here to see the animation

What is the stationary distribution for this system? In fact, there isn't one. Although we initially collect probability density around the stable fixed point, you should notice a slow leak -- on every step there is some probability of transitioning past unstable fixed points and getting driven by the unstable dynamics off towards infinity. If we run the simulation long enough, there won't be any probability density left at $x=0$.

One more example; this is a fun one. Let's think about one of the simplest examples that we had for a system that demonstrates limit cycle stability -- the Van der Pol oscillator -- but now we'll add Gaussian white noise, $$\ddot{q} + (q^2 - 1)\dot{q} + q = w(t),$$ Here's the question: if we start with a small set of initial conditions centered around one point on the limit cycle, then what is the long-term behavior of this distribution?

Since the long-term behavior of the deterministic system is periodic, it would be very logical to think that the state distribution for this stochastic system would fall into a stable periodic solution, too. But think about it a little more, and then watch the animation (or run the simulation yourself).

```
cd(fullfile(getDrakePath,'examples'));
VanDerPol.particleDemo();
```

Click here to see the animation (first 20 seconds)

Click here to see the particles after 2000 seconds of simulation

The explanation is simple: the periodic solution of the system is only *orbitally stable*; there
is no stability along the limit cycle. So any disturbances that push the particles along the limit
cycle will go unchecked. Eventually, the distribution will "mix" along the entire cycle.
Perhaps surprisingly, this system that has a limit cycle stability when $w=0$ eventually
reaches a stationary distribution (fixed point) in the master equation.

Coming soon. Discretize, then closed form solutions using the transition matrix. Finite-time verification with sums of squares.

Coming soon. Read the paper

In chapter 2, we spent some time thinking about the phase portrait of the
simple pendulum, and concluded with a challenge: can we design a nonlinear
controller to *reshape* the phase portrait, with a very modest amount of
actuation, so that the upright fixed point becomes globally stable? With
unbounded torque, feedback linearization solutions (e.g., invert gravity) can
work well, but can also require an unnecessarily large amount of control effort.
The energy-based swing-up control solutions presented for the acrobot and
cart-pole systems are considerably more appealing, but required some cleverness
and might not scale to more complicated systems. Here we investigate another
approach to the problem, using computational optimal control to synthesize a
feedback controller directly.

In this chapter, we will introduce optimal control - a control design
process using optimization. This approach is powerful for a number of reasons.
First and foremost, it is very general - allowing us to specify the goal of
control equally well for fully- or under-actuated, linear or nonlinear,
deterministic or stochastic, and continuous or discrete systems. Second, it
permits concise descriptions of potentially very complex desired behaviours,
specifying the goal of control as an scalar objective (plus a list of
constraints). Finally, and most importantly, optimal control is very amenable
to numerical solutions.

The fundamental idea in optimal control is to formulate the goal of
control as the *long-term* optimization of a scalar cost function.
Let's introduce the basic concepts by considering a system that is even
simpler than the simple pendulum.

Consider the double integrator system $$\ddot{q} = u, \quad |u| \le 1.$$ If you would like a mechanical analog of the system (I always do), then you can think about this as a unit mass brick moving along the x-axis on a frictionless surface, with a control input which provides a horizontal force, $u$. The task is to design a control system, $u = \pi(\bx,t)$, $\bx=[q,\dot{q}]^T$ to regulate this brick to $\bx = [0,0]^T$.

In order to formulate this control design problem using optimal control, we must define a scalar objective which scores the long-term performance of running each candidate control policy, $\pi(\bx,t)$, from each initial condition, $(\bx_0,t_0)$, and a list of constraints that must be satisfied. For the task of driving the double integrator to the origin, one could imagine a number of optimal control formulations which would accomplish the task, e.g.:

- Minimum time: $\min_\pi t_f,$ subject to $\bx(t_0) = \bx_0,$ $\bx(t_f) = {\bf 0}.$
- Quadratic cost: $\min_\pi \int_{t_0}^{\infty} \left[ \bx^T(t) {\bf Q} \bx(t) \right] dt,$ ${\bf Q}\succ0$.

where the first is a constrained optimization formulation which optimizes time, and the second accrues a penalty at every instance according to the distance that the state is away from the origin (in a metric space defined by the matrix ${\bf Q}$), and therefore encourages trajectories that go more directly towards the goal, possibly at the expense of requiring a longer time to reach the goal (in fact it will result in an exponential approach to the goal, where as the minimum-time formulation will arrive at the goal in finite time). Note that both optimization problems are only well defined if it is possible for the system to actually reach the origin, otherwise the minimum-time problem cannot satisfy the terminal constraint, and the integral in the quadratic cost would not converge to a finite value as time approaches infinity (fortunately the double integrator system is controllable, and therefore can be driven to the goal in finite time).

Note that the input limits, $|u|\le1$ are also required to make this problem well-posed; otherwise both optimizations would result in the optimal policy using infinite control input to approach the goal infinitely fast. Besides input limits, another common approach to limiting the control effort is to add an additional quadratic cost on the input (or "effort"), e.g. $\int \left[ \bu^T(t) {\bf R} \bu(t) \right] dt,$ ${\bf R}\succ0$. This could be added to either formulation above. We will examine many of these formulations in some details in the examples worked out at the end of this chapter.

Optimal control has a long history in robotics. For instance, there has been a great deal of work on the minimum-time problem for pick-and-place robotic manipulators, and the linear quadratic regulator (LQR) and linear quadratic regulator with Gaussian noise (LQG) have become essential tools for any practicing controls engineer. With increasingly powerful computers and algorithms, the popularity of numerical optimal control has grown at an incredible pace over the last few years.

For more intuition, let's do an informal derivation of the solution to the minimum time problem for the double integrator with input constraints: \begin{align*} \minimize_{\pi} \quad & t_f\\ \subjto \quad & \bx(t_0) = \bx_0, \\ & \bx(t_f) = {\bf 0}, \\ & \ddot{q}(t) = u(t), \\ & |u| \le 1. \end{align*} What behavior would you expect an optimal controller exhibit?

Your intuition might tell you that the best thing that the brick can
do, to reach the goal in minimum time with limited control input, is
to accelerate maximally towards the goal until reaching a critical
point, then hitting the brakes in order to come to a stop exactly at
the goal. This would be called a *bang-bang* control policy;
these are often optimal for systems with bounded input, and it is in
fact optimal for the double integrator, although we will not prove it
until we have developed more tools.

Let's work out the details of this bang-bang policy. First, we can figure out the states from which, when the brakes are fully applied, the system comes to rest precisely at the origin. Let's start with the case where $q(0) < 0$, and $\dot{q}(0)>0$, and "hitting the brakes" implies that $u=-1$ . Integrating the equations, we have \begin{gather*} \ddot{q}(t) = u = -1 \\\dot{q}(t) = \dot{q}(0) - t \\ q(t) = q(0) + \dot{q}(0) t - \frac{1}{2} t^2. \end{gather*} Substituting $t = \dot{q}(0) - \dot{q}$ into the solution give $\dot{q}$ reveals that the system orbits are parabolic arcs: \[ q = -\frac{1}{2} \dot{q}^2 + c_{-}, \] with $c_{-} = q(0) + \frac{1}{2}\dot{q}^2(0)$.

Similarly, the solutions for $u=1$ are $q = \frac{1}{2} \dot{q}^2 + c_{+}$, with $c_{+}=q(0)-\frac{1}{2}\dot{q}^2(0)$.

Perhaps the most important of these orbits are the ones that pass directly through the origin (e.g., $c_{-}=0$). Following our initial logic, if the system is going slower than this $\dot{q}$ for any $q$, then the optimal thing to do is to slam on the accelerator ($u=-\text{sgn}(q)$). If it's going faster than the $\dot{q}$ that we've solved for, then still the best thing to do is to brake; but inevitably the system will overshoot the origin and have to come back. We can summarize this policy with: \[ u = \begin{cases} +1 & \text{if } (\dot{q} < 0 \text{ and } q \le \frac{1}{2} \dot{q}^2) \text{ or } (\dot{q}\ge 0 \text{ and } q < -\frac{1}{2} \dot{q}^2) \\ 0 & \text{if } q=0 \text{ and } \dot{q}=0 \\ -1 & \text{otherwise} \end{cases} \]

and illustrate some of the optimal solution trajectories:And for completeness, we can compute the optimal time to the goal by solving for the amount of time required to reach the switching surface plus the amount of time spent moving along the switching surface to the goal. With a little algebra, you will find that the time to goal, $T(\bx)$, is given by \[ T(\bx) = \begin{cases} 2\sqrt{\frac{1}{2}\dot{q}^2-q} - \dot{q} & \text{for } u=+1 \text{ regime}, \\ 0 & \text{for } u=0, \\ \dot{q} + 2\sqrt{\frac{1}{2}\dot{q}^2+q} & \text{for } u=-1, \end{cases} \] plotted here:

Notice that the function is continuous (though not smooth), even though the policy is discontinuous.
As we begin to develop theoretical and algorithmic tools for optimal control,
we will see that some formulations are much easier to deal with than others.
One important example is the dramatic simplification that can come from
formulating objective functions using *additive
cost*, because they often yield recursive solutions. In the additive cost formulation, the long-term "score"
for a trajectory can be written as $$\int_0^T g(x(t),u(t)) dt,$$ where
$g()$ is the instantaneous cost, and $T$ can be either a finite
real number or $\infty$. We will call a problem specification with a finite $T$ a
"finite-horizon" problem, and $T=\infty$ an "infinite-horizon"
problem. Problems and solutions for infinite-horizon problems tend to be more elegant, but
care is required to make sure that the integral converges for the
optimal controller (typically by having an achievable goal that allows the robot to accrue zero-cost).

At first glance, our minimum-time problem formulation for the double integrator does not look like an additive cost problem. However, we can write in as an additive cost problem using an infinite horizon and the instantaneous cost $$g(x,u) = \begin{cases} 0 & \text{if } x=0, \\ 1 & \text{otherwise.} \end{cases}$$

We will examine a number of approaches to solving optimal control
problems throughout the next few chapters. For the remainder of this chapter, we will
focus on additive-cost problems and their solution via *dynamic programming*.

For systems with continuous states and continuous actions, dynamic
programming is a set of theoretical ideas surrounding additive cost optimal
control problems. For systems with a finite, discrete set of states and a
finite, discrete set of actions, dynamic programming also represents a set of very
efficient numerical *algorithm* which can compute optimal feedback
controllers. Many of you will have learned it before as a tool for graph search.

Imagine you have a directed graph with states (or nodes) $\{s_1,s_2,...\} \in S$ and "actions" associated with edges labeled as $\{a_1,a_2,...\} \in A$, as in the following trivial example:

Let us also assume that each edge has an associate weight or cost, using $g(s,a)$ to denote the cost of being in state $s$ and taking action $a$. Furthermore we will denote the transition "dynamics" using \[ s[n+1] = f(s[n],a[n]). \] For instance, in the graph above, $f(s_1,a_1) = s_2$.

There are many algorithms for finding (or approximating) the optimal path
from a start to a goal on directed graphs. In dynamic programming, the key
insight is that we can find the shortest path from every node by solving
recursively for the optimal *cost-to-go* from every node to the goal. One
such algorithm starts by initializing an estimate $\hat{J}^*=0$ for all $s_i$,
then proceeds with an iterative algorithm which sets \begin{equation}
\hat{J}^*(s_i) \Leftarrow \min_{a \in A} \left[ g(s_i,a) +
\hat{J}^*\left({f(s_i,a)}\right) \right]. \label{eq:value_update} \end{equation}
In software, $\hat{J}^*$ can be represented as a vector with dimension equal to
the number of discrete states. This algorithm, appropriately known as *value
iteration*, is guaranteed to converge to the optimal cost-to-go, $\hat{J}^*
\rightarrow J^*$ *batch* -- e.g. the estimate is updated
for all $i$ at once -- but the *asynchronous* version where states are
updated one at a time is also known to converge, so long as every state is
eventually updated infinitely often. Assuming the graph has a goal state with a
zero-cost self-transition, then this cost-to-go function represents the weighted
shortest distance to the goal.

Value iteration is an amazingly simple algorithm, but it accomplishes
something quite amazing: it efficiently computes the long-term cost of an
optimal policy from *every* state by iteratively evaluating the one-step
cost. If we know the optimal cost-to-go, then it's easy to extract the optimal
policy, $a = \pi^*(s)$: \begin{equation} \pi^*(s_i) = \argmin_a \left[ g(s_i,a) +
J^*\left( f(s_i,a) \right) \right]. \label{eq:policy_update} \end{equation}
It's a simple algorithm, but playing with an example can help our intuition.

Imagine a robot living in a grid (finite state) world. Wants to get to the goal
location. Possibly has to negotiate cells with obstacles. Actions
are to move up, down, left, right, or do nothing.

You can experiment with the grid world example in

```
cd(fullfile(getDrakePath,'examples'));
GridWorld.runValueIteration();
```

I recommend trying to edit the cost function and obstacles.

TODO: figure/text for graph approximation of a continuous state space.

You can run value iteration for the double integrator (using barycentric interpolation to interpolate between nodes) in

```
cd(fullfile(getDrakePath,'examples'));
DoubleIntegrator.runValueIteration();
```

Again, you can easily try different cost functions by editing the code yourself.

Let's take a minute to appreciate how amazing this is. Our solution to finding the optimal controller for the double integrator wasn't all that hard, but it required some mechanical intuition and solutions to differential equations. The resulting policy was non-trivial -- bang-bang control with a parabolic switching surface. The value iteration algorithm doesn't use any of this directly -- it's a simple algorithm for graph search. But remarkably, it can generate effectively the same policy with just a few moments of computation.

It's important to note that there *are* some differences between the computed
policy and the optimal policy that we derived, due to discretization errors. We
will ask you to explore these in the problems.

The real value of this numerical solution, however, is unlike our analytical solution for the double integrator, we can apply this same algorithm to any number of dynamical systems virtually without modification. Let's apply it now to the simple pendulum, which was intractable analytically.

You can run value iteration for the simple pendulum (using barycentric interpolation to interpolate between nodes) in

```
cd(fullfile(getDrakePath,'examples','Pendulum'));
runValueIteration();
```

Again, you can easily try different cost functions by editing the code yourself.

I find the graph search algorithm extremely satisfying as a first step, but also become quickly frustrated by the limitations of the discretization required to use it. In many cases, we can do better; coming up with algorithms which work more natively on continuous dynamical systems. We'll explore those extensions in this section.

It's important to understand that the value iteration equations, equations (\ref{eq:value_update}) and (\ref{eq:policy_update}),
are more than just an algorithm. They are also sufficient conditions for optimality:
if we can produce a $J^*$ and $\pi^*$ which satisfy these equations, then $\pi^*$ must
be an optimal controller. There are an analogous set of conditions for the continuous
systems. For a system $\dot{\bx} = f(\bx,\bu)$ and an infinite-horizon
additive cost $\int_0^\infty g(\bx,\bu)dt$, we have:
\begin{gather} 0 = \min_\bu \left[ g(\bx,\bu) + \pd{J^*}{\bx}f(\bx,\bu) \right], \label{eq:HJB} \\ \pi^*(\bx) = \argmin_\bu \left[ g(\bx,\bu) + \pd{J^*}{\bx}f(\bx,\bu) \right]. \end{gather}
Equation \ref{eq:HJB} is known as the *Hamilton-Jacobi-Bellman* (HJB) equation.

The following statement is adapted from Proposition 3.2.1 of

As a tool for verifying optimality, the HJB equations are actually surprisingly easy to work with: we can verify optimality for an infinite-horizon objective without doing any integration; we simply have to check a derivative condition on the optimal cost-to-go function $J^*$. Let's see this play out on the double integrator example.

Consider the problem of regulating the double integrator (this time without input limits) to the origin using a quadratic cost: $$ g(\bx,\bu) = q^2 + \dot{q}^2 + u^2. $$ I claim (without derivation) that the optimal controller for this objective is $$\pi(\bx) = -q - \sqrt{3}\dot{q}.$$ To convince you that this is indeed optimal, I have produced the following cost-to-go function: $$J(\bx) = \sqrt{3} q^2 + 2 q \dot{q} + \sqrt{3} \dot{q}^2.$$

Taking \begin{gather*} \pd{J}{q} = 2\sqrt{3} q + 2\dot{q}, \qquad \pd{J}{\dot{q}} = 2q + 2\sqrt{3}\dot{q}, \end{gather*} we can write \begin{align*} g(\bx,\bu) + \pd{J}{\bx}f(\bx,\bu) &= q^2 + \dot{q}^2 + u^2 + (2\sqrt{3} q + 2\dot{q}) \dot{q} + (2q + 2\sqrt{3}\dot{q}) u \end{align*} This is a convex quadratic function in $u$, so we can find the minimum with respect to $u$ by finding where the gradient with respect to $u$ evaluates to zero. \[ \pd{}{u} \left[ g(\bx,\bu) + \pd{J}{\bx} f(\bx,\bu) \right] = 2u + 2q + 2\sqrt{3}\dot{q}. \] Setting this equal to $0$ and solving for $u$ yields: $$u^* = -q - \sqrt{3} \dot{q},$$ thereby confirming that our policy $\pi$ is in fact the minimizer. Substituting $u^*$ back into the HJB reveals that the right side does in fact simplify to zero. I hope you are convinced!

Note that evaluating the HJB for the time-to-go of the minimum-time problem for the double integrator will also reveal that the HJB is satisfied wherever that gradient is well-defined. This is certainly mounting evidence in support of our bang-bang policy being optimal, but since $\pd{J}{\bx}$ is not defined everywhere, it does not actually satisfy the requirements of the sufficiency theorem as stated above.

We still face a few barriers to actually using the HJB in an algorithm. The first barrier is the minimization over $u$. When the action set was discrete, as in the graph search version, we could evaluate the one-step cost plus cost-to-go for every possible action, and then simply take the best. For continuous action spaces, in general we cannot rely on the strategy of evaluating a finite number of possible $\bu$'s to find the minimizer.

All is not lost. In the quadratic cost double integrator example above, we were able to solve explicitly for the minimizing $\bu$ in terms of the cost-to-go. It turns out that this strategy will actually work for a number of the problems we're interested in, even when the system (which we are given) or cost function (which we are free to pick, but which should be expressive) gets more complicated.

Recall that I've already tried to convinced you that a majority of the
systems of interest are *control affine*, e.g. I can write \[
f(\bx,\bu) = f_1(\bx) + f_2(\bx)\bu. \] We can make another dramatic
simplification by restrict ourselves to instantaneous cost functions of the form \[
g(\bx,\bu) = g_1(\bx) + \bu^T {\bf R} \bu, \qquad {\bf R}={\bf R}^T
\succ 0. \] In my view, this is not very restrictive - many of the cost
functions that I find myself choosing to write down can be expressed in this
form. Given these assumptions, we can write the HJB as \[ 0 = \min_{\bu} \left[
g_1(\bx) + \bu^T {\bf R} \bu + \pd{J}{\bx} \left[ f_1(\bx) +
f_2(\bx)\bu \right]\right]. \] Since this is a quadratic function in $\bu$, if
the system does not have any constraints on $\bu$, then we can solve in closed-form
for the minimizing $\bu$ by taking the gradient of the right-hand size:
\[ \pd{}{\bu} = 2\bu^T {\bf R} + \pd{J}{\bx} f_2(\bx) = 0, \]
and setting it equal to zero to obtain \[ \bu^* = -\frac{1}{2}{\bf R}^{-1}f_2^T(\bx) \pd{J}{\bx}^T.\]
If there are linear constraints on the input, such as torque limits, then more generally
this could be solved (at any particular $\bx$) as a quadratic program.

What happens in the case where our system is not control affine or if we
really do need to specify an instantaneous cost function on $\bu$ that is not
simply quadratic? If the goal is to produce an iterative algorithm, like value
iteration, then one common approach is to make a (positive-definite) quadratic approximation in
$\bu$ of the HJB, and updating that approximation on every iteration of the
algorithm. This broad approach is often referred to as *differential
dynamic programming* (c.f.

The other major barrier to using the HJB in a value iteration algorithm is that the estimated optimal cost-to-go function, $\hat{J}^*$, must somehow be represented with a finite set of numbers, but we don't yet know anything about the potential form it must take. In fact, knowing the time-to-goal solution for minimum-time problem with the double integrator, we see that this function might need to be non-smooth for even very simple dynamics and objectives.

One natural way to parameterize $\hat{J}^*$ -- a scalar valued-function defined over the state space -- is to define the values on a mesh. This approach then admits algorithms with close ties to the relatively very advanced numerical methods used to solve other partial differential equations (PDEs), such as the ones that arrive in finite element modeling or fluid dynamics. One important difference, however, is that our PDE lives in the dimension of the state space, while many of the mesh algorithms from the other disciplines are optimized for two or three dimensional space.

A slightly more general view of the problem would describe the mesh (and the associated interpolation functions) as just one form of representations for function approximation.

If we approximate $J^*$ with a finitely-parameterized function $\hat{J}_\balpha^*$, with parameter vector $\balpha$, then this immediately raises many important questions:

- What if the true cost-to-go function does not live in the prescribed function class; e.g., there does not exist an $\balpha$ which satisfies the sufficiency conditions for all $\bx$? (This seems very likely to be the case.)
- What update should we apply in the iterative algorithm?
- What can we say about it's convergence?

In general, many of these questions will have to go unanswered. But there are some answers available
for the special case of *linear function approximators*. A linear function approximator takes the form: \[ \hat{J}^*_\balpha(\bx) = \sum_i \alpha_i \psi_i(\bx) = \balpha^T {\bf\psi}(\bx), \]
where ${\bf \psi}(\bx)$ is a vector of potentially nonlinear features. Common examples of features include
polynomials, radial basis functions, or the most interpolation schemes used on a mesh.

One of the most amazing features of the dynamic programming, additive cost approach to optimal control is the relative ease with which it extends to optimizing stochastic systems.

For discrete systems, we can generalize our dynamics on a graph by adding action-dependent transition probabilities to the edges. This new dynamical system is known as a Markov Decision Process (MDP), and we write the dynamics completely in terms of the transition probabilities \[ \Pr(s[n+1] = s' | s[n] = s, a[n] = a). \] For discrete systems, this is simply a big lookup table. The cost that we incur for any execution of system is now a random variable, and so we formulate the goal of control as optimizing the expected cost, e.g. \[ J^*(s[0]) = \min_{a[\cdot]} E \left[ \sum_{n=0}^\infty g(s[n],a[n]) \right]. \] Note that there are many other potential objectives, such as minimizing the worst-case error, but the expected cost is special because it preserves the dynamic programming recursion: \[ J^*(s) = \min_a E \left[ g(s,a) + J^*(s')\right] = \min_a \left[ g(s,a) + \sum_{s'} \Pr(s'|s,a) J^*(s') \right].\] Remarkably, if we use these optimality conditions to construct our value iteration algorithm \[ \hat{J}(s) \Leftarrow \min_a \left[ g(s,a) + \sum_{s'} \Pr(s'|s,a) \hat{J}(s') \right],\] then this algorithm has the same strong convergence guarantees of its counterpart for deterministic systems. And it is essentially no more expensive to compute!

There is a particularly cute trick that takes advantage of the computational simplicity of the discrete MDP value iteration algorithm. Let's assume that we have discrete control inputs and discrete-time dynamics, but a continuous state space. Imagine that we use a mesh to approximate the cost-to-go function over that state space with $K$ mesh points $\hat{J}(x_k)$. We would like to perform the value iteration update: \[ \forall k, \hat{J}^*(x_k) \Leftarrow \min_{a \in A} \left[ g(x_k,a) + \hat{J}^*\left({f(x_k,a)}\right) \right], \] but must deal with the fact that $f(x_k,a)$ might not result in a next state that is directly at a mesh point. Most interpolation schemes for a mesh can be written as some weighted combination of the values at nearby mesh points, e.g. \[ \hat{J}^*(x) = \sum_i \beta_i \hat{J}^*(x_i), \quad \sum_i \beta_i = 1 \] with $\beta_i$ the relative weight of the $i$th mesh point. The resulting update is exactly the same as if we treated the system as a discrete state MDP with $\beta_i$ representing the probability of transitioning to state $x_i$!

If you looked at the code powering the numerical value iteration results above, you will see that we used precisely this trick -- the continuous problem is discretized into the corresponding MDP and then solved as a discrete system. This results in nice clean code, but also sheds light on the impact of discretization on the solutions -- discretization error here will cause a sort of diffusion corresponding to the probability of spreading across neighboring nodes.

Note that the update above could also be viewed as an approximation of the fitted value iteration algorithm for linear function approximators described above. It is missing a few terms, but is easier to implement and remarkably, is still known to converge.

While solving the dynamic programming problem for continuous systems is very hard in general, there are a few very important special cases where the solutions are very accessible. Most of these involve variants on the case of linear dynamics and quadratic cost. The simplest case, called the linear quadratic regulator (LQR), is formulated as stabilizing a time-invariant linear system to the origin.

The linear quadratic regulator is likely the most important and influential result in optimal control theory to date. In this chapter we will derive the basic algorithm and a variety of useful extensions.

Consider a linear time-invariant system in state-space form, $$\dot{\bx} = {\bf A}\bx + {\bf B}\bu,$$ with the infinite-horizon cost function given by $$J = \int_0^\infty \left[ \bx^T {\bf Q} \bx + \bu^T {\bf R} \bu \right] dt, \quad {\bf Q} = {\bf Q}^T \ge {\bf 0}, {\bf R} = {\bf R}^T > 0.$$ Our goal is to find the optimal cost-to-go function $J^*(\bx)$ which satisfies the HJB: $$\forall \bx, \quad 0 = \min_\bu \left[ \bx^T {\bf Q} \bx + \bu^T {\bf R} \bu + \pd{J^*}{\bx} \left( {\bf A}\bx + {\bf B}\bu \right) \right].$$

There is one magic step here -- it is well known that for this problem the optimal cost-to-go function is quadratic. This is easy to verify. Let us choose the form: $$J^*(\bx) = \bx^T {\bf S} \bx, \quad {\bf S} = {\bf S}^T \succeq 0.$$ The gradient of this function is $$\pd{J^*}{\bx} = 2 \bx^T {\bf S}.$$

Since we have guaranteed, by construction, that the terms inside the $\min$ are quadratic and convex (because ${\bf R} \succ 0$), we can take the minimum explicitly by finding the solution where the gradient of those terms vanishes: $$\pd{}{\bu} = 2\bu^T {\bf R} + 2 \bx^T {\bf S} {\bf B} = 0.$$ This yields the optimal policy $$\bu^* = \pi^*(\bx) = - {\bf R}^{-1} {\bf B}^T {\bf S} \bx = - {\bf K} \bx.$$

Inserting this back into the HJB and simplifying yields
$$0 = \bx^T \left[ {\bf Q} - {\bf S B R}^{-1}{\bf B}^T{\bf S} + 2{\bf SA} \right]\bx.$$
All of the terms here are symmetric except for the $2{\bf SA}$, but since
$\bx^T{\bf SA}\bx = \bx^T{\bf A}^T{\bf S}\bx$, we can write
$$0 = \bx^T \left[ {\bf Q} - {\bf S B R}^{-1}{\bf B}^T{\bf S} + {\bf SA} + {\bf A}^T{\bf S} \right]\bx.$$
and since this condition must hold for all $\bx$, it is sufficient to consider
the matrix equation
$$0 = {\bf S}
{\bf A} + {\bf A}^T {\bf S} - {\bf S} {\bf B} {\bf R}^{-1} {\bf B}^T {\bf S} +
{\bf Q}.$$ This extremely important equation is a version of the *algebraic Riccati equation*.
Note that it is quadratic in ${\bf S}$, making its solution non-trivial,
but it is well known that the equation has a single positive-definite solution if and only
if the system is controllable and
there are good numerical methods for finding that solution, even in high-dimensional
problems. Both the optimal
policy and optimal cost-to-go function are available from MATLAB's Control Systems Toolbox by calling ```
[K,S] = lqr(A,B,Q,R)
```

.

It is worth examining the form of the optimal policy more closely. Since the value function represents cost-to-go, it would be sensible to move down this landscape as quickly as possible. Indeed, $-{\bf S}\bx$ is in the direction of steepest descent of the value function. However, not all directions are possible to achieve in state-space. $-{\bf B}^T {\bf S} \bx$ represents precisely the projection of the steepest descent onto the control space, and is the steepest descent achievable with the control inputs $\bu$. Finally, the pre-scaling by the matrix ${\bf R}^{-1}$ biases the direction of descent to account for relative weightings that we have placed on the different control inputs. Note that although this interpretation is straight-forward, the slope that we are descending (in the value function, ${\bf S}$) is a complicated function of the dynamics and cost.

LQR is extremely relevant to us even though our primary interest is in nonlinear dynamics, because it can provide a local approximation of the optimal control solution for the nonlinear system. Given the nonlinear system $\dot{\bx} = f(\bx,\bu)$, and a stabilizable operating point, $(\bx_0,\bu_0)$, with $f(\bx_0,\bu_0) = 0$. We can define a relative coordinate system $$\bar\bx = \bx - \bx_0, \quad \bar\bu = \bu - \bu_0,$$ and observe that $$\dot{\bar\bx} = \dot{\bx} = f(\bx,\bu),$$ which we can approximate with a first-order Taylor expansion to $$\dot{\bar\bx} \approx f(\bx_0,\bu_0) + \pd{f(\bx_0,\bu_0)}{\bx} (\bx - \bx_0) + \pd{f(\bx_0,\bu_0)}{\bu} (\bu - \bu_0) = {\bf A}\bar{\bx} + {\bf B}\bar\bu.$$

Similarly, we can define a quadratic cost function in the error coordinates, or take a (positive-definite) second-order approximation of a nonlinear cost function about the operating point (linear and constant terms in the cost function can be easily incorporated into the derivation by parameterizing a full quadratic form for $J^*$, as seen in the Linear Quadratic Tracking derivation below).

The resulting controller takes the form $\bar\bu^* = -{\bf K}\bar\bx$ or $$\bu^* = \bu_0 - {\bf K} (\bx - \bx_0).$$

Consider
systems governed by an LTI state-space equation of the form
$$\dot{\bx} = {\bf A}\bx + {\bf B}\bu,$$ and a finite-horizon cost
function with
\begin{gather*}
h(\bx) = \bx^T {\bf Q}_f \bx,\quad {\bf Q}_f = {\bf Q}_f^T \ge {\bf 0}
\\ g(\bx,\bu,t) = \bx^T {\bf Q} \bx + \bu^T {\bf R} \bu, \quad {\bf Q} =
{\bf Q}^T \ge 0, {\bf R}={\bf R}^T>0
\end{gather*}
Writing the HJB, we have
$$ 0 = \min_\bu \left[\bx^T {\bf Q} \bx +
\bu^T {\bf R}\bu + \pd{J^*}{\bx} \left({\bf A} \bx + {\bf B} \bu
\right) + \pd{J^*}{t} \right]. $$ Due to the positive definite
quadratic form on $\bu$, we can find the minimum by setting the
gradient to zero:
\begin{gather*}
\pd{}{\bu} = 2 \bu^T {\bf R} + \pd{J^*}{\bx} {\bf B} = 0 \\
\bu^* = \pi^*(\bx,t) = - \frac{1}{2}{\bf R}^{-1} {\bf B}^T \pd{J^*}{\bx}^T
\end{gather*}
In order to proceed, we need to investigate a particular form for the
cost-to-go function, $J^*(\bx,t)$. Let's try a solution of the form:
$$J^*(\bx,t) = \bx^T {\bf S}(t) \bx, \quad {\bf S}(t) = {\bf S}^T(t) >
{\bf 0}.$$ In this case we have
$$\pd{J^*}{\bx} = 2 \bx^T {\bf S}(t), \quad \pd{J*}{t} = \bx^T
\dot{\bf S}(t) \bx,$$ and therefore
\begin{gather*}
\bu^* = \pi^*(\bx,t) = - {\bf R}^{-1} {\bf B}^T {\bf S}(t) \bx \\
0 = \bx^T \left[ {\bf Q} - {\bf S}(t) {\bf B} {\bf R}^{-1} {\bf B}^T {\bf
S}(t) + {\bf S}(t) {\bf A} + {\bf A}^T {\bf S}(t) + \dot{\bf S}(t)
\right] \bx.\end{gather*} Therefore, ${\bf S}(t)$ must satisfy the condition
(known as the continuous-time *differential Riccati equation*):
$$-\dot{\bf S}(t) = {\bf S}(t) {\bf A} + {\bf A}^T {\bf S}(t) - {\bf
S}(t) {\bf B} {\bf R}^{-1} {\bf B}^T {\bf S}(t) + {\bf Q},$$ and the
terminal condition $${\bf S}(T) = {\bf Q}_f.$$ Since we were able to
satisfy the HJB with the minimizing policy, we have met the
sufficiency condition, and have found the optimal policy and optimal
cost-to-go function.

Note that the infinite-horizon LQR solution described in the prequel is exactly the steady-state solution of this equation, defined by $\dot{\bf S}(t) = 0$. Indeed, for controllable systems this equation is stable (backwards in time), and as expected the finite-horizon solution converges on the infinite-horizon solution as the horizon time limits to infinity.

The derivation above holds even if the dynamics are given by $$\dot{\bx} = {\bf A}(t)\bx + {\bf B(t)}\bu.$$ Similarly, the cost functions ${\bf Q}$ and ${\bf R}$ can also be time-varying. This is quite surprising, as the class of time-varying linear systems is a quite general class of systems. It requires essentially no assumptions on how the time-dependence enters, except perhaps that if ${\bf A}$ or ${\bf B}$ is discontinuous in time then one would have to use the proper techniques to accurately integrate the differential equation. As we will see in the next chapter, one of the most powerful applications involves linearizing around a nominal trajectory of a nonlinear system and using LQR to provide a trajectory controller.

For completeness, we consider a slightly more general form of the linear quadratic regulator. The standard LQR derivation attempts to drive the system to zero. Consider now the problem: \begin{gather*} \dot{\bx} = {\bf A}\bx + {\bf B}\bu \\ h(\bx) = (\bx - \bx^d(T))^T {\bf Q}_f (\bx - \bx^d(T)), \quad {\bf Q}_f = {\bf Q}_f^T \ge 0 \\ g(\bx,\bu,t) = (\bx - \bx^d(t))^T {\bf Q} (\bx - \bx^d(t)) + (\bu - \bu^d(t))^T {\bf R} (\bu - \bu^d(t)),\\ {\bf Q} = {\bf Q}^T \ge 0, {\bf R}={\bf R}^T>0 \end{gather*} Now, guess a solution \begin{gather*} J^*(\bx,t) = \bx^T {\bf S_2}(t) \bx + \bx^T {\bf s_1}(t) + s_0(t), \quad {\bf S_2}(t) = {\bf S_2}^T(t) > {\bf 0}. \end{gather*} In this case, we have $$\pd{J^*}{\bx} = 2 \bx^T {\bf S_2}(t) + {\bf s_1}^T(t),\quad \pd{J^*}{t} = \bx^T \dot{\bf S_2}(t) \bx + \bx^T \dot{\bf s_1}(t) + \dot{s_0}(t).$$ Using the HJB, $$ 0 = \min_\bu \left[(\bx - \bx^d(t))^T {\bf Q} (\bx - \bx^d(t)) + (\bu - \bu^d(t))^T {\bf R} (\bu - \bu^d(t)) + \pd{J^*}{\bx} \left({\bf A} \bx + {\bf B} \bu \right) + \pd{J^*}{t} \right], $$ we have \begin{gather*} \pd{}{\bu} = 2 (\bu - \bu^d(t))^T{\bf R} + (2\bx^T{\bf S_2}(t) + {\bf s_1}^T(t)){\bf B} = 0,\\ \bu^*(t) = \bu^d(t) - {\bf R}^{-1} {\bf B}^T\left[{\bf S_2}(t)\bx + \frac{1}{2}{\bf s_1}(t)\right] \end{gather*} The HJB can be satisfied by integrating backwards \begin{align*} -\dot{\bf S_2}(t) =& {\bf Q} - {\bf S_2}(t) {\bf B} {\bf R}^{-1} {\bf B}^T {\bf S_2}(t) + {\bf S_2}(t) {\bf A} + {\bf A}^T {\bf S_2}(t)\\ -\dot{\bf s_1}(t) =& -2 {\bf Q} \bx^d(t) + \left[{\bf A}^T - {\bf S}_2 {\bf B} {\bf R}^{-1} {\bf B}^T \right]{\bf s_1}(t) + 2{\bf S_2}(t) {\bf B} \bu^d(t)\\ -\dot{s_0}(t) =& \bx^d(t)^T {\bf Q} \bx^d(t) - \frac{1}{4} {\bf s_1}^T(t) {\bf B} {\bf R}^{-1} {\bf B}^T {\bf s_1}(t) + {\bf s_1}(t)^T {\bf B} \bu^d(t), \end{align*} from the final conditions \begin{align*} {\bf S_2}(T) =& {\bf Q}_f\\ {\bf s_1}(T) =& -2 {\bf Q}_f \bx^d(T) \\ s_0(T) =& \left[\bx^d(T)\right]^T {\bf Q}_f \left[ \bx^d(T) \right]. \end{align*} Notice that the solution for ${\bf S_2}$ is the same as the simpler LQR derivation, and is symmetric (as we assumed). Note also that $s_0(t)$ has no effect on control (even indirectly), and so can often be ignored.

A quick observation about the quadratic form, which might be helpful in debugging. We know that $J(x,t)$ must be uniformly positive. This is true iff ${\bf S}_2>0$ and $s_0 > \frac{1}{4} {\bf s}_1^T {\bf S}_2^{-1} {\bf s}_1$, which comes from evaluating the function at $x_{min}$ defined by $\pd{}{x} = 0$.

The finite-horizon LQR formulation can be used to impose a strict final boundary value condition by setting an infinite ${\bf Q}_f$. % ${\bf Q}_f = \infty {\bf I}$.. is this the correct $Q_f$? However, integrating the Riccati equation backwards from an infinite initial condition isn't very practical. To get around this, let us consider solving for ${\bf P}(t) = {\bf S}(t)^{-1}$. Using the matrix relation $\frac{d {\bf S}^{-1}}{dt} = - {\bf S}^{-1} \frac{d {\bf S}}{dt} {\bf S}^{-1}$, we have: $$-\dot{\bf P}(t) = - {\bf P}(t){\bf Q P}(t) + {\bf B R}^{-1} {\bf B} - {\bf A P}(t) - {\bf P}(t){\bf A}^T,$$ with the final conditions $${\bf P}(T) = 0.$$ This Riccati equation can be integrated backwards in time for a solution.

It is very interesting, and powerful, to note that, if one chooses ${\bf Q} = 0$, therefore imposing no position cost on the trajectory before time $T$, then this inverse Riccati equation becomes a linear ODE which can be solved explicitly. % add explicit solution here These relationships are used in the derivation of the controllability Grammian, but here we use them to design a feedback law.

Optimal control provides a powerful framework for formulating control problems
using the language of optimization. But solving optimal control problems for
nonlinear systems is hard! In many cases, we don't really care about finding
the *optimal* controller, but would be satisfied with any controller that
is guaranteed to accomplish the specified task. In many cases, we still
formulate these problems using computational tools from optimization, and in
this chapter we'll learn about tools that can provide guaranteed control
solutions for systems that are beyond the complexity for which we can find
the optimal feedback.

There are many excellent books on Lyapunov analysis; for instance

Let's start with our favorite simple example.

Recall that the equations of motion of the damped simple pendulum are given by \[ ml^2 \ddot{\theta} + mgl\sin\theta = -b\dot{\theta}, \] which I've written with the damping on the right-hand side to remind us that it is an external torque that we've modeled.

These equations represent a simple second-order differential equation; in chapter 2 we discussed at some length what was known about the solutions to this differential equation--even without damping the equation for $\theta(t)$ as a function of $\theta(0)$, $\dot{\theta}(0)$ involves elliptic integrals of the first kind, and with damping we don't even have that. Since we couldn't provide a solution analytically, in chapter 2 we resorted to a graphical analysis, and confirmed the intuition that there are fixed points in the system (at $\theta = k\pi$ for every integer $k$) and that the fixed points at $\theta = 2\pi k$ are asymptotically stable with a large basin of attraction. The graphical analysis gave us this intuition, but can we actually prove this stability property? In a way that might also work for much more complicated systems?

One route forward was from looking at the total system energy (kinetic + potential), which we can write down: \[ E(\theta,\dot{\theta}) = \frac{1}{2} ml^2\dot{\theta}^2 - mgl \cos\theta. \] Recall that the contours of this energy function are the orbits of the undamped pendulum.

A natural route to proving the stability of the downward fixed points is by arguing that energy decreases for the damped pendulum (with $b>0$) and so the system will eventually come to rest at the minimum energy, $E = -mgl$, which happens at $\theta=2\pi k$. Let's make that argument slightly more precise.

Evaluating the time derivative of the energy reveals \[ \frac{d}{dt} E = - b\dot\theta^2 \le 0. \] This is sufficient to demonstrate that the energy will never increase, but it doesn't actually prove that the energy will converge to the minimum. To take that extra step, we must observe that for most of the states with $\dot{E} = 0$, which for $b>0$ is the manifold where $\dot\theta=0$, the system can't actually remain in that state, but will rather pass through and enter another region where it continues to lose energy. In fact, the fixed points are the only subset of this $\dot{E}=0$ manifold where the system can stay on the manifold. Therefore, unless the system is at a fixed point, it will continue to dissipate energy. And therefore we can conclude that as $t\rightarrow \infty$, the system will indeed come to rest at a fixed point (though it could be any fixed point with an energy less than or equal to the initial energy in the system, $E(0)$).

This is an important example. It demonstrated that we could use a relatively simple function, the system energy, to describe something about the long-term dynamics of the pendulum even though the actual trajectories of the system are (analytically) very complex. It also demonstrated one of the subtleties of using an energy-like function that is non-increasing (instead of strictly decreasing) to prove asymptotic stability.

Lyapunov functions generalize this notion of an energy function to more general
systems, which might not be stable in the sense of some mechanical energy. If
I can find any positive function, call it $V(\bx)$, that gets smaller over
time as the system evolves, then I can potentially use $V$ to make a statement
about the long-term behavior of the system. $V$ is called a *Lyapunov function*.

Recall that we defined three separate notions for stability of a fixed-point of a nonlinear system: stability i.s.L., asymptotic stability, and exponential stability. We can use Lyapunov functions to demonstrate each of these, in turn.

Given a system $\dot{\bx} = f(\bx)$, with $f$ continuous, and for some region ${\cal B}$ around the origin (specifically an open subset of $\mathbf{R}^n$ containing the origin), if I can produce a scalar, continuously-differentiable function $V(\bx)$, such that \begin{gather*} V(\bx) > 0, \forall \bx \in {\cal B} \ne 0 \quad V(0) = 0, \text{ and} \\ \dot{V}(\bx) = \pd{V}{\bx} f(\bx) \le 0, \forall \bx \in {\cal B} \ne 0 \quad \dot{V}(0) = 0, \end{gather*} then the origin $(\bx = 0)$ is stable in the sense of Lyapunov (i.s.L.).

If, additionally, we have $$\dot{V}(\bx) = \pd{V}{\bx} f(\bx) < 0, \forall \bx \in {\cal B} \ne 0 \quad \dot{V}(0) = 0,$$ then the origin is (locally) asymptotically stable. And if we have $$\dot{V}(\bx) = \pd{V}{\bx} f(\bx) \le -\alpha V(x), \forall \bx \in {\cal B} \ne 0 \quad \dot{V}(0) = 0,$$ for some $\alpha>0$, then the origin is (locally) exponentially stable.

Note that for the sequel we will use the notation $V \succ 0$ to denote a
*positive-definite function*, meaning that $V(0)=0$ and $V(\bx)>0$ for
all $\bx\ne0$ (and also $V \succeq 0$ for positive semi-definite, $V \prec 0$ for negative-definite functions).

The intuition here is exactly the same as for the energy argument we made in the pendulum example: since $\dot{V}(x)$ is always zero or negative, the value of $V(x)$ will only get smaller (or stay the same) as time progresses. Inside the subset ${\cal B}$, for every $\epsilon$-ball, I can choose a $\delta$ such that $|x(0)|^2 < \delta \Rightarrow |x(t)|^2 < \epsilon, \forall t$ by choosing $\delta$ sufficiently small so that the sublevel-set of $V(x)$ for the largest value that $V(x)$ takes in the $\delta$ ball is completely contained in the $\epsilon$ ball. Since the value of $V$ can only get smaller (or stay constant) in time, this gives stability i.s.L.. If $\dot{V}$ is strictly negative away from the origin, then it must eventually get to the origin (asymptotic stability). The exponential condition is implied by the fact that $\forall t>0, V(\bx(t)) \le V(\bx(0)) e^{-\alpha t}$.

Notice that the system analyzed above, $\dot{\bx}=f(\bx)$, did not have any control inputs. Therefore, Lyapunov analysis is used to study either the passive dynamics of a system or the dynamics of a closed-loop system (system + control in feedback). We will see generalizations of the Lyapunov functions to input-output systems later in the text.

The notion of a fixed point being stable i.s.L. is inherently a local notion of stability (defined with $\epsilon$- and $\delta$- balls around the origin), but the notions of asymptotic and exponential stability can be applied globally. The Lyapunov theorems work for this case, too, with only minor modification.

Given a system $\dot{\bx} = f(\bx)$, with $f$ continuous, if I can produce a scalar, continuously-differentiable function $V(\bx)$, such that \begin{gather*} V(\bx) \succ 0, \\ \dot{V}(\bx) = \pd{V}{\bx} f(\bx) \prec 0, \text{ and} \\ V(\bx) \rightarrow \infty \text{ whenever } ||x||\rightarrow \infty,\end{gather*} then the origin $(\bx = 0)$ is globally asymptotically stable (G.A.S.).

If additionally we have that $$\dot{V}(\bx) \preceq -\alpha V(\bx),$$ for some $\alpha>0$, then the origin is globally exponentially stable.

The new condition, on the behavior as $||\bx|| \rightarrow \infty$ is known as "radially unbounded", and is required to make sure that trajectories cannot diverge to infinity even as $V$ decreases; it is only required for global stability analysis.

Perhaps you noticed the disconnect between the statement above and the argument that we made for the stability of the pendulum. In the pendulum example, using the mechanical energy resulted in a Lyapunov function that was only negative semi-definite, but we eventually argued that the fixed points were asymptotically stable. That took a little extra work, involving an argument about the fact that the fixed points were the only place that the system could stay with $\dot{E}=0$; every other state with $\dot{E}=0$ was only transient. We can formalize this idea for the more general Lyapunov function statements--it is known as LaSalle's Theorem.

Given a system $\dot{\bx} =
f(\bx)$ with $f$ continuous. If we can produce a scalar function $V(\bx)$ with
continuous derivatives for which over an open-subset ${\cal B}\in \mathbf{R}^n$ we have
$$V(\bx) \succ 0,\quad \dot{V}(\bx) \preceq 0,$$ and $V(\bx)\rightarrow \infty$
as $||\bx||\rightarrow \infty$, then $\bx$ will converge to the largest *invariant
set* where $\dot{V}(\bx) = 0$.

To be clear, an *invariant set*, ${\cal G}$, of the dynamical system is a set
for which $\bx(0)\in{\cal G} \Rightarrow \forall t>0, \bx(t) \in {\cal G}$. In other words,
once you enter the set you never leave. The "largest invariant set" need not be connected;
in fact for the pendulum example each fixed point is an invariant set, so the largest
invariant set is the *union* of all the fixed points of the system.

Finding a Lyapunov function which $\dot{V} \prec 0$
is more difficult than finding one that has $\dot{V} \preceq 0$. LaSalle's theorem gives
us the ability to make a statement about *asymptotic* stability even in this case. In the pendulum
example, every state with $\dot\theta=0$ had $\dot{E}=0$, but only the fixed points are
in the largest invariant set.

At this point, you might be wondering if there is any relationship between Lyapunov functions and the cost-to-go functions that we discussed in the context of dynamic programming. After all, the cost-to-go functions also captured a great deal about the long-term dynamics of the system in a scalar function. We can see the connection if we re-examine the HJB equation \[ 0 = \min_\bu \left[ g(\bx,\bu) + \pd{J^*}{\bx}f(\bx,\bu). \right] \]Let's imagine that we can solve for the optimizing $\bu^*(\bx)$, then we are left with $ 0 = g(\bx,\bu^*) + \pd{J^*}{\bx}f(\bx,\bu^*) $ or simply \[ \dot{J}^* = -g(\bx,\bu^*). \] In other words, in optimal control we must find a cost-to-go function which matches this gradient for every $\bx$; that's very difficult and involves solving a potentially high-dimensional partial differential equation. By contrast, Lyapunov analysis is asking for much less - any function which is going downhill (at any rate) for all states. This can be much easier! Also note that if we do manage to find the optimal cost-to-go, $J^*(\bx)$, then it can also serve as a Lyapunov function so long as $\forall \bx, g(\bx,\bu^*(\bx))>0$.

One of the primary limitations in Lyapunov analysis as I have presented it so far is that it is potentially very difficult to come up with suitable Lyapunov function candidates for interesting systems, especially for underactuated systems. ("Underactuated" is almost synonymous with "interesting" in my vocabulary.) Even if somebody were to give me a Lyapunov candidate for a general nonlinear system, the Lyapunov conditions can be difficult to check -- for instance, how would I check that $\dot{V}$ is strictly negative for all $\bx$ except the origin if $\dot{V}$ is some arbitrarily complicated nonlinear function over a vector $\bx$?

In this section, we'll look at some computational approaches to verifying the Lyapunov conditions, and even to searching for the description of the Lyapunov functions themselves.

If you're imagining numerical algorithms to check the Lyapunov conditions for complicated Lyapunov functions and complicated dynamics, the first thought is probably that we can evaluate $V$ and $\dot{V}$ at a large number of sample points and check whether $V$ is positive and $\dot{V}$ is negative. That does work, and could potentially be combined with some smoothness or regularity assumptions to generalize beyond the sample points. But in many cases we will be able to do better -- providing optimization algorithms that will rigorously check these conditions *for all $\bx$* without dense sampling; these will also give us additional leverage in formulating the search for Lyapunov functions.

Let's take a moment to see how things play out for linear systems.

Imagine you have a linear system, $\dot\bx = {\bf A}\bx$, and can find a Lyapunov function $$V(\bx) = \bx^T {\bf P} \bx, \quad {\bf P} = {\bf P^T} \succ 0,$$ which also satisfies $$\dot{V}(\bx) = \bx^T {\bf PA} \bx + \bx^T {\bf A}^T {\bf P}\bx \prec 0.$$ Then the original is globally asymptotically stable.

Note that the radially-unbounded condition is satisfied by ${\bf P} \succ 0$, and that the derivative condition is equivalent to the matrix condition $${\bf PA} + {\bf A}^T {\bf P} \prec 0.$$

For stable linear systems the existence of a quadratic Lyapunov function is actually a necessary (as well as sufficient) condition. Furthermore, a Lyapunov function can be always be found by finding the positive-definite solution to the matrix Lyapunov equation \begin{equation}{\bf PA} + {\bf A}^T{\bf P} = - {\bf Q},\label{eq:algebraic_lyapunov} \end{equation} for any ${\bf Q}={\bf Q}^T\succ 0$.

This is a very powerful result - for nonlinear systems it will be potentially difficult to find a Lyapunov function, but for linear systems it is straight-forward. In fact, this results is often used to propose candidates for non-linear systems, e.g., by linearizing the equations and solving a local linear Lyapunov function which should be valid in the vicinity of a fixed point.

Lyapunov analysis for linear systems has an extremely important connection to convex optimization.
In particular, we could have also formulated the Lyapunov conditions for linear systems above using *semi-definite programming* (SDP).
Semidefinite programming is a subset of convex optimization -- an extremely important class of problems for which we can produce efficient algorithms that are guaranteed find the global optima solution (up to a numerical tolerance and barring any numerical difficulties).

If you don't know much about convex optimization or want a quick refresher, please take a few minutes to read the optimization preliminaries in the appendix. The main requirement for this section is to appreciate that it is possible to formulate efficient optimization problems where the constraints include specifying that one or more matrices are positive semi-definite (PSD). These matrices must be formed from a linear combination of the decision variables. For a trivial example, the optimization $$\min_a a,\quad \subjto \begin{bmatrix} a & 0 \\ 0 & 1 \end{bmatrix} \succeq 0,$$ returns $a = 0$ (up to numerical tolerances).

The value in this is immediate for linear systems. For example, we can formulate the search
for a Lyapunov function for the linear system $\dot\bx = {\bf A} \bx$ by using the
parameters ${\bf p}$ to populate a symmetric matrix ${\bf P}$ and then write the SDP: \[
\find_{\bf p} \quad \subjto \quad {\bf P} \succeq 0, \quad {\bf PA} + {\bf A}^T {\bf P} \preceq 0.\] Note that you would probably never use that particular formulation,
since there specialized algorithms for solving the simple Lyapunov equation which are more efficient and more numerically stable. But the SDP formulation does provide something new -- we can now easily formulate the search for a *"common Lyapunov function"* for uncertain
linear systems.

Suppose you have a system governed by the equations $\dot\bx = {\bf A}\bx$, where the matrix ${\bf A}$ is unknown, but its uncertain elements can be bounded. There are a number of ways to write down this uncertainty set; let us choose to write this by describing ${\bf A}$ as the affine combination of a number of known matrices, $${\bf A} = \sum_{i} \beta_i {\bf A}_i, \quad \sum_i \beta_i = 1, \quad \forall i, \beta_i > 0.$$ This is just one way to specify the uncertainty; geometrically it is describing a polygon of uncertain parameters (in the space of elements of ${\bf A}$ with each ${\bf A}_i$ as one of the vertices in the polygon.

Now we can formulate the search for a common Lyapunov function using \[ \find_{\bf p} \quad \subjto \quad {\bf P} \succeq 0, \quad \forall_i, {\bf PA}_i + {\bf A}_i^T {\bf P} \preceq 0.\] The solver will then return a matrix ${\bf P}$ which satisfies all of the constraints, or return saying "problem is infeasible". It can easily be verified that if ${\bf P}$ satisfies the Lyapunov condition at all of the vertices, then it satisfies the condition for every ${\bf A}$ in the set: \[ {\bf P}(\sum_i \beta_i {\bf A}_i) + (\sum_i \beta_i {\bf A}_i)^T {\bf P} = \sum_i \beta_i ({\bf P A}_i + {\bf A}_i^T {\bf P}) \preceq 0, \] since $\forall i$, $\beta_i > 0$. Note that, unlike the simple Lyapunov equation for a known linear system, this condition being satisfied is a sufficient but not a necessary condition -- it is possible that the set of uncertain matrices ${\bf A}$ is robustly stable, but that this stability cannot be demonstrated with a common quadratic Lyapunov function.

You can try this example for yourself in

```
cd(fullfile(getDrakePath,'examples'));
commonLyapunovForLinearSystems;
```

As always, make sure that you open up the code and take a look.

This example is very important because it establishes a connection between Lyapunov functions and (convex) optimization. But so far we've only demonstrated this connection for linear systems, where the PSD matrices provide a magical recipe for establishing the positivity of the (quadratic) functions for all $\bx$. Is there any hope of extending this type of analysis to more general nonlinear systems? Surprisingly, it turns out that there is.

It turns out that in the same way that we can use SDP to search over the positive quadratic equations, we can generalize this to search over the positive polynomial equations. To be clear, for quadratic equations we have \[ {\bf P}\succeq 0 \quad \Rightarrow \quad \bx^T {\bf P} \bx \ge 0, \quad \forall \bx. \] It turns out that
we can generalize this to \[ {\bf P}\succeq 0 \quad \Rightarrow \quad {\bf m}^T(\bx) {\bf P} {\bf m}(\bx) \ge 0, \quad \forall \bx, \] where ${\bf m}(\bx)$ is a vector of polynomial equations, typically chosen as a vector of *monomials* (polynomials with only one term). The set of positive polynomials parameterized in this way is exactly the set of polynomials that can be written as a *sum of squares*

Even better, there is quite a bit that is known about how to choose the terms in ${\bf m}(\bx)$. For example, if you give me the polynomial \[ p(x) = 2 - 4x + 5x^2 \] and ask if it is positive for all real $x$, I can convince you that it is by producing the sums-of-squares factorization \[ p(x) = 1 + (1-2x)^2 + x^2, \] and I know that I can formulate the problem without needing any monomials of degree greater than 1 (the square-root of the degree of $p$) in my monomial vector. In practice what this means
for us is that people have authored optimization front-ends which take a high-level specification with constraints on the positivity of polynomials and they automatically
generate the SDP problem for you, without having to think about what terms should appear in ${\bf m}(\bx)$ (c.f.

We will write optimizations using sums-of-squares constraints as \[ p(\bx) \sos \] as shorthand for the constraint that $p(\bx) \ge 0$ for all $\bx$, as demonstrated by finding a sums-of-squares decomposition.

Now we can see that it may be possible to generalize the optimization approach using SDP to search for Lyapunov functions for linear systems to searching for Lyapunov functions for at least the polynomial systems: $\dot\bx = f(\bx),$ where $f$ is a vector-valued polynomial function. If we parameterize a *fixed-degree* Lyapunov candidate as a polynomial with unknown coefficients, e.g., \[ V_\alpha(\bx) = \alpha_0 + \alpha_1 x_1 + \alpha_2 x_2 + \alpha_3 x_1x_2 + \alpha_4 x_1^2 + ..., \] then the time-derivative of $V$ is also a polynomial, and I can formulate the optimization: \begin{align*} \find_\alpha, \quad \subjto \quad& V_\alpha(\bx) \sos \\ & -\dot{V}_\alpha(\bx) = -\pd{V_\alpha}{\bx} f(\bx) \sos. \end{align*} Because this is a convex optimization, the solver will return a solution if one exists.

We opened this chapter using our intuition about energy to discuss stability on the simple pendulum. Now we'll replace that intuition with convex optimization (because it will also work for more difficult systems where our intuition fails).

But wait! The equations of motion of the simple pendulum are not polynomial, are they (they have a $\sin(\theta)$ in them)? Actually
nearly all rigid-body systems can be written as polynomials given a change of coordinates. Indeed, the rigid-body-assumption
is simply an assumption that points on the same body stay a constant distance apart, and Euclidean distance is certainly
a polynomial. For a more thorough discussion see, for instance,

For the purposes of this example, let's change coordinates from $[\theta,\dot\theta]^T$ to $\bx = [s,c,\dot\theta]^t$, where $s \equiv \sin\theta$ and $c \equiv \cos\theta$. Then we can write the pendulum dynamics as $$\dot\bx = \begin{bmatrix} c \dot\theta \\ -s \dot\theta \\ -\frac{1}{m l^2} \left( b \dot\theta + mgls \right) \end{bmatrix}.$$

Now let's parameterize a Lyapunov candidate $V(s,c,\dot\theta)$ as the polynomial with unknown coefficients which
contains all monomials up to degree 2: $$V = \alpha_1 + \alpha_2 s + \alpha_3 c + ... \alpha_{10} s^2 + \alpha_{11} sc + \alpha_{12} s\dot\theta.$$
Now we'll formulate the feasibility problem:
\[
\find_{\bf \alpha} \quad \subjto \quad V \sos, \quad -\dot{V} \sos.\]
In fact, this is asking too much -- really $\dot{V}$ only needs to be negative when $s^2+c^2=1$. We can
accomplish this with the so-called *S-procedure*, which we will discuss more below, and instead write
\[
\find_{{\bf \alpha},\lambda} \quad \subjto \quad V \sos, \quad -\dot{V} -\lambda(\bx)(s^2+c^2-1) \sos.\]
(Here $\lambda(\bx)$ is another polynomial with free coefficients which the optimization can use to make terms
arbitrarily more positive when $s^2+c^2 \neq 1$.)
Finally, for style points, in the code example in

```
cd(fullfile(getDrakePath,'examples','Pendulum'));
globalStability;
```

As always, make sure that you open up the code and take a look. This example uses our front-end *"spotless"*
("spot" was "systems polynomial optimization toolbox" written by Prof. Alex Megretski, and this is just a slightly leaner implementation)
to construct the SDP from the description as a SOS program. The result is a Lyapunov function that looks familiar (visualized as a countour plot here):

Aha! Not only does the optimization finds us coefficients for the Lyapunov function which satisfy our Lyapunov conditions, but the result looks a lot like mechanical energy. In fact, the result is a little better than energy... there are some small extra terms added which prove exponential stability without having to invoke LaSalle's Theorem.

It is important to remember that there are a handful of gaps which make the existence of this solution a sufficient condition (for proving that every sub-level set of $V$ is an invariant set of $f$) instead of a necessary one. First, there is no guarantee that a stable polynomial system can be verified using a polynomial Lyapunov function (of any degree, and in fact there are known counter-examples

Despite these caveats, I have found this formulation to be surprisingly effective in practice. Intuitively, I think that this is because there is relatively a lot of flexibility in the Lyapunov conditions -- if you can find one function which is a Lyapunov function for the system, then there are also many "nearby" functions which will satisfy the same constraints.

There is another very important connection between Lyapunov functions and the
concept of an invariant set: *any sub-level set of a Lyapunov function is
also an invariant set*. This gives us the ability to use sub-level sets
of a Lyapunov function as approximations of the region of attraction for nonlinear systems.

Given a system $\dot{\bx} = f(\bx)$ with $f$ continuous, if we can find a scalar function $V(\bx) \succ 0$ and a sub-level set $${\cal G}: \{ \bx | V(\bx) < \rho \}$$ on which $$\forall \bx \in {\cal G}, \dot{V}(\bx) \preceq 0,$$ then ${\cal G}$ is an invariant set. By LaSalle, $\bx$ will converge to the largest invariant subset of ${\cal G}$ on which $\dot{V}=0$.

Furthermore, if $\dot{V}(\bx) \prec 0$ in ${\cal G}$, then the origin is locally asymptotically stable and the set ${\cal G}$ is inside the region of attraction of this fixed point. Alternatively, if $\dot{V}(\bx) \preceq 0$ in ${\cal G}$ and $\bx = 0$ is the only invariant subset of ${\cal G}$ where $\dot{V}=0$, then the origin is asymptotically stable and the set ${\cal G}$ is inside the region of attraction of this fixed point.

Consider the first-order, one-dimensional system $\dot{x} = -x + x^3.$ We can quickly understand this system using our tools for graphical analysis.

In the vicinity of the origin, $\dot{x}$ looks like $-x$, and as we move away it looks increasingly like $x^3$. There is a stable fixed point at the origin and unstable fixed points at $\pm 1$. In fact, we can deduce visually that the region of attraction to the stable fixed point at the origin is $\bx \in (-1,1)$. Let's see if we can demonstrate this with a Lyapunov argument.

First, let us linearize the dynamics about the origin and use the Lyapunov equation for linear systems to find a candidate $V(\bx)$. Since the linearization is $\dot{x} = -x$, if we take ${\bf Q}=1$, then we find ${\bf P}=\frac{1}{2}$ is the positive definite solution to the algebraic Lyapunov equation (\ref{eq:algebraic_lyapunov}). Proceeding with $$V(\bx) = \frac{1}{2} x^2,$$ we have $$\dot{V} = x (-x + x^3) = -x^2 + x^4.$$ This function is zero at the origin, negative for $|x| < 1$, and positive for $|x| > 1$. Therefore we can conclude that the sub-level set $V < \frac{1}{2}$ is invariant and the set $x \in (-1,1)$ is inside the region of attraction of the nonlinear system. In fact, this estimate is tight.

While we will defer most discussions on robustness analysis until later in the notes, there is a simple and powerful idea that can be presented now: the idea of a *common Lyapunov function*. Imagine that you have a model of a dynamical system but that you are uncertain about some of the parameters. For example, you have a model of a quadrotor, and are fairly confident about the mass and lengths (both of which are easy to measure), but are not confident about the moment of inertia. One approach to robustness analysis is to define a bounded uncertainty, which could take
the form of $$\dot{\bx} = f_\alpha(\bx), \quad \alpha_{min} \le \alpha \le \alpha_{max},$$ with $\alpha$ a vector of uncertain parameters in your model. Richer specifications of the uncertainty bounds are also possible, but this will serve for our examples.

In standard Lyapunov analysis, we are searching for a function that goes downhill for all $\bx$ to make statements about the long-term dynamics of the system. In common Lyapunov analysis, we can make many similar statements about the long-term dynamics of an uncertain system if we can find a single Lyapunov function that goes downhill *for all possible values of $\alpha$*. If we can find such a function, then we can use it to make statements with all of the variations we've discussed (local, regional, or global; in the sense of Lyapunov, asymptotic, or exponential).

Let's consider the same one-dimensional example used above, but add
an uncertain parameter into the dynamics. In particular, consider the system: $$\bx = -x + \alpha x^3, \quad \frac{3}{4} < \alpha < \frac{3}{2}.$$ Plotting the
graph of the one-dimensional dynamics for a few values of $\alpha$, we
can see that the fixed point at the origin is still stable, but the *robust region of attraction* to this fixed point (shaded in blue below) is smaller than the region of attraction for the system with $\alpha=1$.

Taking the same Lyapunov candidate as above, $V = \frac{1}{2} x^2$, we have $$\dot{V} = -x^2 + \alpha x^4.$$ This function is zero at the origin, and negative for all $\alpha$ whenever $x^2 > \alpha x^4$, or $$|x| < \frac{1}{\sqrt{\alpha_{max}}} = \sqrt{\frac{2}{3}}.$$ Therefore, we can conclude that $|x| < \sqrt{\frac{2}{3}}$ is inside the robust region of attraction of the uncertain system.

Not all forms of uncertainty are as simple to deal with as the gain uncertainty in that example. For many forms of uncertainty, we might not even know the location of the fixed points of the uncertain system. In this case, we can often still use common Lyapunov functions to give some guarantees about the system, such as guarantees of *robust set invariance*. For instance, if you have uncertain parameters on a quadrotor model, you might be ok with the quadrotor stabilizing to a pitch of $0.01$ radians, but you would like to guarantee that it definitely does not flip over and crash into the ground.

Now consider the system: $$\bx = -x + x^3 + \alpha, \quad -\frac{1}{4} < \alpha < \frac{1}{4}.$$ Plotting the graph of the one-dimensional dynamics for a few values of $\alpha$, this time we can see that the fixed point is not necessarily at the origin; the location of the fixed point moves depending on the value of $\alpha$. But we should be able to guarantee that the uncertain system will stay near the origin if it starts near the origin, using an invariant set argument.

Taking the same Lyapunov candidate as above, $V = \frac{1}{2} x^2$, we have $$\dot{V} = -x^2 + x^4 + \alpha x.$$ This Lyapunov function allows us to easily verify, for instance, that $V \le \frac{1}{3}$ is a *robust invariant set*, because whenever $V = \frac{1}{3}$, we have $$\forall \alpha \in [\alpha_{min},\alpha_{max}],\quad \dot{V}(x,\alpha) < 0.$$ Therefore $V$ can never start at less than one-third and cross over to greater than one-third. To see this, see that $$ V=\frac{1}{3} \quad \Rightarrow \quad x = \pm \sqrt{\frac{2}{3}} \quad \Rightarrow \quad \dot{V} = -\frac{2}{9} \pm \alpha \sqrt{\frac{2}{3}} < 0, \forall \alpha \in \left[-\frac{1}{4},\frac{1}{4} \right]. $$ Note that not all sub-level sets of this invariant set are invariant. For instance $V < \frac{1}{32}$ does not satisfy this condition, and by visual inspection we can see that it is in fact not robustly invariant.

Now we have arrived at the tool that I believe can be a work-horse for many serious robotics applications. Most of our robots are not actually globally stable (that's not because they are robots -- if you push me hard enough, I will fall down, too), which means that understanding the regions where a particular controller can be guaranteed to work can be of critical importance.

Sums-of-squares optimization effectively gives us an oracle which we can ask: is this polynomial positive for all $\bx$? To use this for regional analysis, we have to figure out how to modify our questions to the oracle so that the oracle will say "yes" or "no" when we ask if a function is positive over a certain region which is a subset of $\Re^n$. That trick is called the S-procedure. It is closely related to the Lagrange multipliers from constrained optimization, and has deep connections to "Positivstellensatz" from algebraic geometry.

Consider a scalar polynomial, $p(\bx)$, and a semi-algebraic set $g(\bx) \preceq 0$, where $g$ is a vector of polynomials. If I can find a polynomial "multiplier", $\lambda(\bx)$ such that \[ p(\bx) + \lambda^T(\bx) g(\bx) \sos, \quad \text{and} \quad \lambda(\bx) \sos, \] then this is sufficient to demonstrate that $$p(\bx)\ge 0 \quad \forall \bx \in \{ g(\bx) \le 0 \}.$$ To convince yourself, observe that when $g(\bx) \le 0$, it is only harder to be positive but when $g(\bx) > 0$, it is possible for the combined function to be SOS even if $p(\bx)$ is negative.

Let's return to our example from above: \[ \dot{x} = -x + x^3 \] and try to use SOS optimization to demonstrate that the region of attraction of the fixed point at the origin is $x \in (-1,1)$, using the Lyapunov candidate $V = x^2.$

First, define the multiplier polynomial, \[ \lambda(x) = c_1 + c_2 x + c_3 x^2 + c_4 x^3 + c_5 x^4. \] Then define the optimization \begin{align*} \find_{\bf c} \quad & \\ \subjto \quad& - \dot{V}(x) - \lambda(x) (1-V(x)) \sos \\ & \lambda(x) \sos \end{align*}

Try it yourself:

```
cd(fullfile(getDrakePath,'examples'));
CubicPolynomialExample.sosExample;
```

To see the code that sets up the optimization, try:

```
edit('CubicPolynomialExample.sosExample')
```

While the example above only verifies that the one-sub-level set of the pre-specified Lyapunov candidate is negative (certifying the ROA that we already understood), we can generalize the optimization to allow us to search for the largest sub-level set (with the objective using a convex approximation for volume). We can even search for the coefficients of the Lyapunov function using an iteration of convex optimizations. There are a number of variations and nuances in the various formulations, and
some basic rescaling tricks that can help make the numerics of the problem better for the solvers. In `regionOfAttraction`

method that is defined for objects of type `PolynomialSystem`

. This makes it as simple as, for instance:

```
cd(fullfile(getDrakePath,'examples'));
% create a new CubicPolynomialExample object
p = CubicPolynomialExample();
% compute region of attraction
% the levelset V<1 is the region of attraction
V=regionOfAttraction(p,Point(p.getStateFrame,0));
% display the polynomial representation of V that results
display(V.getPoly);
```

which yields the console output:
```
[ (1)*x1^2 ]
```

The algorithm successfully found that $x \in (-1,1)$ is the region of attraction.
Remember that although we have tried to make it convenient to call these functions, they are not a black box. I highly recommend opening up the `regionOfAttraction`

method and understanding how it works.

```
cd(fullfile(getDrakePath,'examples'));
VanDerPol.run();
```

LQR gives the cost-to-go which can be used as the Lyapunov candidate. Otherwise, use a Lyapunov equation. You may not even need to search for a better Lyapunov function, but rather just ask the question: what is the largest region of attraction that can be demonstrated for the nonlinear system using the Lyapunov function from linear analysis?

Coming soon... For a nice explanation of how rigid-body kinematics are polynomial, see

In practice, you can also Taylor approximate any smooth nonlinear function using polynomials. This can be an effective strategy in practice, because you can limit the degrees of the polynomial, and because in many cases it is possible to provide conservative bounds on the errors due to the approximation.

I've argued that optimal control is a powerful framework for specifying complex behaviors with simple objective functions, letting the dynamics and constraints on the system shape the resulting feedback controller (and vice versa!). But the computational tools that we've provided so far have been limited in some important ways. The numerical approaches to dynamic programming which involve putting a mesh over the state space do not scale well to systems with state dimension more than four or five. Linearization around a nominal operating point (or trajectory) allowed us to solve for locally optimal control policies (e.g. using LQR) for even very high-dimensional systems, but the effectiveness of the resulting controllers is limited to the region of state space where the linearization is a good approximation of the nonlinear dynamics. The computational tools for Lyapunov analysis from the last chapter can provide, among other things, an effective way to compute estimates of those regions. But we have not yet provided any real computational tools for approximate optimal control that work for high-dimensional systems beyond the linearization around a goal. That is precisely the goal for this chapter.

The big change that is going to allow us to scale to high-dimensional systems
is that we are going to give up the goal of solving for the optimal feedback
controller for the entire state space, and instead attempt to find an optimal
control solution that is valid from only a single initial condition. Instead
of representing this as a feedback control function, we can represent this
solution as a *trajectory*, $\bx(\cdot), \bu(\cdot)$, typically
defined over a finite interval. In our graph-search dynamic programming algorithms, we discretized the dynamics
of the system on a mesh spread across the state space. This does not scale
to high-dimensional systems, and it was difficult to bound the errors
due to the discretization. If we instead restrict ourselves to optimizing
only a single initial condition, then a different discretization scheme emerges:
we can discretize the state and input trajectories *over time*.

Given an initial condition, $\bx_0$, and an input trajectory $u(t)$ defined over a finite interval, $t\in[t_0,t_f]$, we can compute the long-term (finite-horizon) cost of executing that trajectory using the standard additive-cost optimal control objective, \[ J_{\bu(\cdot)}(\bx_0) = \int_{t_0}^{t_f} g(\bx(t),\bu(t)) dt. \] We will write the trajectory optimization problem as \begin{align*} \minimize_{u(\cdot)} \quad & \int_{t_0}^{t_f} g(\bx(t),\bu(t)) dt \\ \subjto \quad & \forall t, \dot{\bx}(t) = f(\bx(t),\bu(t)), \\ & \bx(t_0) = \bx_0. \\ \end{align*} Some trajectory optimization problems may also include additional constraints, such as collision avoidance ($\bx$ can not cause the robot to be inside an obstacle) or input limits (e.g. $\bu_{min} \le \bu \le \bu_{max}$ ), which can be defined for all time or some subset of the trajectory.

As written, the optimization above is an optimization over continuous
trajectories. In order to formulate this as a numerical optimization, we must
parameterize it with a finite set of numbers. Perhaps not surprisingly, there are
many different ways to write down this parameterization, with a variety
of different properties in terms of speed, robustness, and accuracy of the results.
We will outline just a few of the most popular below. I would recommend

Before we dive in, we need to take a moment to understand the optimization tools that we will be using. In the graph-search dynamic programming algorithm, we were magically able to provide an iterative algorithm that was known to converge to the optimal cost-to-go function. With LQR we were able to reduce the problem to a matrix Riccati equation, for which we have scalable numerical methods to solve. In the Lyapunov analysis chapter, we were able to formulate a very specific kind of optimization problem--a semi-definite program (or SDP)--which is a subset of convex optimization, and relied on custom solvers like SeDuMi or Mosek to solve the problems for us. Convex optimization is a hugely important subset of nonlinear optimization, in which we can guarantee that the optimization has no "local minima". In this chapter we won't be so lucky; the optimizations that we formulate may have local minima and the solution techniques will at best only guarantee that they give a locally optimal solution.

The generic formulation of a nonlinear optimization problem is \[ \minimize_z
c(z) \quad \subjto \quad {\bf \phi}(z) \le 0, \] where $z$ is a vector
of *decision variables*, $c$ is a scalar
*objective function* and $\phi$ is a vector of *constraints*. Note that,
although we write $\phi \le 0$, this formulation captures positivity
constraints on the decision variables (simply multiply the constraint by $-1$)
and equality constraints (simply list both $\phi\le0$ and $-\phi\le0$) as well.

The picture that you should have in your head is a nonlinear, potentially non-convex objective function defined over (multi-dimensional) $z$, with a subset of possible $z$ values satisfying the constraints.

Note that minima can be the result of the objective function having zero derivative
*or* due to a sloped objective up against a constraint.

Numerical methods for solving these optimization problems require an initial guess, $\hat{z}$,
and proceed by trying to move down the objective function to a minima. Common approaches
include *gradient descent*, in which the gradient of the objective function is computed or estimated, or
second-order methods such as *sequential quadratic programming (SQP)* which attempts to make a local quadratic approximation of the objective function
and local linear approximations of the constraints and solves a quadratic program on each iteration to jump
directly to the minimum of the local approximation.

While not strictly required, these algorithms can often benefit a great deal from having
the gradients of the objective and constraints computed explicitly; the alternative is
to obtain them from numerical differentiation. Beyond pure speed considerations, I strongly
prefer to compute the gradients explicitly because it can help avoid numerical accuracy issues
that can creep in with finite difference methods. The desire to calculate these gradients will
be a major theme in the discussion below, and we have gone to great lengths
to provide explicit gradients of our provided functions and automatic differentiation of user-provided
functions in

When I started out, I was of the opinion that there is nothing difficult about
implementing gradient descent or even a second-order method, and I wrote all
of the solvers myself. I now realize that I was wrong. The commercial solvers
available for nonlinear programming are substantially higher performance than
anything I wrote myself, with a number of tricks, subtleties, and parameter
choices that can make a huge difference in practice. Some of these solvers can
exploit sparsity in the problem (e.g., if the constraints operate in a sparse way on the decision variables).
Nowadays, we make heaviest use of SNOPT `fmincon`

from the Optimization Toolbox in MATLAB. Note that while I do
advocate using these tools, you do not need to use them as a black box. In many
cases you can improve the optimization performance by understanding and selecting non-default
configuration parameters.

Let
us start by representing the finite-time trajectories, $\bx(t)$ and $\bu(t)$ $\forall t\in[t_0,t_f]$,
by their values at a series of *break points*, $t_0,t_1,t_2,t_3$, and denoting
the values at those points (aka the *knot points*) $\bx_0,...,\bx_N$ and $\bu_0,...,\bu_N$, respectively.

Then perhaps the simplest mapping of the trajectory optimization problem onto a nonlinear program is to fix the break points at even intervals, $dt$, and use Euler integration to write \begin{align*} \minimize_{\bx_1,...,\bx_N,\bu_0,...,\bu_{N-1}} & \sum_{n=0}^{N-1} g(\bx_n,\bu_n)dt \\ & \bx_{n+1} = \bx_n + f(\bx_n,\bu_n)dt, \quad \forall n\in [0,N-1]. \end{align*} Note that the decision variables here are $(\bx_1,...,\bx_N,\bu_0,...,\bu_{N-1})$, because $\bx_0$ is given, and $\bu_{N}$ does not appear in the cost nor any of the constraints. It is easy to generalize this approach to add additional costs or constraints on $\bx$ and/or $\bu$. (Note that this formulation does not actually benefit from the additive cost structure, so more general cost formulations are also possible.) Computing the gradients of the objective and constraints is essentially as simple as computing the gradients of $g()$ and $f()$.

We have implemented an optimization class hierarchy in

```
% note: requires Drake ver >= 0.9.7
cd(fullfile(getDrakePath,'examples'));
DoubleIntegrator.runDirtran;
```

Make sure you take a look at the code:
```
edit('DoubleIntegrator.runDirtran')
```

Nothing compares to running it yourself, and poking around in the code. But you can also click here to watch the result.
I hope that you recognize the parabolic trajectory from the initial condition up to the
switching surface, and then the second parabolic trajectory down to the origin. You should
also notice that the transition between $u=1$ and $u=-1$ is imperfect, due to the discretization
error. As an exercise, try increasing the number of knot points (the variable `N`

in the code)
to see if you can get a sharper response, like this.

If you take a moment to think about what the direct transcription algorithm
is doing, you will see that by satisfying the dynamic constraints, the
optimization is effectively solving the (Euler approximation of the)
differential equation. But instead of marching forward through time, it is
minimizing the inconsistency at each of the knot points simultaneously. While
it's easy enough to generalize the constraints to use higher-order integration
schemes, paying attention to the trade-off between the number of times the
constraint must be evaluated vs the density of the knot points in time, if the
goal is to obtain smooth $\bx$ trajectory solutions then another approach
quickly emerges: the approach taken by the so-called *collocation methods*.

In direct collocation (c.f., *collocation points*
also matches the plant dynamics. For the special case of cubic polynomial state trajectories and piecewise linear input
trajectories, the derivative at the midpoint of each segment is particularly easy to compute,
making it the natural choice.

Once again, direct collocation effectively integrates the equations of motion
by satisfying the constraints of the optimization -- this time producing a
third-order approximation of the dynamics with effectively two evaluations of
the plant dynamics per segment.

Direct collocation also easily solves the minimum-time problem for the double integrator. The performance is similar to direct transcription, but the convergence is visibly different. Try it for yourself:

```
% note: requires Drake ver >= 0.9.7
cd(fullfile(getDrakePath,'examples'));
DoubleIntegrator.runDircol;
```

Make sure you take a look at the code:
```
edit('DoubleIntegrator.runDircol')
```

In the methods described above, by asking the optimization package to perform the numerical integration of the equations of motion, we are effectively over-parameterizing the problem. In fact, the optimization is perfectly well defined if we restrict the decision variables to $\bu_0,...,\bu_{N-1}$ only--we can compute $\bx_1,...,\bx_{N}$ ourselves by knowing $\bx_0$, and $\bu(\cdot)$. This is exactly the approach taken in the shooting methods.

Again, providing gradients of the objectives and constraints to the solver is not strictly required -- most solvers will obtain them from finite differences if they are not provided -- but I feel strongly that the solvers are faster and more robust when exact gradients are provided. Now that we have removed the $\bx$ decision variables from the program, we have to take a little extra care to compute the gradients. This is still easily accomplished using the chain rule. To be concise (and slightly more general), let us define $\bx[n+1]=f_d(\bx[n],\bu[n])$ as the discrete-time approximation of the continuous dynamics; for example, the forward Euler integration scheme used above would give $f_d(\bx[n],\bu[n]) = \bx[n]+f(\bx[n],\bu[n])dt.$ Then we have \[ \pd{J}{\bu_k} = \sum_{n=0}^{N-1} \left( \pd{g(\bx[n],\bu[n])}{\bx[n]} \pd{\bx[n]}{\bu_k} + \pd{g(\bx[n],\bu[n])}{\bu_k} \right), \] where the gradient of the state with respect to the inputs can be computed during the "forward simulation", \[ \pd{\bx[n+1]}{\bu_k} = \pd{f_d(\bx[n],\bu[n])}{\bx[n]} \pd{\bx[n]}{\bu_k} + \pd{f_d(\bx[n],\bu[n])}{\bu_k}. \] These simulation gradients can also be used in the chain rule to provide the gradients of any constraints. Note that there are a lot of terms to keep around here, on the order of (state dim) $\times$ (control dim) $\times$ (number of timesteps). Ouch. Note also that many of these terms are zero; for instance with the Euler integration scheme above $\pd{\bu[n]}{\bu_k} = 0$ if $k\ne n$. (If this looks like I'm mixing two notations here, recall that I'm using $\bu_k$ to represent the decision variable and $\bu[n]$ to represent the input used in the $n$th step of the simulation.)

By solving for $\bx(\cdot)$ ourselves, we've removed a large number of constraints from the optimization. If no additional state constraints are present, and the only gradients we need to compute are the gradients of the objective, then a surprisingly efficient algorithm emerges. I'll give the steps here without derivation, but will derive it in the Pontryagin section below:

- Simulate forward: $$\bx[n+1] = f_d(\bx[n],\bu_n),$$ from $\bx[0] = \bx_0$
- Calculate backwards: $$\lambda[n-1] = \pd{g(\bx[n],\bu[n])}{\bx[n]}^T + \pd{f(\bx[n],\bu[n])}{\bx[n]}^T \lambda[n],$$ from $\lambda[N-1]=0$
- Extract the gradients: $$\pd{J}{\bu[n]} = \pd{g(\bx[n],\bu[n])}{\bu[n]} + \lambda[n]^T \pd{f(\bx[n],\bu[n])}{\bu[n]},$$ with $\pd{J}{\bu_k} = \sum_n \pd{J}{\bu[n]}\pd{\bu[n]}{\bu_k}$.

Here $\lambda[n]$ is a vector the same size as $\bx[n]$ which has an interpretation as
$\lambda[n]=\pd{J}{\bx[n+1]}^T$. The equation governing $\lambda$ is known as the *adjoint
equation*, and it represents a dramatic efficient improvement over calculating the
huge number of simulation gradients described above. In case you are interested,
the adjoint equation is known as the *backpropagation algorithm* in the neural
networks literature and it is one of the primary reasons that training neural networks
became so popular in the 1980's; super fast GPU implementations of this algorithm are
also one of the reasons that *deep learning* is incredibly popular right now (the
availability of massive training databases is perhaps the other main reason).

To take advantage of this efficiency, advocates of the shooting methods often use penalty methods instead of enforcing hard state constraints; instead of telling the solver about the constraint explicitly, you simply add an additional term to the cost function which gives a large penalty commensurate with the amount by which the constraint is violated. These are not quite as accurate and can be harder to tune (you'd like the cost to be high compared to other costs, but making it too high can lead to numerical conditioning issues), but they can work.

Although the decision about which algorithm is best may depend on the situation, in our work we have come to favor the direct collocation method (and occasionally direct transcription) for most of our work. There are a number of arguments for and against each approach; I will try to discuss a few of them here.

Numerical conditioning. Tail wagging the dog.

Sparse constraints. Potential for parallel evaluation of the constraints (computing the dynamics and their derivatives are often the most expensive part).

to avoid local minima. direct transcription and collocation can take an initial guess in $\bx$, too.

Another potential advantage of the direct transcription and collocation methods is that the dynamics constraints can be written in implicit form.

There are number of useful variations to the problem formulations I've presented above. By far the most common is the addition of a terminal cost, e.g.: $$\min \quad h(\bx[N]) + \sum_{n=0}^{N-1} g(\bx[n],\bu[n]).$$ These terms are easily added to the cost function in the any of methods, and the adjoint equations of the shooting method simply require the a modified terminal condition $$\lambda[N-1] = \pd{h(\bx[N])}{\bx[N]}$$ when computing the gradients.

Another common modification is including the spacing of the breakpoints as additional decision variables. This is particularly easy in the direct transcription and collocation methods, and can also be worked into the shooting methods. Typically we add a lower-bound on the time-step so that they don't all vanish to zero.

One potential complaint about the direct transcription and collocation
algorithms is that we tend to use simplistic numerical integration methods and
often fix the integration timestep (e.g. by choosing Euler integration and
selecting a $dt$); it is difficult to bound the resulting integration errors
in the solution. One tantalizing possibility in the shooting methods is that
the forward integration could be accomplished by more sophisticated methods,
like variable-step integration. But I must say that I have not had much
success with this approach, because while the numerical accuracy of any one
function evaluation might be improved, these integration schemes do not
necessarily give smooth outputs as you make incremental changes to the initial
conditions and control (changing $\bu_2$ by $\epsilon$ could result in taking
a different number of steps in the integration scheme). This lack of
smoothness can wreak havoc on the convergence of the optimization.
If numerical accuracy is a premium, then I think you will have more success
by imposing consistency constraints (e.g. as in the Runge-Kutta 4th order simulation
with 5th order error checking method) as additional constraints on the
time-steps; shooting methods do not have any particular advantage here.

The tools that we've been developing for numerical trajectory optimization are closely tied to theorems from (analytical) optimal control. Let us take one section to appreciate those connections.

What precisely does it mean for a trajectory, $\bx(\cdot),\bu(\cdot)$, to be
locally optimal? It means that if I were to perturb that trajectory in any way
(e.g. change $\bu_3$ by $\epsilon$), then I would either incur higher cost in
my objective function or violate a constraint. For an unconstrained optimization,
a *necessary condition* for local optimality is that the gradient of the
objective at the solution be exactly zero. Of course the gradient can also vanish
at local maxima or saddle points, but it certainly must vanish at local minima.
We can generalize this argument to constrained optimization using *Lagrange multipliers*.

Given the equality-constrained optimization problem $$\minimize_\bz g(\bz) \quad \subjto \quad {\bf \phi}(\bz) = 0,$$ where ${\bf \phi}$ is a vector. Define a vector $\lambda$ of Lagrange multipliers, the same size as $\phi$, and the scalar Lagrangian function, $$L(\bz,{\bf \lambda}) = g(\bz) + \lambda^T\phi(\bz).$$ A necessary condition for $\bz^*$ to be an optimal value of the constrained optimization is that the gradients of $L$ vanish with respect to both $\bz$ and $\lambda$: $$\pd{L}{\bz} = 0, \quad \pd{L}{\lambda} = 0.$$ Note that $\pd{L}{\lambda} = \phi(\bz)$, so requiring this to be zero is equivalent to requiring the constraints to be satisfied.

Consider the following optimization: $$\min_{x,y} x+y, \quad \subjto \quad x^2 + y^2 = 1.$$ The level sets of $x+y$ are straight lines with slope $-1$, and the constraint requires that the solution lives on the unit circle.

Simply by inspection, we can determine that the optimal solution should be $x=y=-\frac{\sqrt{2}}{2}.$ Let's make sure we can obtain the same result using Lagrange multipliers.

Formulating $$L = x+y+\lambda(x^2+y^2-1),$$ we can take the derivatives and solve \begin{align*} \pd{L}{x} = 1 + 2\lambda x = 0 \quad & \Rightarrow & \lambda = -\frac{1}{2x}, \\ \pd{L}{y} = 1 + 2\lambda y = 1 - \frac{y}{x} = 0 \quad & \Rightarrow & y = x,\\ \pd{L}{\lambda} = x^2+y^2-1 = 2x^2 -1 = 0 \quad & \Rightarrow & x = \pm \frac{1}{\sqrt{2}}. \end{align*} Given the two solutions which satisfy the necessary conditions, the negative solution is clearly the minimizer of the objective.

Let us use Lagrange multipliers to derive the necessary conditions for our constrained trajectory optimization problem in discrete time \begin{align*} \minimize_{\bx_1,...,\bx_N,\bu_0,...,\bu_{N-1}} & \sum_{n=0}^{N-1} g(\bx[n],\bu[n]),\\ \subjto \quad & \bx[n+1] = f_d(\bx[n],\bu[n]). \end{align*} Formulate the Lagrangian, \[ L(\bx[\cdot],\bu[\cdot],\lambda[\cdot]) = \sum_{n=0}^{N-1} g(\bx[n],\bu[n]) + \sum_{n=0}^{N-1} \lambda^T[n] \left(f_d(\bx[n],\bu[n]) - \bx[n+1]\right), \] and set the derivatives to zero to obtain the adjoint equation method described for the shooting algorithm above: \begin{gather*} \forall n\in[0,N-1], \pd{L}{\lambda[n]} = f_d(\bx[n],\bu[n]) - \bx[n+1] = 0 \Rightarrow \bx[n+1] = f(\bx[n],\bu[n]) \\ \forall n\in[0,N-1], \pd{L}{\bx[n]} = \pd{g(\bx[n],\bu[n])}{\bx} + \lambda^T[n] \pd{f_d(\bx[n],\bu[n])}{\bx} - \lambda^T[n-1] = 0 \\ \quad \Rightarrow \lambda[n-1] = \pd{g(\bx[n],\bu[n])}{\bx}^T + \pd{f_d(\bx[n],\bu[n])}{\bx}^T \lambda[n]. \\ \pd{L}{\bx[N]} = -\lambda[N-1] = 0 \Rightarrow \lambda[N-1] = 0 \\ \forall n\in[0,N-1], \pd{L}{\bu[n]} = \pd{g(\bx[n],\bu[n])}{\bu} + \lambda^T[n] \pd{f_d(\bx[n],\bu[n])}{\bu} = 0. \end{gather*} Therefore, if we are given an initial condition $\bx_0$ and an input trajectory $\bu[\cdot]$, we can verify that it satisfies the necessary conditions for optimality by simulating the system forward in time to solve for $\bx[\cdot]$, solving the adjoint equation backwards in time to solve for $\lambda[\cdot]$, and verifying that $\pd{L}{\bu[n]} = 0$ for all $n$. The fact that $\pd{J}{\bu} = \pd{L}{\bu}$ when $\pd{L}{\bx} = 0$ and $\pd{L}{\lambda} = 0$ follows from some basic results in the calculus of variations.

You won't be surprised to hear that these necessary conditions have an analogue in continuous time. I'll state it here without derivation. Given the initial conditions, $\bx_0$, a continuous dynamics, $\dot\bx = f(\bx,\bu)$, and the instantaneous cost $g(\bx,\bu)$, for a trajectory $\bx(\cdot),\bu(\cdot)$ defined over $t\in[t_0,t_f]$ to be optimal it must satisfy the conditions that \begin{align*} \forall t\in[t_0,t_f],\quad & \dot{\bx} = f(\bx,\bu), \quad &\bx(0)=\bx_0\\ \forall t\in[t_0,t_f],\quad & -\dot\lambda = \pd{g}{\bx}^T + \pd{f}{\bx}^T \lambda, \quad &\lambda(T) = 0\\ \forall t\in[t_0,t_f],\quad & \pd{g}{\bu} + \lambda\pd{f}{\bu} = 0.& \end{align*}

In fact the statement can be generalized even beyond this to the case where $\bu$ has constraints. The result is known as Pontryagin's minimum principle -- givingAdapted from

Note that the terms which are minimized in the final line of the theorem are commonly referred to as the Hamiltonian of the optimal control problem, $$H(\bx,\bu,\lambda,t) = g(\bx,\bu) + \lambda^T f(\bx,\bu).$$ It is distinct from, but inspired by, the Hamiltonian of classical mechanics. Remembering that $\lambda$ has an interpretation as $\pd{J}{\bx}^T$, you should also recognize it from the HJB.

An important special case. Linear/Quadratic objectives results in an LP/QP $\Rightarrow$ Convex optimization.

Once we have obtained a locally optimal trajectory from trajectory optimization, there is still work to do...

Take $\bar\bx(t) = \bx(t) - \bx_0(t)$, and $\bar\bu(t) = \bu(t)-\bu_0(t)$, then apply finite-horizon LQR (see the LQR chapter).

The discussion of walking and running robots in Chapter 4 motivated the notion of limit cycle stability. Linear systems are not capable of producing stable limit cycle behavior, so this rich topic is unique to nonlinear systems design and analysis. Furthermore, the tools that are required to design, stabilize, and verify limit cycles will have applicability beyond simple periodic motions.

The first natural question we must ask is, given a system $\dot{\bx} = f(\bx)$, or a control system $\dot{x} = f(\bx,\bu)$, how do we go about finding periodic solutions which may be passively stable, open-loop stable, or stabilized with closed-loop feedback? It turns out that the trajectory optimization tools that we developed in the last chapter are very well suited to this task.

I introduced the trajectory optimization tools as means for optimizing a single
trajectory of the system from a particular known initial condition. But they
can be used in a slightly different way, too -- in some formulations we might
choose to have the initial conditions as one of the decision variables. In
fact, the simplest formulation for finding a periodic solution is to leave
the entire trajectory as a decision variable, but to add *periodicity constraints*
enforcing that the initial conditions and the final conditions match. Some
care is needed here; if you're not careful then these constraints will all
be met if the entire trajectory remains at a fixed point in the system. In
the example below, we constrain that the initial and final conditions lie on
the surface of section that we used before, and additionally constrain that
the velocity at the initial/final points is non-zero (to avoid the fixed point solution).

Note that although we introduced trajectory optimization as searching over control inputs $\bu$, this formulation is now rich enough that it solves an important problem even for passive systems -- with no control input at all.

Recall the dynamics of the Van der Pol oscillator given by $$\ddot{q} + \mu (q^2 - 1) \dot{q} + q = 0, \quad \mu>0,$$ which exhibited a stable limit cycle.

Formulate the direct collocation optimization: \begin{align*} \find_{\bx_0,...,\bx_N,dt} \quad \\ \subjto \quad & q_0 = 0, \quad q_N = 0, \\ & \dot{q}_0 = \dot{q}_N, \quad \dot{q}_0 > 0, \\ & \text{collocation dynamic constraints} \end{align*}

Try it yourself:

```
cd(fullfile(getDrakePath,'examples'));
VanDerPol.runDircol();
```

Click here to see an animation of the result. The red dots are the initial and final conditions (which appear as one dot once the constraints are satisfied). As always, make sure that you take a look at the code. Poke around. Try changing some things.

One of the things that you should notice in the code is that I provide an initial guess for the solver. In most of the examples so far I've been able to avoid doing that--the solver takes small random numbers as a default initial guess and solves from there. But for this problem, I found that it was getting stuck in a local minima. Adding the initial guess that the solution moves around a circle in state space was enough.

Recall the important distinction between stability of a trajectory in time
and stability of a limit cycle was that the limit cycle does not converge in
phase -- trajectories near the cycle converge to the cycle, but trajectories
on the cycle may not converge with each other. This is type of stability,
also known as *orbital stability* can be written as stability to the
manifold described by cycle $\bx^*(t)$, \[ \min_\tau || x(t) - x^*(\tau) ||
\rightarrow 0.\] In the case of limit cycles, this manifold is a periodic
solution with $\bx^*(t+t_{period}) = \bx^*(t)$. Depending on exactly how that
convergence happens, we can define orbital stability in the sense of Lyapunov,
asymptotic orbital stability, exponential orbital stability, or even
finite-time orbital stability.

In order to prove that a system is orbitally stable (locally, over a region, or globally), or to analyze the region of attraction of a limit cycle, we can use a Lyapunov function. In particular, we would like to consider Lyapunov functions which have the form cartooned below; they vanish to zero everywhere along the cycle, and are strictly positive everywhere away from the cycle.

How can we parameterize this class of functions? For arbitrary cycles this could be very difficult in the original coordinates. For simple cycles like in the cartoon, one could imagine using polar coordinates. More generally, we will define a new coordinate system relative to the orbit, with coordinates

- $\tau$ - the phase along the orbit
- $\bx_\perp(\tau)$ - the remaining coordinates, linearly independent from $\tau$.

Given a state $\bx$ in the original coordinates, we must define a smooth mapping $\bx \rightarrow (\tau, \bx_\perp)$ to this new coordinate system. For example, for a simple ring oscillator we might have:

In general, for an $n$-dimensional state space, $\tau$ will always be a scalar, and $\bx_\perp$ will be an $(n-1)$-dimensional vector defining the remaining coordinates relative to $\bx^*(\tau)$. In fact, although we use the notation $\bx_\perp$ the coordinate system need not be strictly orthogonal to the orbit, but must simply beThe value of this construction for Lyapunov analysis was proposed in

For a dynamical system $\dot\bx = f(\bx)$ with $\bx \in \Re^n$, $f$ continuous, a continuous periodic solution
$\bx^*(\tau)$, and a smooth mapping $\bx \rightarrow (\tau,\bx_\perp)$ where $\bx_\perp$ vanishes
on $\bx^*$, then for some $n-1$ dimensional ball ${\cal B}$ around the origin,
if I can produce
a $V(\bx_\perp,\tau)$ such that
\begin{gather*}
\forall \tau, V(0,\tau) = 0, \\
\forall \tau, \forall \bx_\perp \in {\cal B}, \bx_\perp \ne 0, V(\bx_\perp,\tau) > 0,
\end{gather*}
with
\begin{gather*}
\forall \tau, \dot{V}(0,\tau) = 0, \\
\forall \tau, \forall \bx_\perp \in {\cal B}, \bx_\perp \ne 0, \dot{V}(\bx_\perp,\tau) < 0,
\end{gather*}
then the solution $\bx^*(t)$ is *locally orbitally asymptotically stable*.

Perhaps the simplest oscillator is the first-order system which converges to the unit circle. In cartesian coordinates, the dynamics are \begin{align*} \dot{x}_1 =& x_2 -\alpha x_1 \left( 1 - \frac{1}{\sqrt{x_1^2+x_2^2}}\right) \\ \dot{x}_2 =& -x_1 -\alpha x_2 \left( 1 - \frac{1}{\sqrt{x_1^2+x_2^2}}\right), \end{align*} where $\alpha$ is a positive scalar gain.

If we take the transverse coordinates to be the polar coordinates, shifted so that $x_\perp$ is zero on the unit circle, \begin{align*}\tau =& \text{atan2}(-x_2,x_1) \\ x_\perp =& \sqrt{x_1^2+x_2^2}-1, \end{align*} which is valid when $x_\perp>-1$, then the simple transverse dynamics are revealed: \begin{align*} \dot\tau =& 1 \\ \dot{x}_\perp =& -\alpha x_\perp. \end{align*} Taking a Lyapunov candidate $V(x_\perp,\tau) = x_\perp^2$, we can verify that $$\dot{V} = -2 \alpha x_\perp^2 \prec 0, \quad \forall x_\perp > -1.$$ This demonstrates that the limit cycle is locally asymptotically stable, and furthermore that the invariant open-set $V < 1$ is inside the region of attraction of that limit cycle. In fact, we know that all $x_\perp > -1$ are in the region of attraction that limit cycle, but this is not proven by the Lyapunov argument above.

Let's compare this approach with the approach that we used in the chapter on walking robots, where we used a Poincaré map analysis to investigate limit cycle stability. In transverse coordinates approach, there is an additional burden to construct the coordinate system along the entire trajectory, instead of only at a single surface of section. In fact, the transverse coordinates approach is sometimes referred to as a "moving Poincaré section". But the reward for this extra work is that we can check a condition that only involves the instantaneous dynamics, $f(\bx)$, as opposed to having to integrate the dynamics over an entire cycle to generate the discrete Poincaré map, $\bx_p[n+1] = P(\bx_p[n])$. While computing $P$ in closed-form happened to be possible for the simple rimless wheel model, it is rarely possible for more complex models. As we will see below, this approach will also be more compatible with designing continuous feedback controller that stabilize the limit cycle.

In the case of Lyapunov analysis around a fixed point, there was an important special case: for stable linear systems, we actually have a recipe for constructing a Lyapunov function. As a result, for nonlinear systems, we often found it convenient to begin our search by linearizing around the fixed point and using the Lyapunov candidate for the linear system as an initial guess. That same approach can be extended to limit cycle analysis.

Transverse stabilization is more natural than full LQR

The term "motion planning" is a hopelessly general term which almost certainly encompasses the dynamic programming, feedback design, and trajectory optimization algorithms that we have already discussed. However, there are a number of algorithms and ideas that we have not yet discussed which have grown from the idea of formulating motion planning as a search problem -- for instance searching for a path from a start to a goal in a graph which is too large solve completely with dynamic programming. Some, but certainly not all, of these algorithms sacrifice optimality in order to find any path if it exists, and the notion of a planner being "complete" -- guaranteed to find a path if one exists -- is highly valued. This is precisely our goal for this chapter, to add some additional tools that will be able to provide some form of solutions for our most geometrically complex, highly non-convex, robot control problems.

*motion planning* typically refers to problems where the planning domain is continuous
(e.g. continuous state space, continuous action space), but many motion planning
algorithms trace their origins back to ideas in discrete domains (e.g., graph search).

For this chapter, we will consider the following problem formulation: given a system defined by the nonlinear dynamics (in continuous- or discrete-time) $$\dot{\bx} = f(\bx,\bu) \quad \text{or} \quad \bx[n+1] = f(\bx[n],\bu[n]),$$ and given a start state $\bx(0) = \bx_0$ and a goal region ${\cal G}$, find any finite-time trajectory from $\bx_0$ to to $\bx \in {\cal G}$ if such a trajectory exists.

A long history... some people feel that the route to creating intelligent machines is to collect large ontologies of knowledge, and then perform very efficient search. (The more modern view of AI is focused instead on machine learning, the right answer probably involves pieces of both.) Samuels checker players, Deep Blue playing chess, theorem proving, Cyc, IBM Watson,...

One of the key ideas is the use of "heuristics" to guide the search. "Is it possible to find and optimal path from the start to a goal without visiting every node?". $A^*$. Admissible heuristics. Example: google maps.

Online planning. $D^*$, $D^*$-Lite.

If you remember how we introduced dynamic programming initially as a graph search, you'll remember that there were some challenges in discretizing the state space. Let's assume that we have discretized the continuous space into some finite set of discrete nodes in our graph. Even if we are willing to discretize the action space for the robot (this might be even be acceptable in practice), we had a problem where discrete actions from one node in the graph, integrated over some finite interval $h$, are extremely unlikely to land exactly on top of another node in the graph. To combat this, we had to start working on methods for interpolating the value function estimate between nodes.

Interpolation can work well if you are trying to solve for the cost-to-go function over the entire state space, but it's less compatible with search methods which are trying to find just a single path through the space. If I start in node 1, and land between node 2 and node 3, then which node to I continue to expand from?

One approach to avoiding this problem is to build a *search tree* as the search executes, instead of relying on a
predefined mesh discretization. This tree will contains nodes rooted in the continuous space at exactly
the points where system can reach.

Another other problem with any fixed mesh discretization of a continuous space, or even
a fixed discretization of the action space, is that
unless we have specific geometric / dynamic insights into our continuous system, it very difficult
to provide a *complete* planning algorithm. Even if we can show that no path to the goal exists on
the tree/graph, how can we be certain that there is no path for the continuous system? Perhaps
a solution would have emerged if we had discretized the system differently, or more finely?

One approach to addressing this second challenge is to toss out the notion of fixed
discretizations, and replace them with random sampling (another approach would be to adaptively
add resolution to the discretization as the algorithm runs). Random sampling, e.g. of the action
space, can yield algorithms that are *probabilistically complete* for the continuous space --
if a solution to the problem exists, then a probabilistically complete algorithm will
find that solution with probability 1 as the number of samples goes to infinity.

With these motivations in mind, we can build what is perhaps the simplest probabilistically complete algorithm for finding a path from the a starting state to some goal region with in a continuous state and action space:

Let us denote the data structure which contains the tree as ${\cal T}$. The algorithm is very simple:

- Initialize the tree with the start state: ${\cal T} \leftarrow \bx_0$.
- On each iteration:
- Select a random node, $\bx_{rand}$, from the tree, ${\cal T}$
- Select a random action, $\bu_{rand}$, from a distribution over feasible actions.
- Compute the dynamics: $\bx_{new} = f(\bx_{rand},\bu_{rand})$
- If $\bx_{new} \in {\cal G}$, then terminate. Solution found!
- Otherwise add the new node to the tree, ${\cal T} \leftarrow \bx_{new}$.

```
T = struct('parent',zeros(1,1000),'node',zeros(2,1000)); % pre-allocate memory for the "tree"
for i=2:size(T.parent,2)
T.parent(i) = randi(i-1);
x_rand = T.node(:,T.parent(i));
u_rand = 2*rand(2,1)-1;
x_new = x_rand+u_rand;
if (15<=x_new(1) && x_new(1)<=20 && 15<=x_new(2) && x_new(2)<=20)
disp('Success!'); break;
end
T.node(:,i) = x_new;
end
clf;
line([T.node(1,T.parent(2:end));T.node(1,2:end)],[T.node(2,T.parent(2:end));T.node(2,2:end)],'Color','k');
patch([15,15,20,20],[15,20,20,15],'g')
axis([-10,25,-10,25]);
```

Again, this algorithm We're nowhere close to the goal yet, and it's not exactly a hard problem.

While the idea of generating a tree of feasible points has clear advantages, we have lost the ability to cross off a node (and therefore a region of space) once it has been explored. It seems that, to make randomized algorithms effective, we are going to at the very least need some form of heuristic for encouraging the nodes to spread out and explore the space.

Multi-query planning with PRMs, ...

RRT*, RRT-sharp, RRTx,...

Complexity bounds and dispersion limits

Not sure yet whether randomness is fundamental here, or whether is a temporary "crutch" until we understand geometric and dynamic planning better.

Cell decomposition...

Mixed-integer planning.

Approximate decompositions for complex environments (e.g. IRIS)

The equations of motion for a standard robot can be derived using the method
of Lagrange. Using $T$ as the total kinetic energy of the system, and $U$ as
the total potential energy of the system, $L = T-U$, and $Q_i$ as the
generalized force corresponding to $q_i$, the Lagrangian dynamic equations are:
$$\frac{d}{dt}\pd{L}{\dot{q}_i} - \pd{L}{q_i} = Q_i.$$ If you are not
comfortable with these equations, then any good book chapter on rigid body
mechanics can bring you up to speed -- try

Consider the simple double pendulum with torque actuation at both joints and all of the mass concentrated in two points (for simplicity). Using $\bq = [\theta_1,\theta_2]^T$, and ${\bf p}_1,{\bf p}_2$ to denote the locations of $m_1,m_2$, respectively, the kinematics of this system are: \begin{eqnarray*} {\bf p}_1 =& l_1\begin{bmatrix} s_1 \\ - c_1 \end{bmatrix}, &{\bf p}_2 = {\bf p}_1 + l_2\begin{bmatrix} s_{1+2} \\ - c_{1+2} \end{bmatrix} \\ \dot{{\bf p}}_1 =& l_1 \dot{q}_1\begin{bmatrix} c_1 \\ s_1 \end{bmatrix}, &\dot{{\bf p}}_2 = \dot{{\bf p}}_1 + l_2 (\dot{q}_1+\dot{q}_2) \begin{bmatrix} c_{1+2} \\ s_{1+2} \end{bmatrix} \end{eqnarray*} Note that $s_1$ is shorthand for $\sin(q_1)$, $c_{1+2}$ is shorthand for $\cos(q_1+q_2)$, etc. From this we can write the kinetic and potential energy: \begin{align*} T =& \frac{1}{2} m_1 \dot{\bf p}_1^T \dot{\bf p}_1 + \frac{1}{2} m_2 \dot{\bf p}_2^T \dot{\bf p}_2 \\ =& \frac{1}{2}(m_1 + m_2) l_1^2 \dot{q}_1^2 + \frac{1}{2} m_2 l_2^2 (\dot{q}_1 + \dot{q}_2)^2 + m_2 l_1 l_2 \dot{q}_1 (\dot{q}_1 + \dot{q}_2) c_2 \\ U =& m_1 g y_1 + m_2 g y_2 = -(m_1+m_2) g l_1 c_1 - m_2 g l_2 c_{1+2} \end{align*} Taking the partial derivatives $\pd{T}{q_i}$, $\pd{T}{\dot{q}_i}$, and $\pd{U}{q_i}$ ($\pd{U}{\dot{q}_i}$ terms are always zero), then $\frac{d}{dt}\pd{T}{\dot{q}_i}$, and plugging them into the Lagrangian, reveals the equations of motion: \begin{align*} (m_1 + m_2) l_1^2 \ddot{q}_1& + m_2 l_2^2 (\ddot{q}_1 + \ddot{q}_2) + m_2 l_1 l_2 (2 \ddot{q}_1 + \ddot{q}_2) c_2 \\ &- m_2 l_1 l_2 (2 \dot{q}_1 + \dot{q}_2) \dot{q}_2 s_2 + (m_1 + m_2) l_1 g s_1 + m_2 g l_2 s_{1+2} = \tau_1 \\ m_2 l_2^2 (\ddot{q}_1 + \ddot{q}_2)& + m_2 l_1 l_2 \ddot{q}_1 c_2 + m_2 l_1 l_2 \dot{q}_1^2 s_2 + m_2 g l_2 s_{1+2} = \tau_2 \end{align*} As we saw in chapter 1, numerically integrating (and animating) these equations in MATLAB produces the expected result.

If you crank through the Lagrangian dynamics for a few simple robotic
manipulators, you will begin to see a pattern emerge - the resulting equations
of motion all have a characteristic form. For example, the kinetic energy of
your robot can always be written in the form: $$T = \frac{1}{2} \dot{\bq}^T {\bf
H}(\bq) \dot{\bq},$$ where ${\bf H}$ is the state-dependent inertial matrix.
This abstraction affords some insight into general manipulator dynamics - for
example we know that ${\bf H}$ is always positive definite, and
symmetric

Continuing our abstractions, we find that the equations of motion of a general robotic manipulator (sans kinematic loops) take the form $${\bf H}(\bq)\ddot{\bq} + {\bf C}(\bq,\dot{\bq})\dot{\bq} + {\bf g}(\bq) = {\bf B}(\bq)\bu,$$ where $\bq$ is the state vector, ${\bf H}$ is the inertial matrix, ${\bf C}$ captures Coriolis forces, and ${\bf g}$ captures potentials (such as gravity). The matrix ${\bf B}$ maps control inputs $\bu$ into generalized forces.

The manipulator equations are very general, but they do define some important characteristics. For example, $\ddot{\bq}$ is (state-dependent) linearly related to the control input, $\bu$. This observation justifies the control-affine form of the dynamics assumed throughout the notes.

Note that we have
chosen to use the notation of second-order systems (with $\dot{\bq}$ and
$\ddot{\bq}$ appearing in the equations) throughout this book. Although I
believe it provides more clarity, there is an important limitation to this
notation: it is impossible to describe 3D rotations in "minimal coordinates"
using this notation without introducing kinematic singularities (like the famous
"gimbal lock"). For instance, a common singularity-free choice for representing
a 3D rotation is the unit quaternion, described by 4 real values (plus a norm
constraint). However we can still represent the rotational velocity without
singularities using just 3 real values. This means that the length of our
velocity vector is no longer the same as the length of our position vector. For
this reason, you will see that most of the software in

Consider constraint equation $$\phi(q) = 0.$$ These arise, for instance, when the system has a closed kinematic chain. The (floating base) equations of motion can be written $$H(q)\ddot{q} + c(q,\dot{q}) = B u + J(q)^T \lambda,$$ where $J(q) = \pd{\phi}{q}$ and $\lambda$ is the constraint force.

To solve for $\lambda$, observe that when the constraint is imposed, $\phi(q)=0$ and therefor $\dot{\phi}=0$ and $\ddot{\phi}=0$. Writing this out, we have \begin{gather*}\dot{\phi} = J(q)\dot{q}=0,\\ \ddot{\phi} = J(q) \ddot{q} + \dot{J}(q) \dot{q} = 0.\end{gather*} Inserting the dynamics and solving for $\lambda$ yields $$\lambda =- (J H^{-1} J^T)^+ (J H^{-1} (Bu - c) + \dot{J}\dot{q}).$$ The $^+$ notation refers to a Moore-Penrose pseudo-inverse. In most cases, there are less constraints than degrees of freedom, in which case the inverse has a unique solution (and the traditional inverse could have been used). But the pseudo-inverse also works in cases where the system is over-constrained.

For numerical stability, one might like to add a restoring force to this constraint in the event that the constraint is not satisfied to numerical precision. To accomplish this, we can implement $$\ddot\phi = -\frac{2}{\epsilon} \dot\phi(q) - \frac{1}{\epsilon^2} \phi(q).$$ Carrying this through yields \begin{gather*} \lambda =- (J H^{-1} J^T)^+ (J H^{-1} (Bu - c) + (\dot{J} + \frac{2}{\epsilon} J)\dot{q} + \frac{1}{\epsilon^2} \phi). \end{gather*}

Consider the constraint equation $$\psi(q,\dot{q}) = 0,$$ where $\pd{\psi}{\dot{q}} \ne 0.$ These are less common, but arise when, for instance, a joint is driven through a prescribed motion. Here, the manipulator equations are given by $$H(q)\ddot{q} + c = B u + \pd{\psi}{\dot{q}}^T \lambda.$$ To solve for $\lambda$, we take $$\dot{\psi} = \pd{\psi}{q} \dot{q} + \pd{\psi}{\dot{q}}\ddot{q} = 0,$$ which yields $$\lambda = - \left( \pd{\psi}{\dot{q}} H^{-1} \pd{\psi}{\dot{q}} \right)^+ \left[ \pd{\psi}{\dot{q}} H^{-1} (Bu - c) + \pd{\psi}{q} \dot{q} \right].$$

Again, for numerical stability, we ask instead for $\dot{\psi} = -\frac{1}{\epsilon} \psi$, which yields $$\lambda = - \left( \pd{\psi}{\dot{q}} H^{-1} \pd{\psi}{\dot{q}} \right)^+ \left[ \pd{\psi}{\dot{q}} H^{-1} (Bu - c) + \pd{\psi}{q} \dot{q} + \frac{1}{\epsilon} \psi \right].$$

Now let $\phi(q)$ indicate the relative (signed) distance between two rigid bodies. The collision event is described by the zero-crossings (from positive to negative) of $\phi$. Let us start by assuming frictionless collisions, allowing us to write $$H\ddot{q} + c = Bu + J^T \lambda,$$ where $\lambda \ge 0$ is now a (scalar) coefficient of the impulsive force that is well-defined when integrated over the time of the collision (denoted $t_c^-$ to $t_c^+$). Integrate both sides of the equation over that (instantaneous) interval: \begin{align*}\int_{t_c^-}^{t_c^+} dt\left[ H \ddot{q} + C \right] = \int_{t_c^-}^{t_c^+} dt \left[ Bu + J^T \lambda \right] \end{align*} Since $q$ and $u$ are constants over this interval, we are left with $$H\dot{q}^+ - H\dot{q}^- = J^T \int_{t_c^-}^{t_c^+} \lambda dt,$$ where $\dot{q}^+$ is short-hand for $\dot{q}(t_c^+)$. Multiplying both sides by $J H^{-1}$, we have $$ J \dot{q}^+ - J\dot{q}^- = J H^{-1} J^T \int_{t_c^-}^{t_c^+} \lambda dt.$$ After the collision, we have $\dot\phi^+ = -e \dot\phi^-$, with $0 \le e \le 1$ denoting the coefficient of restitution, yielding: $$ J \dot{q}^+ - J\dot{q}^- = -(1+e)J\dot{q}^-,$$ and therefore $$\int_{t_c^-}^{t_c^+} \lambda dt = - (1+e)\left[ J H^{-1} J^T \right]^+ J \dot{q}^-.$$ Substituting this back in above results in $$\dot{q}^+ = \left[ I - (1+e)H^{-1} J^T \left[ J H^{-1} J^T \right]^+ J \right] \dot{q}^-.$$

We can add friction at the contact. Rigid bodies will almost always experience
contact at only a point, so we typically ignore torsional friction

We can put the bilateral constraint equations and the impulsive collision equations together to implement a hybrid model for unilateral constraints of the form $$\phi(q) \ge 0.$$ At every moment, some subset of the contacts are active, and can be treated as a bilateral constraint ($\phi_i=0$). The guards of the hybrid model are when an inactive constraint becomes active ($\phi_i>0 \rightarrow \phi_i=0$), or when an active constraint becomes inactive ($\lambda_i>0 \rightarrow \lambda_i=0$). Note that collision events will (almost always) only result in new active constraints when $e=0$, e.g. we have pure inelastic collisions.

If the contact involves Coulomb friction, then the transitions between sticking and sliding can be modeled
as additional hybrid guards, or can be solved efficiently as a complementarity problem

The equations of motions for our machines get complicated quickly.
Fortunately, for robots with a tree-link kinematic structure, there
are very efficient and natural recursive algorithms for generating
these equations of motion. For a detailed reference on these methods,
see

Drake ("dragon" in Middle English) is a toolbox maintained by the Robot Locomotion Group at the MIT Computer Science and Artificial Intelligence Lab (CSAIL) which contains implementations of many of the ideas from these notes. It was designed with the goal of rapidly prototyping advanced control ideas, but has also been optimized (in places) for performance -- it provides the complete planning and control backend for most of our research robots, including MIT's entry into the DARPA Robotics Challenge, and now has users from academia and industry around the world.

*symbolically* on the governing equations. The toolbox does a lot of work behind the scenes to make sure that, for instance, feedback or cascade combinations of polynomial systems remain polynomial. Another major focus is on numerical-optimization-based design and analysis -- for instance, in addition to providing a full rigid-body dynamics engine the software also provides analytical gradients for those dynamics.

Roughly speaking, MATLAB's Simulink provides a very nice interface for describing dynamical systems (as S-Functions), a graphical interface for combining these systems in very nontrivial ways, and a number of powerful solvers for simulating the resulting systems. For simulation analysis, it provides everything we need. However the S-Function abstraction which makes Simulink powerful also hides some of the detailed structure available in the equations governing a dynamical system; for the purposes of control design and analysis I would like to be able to declare that a particular system is governed by analytic equations, or polynomial equations, or even linear equations, and for many of the tools it is important to be able to manipulate these equations symbolically.

You can think of

Like SimMechanics,

For everything you need to get started, including installation instructions, a discussion forum, and a (modest but growing) gallery of examples, please visit the

The remainder of this chapter will give you an important conceptual overview of defining and working with dynamical systems in drake, and will give a number of simple examples for you to start with.

If you have questions, please search the previous questions on the github issues page, and post a new question if you aren't able to find the solution.

The fundamental object in

Dynamical systems in

As we will see, dynamical systems in `DrakeSystem`

class interface, or -- in the case of rigid-body systems -- even by simple XML file (see the section below on the URDF format) that can be parsed by our existing class implementations.

In this section, we will describe how you can write your own dynamics class in MATLAB by deriving from the `DrakeSystem`

interface class.

**Continuous-Time Systems.**
Consider a basic continuous-time nonlinear input-output dynamical system described by the following state-space equations:
\begin{gather*}
\dot{x} = f(t,x,u), \\ y = g(t,x,u).
\end{gather*}
In `DrakeSystem`

class, defining the size of the input, state, and output vectors in the constructor, and overloading the `dynamics`

and `output`

methods, as illustrated by the following example:

Consider the system \begin{gather*}\dot{x} = -x + x^3,\\ y = x.\end{gather*} This system has zero inputs, one (continuous) state variable, and one output. It can be implemented in

```
classdef SimpleCTExample < DrakeSystem
methods
function obj = SimpleCTExample()
% call the parent class constructor:
obj = obj@DrakeSystem(...
1, ... % number of continuous states
0, ... % number of discrete states
0, ... % number of inputs
1, ... % number of outputs
false, ... % because the output does not depend on u
true); % because the dynamics and output do not depend on t
end
function xdot = dynamics(obj,t,x,u)
xdot = -x+x^3;
end
function y=output(obj,t,x,u)
y=x;
end
end
end
```

**Discrete-Time Systems**.
Implementing a basic discrete-time system in `DrakeSystem`

and defining the `update`

and `output`

methods, as seen in the following example.

Consider the system \begin{gather*}x[n+1] = x^3[n],\\ y[n] = x[n].\end{gather*} This system has zero inputs, one (discrete) state variable, and one output. It can be implemented in

```
classdef SimpleDTExample < DrakeSystem
methods
function obj = SimpleDTExample()
% call the parent class constructor:
obj = obj@DrakeSystem(...
0, ... % number of continuous states
1, ... % number of discrete states
0, ... % number of inputs
1, ... % number of outputs
false, ... % because the output does not depend on u
true); % because the update and output do not depend on t
end
function xnext = update(obj,t,x,u)
xnext = x^3;
end
function y=output(obj,t,x,u)
y=x;
end
end
end
```

**Mixed Discrete and Continuous Dynamics**.
It is also possible to implement systems that have both continuous
dynamics and discrete dynamics. There are two subtleties that must be
addressed. First, we'll denote the discrete states as $x_d$ and the
continuous states as $x_c$, and the entire state vector $x = [x_d^T,
x_c^T]^T$. Second, we must define the timing of the discrete dynamics
update relative to the continuous time variable $t$; we'll denote this
period with $\Delta_t$. Then a mixed system can be written
as:\begin{gather*} \dot{x}_c = f_c(t,x,u),\\ x_d(t+t') = f_d(t,x,u),
\quad \forall t \in \{0,\Delta_t,2\Delta_t,...\},\forall t' \in
(0,\Delta_t] \\ y=g(t,x,u). \end{gather*} Note that, unlike the
purely discrete time example, the solution of the discrete time variable is defined
for all $t$. To implement this, derive from `DrakeSystem`

and
implement the `dynamics`

, `update`

, `output`

methods
for $f_c()$, $f_d()$, and $g()$, respectively. To define the timing,
you must also implement the `getSampleTime`

method. Type
`help DrakeSystem/getSampleTime`

at the MATLAB prompt for
more information. Note that currently `DrakeSystem`

(but multiple systems can be combined with different sampling times).

Consider the system \begin{gather*}x_1(t+t') = x_1^3(t), \quad \forall
t \in \{0,1,2,..\}, \forall t' \in (0,1], \\\dot{x}_2 =
-x_2(t) + x_2^3(t), \end{gather*} which is the combination of the
previous two examples into a single system. It can be implemented in

```
classdef SimpleMixedCTDTExample < DrakeSystem
methods
function obj = SimpleMixedCTDTExample()
% call the parent class constructor:
obj = obj@DrakeSystem(...
1, ... % number of continuous states
1, ... % number of discrete states
0, ... % number of inputs
2, ... % number of outputs
false, ... % because the output does not depend on u
true); % because the update and output do not depend on t
end
function ts = getSampleTime(obj)
ts = [[0;0], ... % continuous and discrete sample times
[1;0]]; % with dt = 1
end
function xdnext = update(obj,t,x,u)
xdnext = x(1)^3; % the DT state is x(1)
end
function xcdot = dynamics(obj,t,x,u);
xcdot = -x(2)+x(2)^3; % the CT state is x(2)
end
function y=output(obj,t,x,u)
y=x;
end
end
end
```

**Systems with Constraints**. Nonlinear input-output systems with constraints can
also be defined. There are two distinct types of constraints supported: state constraints that can be modeled as a function $\phi(x) = 0$ and input constraints which can be modeled as $u_{min} \le u \le u_{max}$. For instance, we would write a continuous-time system with state and input constraints as:
\begin{gather*}
\dot{x} = f(t,x,u),\quad y = g(t,x,u), \\ \text{subject to } \phi(x) =
0, u_{min} \le u \le u_{max}.
\end{gather*}
These two types of constraints are handled quite differently.

Input constraints are designed to act in the same way that an actuator limit might work for a mechanical system. These act as a saturation nonlinearity system attached to the input, where for each element: $$y_i = \begin{cases} u_{max,i} & \text{if } u_i > u_{max,i} \\ u_{min,i} & \text{if } u_i < u_{min,i} \\ u_i & \text{otherwise.}\end{cases}$$ The advantage of using the input limits instead of implementing the saturation in your own code is that the discontinuity associated with hitting or leaving a saturation is communicated to the solver correctly, allowing for more efficient and accurate simulations. Input constraints are set by calling the `setInputLimits`

method.

State constraints are additional information that you are providing to the solver and analysis routines. They should be read as "this dynamical system will only ever visit states described by $\phi(x)=0$". Evaluating the dynamics at a vector $x$ for which $\phi(x) \ne 0$ may lead to an undefined or non-sensible output. Telling `stateConstraints`

method \emph{and} by calling `setNumStateConstraints`

to tell the solver what to expect in that method.

**Hybrid (Event-Driven) Systems.**

Event-driven systems in *guard* function, denoted by
$\phi(t,x,u)$ in the figure, which triggers a transition
out of the current mode when $\phi \le 0$. The dynamics of the
transition are given by the *reset* function, $x^+ =
\Delta(t,x^-,u)$. Event-driven systems are constructed by creating
(or inheriting from) an empty `HybridDrakeSystem`

, then
populating the system with modes (nodes in the graph) by calling
`addMode`

, and populating the system with transitions (edges in
the graph) by calling `addTransition`

. Note that it is often
useful to create guard out of a logical combination of smooth guards
(e.g. $x(1)>0$ and $x(2)<.5$); to accomplish this you should use the
`andGuard`

and `notGuard`

methods.
The output of a `HybridDrakeSystem`

is the output
of the active mode.

The dynamics of a vertically bouncing ball can be described by a
`HybridDrakeSystem`

with a single continuous mode to model
the flight of the ball through the air and a discrete collision
event at the moment that the ball hit the ground. This can be
accomplished with the following two classes (which can be found in the `examples/BouncingBall`

directory):

```
classdef BallFlightPhasePlant < DrakeSystem
methods
function obj = BallFlightPhasePlant()
obj = obj@DrakeSystem(...
2, ... % number of continuous states
0, ... % number of discrete states
0, ... % number of inputs
1, ... % number of outputs
false, ... % not direct feedthrough
true); % time invariant
end
function xdot = dynamics(obj,t,x,u)
xdot = [x(2); -obj.g]; % qddot = -g; x=[q,qdot]
end
function y = output(obj,t,x,u)
y = x(1); % height of the ball
end
end
properties
g = 9.81; % gravity
end
end
```

```
classdef BallPlant < HybridDrakeSystem
methods
function obj = BallPlant()
obj = obj@HybridDrakeSystem(...
0, ... % number of inputs
1); % number of outputs
% create flight mode system
sys = BallFlightPhasePlant();
obj = setInputFrame(obj,sys.getInputFrame);
obj = setOutputFrame(obj,sys.getOutputFrame);
[obj,flight_mode] = addMode(obj,sys); % add the single mode
g1=inline('x(1)-obj.r','obj','t','x','u'); % q-r<=0
g2=inline('x(2)','obj','t','x','u'); % qdot<=0
obj = addTransition(obj, ...
flight_mode, ... % from mode
andGuards(obj,g1,g2), ... % q-r<=0 & qdot<=0
@collisionDynamics, ... % transition method
false, ... % not direct feedthrough
true); % time invariant
end
function [xn,m,status] = collisionDynamics(obj,m,t,x,u)
xn = [x(1); -obj.cor*x(2)]; % qdot = -cor*qdot
if (xn(2)<0.01) status = 1; % stop simulating if ball has stopped
else status = 0; end
end
end
properties
r = 1; % radius of the ball
cor = .8; % coefficient of restitution
end
end
```

**Stochastic Systems**.

Stochastic models can be implemented in `StochastiDrakeSystem`

and implementing the
`stochasticDynamics`

, `stochasticUpdate`

, and
`stochasticOutput`

methods.

Consider the system \begin{gather*}\dot{x} = -x + w,\\ y = x.\end{gather*} This system has zero inputs, one (continuous) state variable, and one output. It can be implemented in

```
classdef LinearGaussianExample < StochasticDrakeSystem
methods
function obj = LinearGaussianExample
obj = obj@StochasticDrakeSystem(...
1, ... % number of continuous states
0, ... % number of discrete states
0, ... % number of inputs
1, ... % number of outputs
false, ... % not direct feedthrough
true, ... % time invariant
1, ... % number of noise inputs
.01); % time constant of w(t)
end
function xcdot = stochasticDynamics(obj,t,x,u,w)
xcdot = -x + w;
end
function y = stochasticOutput(obj,t,x,u,w);
y=x;
end
end
end
```

**Important Special Cases**.
There are many special cases of dynamical systems with structure in the
dynamics which can be exploited by our algorithms. Special cases of
dynamical systems implemented in

- Second-order systems, given by $\ddot{q} = f(t,q,\dot{q},u),$ $y = g(t,q,\dot{q},u)$, and $\phi_1(q)=0$ and $\phi_2(q,\dot{q})=0$.
- Rigid-body systems, governed by the manipulator equations, \begin{gather*}H(q)\ddot{q} + C(q,\dot{q})\dot{q} + G(q) = Bu + \pd{\phi_1}{q}^T \lambda_1 + \pd{\phi_2}{\dot{q}}^T \lambda_2\\ \phi_1(q) = 0, \quad \phi_2(q,\dot{q})=0\end{gather*} where $\lambda_1$ and $\lambda_2$ are forces defined implicitly by the constraints.
- (Rational) polynomial systems, given by $e(x)\dot{x} = f(t,x,u)$, $y = g(t,x,u)$, subject to $\phi(x)=0$, where $e(), f(), g(),$ and $\phi()$ are all polynomial.
- Linear time-invariant systems, given by \begin{gather*} \dot{x}_c = A_c x + B_c u,\\ x_d[n+1] = A_d x + B_d u, \\ y = Cx + Du, \end{gather*} and $\phi(x)=\{\}$.

These special cases are implemented by classes derived from `DrakeSystem`

.
You should always attempt to derive from the deepest class in the hierarchy that
describes your system.

In some cases it is possible to convert between these derived classes. For example, it is possible to convert a rigid-body system to a rational polynomial system (e.g., by changing coordinates through a stereographic projection). Methods which implement these conversions are provided whenever possible.

URDF = Universal Robot Description Format. Details coming soon.

Although most users of Drake never open a Simulink GUI,

Whenever possible, structure in the equations is preserved on combination. A polynomial system that is feedback combined with another polynomial system produces a new polynomial system. However, if a polynomial system is feedback combined with a Simulink Block, then the new system is a DynamicalSystem, but not a PolynomialSystem.

A combination of two systems should return a system of the type that is the least common ancestor in the class hierarchy. There are two exceptions: combinations with a hybrid system stay hybrid, and combinations with a stochastic system stay stochastic.

Every dynamical system in Drake has a number of coordinate frames associated with it, including frames which describe the inputs, states, and outputs of the system. These coordinate frames provide extra information about the vector signals, including a name for each element. Must match to allow combination. Hybrid modes automatically inherit the input and output coordinate frame of the hybrid system; the hybrid system does not have a coordinate frame for the state.

Once you have acquired a DynamicalSystem object describing the
dynamics of interest, the most basic thing that you can do is to
simulate it. This is accomplished with the `simulate`

method in
the DynamicalSystem class, which takes the timespan of the simulation
as a 1x2 vector of the form `[t0 tf]`

and optionally a vector
initial condition as input. Type ```
help
DynamicalSystem/simulate
```

for further documentation about simulation
options.

Every `simulate`

method outputs a instance of the
`Trajectory`

class. To inspect
the output, you may want to evaluate the trajectory at any snapshot in time
using `eval`

, plot the output using `fnplt`

, or hand the
trajectory to a visualizer (described below).

Use the following code to instantiate the `SimpleCTExample`

,
simulate it, and plot the results:

```
sys = SimpleCTExample;
traj = simulate(sys, [0 10], .99);
fnplt(traj);
```

The arguments passed to `simulate`

set the initial time to $t_0=0$,
final time to $t_f=5$, and initial condition to $x_0=.99$. The code looks the same for the `SimpleDTExample`

:

```
sys = SimpleDTExample;
traj = simulate(sys, [0 10], .99);
fnplt(traj);
```

These generate the plots below

`SimpleCTSystem`

(left) and
`SimpleDTSystem`

(right).The `simulate`

method sets the input, $u$, to the
system to zero. In order to simulate with an input tape, you should
`cascade`

a `Trajectory`

object describing $u(t)$ with the system,
then simulate. If you do not specify the initial conditions on your
call to `simulate`

, then the `getInitialState()`

method is
called on your `DynamicalSystem`

object. The default
`getInitialState`

method sets $x_0 = 0$; you should consider
overloading this method in your `DynamicalSystem`

class. In special
cases, you may need to set initial conditions based on the initial
time or input -- in this case you should overload the method
`getInitialStateWInput`

.

By default, `setSimulinkParam`

method.
For `DrakeSystems`

, you can optionally use the MATLAB's `ode45`

suite of solvers using the method `simulateODE`

, which can also
understand (most of) the Simulink solver parameters.

Visualizers are dynamical systems, too.

Playback

Use ball bouncing as an example.

Cascading. Use the realtime block.

There are a number of options for creating a movie file from your simulation,
depending on which Visualizer you used. Given a trajectory, `traj`

,
and a Visualizer object, `v`

, use

```
playbackMovie(v,traj);
```

If you specify an extension in your filename when you are prompted, then playbackMovie will attempt to write in the specified format. If you specify a filename with no extension, then you will be prompted with the output formats supported by your current visualizer.

The best overview we have for the remaining capabilities in

This section explains the most common ways that we interface

LCM is a simple, elegant, and efficient network protocol developed at
MIT during the DARPA Urban Challenge
(http://lcm-proj.github.io/)

LCM passes messages defined in an lcmtype file which can describe an
almost arbitrary data structure. Drake traffics primarily in signals,
which are (timestamped) vectors of doubles. To convert between the two, we define
an `LCMCoder`

class which defines how to `encode`

a double
as the rich data structure and `decode`

the data structure back
into a double.

You can implement your `LCMCoder`

directly in MATLAB or in Java.
The `LCMCoderFromType`

class can also construct one
automatically using Java reflection -- it simply concatenates the
elements of the structure into a double vector in the order that they
appear in the lcmtype file.

Any coordinate frame can be associated with an `LCMCoder`

by deriving
from the `LCMCoordinateFrame`

class.

In order to use any `DynamicalSystem`

as a node in the LCM
network, simply call `runLCM(sys,...)`

instead of
`simulate(sys,...)`

-- this will automatically attach LCM input blocks
and LCM output blocks on any input/output frames of the system that
are of type `LCMCoordinateFrame`

, and then simulate the system.
The `runLCM`

method takes a number of optional arguments
allowing you to refine this interaction.

The Robot Operating System (ROS) is an immensely popular suite of tools for robotics research. Nodes in the ROS network communicate with a very similar message passing system to LCM -- one primary different is that ROS is based on TCP/IP and a server/client network configuration, where LCM is based on all-to-all multicast UDP. Typically when we need Drake to act as a node in a ROS network (or to subscribe to ROS messages), we send and receive LCM messages and have a thin "ROS-LCM translator" app that connects the two networks.

With the MathWork's recently announced support for ROS, implementing something like a ROSCoordinateFrame is now on our short-term todo list.

Again, please engage us on the forums on the