Underactuated Robotics

Algorithms for Walking, Running, Swimming, Flying, and Manipulation

Russ Tedrake

© Russ Tedrake, 2016
How to cite these notes

Note: These are working notes used for a course taught at MIT and a free online course on MITx. They will be updated throughout the Spring 2016 semester.


This book is about building robots that move with speed, efficiency, and grace. I believe that this can only be achieve through a tight coupling between mechanical design, passive dynamics, and nonlinear control synthesis. Therefore, these notes contain selected material from dynamical systems theory, as well as linear and nonlinear control.

These notes also reflect a deep belief in computational algorithms playing an essential role in finding and optimizing solutions to complex dynamics and control problems. Algorithms play an increasingly central role in modern control theory; these days even rigorous mathematicians consider finding convexity in a problem (therefore making it amenable to an efficient computational solution) almost tantamount to an analytical result. Therefore, the notes necessarily also cover selected material from optimization theory, motion planning, and machine learning.

Although the material in the book comes from many sources, the presentation is targeted very specifically at a handful of robotics problems. Concepts are introduced only when and if they can help progress the capabilities we are trying to develop. Many of the disciplines that I am drawing from are traditionally very rigorous, to the point where the basic ideas can be hard to penetrate for someone that is new to the field. I've made a conscious effort in these notes to keep a very informal, conversational tone even when introducing these rigorous topics, and to reference the most powerful theorems but only to prove them when that proof would add particular insights without distracting from the mainstream presentation. I hope that the result is a broad but reasonably self-contained and readable manuscript that will be of use to any enthusiastic roboticist.


The material in these notes is organized into a few main parts. "Model Systems" introduces a series of increasingly complex dynamical systems and overviews some of the relevant results from the literature for each system. "Nonlinear Planning and Control" introduces quite general computational algorithms for reasoning about those dynamical systems, with optimization theory playing a central role. Many of these algorithms treat the dynamical system as known and deterministic until the last chapters in this part which introduce stochasticity and robustness. "Estimation and Learning" follows this up with techniques from statistics and machine learning which capitalize on this viewpoint to introduce additional algorithms which can operate with less assumptions on knowing the model or having perfect sensors. The book closes with chapters on "Applications and Extensions" and an "Appendix".

The order of the chapters was chosen to make the book valuable as a reference. When teaching the course, however, I take a spiral trajectory through the material, introducing robot dynamics and control problems one at a time, and introducing only the techniques that are required to solve that particular problem.

insert figure showing progression of problems here. pendulum -> cp/acro -> walking ... with chapter numbers associated.


All of the examples and algorithms in this book, plus many more, are now available as a part of our open-source software project: . I will give pointers to these examples throughout the text, and provide an introduction to the software in the appendix.

First chapter

Fully-actuated vs Underactuated Systems

Robots today move far too conservatively, and accomplish only a fraction of the tasks and achieve a fraction of the performance that they are mechanically capable of. In many cases, we are still fundamentally limited by control technology which matured on rigid robotic arms in structured factory environments. The study of underactuated robotics focuses on building control systems which use the natural dynamics of the machines in an attempt to achieve extraordinary performance in terms of speed, efficiency, or robustness.


Let's start with some examples, and some videos.

Honda's ASIMO vs. passive dynamic walkers

The world of robotics changed when, in late 1996, Honda Motor Co. announced that they had been working for nearly 15 years (behind closed doors) on walking robot technology. Their designs have continued to evolve, resulting in a humanoid robot they call ASIMO (Advanced Step in Innovative MObility). Honda's ASIMO is widely considered to be state of the art in walking robots, although there are now many robots with designs and performance very similar to ASIMO's. We will dedicate effort to understanding a few of the details of ASIMO when we discuss algorithms for walking... for now I just want you to become familiar with the look and feel of ASIMO's movements [watch the asimo video below now].

Honda's ASIMO (from http://world.honda.com/ASIMO/video/)

I hope that your first reaction is to be incredibly impressed with the quality and versatility of ASIMO's movements. Now take a second look. Although the motions are very smooth, there is something a little unnatural about ASIMO's gait. It feels a little like an astronaut encumbered by a heavy space suit. In fact this is a reasonable analogy... ASIMO is walking a bit like somebody that is unfamiliar with his/her dynamics. Its control system is using high-gain feedback, and therefore considerable joint torque, to cancel out the natural dynamics of the machine and strictly follow a desired trajectory. This control approach comes with a stiff penalty. ASIMO uses roughly 20 times the energy (scaled) that a human uses to walk on the flat (measured by cost of transport)Collins05. Also, control stabilization in this approach only works in a relatively small portion of the state space (when the stance foot is flat on the ground), so ASIMO can't move nearly as quickly as a human, and cannot walk on unmodelled or uneven terrain.

Download the video

Download the video

A 3D passive dynamic walker by Steve Collins and Andy RuinaCollins01

For contrast, let's now consider a very different type of walking robot, called a passive dynamic walker (PDW). This "robot" has no motors, no controllers, no computer, but is still capable of walking stably down a small ramp, powered only by gravity [watch videos above now]. Most people will agree that the passive gait of this machine is more natural than ASIMO's; it is certainly more efficient. Passive walking machines have a long history - there are patents for passively walking toys dating back to the mid 1800's. We will discuss, in detail, what people know about the dynamics of these machines and what has been accomplished experimentally. This most impressive passive dynamic walker to date was built by Steve Collins in Andy Ruina's lab at CornellCollins01.

Passive walkers demonstrate that the high-gain, dynamics-cancelling feedback approach taken on ASIMO is not a necessary one. In fact, the dynamics of walking is beautiful, and should be exploited - not cancelled out.

Birds vs. modern aircraft

The story is surprisingly similar in a very different type of machine. Modern airplanes are extremely effective for steady-level flight in still air. Propellers produce thrust very efficiently, and today's cambered airfoils are highly optimized for speed and/or efficiency. It would be easy to convince yourself that we have nothing left to learn from birds. But, like ASIMO, these machines are mostly confined to a very conservative, low angle-of-attack flight regime where the aerodynamics on the wing are well understood. Birds routinely execute maneuvers outside of this flight envelope (for instance, when they are landing on a perch), and are considerably more effective than our best aircraft at exploiting energy (eg, wind) in the air.

As a consequence, birds are extremely efficient flying machines; some are capable of migrating thousands of kilometers with incredibly small fuel supplies. The wandering albatross can fly for hours, or even days, without flapping its wings - these birds exploit the shear layer formed by the wind over the ocean surface in a technique called dynamic soaring. Remarkably, the metabolic cost of flying for these birds is indistinguishable from the baseline metabolic costArnould96, suggesting that they can travel incredible distances (upwind or downwind) powered almost completely by gradients in the wind. Other birds achieve efficiency through similarly rich interactions with the air - including formation flying, thermal soaring, and ridge soaring. Small birds and large insects, such as butterflies and locusts, use `gust soaring' to migrate hundreds or even thousands of kilometers carried primarily by the wind.

Birds are also incredibly maneuverable. The roll rate of a highly acrobatic aircraft (e.g, the A-4 Skyhawk) is approximately 720 deg/secShyy08; a barn swallow has a roll rate in excess of 5000 deg/secShyy08. Bats can be flying at full-speed in one direction, and completely reverse direction while maintaining forward speed, all in just over 2 wing-beats and in a distance less than half the wingspanTian06. Although quantitative flow visualization data from maneuvering flight is scarce, a dominant theory is that the ability of these animals to produce sudden, large forces for maneuverability can be attributed to unsteady aerodynamics, e.g., the animal creates a large suction vortex to rapidly change directionTriantafyllou95. These astonishing capabilities are called upon routinely in maneuvers like flared perching, prey-catching, and high speed flying through forests and caves. Even at high speeds and high turn rates, these animals are capable of incredible agility - bats sometimes capture prey on their wings, Peregrine falcons can pull 25 G's out of a 240 mph dive to catch a sparrow in mid-flightTucker98, and even the small birds outside our building can be seen diving through a chain-link fence to grab a bite of food.

Although many impressive statistics about avian flight have been recorded, our understanding is partially limited by experimental accessibility - it is quite difficult to carefully measure birds (and the surrounding airflow) during their most impressive maneuvers without disturbing them. The dynamics of a swimming fish are closely related, and can be more convenient to study. Dolphins have been known to swim gracefully through the waves alongside ships moving at 20 knotsTriantafyllou95. Smaller fish, such as the bluegill sunfish, are known to possess an escape response in which they propel themselves to full speed from rest in less than a body length; flow visualizations indeed confirm that this is accomplished by creating a large suction vortex along the side of the bodyTytell08 - similar to how bats change direction in less than a body length. There are even observations of a dead fish swimming upstream by pulling energy out of the wake of a cylinder; this passive propulsion is presumably part of the technique used by rainbow trout to swim upstream at mating seasonBeal06.

The common theme

Classical control techniques for robotics are based on the idea that feedback can be used to override the dynamics of our machines. These examples suggest that to achieve outstanding dynamic performance (efficiency, agility, and robustness) from our robots, we need to understand how to design control system which take advantage of the dynamics, not cancel them out. That is the topic of this course.

Surprisingly, there are relatively few formal control ideas that consider "exploiting" the dynamics. In order to convince a control theorist to consider the dynamics (efficiency arguments are not enough), you have to do something drastic, like taking away his control authority - remove a motor, or enforce a torque-limit. These issues have created a formal class of systems, the underactuated systems, for which people have begun to more carefully consider the dynamics of their machines in the context of control.


According to Newton, the dynamics of mechanical systems are second order ($F = ma$). Their state is given by a vector of positions, $\bq$ (also known as the configuration vector), and a vector of velocities, $\dot{\bq}$, and (possibly) time. The general form for a second-order controllable dynamical system is: $$\ddot{\bq} = {\bf f}(\bq,\dot{\bq},\bu,t),$$ where $\bu$ is the control vector. As we will see, the dynamics for many of the robots that we care about turn out to be affine in commanded torque, so let's consider a slightly constrained form: \begin{equation}\ddot{\bq} = {\bf f}_1(\bq,\dot{\bq},t) + {\bf f}_2(\bq,\dot{\bq},t)\bu \label{eq:f1_plus_f2}.\end{equation}


A control system described by equation \ref{eq:f1_plus_f2} is fully-actuated in state $(\bq,\dot{\bq})$ at time $t$ if it is able to command any instantaneous acceleration in $\bq$: \begin{equation} \textrm{rank}\left[{\bf f}_2 (\bq,\dot{\bq},t)\right] = \dim\left[\bq\right].\end{equation}


A control system described by equation \ref{eq:f1_plus_f2} is underactuated in state $(\bq,\dot{\bq})$ at time $t$ if it is not able to command an arbitrary instantaneous acceleration in $\bq$: \begin{equation} \textrm{rank}\left[{\bf f}_2(\bq,\dot{\bq},t)\right] < \dim\left[\bq\right]. \label{eq:underactuated_def}\end{equation}

Notice that whether or not a control system is underactuated may depend on the state of the system or even on time, although for most systems (including all of the systems in this book) underactuation is a global property of the system. We will refer to a system as underactuated if it is underactuated in any state or time.

Robot Manipulators

make this image spring to life with a matlab movie<
Simple double pendulum

Consider the simple robot manipulator illustrated above. As described in Appendix A, the equations of motion for this system are quite simple to derive, and take the form of the standard "manipulator equations": $${\bf H}(\bq)\ddot\bq + {\bf C}(\bq,\dot\bq)\dot\bq + {\bf G}(\bq) = {\bf B}(\bq)\bu.$$ It is well known that the inertial matrix, ${\bf H}(\bq)$ is (always) uniformly symmetric and positive definite, and is therefore invertible. Putting the system into the form of equation \ref{eq:f1_plus_f2} yields: \begin{align*}\ddot{\bq} =& -{\bf H}^{-1}(\bq)\left[ {\bf C}(\bq,\dot\bq) \dot\bq + {\bf G}(\bq) \right]\\ &+ {\bf H}^{-1}(\bq) {\bf B}(\bq) \bu.\end{align*} Because ${\bf H}^{-1}(\bq)$ is always full rank, we find that a system described by the manipulator equations is fully-actuated if and only if ${\bf B}(\bq)$ is full row rank. For this particular example, $\bq = [\theta_1,\theta_2]^T$ and $\bu = [\tau_1,\tau_2]^T$, and ${\bf B}(\bq) = {\bf I}_{2 \times 2}$. The system is fully actuated.

make drake elements expandable/collapsible. nice example here: http://quhno.internetstrahlen.de/myopera/csstests/collapsible-paragraph.html

MATLAB Example

I personally learn best when I can experiment and get some physical intuition. The companion software for the course should make it easy for you to see this system in action. To try it, make sure you've installed , then open MATLAB and run addpath_drake (possibly from your startup.m) in the drake directory, then try the following lines (one at a time) from the MATLAB command line:

% a platform independent way to cd into the example directory

% first construct the "robot" or "plant" object
plant = DoublePendPlant;  % check out the code in DoublePendPlant.m

% a visualizer will draw the robot
visualizer = DoublePendVisualizer(plant);
visualizer.inspector();  % (optional) simple gui to examine the joints

% simulate the robot from time=0 to time=5 seconds from random
% initial conditions taken from a Gaussian distribution (using randn).
% note that the first simulation you run will take longer because
% simulink needs to be started
trajectory = simulate(plant, [0 5], randn(4,1));

% play back the simulation using the cpu clock to get the timing right

% if you want to access the manipulator equations programmatically,
% you can use, e.g.:
[H,C_times_v,G,B] = plant.manipulatorEquations()
% which outputs using the abbreviations v for qdot, s1 for sin(q1), etc

MATLAB Example (an even simpler version)

Note that you don't have to write the dynamics yourself. The software supports a high-level description format for describing robots. We could have alternatively done:


% construct the robot from a URDF high-level description file
plant = RigidBodyManipulator('SimpleDoublePendulum.urdf');
visualizer = plant.constructVisualizer();

% all of the other commands will work, e.g.:
trajectory = simulate(plant, [0 5], randn(4,1));
[H,C_times_v,G,B] = plant.manipulatorEquations()

Make sure you take a peek at the file SimpleDoublePendulum.urdf. Documentation for the input format is available on the drake wiki; it supports even quite complex robots.

While the basic double pendulum is fully actuated, imagine the somewhat bizarre case that we have a motor to provide torque at the elbow, but no motor at the shoulder. In this case, we have $\bu = \tau_2$, and ${\bf B}(\bq) = [0,1]^T$. This system is clearly underactuated. While it may sound like a contrived example, it turns out that it is almost exactly the dynamics we will use to study as our simplest model of walking later in the class.

The matrix ${\bf f}_2$ in equation \ref{eq:f1_plus_f2} always has dim$[\bq]$ rows, and dim$[\bu]$ columns. Therefore, as in the example, one of the most common cases for underactuation, which trivially implies that ${\bf f}_2$ is not full row rank, is dim$[\bu] < $ dim$[\bq]$. This is the case when a robot has joints with no motors. But this is not the only case. The human body, for instance, has an incredible number of actuators (muscles), and in many cases has multiple muscles per joint; despite having more actuators than position variables, when I jump through the air, there is no combination of muscle inputs that can change the ballistic trajectory of my center of mass (barring aerodynamic effects). My control system is underactuated.

For completeness, let's generalize the definition of underactuation to systems beyond the second-order control affine systems.

Underactuated Control Differential Equations

An $n$th-order control differential equation control described by the equations \begin{equation} \frac{d^n{\bf q}}{dt^n} = f({\bf q}, ..., \frac{d^{n-1} {\bf q}}{dt^{n-1}}, t, {\bf u}) \end{equation} is fully actuated in state ${\bf x} = ({\bf q}, ..., \frac{d^{n-1} {\bf q}}{dt^{n-1}})$ and time $t$ if the resulting map ${\bf f}$ is surjective: for every $\frac{d^n{\bf q}}{dt^n} $ there exists a ${\bf u}$ which produces the desired response. Otherwise it is underactuated.

It is easy to see that equation \ref{eq:underactuated_def} is a sufficient condition for underactuation. This definition can also be extended to discrete-time systems and/or differential inclusions.

A quick note about notation. When describing the dynamics of rigid-body systems in this class, I will use $\bq$ for configurations (positions), $\dot{\bq}$ for velocities, and use $\bx$ for the full state ($\bx = [\bq^T,\dot{\bq}^T]^T$). There is an important limitation to this convention, described in the Appendix, but it will keep the notes cleaner. Unless otherwise noted, vectors are always treated as column vectors. Vectors and matrices are bold (scalars are not).

Feedback Linearization

Fully-actuated systems are dramatically easier to control than underactuated systems. The key observation is that, for fully-actuated systems with known dynamics (e.g., ${\bf f}_1$ and ${\bf f}_2$ are known for a second-order control-affine system), it is possible to use feedback to effectively change a nonlinear control problem into a linear control problem. The field of linear control is incredibly advanced, and there are many well-known solutions for controlling linear systems.

The trick is called feedback linearization. When ${\bf f}_2$ is full row rank, it is invertible. Consider the nonlinear feedback law: $$\bu = {\bf \pi}(\bq,\dot\bq,t) = {\bf f}_2^{-1}(\bq,\dot\bq,t) \left[ \bu' - {\bf f}_1(\bq,\dot\bq,t) \right],$$ where $\bu'$ is some additional control input. Applying this feedback controller to equation \ref{eq:f1_plus_f2} results in the linear, decoupled, second-order system: $$\ddot{\bq} = \bu'.$$ In other words, if ${\bf f}_1$ and ${\bf f}_2$ are known and ${\bf f}_2$ is invertible, then we say that the system is "feedback equivalent" to $\ddot{\bq} = \bu'$. There are a number of strong results which generalize this idea to the case where ${\bf f}_1$ and ${\bf f}_2$ are estimated, rather than known (e.g, Slotine90).

Feedback-Cancellation Double Pendulum

Let's say that we would like our simple double pendulum to act like a simple single pendulum (with damping), whose dynamics are given by: \begin{align*} \ddot \theta_1 &= -\frac{g}{l}\cos\theta_1 -b\dot\theta_1 \\ \ddot\theta_2 &= 0. \end{align*} This is easily achieved using Note that our chosen dynamics do not actually stabilize $\theta_2$ - this detail was left out for clarity, but would be necessary for any real implementation. $$\bu = {\bf B}^{-1}\left[ {\bf C}\dot{\bq} + {\bf G} + {\bf H}\begin{bmatrix} -\frac{g}{l}c_1 - b\dot{q}_1 \\ 0 \end{bmatrix} \right].$$

Since we are embedding a nonlinear dynamics (not a linear one), we refer to this as "feedback cancellation", or "dynamic inversion". This idea can, and does, make control look easy - for the special case of a fully-actuated deterministic system with known dynamics. For example, it would have been just as easy for me to invert gravity. Observe that the control derivations here would not have been any more difficult if the robot had 100 joints.

MATLAB Example

You can find scripts which implement both of these examples in :

runSimplePend;    % feedback linearization to simulate a pendulum
runSimplePendInv; % and a gravity-inverted pendulum

The underactuated systems are not feedback linearizable. Therefore, unlike fully-actuated systems, the control designer has no choice but to reason about the nonlinear dynamics of the plant in the control design. This dramatically complicates feedback controller design.

Input and State Constraints

Although the dynamic constraints due to missing actuators certainly embody the spirit of this course, many of the systems we care about could be subject to other dynamic constraints as well. For example, the actuators on our machines may only be mechanically capable of producing some limited amount of torque, or there may be a physical obstacle in the free space with which we cannot permit our robot to come into contact with.

Input and State Constraints

A dynamical system described by $\dot{\bx} = {\bf f}(\bx,\bu,t)$ may be subject to one or more constraints described by ${\bf \phi}(\bx,\bu,t)\le0$.

In practice it can be useful to separate out constraints which depend only on the input, e.g. $\phi(\bu)\le0$, such as actuator limits, as they can often be easier to handle than state constraints. An obstacle in the environment might manifest itself as one or more constraints that depend only on position, e.g. $\phi(\bq)\le0$.

By our generalized definition of underactuation, we can see that input and state constraints can also cause a system to be underactuated.

Input limits

Consider the constrained second-order linear system \[ \ddot{x} = u, \quad |u| \le 1. \] By our definition, this system is underactuated. For example, there is no $u$ which can produce the acceleration $\ddot{x} = 2$.

Input and state constraints can complicate control design in similar ways to having an insufficient number of actuators, (i.e., further limiting the set of the feasible trajectories), and often require similar tools to find a control solution.

Nonholonomic constraints

You might have heard of the term "nonholonomic system" (see e.g. Bloch03), and be thinking about how nonholonomy relates to underactuation. Briefly, a nonholonomic constraint is a constraint of the form $\phi({\bf q}, {\bf \dot{q}}, t) = 0$, which cannot be integrated into a constraint of the form $\phi({\bf q}, t) = 0$ (a holonomic constraint). A nonholonomic constraint does not restrain the possible configurations of the system, but rather the manner in which those configurations can be reached. While a holonomic constraint reduces the number of degrees of freedom of a system by one, a nonholonomic constraint does not. An automobile or traditional wheeled robot provides a canonical example:

Wheeled robot

Consider a simple model of a wheeled robot whose configuration is described by its Cartesian position $x,y$ and its orientation, $\theta$, so ${\bf q} = \begin{bmatrix} x, y, \theta \end{bmatrix}^T$. The system is subject to a differential constraint that prevents side-slip, \begin{gather*} \dot{x} = v \cos\theta \\ \dot{y} = v \sin\theta \\ v = \sqrt{\dot{x}^2 + \dot{y}^2} \end{gather*} or equivalently, \[\dot{y} \cos \theta - \dot x \sin \theta = 0.\] This constraint cannot be integrated into a constraint on configuration—the car can get to any configuration $(x,y,\theta)$, it just can't move directly sideways—so this is a nonholonomic constraint.

Contrast the wheeled robot example with a robot on train tracks. The train tracks correspond to a holonomic constraint: the track constraint can be written directly in terms of the configuration ${\bf q}$ of the system, without using the velocity ${\bf \dot{q}}$. Even though the track constraint could also be written as a differential constraint on the velocity, it would be possible to integrate this constraint to obtain a constraint on configuration. The track restrains the possible configurations of the system.

A nonholonomic constraint like the no-side-slip constraint on the wheeled vehicle certainly results in an underactuated system. The converse is not necessarily true—the double pendulum system which is missing an actuator is underactuated but would not typically be called a nonholonomic system. Note that the Lagrangian equations of motion are a constraint of the form \[{\bf \phi}({\bf q},{\bf \dot{q}},{\bf \ddot{q}},t) = 0,\] so do not qualify as a nonholonomic constraint.

Underactuated robotics

The control of underactuated systems is an open and interesting problem in controls. Although there are a number of special cases where underactuated systems have been controlled, there are relatively few general principles. Now here's the rub... most of the interesting problems in robotics are underactuated:

  • Legged robots are underactuated. Consider a legged machine with $N$ internal joints and $N$ actuators. If the robot is not bolted to the ground, then the degrees of freedom of the system include both the internal joints and the six degrees of freedom which define the position and orientation of the robot in space. Since $\bu \in \Re^N$ and $\bq \in \Re^{N+6}$, equation \ref{eq:underactuated_def} is satisfied.
  • (Most) Swimming and flying robots are underactuated. The story is the same here as for legged machines. Each control surface adds one actuator and one DOF. And this is already a simplification, as the true state of the system should really include the (infinite-dimensional) state of the flow.
  • Robot manipulation is (often) underactuated. Consider a fully-actuated robotic arm. When this arm is manipulating an object with degrees of freedom (even a brick has six), it can become underactuated. If force closure is achieved, and maintained, then we can think of the system as fully-actuated, because the degrees of freedom of the object are constrained to match the degrees of freedom of the hand. That is, of course, unless the manipulated object has extra DOFs.

Even control systems for fully-actuated and otherwise unconstrained systems can be improved using the lessons from underactuated systems, particularly if there is a need to increase the efficiency of their motions or reduce the complexity of their designs.

Goals for the course

This course is based on the observation that there are new computational tools from optimization theory, control theory, motion planning, and even machine learning which can be used to design feedback control for underactuated systems. The goal of this class is to develop these tools in order to design robots that are more dynamic and more agile than the current state-of-the-art.

The target audience for the class includes both computer science and mechanical/aero students pursuing research in robotics. Although I assume a comfort with linear algebra, ODEs, and MATLAB, the course notes aim to provide most of the material and references required for the course.

The Simple Pendulum


Our goals for this chapter are modest: we'd like to understand the dynamics of a pendulum.

Why a pendulum? In part, because the dynamics of a majority of our multi-link robotics manipulators are simply the dynamics of a large number of coupled pendula. Also, the dynamics of a single pendulum are rich enough to introduce most of the concepts from nonlinear dynamics that we will use in this text, but tractable enough for us to (mostly) understand in the next few pages.

The simple pendulum

The Lagrangian derivation of the equations of motion (as described in the appendix) of the simple pendulum yields: \begin{equation*} m l^2 \ddot\theta(t) + mgl\sin{\theta(t)} = Q. \end{equation*} We'll consider the case where the generalized force, $Q$, models a damping torque (from friction) plus a control torque input, $u(t)$: $$Q = -b\dot\theta(t) + u(t).$$

Nonlinear dynamics with a constant torque

Let us first consider the dynamics of the pendulum if it is driven in a particular simple way: a torque which does not vary with time: \begin{equation} ml^2 \ddot\theta + b\dot\theta + mgl \sin\theta = u_0. \end{equation} You can experiment with this system in using

% hit play to start the simulation, and click on the boxes labeled as
% 'slider' to control the torque and damping.
% if you get an error about not keeping up with real time, then
% double-click on the realtime block and reduce the realtime factor
% (e.g. to .5).

These are relatively simple differential equations, so if I give you $\theta(0)$ and $\dot\theta(0)$, then you should be able to integrate them to obtain $\theta(t)$... right? Although it is possible, integrating even the simplest case ($b = u = 0$) involves elliptic integrals of the first kind; there is relatively little intuition to be gained here.

This is in stark contrast to the case of linear systems, where much of our understanding comes from being able to explicitly integrate the equations. For instance, for a simple linear system we have $$\dot{q} = a q \quad \rightarrow \quad q(t) = q(0) e^{at},$$ and we can immediately understand that the long-term behavior of the system is a (stable) decaying exponential if $a<0$, an (unstable) growing exponential if $a>0$, and that the system does nothing if $a=0$. Here we are with certainly one of the simplest nonlinear systems we can imagine, and we can't even solve this system?

All is not lost. If what we care about is the long-term behavior of the system, then there are a number of techniques we can apply. In this chapter, we will start by investigating graphical solution methods. These methods are described beautifully in a book by Steve StrogatzStrogatz94.

The overdamped pendulum

Let's start by studying a special case -- intuitively when $b\dot\theta \gg ml^2\ddot\theta$ -- which via dimensional analysis (using the characteristic time constant $\sqrt{\frac{l}{g}}$ to match units) occurs when $b \sqrt\frac{g}{l} \gg ml^2$. This is the case of heavy damping, for instance if the pendulum was moving in molasses. In this case, the damping term dominates the acceleration term, and we have: $$ml^2 \ddot\theta + b\dot\theta \approx b\dot\theta = u_0 - mgl\sin\theta.$$ In other words, in the case of heavy damping, the system looks approximately first order. This is a general property of heavily damped systems, such as fluids at very low Reynolds number.

I'd like to ignore one detail for a moment: the fact that $\theta$ wraps around on itself every $2\pi$. To be clear, let's write the system without the wrap-around as: \begin{equation}b\dot{x} = u_0 - mgl\sin{x}.\label{eq:overdamped_pend_ct}\end{equation} Our goal is to understand the long-term behavior of this system: to find $x(\infty)$ given $x(0)$. Let's start by plotting $\dot{x}$ vs $x$ for the case when $u_0=0$:

The first thing to notice is that the system has a number of fixed points or steady states, which occur whenever $\dot{x} = 0$. In this simple example, the zero-crossings are $x^* = \{..., -\pi, 0, \pi, 2\pi, ...\}$. When the system is in one of these states, it will never leave that state. If the initial conditions are at a fixed point, we know that $x(\infty)$ will be at the same fixed point.

Next let's investigate the behavior of the system in the local vicinity of the fixed points. Examining the fixed point at $x^* = \pi$, if the system starts just to the right of the fixed point, then $\dot{x}$ is positive, so the system will move away from the fixed point. If it starts to the left, then $\dot{x}$ is negative, and the system will move away in the opposite direction. We'll call fixed-points which have this property unstable. If we look at the fixed point at $x^* = 0$, then the story is different: trajectories starting to the right or to the left will move back towards the fixed point. We will call this fixed point locally stable. More specifically, we'll distinguish between three types of local stability:

  • Locally stable in the sense of Lyapunov (i.s.L.). A fixed point, $x^*$ is locally stable i.s.L. if for every small $\epsilon > 0$, I can produce a $\delta > 0$ such that if $\| x(0) - x^* \| < \delta$ then $\forall t$ $\| x(t) - x^*\| < \epsilon$. In words, this means that for any ball of size $\epsilon$ around the fixed point, I can create a ball of size $\delta$ which guarantees that if the system is started inside the $\delta$ ball then it will remain inside the $\epsilon$ ball for all of time.
  • Locally asymptotically stable. A fixed point is locally asymptotically stable if $x(0) = x^* + \epsilon$ implies that $\lim_{t\rightarrow \infty} x(t) = x^*$.
  • Locally exponentially stable. A fixed point is locally exponentially stable if $x(0) = x^* + \epsilon$ implies that $\| x(t) - x^* \| < Ce^{-\alpha t}$, for some positive constants $C$ and $\alpha$.

An initial condition near a fixed point that is stable in the sense of Lyapunov may never reach the fixed point (but it won't diverge), near an asymptotically stable fixed point will reach the fixed point as $t \rightarrow \infty$, and near an exponentially stable fixed point will reach the fixed point with a bounded rate. An exponentially stable fixed point is also an asymptotically stable fixed point, but the converse is not true. Asymptotic stability and Lyapunov stability, however, are distinct notions -- it is actually possible to have a system that is asymptotically stable but not stable i.s.L.†† we can't see that in one dimension so will have to hold that example for a moment. Systems which are stable i.s.L. but not asymptotically stable are easy to construct (e.g. $\dot{x} = 0$). Interestingly, it is also possible to have nonlinear systems that converge (or diverge) in finite-time; a so-called finite-time stability; we will see examples of this later in the book, but it is a difficult topic to penetrate with graphical analysis. Rigorous nonlinear system analysis is rich with subtleties and surprises.

Our graph of $\dot{x}$ vs. $x$ can be used to convince ourselves of i.s.L. and asymptotic stability by visually inspecting $\dot{x}$ in the vicinity of a fixed point. Even exponential stability can be inferred if the function can be bounded away from the origin by a negatively-sloped line through the fixed point, since it implies that the nonlinear system will converge at least as fast as the linear system represented by the straight line. I will graphically illustrate unstable fixed points with open circles and stable fixed points (i.s.L.) with filled circles.

Next, we need to consider what happens to initial conditions which begin farther from the fixed points. If we think of the dynamics of the system as a flow on the $x$-axis, then we know that anytime $\dot{x} > 0$, the flow is moving to the right, and $\dot{x} < 0$, the flow is moving to the left. If we further annotate our graph with arrows indicating the direction of the flow, then the entire (long-term) system behavior becomes clear:

For instance, we can see that any initial condition $x(0) \in (\pi,\pi)$ will result in $\lim_{t\rightarrow \infty} x(t) = 0$. This region is called the basin of attraction of the fixed point at $x^* = 0$. Basins of attraction of two fixed points cannot overlap, and the manifold separating two basins of attraction is called the separatrix. Here the unstable fixed points, at $x^* = \{.., -\pi, \pi, 3\pi, ...\}$ form the separatrix between the basins of attraction of the stable fixed points.

As these plots demonstrate, the behavior of a first-order one dimensional system on a line is relatively constrained. The system will either monotonically approach a fixed-point or monotonically move toward $\pm \infty$. There are no other possibilities. Oscillations, for example, are impossible. Graphical analysis is a fantastic analysis tool for many first-order nonlinear systems (not just pendula); as illustrated by the following example:

Nonlinear autapse

Consider the following system: \begin{equation} \dot{x} + x = \tanh(w x), \end{equation} which is plotted below for a few values of $w$. It's convenient to note that $\tanh(z) \approx z$ for small $z$. For $w \le 1$ the system has only a single fixed point. For $w > 1$ the system has three fixed points : two stable and one unstable.
These equations are not arbitrary - they are actually a model for one of the simplest neural networks, and one of the simplest model of persistent memorySeung00. In the equation $x$ models the firing rate of a single neuron, which has a feedback connection to itself. $\tanh$ is the activation (sigmoidal) function of the neuron, and $w$ is the weight of the synaptic feedback.

One last piece of terminology. In the neuron example, and in many dynamical systems, the dynamics were parameterized; in this case by a single parameter, $w$. As we varied $w$, the fixed points of the system moved around. In fact, if we increase $w$ through $w=1$, something dramatic happens - the system goes from having one fixed point to having three fixed points. This is called a bifurcation. This particular bifurcation is called a pitchfork bifurcation. We often draw bifurcation diagrams which plot the fixed points of the system as a function of the parameters, with solid lines indicating stable fixed points and dashed lines indicating unstable fixed points, as seen in the figure:

bifurcation diagram asymptotes to $x^* = 1$
Bifurcation diagram of the nonlinear autapse.

Our pendulum equations also have a (saddle-node) bifurcation when we change the constant torque input, $u_0$. Finally, let's return to the original equations in $\theta$, instead of in $x$. Only one point to make: because of the wrap-around, this system will appear have oscillations. In fact, the graphical analysis reveals that the pendulum will turn forever whenever $|u_0| > \frac{mgl}{b}$, but now you understand that this is not an oscillation, but an instability with $\theta \rightarrow \pm \infty$.

The undamped pendulum with zero torque

Consider again the system $$ml^2 \ddot\theta = u_0 - mgl \sin\theta - b\dot\theta,$$ this time with $b = 0$. This time the system dynamics are truly second-order. We can always think of any second-order system as (coupled) first-order system with twice as many variables. Consider a general, autonomous (not dependent on time), second-order system, $$\ddot{q} = f(q,\dot q,u).$$ This system is equivalent to the two-dimensional first-order system \begin{align*} \dot x_1 =& x_2 \\ \dot x_2 =& f(x_1,x_2,u), \end{align*} where $x_1 = q$ and $x_2 = \dot q$. Therefore, the graphical depiction of this system is not a line, but a vector field where the vectors $[\dot x_1, \dot x_2]^T$ are plotted over the domain $(x_1,x_2)$. This vector field is known as the phase portrait of the system.

In this section we restrict ourselves to the simplest case when $u_0 = 0$. Let's sketch the phase portrait. First sketch along the $\theta$-axis. The $x$-component of the vector field here is zero, the $y$-component is $-mgl\sin\theta.$ As expected, we have fixed points at $\pm \pi, ...$ Now sketch the rest of the vector field. Can you tell me which fixed points are stable? Some of them are stable i.s.L., none are asymptotically stable.

Orbit calculations

You might wonder how we drew the black contour lines in the figure above. We could have obtained them by simulating the system numerically, but those lines can be easily obtained in closed-form. Directly integrating the equations of motion is difficult, but at least for the case when $u_0 = 0$, we have some additional physical insight for this problem that we can take advantage of. The kinetic energy, $T$, and potential energy, $U$, of the pendulum are given by $$T = \frac{1}{2}I\dot\theta^2, \quad U = -mgl\cos(\theta),$$ where $I=ml^2$ and the total energy is $E(\theta,\dot\theta) = T(\dot\theta)+U(\theta)$. The undamped pendulum is a conservative system: total energy is a constant over system trajectories. Using conservation of energy, we have: \begin{gather*} E(\theta(t),\dot\theta(t)) = E(\theta(0),\dot\theta(0)) = E_0 \\ \frac{1}{2} I \dot\theta^2(t) - mgl\cos(\theta(t)) = E_0 \\ \dot\theta(t) = \pm \sqrt{\frac{2}{I}\left[E_0 + mgl\cos\left(\theta(t)\right)\right]} \end{gather*} Using this, if you tell me $\theta$ I can determine one of two possible values for $\dot\theta$, and the solution has all of the richness of the black countour lines from the plot. This equation has a real solution when $\cos(\theta) > \cos(\theta_{max})$, where $$\theta_{max} = \begin{cases} \cos^{-1}\left( \frac{E}{mgl} \right), & E < mgl \\ \pi, & \text{otherwise}. \end{cases}$$ Of course this is just the intuitive notion that the pendulum will not swing above the height where the total energy equals the potential energy. As an exercise, you can verify that differentiating this equation with respect to time indeed results in the equations of motion.

The undamped pendulum with a constant torque

Now what happens if we add a constant torque? If you visualize the bifurcation diagram, you'll see that the fixed points come together, towards $q = \frac{\pi}{2}, \frac{5\pi}{2}, ...$, until they disappear. One fixed-point is unstable, and one is stable.

Before we continue, let me know give you the promised example of a system that is asymptotically stable, but not stable i.s.L.. We can accomplish this with a very pendulum-like example (written here in polar coordinates):

Asymptotically stable, but not Lyapunov stable

The system \[ \dot{r} = r(1-r) \\ \dot\theta = \sin^2 (\frac{\theta}{2}) \] is asymptotically stable to $x^* = [1,0]^T$, but is not stable i.s.L.

Take a minute to draw the vector field of this (you can draw each coordinate independently, if it helps) to make sure you understand. Note that to wrap-around rotation is convenient but not essential -- we could have written the same dynamical system in cartesian coordinates without wrapping.

The damped pendulum

Now let's add damping back. You can still add torque to move the fixed points (in the same way).

Phase diagram for the damped pendulum

With damping, the downright fixed point of the pendulum now becomes asymptotically stable (in addition to stable i.s.L.). Is it also exponentially stable? How can we tell? One technique is to linearize the system at the fixed point. A smooth, time-invariant, nonlinear system that is locally exponentially stable must have a stable linearization; we'll discuss linearization more in the next chapter.

Here's a thought exercise. If $u$ is no longer a constant, but a function $\pi(q,\dot{q})$, then how would you choose $\pi$ to stabilize the vertical position. Feedback linearization is the trivial solution, for example: $$u = \pi(q,\dot{q}) = 2 \frac{g}{l}\cos\theta.$$ But these plots we've been making tell a different story. How would you shape the natural dynamics - at each point pick a $u$ from the stack of phase plots - to stabilize the vertical fixed point with minimal torque effort? This is exactly the way that I would like you to think about control system design. And we'll give you your first solution techniques -- using dynamic programming -- in the next lecture.

The torque-limited simple pendulum

The simple pendulum is fully actuated. Given enough torque, we can produce any number of control solutions to stabilize the originally unstable fixed point at the top (such as designing a feedback law to effectively invert gravity).

The problem begins to get interesting (a.k.a. becomes underactuated) if we impose a torque-limit constraint, $|u|\le u_{max}$. Looking at the phase portraits again, you can now visualize the control problem. Via feedback, you are allowed to change the direction of the vector field at each point, but only by a fixed amount. Clearly, if the maximum torque is small (smaller than $mgl$), then there are some states which cannot be driven directly to the goal, but must pump up energy to reach the goal. Furthermore, if the torque-limit is too severe and the system has damping, then it may be impossible to swing up to the top. The existence of a solution, and number of pumps required to reach the top, is a non-trivial function of the initial conditions and the torque-limits.

Although this system is very simple, its solution requires much of the same reasoning necessary for controlling much more complex underactuated systems; this problem will be a work-horse for us as we introduce new algorithms throughout this book.

Acrobots, Cart-Poles, and Quadrotors

A great deal of work in the control of underactuated systems has been done in the context of low-dimensional model systems. These model systems capture the essence of the problem without introducing all of the complexity that is often involved in more real-world examples. In this chapter we will focus on two of the most well-known and well-studied model systems--the Acrobot and the Cart-Pole. After we have developed some tools, we will see that they can be applied directly to other model systems; we will give a number of examples using Quadrotors. All of these systems are trivially underactuated, having less actuators than degrees of freedom.

The Acrobot

The Acrobot is a planar two-link robotic arm in the vertical plane (working against gravity), with an actuator at the elbow, but no actuator at the shoulder. It was first described in detail in Murray91. The companion system, with an actuator at the shoulder but not at the elbow, is known as the PendubotSpong97. The Acrobot is so named because of its resemblance to a gymnast (or acrobat) on a parallel bar, who controls his/her motion predominantly by effort at the waist (and not effort at the wrist). The most common control task studied for the Acrobot is the swing-up task, in which the system must use the elbow (or waist) torque to move the system into a vertical configuration then balance.

The Acrobot. Click here for an animation of the swing-up task or here to see a real robot.

The Acrobot is representative of the primary challenge in underactuated robots. In order to swing up and balance the entire system, the controller must reason about and exploit the state-dependent coupling between the actuated degree of freedom and the unactuated degree of freedom. It is also an important system because, as we will see, it closely resembles one of the simplest models of a walking robot.

Equations of motion

Figure 3.1 illustrates the model parameters used in our analysis. $\theta_1$ is the shoulder joint angle, $\theta_2$ is the elbow (relative) joint angle, and we will use $\bq = [\theta_1,\theta_2]^T$, $\bx = [\bq,\dot\bq]^T$. The zero configuration is with both links pointed directly down. The moments of inertia, $I_1,I_2$ are taken about the pivots. The task is to stabilize the unstable fixed point $\bx = [\pi,0,0,0]^T$.

We will derive the equations of motion for the Acrobot using the method of Lagrange. The kinematics are given by: \begin{equation} \bx_1 = \begin{bmatrix} l_1 s_1 \\ -l_1 c_1 \end{bmatrix}, \quad \bx_2 = \bx_1 + \begin{bmatrix} l_2 s_{1+2} \\ - l_2 c_{1+2} \end{bmatrix} . \end{equation} The energy is given by: \begin{gather} T = T_1 + T_2, \quad T_1 = \frac{1}{2} I_1 \dot{q}_1^2 \\ T_2 = \frac{1}{2} ( m_2 l_1^2 + I_2 + 2 m_2 l_1 l_{c2} c_2 ) \dot{q}_1^2 + \frac{1}{2} I_2 \dot{q}_2^2 + (I_2 + m_2 l_1 l_{c2} c_2 ) \dot{q}_1 \dot{q}_2 \\ % from expanding sum of point masses U = -m_1 g l_{c1} c_1 - m_2 g (l_1 c_1 + l_{c2} c_{1+2}) \end{gather} Entering these quantities into the Lagrangian yields the equations of motion: \begin{gather} (I_1 + I_2 + m_2 l_1^2 + 2m_2 l_1 l_{c2} c_2) \ddot{q}_1 + (I_2 + m_2 l_1 l_{c2} c_2)\ddot{q}_2 - 2m_2 l_1 l_{c2} s_2 \dot{q}_1 \dot{q}_2 \\ \quad -m_2 l_1 l_{c2} s_2 \dot{q}_2^2 + m_1 g l_{c1}s_1 + m_2 g (l_1 s_1 + l_{c2} s_{1+2}) = 0 \\ (I_2 + m_2 l_1 l_{c2} c_2) \ddot{q}_1 + I_2 \ddot{q}_2 + m_2 l_1 l_{c2} s_2 \dot{q}_1^2 + m_2 g l_{c2} s_{1+2} = \tau \end{gather} In standard, manipulator equation form, we have: \begin{gather} {\bf H}(\bq) = \begin{bmatrix} I_1 + I_2 + m_2 l_1^2 + 2m_2 l_1 l_{c2} c_2 & I_2 + m_2 l_1 l_{c2} c_2 \\ I_2 + m_2 l_1 l_{c2} c_2 & I_2 \end{bmatrix},\label{eq:Hacrobot}\\{\bf C}(\bq,\dot{\bq}) = \begin{bmatrix} -2 m_2 l_1 l_{c2} s_2 \dot{q}_2 & -m_2 l_1 l_{c2} s_2 \dot{q}_2 \\ m_2 l_1 l_{c2} s_2 \dot{q}_1 & 0 \end{bmatrix}, \\ {\bf G}(\bq) = \begin{bmatrix} m_1 g l_{c1}s_1 + m_2 g (l_1 s_1 + l_{c2}s_{1+2}) \\ m_2 g l_{c2} s_{1+2} \end{bmatrix}, \quad {\bf B} = \begin{bmatrix} 0 \\ 1 \end{bmatrix}. \end{gather}

The Acrobot in MATLAB

You can experiment with the Acrobot dynamics in using, e.g.


[H,C_times_v,G,B] = plant.manipulatorEquations()

runPassive();  % simulate with torque=0 (and display the results)

The Cart-Pole system

The other model system that we will investigate here is the cart-pole system, in which the task is to balance a simple pendulum around its unstable equilibrium, using only horizontal forces on the cart. Balancing the cart-pole system is used in many introductory courses in control, including 6.003 at MIT, because it can be accomplished with simple linear control (e.g. pole placement) techniques. In this chapter we will consider the full swing-up and balance control problem, which requires a full nonlinear control treatment.

The Cart-Pole system. Click here to see a real robot.
add swing-up + balance swf

The figure shows our parameterization of the system. $x$ is the horizontal position of the cart, $\theta$ is the counter-clockwise angle of the pendulum (zero is hanging straight down). We will use $\bq = [x,\theta]^T$, and $\bx = [\bq,\dot\bq]^T$. The task is to stabilize the unstable fixed point at $\bx = [0,\pi,0,0]^T.$

Equations of motion

The kinematics of the system are given by \begin{equation}\bx_1 = \begin{bmatrix} x \\ 0 \end{bmatrix}, \quad \bx_2 = \begin{bmatrix} x + l\sin\theta \\ -l\cos\theta \end{bmatrix}. \end{equation} The energy is given by \begin{align} T=& \frac{1}{2} (m_c + m_p)\dot{x}^2 + m_p \dot{x}\dot\theta l \cos{\theta} + \frac{1}{2}m_p l^2 \dot\theta^2 \\ U =& -m_p g l \cos\theta. \end{align} The Lagrangian yields the equations of motion: \begin{gather} (m_c + m_p)\ddot{x} + m_p l \ddot\theta \cos\theta - m_p l \dot\theta^2 \sin\theta = f \\ m_p l \ddot{x} \cos\theta + m_p l^2 \ddot\theta + m_p g l \sin\theta = 0 \end{gather} In standard form, using $\bq = [x,\theta]^T$, $\bu = f$: $${\bf H}(\bq)\ddot\bq + {\bf C}(\bq,\dot\bq)\dot\bq + {\bf G}(\bq) = {\bf B}\bu,$$ where \begin{gather*} {\bf H}(\bq) = \begin{bmatrix} m_c + m_p & m_p l \cos\theta \\ m_p l \cos\theta & m_p l^2 \end{bmatrix}, \quad {\bf C}(\bq,\dot{\bq}) = \begin{bmatrix} 0 & -m_p l \dot\theta \sin\theta \\ 0 & 0 \end{bmatrix}, \\ {\bf G}(\bq) = \begin{bmatrix} 0 \\ m_p gl \sin\theta \end{bmatrix}, \quad {\bf B} = \begin{bmatrix} 1 \\ 0 \end{bmatrix} \end{gather*} In this case, it is particularly easy to solve directly for the accelerations: \begin{align} \ddot{x} =& \frac{1}{m_c + m_p \sin^2\theta}\left[ f+m_p \sin\theta (l \dot\theta^2 + g\cos\theta)\right] \label{eq:ddot_x}\\ \ddot{\theta} =& \frac{1}{l(m_c + m_p \sin^2\theta)} \left[ -f \cos\theta - m_p l \dot\theta^2 \cos\theta \sin\theta - (m_c + m_p) g \sin\theta \right] \label{eq:ddot_theta} \end{align} In some of the analysis that follows, we will study the form of the equations of motion, ignoring the details, by arbitrarily setting all constants to 1: \begin{gather} 2\ddot{x} + \ddot\theta \cos\theta - \dot\theta^2 \sin\theta = f \label{eq:simple}\\ \ddot{x}\cos\theta + \ddot\theta + \sin\theta = 0. \label{eq:simple2} \end{gather}

The Cart-Pole System in MATLAB

You can experiment with the Cart-Pole dynamics in using, e.g.


[H,C_times_v,G,B] = plant.manipulatorEquations()

runPassive();  % simulate with force=0 (and display the results)


For both the Acrobot and the Cart-Pole systems, we will begin by designing a linear controller which can balance the system when it begins in the vicinity of the unstable fixed point. To accomplish this, we will linearize the nonlinear equations about the fixed point, examine the controllability of this linear system, then using linear quadratic regulator (LQR) theory to design our feedback controller.

Linearizing the manipulator equations

Although the equations of motion of both of these model systems are relatively tractable, the forward dynamics still involve quite a few nonlinear terms that must be considered in any linearization. Let's consider the general problem of linearizing a system described by the manipulator equations.

We can perform linearization around a fixed point, $(\bx^*, \bu^*)$, using a Taylor expansion: \begin{equation} \dot\bx = {\bf f}(\bx,\bu) \approx {\bf f}(\bx^*,\bu^*) + \left[ \pd{\bf f}{\bx}\right]_{\bx=\bx^*,\bu=\bu^*} (\bx - \bx^*) + \left[ \pd{\bf f}{\bu}\right]_{\bx=\bx^*,\bu=\bu^*} (\bu - \bu^*) \end{equation} Let us consider the specific problem of linearizing the manipulator equations around a (stable or unstable) fixed point. In this case, ${\bf f}(\bx^*,\bu^*)$ is zero, and we are left with the standard linear state-space form: \begin{align} \dot\bx =& \begin{bmatrix} \dot\bq \\ {\bf H}^{-1}(\bq) \left[ {\bf B}(\bq)\bu - {\bf C}(\bq,\dot\bq)\dot\bq - {\bf G}(\bq) \right] \end{bmatrix},\\ \approx& {\bf A}_{lin} (\bx-\bx^*) + {\bf B}_{lin} (\bu - \bu^*), \end{align} where ${\bf A}_{lin}$, and ${\bf B}_{lin}$ are constant matrices. Let us define $\bar\bx = \bx - \bx^*, \bar\bu = \bu - \bu^*$, and write $$\dot{\bar\bx} = {\bf A}_{lin}\bar\bx + {\bf B}_{lin}\bar\bu.$$ Evaluation of the Taylor expansion around a fixed point yields the following, very simple equations, given in block form by: \begin{align} {\bf A}_{lin} =& \begin{bmatrix} {\bf 0} & {\bf I} \\ -{\bf H}^{-1} \pd{\bf G}{\bq} + \sum_{j} {\bf H}^{-1}\pd{{\bf B}_j}{\bq} u_j & -{\bf H}^{-1} {\bf C} \end{bmatrix}_{\bx=\bx^*,\bu=\bu^*} \\ {\bf B}_{lin} =& \begin{bmatrix} {\bf 0} \\ {\bf H}^{-1} {\bf B} \end{bmatrix}_{\bx=\bx^*, \bu=\bu^*} \end{align} where ${\bf B}_j$ is the $j$th column of ${\bf B}$. Note that the term involving $\pd{{\bf H}^{-1}}{q_i}$ disappears because ${\bf B}\bu - {\bf C}\dot{\bq} - {\bf G}$ must be zero at the fixed point. Many of the ${\bf C}\dot\bq$ derivatives drop out, too, because $\dot{\bq}^* = 0$. In many cases, including both the Acrobot and Cart-Pole systems (but not the Quadrotors), the matrix ${\bf B}(\bq)$ is a constant, so the $\pd{\bf B}{\bq}$ terms also drop out.

Linearization of the Acrobot

Linearizing around the (unstable) upright point, we have: \begin{gather} {\bf C}(\bq,\dot\bq)_{\bx=\bx^*} = {\bf 0},\\ \left[\pd{\bf G}{\bq}\right]_{\bx=\bx^*} = \begin{bmatrix} -g (m_1 l_{c1} + m_2 l_1 + m_2 l_{c2}) & -m_2 g l_{c2} \\ -m_2 g l_{c2} & -m_2 g l_{c2} \end{bmatrix} \end{gather} The linear dynamics follow directly from these equations and the manipulator form of the Acrobot equations.

Linearization of the Cart-Pole System

Linearizing around the unstable fixed point in this system, we have: \begin{gather} {\bf C}(\bq,\dot\bq)_{\bx=\bx^*} = {\bf 0},\quad \left[\pd{\bf G}{\bq}\right]_{\bx=\bx^*} = \begin{bmatrix} 0 & 0 \\ 0 & -m_p g l \end{bmatrix} \end{gather} Again, the linear dynamics follow simply.

Controllability of linear systems


A control system of the form $$\dot{\bx} = {\bf f}(\bx,\bu)$$ is called controllable if it is possible to construct an unconstrained input signal, $\bu(t)$, $t \in [0,t_f],$ which will move the system from any initial state to any final state in a finite interval of time, $0 < t < t_f$ Ogata96.

For the linear system $$\dot{\bx} = {\bf A}\bx + {\bf B}u,$$ we can integrate this linear system in closed form, so it is possible to derive the exact conditions of controllability. In particular, for linear systems it is sufficient to demonstrate that there exists a control input which drives any initial condition to the origin.

The special case of non-repeated eigenvalues

Let us first examine a special case, which falls short as a general tool but may be more useful for understanding the intuition of controllability. Let's perform an eigenvalue analysis of the system matrix ${\bf A}$, so that: $${\bf A}{\bf v}_i = \lambda_i {\bf v}_i,$$ where $\lambda_i$ is the $i$th eigenvalue, and ${\bf v}_i$ is the corresponding (right) eigenvector. There will be $n$ eigenvalues for the $n \times n$ matrix ${\bf A}$. Collecting the (column) eigenvectors into the matrix ${\bf V}$ and the eigenvalues into a diagonal matrix ${\bf \Lambda}$, we have $${\bf A}{\bf V} = {\bf V}{\bf \Lambda}.$$ Here comes our primary assumption: let us assume that each of these $n$ eigenvalues takes on a distinct value (no repeats). With this assumption, it can be shown that the eigenvectors ${\bf v}_i$ form a linearly independent basis set, and therefore ${\bf V}^{-1}$ is well-defined.

We can continue our eigenmodal analysis of the linear system by defining the modal coordinates, ${\bf r}$, with: $$\bx = {\bf V}{\bf r},\quad \text{or}\quad {\bf r} = {\bf V}^{-1}\bx.$$ In modal coordinates, the dynamics of the linear system are given by $$\dot{\bf r} = {\bf V}^{-1} {\bf A} {\bf V} {\bf r} + {\bf V}^{-1} {\bf B} \bu = {\bf \Lambda} {\bf r} + {\bf V}^{-1}{\bf B} \bu.$$ This illustrates the power of modal analysis; in modal coordinates, the dynamics diagonalize yielding independent linear equations: $$\dot{r}_i = \lambda_i r_i + \sum_j \beta_{ij} u_j,\quad {\bf \beta} = {\bf V}^{-1} {\bf B}.$$

Now the concept of controllability becomes clear. Input $j$ can influence the dynamics in modal coordinate $i$ if and only if ${\bf \beta}_{ij} \neq 0$. In the special case of non-repeated eigenvalues, having control over each individual eigenmode is sufficient to (in finite time) regulate all of the eigenmodesOgata96. Therefore, we say that the system is controllable if and only if $$\forall i, \exists j \text{ such that }\beta_{ij} \neq 0.$$

A general solution

A more general solution to the controllability issue, which removes our assumption about the eigenvalues, can be obtained by examining the time-domain solution of the linear equations. The solution of this system is $$\bx(t) = e^{{\bf A}t} \bx(0) + \int_0^{t} e^{{\bf A}(t - \tau)} {\bf B} u(\tau) d\tau.$$ Without loss of generality, lets consider the that the final state of the system is zero. Then we have: $$\bx(0) = - \int_0^{t_f} e^{-{\bf A}\tau}{\bf B}u(\tau) d\tau.$$ You might be wondering what we mean by $e^{{\bf A}t}$; a scalar raised to the power of a matrix..? Recall that $e^{z}$ is actually defined by a convergent infinite sum: $$e^{z} =1 + z + \frac{1}{2} z^2 + \frac{1}{6} z^3 + ... .$$ The notation $e^{{\bf A}t}$ uses the same definition: $$e^{{\bf A}t} = {\bf I} + {\bf A}t + \frac{1}{2}({\bf A}t)^2 + \frac{1}{6}({\bf A}t)^3 + ... .$$ Not surprisingly, this has many special forms. For instance, if ${\bf A}$ is diagonalizable, $e^{{\bf A}t} = {\bf V}e^{{\bf\Lambda}t}{\bf V}^{-1},$ where ${\bf A} = {\bf V \Lambda V}^{-1}$ is the eigenvalue decomposition of ${\bf A}$ Strang98. The particular form we will use here is $$e^{-{\bf A}\tau} = \sum_{k=0}^{n-1} \alpha_k(\tau) {\bf A}^k.$$ This is a particularly surprising form, because the infinite sum above is represented by this finite sum; the derivation uses Sylvester's Theorem Ogata96, Chen98a. Then we have, \begin{align*} \bx(0) =& - \sum_{k=0}^{n-1} {\bf A}^k {\bf B} \int_0^{t_f} \alpha_k(\tau) u(\tau) d\tau \\ =& -\sum_{k=0}^{n-1} {\bf A}^k {\bf B} w_k \text{, where } w_k = \int_0^{t_f} \alpha_k(\tau) u(\tau) d\tau \\ =& - \begin{bmatrix} {\bf B} & {\bf AB} & {\bf A}^2{\bf B} & \cdots & {\bf A}^{n-1}{\bf B} \end{bmatrix}_{n \times (nm)} \begin{bmatrix} w_0 \\ w_1 \\ w_2 \\ \vdots \\ w_{n-1} \end{bmatrix} \end{align*} The matrix containing the vectors ${\bf B}$, ${\bf AB}$, ... ${\bf A}^{n-1}{\bf B}$ is called the controllability matrix. In order for the system to be controllable, for every initial condition $\bx(0)$ we must be able to find the corresponding vector ${\bf w}$. This is only possible when the rows of the controllability matrix are linearly independent. Therefore, the condition of controllability is that this controllability matrix is full rank.

Although we only treated the case of a scalar $u$, it is possible to extend the analysis to a vector $\bu$ of size $m$, yielding the condition $$\text{rank} \begin{bmatrix} {\bf B} & {\bf AB} & {\bf A}^2{\bf B} & \cdots & {\bf A}^{n-1}{\bf B} \end{bmatrix}_{n \times (nm)} = n.$$ In Matlab (using the Control Systems Toolbox), you can obtain the controllability matrix using Cm = ctrb(A,B), and evaluate its rank with rank(Cm).

Note that a linear feedback to change the eigenvalues of the eigenmodes is not sufficient to accomplish our goal of getting to the goal in finite time. In fact, the open-loop control to reach the goal is easily obtained with a final-value LQR problem, and (for ${\bf R}={\bf I}$) is actually a simple function of the controllability GrammianChen98a.

Controllability vs. underactuated

Analysis of the controllability of both the Acrobot and Cart-Pole systems reveals that the linearized dynamics about the upright are, in fact, controllable. This implies that the linearized system, if started away from the zero state, can be returned to the zero state in finite time. This is potentially surprising - after all the systems are underactuated. For example, it is interesting and surprising that the Acrobot can balance itself in the upright position without having a shoulder motor.

The controllability of these model systems demonstrates an extremely important, point: An underactuated system is not necessarily an uncontrollable system. Underactuated systems cannot follow arbitrary trajectories, but that does not imply that they cannot arrive at arbitrary points in state space. However, the trajectory required to place the system into a particular state may be arbitrarily complex.

The controllability analysis presented here is for linear time-invariant (LTI) systems. A comparable analysis exists for linear time-varying (LTV) systems. We will even see extensions to nonlinear systems; although it will often be referred to by the synonym of "reachability" analysis.

LQR feedback

Controllability tells us that a trajectory to the fixed point exists, but does not tell us which one we should take or what control inputs cause it to occur. Why not? There are potentially infinitely many solutions. We have to pick one.

The tools for controller design in linear systems are very advanced. In particular, as we describe in the linear optimal control chapter, one can easily design an optimal feedback controller for a regulation task like balancing, so long as we are willing to linearize the system around the operating point and define optimality in terms of a quadratic cost function: $$J(\bx_0) = \int_0^\infty \left[ \bx^T(t) {\bf Q} \bx(t) + \bu^T(t) {\bf R} \bu(t) \right]dt, \quad \bx(0)=\bx_0, {\bf Q}={\bf Q}^T>0, {\bf R}={\bf R}^T>0.$$ The linear feedback matrix ${\bf K}$ used as $$\bu(t) = - {\bf K}\bx(t),$$ is the so-called optimal linear quadratic regulator (LQR). Even without understanding the detailed derivation, we can quickly become practitioners of LQR. Conveniently, Matlab has a function, K = lqr(A,B,Q,R). Therefore, to use LQR, one simply needs to obtain the linearized system dynamics and to define the symmetric positive-definite cost matrices, ${\bf Q}$ and ${\bf R}$. In their most common form, ${\bf Q}$ and ${\bf R}$ are positive diagonal matrices, where the entries $Q_{ii}$ penalize the relative errors in state variable $x_i$ compared to the other state variables, and the entries $R_{ii}$ penalize actions in $u_i$.

LQR for the Acrobot and Cart-Pole

Take a minute to play around with the LQR controller for the Acrobot

and Cart-Pole

Make sure that you take a minute to look at the code which runs during these examples. Can you set the ${\bf Q}$ and ${\bf R}$ matrices differently, to improve the performance?

Simulation of the closed-loop response with LQR feedback shows that the task is indeed completed - and in an impressive manner. Oftentimes the state of the system has to move violently away from the origin in order to ultimately reach the origin. Further inspection reveals the (linearized) closed-loop dynamics are in fact non-minimum phase (acrobot has 3 right-half zeros, cart-pole has 1).

LQR for a Quadrotors

LQR works essentially out of the box for Quadrotors, if linearized around a nominal fixed point (where the non-zero thrust from the propellers is balancing gravity). We have a few Quadrotor models in . You should start with the Planar Quadrotor (technically a bi-rotor):

but you can also do this with a full 3D quadrotor model:

Try to change these ${\bf Q}$ and ${\bf R}$ costs here, too.

Note that LQR, although it is optimal for the linearized system, is not necessarily the best linear control solution for maximizing basin of attraction of the fixed-point. The theory of robust control(e.g., Zhou97), which explicitly takes into account the differences between the linearized model and the nonlinear model, will produce controllers which outperform our LQR solution in this regard.

Partial feedback linearization

In the introductory chapters, we made the point that the underactuated systems are not feedback linearizable. At least not completely. Although we cannot linearize the full dynamics of the system, it is still possible to linearize a portion of the system dynamics. The technique is called partial feedback linearization.

Consider the cart-pole example. The dynamics of the cart are affected by the motions of the pendulum. If we know the model, then it seems quite reasonable to think that we could create a feedback controller which would push the cart in exactly the way necessary to counter-act the dynamic contributions from the pendulum - thereby linearizing the cart dynamics. What we will see, which is potentially more surprising, is that we can also use a feedback law for the cart to feedback linearize the dynamics of the passive pendulum joint.

We'll use the term collocated partial feedback linearization to describe a controller which linearizes the dynamics of the actuated joints. What's more surprising is that it is often possible to achieve non-collocated partial feedback linearization - a controller which linearizes the dynamics of the unactuated joints. The treatment presented here follows from Spong94a.

PFL for the Cart-Pole System


Starting from the equations \ref{eq:simple} and \ref{eq:simple2}, we have \begin{gather*} \ddot\theta = -\ddot{x}c - s \\ % \ddot\theta = -\frac{1}{l} (\ddot{x} c + g s) \ddot{x}(2-c^2) - sc - \dot\theta^2 s = f \end{gather*} Therefore, applying the feedback control law \begin{equation} f = (2 - c^2) \ddot{x}^d - sc - \dot\theta^2 s % f = (m_c + m_p) u + m_p (u c + g s) c - m_p l \dot\theta^2 s \end{equation} results in \begin{align*} \ddot{x} =& \ddot{x}^d \\ \ddot{\theta} =& -\ddot{x}^dc - s, \end{align*} which are valid globally.


Starting again from equations \ref{eq:simple} and \ref{eq:simple2}, we have \begin{gather*} \ddot{x} = -\frac{\ddot\theta + s}{c} \\ \ddot\theta(c - \frac{2}{c}) - 2 \tan\theta - \dot\theta^2 s = f \end{gather*} Applying the feedback control law \begin{equation} f = (c - \frac{2}{c}) \ddot\theta^d - 2 \tan\theta - \dot\theta^2 s \end{equation} results in \begin{align*} \ddot\theta =& \ddot\theta^d \\ \ddot{x} =& -\frac{1}{c} \ddot\theta^d - \tan\theta. \end{align*} Note that this expression is only valid when $\cos\theta \neq 0$. This is not surprising, as we know that the force cannot create a torque when the beam is perfectly horizontal.

General form

For systems that are trivially underactuated (torques on some joints, no torques on other joints), we can, without loss of generality, reorganize the joint coordinates in any underactuated system described by the manipulator equations into the form: \begin{align} {\bf H}_{11} \ddot{\bq}_1 + {\bf H}_{12} \ddot{\bq}_2 + {\bf \phi}_1 = 0, \label{eq:passive_dyn}\\ {\bf H}_{21} \ddot{\bq}_1 + {\bf H}_{22} \ddot{\bq}_2 + {\bf \phi}_2 = {\bf \tau}, \label{eq:active_dyn} \end{align} with $\bq \in \Re^n$, $\bq_1 \in \Re^m$, $\bq_2 \in \Re^l$, $l=n-m$. $\bq_1$ represents all of the passive joints, and $\bq_2$ represents all of the actuated joints, and the $\phi$ terms capture all of the Coriolis and gravitational terms, and $${\bf H}(\bq) = \begin{bmatrix} {\bf H}_{11} & {\bf H}_{12} \\ {\bf H}_{21} & {\bf H}_{22} \end{bmatrix}.$$ Fortunately, because ${\bf H}$ is uniformly (e.g. for all $\bq$) positive definite, ${\bf H}_{11}$ and ${\bf H}_{22}$ are also positive definite.

Collocated linearization

Performing the same substitutions into the full manipulator equations, we get: \begin{gather} \ddot\bq_1 = -{\bf H}_{11}^{-1} \left[ {\bf H}_{12} \ddot\bq_2 + {\bf \phi}_1 \right] \\ ({\bf H}_{22} - {\bf H}_{21} {\bf H}_{11}^{-1} {\bf H}_{12}) \ddot\bq_2 + {\bf \phi_2} - {\bf H}_{21} {\bf H}_{11}^{-1} {\bf \phi}_1 = {\bf \tau} \end{gather} It can be easily shown that the matrix $({\bf H}_{22} - {\bf H}_{21} {\bf H}_{11}^{-1} {\bf H}_{12})$ is invertibleSpong94a; we can see from inspection that it is symmetric. PFL follows naturally, and is valid globally.

Non-collocated linearization

\begin{gather} \ddot\bq_2 = -{\bf H}_{12}^+ \left[ {\bf H}_{11} \ddot\bq_1 + {\bf \phi}_1 \right] \\ ({\bf H}_{21} - {\bf H}_{22} {\bf H}_{12}^+ {\bf H}_{11}) \ddot\bq_1 + {\bf \phi_2} - {\bf H}_{22} {\bf H}_{12}^+ {\bf \phi}_1 = {\bf \tau} \end{gather} where ${\bf H}_{12}^+$ is a Moore-Penrose pseudo-inverse. This inverse provides a unique solution when the rank of ${\bf H}_{12}$ equals $l$, the number of passive degrees of freedom in the system (it cannot be more, since the matrix only has $l$ rows). This rank condition is sometimes called the property of "Strong Inertial Coupling". It is state dependent. A system has Global Strong Inertial Coupling if it exhibits Strong Inertial Coupling in every state.

Task-space partial feedback linearization

In general, we can define some combination of active and passive joints that we would like to control. This combination is sometimes called a "task space". cite some task space refs here? Consider an output function of the form, $${\bf y} = {\bf f}(\bq),$$ with ${\bf y} \in \Re^p$, which defines the task space. Define ${\bf J}_1 = \frac{\partial {\bf f}}{\partial \bq_1}$, ${\bf J}_2 = \frac{\partial {\bf f}}{\partial \bq_2}$, ${\bf J} = [{\bf J}_1,{\bf J}_2]$.

Task Space PFL

If the actuated joints are commanded so that \begin{equation} \ddot\bq_2 = \bar{\bf J}^+ \left [{\bf v} - \dot{\bf J}\dot\bq + {\bf J}_1 {\bf H}_{11}^{-1}{\bf \phi}_1 \right], \label{eq:q2cmd} \end{equation} where $\bar{{\bf J}} = {\bf J}_2 - {\bf J}_1 {\bf H}_{11}^{-1} {\bf H}_{12}.$ and $\bar{\bf J}^+$ is the right Moore-Penrose pseudo-inverse, $$\bar{\bf J}^+ = \bar{\bf J}^T (\bar{\bf J} \bar{\bf J}^T)^{-1},$$ then we have \begin{equation} \ddot{\bf y} = {\bf v}.\end{equation} subject to \begin{equation}\text{rank}\left(\bar{{\bf J}} \right) = p, \label{eq:rank_condition}\end{equation}

Proof Sketch. Differentiating the output function we have \begin{gather*} \dot{\bf y} = {\bf J} \dot\bq \\ \ddot{\bf y} = \dot{\bf J} \dot\bq + {\bf J}_1 \ddot\bq_1 + {\bf J}_2 \ddot\bq_2. \end{gather*} Solving \ref{eq:passive_dyn} for the dynamics of the unactuated joints we have: \begin{equation} \ddot\bq_1 = - {\bf H}_{11}^{-1} ({\bf H}_{12} \ddot\bq_2 + {\bf \phi}_1) \label{eq:q1cmd} \end{equation} Substituting, we have \begin{align} \ddot{\bf y} =& \dot{\bf J} \dot\bq - {\bf J}_1 {\bf H}_{11}^{-1}({\bf H}_{12}\ddot\bq_2 + {\bf \phi}_1) + {\bf J}_2 \ddot\bq_2 \\ =& \dot{\bf J} \dot\bq + \bar{{\bf J}} \ddot\bq_2 - {\bf J}_1 {\bf H}_{11}^{-1}{\bf \phi}_1 \\ =& {\bf v} \end{align} Note that the last line required the rank condition ($\ref{eq:rank_condition}$) on $\bar{\bf J}$ to ensure that the rows of $\bar{{\bf J}}$ are linearly independent, allowing $\bar{\bf J} \bar{\bf J}^+ = {\bf I}$.

In order to execute a task space trajectory one could command $${\bf v} = \ddot{\bf y}^d + {\bf K}_d (\dot{\bf y}^d - \dot{\bf y}) + {\bf K}_p ({\bf y}^d -{\bf y}).$$ Assuming the internal dynamics are stable, this yields converging error dynamics, $({\bf y}_d - {\bf y})$, when ${\bf K}_p,{\bf K}_d > 0$Slotine90. For a position control robot, the acceleration command of ($\ref{eq:q2cmd}$) suffices. Alternatively, a torque command follows by substituting ($\ref{eq:q2cmd}$) and ($\ref{eq:q1cmd}$) into ($\ref{eq:active_dyn}$).

End-point trajectory following with the Cart-Pole system

Consider the task of trying to track a desired kinematic trajectory with the endpoint of pendulum in the cart-pole system. With one actuator and kinematic constraints, we might be hard-pressed to track a trajectory in both the horizontal and vertical coordinates. But we can at least try to track a trajectory in the vertical position of the end-effector.

Using the task-space PFL derivation, we have: \begin{gather*} y = f(\bq) = -l \cos\theta \\ \dot{y} = l \dot\theta \sin\theta \end{gather*} If we define a desired trajectory: $$y^d(t) = \frac{l}{2} + \frac{l}{4} \sin(t),$$ then the task-space controller is easily implemented using the derivation above.

add results here

The task space derivation above provides a convenient generalization of the partial feedback linearization (PFL) Spong94a, which encompasses both the collocated and non-collocated results. If we choose ${\bf y} = \bq_2$ (collocated), then we have $${\bf J}_1 = 0, {\bf J}_2 = {\bf I}, \dot{\bf J} = 0, \bar{\bf J} = {\bf I}, \bar{\bf J}^+ = {\bf I}.$$ From this, the command in ($\ref{eq:q2cmd}$) reduces to $\ddot\bq_2 = v$. The torque command is then $$ \tau = - {\bf H}_{21} {\bf H}_{11}^{-1} ({\bf H}_{12} v + {\bf \phi}_1) + {\bf H}_{22} v + {\bf \phi}_2,$$ and the rank condition ($\ref{eq:rank_condition}$) is always met.

If we choose ${\bf y} = \bq_1$ (non-collocated), we have $${\bf J}_1 = {\bf I}, {\bf J}_2 = 0, \dot{\bf J} = 0, \bar{{\bf J}}=-{\bf H}_{11}^{-1}{\bf H}_{12}.$$ The rank condition ($\ref{eq:rank_condition}$) requires that $\text{rank}({\bf H}_{12}) = l$, in which case we can write $\bar{{\bf J}}^+=-{\bf H}_{12}^+{\bf H}_{11}$, reducing the rank condition to precisely the "Strong Inertial Coupling" condition described in Spong94a. Now the command in ($\ref{eq:q2cmd}$) reduces to \begin{equation} \ddot\bq_2 = -{\bf H}_{12}^+ \left [ {\bf H}_{11}{\bf v} + {\bf \phi}_1 \right] \label{eq:q2ddnonco} \end{equation} The torque command is found by substituting $\ddot\bq_1 = v$ and ($\ref{eq:q2ddnonco}$) into ($\ref{eq:active_dyn}$), yielding the same results as in Spong94a.

Swing-up control

Energy shaping

Recall the phase portraits that we used to understand the dynamics of the undamped, unactuated, simple pendulum ($u=b=0$). The orbits of this phase plot were defined by contours of constant energy. One very special orbit, known as a homoclinic orbit, is the orbit which passes through the unstable fixed point. In fact, visual inspection will reveal that any state that lies on this homoclinic orbit must pass into the unstable fixed point. Therefore, if we seek to design a nonlinear feedback control policy which drives the simple pendulum from any initial condition to the unstable fixed point, a very reasonable strategy would be to use actuation to regulate the energy of the pendulum to place it on this homoclinic orbit, then allow the system dynamics to carry us to the unstable fixed point.

This idea turns out to be a bit more general than just for the simple pendulum. As we will see, we can use similar concepts of `energy shaping' to produce swing-up controllers for the acrobot and cart-pole systems. It's important to note that it only takes one actuator to change the total energy of a system.

Although a large variety of swing-up controllers have been proposed for these model systems (c.f. Fantoni02,Araki05,Xin04,Spong94,Mahindrakar05,Berkemeier99,Murray91,Lai06), the energy shaping controllers tend to be the most natural to derive and perhaps the most well-known.

Simple pendulum

Recall the equations of motion for the undamped simple pendulum were given by $$ml^2 \ddot\theta + mgl\sin\theta = u.$$ The total energy of the simple pendulum is given by $$E = \frac{1}{2} m l^2 \dot\theta^2 - mgl\cos\theta.$$ To understand how to control the energy, observe that \begin{align*} \dot{E} =& ml^2\dot\theta \ddot\theta + \dot\theta mgl\sin\theta \\ =& \dot\theta \left[ u - mgl\sin\theta \right] + \dot\theta mgl\sin\theta \\ =& u\dot\theta. \end{align*} In words, adding energy to the system is simple - simply apply torque in the same direction as $\dot\theta$. To remove energy, simply apply torque in the opposite direction (e.g., damping).

To drive the system to the homoclinic orbit, we must regulate the energy of the system to a particular desired energy, $$E^d = mgl.$$ If we define $\tilde{E} = E - E^d$, then we have $$\dot{\tilde{E}} = \dot{E} = u\dot\theta.$$ If we apply a feedback controller of the form $$u = -k \dot\theta \tilde{E},\quad k>0,$$ then the resulting error dynamics are $$\dot{\tilde{E}} = - k \dot\theta^2 \tilde{E}.$$ These error dynamics imply an exponential convergence: $$\tilde{E} \rightarrow 0,$$ except for states where $\dot\theta=0$. The essential property is that when $E > E^d$, we should remove energy from the system (damping) and when $E < E^d$, we should add energy (negative damping). Even if the control actions are bounded, the convergence is easily preserved.

This is a nonlinear controller that will push all system trajectories to the unstable equilibrium. But does it make the unstable equilibrium locally stable? No. Small perturbations may cause the system to drive all of the way around the circle in order to once again return to the unstable equilibrium. For this reason, once trajectories come into the vicinity of our swing-up controller, we prefer to switch to our LQR balancing controller to performance to complete the task.


Having thought about the swing-up problem for the simple pendulum, let's try to apply the same ideas to the cart-pole system. The basic idea, from Chung95, is to use collocated PFL to simplify the dynamics, use energy shaping to regulate the pendulum to its homoclinic orbit, then to add a few terms to make sure that the cart stays near the origin. The collocated PFL (when all parameters are set to 1) left us with: \begin{gather} \ddot{x} = u \\ \ddot\theta = - u c - s %\ddot\theta = -\frac{1}{l}( u c + g s ) \end{gather} The energy of the pendulum (a unit mass, unit length, simple pendulum in unit gravity) is given by: $$E(\bx) = \frac{1}{2} \dot\theta^2 - \cos\theta.$$ The desired energy, equivalent to the energy at the desired fixed-point, is $$E^d = 1.$$ Again defining $\tilde{E}(\bx) = E(\bx) - E^d$, we now observe that \begin{align*} \dot{\tilde{E}}(\bx) =& \dot{E}(\bx) = \dot\theta \ddot\theta + \dot\theta s \\ % \dot\tilde{E} = ml^2 \dot\theta \ddot\theta + mgl s =& \dot\theta [ -uc - s] + \dot\theta s \\ % - ml \dot\theta [ u c + g s ] + mgl s =& - u\dot\theta \cos\theta. % - ml u \dot\theta c \end{align*} Therefore, if we design a controller of the form $$u = k \dot\theta\cos\theta \tilde{E},\quad k>0$$ the result is $$\dot{\tilde{E}} = - k \dot\theta^2 \cos^2\theta \tilde{E}.$$ This guarantees that $| \tilde{E} |$ is non-increasing, but isn't quite enough to guarantee that it will go to zero. For example, if $\theta = \dot\theta = 0$, the system will never move. However, if we have that $$\int_0^t \dot\theta^2(t') \cos^2 \theta(t') dt' \rightarrow \infty,\quad \text{as } t \rightarrow \infty,$$ then we have $\tilde{E}(t) \rightarrow 0$. This condition is satisfied for all but the trivial constant trajectories at fixed points.

Now we return to the full system dynamics (which includes the cart). Chung95 and Spong96 use the simple pendulum energy controller with an addition PD controller designed to regulate the cart: $$\ddot{x}^d = k_E \dot\theta \cos\theta \tilde{E} - k_p x - k_d \dot{x}.$$ Chung95 provides a proof of convergence for this controller with some nominal parameters.

Cart-Pole Swingup: Example phase plot of the pendulum subsystem using energy shaping control. The controller drives the system to the homoclinic orbit, then switches to an LQR balancing controller near the top.


Swing-up control for the acrobot can be accomplished in much the same way. Spong94 - pump energy. Clean and simple. No proof. Slightly modified version (uses arctan instead of sat) in Spong95. Clearest presentation in Spong96.

Use collocated PFL. ($\ddot{q}_2 = \ddot{q}_2^d$). $$E(\bx) = \frac{1}{2}\dot\bq^T{\bf H}\dot{\bq} + U(\bx).$$ $$ E_d = U(\bx^*).$$ $$\bar{u} = \dot{q}_1 \tilde{E}.$$ $$\ddot{q}_2^d = - k_1 q_2 - k_2 \dot{q}_2 + k_3 \bar{u},$$

Extra PD terms prevent proof of asymptotic convergence to homoclinic orbit. Proof of another energy-based controller in Xin04.


The energy shaping controller for swing-up presented here is a pretty faithful representative from the field of nonlinear underactuated control. Typically these control derivations require some clever tricks for simplifying or canceling out terms in the nonlinear equations, then some clever Lyapunov function to prove stability. In these cases, PFL was used to simplify the equations, and therefore the controller design.

These controllers are important, representative, and relevant. But clever tricks with nonlinear equations seem to be fundamentally limited. Most of the rest of the material presented in this book will emphasize more general computational approaches to formulating and solving these and other control problems.

Other model systems

The acrobot and cart-pole systems are just two of the model systems used heavily in underactuated control research. Other examples include:

  • Pendubot
  • Inertia wheel pendulum
  • Furuta pendulum (horizontal rotation and vertical pendulum)
  • Hovercraft

Walking and Running

Practical legged locomotion is one of the fundamental unsolved problems in robotics. Many challenges are in mechanical design - a walking robot must carry all of its actuators and power, making it difficult to carry ideal force/torque - controlled actuators. But many of the unsolved problems are because walking robots are underactuated control systems.

In this chapter we'll introduce some of the simple models of walking and robots, the control problems that result, and a very brief summary of some of the control solutions described in the literature. Compared to the robots that we have studied so far, our investigations of legged locomotion will require additional tools for thinking about limit cycle dynamics and dealing with impacts.

Why is walking so hard?

State of the art

The Ballistic Walker

One of the earliest models of walking was proposed by McMahonMcMahon80, who argued that humans use a mostly ballistic (passive) gait. COM trajectory looks like a pendulum (roughly walking by vaulting). EMG activity in stance legs is high, but EMG in swing leg is very low, except for very beginning and end of swing. Proposed a three-link "ballistic walker" model, which models a single swing phase (but not transitions to the next swing nor stability). Interestingly, in recent years the field has developed a considerably deeper appreciation for the role of compliance during walking; simple walking-by-vaulting models are starting to fall out of favor.

McGeerMcGeer90 followed up with a series of walking machines, called "passive dynamic walkers". The walking machine by Collins and RuinaCollins01 is the most impressive passive walker to date.

The Rimless Wheel

The rimless wheel. The orientation of the stance leg, $\theta$, is measured clockwise from the vertical axis.

The most elementary model of passive dynamic walking, first used in the context of walking by McGeer90, is the rimless wheel. This simplified system has rigid legs and only a point mass at the hip as illustrated in the figure above. To further simplify the analysis, we make the following modeling assumptions:

  • Collisions with ground are inelastic and impulsive (only angular momentum is conserved around the point of collision).
  • The stance foot acts as a pin joint and does not slip.
  • The transfer of support at the time of contact is instantaneous (no double support phase).
  • $0 \le \gamma < \frac{\pi}{2}$, $0 < \alpha < \frac{\pi}{2}$, $l > 0$.

Note that the coordinate system used here is slightly different than for the simple pendulum ($\theta=0$ is at the top, and the sign of $\theta$ has changed).

update the coordinates (here and in drake)

The most comprehensive analysis of the rimless wheel was done by Coleman98a.

Stance Dynamics

The dynamics of the system when one leg is on the ground are given by $$\ddot\theta = \frac{g}{l}\sin(\theta).$$ If we assume that the system is started in a configuration directly after a transfer of support ($\theta(0^+) = \gamma-\alpha$), then forward walking occurs when the system has an initial velocity, $\dot\theta(0^+) > \omega_1$, where $$\omega_1 = \sqrt{ 2 \frac{g}{l} \left[ 1 - \cos \left (\gamma-\alpha \right) \right]}.$$ $\omega_1$ is the threshold at which the system has enough kinetic energy to vault the mass over the stance leg and take a step. This threshold is zero for $\gamma = \alpha$ and does not exist for $\gamma > \alpha$. The next foot touches down when $\theta(t) = \gamma+\alpha$, at which point the conversion of potential energy into kinetic energy yields the velocity $$\dot\theta(t^-) = \sqrt{ \dot\theta^2(0^+) + 4\frac{g}{l} \sin\alpha \sin\gamma }.$$ $t^-$ denotes the time immediately before the collision.

Foot Collision

Angular momentum is conserved around the point of impact

The angular momentum around the point of collision at time $t$ just before the next foot collides with the ground is $$L(t^-) = -ml^2\dot\theta(t^-) \cos(2\alpha).$$ The angular momentum at the same point immediately after the collision is $$L(t^+) = -ml^2\dot\theta(t^+).$$ Assuming angular momentum is conserved, this collision causes an instantaneous loss of velocity: $$\dot\theta(t^+) = \dot\theta(t^-) \cos(2\alpha).$$

Forward simulation

Given the stance dynamics, the collision detection logic ($\theta = \gamma \pm \alpha$), and the collision update, we can numerically simulate this simple model. Doing so reveals something remarkable... the wheel starts rolling, but then one of two things happens: it either runs out of energy and stands still, or it quickly falls into a stable periodic solution. Convergence to this stable periodic solution appears to happen from a huge range of initial conditions. Try it yourself.

Numerical simulation of the rimless wheel

Optionally you can specify the initial conditions. Here are a few good values to try. (You can either run the code yourself or click on the links below to see the animation).

One of the fantastic things about the rimless wheel is that the dynamics are exactly the undamped simple pendulum that we've thought so much about. So you will recognize the phase portraits of the system below - they are centered around the unstable fixed point of the simple pendulum. Phase plots of the trajectories from the various initial conditions recommended above are plotted below.

Phase portrait trajectories of the rimless wheel ($m=1$, $l=1$, $g=9.8$, $\alpha=\pi/8$, $\gamma=0.08$).

Take a minute to make sure you can follow these trajectories through state space for each of the initial conditions. It's non-trivial, but worth the effort.

Limit Cycles

A limit cycle is an asymptotically stable or unstable periodic orbit†† Marginally-stable orbits, such as the closed-orbits of the undamped simple pendulum, are typically not called limit cycles.. One of the simplest models of limit cycle behavior is the Van der Pol oscillator. Let's examine that first...

Van der Pol Oscillator

$$\ddot{q} + \mu (q^2 - 1)\dot{q} + q = 0, \quad \mu>0$$ One can think of this system as almost a simple spring-mass-damper system, except that it has nonlinear damping. In particular, the velocity term dissipates energy when $|q|>1$, and adds energy when $|q|<1$. Therefore, it is not terribly surprising to see that the system settles into a stable oscillation from almost any initial conditions (the exception is the state $q=0,\dot{q}=0$). This can be seen nicely in the phase portrait below.

System trajectories of the Van der Pol oscillator with $\mu =.2$. (Left) phase portrait. (Right) time domain.

However, as you can see from the plot of system trajectories in the time domain, then a slightly different picture emerges. Although the phase portrait clearly reveals that all trajectories converge to the same orbit, the time domain plot reveals that these trajectories do not necessarily synchronize in time.

The Van der Pol oscillator clearly demonstrates what we would think of as a stable limit cycle, but also exposes the subtlety in defining this limit cycle stability. Neighboring trajectories do not necessarily converge on a stable limit cycle. In contrast, defining the stability of a particular trajectory (parameterized by time) is relatively easy.

Let's make a few quick points about the existence of closed-orbits. If we can define a closed region of phase space which does not contain any fixed points, then it must contain a closed-orbitStrogatz94. By closed, I mean that any trajectory which enters the region will stay in the region (this is the Poincaré-Bendixson Theorem). It's also interesting to note that gradient potential fields (e.g. Lyapunov functions) cannot have a closed-orbitStrogatz94, and consequently Lyapunov analysis cannot be applied to limit cycle stability without some modification.

Poincaré Maps

One definition for the stability of a limit cycle uses the method of Poincaré. Let's consider an $n$ dimensional dynamical system, $\dot{\bx} = {\bf f}(\bx).$ Define an $n-1$ dimensional surface of section, $S$. We will also require that $S$ is transverse to the flow (i.e., all trajectories starting on $S$ flow through $S$, not parallel to it). The Poincaré map (or return map) is a mapping from $S$ to itself: $$\bx_p[n+1] = {\bf P}(\bx_p[n]),$$ where $\bx_p[n]$ is the state of the system at the $n$th crossing of the surface of section. Note that we will use the notation $\bx_p$ to distinguish the state of the discrete-time system from the continuous time system; they are related by $\bx_p[n] = \bx(t_c[n])$, where $t_c[n]$ is the time of the $n$th crossing of $S$.

Return map for the Van der Pol Oscillator

Since the full system is two dimensional, the return map dynamics are one dimensional. One dimensional maps, like one dimensional flows, are amenable to graphical analysis. To define a Poincaré section for the Van der Pol oscillator, let $S$ be the line segment where $q = 0$, $\dot{q} > 0$.

(Left) Phase plot with the surface of section, $S$, drawn with a black dashed line. (Right) The resulting Poincaré first-return map (blue), and the line of slope one (red).

If ${\bf P}(\bx_p)$ exists for all $\bx_p \in S$, then this method turns the stability analysis for a limit cycle into the stability analysis of a fixed point on a discrete map. In practice it is often difficult or impossible to find ${\bf P}$ analytically, but it can be obtained quite reasonably numerically. Once ${\bf P}$ is obtained, we can infer local limit cycle stability with an eigenvalue analysis. There will always be a single eigenvalue of 0 - corresponding to perturbations along the limit cycle which do not change the state of first return. The limit cycle is considered locally exponentially stable if all remaining eigenvalues, $\lambda_i$, have magnitude less than one, $|\lambda_i|<1$.

In fact, it is often possible to infer more global stability properties of the return map by examining ${\bf P}$. Koditschek91 describes some of the stability properties known for unimodal maps.

A particularly graphical method for understanding the dynamics of a one-dimensional iterated map is with the staircase method. Sketch the Poincaré map and also the line of slope one. Fixed points are the crossings with the unity line. Asymptotically stable if $|\lambda| < 1$. Unlike one dimensional flows, one dimensional maps can have oscillations (happens whenever $\lambda < 0$).

[insert staircase diagram of van der Pol oscillator return map here]

Analysis of the Rimless Wheel

Poincaré Map

We can now derive the angular velocity at the beginning of each stance phase as a function of the angular velocity of the previous stance phase. First, we will handle the case where $\gamma \le \alpha$ and $\dot\theta_n^+ > \omega_1$. The "step-to-step return map", factoring losses from a single collision, the resulting map is: $$\dot\theta^{+}_{n+1} = \cos(2\alpha) \sqrt{ ({\dot\theta_n}^{+})^{2} + 4\frac{g}{l}\sin\alpha \sin\gamma}.$$ where the $\dot{\theta}^{+}$ indicates the velocity just after the energy loss at impact has occurred.

Using the same analysis for the remaining cases, we can complete the return map. The threshold for taking a step in the opposite direction is $$\omega_2 = - \sqrt{2 \frac{g}{l} [1 - \cos(\alpha + \gamma)]}.$$ For $\omega_2 < \dot\theta_n^{+} < \omega_1,$ we have $$\dot\theta_{n+1}^{+} = -\dot\theta_n^{+} \cos(2\alpha).$$ Finally, for $\dot\theta_n^{+} < \omega_2$, we have $$\dot\theta_{n+1}^{+} = - \cos(2\alpha)\sqrt{(\dot\theta_n^{+})^2 - 4 \frac{g}{l} \sin\alpha \sin\gamma}.$$

Notice that the return map is undefined for $\dot\theta_n = \{ \omega_1, \omega_2 \}$, because from these configurations, the wheel will end up in the (unstable) equilibrium point where $\theta = 0$ and $\dot\theta = 0$, and will therefore never return to the map.

This return map blends smoothly into the case where $\gamma > \alpha$. In this regime, $$\dot\theta_{n+1}^{+} = \begin{cases} \cos(2\alpha) \sqrt{(\dot\theta_n^{+})^2 + 4 \frac{g}{l} \sin\alpha \sin\gamma}, & \text{ } 0 \le \dot\theta_n^{+} \\ -\dot\theta_n^{+} \cos(2\alpha), & \text{ } \omega_2 < \dot\theta_n^{+} < 0 \\ -\cos(2\alpha) \sqrt{(\dot\theta_n^{+})^2 - 4\frac{g}{l} \sin\alpha \sin\gamma}, & \dot\theta_n^{+} \le w_2 \end{cases}.$$ Notice that the formerly undefined points at $\{\omega_1,\omega_2\}$ are now well-defined transitions with $\omega_1 = 0$, because it is kinematically impossible to have the wheel statically balancing on a single leg.

Fixed Points and Stability

For a fixed point, we require that $\dot\theta_{n+1}^{+} = \dot\theta_n^{+} = \omega^*$. Our system has two possible fixed points, depending on the parameters: $$ \omega_{stand}^* = 0,~~~ \omega_{roll}^* = \cot(2 \alpha) \sqrt{4 \frac{g}{l} \sin\alpha\sin\gamma}.$$ The limit cycle plotted above illustrates a state-space trajectory in the vicinity of the rolling fixed point. $\omega_{stand}^*$ is a fixed point whenever $\gamma < \alpha$. $\omega_{roll}^*$ is a fixed point whenever $\omega_{roll}^* > \omega_1$. It is interesting to view these bifurcations in terms of $\gamma$. For small $\gamma$, $\omega_{stand}$ is the only fixed point, because energy lost from collisions with the ground is not compensated for by gravity. As we increase $\gamma$, we obtain a stable rolling solution, where the collisions with the ground exactly balance the conversion of gravitational potential to kinetic energy. As we increase $\gamma$ further to $\gamma > \alpha$, it becomes impossible for the center of mass of the wheel to be inside the support polygon, making standing an unstable configuration.

Limit cycle trajectory of the rimless wheel ($m=1,l=1,g=9.8,\alpha=\pi/8,\gamma=0.15$). All hatched regions converge to the rolling fixed point, $\omega_{roll}^*$; the white regions converge to zero velocity ($\omega_{stand}^*$).

The Compass Gait

The rimless wheel models only the dynamics of the stance leg, and simply assumes that there will always be a swing leg in position at the time of collision. To remove this assumption, we take away all but two of the spokes, and place a pin joint at the hip. To model the dynamics of swing, we add point masses to each of the legs. For actuation, we first consider the case where there is a torque source at the hip - resulting in swing dynamics equivalent to an Acrobot (although in a different coordinate frame).

The compass gait

In addition to the modeling assumptions used for the rimless wheel, we also assume that the swing leg retracts in order to clear the ground without disturbing the position of the mass of that leg. This model, known as the compass gait, is well studied in the literature using numerical methods Goswami99+Spong03, but relatively little is known about it analytically.

The state of this robot can be described by 4 variables: $\theta_{st},\theta_{sw},\dot\theta_{st}$, and $\dot\theta_{sw}$. The abbreviation $st$ is shorthand for the stance leg and $sw$ for the swing leg. Using $\bq = [ \theta_{sw}, \theta_{st} ]^T$ and $\bu = \tau$, we can write the dynamics as $${\bf H}(\bq)\ddot\bq + {\bf C}(\bq,\dot\bq)\dot\bq + {\bf G}(\bq) = {\bf B}\bu,$$ with \begin{gather*} {\bf H} = \begin{bmatrix} mb^2~~ & ~~-mlb\cos(\theta_{st}-\theta_{sw}) \\ -mlb\cos(\theta_{st}-\theta_{sw})~~ & ~~(m_h+m)l^2 + ma^2 \end{bmatrix} \\ {\bf C} = \begin{bmatrix} 0~~ & ~~mlb\sin(\theta_{st}-\theta_{sw})\dot\theta_{st} \\ mlb\sin(\theta_{st}-\theta_{sw})\dot\theta_{sw}~~ & ~~0 \end{bmatrix} \\ {\bf G} = \begin{bmatrix} mbg\sin(\theta_{sw}) \\ -(m_hl + ma + ml)g\sin(\theta_{st}) \end{bmatrix},\\ {\bf B} = \begin{bmatrix} 1 \\ -1 \end{bmatrix} \end{gather*} and $l=a+b$. These equations come straight out of Goswami96a.

The foot collision is an instantaneous change of velocity governed by the conservation of angular momentum around the point of impact: $${\bf Q}^+(\alpha) \dot{\bq}^+ = {\bf Q}^-(\alpha)\dot{\bq}^-,$$ where \begin{gather*} {\bf Q}^-(\alpha) = \begin{bmatrix} -mab~~ & ~~-mab + (m_hl^2 + 2mal) \cos(2\alpha)\\ 0~~ & ~~-mab \end{bmatrix} \\ {\bf Q}^+(\alpha) = \begin{bmatrix} mb(b - l\cos(2\alpha))~~ & ~~ml(l-b\cos(2\alpha) + ma^2 + m_hl^2 \\ mb^2~~ & ~~-mbl\cos(2\alpha) \end{bmatrix} \end{gather*} and $\alpha = \frac{\theta_{sw} - \theta_{st}}{2}$.

Numerical integration of these equations reveals a stable limit cycle, plotted below. The cycle is composed of a swing phase (top) and a stance phase (bottom), punctuated by two instantaneous changes in velocity which correspond to the ground collisions. The dependence of this limit cycle on the system parameters has been studied extensively in Goswami96a.

Limit cycle trajectory of the compass gait. ($m=5$kg,$m_h=10$kg,$a=b=0.5$m,$\phi=0.03$deg. $\bx(0)=[0,0,2,-0.4]^T$). $\theta_{lPitch}$ is the pitch angle of the left leg, which is recovered from $\theta_{st}$ and $\theta_{sw}$ in the simulation with some simple book-keeping.

The basin of attraction of the stable limit cycle is a narrow band of states surrounding the steady state trajectory. Although the simplicity of this model makes it analytically attractive, this lack of stability makes it difficult to implement on a physical device.

The Kneed Walker

To achieve a more anthropomorphic gait, as well as to acquire better foot clearance and ability to walk on rough terrain, we want to model a walker that includes kneeHsu07. For this, we model each leg as two links with a point mass each.

The Kneed Walker

At the beginning of each step, the system is modeled as a three-link pendulum, like the ballistic walkerMcMahon80+Mochon80+Spong03. The stance leg is the one in front, and it is the first link of a pendulum, with two point masses. The swing leg has two links, with the joint between them unconstrained until knee-strike. Given appropriate mass distributions and initial conditions, the swing leg bends the knee and swings forward. When the swing leg straightens out (the lower and upper length are aligned), knee-strike occurs. The knee-strike is modeled as a discrete inelastic collision, conserving angular momentum and changing velocities instantaneously.

After this collision, the knee is locked and we switch to the compass gait model with a different mass distribution. In other words, the system becomes a two-link pendulum. Again, the heel-strike is modeled as an inelastic collision. After the collision, the legs switch instantaneously. After heel-strike then, we switch back to the ballistic walker's three-link pendulum dynamics. This describes a full step cycle of the kneed walker, which is shown above.

Limit cycle trajectory for kneed walker

By switching between the dynamics of the continuous three-link and two-link pendulums with the two discrete collision events, we characterize a full point-feed kneed walker walking cycle. After finding appropriate parameters for masses and link lengths, a stable gait is found. As with the compass gait's limit cycle, there is a swing phase (top) and a stance phase (bottom). In addition to the two heel-strikes, there are two more instantaneous velocity changes from the knee-strikes as marked in the figure. This limit cycle is traversed clockwise.

Limit cycle (phase portrait) of the kneed walker

Highly-articulated Legged Robots

The passive dynamic walkers and hopping robots described in the last chapter capture the fundamental dynamics of legged locomotion -- dynamics which are fundamentally nonlinear and punctuated by impulsive events due to making and breaking contact with the environment. But if you start reading the literature on humanoid robots, or many-legged robots like quadrupeds, then you will find a quite different set of ideas taking center stage: ideas like the "zero-moment point" and footstep planning. The goal of this chapter is to penetrate that world.

I'd like to start the discussion with a model that might seem quite far from the world of legged robots, but I think it's a very useful way to think about the problem.

Center of Mass Dynamics

A hovercraft model

flip the directions of the force vector in the image

Imagine you have a flying vehicle modeled as a single rigid body in a gravity field with some number of force "thrusters" attached. We'll describe the configuration of the vehicle by its orientation, $\theta$ and the location of its center of mass $(x,z)$. The vector from the center of mass to thruster $i$ is given by $r_i$, yielding the equations of motion: \begin{align*} m\ddot{x} =& \sum_i F_{i,x},\\ m\ddot{z} =& \sum_i F_{i,z} - mg,\\ I\ddot\theta =& \sum_i \left[ r_i \times F_i \right]_y, \end{align*} where I've used $F_{i,x}$ to represent the component of the force in the $x$ direction and have taken just the $y$-axis component of the cross product on the last line.

Our goal is to move the hovercraft to a desired position/orientation and to keep it there. If we take the inputs to be $F_i$, then the dynamics are affine (linear plus a constant term). As such, we can stabilize a stabilizable fixed point using a change of coordinates plus LQR or even plan/track a desired trajectory using time-varying LQR. If I were to add additional linear constraints, for instance constraining $F_{min} \le F_i \le F_{max}$, then I can still use linear model-predictive control (MPC) to plan and stabilize a desired motion. By most accounts, this is a relatively easy control problem. (Note that it would be considerably harder if my control input or input constraints depend on the orientation, $\theta$, but we don't need that here yet).

Now imagine that you just a small number of thrusters (let's say two) each with severe input constraints. To make things more interesting, let us say that you're allowed to move the thrusters, so $\dot{r}_i$ becomes an additional control input, but with some important limitations: you have to turn the thrusters off when they are moving (e.g. $|F_i||\dot{r}_i| = 0$) and there is a limit to how quickly you can move the thrusters $|\dot{r}|_i \le v_{max}$. This problem is now a lot more difficult, due especially to the fact that constraints of the form $u_1 u_2 = 0$ are non-convex.

I find this a very useful thought exercise; somehow our controller needs to make an effectively discrete decision to turn off a thruster and move it. The framework of optimal control should support this - you are sacrificing a short-term loss of control authority for the long-term benefit of having the thruster positioned where you would like, but we haven't developed tools yet that deal well with this discrete plus continuous decision making. We'll need to address that here.

Unfortunately, although the problem is already difficult, we need to continue to add constraints. Now let's say additionally, that the thrusters can only be turned on in certain locations, as cartooned here:

The union of these regions need not form a convex set. Furthermore, these locations could be specified in the robot frame, but may also be specified in the world frame, which I'll call ${\bf p}_i$: $${\bf p}_i = {\bf r}_i - \begin{bmatrix} x \\ 0 \\ z \end{bmatrix}.$$ This problem still feels like it should be tractable, but it's definitely getting hard.

Robots with (massless) legs

In my view, the hovercraft problem above is a central component of the walking problem. If we consider a walking robot with massless legs, then the feet are exactly like movable thrusters. As above, they are highly constrained - they can only produce force when they are touching the ground, and (typically) they can only produce forces in certain directions, for instance as described by a "friction cone" (you can push on the ground but not pull on the ground, and with Coulomb friction the forces tangential to the ground must be smaller than the forces normal to the ground, as described by a coefficient of friction, e.g. $|F_{\parallel}| < \mu |F_{\perp}|$).

The constraints on where you can put your feet / thrusters will depend on the kinematics of your leg, and the speed at which you can move them will depend on the full dynamics of the leg -- these are difficult constraints to deal with. But the actual dynamics of the rigid body are actually still affine, and still simple!

Capturing the full robot dynamics

We don't actually need to have massless legs for this discussion to make sense. If we use the coordinates $x,z$ to describe the location of the center of mass (CoM) of the entire robot, and $m$ to represent the entire mass of the robot, then the first two equations remain unchanged. The center of mass is a configuration dependent point, and does not have an orientation, but one important generalization of the orientation dynamics is given by the centroidal momentum matrix, $A(\bq)$, where $A(\bq)\dot{\bq}$ captures the linear and angular momentum of the robot around the center of mass Orin13. Note that the center of mass dynamics are still affine -- even for a big complicated humanoid -- but the centroidal momentum dynamics are nonlinear.

Impact dynamics

In the previous chapter we devoted relatively a lot of attention to dynamics of impact, characterized for instance by a guard that resets dynamics in a hybrid dynamical system. In those models we used impulsive ground-reaction forces (these forces are instantaneously infinite, but doing finite work) in order to explain the discontinuous change in velocity required to avoid penetration with the ground. This story can be extended naturally to the dynamics of the center of mass.

For an articulated robot, though, there are a number of possible strategies for the legs which can effect the dynamics of the center of mass. For example, if the robot hits the ground with a stiff leg like the rimless wheel, then the angular momentum about the point of collision will be conserved, but any momentum going into the ground will be lost. However, if the robot has a spring leg and a massless toe like the SLIP model, then no energy need be lost.

One strategy that many highly-articulated legged robots use is to keep their center of mass at a constant height, $$z = c \quad \Rightarrow \quad \dot{z} = \ddot{z} = 0,$$ and minimize their angular momentum about the center of mass (here $\ddot\theta=0$). Using this strategy, if the swinging foot lands roughly below the center of mass, then even with a stiff leg there is no energy dissipated in the collision - all of the momentum is conserved. This often (but does not always) justify ignoring the impacts in the center of mass dynamics of the system.

The special case of flat terrain

While not the only important case, it is extremely common for our robots to be walking over flat, or nearly flat terrain. In this situation, even if the robot is touching the ground in a number of places (e.g., two heels and two toes), the constraints on the center of mass dynamics can be summarized very efficiently.

External forces acting on a robot pushing on a flat ground

First, on flat terrain $F_{i,z}$ represents the force that is normal to the surface at contact point $i$. If we assume that the robot can only push on the ground (and not pull), then this implies $$\forall i, F_{i,z} \ge 0 \Rightarrow \sum_i F_{i,z} \ge 0 \Rightarrow \ddot{z} \ge -g.$$ In other words, if I cannot pull on the ground, then my center of mass cannot accelerate towards the ground faster than gravity.

Furthermore, if we use a Coulomb model of friction on the ground, with friction coefficient $\mu$, then $$\forall i, |F_{i,x}| \le \mu F_{i,z} \Rightarrow \sum_i |F_{i,x}| \le \mu \sum_i F_z \Rightarrow |\ddot{x}| \le \mu (\ddot{z} + g).$$ For instance, if I keep my center of mass at a constant height, then $\ddot{z}=0$ and $|\ddot{x}| \le \mu g$; this is a nice reminder of just how important friction is if you want to be able to move around in the world.

Even better, let us define the "center of pressure" (CoP) as the point on the ground where $$x_{cop} = \frac{\sum_i p_{i,x} F_{i,z}}{\sum_i F_{i,z}},$$ and since all $p_{i,z}$ are equal on flat terrain, $z_{cop}$ is just the height of the terrain. It turns out that the center of pressure is a "zero-moment point" (ZMP) -- a property that we will demonstrate below -- and moment-balance equation gives us a very important relationship between the location of the CoP and the dynamics of the CoM: \[ (m\ddot{z} + mg) (x_{cop} - x) = (z_{cop} - z) m\ddot{x} - I\ddot\theta. \] If we use the strategy proposed above for ignoring collision dynamics, $\ddot{z} = \ddot{\theta} = 0$, then we have $z_{cop} - z$ is a constant height $h$, and the result is the famous "ZMP equations": \[ \ddot{x} = -\frac{g}{h} (x_{cop}-x). \] So the location of the center of pressure completely determines the acceleration of the center of mass, and vice versa! What's more, this relationship is affine -- a property that we can exploit in a number of ways.

As an example, we can easily relate constraints on the CoP to constraints on $\ddot{x}$. It is easy to verify from the definition that the CoP must live inside the convex hull of the ground contact points. Therefore, if we use the $\ddot{z}=\ddot\theta=0$ strategy, then this directly implies strict bounds on the possible accelerations of the center of mass given the locations of the ground contacts.

An aside: the zero-moment point derivation

The zero-moment point (ZMP) is discussed very frequently in the current literature on legged robots. It also has an unfortunate tendency to be surrounded by some confusion; a number of people have defined the ZMP is slightly different ways (see e.g. Goswami99 for a summary). Therefore, it makes sense to provide a simple derivation here.

First let us recall that for rigid body systems I can always summarize the contributions from many external forces as a single wrench (force and torque) on the object -- this is simply because the $F_i$ terms enter our equations of motion linearly. Specifically, given any point in space, $r$, in coordinates relative to $(x,z)$:

update diagram to have r instead of p

I can re-write the equations of motion as \begin{align*} m\ddot{x} =& \sum_i F_{i,x} = F_{net,x},\\ m\ddot{z} =& \sum_i F_{i,z} - mg = F_{net,z},\\ I\ddot\theta =& \sum_i \left[ r_i \times F_i \right]_y = ({\bf r} \times {\bf F}_{net})_y + \tau_{net}, \end{align*} where ${\bf F}_{net} = \sum_i {\bf F}_i$ and the value of $\tau_{net}$ depends on the location ${\bf r}$. For some choices of ${\bf r}$, we can make $\tau_{net}=0$. Examining \[ ({\bf r} \times {\bf F}_{net})_y = r_z F_{net,x} - r_x F_{net,z} = \left[ r_i \times F_i \right]_y, \] we can see that when $F_{net,z} > 0$ there is an entire line of solutions, $r_x = a r_z + b$, including one that will intercept the terrain. For walking robots, it is this point on the ground from which the external wrench can be described by a single force vector (and no moment) that is the famous "zero-moment point" (ZMP). Substituting the back in to replace $F_{net}$, we can see that \[ r_x = \frac{r_z m \ddot{x} - I \ddot\theta}{m\ddot{z} + mg}. \] If we assume that $\ddot{z}=\ddot{theta}=0$ and replace the relative coordinates with the global coordinates ${\bf r} = {\bf p} - [x,0,z]^T$, then we arrive at precisely the equation presented above.

Furthermore, since \[\left[ r_i \times F_i \right]_y = \sum_i \left( r_{i,z} F_{i,x} - r_{i,x} F_{i,z} \right), \] and for flat terrain we have \[ r_z F_{net,x} = \sum_i r_{i,z} F_{i,x}, \] then we can see that this ZMP is exactly the CoP: \[ r_x = \frac{\sum_i r_{i,x} F_{i,z}}{ F_{net,z} }. \]

In three dimensions, we solve for the point on the ground where $\tau_{net,y} = \tau_{net,x} = 0$, but allow $\tau_{net,z} \ne 0$ to extract the analogous equations in the $y$-axis: \[ r_y = \frac{r_z m \ddot{y} - I \ddot\theta}{m\ddot{z} + mg}. \]

ZMP planning

From a CoM plan to a whole-body plan

Whole-Body Control

Coming soon. For a description of our approach with Atlas, see Kuindersma13+Kuindersma14.

Footstep planning and push recovery

Coming soon. For a description of our approach with Atlas, see Deits14a+Kuindersma14. For nice geometric insights on push recovery, see Koolen12.

Beyond ZMP planning

Coming soon. For a description of our approach with Atlas, see Dai14+Kuindersma14.


Model Systems with Fluid Dynamics

Model Systems with Stochasticity

My goals for this chapter are to build intuition for the beautiful and rich behavior of nonlinear dynamical system that are subjected to random (noise/disturbance) inputs. So far we have focused primarily on systems described by \[ \dot{\bx}(t) = f(\bx(t),\bu(t)) \quad \text{or} \quad \bx[n+1] = f(\bx[n],\bu[n]). \] In this chapter, I would like to broaden the scope to think about \[ \dot{\bx}(t) = f(\bx(t),\bu(t),\bw(t)) \quad \text{or} \quad \bx[n+1] = f(\bx[n],\bu[n],\bw[n]), \] where this additional input $\bw$ is the (vector) output of some random process. In other words, we can begin thinking about stochastic systems by simply understanding the dynamics of our existing ODEs subjected to an additional random input.

This form is extremely general as written. $\bw(t)$ can represent time-varying random disturbances (e.g. gusts of wind), or even constant model errors/uncertainty. One thing that we are not adding, yet, is measurement uncertainty. There is a great deal of work on observability and state estimation that study the question of how you can infer the true state of the system given noise sensor readings. For this chapter we are assuming perfect measurements of the full state, and are focused instead on the way that "process noise" shapes the long-term dynamics of the system.

I will also stick primarily to discrete time dynamics for this chapter, simply because it is easier to think about the output of a discrete-time random process, $\bw[n]$, than a $\bw(t)$. But you should know that all of the ideas work in continuous time, too. Also, most of our examples will take the form of additive noise: \[ \bx[n+1] = f(\bx[n],\bu[n]) + \bw[n], \] which is a particular useful and common specialization of our general form. And this form doesn't give up much -- the disturbance on step $k$ can pass through the nonlinear function $f$ on step $k+1$ giving rich results -- but is often much easier to work with.

The Master Equation

Let's start by looking at some simple examples.

A Bistable System + Noise

Let's consider (a time-reversed version of) one of my favorite one-dimensional systems: \[ \dot{x} = x - x^3. \] add plot This deterministic system has stable fixed points at $x^* = \{-1,1\}$ and an unstable fixed point at $x^* = 0$.

A reasonable discrete-time approximation of these dynamics, now with additive noise, is \[ x[n+1] = x[n] + h (x[n] - x[n]^3 + w[n]). \] When $w[n]=0$, this system has the same fixed points and stability properties as the continuous time system. But let's examine the system when $w[n]$ is instead the result of a zero-mean Gaussian white noise process, defined by: \begin{gather*} \forall n, E\left[ w[n] \right] = 0,\\ E\left[ w[i]w[j] \right] = \begin{cases} \sigma^2, & \text{ if } i=j,\\ 0, & \text{ otherwise.} \end{cases} \end{gather*} Here $\sigma$ is the standard deviation of the Gaussian.

When you simulate this system for small values of $\sigma$, you will see trajectories move roughly towards one of the two fixed points (for the deterministic system), but every step is modified by the noise. In fact, even if the trajectory were to arrive exactly at what was once a fixed point, it is almost surely going to move again on the very next step. In fact, if we plot many runs of the simulation from different initial conditions all on the same plot, we will see something like the figure below.

Simulation of the bistable system with noise ($\sigma = 0.01$) from many initial conditions.

You can run this example yourself

        options.plot_histogram = false;
        options.time_reversed_system = true;

During any individual simulation, the state jumps around randomly for all time, and could even transition from the vicinity of one fixed point to the other fixed point. Another way to visualize this output is to animate a histogram of the particles over time.

Histogram of the states of the bistable system with noise ($\sigma = 0.01$) after simulating from random initial conditions until $t=30$.
Click here to see the animation

        options.plot_histogram = true;
        options.time_reversed_system = true;

Let's take a moment to appreciate the implications of what we've just observed. Every time that we've analyzed a system to date, we've asked questions like "given x[0], what is the long-term behavior of the system, $\lim_{n\rightarrow\infty} x[n]$?", but now $x[n]$ is a random variable. The trajectories of this system do not converge, and the system does not exhibit any form of stability that we've introduced so far.

All is not lost. If you watch the animation closely, you might notice the distribution of this random variable is actually very well behaved. This is the key idea for this chapter.

Let us use $p_n(x)$ to denote the probability density function over the random variable $x$ at time $n$. Note that this density function must satisfy \[ \int_{-\infty}^\infty p_n(x) dx = 1.\] It is actually possible to write the dynamics of the probability density with the simple relation \[ p_{n+1}(x) = \int_{-\infty}^\infty p(x|x') p_n(x') dx', \] where $p(x|x')$ encodes the stochastic dynamics as a conditional distribution of the next state (here $x$) as a function of the current state (here $x'$). Dynamical systems that can be encoded in this way are known as continuous-state Markov Processes, and the governing equation above is often referred to as the "master equation" for the stochastic process. In fact this update is even linear(!) ; a fact that can enable closed-form solutions to some impressive long-term statistics, like mean time to failure or first passage timesByl08a. Unfortunately, it is often difficult to perform the integral in closed form, so we must often resort to discretation or numerical analysis.

The slightly more general form of the master equation, which works for multivariable distributions with state-domain ${\cal X}$, and systems with control inputs $\bu$, is \[ p_n(\bx) = \int_{\cal X} p(\bx|\bx',\bu) p_n(\bx') d\bx'. \] This is a (continuous-state) Markov Decision Process.

Stationary Distributions

In the example above, the histogram is our numerical approximation of the probability density. If you simulate it a few times, you will probably believe that, although the individual trajectories of the system do not converge, the probability distribution actually does converge to what's known as a stationary distribution -- a fixed point of the master equation. Instead of thinking about the dynamics of the trajectories, we need to start thinking about the dynamics of the distribution.

The Linear Gaussian System

Let's consider the one-dimensional linear system with additive noise: \[ x[n+1] = a x[n] + w[n]. \] When $w[n]=0$, the system is stable (to the origin) for $-1 < a < 1$. Let's make sure that we can understand the dynamics of the master equation for the case when $w[n]$ is Gaussian white noise with standard deviation $\sigma$.

First, recall that the probability density function of a Gaussian with mean $\mu$ is given by \[ p(w) = \frac{1}{\sqrt{2 \pi \sigma^2}} e^{-\frac{(w-\mu)^2}{2\sigma^2}}. \] When conditioned on $x[n]$, the distribution given by the dynamics subjected to mean-zero Gaussian white noise is simply another Gaussian, with the mean given by $ax[n]$: \[ p(x[n+1]|x[n]) = \frac{1}{\sqrt{2 \pi \sigma^2}} e^{-\frac{(x[n+1]-ax[n])^2}{2\sigma^2}}, \] yielding the master equation \[ p_{n+1}(x) = \frac{1}{\sqrt{2 \pi \sigma^2}} \int_{-\infty}^\infty e^{-\frac{(x-ax')^2}{2\sigma^2}} p_n(x') dx'. \]

Now here's the magic. Let's push a distribution, $p_n(x)$, which is zero-mean, with standard deviation $\sigma_n$, through the master equation: \begin{align*} p_{n+1}(x) =& \frac{1}{\sqrt{2 \pi \sigma^2}}\frac{1}{\sqrt{2 \pi \sigma_n^2}} \int_{-\infty}^\infty e^{-\frac{(x-ax')^2}{2\sigma^2}} e^{-\frac{(x')^2}{2\sigma_n^2}} dx',\\ =& \frac{1}{\sqrt{2 \pi (\sigma^2+a^2 \sigma_n^2)}} e^{-\frac{x^2}{2(\sigma^2 + a^2 \sigma_n^2)}}. \end{align*} The result is another mean-zero Gaussian with $\sigma_{n+1}^2 = \sigma^2 + a^2 \sigma_n^2$. This result generalizes to the multi-variate case, too, and might be familiar to you e.g. from the process update of the Kalman filter.

Taking it a step further, we can see that a stationary distribution for this system is given by a mean-zero Gaussian with \[ \sigma_*^2 = \frac{\sigma}{1-a^2}. \] Note that this distribution is well defined when $-1 < a < 1$ (only when the system is stable).

The stationary distribution of the linear Gaussian system reveals the fundamental and general balance between two terms in the governing equations of any stochastic dynamical system: the stability of the deterministic system is bringing trajectories together (smaller $a$ means faster convergence of the deterministic system and results in a more narrow distribution), but the noise in the system is forcing trajectories apart (larger $\sigma$ means larger noise and results in a wider distribution).

Given how rich the dynamics can be for deterministic nonlinear systems, you can probably imagine that the possible long-term dynamics of the probability are also extremely rich. If we simply flip the signs in the dynamics we examined above, we'll get our next example:

The Cubic Example + Noise

Now let's consider the discrete-time approximation of \[ \dot{x} = -x + x^3, \] again with additive noise: \[ x[n+1] = x[n] + h (-x[n] + x[n]^3 + w[n]). \] With $w[n]=0$, the system has only a single stable fixed point (at the origin), whose deterministic region of attraction is given by $x \in (-1,1)$. If we again simulate the system from a set of random initial conditions and plot the histogram, we will see something like the figure below.

Histogram of the states of the bistable system with noise ($\sigma = 0.01$) after simulating from random initial conditions until $t=30$.
Click here to see the animation

Be sure to watch the animation (or run the simulation):

        options.plot_histogram = true;
        options.time_reversed_system = false;

Now try increasing the noise covariance:

        options.plot_histogram = true;
        options.time_reversed_system = false;
        options.noise_covariance = 2;

Click here to see the animation

What is the stationary distribution for this system? In fact, there isn't one. Although we initially collect probability density around the stable fixed point, you should notice a slow leak -- on every step there is some probability of transitioning past unstable fixed points and getting driven by the unstable dynamics off towards infinity. If we run the simulation long enough, there won't be any probability density left at $x=0$.

The Stochastic Van der Pol Oscillator

One more example; this is a fun one. Let's think about one of the simplest examples that we had for a system that demonstrates limit cycle stability -- the Van der Pol oscillator -- but now we'll add Gaussian white noise, $$\ddot{q} + (q^2 - 1)\dot{q} + q = w(t),$$ Here's the question: if we start with a small set of initial conditions centered around one point on the limit cycle, then what is the long-term behavior of this distribution?

Randomly sampled initial conditions pictured with the stable limit cycle of the Van der Pol oscillator.

Since the long-term behavior of the deterministic system is periodic, it would be very logical to think that the state distribution for this stochastic system would fall into a stable periodic solution, too. But think about it a little more, and then watch the animation (or run the simulation yourself).


Click here to see the animation (first 20 seconds)

Click here to see the particles after 2000 seconds of simulation

The explanation is simple: the periodic solution of the system is only orbitally stable; there is no stability along the limit cycle. So any disturbances that push the particles along the limit cycle will go unchecked. Eventually, the distribution will "mix" along the entire cycle. Perhaps surprisingly, this system that has a limit cycle stability when $w=0$ eventually reaches a stationary distribution (fixed point) in the master equation.


Coming soon. Discretize, then closed form solutions using the transition matrix. Finite-time verification with sums of squares.

Extended Example: The Rimless Wheel on Rough Terrain

Coming soon. Read the paper Byl08f and/or watch the video.

Control Design (a preview)

Dynamic Programming

In chapter 2, we spent some time thinking about the phase portrait of the simple pendulum, and concluded with a challenge: can we design a nonlinear controller to reshape the phase portrait, with a very modest amount of actuation, so that the upright fixed point becomes globally stable? With unbounded torque, feedback linearization solutions (e.g., invert gravity) can work well, but can also require an unnecessarily large amount of control effort. The energy-based swing-up control solutions presented for the acrobot and cart-pole systems are considerably more appealing, but required some cleverness and might not scale to more complicated systems. Here we investigate another approach to the problem, using computational optimal control to synthesize a feedback controller directly.

Formulating control design as an optimization

In this chapter, we will introduce optimal control - a control design process using optimization. This approach is powerful for a number of reasons. First and foremost, it is very general - allowing us to specify the goal of control equally well for fully- or under-actuated, linear or nonlinear, deterministic or stochastic, and continuous or discrete systems. Second, it permits concise descriptions of potentially very complex desired behaviours, specifying the goal of control as an scalar objective (plus a list of constraints). Finally, and most importantly, optimal control is very amenable to numerical solutions. Bertsekas00a is a fantastic reference on this material.

The fundamental idea in optimal control is to formulate the goal of control as the long-term optimization of a scalar cost function. Let's introduce the basic concepts by considering a system that is even simpler than the simple pendulum.

Optimal Control Formulations for the Double Integrator

Consider the double integrator system $$\ddot{q} = u, \quad |u| \le 1.$$ If you would like a mechanical analog of the system (I always do), then you can think about this as a unit mass brick moving along the x-axis on a frictionless surface, with a control input which provides a horizontal force, $u$. The task is to design a control system, $u = \pi(\bx,t)$, $\bx=[q,\dot{q}]^T$ to regulate this brick to $\bx = [0,0]^T$.

The double integrator as a unit-mass brick on a frictionless surface

In order to formulate this control design problem using optimal control, we must define a scalar objective which scores the long-term performance of running each candidate control policy, $\pi(\bx,t)$, from each initial condition, $(\bx_0,t_0)$, and a list of constraints that must be satisfied. For the task of driving the double integrator to the origin, one could imagine a number of optimal control formulations which would accomplish the task, e.g.:

  • Minimum time: $\min_\pi t_f,$ subject to $\bx(t_0) = \bx_0,$ $\bx(t_f) = {\bf 0}.$
  • Quadratic cost: $\min_\pi \int_{t_0}^{\infty} \left[ \bx^T(t) {\bf Q} \bx(t) \right] dt,$ ${\bf Q}\succ0$.

where the first is a constrained optimization formulation which optimizes time, and the second accrues a penalty at every instance according to the distance that the state is away from the origin (in a metric space defined by the matrix ${\bf Q}$), and therefore encourages trajectories that go more directly towards the goal, possibly at the expense of requiring a longer time to reach the goal (in fact it will result in an exponential approach to the goal, where as the minimum-time formulation will arrive at the goal in finite time). Note that both optimization problems are only well defined if it is possible for the system to actually reach the origin, otherwise the minimum-time problem cannot satisfy the terminal constraint, and the integral in the quadratic cost would not converge to a finite value as time approaches infinity (fortunately the double integrator system is controllable, and therefore can be driven to the goal in finite time).

Note that the input limits, $|u|\le1$ are also required to make this problem well-posed; otherwise both optimizations would result in the optimal policy using infinite control input to approach the goal infinitely fast. Besides input limits, another common approach to limiting the control effort is to add an additional quadratic cost on the input (or "effort"), e.g. $\int \left[ \bu^T(t) {\bf R} \bu(t) \right] dt,$ ${\bf R}\succ0$. This could be added to either formulation above. We will examine many of these formulations in some details in the examples worked out at the end of this chapter.

Optimal control has a long history in robotics. For instance, there has been a great deal of work on the minimum-time problem for pick-and-place robotic manipulators, and the linear quadratic regulator (LQR) and linear quadratic regulator with Gaussian noise (LQG) have become essential tools for any practicing controls engineer. With increasingly powerful computers and algorithms, the popularity of numerical optimal control has grown at an incredible pace over the last few years.

The minimum time problem for the double integrator

For more intuition, let's do an informal derivation of the solution to the minimum time problem for the double integrator with input constraints: \begin{align*} \minimize_{\pi} \quad & t_f\\ \subjto \quad & \bx(t_0) = \bx_0, \\ & \bx(t_f) = {\bf 0}, \\ & \ddot{q}(t) = u(t), \\ & |u| \le 1. \end{align*} What behavior would you expect an optimal controller exhibit?

Your intuition might tell you that the best thing that the brick can do, to reach the goal in minimum time with limited control input, is to accelerate maximally towards the goal until reaching a critical point, then hitting the brakes in order to come to a stop exactly at the goal. This would be called a bang-bang control policy; these are often optimal for systems with bounded input, and it is in fact optimal for the double integrator, although we will not prove it until we have developed more tools.

Let's work out the details of this bang-bang policy. First, we can figure out the states from which, when the brakes are fully applied, the system comes to rest precisely at the origin. Let's start with the case where $q(0) < 0$, and $\dot{q}(0)>0$, and "hitting the brakes" implies that $u=-1$ . Integrating the equations, we have \begin{gather*} \ddot{q}(t) = u = -1 \\\dot{q}(t) = \dot{q}(0) - t \\ q(t) = q(0) + \dot{q}(0) t - \frac{1}{2} t^2. \end{gather*} Substituting $t = \dot{q}(0) - \dot{q}$ into the solution give $\dot{q}$ reveals that the system orbits are parabolic arcs: \[ q = -\frac{1}{2} \dot{q}^2 + c_{-}, \] with $c_{-} = q(0) + \frac{1}{2}\dot{q}^2(0)$.

Two solutions for the system with $u=-1$

Similarly, the solutions for $u=1$ are $q = \frac{1}{2} \dot{q}^2 + c_{+}$, with $c_{+}=q(0)-\frac{1}{2}\dot{q}^2(0)$.

Perhaps the most important of these orbits are the ones that pass directly through the origin (e.g., $c_{-}=0$). Following our initial logic, if the system is going slower than this $\dot{q}$ for any $q$, then the optimal thing to do is to slam on the accelerator ($u=-\text{sgn}(q)$). If it's going faster than the $\dot{q}$ that we've solved for, then still the best thing to do is to brake; but inevitably the system will overshoot the origin and have to come back. We can summarize this policy with: \[ u = \begin{cases} +1 & \text{if } (\dot{q} < 0 \text{ and } q \le \frac{1}{2} \dot{q}^2) \text{ or } (\dot{q}\ge 0 \text{ and } q < -\frac{1}{2} \dot{q}^2) \\ 0 & \text{if } q=0 \text{ and } \dot{q}=0 \\ -1 & \text{otherwise} \end{cases} \]

Candidate optimal (bang-bang) policy for the minimum-time double integrator problem.
and illustrate some of the optimal solution trajectories:
Solution trajectories of system using the optimal policy

And for completeness, we can compute the optimal time to the goal by solving for the amount of time required to reach the switching surface plus the amount of time spent moving along the switching surface to the goal. With a little algebra, you will find that the time to goal, $T(\bx)$, is given by \[ T(\bx) = \begin{cases} 2\sqrt{\frac{1}{2}\dot{q}^2-q} - \dot{q} & \text{for } u=+1 \text{ regime}, \\ 0 & \text{for } u=0, \\ \dot{q} + 2\sqrt{\frac{1}{2}\dot{q}^2+q} & \text{for } u=-1, \end{cases} \] plotted here:

Time to the origin using the bang-bang policy
Notice that the function is continuous (though not smooth), even though the policy is discontinuous.

Additive cost

As we begin to develop theoretical and algorithmic tools for optimal control, we will see that some formulations are much easier to deal with than others. One important example is the dramatic simplification that can come from formulating objective functions using additive cost, because they often yield recursive solutions. In the additive cost formulation, the long-term "score" for a trajectory can be written as $$\int_0^T g(x(t),u(t)) dt,$$ where $g()$ is the instantaneous cost, and $T$ can be either a finite real number or $\infty$. We will call a problem specification with a finite $T$ a "finite-horizon" problem, and $T=\infty$ an "infinite-horizon" problem. Problems and solutions for infinite-horizon problems tend to be more elegant, but care is required to make sure that the integral converges for the optimal controller (typically by having an achievable goal that allows the robot to accrue zero-cost).

At first glance, our minimum-time problem formulation for the double integrator does not look like an additive cost problem. However, we can write in as an additive cost problem using an infinite horizon and the instantaneous cost $$g(x,u) = \begin{cases} 0 & \text{if } x=0, \\ 1 & \text{otherwise.} \end{cases}$$

We will examine a number of approaches to solving optimal control problems throughout the next few chapters. For the remainder of this chapter, we will focus on additive-cost problems and their solution via dynamic programming.

Optimal control as graph search

For systems with continuous states and continuous actions, dynamic programming is a set of theoretical ideas surrounding additive cost optimal control problems. For systems with a finite, discrete set of states and a finite, discrete set of actions, dynamic programming also represents a set of very efficient numerical algorithm which can compute optimal feedback controllers. Many of you will have learned it before as a tool for graph search.

Imagine you have a directed graph with states (or nodes) $\{s_1,s_2,...\} \in S$ and "actions" associated with edges labeled as $\{a_1,a_2,...\} \in A$, as in the following trivial example:

A simple directed graph.

Let us also assume that each edge has an associate weight or cost, using $g(s,a)$ to denote the cost of being in state $s$ and taking action $a$. Furthermore we will denote the transition "dynamics" using \[ s[n+1] = f(s[n],a[n]). \] For instance, in the graph above, $f(s_1,a_1) = s_2$.

There are many algorithms for finding (or approximating) the optimal path from a start to a goal on directed graphs. In dynamic programming, the key insight is that we can find the shortest path from every node by solving recursively for the optimal cost-to-go from every node to the goal. One such algorithm starts by initializing an estimate $\hat{J}^*=0$ for all $s_i$, then proceeds with an iterative algorithm which sets \begin{equation} \hat{J}^*(s_i) \Leftarrow \min_{a \in A} \left[ g(s_i,a) + \hat{J}^*\left({f(s_i,a)}\right) \right]. \label{eq:value_update} \end{equation} In software, $\hat{J}^*$ can be represented as a vector with dimension equal to the number of discrete states. This algorithm, appropriately known as value iteration, is guaranteed to converge to the optimal cost-to-go, $\hat{J}^* \rightarrow J^*$ Bertsekas96, and in practice does so rapidly. Typically the update is done in batch -- e.g. the estimate is updated for all $i$ at once -- but the asynchronous version where states are updated one at a time is also known to converge, so long as every state is eventually updated infinitely often. Assuming the graph has a goal state with a zero-cost self-transition, then this cost-to-go function represents the weighted shortest distance to the goal.

Value iteration is an amazingly simple algorithm, but it accomplishes something quite amazing: it efficiently computes the long-term cost of an optimal policy from every state by iteratively evaluating the one-step cost. If we know the optimal cost-to-go, then it's easy to extract the optimal policy, $a = \pi^*(s)$: \begin{equation} \pi^*(s_i) = \argmin_a \left[ g(s_i,a) + J^*\left( f(s_i,a) \right) \right]. \label{eq:policy_update} \end{equation} It's a simple algorithm, but playing with an example can help our intuition.

Grid World

Imagine a robot living in a grid (finite state) world. Wants to get to the goal location. Possibly has to negotiate cells with obstacles. Actions are to move up, down, left, right, or do nothing. Sutton98

The one-step cost for the grid-world minimum-time problem. The goal state has a cost of zero, the obstacles have a cost of 10, and every other state has a cost of 1. Click the image to watch the value iteration algorithm in action.
insert value iteration animation for min time and quadratic cost

You can experiment with the grid world example in using


I recommend trying to edit the cost function and obstacles.

TODO: figure/text for graph approximation of a continuous state space.

Dynamic Programming for the Double Integrator

You can run value iteration for the double integrator (using barycentric interpolation to interpolate between nodes) in using:


Again, you can easily try different cost functions by editing the code yourself.

Let's take a minute to appreciate how amazing this is. Our solution to finding the optimal controller for the double integrator wasn't all that hard, but it required some mechanical intuition and solutions to differential equations. The resulting policy was non-trivial -- bang-bang control with a parabolic switching surface. The value iteration algorithm doesn't use any of this directly -- it's a simple algorithm for graph search. But remarkably, it can generate effectively the same policy with just a few moments of computation.

It's important to note that there are some differences between the computed policy and the optimal policy that we derived, due to discretization errors. We will ask you to explore these in the problems.

The real value of this numerical solution, however, is unlike our analytical solution for the double integrator, we can apply this same algorithm to any number of dynamical systems virtually without modification. Let's apply it now to the simple pendulum, which was intractable analytically.

Dynamic Programming for the Simple Pendulum

You can run value iteration for the simple pendulum (using barycentric interpolation to interpolate between nodes) in using:


Again, you can easily try different cost functions by editing the code yourself.

Continuous dynamic programming

I find the graph search algorithm extremely satisfying as a first step, but also become quickly frustrated by the limitations of the discretization required to use it. In many cases, we can do better; coming up with algorithms which work more natively on continuous dynamical systems. We'll explore those extensions in this section.

The Hamilton-Jacobi-Bellman Equation

It's important to understand that the value iteration equations, equations (\ref{eq:value_update}) and (\ref{eq:policy_update}), are more than just an algorithm. They are also sufficient conditions for optimality: if we can produce a $J^*$ and $\pi^*$ which satisfy these equations, then $\pi^*$ must be an optimal controller. There are an analogous set of conditions for the continuous systems. For a system $\dot{\bx} = f(\bx,\bu)$ and an infinite-horizon additive cost $\int_0^\infty g(\bx,\bu)dt$, we have: \begin{gather} 0 = \min_\bu \left[ g(\bx,\bu) + \pd{J^*}{\bx}f(\bx,\bu) \right], \label{eq:HJB} \\ \pi^*(\bx) = \argmin_\bu \left[ g(\bx,\bu) + \pd{J^*}{\bx}f(\bx,\bu) \right]. \end{gather} Equation \ref{eq:HJB} is known as the Hamilton-Jacobi-Bellman (HJB) equation. Bertsekas05 gives an informal derivation of these equations as the limit of a discrete-time approximation of the dynamics, and also gives the following sufficiency theorem:

HJB Sufficiency Theorem

The following statement is adapted from Proposition 3.2.1 of Bertsekas05. Consider a system $\dot{\bx}=f(\bx,\bu)$ and an infinite-horizon additive cost $\int_0^\infty g(\bx,\bu)dt$. Suppose $J(\bx)$ is a solution to the HJB equation; that is $J$ is continuously differentiable in $\bx$ and is such that \[ 0 = \min_{u\in U} \left[ g(\bx,\bu) + \pd{J}{\bx}f(\bx,\bu) \right],\quad \text{for all } \bx. \] Suppose also that $\pi^*(\bx)$ attains the minimum in the equation for all $\bx$. Further assume that the differential equation described by $f$ has a unique solution starting from any state $\bx$, and that the control input trajectory, $\bu^*(t)$, obtained by evaluating $\pi^*$ along any solution is piecewise continuous as a function of $t$. Then $J$ is equal to the optimal cost-to-go function, $J(\bx)=J^*(\bx)$ for all $\bx$, and the control trajectories $\bu^*(t)$ are optimal.

As a tool for verifying optimality, the HJB equations are actually surprisingly easy to work with: we can verify optimality for an infinite-horizon objective without doing any integration; we simply have to check a derivative condition on the optimal cost-to-go function $J^*$. Let's see this play out on the double integrator example.

HJB for the Double Integrator

Consider the problem of regulating the double integrator (this time without input limits) to the origin using a quadratic cost: $$ g(\bx,\bu) = q^2 + \dot{q}^2 + u^2. $$ I claim (without derivation) that the optimal controller for this objective is $$\pi(\bx) = -q - \sqrt{3}\dot{q}.$$ To convince you that this is indeed optimal, I have produced the following cost-to-go function: $$J(\bx) = \sqrt{3} q^2 + 2 q \dot{q} + \sqrt{3} \dot{q}^2.$$

Taking \begin{gather*} \pd{J}{q} = 2\sqrt{3} q + 2\dot{q}, \qquad \pd{J}{\dot{q}} = 2q + 2\sqrt{3}\dot{q}, \end{gather*} we can write \begin{align*} g(\bx,\bu) + \pd{J}{\bx}f(\bx,\bu) &= q^2 + \dot{q}^2 + u^2 + (2\sqrt{3} q + 2\dot{q}) \dot{q} + (2q + 2\sqrt{3}\dot{q}) u \end{align*} This is a convex quadratic function in $u$, so we can find the minimum with respect to $u$ by finding where the gradient with respect to $u$ evaluates to zero. \[ \pd{}{u} \left[ g(\bx,\bu) + \pd{J}{\bx} f(\bx,\bu) \right] = 2u + 2q + 2\sqrt{3}\dot{q}. \] Setting this equal to $0$ and solving for $u$ yields: $$u^* = -q - \sqrt{3} \dot{q},$$ thereby confirming that our policy $\pi$ is in fact the minimizer. Substituting $u^*$ back into the HJB reveals that the right side does in fact simplify to zero. I hope you are convinced!

Note that evaluating the HJB for the time-to-go of the minimum-time problem for the double integrator will also reveal that the HJB is satisfied wherever that gradient is well-defined. This is certainly mounting evidence in support of our bang-bang policy being optimal, but since $\pd{J}{\bx}$ is not defined everywhere, it does not actually satisfy the requirements of the sufficiency theorem as stated above.

Solving for the minimizing control

We still face a few barriers to actually using the HJB in an algorithm. The first barrier is the minimization over $u$. When the action set was discrete, as in the graph search version, we could evaluate the one-step cost plus cost-to-go for every possible action, and then simply take the best. For continuous action spaces, in general we cannot rely on the strategy of evaluating a finite number of possible $\bu$'s to find the minimizer.

All is not lost. In the quadratic cost double integrator example above, we were able to solve explicitly for the minimizing $\bu$ in terms of the cost-to-go. It turns out that this strategy will actually work for a number of the problems we're interested in, even when the system (which we are given) or cost function (which we are free to pick, but which should be expressive) gets more complicated.

Recall that I've already tried to convinced you that a majority of the systems of interest are control affine, e.g. I can write \[ f(\bx,\bu) = f_1(\bx) + f_2(\bx)\bu. \] We can make another dramatic simplification by restrict ourselves to instantaneous cost functions of the form \[ g(\bx,\bu) = g_1(\bx) + \bu^T {\bf R} \bu, \qquad {\bf R}={\bf R}^T \succ 0. \] In my view, this is not very restrictive - many of the cost functions that I find myself choosing to write down can be expressed in this form. Given these assumptions, we can write the HJB as \[ 0 = \min_{\bu} \left[ g_1(\bx) + \bu^T {\bf R} \bu + \pd{J}{\bx} \left[ f_1(\bx) + f_2(\bx)\bu \right]\right]. \] Since this is a quadratic function in $\bu$, if the system does not have any constraints on $\bu$, then we can solve in closed-form for the minimizing $\bu$ by taking the gradient of the right-hand size: \[ \pd{}{\bu} = 2\bu^T {\bf R} + \pd{J}{\bx} f_2(\bx) = 0, \] and setting it equal to zero to obtain \[ \bu^* = -\frac{1}{2}{\bf R}^{-1}f_2^T(\bx) \pd{J}{\bx}^T.\] If there are linear constraints on the input, such as torque limits, then more generally this could be solved (at any particular $\bx$) as a quadratic program.

What happens in the case where our system is not control affine or if we really do need to specify an instantaneous cost function on $\bu$ that is not simply quadratic? If the goal is to produce an iterative algorithm, like value iteration, then one common approach is to make a (positive-definite) quadratic approximation in $\bu$ of the HJB, and updating that approximation on every iteration of the algorithm. This broad approach is often referred to as differential dynamic programming (c.f. Jacobson70).

Numerical solutions for $J^*$

The other major barrier to using the HJB in a value iteration algorithm is that the estimated optimal cost-to-go function, $\hat{J}^*$, must somehow be represented with a finite set of numbers, but we don't yet know anything about the potential form it must take. In fact, knowing the time-to-goal solution for minimum-time problem with the double integrator, we see that this function might need to be non-smooth for even very simple dynamics and objectives.

One natural way to parameterize $\hat{J}^*$ -- a scalar valued-function defined over the state space -- is to define the values on a mesh. This approach then admits algorithms with close ties to the relatively very advanced numerical methods used to solve other partial differential equations (PDEs), such as the ones that arrive in finite element modeling or fluid dynamics. One important difference, however, is that our PDE lives in the dimension of the state space, while many of the mesh algorithms from the other disciplines are optimized for two or three dimensional space.

A slightly more general view of the problem would describe the mesh (and the associated interpolation functions) as just one form of representations for function approximation.

(see Appendix C for a brief background on function approximation)

Representing the cost-to-go with function approximation

If we approximate $J^*$ with a finitely-parameterized function $\hat{J}_\balpha^*$, with parameter vector $\balpha$, then this immediately raises many important questions:

  • What if the true cost-to-go function does not live in the prescribed function class; e.g., there does not exist an $\balpha$ which satisfies the sufficiency conditions for all $\bx$? (This seems very likely to be the case.)
  • What update should we apply in the iterative algorithm?
  • What can we say about it's convergence?
add quote from neuro-dynamic programming preface

In general, many of these questions will have to go unanswered. But there are some answers available for the special case of linear function approximators. A linear function approximator takes the form: \[ \hat{J}^*_\balpha(\bx) = \sum_i \alpha_i \psi_i(\bx) = \balpha^T {\bf\psi}(\bx), \] where ${\bf \psi}(\bx)$ is a vector of potentially nonlinear features. Common examples of features include polynomials, radial basis functions, or the most interpolation schemes used on a mesh.

add citations for convergence results
try just running snopt on quartic objective

Stochastic Optimal Control

One of the most amazing features of the dynamic programming, additive cost approach to optimal control is the relative ease with which it extends to optimizing stochastic systems.

Graph search -- stochastic shortest path problems

For discrete systems, we can generalize our dynamics on a graph by adding action-dependent transition probabilities to the edges. This new dynamical system is known as a Markov Decision Process (MDP), and we write the dynamics completely in terms of the transition probabilities \[ \Pr(s[n+1] = s' | s[n] = s, a[n] = a). \] For discrete systems, this is simply a big lookup table. The cost that we incur for any execution of system is now a random variable, and so we formulate the goal of control as optimizing the expected cost, e.g. \[ J^*(s[0]) = \min_{a[\cdot]} E \left[ \sum_{n=0}^\infty g(s[n],a[n]) \right]. \] Note that there are many other potential objectives, such as minimizing the worst-case error, but the expected cost is special because it preserves the dynamic programming recursion: \[ J^*(s) = \min_a E \left[ g(s,a) + J^*(s')\right] = \min_a \left[ g(s,a) + \sum_{s'} \Pr(s'|s,a) J^*(s') \right].\] Remarkably, if we use these optimality conditions to construct our value iteration algorithm \[ \hat{J}(s) \Leftarrow \min_a \left[ g(s,a) + \sum_{s'} \Pr(s'|s,a) \hat{J}(s') \right],\] then this algorithm has the same strong convergence guarantees of its counterpart for deterministic systems. And it is essentially no more expensive to compute!

MDP approximations of deterministic, continuous-state systems

There is a particularly cute trick that takes advantage of the computational simplicity of the discrete MDP value iteration algorithm. Let's assume that we have discrete control inputs and discrete-time dynamics, but a continuous state space. Imagine that we use a mesh to approximate the cost-to-go function over that state space with $K$ mesh points $\hat{J}(x_k)$. We would like to perform the value iteration update: \[ \forall k, \hat{J}^*(x_k) \Leftarrow \min_{a \in A} \left[ g(x_k,a) + \hat{J}^*\left({f(x_k,a)}\right) \right], \] but must deal with the fact that $f(x_k,a)$ might not result in a next state that is directly at a mesh point. Most interpolation schemes for a mesh can be written as some weighted combination of the values at nearby mesh points, e.g. \[ \hat{J}^*(x) = \sum_i \beta_i \hat{J}^*(x_i), \quad \sum_i \beta_i = 1 \] with $\beta_i$ the relative weight of the $i$th mesh point. The resulting update is exactly the same as if we treated the system as a discrete state MDP with $\beta_i$ representing the probability of transitioning to state $x_i$!

If you looked at the code powering the numerical value iteration results above, you will see that we used precisely this trick -- the continuous problem is discretized into the corresponding MDP and then solved as a discrete system. This results in nice clean code, but also sheds light on the impact of discretization on the solutions -- discretization error here will cause a sort of diffusion corresponding to the probability of spreading across neighboring nodes.

Note that the update above could also be viewed as an approximation of the fitted value iteration algorithm for linear function approximators described above. It is missing a few terms, but is easier to implement and remarkably, is still known to converge.

add citation


add section on extensions. discounting. finite-horizon / time-varying dynamics or cost.

Linear Quadratic Regulators

While solving the dynamic programming problem for continuous systems is very hard in general, there are a few very important special cases where the solutions are very accessible. Most of these involve variants on the case of linear dynamics and quadratic cost. The simplest case, called the linear quadratic regulator (LQR), is formulated as stabilizing a time-invariant linear system to the origin.

The linear quadratic regulator is likely the most important and influential result in optimal control theory to date. In this chapter we will derive the basic algorithm and a variety of useful extensions.

Basic Derivation

Consider a linear time-invariant system in state-space form, $$\dot{\bx} = {\bf A}\bx + {\bf B}\bu,$$ with the infinite-horizon cost function given by $$J = \int_0^\infty \left[ \bx^T {\bf Q} \bx + \bu^T {\bf R} \bu \right] dt, \quad {\bf Q} = {\bf Q}^T \ge {\bf 0}, {\bf R} = {\bf R}^T > 0.$$ Our goal is to find the optimal cost-to-go function $J^*(\bx)$ which satisfies the HJB: $$\forall \bx, \quad 0 = \min_\bu \left[ \bx^T {\bf Q} \bx + \bu^T {\bf R} \bu + \pd{J^*}{\bx} \left( {\bf A}\bx + {\bf B}\bu \right) \right].$$

There is one magic step here -- it is well known that for this problem the optimal cost-to-go function is quadratic. This is easy to verify. Let us choose the form: $$J^*(\bx) = \bx^T {\bf S} \bx, \quad {\bf S} = {\bf S}^T \succeq 0.$$ The gradient of this function is $$\pd{J^*}{\bx} = 2 \bx^T {\bf S}.$$

Since we have guaranteed, by construction, that the terms inside the $\min$ are quadratic and convex (because ${\bf R} \succ 0$), we can take the minimum explicitly by finding the solution where the gradient of those terms vanishes: $$\pd{}{\bu} = 2\bu^T {\bf R} + 2 \bx^T {\bf S} {\bf B} = 0.$$ This yields the optimal policy $$\bu^* = \pi^*(\bx) = - {\bf R}^{-1} {\bf B}^T {\bf S} \bx = - {\bf K} \bx.$$

Inserting this back into the HJB and simplifying yields $$0 = \bx^T \left[ {\bf Q} - {\bf S B R}^{-1}{\bf B}^T{\bf S} + 2{\bf SA} \right]\bx.$$ All of the terms here are symmetric except for the $2{\bf SA}$, but since $\bx^T{\bf SA}\bx = \bx^T{\bf A}^T{\bf S}\bx$, we can write $$0 = \bx^T \left[ {\bf Q} - {\bf S B R}^{-1}{\bf B}^T{\bf S} + {\bf SA} + {\bf A}^T{\bf S} \right]\bx.$$ and since this condition must hold for all $\bx$, it is sufficient to consider the matrix equation $$0 = {\bf S} {\bf A} + {\bf A}^T {\bf S} - {\bf S} {\bf B} {\bf R}^{-1} {\bf B}^T {\bf S} + {\bf Q}.$$ This extremely important equation is a version of the algebraic Riccati equation. Note that it is quadratic in ${\bf S}$, making its solution non-trivial, but it is well known that the equation has a single positive-definite solution if and only if the system is controllable and there are good numerical methods for finding that solution, even in high-dimensional problems. Both the optimal policy and optimal cost-to-go function are available from MATLAB's Control Systems Toolbox by calling [K,S] = lqr(A,B,Q,R).

It is worth examining the form of the optimal policy more closely. Since the value function represents cost-to-go, it would be sensible to move down this landscape as quickly as possible. Indeed, $-{\bf S}\bx$ is in the direction of steepest descent of the value function. However, not all directions are possible to achieve in state-space. $-{\bf B}^T {\bf S} \bx$ represents precisely the projection of the steepest descent onto the control space, and is the steepest descent achievable with the control inputs $\bu$. Finally, the pre-scaling by the matrix ${\bf R}^{-1}$ biases the direction of descent to account for relative weightings that we have placed on the different control inputs. Note that although this interpretation is straight-forward, the slope that we are descending (in the value function, ${\bf S}$) is a complicated function of the dynamics and cost.

Local stabilization of nonlinear systems

LQR is extremely relevant to us even though our primary interest is in nonlinear dynamics, because it can provide a local approximation of the optimal control solution for the nonlinear system. Given the nonlinear system $\dot{\bx} = f(\bx,\bu)$, and a stabilizable operating point, $(\bx_0,\bu_0)$, with $f(\bx_0,\bu_0) = 0$. We can define a relative coordinate system $$\bar\bx = \bx - \bx_0, \quad \bar\bu = \bu - \bu_0,$$ and observe that $$\dot{\bar\bx} = \dot{\bx} = f(\bx,\bu),$$ which we can approximate with a first-order Taylor expansion to $$\dot{\bar\bx} \approx f(\bx_0,\bu_0) + \pd{f(\bx_0,\bu_0)}{\bx} (\bx - \bx_0) + \pd{f(\bx_0,\bu_0)}{\bu} (\bu - \bu_0) = {\bf A}\bar{\bx} + {\bf B}\bar\bu.$$

Similarly, we can define a quadratic cost function in the error coordinates, or take a (positive-definite) second-order approximation of a nonlinear cost function about the operating point (linear and constant terms in the cost function can be easily incorporated into the derivation by parameterizing a full quadratic form for $J^*$, as seen in the Linear Quadratic Tracking derivation below).

The resulting controller takes the form $\bar\bu^* = -{\bf K}\bar\bx$ or $$\bu^* = \bu_0 - {\bf K} (\bx - \bx_0).$$

add discrete time derivation?

Finite-horizon formulations

Recall that the cost-to-go for finite-horizon problems is time-dependent, and therefore the HJB sufficiency condition requires an additional term for $\pd{J^*}{t}$. $$ \forall \bx, \forall t\in[t_0,t_f],\quad 0 = \min_\bu \left[ g(\bx,\bu) + \pd{J^*}{\bx}f(\bx,\bu) + \pd{J^*}{t} \right]. $$

Finite-horizon LQR

Consider systems governed by an LTI state-space equation of the form $$\dot{\bx} = {\bf A}\bx + {\bf B}\bu,$$ and a finite-horizon cost function with \begin{gather*} h(\bx) = \bx^T {\bf Q}_f \bx,\quad {\bf Q}_f = {\bf Q}_f^T \ge {\bf 0} \\ g(\bx,\bu,t) = \bx^T {\bf Q} \bx + \bu^T {\bf R} \bu, \quad {\bf Q} = {\bf Q}^T \ge 0, {\bf R}={\bf R}^T>0 \end{gather*} Writing the HJB, we have $$ 0 = \min_\bu \left[\bx^T {\bf Q} \bx + \bu^T {\bf R}\bu + \pd{J^*}{\bx} \left({\bf A} \bx + {\bf B} \bu \right) + \pd{J^*}{t} \right]. $$ Due to the positive definite quadratic form on $\bu$, we can find the minimum by setting the gradient to zero: \begin{gather*} \pd{}{\bu} = 2 \bu^T {\bf R} + \pd{J^*}{\bx} {\bf B} = 0 \\ \bu^* = \pi^*(\bx,t) = - \frac{1}{2}{\bf R}^{-1} {\bf B}^T \pd{J^*}{\bx}^T \end{gather*} In order to proceed, we need to investigate a particular form for the cost-to-go function, $J^*(\bx,t)$. Let's try a solution of the form: $$J^*(\bx,t) = \bx^T {\bf S}(t) \bx, \quad {\bf S}(t) = {\bf S}^T(t) > {\bf 0}.$$ In this case we have $$\pd{J^*}{\bx} = 2 \bx^T {\bf S}(t), \quad \pd{J*}{t} = \bx^T \dot{\bf S}(t) \bx,$$ and therefore \begin{gather*} \bu^* = \pi^*(\bx,t) = - {\bf R}^{-1} {\bf B}^T {\bf S}(t) \bx \\ 0 = \bx^T \left[ {\bf Q} - {\bf S}(t) {\bf B} {\bf R}^{-1} {\bf B}^T {\bf S}(t) + {\bf S}(t) {\bf A} + {\bf A}^T {\bf S}(t) + \dot{\bf S}(t) \right] \bx.\end{gather*} Therefore, ${\bf S}(t)$ must satisfy the condition (known as the continuous-time differential Riccati equation): $$-\dot{\bf S}(t) = {\bf S}(t) {\bf A} + {\bf A}^T {\bf S}(t) - {\bf S}(t) {\bf B} {\bf R}^{-1} {\bf B}^T {\bf S}(t) + {\bf Q},$$ and the terminal condition $${\bf S}(T) = {\bf Q}_f.$$ Since we were able to satisfy the HJB with the minimizing policy, we have met the sufficiency condition, and have found the optimal policy and optimal cost-to-go function.

Note that the infinite-horizon LQR solution described in the prequel is exactly the steady-state solution of this equation, defined by $\dot{\bf S}(t) = 0$. Indeed, for controllable systems this equation is stable (backwards in time), and as expected the finite-horizon solution converges on the infinite-horizon solution as the horizon time limits to infinity.

Time-varying LQR

The derivation above holds even if the dynamics are given by $$\dot{\bx} = {\bf A}(t)\bx + {\bf B(t)}\bu.$$ Similarly, the cost functions ${\bf Q}$ and ${\bf R}$ can also be time-varying. This is quite surprising, as the class of time-varying linear systems is a quite general class of systems. It requires essentially no assumptions on how the time-dependence enters, except perhaps that if ${\bf A}$ or ${\bf B}$ is discontinuous in time then one would have to use the proper techniques to accurately integrate the differential equation. As we will see in the next chapter, one of the most powerful applications involves linearizing around a nominal trajectory of a nonlinear system and using LQR to provide a trajectory controller.

Linear Quadratic Optimal Tracking

For completeness, we consider a slightly more general form of the linear quadratic regulator. The standard LQR derivation attempts to drive the system to zero. Consider now the problem: \begin{gather*} \dot{\bx} = {\bf A}\bx + {\bf B}\bu \\ h(\bx) = (\bx - \bx^d(T))^T {\bf Q}_f (\bx - \bx^d(T)), \quad {\bf Q}_f = {\bf Q}_f^T \ge 0 \\ g(\bx,\bu,t) = (\bx - \bx^d(t))^T {\bf Q} (\bx - \bx^d(t)) + (\bu - \bu^d(t))^T {\bf R} (\bu - \bu^d(t)),\\ {\bf Q} = {\bf Q}^T \ge 0, {\bf R}={\bf R}^T>0 \end{gather*} Now, guess a solution \begin{gather*} J^*(\bx,t) = \bx^T {\bf S_2}(t) \bx + \bx^T {\bf s_1}(t) + s_0(t), \quad {\bf S_2}(t) = {\bf S_2}^T(t) > {\bf 0}. \end{gather*} In this case, we have $$\pd{J^*}{\bx} = 2 \bx^T {\bf S_2}(t) + {\bf s_1}^T(t),\quad \pd{J^*}{t} = \bx^T \dot{\bf S_2}(t) \bx + \bx^T \dot{\bf s_1}(t) + \dot{s_0}(t).$$ Using the HJB, $$ 0 = \min_\bu \left[(\bx - \bx^d(t))^T {\bf Q} (\bx - \bx^d(t)) + (\bu - \bu^d(t))^T {\bf R} (\bu - \bu^d(t)) + \pd{J^*}{\bx} \left({\bf A} \bx + {\bf B} \bu \right) + \pd{J^*}{t} \right], $$ we have \begin{gather*} \pd{}{\bu} = 2 (\bu - \bu^d(t))^T{\bf R} + (2\bx^T{\bf S_2}(t) + {\bf s_1}^T(t)){\bf B} = 0,\\ \bu^*(t) = \bu^d(t) - {\bf R}^{-1} {\bf B}^T\left[{\bf S_2}(t)\bx + \frac{1}{2}{\bf s_1}(t)\right] \end{gather*} The HJB can be satisfied by integrating backwards \begin{align*} -\dot{\bf S_2}(t) =& {\bf Q} - {\bf S_2}(t) {\bf B} {\bf R}^{-1} {\bf B}^T {\bf S_2}(t) + {\bf S_2}(t) {\bf A} + {\bf A}^T {\bf S_2}(t)\\ -\dot{\bf s_1}(t) =& -2 {\bf Q} \bx^d(t) + \left[{\bf A}^T - {\bf S}_2 {\bf B} {\bf R}^{-1} {\bf B}^T \right]{\bf s_1}(t) + 2{\bf S_2}(t) {\bf B} \bu^d(t)\\ -\dot{s_0}(t) =& \bx^d(t)^T {\bf Q} \bx^d(t) - \frac{1}{4} {\bf s_1}^T(t) {\bf B} {\bf R}^{-1} {\bf B}^T {\bf s_1}(t) + {\bf s_1}(t)^T {\bf B} \bu^d(t), \end{align*} from the final conditions \begin{align*} {\bf S_2}(T) =& {\bf Q}_f\\ {\bf s_1}(T) =& -2 {\bf Q}_f \bx^d(T) \\ s_0(T) =& \left[\bx^d(T)\right]^T {\bf Q}_f \left[ \bx^d(T) \right]. \end{align*} Notice that the solution for ${\bf S_2}$ is the same as the simpler LQR derivation, and is symmetric (as we assumed). Note also that $s_0(t)$ has no effect on control (even indirectly), and so can often be ignored.

A quick observation about the quadratic form, which might be helpful in debugging. We know that $J(x,t)$ must be uniformly positive. This is true iff ${\bf S}_2>0$ and $s_0 > \frac{1}{4} {\bf s}_1^T {\bf S}_2^{-1} {\bf s}_1$, which comes from evaluating the function at $x_{min}$ defined by $\pd{}{x} = 0$.

test this on an example, get my notation consistent (s(t)^T vs s^T(t), etc.

Minimum-time LQR

\begin{gather*} \dot{\bx} = {\bf A}\bx + {\bf B}\bu \\ h(\bx) = \bx^T {\bf Q}_f \bx, \quad {\bf Q}_f = {\bf Q}_f^T \ge 0 \\ g(\bx,\bu,t) = 1 + \bx^T {\bf Q} \bx + \bu {\bf R} \bu,\\ {\bf Q} = {\bf Q}^T \ge 0, {\bf R}={\bf R}^T>0 \end{gather*}

Linear Final Boundary Value Problems

The finite-horizon LQR formulation can be used to impose a strict final boundary value condition by setting an infinite ${\bf Q}_f$. % ${\bf Q}_f = \infty {\bf I}$.. is this the correct $Q_f$? However, integrating the Riccati equation backwards from an infinite initial condition isn't very practical. To get around this, let us consider solving for ${\bf P}(t) = {\bf S}(t)^{-1}$. Using the matrix relation $\frac{d {\bf S}^{-1}}{dt} = - {\bf S}^{-1} \frac{d {\bf S}}{dt} {\bf S}^{-1}$, we have: $$-\dot{\bf P}(t) = - {\bf P}(t){\bf Q P}(t) + {\bf B R}^{-1} {\bf B} - {\bf A P}(t) - {\bf P}(t){\bf A}^T,$$ with the final conditions $${\bf P}(T) = 0.$$ This Riccati equation can be integrated backwards in time for a solution.

It is very interesting, and powerful, to note that, if one chooses ${\bf Q} = 0$, therefore imposing no position cost on the trajectory before time $T$, then this inverse Riccati equation becomes a linear ODE which can be solved explicitly. % add explicit solution here These relationships are used in the derivation of the controllability Grammian, but here we use them to design a feedback law.

Lyapunov Analysis

Optimal control provides a powerful framework for formulating control problems using the language of optimization. But solving optimal control problems for nonlinear systems is hard! In many cases, we don't really care about finding the optimal controller, but would be satisfied with any controller that is guaranteed to accomplish the specified task. In many cases, we still formulate these problems using computational tools from optimization, and in this chapter we'll learn about tools that can provide guaranteed control solutions for systems that are beyond the complexity for which we can find the optimal feedback.

There are many excellent books on Lyapunov analysis; for instance Slotine90 is an excellent and very readable reference and Khalil01 can provide a rigorous treatment. In this chapter I will summarize (without proof) some of the key theorems from Lyapunov analysis, but then will also introduce a number of numerical algorithms... many of which are new enough that they have not yet appeared in any mainstream textbooks.

Lyapunov Functions

Let's start with our favorite simple example.

Stability of the Damped Pendulum

Recall that the equations of motion of the damped simple pendulum are given by \[ ml^2 \ddot{\theta} + mgl\sin\theta = -b\dot{\theta}, \] which I've written with the damping on the right-hand side to remind us that it is an external torque that we've modeled.

These equations represent a simple second-order differential equation; in chapter 2 we discussed at some length what was known about the solutions to this differential equation--even without damping the equation for $\theta(t)$ as a function of $\theta(0)$, $\dot{\theta}(0)$ involves elliptic integrals of the first kind, and with damping we don't even have that. Since we couldn't provide a solution analytically, in chapter 2 we resorted to a graphical analysis, and confirmed the intuition that there are fixed points in the system (at $\theta = k\pi$ for every integer $k$) and that the fixed points at $\theta = 2\pi k$ are asymptotically stable with a large basin of attraction. The graphical analysis gave us this intuition, but can we actually prove this stability property? In a way that might also work for much more complicated systems?

One route forward was from looking at the total system energy (kinetic + potential), which we can write down: \[ E(\theta,\dot{\theta}) = \frac{1}{2} ml^2\dot{\theta}^2 - mgl \cos\theta. \] Recall that the contours of this energy function are the orbits of the undamped pendulum.

A natural route to proving the stability of the downward fixed points is by arguing that energy decreases for the damped pendulum (with $b>0$) and so the system will eventually come to rest at the minimum energy, $E = -mgl$, which happens at $\theta=2\pi k$. Let's make that argument slightly more precise.

Evaluating the time derivative of the energy reveals \[ \frac{d}{dt} E = - b\dot\theta^2 \le 0. \] This is sufficient to demonstrate that the energy will never increase, but it doesn't actually prove that the energy will converge to the minimum. To take that extra step, we must observe that for most of the states with $\dot{E} = 0$, which for $b>0$ is the manifold where $\dot\theta=0$, the system can't actually remain in that state, but will rather pass through and enter another region where it continues to lose energy. In fact, the fixed points are the only subset of this $\dot{E}=0$ manifold where the system can stay on the manifold. Therefore, unless the system is at a fixed point, it will continue to dissipate energy. And therefore we can conclude that as $t\rightarrow \infty$, the system will indeed come to rest at a fixed point (though it could be any fixed point with an energy less than or equal to the initial energy in the system, $E(0)$).

This is an important example. It demonstrated that we could use a relatively simple function, the system energy, to describe something about the long-term dynamics of the pendulum even though the actual trajectories of the system are (analytically) very complex. It also demonstrated one of the subtleties of using an energy-like function that is non-increasing (instead of strictly decreasing) to prove asymptotic stability.

Lyapunov functions generalize this notion of an energy function to more general systems, which might not be stable in the sense of some mechanical energy. If I can find any positive function, call it $V(\bx)$, that gets smaller over time as the system evolves, then I can potentially use $V$ to make a statement about the long-term behavior of the system. $V$ is called a Lyapunov function.

Recall that we defined three separate notions for stability of a fixed-point of a nonlinear system: stability i.s.L., asymptotic stability, and exponential stability. We can use Lyapunov functions to demonstrate each of these, in turn.

Lyapunov's Direct Method

Given a system $\dot{\bx} = f(\bx)$, with $f$ continuous, and for some region ${\cal B}$ around the origin (specifically an open subset of $\mathbf{R}^n$ containing the origin), if I can produce a scalar, continuously-differentiable function $V(\bx)$, such that \begin{gather*} V(\bx) > 0, \forall \bx \in {\cal B} \ne 0 \quad V(0) = 0, \text{ and} \\ \dot{V}(\bx) = \pd{V}{\bx} f(\bx) \le 0, \forall \bx \in {\cal B} \ne 0 \quad \dot{V}(0) = 0, \end{gather*} then the origin $(\bx = 0)$ is stable in the sense of Lyapunov (i.s.L.).

If, additionally, we have $$\dot{V}(\bx) = \pd{V}{\bx} f(\bx) < 0, \forall \bx \in {\cal B} \ne 0 \quad \dot{V}(0) = 0,$$ then the origin is (locally) asymptotically stable. And if we have $$\dot{V}(\bx) = \pd{V}{\bx} f(\bx) \le -\alpha V(x), \forall \bx \in {\cal B} \ne 0 \quad \dot{V}(0) = 0,$$ for some $\alpha>0$, then the origin is (locally) exponentially stable.

Note that for the sequel we will use the notation $V \succ 0$ to denote a positive-definite function, meaning that $V(0)=0$ and $V(\bx)>0$ for all $\bx\ne0$ (and also $V \succeq 0$ for positive semi-definite, $V \prec 0$ for negative-definite functions).

The intuition here is exactly the same as for the energy argument we made in the pendulum example: since $\dot{V}(x)$ is always zero or negative, the value of $V(x)$ will only get smaller (or stay the same) as time progresses. Inside the subset ${\cal B}$, for every $\epsilon$-ball, I can choose a $\delta$ such that $|x(0)|^2 < \delta \Rightarrow |x(t)|^2 < \epsilon, \forall t$ by choosing $\delta$ sufficiently small so that the sublevel-set of $V(x)$ for the largest value that $V(x)$ takes in the $\delta$ ball is completely contained in the $\epsilon$ ball. Since the value of $V$ can only get smaller (or stay constant) in time, this gives stability i.s.L.. If $\dot{V}$ is strictly negative away from the origin, then it must eventually get to the origin (asymptotic stability). The exponential condition is implied by the fact that $\forall t>0, V(\bx(t)) \le V(\bx(0)) e^{-\alpha t}$.

Notice that the system analyzed above, $\dot{\bx}=f(\bx)$, did not have any control inputs. Therefore, Lyapunov analysis is used to study either the passive dynamics of a system or the dynamics of a closed-loop system (system + control in feedback). We will see generalizations of the Lyapunov functions to input-output systems later in the text.

Global Stability

The notion of a fixed point being stable i.s.L. is inherently a local notion of stability (defined with $\epsilon$- and $\delta$- balls around the origin), but the notions of asymptotic and exponential stability can be applied globally. The Lyapunov theorems work for this case, too, with only minor modification.

Lyapunov analysis for global stability

Given a system $\dot{\bx} = f(\bx)$, with $f$ continuous, if I can produce a scalar, continuously-differentiable function $V(\bx)$, such that \begin{gather*} V(\bx) \succ 0, \\ \dot{V}(\bx) = \pd{V}{\bx} f(\bx) \prec 0, \text{ and} \\ V(\bx) \rightarrow \infty \text{ whenever } ||x||\rightarrow \infty,\end{gather*} then the origin $(\bx = 0)$ is globally asymptotically stable (G.A.S.).

If additionally we have that $$\dot{V}(\bx) \preceq -\alpha V(\bx),$$ for some $\alpha>0$, then the origin is globally exponentially stable.

The new condition, on the behavior as $||\bx|| \rightarrow \infty$ is known as "radially unbounded", and is required to make sure that trajectories cannot diverge to infinity even as $V$ decreases; it is only required for global stability analysis.

LaSalle's Invariance Principle

Perhaps you noticed the disconnect between the statement above and the argument that we made for the stability of the pendulum. In the pendulum example, using the mechanical energy resulted in a Lyapunov function that was only negative semi-definite, but we eventually argued that the fixed points were asymptotically stable. That took a little extra work, involving an argument about the fact that the fixed points were the only place that the system could stay with $\dot{E}=0$; every other state with $\dot{E}=0$ was only transient. We can formalize this idea for the more general Lyapunov function statements--it is known as LaSalle's Theorem.

LaSalle's Theorem

Given a system $\dot{\bx} = f(\bx)$ with $f$ continuous. If we can produce a scalar function $V(\bx)$ with continuous derivatives for which over an open-subset ${\cal B}\in \mathbf{R}^n$ we have $$V(\bx) \succ 0,\quad \dot{V}(\bx) \preceq 0,$$ and $V(\bx)\rightarrow \infty$ as $||\bx||\rightarrow \infty$, then $\bx$ will converge to the largest invariant set where $\dot{V}(\bx) = 0$.

To be clear, an invariant set, ${\cal G}$, of the dynamical system is a set for which $\bx(0)\in{\cal G} \Rightarrow \forall t>0, \bx(t) \in {\cal G}$. In other words, once you enter the set you never leave. The "largest invariant set" need not be connected; in fact for the pendulum example each fixed point is an invariant set, so the largest invariant set is the union of all the fixed points of the system.

Finding a Lyapunov function which $\dot{V} \prec 0$ is more difficult than finding one that has $\dot{V} \preceq 0$. LaSalle's theorem gives us the ability to make a statement about asymptotic stability even in this case. In the pendulum example, every state with $\dot\theta=0$ had $\dot{E}=0$, but only the fixed points are in the largest invariant set.

Relationship to the Hamilton-Jacobi-Bellman equations

At this point, you might be wondering if there is any relationship between Lyapunov functions and the cost-to-go functions that we discussed in the context of dynamic programming. After all, the cost-to-go functions also captured a great deal about the long-term dynamics of the system in a scalar function. We can see the connection if we re-examine the HJB equation \[ 0 = \min_\bu \left[ g(\bx,\bu) + \pd{J^*}{\bx}f(\bx,\bu). \right] \]Let's imagine that we can solve for the optimizing $\bu^*(\bx)$, then we are left with $ 0 = g(\bx,\bu^*) + \pd{J^*}{\bx}f(\bx,\bu^*) $ or simply \[ \dot{J}^* = -g(\bx,\bu^*). \] In other words, in optimal control we must find a cost-to-go function which matches this gradient for every $\bx$; that's very difficult and involves solving a potentially high-dimensional partial differential equation. By contrast, Lyapunov analysis is asking for much less - any function which is going downhill (at any rate) for all states. This can be much easier! Also note that if we do manage to find the optimal cost-to-go, $J^*(\bx)$, then it can also serve as a Lyapunov function so long as $\forall \bx, g(\bx,\bu^*(\bx))>0$.

Lyapunov analysis with convex optimization

One of the primary limitations in Lyapunov analysis as I have presented it so far is that it is potentially very difficult to come up with suitable Lyapunov function candidates for interesting systems, especially for underactuated systems. ("Underactuated" is almost synonymous with "interesting" in my vocabulary.) Even if somebody were to give me a Lyapunov candidate for a general nonlinear system, the Lyapunov conditions can be difficult to check -- for instance, how would I check that $\dot{V}$ is strictly negative for all $\bx$ except the origin if $\dot{V}$ is some arbitrarily complicated nonlinear function over a vector $\bx$?

In this section, we'll look at some computational approaches to verifying the Lyapunov conditions, and even to searching for the description of the Lyapunov functions themselves.

If you're imagining numerical algorithms to check the Lyapunov conditions for complicated Lyapunov functions and complicated dynamics, the first thought is probably that we can evaluate $V$ and $\dot{V}$ at a large number of sample points and check whether $V$ is positive and $\dot{V}$ is negative. That does work, and could potentially be combined with some smoothness or regularity assumptions to generalize beyond the sample points. But in many cases we will be able to do better -- providing optimization algorithms that will rigorously check these conditions for all $\bx$ without dense sampling; these will also give us additional leverage in formulating the search for Lyapunov functions.

Lyapunov analysis for linear systems

Let's take a moment to see how things play out for linear systems.

Lyapunov analysis for stable linear systems

Imagine you have a linear system, $\dot\bx = {\bf A}\bx$, and can find a Lyapunov function $$V(\bx) = \bx^T {\bf P} \bx, \quad {\bf P} = {\bf P^T} \succ 0,$$ which also satisfies $$\dot{V}(\bx) = \bx^T {\bf PA} \bx + \bx^T {\bf A}^T {\bf P}\bx \prec 0.$$ Then the original is globally asymptotically stable.

Note that the radially-unbounded condition is satisfied by ${\bf P} \succ 0$, and that the derivative condition is equivalent to the matrix condition $${\bf PA} + {\bf A}^T {\bf P} \prec 0.$$

For stable linear systems the existence of a quadratic Lyapunov function is actually a necessary (as well as sufficient) condition. Furthermore, a Lyapunov function can be always be found by finding the positive-definite solution to the matrix Lyapunov equation \begin{equation}{\bf PA} + {\bf A}^T{\bf P} = - {\bf Q},\label{eq:algebraic_lyapunov} \end{equation} for any ${\bf Q}={\bf Q}^T\succ 0$.

add an example here. double integrator? re-analyze the LQR output?

This is a very powerful result - for nonlinear systems it will be potentially difficult to find a Lyapunov function, but for linear systems it is straight-forward. In fact, this results is often used to propose candidates for non-linear systems, e.g., by linearizing the equations and solving a local linear Lyapunov function which should be valid in the vicinity of a fixed point.

Lyapunov analysis as a semi-definite program (SDP)

Lyapunov analysis for linear systems has an extremely important connection to convex optimization. In particular, we could have also formulated the Lyapunov conditions for linear systems above using semi-definite programming (SDP). Semidefinite programming is a subset of convex optimization -- an extremely important class of problems for which we can produce efficient algorithms that are guaranteed find the global optima solution (up to a numerical tolerance and barring any numerical difficulties).

If you don't know much about convex optimization or want a quick refresher, please take a few minutes to read the optimization preliminaries in the appendix. The main requirement for this section is to appreciate that it is possible to formulate efficient optimization problems where the constraints include specifying that one or more matrices are positive semi-definite (PSD). These matrices must be formed from a linear combination of the decision variables. For a trivial example, the optimization $$\min_a a,\quad \subjto \begin{bmatrix} a & 0 \\ 0 & 1 \end{bmatrix} \succeq 0,$$ returns $a = 0$ (up to numerical tolerances).

The value in this is immediate for linear systems. For example, we can formulate the search for a Lyapunov function for the linear system $\dot\bx = {\bf A} \bx$ by using the parameters ${\bf p}$ to populate a symmetric matrix ${\bf P}$ and then write the SDP: \[ \find_{\bf p} \quad \subjto \quad {\bf P} \succeq 0, \quad {\bf PA} + {\bf A}^T {\bf P} \preceq 0.\] Note that you would probably never use that particular formulation, since there specialized algorithms for solving the simple Lyapunov equation which are more efficient and more numerically stable. But the SDP formulation does provide something new -- we can now easily formulate the search for a "common Lyapunov function" for uncertain linear systems.

Common Lyapunov analysis for linear systems

Suppose you have a system governed by the equations $\dot\bx = {\bf A}\bx$, where the matrix ${\bf A}$ is unknown, but its uncertain elements can be bounded. There are a number of ways to write down this uncertainty set; let us choose to write this by describing ${\bf A}$ as the affine combination of a number of known matrices, $${\bf A} = \sum_{i} \beta_i {\bf A}_i, \quad \sum_i \beta_i = 1, \quad \forall i, \beta_i > 0.$$ This is just one way to specify the uncertainty; geometrically it is describing a polygon of uncertain parameters (in the space of elements of ${\bf A}$ with each ${\bf A}_i$ as one of the vertices in the polygon.

Now we can formulate the search for a common Lyapunov function using \[ \find_{\bf p} \quad \subjto \quad {\bf P} \succeq 0, \quad \forall_i, {\bf PA}_i + {\bf A}_i^T {\bf P} \preceq 0.\] The solver will then return a matrix ${\bf P}$ which satisfies all of the constraints, or return saying "problem is infeasible". It can easily be verified that if ${\bf P}$ satisfies the Lyapunov condition at all of the vertices, then it satisfies the condition for every ${\bf A}$ in the set: \[ {\bf P}(\sum_i \beta_i {\bf A}_i) + (\sum_i \beta_i {\bf A}_i)^T {\bf P} = \sum_i \beta_i ({\bf P A}_i + {\bf A}_i^T {\bf P}) \preceq 0, \] since $\forall i$, $\beta_i > 0$. Note that, unlike the simple Lyapunov equation for a known linear system, this condition being satisfied is a sufficient but not a necessary condition -- it is possible that the set of uncertain matrices ${\bf A}$ is robustly stable, but that this stability cannot be demonstrated with a common quadratic Lyapunov function.

You can try this example for yourself in .


As always, make sure that you open up the code and take a look.

This example is very important because it establishes a connection between Lyapunov functions and (convex) optimization. But so far we've only demonstrated this connection for linear systems, where the PSD matrices provide a magical recipe for establishing the positivity of the (quadratic) functions for all $\bx$. Is there any hope of extending this type of analysis to more general nonlinear systems? Surprisingly, it turns out that there is.

Sums-of-squares optimization

It turns out that in the same way that we can use SDP to search over the positive quadratic equations, we can generalize this to search over the positive polynomial equations. To be clear, for quadratic equations we have \[ {\bf P}\succeq 0 \quad \Rightarrow \quad \bx^T {\bf P} \bx \ge 0, \quad \forall \bx. \] It turns out that we can generalize this to \[ {\bf P}\succeq 0 \quad \Rightarrow \quad {\bf m}^T(\bx) {\bf P} {\bf m}(\bx) \ge 0, \quad \forall \bx, \] where ${\bf m}(\bx)$ is a vector of polynomial equations, typically chosen as a vector of monomials (polynomials with only one term). The set of positive polynomials parameterized in this way is exactly the set of polynomials that can be written as a sum of squaresParrilo00. While it is known that not all positive polynomials can be written in this form, much is known about the gap (and papers have been written about trying to find polynomials which are uniformly positive but not sums of squares). For our purposes this gap is very small; we should remember that it exists but not worry about it too much for now.

Even better, there is quite a bit that is known about how to choose the terms in ${\bf m}(\bx)$. For example, if you give me the polynomial \[ p(x) = 2 - 4x + 5x^2 \] and ask if it is positive for all real $x$, I can convince you that it is by producing the sums-of-squares factorization \[ p(x) = 1 + (1-2x)^2 + x^2, \] and I know that I can formulate the problem without needing any monomials of degree greater than 1 (the square-root of the degree of $p$) in my monomial vector. In practice what this means for us is that people have authored optimization front-ends which take a high-level specification with constraints on the positivity of polynomials and they automatically generate the SDP problem for you, without having to think about what terms should appear in ${\bf m}(\bx)$ (c.f. Prajna04). This class of optimization problems is called Sums-of-Squares (SOS) optimization.

We will write optimizations using sums-of-squares constraints as \[ p(\bx) \sos \] as shorthand for the constraint that $p(\bx) \ge 0$ for all $\bx$, as demonstrated by finding a sums-of-squares decomposition.

Lyapunov analysis for polynomial systems

Now we can see that it may be possible to generalize the optimization approach using SDP to search for Lyapunov functions for linear systems to searching for Lyapunov functions for at least the polynomial systems: $\dot\bx = f(\bx),$ where $f$ is a vector-valued polynomial function. If we parameterize a fixed-degree Lyapunov candidate as a polynomial with unknown coefficients, e.g., \[ V_\alpha(\bx) = \alpha_0 + \alpha_1 x_1 + \alpha_2 x_2 + \alpha_3 x_1x_2 + \alpha_4 x_1^2 + ..., \] then the time-derivative of $V$ is also a polynomial, and I can formulate the optimization: \begin{align*} \find_\alpha, \quad \subjto \quad& V_\alpha(\bx) \sos \\ & -\dot{V}_\alpha(\bx) = -\pd{V_\alpha}{\bx} f(\bx) \sos. \end{align*} Because this is a convex optimization, the solver will return a solution if one exists.

Global stability of the simple pendulum via SOS

We opened this chapter using our intuition about energy to discuss stability on the simple pendulum. Now we'll replace that intuition with convex optimization (because it will also work for more difficult systems where our intuition fails).

But wait! The equations of motion of the simple pendulum are not polynomial, are they (they have a $\sin(\theta)$ in them)? Actually nearly all rigid-body systems can be written as polynomials given a change of coordinates. Indeed, the rigid-body-assumption is simply an assumption that points on the same body stay a constant distance apart, and Euclidean distance is certainly a polynomial. For a more thorough discussion see, for instance, Wampler11,Sommese05.

For the purposes of this example, let's change coordinates from $[\theta,\dot\theta]^T$ to $\bx = [s,c,\dot\theta]^t$, where $s \equiv \sin\theta$ and $c \equiv \cos\theta$. Then we can write the pendulum dynamics as $$\dot\bx = \begin{bmatrix} c \dot\theta \\ -s \dot\theta \\ -\frac{1}{m l^2} \left( b \dot\theta + mgls \right) \end{bmatrix}.$$

Now let's parameterize a Lyapunov candidate $V(s,c,\dot\theta)$ as the polynomial with unknown coefficients which contains all monomials up to degree 2: $$V = \alpha_1 + \alpha_2 s + \alpha_3 c + ... \alpha_{10} s^2 + \alpha_{11} sc + \alpha_{12} s\dot\theta.$$ Now we'll formulate the feasibility problem: \[ \find_{\bf \alpha} \quad \subjto \quad V \sos, \quad -\dot{V} \sos.\] In fact, this is asking too much -- really $\dot{V}$ only needs to be negative when $s^2+c^2=1$. We can accomplish this with the so-called S-procedure, which we will discuss more below, and instead write \[ \find_{{\bf \alpha},\lambda} \quad \subjto \quad V \sos, \quad -\dot{V} -\lambda(\bx)(s^2+c^2-1) \sos.\] (Here $\lambda(\bx)$ is another polynomial with free coefficients which the optimization can use to make terms arbitrarily more positive when $s^2+c^2 \neq 1$.) Finally, for style points, in the code example in we ask for exponential stability:


As always, make sure that you open up the code and take a look. This example uses our front-end "spotless" ("spot" was "systems polynomial optimization toolbox" written by Prof. Alex Megretski, and this is just a slightly leaner implementation) to construct the SDP from the description as a SOS program. The result is a Lyapunov function that looks familiar (visualized as a countour plot here):

Aha! Not only does the optimization finds us coefficients for the Lyapunov function which satisfy our Lyapunov conditions, but the result looks a lot like mechanical energy. In fact, the result is a little better than energy... there are some small extra terms added which prove exponential stability without having to invoke LaSalle's Theorem.

It is important to remember that there are a handful of gaps which make the existence of this solution a sufficient condition (for proving that every sub-level set of $V$ is an invariant set of $f$) instead of a necessary one. First, there is no guarantee that a stable polynomial system can be verified using a polynomial Lyapunov function (of any degree, and in fact there are known counter-examples add ref) and here we are only searching over a fixed-degree polynomial. Second, even if a polynomial Lyapunov function does exist, there is a gap between the SOS polynomials and the positive polynomials.

Despite these caveats, I have found this formulation to be surprisingly effective in practice. Intuitively, I think that this is because there is relatively a lot of flexibility in the Lyapunov conditions -- if you can find one function which is a Lyapunov function for the system, then there are also many "nearby" functions which will satisfy the same constraints.

Lyapunov functions for estimating regions of attraction

There is another very important connection between Lyapunov functions and the concept of an invariant set: any sub-level set of a Lyapunov function is also an invariant set. This gives us the ability to use sub-level sets of a Lyapunov function as approximations of the region of attraction for nonlinear systems.

Lyapunov invariant set and region of attraction theorem

Given a system $\dot{\bx} = f(\bx)$ with $f$ continuous, if we can find a scalar function $V(\bx) \succ 0$ and a sub-level set $${\cal G}: \{ \bx | V(\bx) < \rho \}$$ on which $$\forall \bx \in {\cal G}, \dot{V}(\bx) \preceq 0,$$ then ${\cal G}$ is an invariant set. By LaSalle, $\bx$ will converge to the largest invariant subset of ${\cal G}$ on which $\dot{V}=0$.

Furthermore, if $\dot{V}(\bx) \prec 0$ in ${\cal G}$, then the origin is locally asymptotically stable and the set ${\cal G}$ is inside the region of attraction of this fixed point. Alternatively, if $\dot{V}(\bx) \preceq 0$ in ${\cal G}$ and $\bx = 0$ is the only invariant subset of ${\cal G}$ where $\dot{V}=0$, then the origin is asymptotically stable and the set ${\cal G}$ is inside the region of attraction of this fixed point.

Region of attraction for a one-dimensional system

Consider the first-order, one-dimensional system $\dot{x} = -x + x^3.$ We can quickly understand this system using our tools for graphical analysis.

In the vicinity of the origin, $\dot{x}$ looks like $-x$, and as we move away it looks increasingly like $x^3$. There is a stable fixed point at the origin and unstable fixed points at $\pm 1$. In fact, we can deduce visually that the region of attraction to the stable fixed point at the origin is $\bx \in (-1,1)$. Let's see if we can demonstrate this with a Lyapunov argument.

First, let us linearize the dynamics about the origin and use the Lyapunov equation for linear systems to find a candidate $V(\bx)$. Since the linearization is $\dot{x} = -x$, if we take ${\bf Q}=1$, then we find ${\bf P}=\frac{1}{2}$ is the positive definite solution to the algebraic Lyapunov equation (\ref{eq:algebraic_lyapunov}). Proceeding with $$V(\bx) = \frac{1}{2} x^2,$$ we have $$\dot{V} = x (-x + x^3) = -x^2 + x^4.$$ This function is zero at the origin, negative for $|x| < 1$, and positive for $|x| > 1$. Therefore we can conclude that the sub-level set $V < \frac{1}{2}$ is invariant and the set $x \in (-1,1)$ is inside the region of attraction of the nonlinear system. In fact, this estimate is tight.

Robustness analysis using "common Lyapunov functions"

While we will defer most discussions on robustness analysis until later in the notes, there is a simple and powerful idea that can be presented now: the idea of a common Lyapunov function. Imagine that you have a model of a dynamical system but that you are uncertain about some of the parameters. For example, you have a model of a quadrotor, and are fairly confident about the mass and lengths (both of which are easy to measure), but are not confident about the moment of inertia. One approach to robustness analysis is to define a bounded uncertainty, which could take the form of $$\dot{\bx} = f_\alpha(\bx), \quad \alpha_{min} \le \alpha \le \alpha_{max},$$ with $\alpha$ a vector of uncertain parameters in your model. Richer specifications of the uncertainty bounds are also possible, but this will serve for our examples.

In standard Lyapunov analysis, we are searching for a function that goes downhill for all $\bx$ to make statements about the long-term dynamics of the system. In common Lyapunov analysis, we can make many similar statements about the long-term dynamics of an uncertain system if we can find a single Lyapunov function that goes downhill for all possible values of $\alpha$. If we can find such a function, then we can use it to make statements with all of the variations we've discussed (local, regional, or global; in the sense of Lyapunov, asymptotic, or exponential).

A one-dimensional system with gain uncertainty

Let's consider the same one-dimensional example used above, but add an uncertain parameter into the dynamics. In particular, consider the system: $$\bx = -x + \alpha x^3, \quad \frac{3}{4} < \alpha < \frac{3}{2}.$$ Plotting the graph of the one-dimensional dynamics for a few values of $\alpha$, we can see that the fixed point at the origin is still stable, but the robust region of attraction to this fixed point (shaded in blue below) is smaller than the region of attraction for the system with $\alpha=1$.

Taking the same Lyapunov candidate as above, $V = \frac{1}{2} x^2$, we have $$\dot{V} = -x^2 + \alpha x^4.$$ This function is zero at the origin, and negative for all $\alpha$ whenever $x^2 > \alpha x^4$, or $$|x| < \frac{1}{\sqrt{\alpha_{max}}} = \sqrt{\frac{2}{3}}.$$ Therefore, we can conclude that $|x| < \sqrt{\frac{2}{3}}$ is inside the robust region of attraction of the uncertain system.

Not all forms of uncertainty are as simple to deal with as the gain uncertainty in that example. For many forms of uncertainty, we might not even know the location of the fixed points of the uncertain system. In this case, we can often still use common Lyapunov functions to give some guarantees about the system, such as guarantees of robust set invariance. For instance, if you have uncertain parameters on a quadrotor model, you might be ok with the quadrotor stabilizing to a pitch of $0.01$ radians, but you would like to guarantee that it definitely does not flip over and crash into the ground.

A one-dimensional system with additive uncertainty

Now consider the system: $$\bx = -x + x^3 + \alpha, \quad -\frac{1}{4} < \alpha < \frac{1}{4}.$$ Plotting the graph of the one-dimensional dynamics for a few values of $\alpha$, this time we can see that the fixed point is not necessarily at the origin; the location of the fixed point moves depending on the value of $\alpha$. But we should be able to guarantee that the uncertain system will stay near the origin if it starts near the origin, using an invariant set argument.

Taking the same Lyapunov candidate as above, $V = \frac{1}{2} x^2$, we have $$\dot{V} = -x^2 + x^4 + \alpha x.$$ This Lyapunov function allows us to easily verify, for instance, that $V \le \frac{1}{3}$ is a robust invariant set, because whenever $V = \frac{1}{3}$, we have $$\forall \alpha \in [\alpha_{min},\alpha_{max}],\quad \dot{V}(x,\alpha) < 0.$$ Therefore $V$ can never start at less than one-third and cross over to greater than one-third. To see this, see that $$ V=\frac{1}{3} \quad \Rightarrow \quad x = \pm \sqrt{\frac{2}{3}} \quad \Rightarrow \quad \dot{V} = -\frac{2}{9} \pm \alpha \sqrt{\frac{2}{3}} < 0, \forall \alpha \in \left[-\frac{1}{4},\frac{1}{4} \right]. $$ Note that not all sub-level sets of this invariant set are invariant. For instance $V < \frac{1}{32}$ does not satisfy this condition, and by visual inspection we can see that it is in fact not robustly invariant.

robust quadrotor example

Region of attraction estimation for polynomial systems

Now we have arrived at the tool that I believe can be a work-horse for many serious robotics applications. Most of our robots are not actually globally stable (that's not because they are robots -- if you push me hard enough, I will fall down, too), which means that understanding the regions where a particular controller can be guaranteed to work can be of critical importance.

Sums-of-squares optimization effectively gives us an oracle which we can ask: is this polynomial positive for all $\bx$? To use this for regional analysis, we have to figure out how to modify our questions to the oracle so that the oracle will say "yes" or "no" when we ask if a function is positive over a certain region which is a subset of $\Re^n$. That trick is called the S-procedure. It is closely related to the Lagrange multipliers from constrained optimization, and has deep connections to "Positivstellensatz" from algebraic geometry.

The S-procedure

Consider a scalar polynomial, $p(\bx)$, and a semi-algebraic set $g(\bx) \preceq 0$, where $g$ is a vector of polynomials. If I can find a polynomial "multiplier", $\lambda(\bx)$ such that \[ p(\bx) + \lambda^T(\bx) g(\bx) \sos, \quad \text{and} \quad \lambda(\bx) \sos, \] then this is sufficient to demonstrate that $$p(\bx)\ge 0 \quad \forall \bx \in \{ g(\bx) \le 0 \}.$$ To convince yourself, observe that when $g(\bx) \le 0$, it is only harder to be positive but when $g(\bx) > 0$, it is possible for the combined function to be SOS even if $p(\bx)$ is negative.

Basic region of attraction formulation

Region of attraction for the one-dimensional cubic system

Let's return to our example from above: \[ \dot{x} = -x + x^3 \] and try to use SOS optimization to demonstrate that the region of attraction of the fixed point at the origin is $x \in (-1,1)$, using the Lyapunov candidate $V = x^2.$

First, define the multiplier polynomial, \[ \lambda(x) = c_1 + c_2 x + c_3 x^2 + c_4 x^3 + c_5 x^4. \] Then define the optimization \begin{align*} \find_{\bf c} \quad & \\ \subjto \quad& - \dot{V}(x) - \lambda(x) (1-V(x)) \sos \\ & \lambda(x) \sos \end{align*}

Try it yourself:


To see the code that sets up the optimization, try:


While the example above only verifies that the one-sub-level set of the pre-specified Lyapunov candidate is negative (certifying the ROA that we already understood), we can generalize the optimization to allow us to search for the largest sub-level set (with the objective using a convex approximation for volume). We can even search for the coefficients of the Lyapunov function using an iteration of convex optimizations. There are a number of variations and nuances in the various formulations, and some basic rescaling tricks that can help make the numerics of the problem better for the solvers. In , we have packaged all of these up into a regionOfAttraction method that is defined for objects of type PolynomialSystem. This makes it as simple as, for instance:

Searching for the region of attraction


      % create a new CubicPolynomialExample object
      p = CubicPolynomialExample();

      % compute region of attraction
      % the levelset V<1 is the region of attraction

      % display the polynomial representation of V that results
which yields the console output:

      [  (1)*x1^2  ]
The algorithm successfully found that $x \in (-1,1)$ is the region of attraction.

Remember that although we have tried to make it convenient to call these functions, they are not a black box. I highly recommend opening up the regionOfAttraction method and understanding how it works.

Region of attraction for the time-reversed Van der Pol oscillator


Seeding the optimization with linear analysis

LQR gives the cost-to-go which can be used as the Lyapunov candidate. Otherwise, use a Lyapunov equation. You may not even need to search for a better Lyapunov function, but rather just ask the question: what is the largest region of attraction that can be demonstrated for the nonlinear system using the Lyapunov function from linear analysis?

Rigid-body dynamics are polynomial

Coming soon... For a nice explanation of how rigid-body kinematics are polynomial, see Wampler11. Looking at the Lagrangian, you can see that if the kinematics are polynomial, then the dynamics will be, too.

In practice, you can also Taylor approximate any smooth nonlinear function using polynomials. This can be an effective strategy in practice, because you can limit the degrees of the polynomial, and because in many cases it is possible to provide conservative bounds on the errors due to the approximation.

implicit dynamics formulations. convex ROA w/ outer approx. control design.

Trajectory Optimization

I've argued that optimal control is a powerful framework for specifying complex behaviors with simple objective functions, letting the dynamics and constraints on the system shape the resulting feedback controller (and vice versa!). But the computational tools that we've provided so far have been limited in some important ways. The numerical approaches to dynamic programming which involve putting a mesh over the state space do not scale well to systems with state dimension more than four or five. Linearization around a nominal operating point (or trajectory) allowed us to solve for locally optimal control policies (e.g. using LQR) for even very high-dimensional systems, but the effectiveness of the resulting controllers is limited to the region of state space where the linearization is a good approximation of the nonlinear dynamics. The computational tools for Lyapunov analysis from the last chapter can provide, among other things, an effective way to compute estimates of those regions. But we have not yet provided any real computational tools for approximate optimal control that work for high-dimensional systems beyond the linearization around a goal. That is precisely the goal for this chapter.

The big change that is going to allow us to scale to high-dimensional systems is that we are going to give up the goal of solving for the optimal feedback controller for the entire state space, and instead attempt to find an optimal control solution that is valid from only a single initial condition. Instead of representing this as a feedback control function, we can represent this solution as a trajectory, $\bx(\cdot), \bu(\cdot)$, typically defined over a finite interval. In our graph-search dynamic programming algorithms, we discretized the dynamics of the system on a mesh spread across the state space. This does not scale to high-dimensional systems, and it was difficult to bound the errors due to the discretization. If we instead restrict ourselves to optimizing only a single initial condition, then a different discretization scheme emerges: we can discretize the state and input trajectories over time.

Problem Formulation

Given an initial condition, $\bx_0$, and an input trajectory $u(t)$ defined over a finite interval, $t\in[t_0,t_f]$, we can compute the long-term (finite-horizon) cost of executing that trajectory using the standard additive-cost optimal control objective, \[ J_{\bu(\cdot)}(\bx_0) = \int_{t_0}^{t_f} g(\bx(t),\bu(t)) dt. \] We will write the trajectory optimization problem as \begin{align*} \minimize_{u(\cdot)} \quad & \int_{t_0}^{t_f} g(\bx(t),\bu(t)) dt \\ \subjto \quad & \forall t, \dot{\bx}(t) = f(\bx(t),\bu(t)), \\ & \bx(t_0) = \bx_0. \\ \end{align*} Some trajectory optimization problems may also include additional constraints, such as collision avoidance ($\bx$ can not cause the robot to be inside an obstacle) or input limits (e.g. $\bu_{min} \le \bu \le \bu_{max}$ ), which can be defined for all time or some subset of the trajectory.

As written, the optimization above is an optimization over continuous trajectories. In order to formulate this as a numerical optimization, we must parameterize it with a finite set of numbers. Perhaps not surprisingly, there are many different ways to write down this parameterization, with a variety of different properties in terms of speed, robustness, and accuracy of the results. We will outline just a few of the most popular below. I would recommend Betts98+Betts01 for additional details.

Computational Tools for Nonlinear Optimization

Before we dive in, we need to take a moment to understand the optimization tools that we will be using. In the graph-search dynamic programming algorithm, we were magically able to provide an iterative algorithm that was known to converge to the optimal cost-to-go function. With LQR we were able to reduce the problem to a matrix Riccati equation, for which we have scalable numerical methods to solve. In the Lyapunov analysis chapter, we were able to formulate a very specific kind of optimization problem--a semi-definite program (or SDP)--which is a subset of convex optimization, and relied on custom solvers like SeDuMi or Mosek to solve the problems for us. Convex optimization is a hugely important subset of nonlinear optimization, in which we can guarantee that the optimization has no "local minima". In this chapter we won't be so lucky; the optimizations that we formulate may have local minima and the solution techniques will at best only guarantee that they give a locally optimal solution.

The generic formulation of a nonlinear optimization problem is \[ \minimize_z c(z) \quad \subjto \quad {\bf \phi}(z) \le 0, \] where $z$ is a vector of decision variables, $c$ is a scalar objective function and $\phi$ is a vector of constraints. Note that, although we write $\phi \le 0$, this formulation captures positivity constraints on the decision variables (simply multiply the constraint by $-1$) and equality constraints (simply list both $\phi\le0$ and $-\phi\le0$) as well.

The picture that you should have in your head is a nonlinear, potentially non-convex objective function defined over (multi-dimensional) $z$, with a subset of possible $z$ values satisfying the constraints.

One-dimensional cartoon of a nonlinear optimization problem. The red dots represent local minima. The blue dot represents the optimal solution.

Note that minima can be the result of the objective function having zero derivative or due to a sloped objective up against a constraint.

Numerical methods for solving these optimization problems require an initial guess, $\hat{z}$, and proceed by trying to move down the objective function to a minima. Common approaches include gradient descent, in which the gradient of the objective function is computed or estimated, or second-order methods such as sequential quadratic programming (SQP) which attempts to make a local quadratic approximation of the objective function and local linear approximations of the constraints and solves a quadratic program on each iteration to jump directly to the minimum of the local approximation.

While not strictly required, these algorithms can often benefit a great deal from having the gradients of the objective and constraints computed explicitly; the alternative is to obtain them from numerical differentiation. Beyond pure speed considerations, I strongly prefer to compute the gradients explicitly because it can help avoid numerical accuracy issues that can creep in with finite difference methods. The desire to calculate these gradients will be a major theme in the discussion below, and we have gone to great lengths to provide explicit gradients of our provided functions and automatic differentiation of user-provided functions in .

When I started out, I was of the opinion that there is nothing difficult about implementing gradient descent or even a second-order method, and I wrote all of the solvers myself. I now realize that I was wrong. The commercial solvers available for nonlinear programming are substantially higher performance than anything I wrote myself, with a number of tricks, subtleties, and parameter choices that can make a huge difference in practice. Some of these solvers can exploit sparsity in the problem (e.g., if the constraints operate in a sparse way on the decision variables). Nowadays, we make heaviest use of SNOPT Gill06, which now comes bundled with the precompiled distributions of , but also support fmincon from the Optimization Toolbox in MATLAB. Note that while I do advocate using these tools, you do not need to use them as a black box. In many cases you can improve the optimization performance by understanding and selecting non-default configuration parameters.

Trajectory optimization as a nonlinear program

Direct Transcription

Let us start by representing the finite-time trajectories, $\bx(t)$ and $\bu(t)$ $\forall t\in[t_0,t_f]$, by their values at a series of break points, $t_0,t_1,t_2,t_3$, and denoting the values at those points (aka the knot points) $\bx_0,...,\bx_N$ and $\bu_0,...,\bu_N$, respectively.

Then perhaps the simplest mapping of the trajectory optimization problem onto a nonlinear program is to fix the break points at even intervals, $dt$, and use Euler integration to write \begin{align*} \minimize_{\bx_1,...,\bx_N,\bu_0,...,\bu_{N-1}} & \sum_{n=0}^{N-1} g(\bx_n,\bu_n)dt \\ & \bx_{n+1} = \bx_n + f(\bx_n,\bu_n)dt, \quad \forall n\in [0,N-1]. \end{align*} Note that the decision variables here are $(\bx_1,...,\bx_N,\bu_0,...,\bu_{N-1})$, because $\bx_0$ is given, and $\bu_{N}$ does not appear in the cost nor any of the constraints. It is easy to generalize this approach to add additional costs or constraints on $\bx$ and/or $\bu$. (Note that this formulation does not actually benefit from the additive cost structure, so more general cost formulations are also possible.) Computing the gradients of the objective and constraints is essentially as simple as computing the gradients of $g()$ and $f()$.

Direct Transcription for the Double Integrator

We have implemented an optimization class hierarchy in which makes it easy to try out these algorithms. Watching the way that they perform on our simple problems is a very nice way to gain intuition. Here is some simple code to solve the (time-discretized) minimum-time problem for the double integrator.

% note: requires Drake ver >= 0.9.7

Make sure you take a look at the code:


Nothing compares to running it yourself, and poking around in the code. But you can also click here to watch the result. I hope that you recognize the parabolic trajectory from the initial condition up to the switching surface, and then the second parabolic trajectory down to the origin. You should also notice that the transition between $u=1$ and $u=-1$ is imperfect, due to the discretization error. As an exercise, try increasing the number of knot points (the variable N in the code) to see if you can get a sharper response, like this.

If you take a moment to think about what the direct transcription algorithm is doing, you will see that by satisfying the dynamic constraints, the optimization is effectively solving the (Euler approximation of the) differential equation. But instead of marching forward through time, it is minimizing the inconsistency at each of the knot points simultaneously. While it's easy enough to generalize the constraints to use higher-order integration schemes, paying attention to the trade-off between the number of times the constraint must be evaluated vs the density of the knot points in time, if the goal is to obtain smooth $\bx$ trajectory solutions then another approach quickly emerges: the approach taken by the so-called collocation methods.

Direct Collocation

In direct collocation (c.f., Hargraves87), both the input trajectory and the state trajectory are represented explicitly as piecewise polynomial functions. In particular, the sweet spot for this algorithm is taking $\bu(t)$ to be a first-order polynomial -- allowing it to be completely defined by the values at the knot points -- and $\bx(t)$ to be a Hermite cubic polynomial -- completely defined by the values and derivatives at the knot points. The state derivatives at the knot points are given as a function of $\bx$ and $\bu$ by the plant dynamics, so the entire spline is completely defined by the values of $\bx$ and $\bu$ at the knot points. To add the additional constraint that $\bx_{n+1}$ is dynamically consistent with $\bx_{n}$ and $\bu_{n}$, we add an additional set of constraints that the derivative at a set of collocation points also matches the plant dynamics. For the special case of cubic polynomial state trajectories and piecewise linear input trajectories, the derivative at the midpoint of each segment is particularly easy to compute, making it the natural choice.

Cubic spline parameters used in the direct collocation method.

Once again, direct collocation effectively integrates the equations of motion by satisfying the constraints of the optimization -- this time producing a third-order approximation of the dynamics with effectively two evaluations of the plant dynamics per segment. Hargraves87 claims, without proof, that as the break points are brought closer together, the trajectory will converge to a true solution of the differential equation. Once again it is very natural to add additional terms to the cost function or additional input/state constraints, and very easy to calculate the gradients of the objective and constraints. I personally find it very nice to explicitly account for the parametric encoding of the trajectory in the solution technique.

Direct Collocation for the Double Integrator

Direct collocation also easily solves the minimum-time problem for the double integrator. The performance is similar to direct transcription, but the convergence is visibly different. Try it for yourself:

% note: requires Drake ver >= 0.9.7

Make sure you take a look at the code:


Shooting Methods

In the methods described above, by asking the optimization package to perform the numerical integration of the equations of motion, we are effectively over-parameterizing the problem. In fact, the optimization is perfectly well defined if we restrict the decision variables to $\bu_0,...,\bu_{N-1}$ only--we can compute $\bx_1,...,\bx_{N}$ ourselves by knowing $\bx_0$, and $\bu(\cdot)$. This is exactly the approach taken in the shooting methods.

Computing gradients

Again, providing gradients of the objectives and constraints to the solver is not strictly required -- most solvers will obtain them from finite differences if they are not provided -- but I feel strongly that the solvers are faster and more robust when exact gradients are provided. Now that we have removed the $\bx$ decision variables from the program, we have to take a little extra care to compute the gradients. This is still easily accomplished using the chain rule. To be concise (and slightly more general), let us define $\bx[n+1]=f_d(\bx[n],\bu[n])$ as the discrete-time approximation of the continuous dynamics; for example, the forward Euler integration scheme used above would give $f_d(\bx[n],\bu[n]) = \bx[n]+f(\bx[n],\bu[n])dt.$ Then we have \[ \pd{J}{\bu_k} = \sum_{n=0}^{N-1} \left( \pd{g(\bx[n],\bu[n])}{\bx[n]} \pd{\bx[n]}{\bu_k} + \pd{g(\bx[n],\bu[n])}{\bu_k} \right), \] where the gradient of the state with respect to the inputs can be computed during the "forward simulation", \[ \pd{\bx[n+1]}{\bu_k} = \pd{f_d(\bx[n],\bu[n])}{\bx[n]} \pd{\bx[n]}{\bu_k} + \pd{f_d(\bx[n],\bu[n])}{\bu_k}. \] These simulation gradients can also be used in the chain rule to provide the gradients of any constraints. Note that there are a lot of terms to keep around here, on the order of (state dim) $\times$ (control dim) $\times$ (number of timesteps). Ouch. Note also that many of these terms are zero; for instance with the Euler integration scheme above $\pd{\bu[n]}{\bu_k} = 0$ if $k\ne n$. (If this looks like I'm mixing two notations here, recall that I'm using $\bu_k$ to represent the decision variable and $\bu[n]$ to represent the input used in the $n$th step of the simulation.)

The special case of optimization without state constraints

By solving for $\bx(\cdot)$ ourselves, we've removed a large number of constraints from the optimization. If no additional state constraints are present, and the only gradients we need to compute are the gradients of the objective, then a surprisingly efficient algorithm emerges. I'll give the steps here without derivation, but will derive it in the Pontryagin section below:

  1. Simulate forward: $$\bx[n+1] = f_d(\bx[n],\bu_n),$$ from $\bx[0] = \bx_0$
  2. Calculate backwards: $$\lambda[n-1] = \pd{g(\bx[n],\bu[n])}{\bx[n]}^T + \pd{f(\bx[n],\bu[n])}{\bx[n]}^T \lambda[n],$$ from $\lambda[N-1]=0$
  3. Extract the gradients: $$\pd{J}{\bu[n]} = \pd{g(\bx[n],\bu[n])}{\bu[n]} + \lambda[n]^T \pd{f(\bx[n],\bu[n])}{\bu[n]},$$ with $\pd{J}{\bu_k} = \sum_n \pd{J}{\bu[n]}\pd{\bu[n]}{\bu_k}$.

Here $\lambda[n]$ is a vector the same size as $\bx[n]$ which has an interpretation as $\lambda[n]=\pd{J}{\bx[n+1]}^T$. The equation governing $\lambda$ is known as the adjoint equation, and it represents a dramatic efficient improvement over calculating the huge number of simulation gradients described above. In case you are interested, the adjoint equation is known as the backpropagation algorithm in the neural networks literature and it is one of the primary reasons that training neural networks became so popular in the 1980's; super fast GPU implementations of this algorithm are also one of the reasons that deep learning is incredibly popular right now (the availability of massive training databases is perhaps the other main reason).

To take advantage of this efficiency, advocates of the shooting methods often use penalty methods instead of enforcing hard state constraints; instead of telling the solver about the constraint explicitly, you simply add an additional term to the cost function which gives a large penalty commensurate with the amount by which the constraint is violated. These are not quite as accurate and can be harder to tune (you'd like the cost to be high compared to other costs, but making it too high can lead to numerical conditioning issues), but they can work.


Although the decision about which algorithm is best may depend on the situation, in our work we have come to favor the direct collocation method (and occasionally direct transcription) for most of our work. There are a number of arguments for and against each approach; I will try to discuss a few of them here.

Solver performance

Numerical conditioning. Tail wagging the dog.

Sparse constraints. Potential for parallel evaluation of the constraints (computing the dynamics and their derivatives are often the most expensive part).

Providing an initial guess

to avoid local minima. direct transcription and collocation can take an initial guess in $\bx$, too.

Implicit dynamics

Another potential advantage of the direct transcription and collocation methods is that the dynamics constraints can be written in implicit form.

Variations in the problem formulation

There are number of useful variations to the problem formulations I've presented above. By far the most common is the addition of a terminal cost, e.g.: $$\min \quad h(\bx[N]) + \sum_{n=0}^{N-1} g(\bx[n],\bu[n]).$$ These terms are easily added to the cost function in the any of methods, and the adjoint equations of the shooting method simply require the a modified terminal condition $$\lambda[N-1] = \pd{h(\bx[N])}{\bx[N]}$$ when computing the gradients.

Another common modification is including the spacing of the breakpoints as additional decision variables. This is particularly easy in the direct transcription and collocation methods, and can also be worked into the shooting methods. Typically we add a lower-bound on the time-step so that they don't all vanish to zero.

Accuracy of numerical integration

One potential complaint about the direct transcription and collocation algorithms is that we tend to use simplistic numerical integration methods and often fix the integration timestep (e.g. by choosing Euler integration and selecting a $dt$); it is difficult to bound the resulting integration errors in the solution. One tantalizing possibility in the shooting methods is that the forward integration could be accomplished by more sophisticated methods, like variable-step integration. But I must say that I have not had much success with this approach, because while the numerical accuracy of any one function evaluation might be improved, these integration schemes do not necessarily give smooth outputs as you make incremental changes to the initial conditions and control (changing $\bu_2$ by $\epsilon$ could result in taking a different number of steps in the integration scheme). This lack of smoothness can wreak havoc on the convergence of the optimization. If numerical accuracy is a premium, then I think you will have more success by imposing consistency constraints (e.g. as in the Runge-Kutta 4th order simulation with 5th order error checking method) as additional constraints on the time-steps; shooting methods do not have any particular advantage here. reference DMOC

Pontryagin's Minimum Principle

The tools that we've been developing for numerical trajectory optimization are closely tied to theorems from (analytical) optimal control. Let us take one section to appreciate those connections.

What precisely does it mean for a trajectory, $\bx(\cdot),\bu(\cdot)$, to be locally optimal? It means that if I were to perturb that trajectory in any way (e.g. change $\bu_3$ by $\epsilon$), then I would either incur higher cost in my objective function or violate a constraint. For an unconstrained optimization, a necessary condition for local optimality is that the gradient of the objective at the solution be exactly zero. Of course the gradient can also vanish at local maxima or saddle points, but it certainly must vanish at local minima. We can generalize this argument to constrained optimization using Lagrange multipliers.

Constrained optimization with Lagrange multipliers

Given the equality-constrained optimization problem $$\minimize_\bz g(\bz) \quad \subjto \quad {\bf \phi}(\bz) = 0,$$ where ${\bf \phi}$ is a vector. Define a vector $\lambda$ of Lagrange multipliers, the same size as $\phi$, and the scalar Lagrangian function, $$L(\bz,{\bf \lambda}) = g(\bz) + \lambda^T\phi(\bz).$$ A necessary condition for $\bz^*$ to be an optimal value of the constrained optimization is that the gradients of $L$ vanish with respect to both $\bz$ and $\lambda$: $$\pd{L}{\bz} = 0, \quad \pd{L}{\lambda} = 0.$$ Note that $\pd{L}{\lambda} = \phi(\bz)$, so requiring this to be zero is equivalent to requiring the constraints to be satisfied.

Optimization on the unit circle

Consider the following optimization: $$\min_{x,y} x+y, \quad \subjto \quad x^2 + y^2 = 1.$$ The level sets of $x+y$ are straight lines with slope $-1$, and the constraint requires that the solution lives on the unit circle.

Simply by inspection, we can determine that the optimal solution should be $x=y=-\frac{\sqrt{2}}{2}.$ Let's make sure we can obtain the same result using Lagrange multipliers.

Formulating $$L = x+y+\lambda(x^2+y^2-1),$$ we can take the derivatives and solve \begin{align*} \pd{L}{x} = 1 + 2\lambda x = 0 \quad & \Rightarrow & \lambda = -\frac{1}{2x}, \\ \pd{L}{y} = 1 + 2\lambda y = 1 - \frac{y}{x} = 0 \quad & \Rightarrow & y = x,\\ \pd{L}{\lambda} = x^2+y^2-1 = 2x^2 -1 = 0 \quad & \Rightarrow & x = \pm \frac{1}{\sqrt{2}}. \end{align*} Given the two solutions which satisfy the necessary conditions, the negative solution is clearly the minimizer of the objective.

Lagrange multiplier derivation of the adjoint equations

Let us use Lagrange multipliers to derive the necessary conditions for our constrained trajectory optimization problem in discrete time \begin{align*} \minimize_{\bx_1,...,\bx_N,\bu_0,...,\bu_{N-1}} & \sum_{n=0}^{N-1} g(\bx[n],\bu[n]),\\ \subjto \quad & \bx[n+1] = f_d(\bx[n],\bu[n]). \end{align*} Formulate the Lagrangian, \[ L(\bx[\cdot],\bu[\cdot],\lambda[\cdot]) = \sum_{n=0}^{N-1} g(\bx[n],\bu[n]) + \sum_{n=0}^{N-1} \lambda^T[n] \left(f_d(\bx[n],\bu[n]) - \bx[n+1]\right), \] and set the derivatives to zero to obtain the adjoint equation method described for the shooting algorithm above: \begin{gather*} \forall n\in[0,N-1], \pd{L}{\lambda[n]} = f_d(\bx[n],\bu[n]) - \bx[n+1] = 0 \Rightarrow \bx[n+1] = f(\bx[n],\bu[n]) \\ \forall n\in[0,N-1], \pd{L}{\bx[n]} = \pd{g(\bx[n],\bu[n])}{\bx} + \lambda^T[n] \pd{f_d(\bx[n],\bu[n])}{\bx} - \lambda^T[n-1] = 0 \\ \quad \Rightarrow \lambda[n-1] = \pd{g(\bx[n],\bu[n])}{\bx}^T + \pd{f_d(\bx[n],\bu[n])}{\bx}^T \lambda[n]. \\ \pd{L}{\bx[N]} = -\lambda[N-1] = 0 \Rightarrow \lambda[N-1] = 0 \\ \forall n\in[0,N-1], \pd{L}{\bu[n]} = \pd{g(\bx[n],\bu[n])}{\bu} + \lambda^T[n] \pd{f_d(\bx[n],\bu[n])}{\bu} = 0. \end{gather*} Therefore, if we are given an initial condition $\bx_0$ and an input trajectory $\bu[\cdot]$, we can verify that it satisfies the necessary conditions for optimality by simulating the system forward in time to solve for $\bx[\cdot]$, solving the adjoint equation backwards in time to solve for $\lambda[\cdot]$, and verifying that $\pd{L}{\bu[n]} = 0$ for all $n$. The fact that $\pd{J}{\bu} = \pd{L}{\bu}$ when $\pd{L}{\bx} = 0$ and $\pd{L}{\lambda} = 0$ follows from some basic results in the calculus of variations.

Necessary conditions for optimality in continuous time

You won't be surprised to hear that these necessary conditions have an analogue in continuous time. I'll state it here without derivation. Given the initial conditions, $\bx_0$, a continuous dynamics, $\dot\bx = f(\bx,\bu)$, and the instantaneous cost $g(\bx,\bu)$, for a trajectory $\bx(\cdot),\bu(\cdot)$ defined over $t\in[t_0,t_f]$ to be optimal it must satisfy the conditions that \begin{align*} \forall t\in[t_0,t_f],\quad & \dot{\bx} = f(\bx,\bu), \quad &\bx(0)=\bx_0\\ \forall t\in[t_0,t_f],\quad & -\dot\lambda = \pd{g}{\bx}^T + \pd{f}{\bx}^T \lambda, \quad &\lambda(T) = 0\\ \forall t\in[t_0,t_f],\quad & \pd{g}{\bu} + \lambda\pd{f}{\bu} = 0.& \end{align*}

In fact the statement can be generalized even beyond this to the case where $\bu$ has constraints. The result is known as Pontryagin's minimum principle -- giving necessary conditions for a trajectory to be optimal.

Pontryagin's Minimum Principle

Adapted from Bertsekas00a. Given the initial conditions, $\bx_0$, a continuous dynamics, $\dot\bx = f(\bx,\bu)$, and the instantaneous cost $g(\bx,\bu)$, for a trajectory $\bx^*(\cdot),\bu^*(\cdot)$ defined over $t\in[t_0,t_f]$ to be optimal is must satisfy the conditions that \begin{align*} \forall t\in[t_0,t_f],\quad & \dot{\bx}^* = f(\bx^*,\bu^*), \quad &\bx^*(0)=\bx_0\\ \forall t\in[t_0,t_f],\quad & -\dot\lambda = \pd{g}{\bx}^T + \pd{f}{\bx}^T \lambda, \quad &\lambda(T) = 0\\ \forall t\in[t_0,t_f],\quad & u^* = \argmin_{\bu\in{\cal U}} \left[ g(\bx^*,\bu) + \lambda^T f(\bx^*,\bu) \right].& \end{align*}

Note that the terms which are minimized in the final line of the theorem are commonly referred to as the Hamiltonian of the optimal control problem, $$H(\bx,\bu,\lambda,t) = g(\bx,\bu) + \lambda^T f(\bx,\bu).$$ It is distinct from, but inspired by, the Hamiltonian of classical mechanics. Remembering that $\lambda$ has an interpretation as $\pd{J}{\bx}^T$, you should also recognize it from the HJB.

Trajectory optimization as a convex optimization

Linear systems with convex linear constraints

An important special case. Linear/Quadratic objectives results in an LP/QP $\Rightarrow$ Convex optimization.

Differential Flatness

Mixed-integer convex optimization for non-convex constraints

Local Trajectory Feedback Design

Once we have obtained a locally optimal trajectory from trajectory optimization, there is still work to do...

Model-predictive control

Time-varying LQR

Take $\bar\bx(t) = \bx(t) - \bx_0(t)$, and $\bar\bu(t) = \bu(t)-\bu_0(t)$, then apply finite-horizon LQR (see the LQR chapter).

Iterative LQR and Differential Dynamic Programming

Case Study: A glider that can land on a perch like a bird

Algorithms for Limit Cycles

The discussion of walking and running robots in Chapter 4 motivated the notion of limit cycle stability. Linear systems are not capable of producing stable limit cycle behavior, so this rich topic is unique to nonlinear systems design and analysis. Furthermore, the tools that are required to design, stabilize, and verify limit cycles will have applicability beyond simple periodic motions.

The first natural question we must ask is, given a system $\dot{\bx} = f(\bx)$, or a control system $\dot{x} = f(\bx,\bu)$, how do we go about finding periodic solutions which may be passively stable, open-loop stable, or stabilized with closed-loop feedback? It turns out that the trajectory optimization tools that we developed in the last chapter are very well suited to this task.

Trajectory optimization

I introduced the trajectory optimization tools as means for optimizing a single trajectory of the system from a particular known initial condition. But they can be used in a slightly different way, too -- in some formulations we might choose to have the initial conditions as one of the decision variables. In fact, the simplest formulation for finding a periodic solution is to leave the entire trajectory as a decision variable, but to add periodicity constraints enforcing that the initial conditions and the final conditions match. Some care is needed here; if you're not careful then these constraints will all be met if the entire trajectory remains at a fixed point in the system. In the example below, we constrain that the initial and final conditions lie on the surface of section that we used before, and additionally constrain that the velocity at the initial/final points is non-zero (to avoid the fixed point solution).

Note that although we introduced trajectory optimization as searching over control inputs $\bu$, this formulation is now rich enough that it solves an important problem even for passive systems -- with no control input at all.

Finding the limit cycle of the Van der Pol oscillator

Recall the dynamics of the Van der Pol oscillator given by $$\ddot{q} + \mu (q^2 - 1) \dot{q} + q = 0, \quad \mu>0,$$ which exhibited a stable limit cycle.

Formulate the direct collocation optimization: \begin{align*} \find_{\bx_0,...,\bx_N,dt} \quad \\ \subjto \quad & q_0 = 0, \quad q_N = 0, \\ & \dot{q}_0 = \dot{q}_N, \quad \dot{q}_0 > 0, \\ & \text{collocation dynamic constraints} \end{align*}

Try it yourself:


Click here to see an animation of the result. The red dots are the initial and final conditions (which appear as one dot once the constraints are satisfied). As always, make sure that you take a look at the code. Poke around. Try changing some things.

One of the things that you should notice in the code is that I provide an initial guess for the solver. In most of the examples so far I've been able to avoid doing that--the solver takes small random numbers as a default initial guess and solves from there. But for this problem, I found that it was getting stuck in a local minima. Adding the initial guess that the solution moves around a circle in state space was enough.

example: simplest flapping robot

Lyapunov analysis

Recall the important distinction between stability of a trajectory in time and stability of a limit cycle was that the limit cycle does not converge in phase -- trajectories near the cycle converge to the cycle, but trajectories on the cycle may not converge with each other. This is type of stability, also known as orbital stability can be written as stability to the manifold described by cycle $\bx^*(t)$, \[ \min_\tau || x(t) - x^*(\tau) || \rightarrow 0.\] In the case of limit cycles, this manifold is a periodic solution with $\bx^*(t+t_{period}) = \bx^*(t)$. Depending on exactly how that convergence happens, we can define orbital stability in the sense of Lyapunov, asymptotic orbital stability, exponential orbital stability, or even finite-time orbital stability.

In order to prove that a system is orbitally stable (locally, over a region, or globally), or to analyze the region of attraction of a limit cycle, we can use a Lyapunov function. In particular, we would like to consider Lyapunov functions which have the form cartooned below; they vanish to zero everywhere along the cycle, and are strictly positive everywhere away from the cycle.

Cartoon of a Lyapunov function which vanishes on a limit cycle, and is strictly positive everywhere else. (a small segment has been removed solely for the purposes of visualization).

Transverse coordinates

How can we parameterize this class of functions? For arbitrary cycles this could be very difficult in the original coordinates. For simple cycles like in the cartoon, one could imagine using polar coordinates. More generally, we will define a new coordinate system relative to the orbit, with coordinates

  • $\tau$ - the phase along the orbit
  • $\bx_\perp(\tau)$ - the remaining coordinates, linearly independent from $\tau$.

Given a state $\bx$ in the original coordinates, we must define a smooth mapping $\bx \rightarrow (\tau, \bx_\perp)$ to this new coordinate system. For example, for a simple ring oscillator we might have:

A moving coordinate system along the limit cycle.
In general, for an $n$-dimensional state space, $\tau$ will always be a scalar, and $\bx_\perp$ will be an $(n-1)$-dimensional vector defining the remaining coordinates relative to $\bx^*(\tau)$. In fact, although we use the notation $\bx_\perp$ the coordinate system need not be strictly orthogonal to the orbit, but must simply be transversal (not parallel). Having defined the smooth mapping $\bx \rightarrow (\tau,\bx_\perp)$, we can always rewrite the dynamics in this new coordinate system: \begin{gather*} \dot{\tau} = f_1(\bx_\perp,\tau) \\ \dot\bx_\perp = f_2(\bx_\perp,\tau). \end{gather*}

The value of this construction for Lyapunov analysis was proposed in Hauser94a and has been extended nicely to control design in Shiriaev08 and for region of attraction estimation in Manchester10b. A particular numerical strategy for defining the transversal coordinates is given in Manchester10a.

A Lyapunov theorem for orbital stability

For a dynamical system $\dot\bx = f(\bx)$ with $\bx \in \Re^n$, $f$ continuous, a continuous periodic solution $\bx^*(\tau)$, and a smooth mapping $\bx \rightarrow (\tau,\bx_\perp)$ where $\bx_\perp$ vanishes on $\bx^*$, then for some $n-1$ dimensional ball ${\cal B}$ around the origin, if I can produce a $V(\bx_\perp,\tau)$ such that \begin{gather*} \forall \tau, V(0,\tau) = 0, \\ \forall \tau, \forall \bx_\perp \in {\cal B}, \bx_\perp \ne 0, V(\bx_\perp,\tau) > 0, \end{gather*} with \begin{gather*} \forall \tau, \dot{V}(0,\tau) = 0, \\ \forall \tau, \forall \bx_\perp \in {\cal B}, \bx_\perp \ne 0, \dot{V}(\bx_\perp,\tau) < 0, \end{gather*} then the solution $\bx^*(t)$ is locally orbitally asymptotically stable.

Simple ring oscillator

Perhaps the simplest oscillator is the first-order system which converges to the unit circle. In cartesian coordinates, the dynamics are \begin{align*} \dot{x}_1 =& x_2 -\alpha x_1 \left( 1 - \frac{1}{\sqrt{x_1^2+x_2^2}}\right) \\ \dot{x}_2 =& -x_1 -\alpha x_2 \left( 1 - \frac{1}{\sqrt{x_1^2+x_2^2}}\right), \end{align*} where $\alpha$ is a positive scalar gain.

Vector field of the ring oscillator

If we take the transverse coordinates to be the polar coordinates, shifted so that $x_\perp$ is zero on the unit circle, \begin{align*}\tau =& \text{atan2}(-x_2,x_1) \\ x_\perp =& \sqrt{x_1^2+x_2^2}-1, \end{align*} which is valid when $x_\perp>-1$, then the simple transverse dynamics are revealed: \begin{align*} \dot\tau =& 1 \\ \dot{x}_\perp =& -\alpha x_\perp. \end{align*} Taking a Lyapunov candidate $V(x_\perp,\tau) = x_\perp^2$, we can verify that $$\dot{V} = -2 \alpha x_\perp^2 \prec 0, \quad \forall x_\perp > -1.$$ This demonstrates that the limit cycle is locally asymptotically stable, and furthermore that the invariant open-set $V < 1$ is inside the region of attraction of that limit cycle. In fact, we know that all $x_\perp > -1$ are in the region of attraction that limit cycle, but this is not proven by the Lyapunov argument above.

Let's compare this approach with the approach that we used in the chapter on walking robots, where we used a Poincaré map analysis to investigate limit cycle stability. In transverse coordinates approach, there is an additional burden to construct the coordinate system along the entire trajectory, instead of only at a single surface of section. In fact, the transverse coordinates approach is sometimes referred to as a "moving Poincaré section". But the reward for this extra work is that we can check a condition that only involves the instantaneous dynamics, $f(\bx)$, as opposed to having to integrate the dynamics over an entire cycle to generate the discrete Poincaré map, $\bx_p[n+1] = P(\bx_p[n])$. While computing $P$ in closed-form happened to be possible for the simple rimless wheel model, it is rarely possible for more complex models. As we will see below, this approach will also be more compatible with designing continuous feedback controller that stabilize the limit cycle.

Transverse linearization

In the case of Lyapunov analysis around a fixed point, there was an important special case: for stable linear systems, we actually have a recipe for constructing a Lyapunov function. As a result, for nonlinear systems, we often found it convenient to begin our search by linearizing around the fixed point and using the Lyapunov candidate for the linear system as an initial guess. That same approach can be extended to limit cycle analysis.

Region of attraction estimation using sums-of-squares

Coming soon. If you are interested, see Manchester10a+Manchester10b.

Feedback design

Transverse LQR

Orbital stabilization for non-periodic trajectories

Planning and Control through Contact

The hybrid system's approach

Transverse stabilization is more natural than full LQR

The measure differential inclusions approach

Motion Planning as Search

The term "motion planning" is a hopelessly general term which almost certainly encompasses the dynamic programming, feedback design, and trajectory optimization algorithms that we have already discussed. However, there are a number of algorithms and ideas that we have not yet discussed which have grown from the idea of formulating motion planning as a search problem -- for instance searching for a path from a start to a goal in a graph which is too large solve completely with dynamic programming. Some, but certainly not all, of these algorithms sacrifice optimality in order to find any path if it exists, and the notion of a planner being "complete" -- guaranteed to find a path if one exists -- is highly valued. This is precisely our goal for this chapter, to add some additional tools that will be able to provide some form of solutions for our most geometrically complex, highly non-convex, robot control problems.

LaValle06 is a very nice book on planning algorithms in general and on motion planning algorithms in particular. Compared to other planning problems, motion planning typically refers to problems where the planning domain is continuous (e.g. continuous state space, continuous action space), but many motion planning algorithms trace their origins back to ideas in discrete domains (e.g., graph search).

For this chapter, we will consider the following problem formulation: given a system defined by the nonlinear dynamics (in continuous- or discrete-time) $$\dot{\bx} = f(\bx,\bu) \quad \text{or} \quad \bx[n+1] = f(\bx[n],\bu[n]),$$ and given a start state $\bx(0) = \bx_0$ and a goal region ${\cal G}$, find any finite-time trajectory from $\bx_0$ to to $\bx \in {\cal G}$ if such a trajectory exists.

Artificial Intelligence as Search

A long history... some people feel that the route to creating intelligent machines is to collect large ontologies of knowledge, and then perform very efficient search. (The more modern view of AI is focused instead on machine learning, the right answer probably involves pieces of both.) Samuels checker players, Deep Blue playing chess, theorem proving, Cyc, IBM Watson,...

One of the key ideas is the use of "heuristics" to guide the search. "Is it possible to find and optimal path from the start to a goal without visiting every node?". $A^*$. Admissible heuristics. Example: google maps.

Online planning. $D^*$, $D^*$-Lite.

Randomized motion planning

If you remember how we introduced dynamic programming initially as a graph search, you'll remember that there were some challenges in discretizing the state space. Let's assume that we have discretized the continuous space into some finite set of discrete nodes in our graph. Even if we are willing to discretize the action space for the robot (this might be even be acceptable in practice), we had a problem where discrete actions from one node in the graph, integrated over some finite interval $h$, are extremely unlikely to land exactly on top of another node in the graph. To combat this, we had to start working on methods for interpolating the value function estimate between nodes.

add figure illustrating the interpolation here

Interpolation can work well if you are trying to solve for the cost-to-go function over the entire state space, but it's less compatible with search methods which are trying to find just a single path through the space. If I start in node 1, and land between node 2 and node 3, then which node to I continue to expand from?

One approach to avoiding this problem is to build a search tree as the search executes, instead of relying on a predefined mesh discretization. This tree will contains nodes rooted in the continuous space at exactly the points where system can reach.

Another other problem with any fixed mesh discretization of a continuous space, or even a fixed discretization of the action space, is that unless we have specific geometric / dynamic insights into our continuous system, it very difficult to provide a complete planning algorithm. Even if we can show that no path to the goal exists on the tree/graph, how can we be certain that there is no path for the continuous system? Perhaps a solution would have emerged if we had discretized the system differently, or more finely?

One approach to addressing this second challenge is to toss out the notion of fixed discretizations, and replace them with random sampling (another approach would be to adaptively add resolution to the discretization as the algorithm runs). Random sampling, e.g. of the action space, can yield algorithms that are probabilistically complete for the continuous space -- if a solution to the problem exists, then a probabilistically complete algorithm will find that solution with probability 1 as the number of samples goes to infinity.

With these motivations in mind, we can build what is perhaps the simplest probabilistically complete algorithm for finding a path from the a starting state to some goal region with in a continuous state and action space:

Planning with a Random Tree

Let us denote the data structure which contains the tree as ${\cal T}$. The algorithm is very simple:

  • Initialize the tree with the start state: ${\cal T} \leftarrow \bx_0$.
  • On each iteration:
    • Select a random node, $\bx_{rand}$, from the tree, ${\cal T}$
    • Select a random action, $\bu_{rand}$, from a distribution over feasible actions.
    • Compute the dynamics: $\bx_{new} = f(\bx_{rand},\bu_{rand})$
    • If $\bx_{new} \in {\cal G}$, then terminate. Solution found!
    • Otherwise add the new node to the tree, ${\cal T} \leftarrow \bx_{new}$.
It can be shown that this algorithm is, in fact, probabilistically complete. However, without strong heuristics to guide the selection of the nodes scheduled for expansion, it can be extremely inefficient. For a simple example, consider the system $\bx[n] = \bu[n]$ with $\bx \in \Re^2$ and $\bu_i \in [-1,1]$. We'll start at the origin and put the goal region as $\forall i, 15 \le x_i \le 20$. Try it yourself:

T = struct('parent',zeros(1,1000),'node',zeros(2,1000));  % pre-allocate memory for the "tree"
for i=2:size(T.parent,2)
  T.parent(i) = randi(i-1);
  x_rand = T.node(:,T.parent(i));
  u_rand = 2*rand(2,1)-1;
  x_new = x_rand+u_rand;
  if (15<=x_new(1) && x_new(1)<=20 && 15<=x_new(2) && x_new(2)<=20)
    disp('Success!'); break;
  T.node(:,i) = x_new;
Again, this algorithm is probabilistically complete. But after expanding 1000 nodes, the tree is basically a mess of node points all right on top of each other:

We're nowhere close to the goal yet, and it's not exactly a hard problem.

While the idea of generating a tree of feasible points has clear advantages, we have lost the ability to cross off a node (and therefore a region of space) once it has been explored. It seems that, to make randomized algorithms effective, we are going to at the very least need some form of heuristic for encouraging the nodes to spread out and explore the space.

Rapidly-Exploring Random Trees (RRTs)

(Click here to watch the animation)
(Click here to watch the animation)

RRTs for robots with dynamics

Variations and extensions

Multi-query planning with PRMs, ...

RRT*, RRT-sharp, RRTx,...

Complexity bounds and dispersion limits


Not sure yet whether randomness is fundamental here, or whether is a temporary "crutch" until we understand geometric and dynamic planning better.

Decomposition methods

Cell decomposition...

Mixed-integer planning.

Approximate decompositions for complex environments (e.g. IRIS)

Feedback Motion Planning

Output Feedback

Robust and Stochastic Control

Stochastic optimal control. Stochastic HJB. Graph search. LQR + Gaussian noise. Whitening filters. Stochastic trajectory optimization. Stochastic verification. (Having bounds on expected value does yield bounds on system state). Robust control. Common Lyapunov functions. Dissipation inequalities. Guaranteed-cost control for linear systems. H-infinity.

Planning Under Uncertainty

System Identification

equation error vs simulation error. model-reduction for linear systems. optimal experiment design

State Estimation

Reinforcement Learning

model-free value methods, model-free policy search, actor-critic methods

Rigid-Body Dynamics

Deriving the equations of motion (an example)

The equations of motion for a standard robot can be derived using the method of Lagrange. Using $T$ as the total kinetic energy of the system, and $U$ as the total potential energy of the system, $L = T-U$, and $Q_i$ as the generalized force corresponding to $q_i$, the Lagrangian dynamic equations are: $$\frac{d}{dt}\pd{L}{\dot{q}_i} - \pd{L}{q_i} = Q_i.$$ If you are not comfortable with these equations, then any good book chapter on rigid body mechanics can bring you up to speed -- try Craig89 for a very practical guide to robot kinematics/dynamics, Goldstein02 for a more hard-core dynamics text or Thornton03 for a classical dynamics text which is a nice read -- for now you can take them as a handle that you can crank to generate equations of motion.

Simple Double Pendulum

Simple double pendulum

Consider the simple double pendulum with torque actuation at both joints and all of the mass concentrated in two points (for simplicity). Using $\bq = [\theta_1,\theta_2]^T$, and ${\bf p}_1,{\bf p}_2$ to denote the locations of $m_1,m_2$, respectively, the kinematics of this system are: \begin{eqnarray*} {\bf p}_1 =& l_1\begin{bmatrix} s_1 \\ - c_1 \end{bmatrix}, &{\bf p}_2 = {\bf p}_1 + l_2\begin{bmatrix} s_{1+2} \\ - c_{1+2} \end{bmatrix} \\ \dot{{\bf p}}_1 =& l_1 \dot{q}_1\begin{bmatrix} c_1 \\ s_1 \end{bmatrix}, &\dot{{\bf p}}_2 = \dot{{\bf p}}_1 + l_2 (\dot{q}_1+\dot{q}_2) \begin{bmatrix} c_{1+2} \\ s_{1+2} \end{bmatrix} \end{eqnarray*} Note that $s_1$ is shorthand for $\sin(q_1)$, $c_{1+2}$ is shorthand for $\cos(q_1+q_2)$, etc. From this we can write the kinetic and potential energy: \begin{align*} T =& \frac{1}{2} m_1 \dot{\bf p}_1^T \dot{\bf p}_1 + \frac{1}{2} m_2 \dot{\bf p}_2^T \dot{\bf p}_2 \\ =& \frac{1}{2}(m_1 + m_2) l_1^2 \dot{q}_1^2 + \frac{1}{2} m_2 l_2^2 (\dot{q}_1 + \dot{q}_2)^2 + m_2 l_1 l_2 \dot{q}_1 (\dot{q}_1 + \dot{q}_2) c_2 \\ U =& m_1 g y_1 + m_2 g y_2 = -(m_1+m_2) g l_1 c_1 - m_2 g l_2 c_{1+2} \end{align*} Taking the partial derivatives $\pd{T}{q_i}$, $\pd{T}{\dot{q}_i}$, and $\pd{U}{q_i}$ ($\pd{U}{\dot{q}_i}$ terms are always zero), then $\frac{d}{dt}\pd{T}{\dot{q}_i}$, and plugging them into the Lagrangian, reveals the equations of motion: \begin{align*} (m_1 + m_2) l_1^2 \ddot{q}_1& + m_2 l_2^2 (\ddot{q}_1 + \ddot{q}_2) + m_2 l_1 l_2 (2 \ddot{q}_1 + \ddot{q}_2) c_2 \\ &- m_2 l_1 l_2 (2 \dot{q}_1 + \dot{q}_2) \dot{q}_2 s_2 + (m_1 + m_2) l_1 g s_1 + m_2 g l_2 s_{1+2} = \tau_1 \\ m_2 l_2^2 (\ddot{q}_1 + \ddot{q}_2)& + m_2 l_1 l_2 \ddot{q}_1 c_2 + m_2 l_1 l_2 \dot{q}_1^2 s_2 + m_2 g l_2 s_{1+2} = \tau_2 \end{align*} As we saw in chapter 1, numerically integrating (and animating) these equations in MATLAB produces the expected result.

The Manipulator Equations

If you crank through the Lagrangian dynamics for a few simple robotic manipulators, you will begin to see a pattern emerge - the resulting equations of motion all have a characteristic form. For example, the kinetic energy of your robot can always be written in the form: $$T = \frac{1}{2} \dot{\bq}^T {\bf H}(\bq) \dot{\bq},$$ where ${\bf H}$ is the state-dependent inertial matrix. This abstraction affords some insight into general manipulator dynamics - for example we know that ${\bf H}$ is always positive definite, and symmetricAsada86(p.107) and has a beautiful sparsity patternFeatherstone05 that we'll attempt to take advantage of in our algorithms.

Continuing our abstractions, we find that the equations of motion of a general robotic manipulator (sans kinematic loops) take the form $${\bf H}(\bq)\ddot{\bq} + {\bf C}(\bq,\dot{\bq})\dot{\bq} + {\bf g}(\bq) = {\bf B}(\bq)\bu,$$ where $\bq$ is the state vector, ${\bf H}$ is the inertial matrix, ${\bf C}$ captures Coriolis forces, and ${\bf g}$ captures potentials (such as gravity). The matrix ${\bf B}$ maps control inputs $\bu$ into generalized forces.

Manipulator Equation form of the Simple Double Pendulum

The equations of motion from Example 1 can be written compactly as: \begin{align*} {\bf H}(\bq) =& \begin{bmatrix} (m_1 + m_2)l_1^2 + m_2 l_2^2 + 2 m_2 l_1l_2 c_2 & m_2 l_2^2 + m_2 l_1 l_2 c_2 \\ m_2 l_2^2 + m_2 l_1 l_2 c_2 & m_2 l_2^2 \end{bmatrix} \\ {\bf C}(\bq,\dot\bq) =& \begin{bmatrix} 0 & -m_2 l_1 l_2 (2\dot{q}_1 + \dot{q}_2)s_2 \\ m_2 l_1 l_2 \dot{q}_1 s_2 & 0 \end{bmatrix} \\ {\bf g}(\bq) =& g \begin{bmatrix} (m_1 + m_2) l_1 s_1 + m_2 l_2 s_{1+2} \\ m_2 l_2 s_{1+2} \end{bmatrix} , \quad {\bf B} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \end{align*} Note that this choice of the ${\bf C}$ matrix was not unique.

The manipulator equations are very general, but they do define some important characteristics. For example, $\ddot{\bq}$ is (state-dependent) linearly related to the control input, $\bu$. This observation justifies the control-affine form of the dynamics assumed throughout the notes.

Note that we have chosen to use the notation of second-order systems (with $\dot{\bq}$ and $\ddot{\bq}$ appearing in the equations) throughout this book. Although I believe it provides more clarity, there is an important limitation to this notation: it is impossible to describe 3D rotations in "minimal coordinates" using this notation without introducing kinematic singularities (like the famous "gimbal lock"). For instance, a common singularity-free choice for representing a 3D rotation is the unit quaternion, described by 4 real values (plus a norm constraint). However we can still represent the rotational velocity without singularities using just 3 real values. This means that the length of our velocity vector is no longer the same as the length of our position vector. For this reason, you will see that most of the software in uses the more general notation with $\bv$ to represent velocity, $\bq$ to represent positions, and the manipulator equations are written as \[ {\bf H}(\bq) \dot{\bv} + {\bf C}(\bq,\bv)\bv + {\bf g}(\bq) = {\bf B}(\bq) \bu, \] which is not necessarily a second-order system. See Duindam06 for a nice discussion of this topic.

Bilateral Position Constraints

Consider constraint equation $$\phi(q) = 0.$$ These arise, for instance, when the system has a closed kinematic chain. The (floating base) equations of motion can be written $$H(q)\ddot{q} + c(q,\dot{q}) = B u + J(q)^T \lambda,$$ where $J(q) = \pd{\phi}{q}$ and $\lambda$ is the constraint force.

To solve for $\lambda$, observe that when the constraint is imposed, $\phi(q)=0$ and therefor $\dot{\phi}=0$ and $\ddot{\phi}=0$. Writing this out, we have \begin{gather*}\dot{\phi} = J(q)\dot{q}=0,\\ \ddot{\phi} = J(q) \ddot{q} + \dot{J}(q) \dot{q} = 0.\end{gather*} Inserting the dynamics and solving for $\lambda$ yields $$\lambda =- (J H^{-1} J^T)^+ (J H^{-1} (Bu - c) + \dot{J}\dot{q}).$$ The $^+$ notation refers to a Moore-Penrose pseudo-inverse. In most cases, there are less constraints than degrees of freedom, in which case the inverse has a unique solution (and the traditional inverse could have been used). But the pseudo-inverse also works in cases where the system is over-constrained.

For numerical stability, one might like to add a restoring force to this constraint in the event that the constraint is not satisfied to numerical precision. To accomplish this, we can implement $$\ddot\phi = -\frac{2}{\epsilon} \dot\phi(q) - \frac{1}{\epsilon^2} \phi(q).$$ Carrying this through yields \begin{gather*} \lambda =- (J H^{-1} J^T)^+ (J H^{-1} (Bu - c) + (\dot{J} + \frac{2}{\epsilon} J)\dot{q} + \frac{1}{\epsilon^2} \phi). \end{gather*}

Bilateral Velocity Constraints

Consider the constraint equation $$\psi(q,\dot{q}) = 0,$$ where $\pd{\psi}{\dot{q}} \ne 0.$ These are less common, but arise when, for instance, a joint is driven through a prescribed motion. Here, the manipulator equations are given by $$H(q)\ddot{q} + c = B u + \pd{\psi}{\dot{q}}^T \lambda.$$ To solve for $\lambda$, we take $$\dot{\psi} = \pd{\psi}{q} \dot{q} + \pd{\psi}{\dot{q}}\ddot{q} = 0,$$ which yields $$\lambda = - \left( \pd{\psi}{\dot{q}} H^{-1} \pd{\psi}{\dot{q}} \right)^+ \left[ \pd{\psi}{\dot{q}} H^{-1} (Bu - c) + \pd{\psi}{q} \dot{q} \right].$$

Again, for numerical stability, we ask instead for $\dot{\psi} = -\frac{1}{\epsilon} \psi$, which yields $$\lambda = - \left( \pd{\psi}{\dot{q}} H^{-1} \pd{\psi}{\dot{q}} \right)^+ \left[ \pd{\psi}{\dot{q}} H^{-1} (Bu - c) + \pd{\psi}{q} \dot{q} + \frac{1}{\epsilon} \psi \right].$$

Impulsive Collisions

Now let $\phi(q)$ indicate the relative (signed) distance between two rigid bodies. The collision event is described by the zero-crossings (from positive to negative) of $\phi$. Let us start by assuming frictionless collisions, allowing us to write $$H\ddot{q} + c = Bu + J^T \lambda,$$ where $\lambda \ge 0$ is now a (scalar) coefficient of the impulsive force that is well-defined when integrated over the time of the collision (denoted $t_c^-$ to $t_c^+$). Integrate both sides of the equation over that (instantaneous) interval: \begin{align*}\int_{t_c^-}^{t_c^+} dt\left[ H \ddot{q} + C \right] = \int_{t_c^-}^{t_c^+} dt \left[ Bu + J^T \lambda \right] \end{align*} Since $q$ and $u$ are constants over this interval, we are left with $$H\dot{q}^+ - H\dot{q}^- = J^T \int_{t_c^-}^{t_c^+} \lambda dt,$$ where $\dot{q}^+$ is short-hand for $\dot{q}(t_c^+)$. Multiplying both sides by $J H^{-1}$, we have $$ J \dot{q}^+ - J\dot{q}^- = J H^{-1} J^T \int_{t_c^-}^{t_c^+} \lambda dt.$$ After the collision, we have $\dot\phi^+ = -e \dot\phi^-$, with $0 \le e \le 1$ denoting the coefficient of restitution, yielding: $$ J \dot{q}^+ - J\dot{q}^- = -(1+e)J\dot{q}^-,$$ and therefore $$\int_{t_c^-}^{t_c^+} \lambda dt = - (1+e)\left[ J H^{-1} J^T \right]^+ J \dot{q}^-.$$ Substituting this back in above results in $$\dot{q}^+ = \left[ I - (1+e)H^{-1} J^T \left[ J H^{-1} J^T \right]^+ J \right] \dot{q}^-.$$

We can add friction at the contact. Rigid bodies will almost always experience contact at only a point, so we typically ignore torsional friction Featherstone06, and model only tangential friction by extending $J$ to include the two vectors that span the tangential coordinates at the contact, written in joint coordinates (c.f. Stewart96). Then $\lambda$ becomes a Cartesian force vector, but the equations above remain unchanged. If $\lambda$ is restricted to a friction cone, as in Coulomb friction, in general we have to solve an optimization.

frictional contact

Hybrid Dynamics

We can put the bilateral constraint equations and the impulsive collision equations together to implement a hybrid model for unilateral constraints of the form $$\phi(q) \ge 0.$$ At every moment, some subset of the contacts are active, and can be treated as a bilateral constraint ($\phi_i=0$). The guards of the hybrid model are when an inactive constraint becomes active ($\phi_i>0 \rightarrow \phi_i=0$), or when an active constraint becomes inactive ($\lambda_i>0 \rightarrow \lambda_i=0$). Note that collision events will (almost always) only result in new active constraints when $e=0$, e.g. we have pure inelastic collisions.

If the contact involves Coulomb friction, then the transitions between sticking and sliding can be modeled as additional hybrid guards, or can be solved efficiently as a complementarity problem Steward96.

Compliant Contact Models

Recursive Dynamics Algorithms

The equations of motions for our machines get complicated quickly. Fortunately, for robots with a tree-link kinematic structure, there are very efficient and natural recursive algorithms for generating these equations of motion. For a detailed reference on these methods, see Featherstone07; some people prefer reading about the Articulated Body Method in Mirtich96. The software libraries distributed with this text include a complete implementation; you can also some matlab algorithms available on Roy Featherstone's website, or try googling "Open Dynamics Engine" or "Simulation Construction Set".

numerical integration. discrete-time approximations of continuous systems (lots of edx people had trouble with this. some found: http://web.mit.edu/10.001/Web/Course_Notes/Differential_Equations_Notes/node3.html)

Parameter Estimation

lumped parameters. identifiable lumped parameters. least-squares estimation. power formulations. trajectory design.

Optimization Preliminaries

Nonlinear optimization

General formulation

Solution techniques

Lagrange multipliers / KKT, Gradient descent, SQP (SNOPT, NLOPT, IPOPT), Global optimization

Example: Inverse Kinematics

Convex optimization

Linear Programs/Quadratic Programs/Second-Order Cones

Example: Balancing force control on Atlas

Semidefinite Programming and Linear Matrix Inequalities

Sums of squares / polynomial optimization

Solution techniques

Interior point (Gurobi, Mosek, Sedumi, ...), First order methods

Mixed-integer convex optimization

Combinatorial optimization

Search, SAT, First order logic, SMT solvers, LP interpretation
Polynomial root finding/homotopy, L1 optimization, QCQP, LCP/Variational inequalities, BMI, ...

Machine Learning Preliminaries

Function approximation


What is ?

Drake ("dragon" in Middle English) is a toolbox maintained by the Robot Locomotion Group at the MIT Computer Science and Artificial Intelligence Lab (CSAIL) which contains implementations of many of the ideas from these notes. It was designed with the goal of rapidly prototyping advanced control ideas, but has also been optimized (in places) for performance -- it provides the complete planning and control backend for most of our research robots, including MIT's entry into the DARPA Robotics Challenge, and now has users from academia and industry around the world.

is implemented using a hierarchy of software classes (in MATLAB and C++) which are designed to expose and exploit available structure in input-output dynamical systems. While some algorithms are available for general nonlinear systems, specialized algorithms are available for polynomial dynamical systems, linear dynamical systems, ..., and especially rigid-body mechanical systems; many of those algorithms operate symbolically on the governing equations. The toolbox does a lot of work behind the scenes to make sure that, for instance, feedback or cascade combinations of polynomial systems remain polynomial. Another major focus is on numerical-optimization-based design and analysis -- for instance, in addition to providing a full rigid-body dynamics engine the software also provides analytical gradients for those dynamics. parses a number of robot description formats (such as "URDF"), uses the Simulink solvers for simulation of dynamical systems, and connects with a number of external tools to facilitate design and analysis.

Relative to Simulink and SimMechanics

Roughly speaking, MATLAB's Simulink provides a very nice interface for describing dynamical systems (as S-Functions), a graphical interface for combining these systems in very nontrivial ways, and a number of powerful solvers for simulating the resulting systems. For simulation analysis, it provides everything we need. However the S-Function abstraction which makes Simulink powerful also hides some of the detailed structure available in the equations governing a dynamical system; for the purposes of control design and analysis I would like to be able to declare that a particular system is governed by analytic equations, or polynomial equations, or even linear equations, and for many of the tools it is important to be able to manipulate these equations symbolically.

You can think of as a layer built on top of the Simulink engine which allows you to defined structured dynamical systems. Every dynamical system in can be simulated using the Simulink engine, but also provides a number of tools for analysis and controller design which take advantage of the system structure. While it is possible to use the Simulink GUI with , the standard workflow makes use of command-line methods which provide a restricted set of tools for combining systems in ways that, whenever possible, preserve the structure in the equations.

Like SimMechanics, provides a number of tools for working specifically with multi-link rigid body systems. In SimMechanics, you can describe the system directly in the GUI whereas in you describe the system in an XML file. SimMechanics has a number of nice features, such as integration with SolidWorks, and almost certainly provides more richness and faster code for simulating complex rigid body systems. on the other hand will provide more sophisticated tools for analysis and design, but likely will never support as many gears, friction models, etc. as SimMechanics.

For controlling real hardware

also has many interfaces which allow it to connect to other components of a robotic / control system. Inputs and outputs of the dynamical systems, or static matlab functions (e.g. for trajectory planning), can be connected directly to network interfaces. We primarily use Lightweight Communications and Marshalling (LCM) Huang10 to make these connections; support for other protocols (such as ROS) can be added easily or accomplished via an independent network translator. This approach has been used extensively on real hardware experiments at MIT, and formed the foundation of our solution for controlling a complex humanoid for MIT's entry into the DARPA Robotics Challenge (with Drake nodes running in desktop MATLAB instances inside the real-time feedback loops). In addition to the primary MATLAB front end, a number of methods for kinematics and dynamics of rigid-body systems are also available directly as a C++ library.

Getting started

For everything you need to get started, including installation instructions, a discussion forum, and a (modest but growing) gallery of examples, please visit the website: http://drake.mit.edu.

The remainder of this chapter will give you an important conceptual overview of defining and working with dynamical systems in drake, and will give a number of simple examples for you to start with.

If you have questions, please search the previous questions on the github issues page, and post a new question if you aren't able to find the solution.

Modeling Input-Output Dynamical Systems

The fundamental object in is a dynamical system. Robots, controllers, state estimators, etc. are all instances of dynamical systems. Algorithms in operate on dynamical systems. This chapter introduces what you need to know to instantiate the dynamical systems that you are interested in, and to simulate and visualize their outputs.

Dynamical systems in are represented by their dynamics in state-space form, with $x$ denoting the state vector. In addition, every dynamical system can have an input vector, $u$, and an output vector $y$.

As we will see, dynamical systems in can be instantiated in a number of different ways. A system can be described by a small block of MATLAB or C++ code deriving from our DrakeSystem class interface, or -- in the case of rigid-body systems -- even by simple XML file (see the section below on the URDF format) that can be parsed by our existing class implementations.

Writing your own dynamics in MATLAB

In this section, we will describe how you can write your own dynamics class in MATLAB by deriving from the DrakeSystem interface class.

Continuous-Time Systems. Consider a basic continuous-time nonlinear input-output dynamical system described by the following state-space equations: \begin{gather*} \dot{x} = f(t,x,u), \\ y = g(t,x,u). \end{gather*} In , you can instantiate a system of this form where $f()$ and $g()$ are anything that you can write into a MATLAB function. This is accomplished by deriving from the DrakeSystem class, defining the size of the input, state, and output vectors in the constructor, and overloading the dynamics and output methods, as illustrated by the following example:

A simple continuous-time system

Consider the system \begin{gather*}\dot{x} = -x + x^3,\\ y = x.\end{gather*} This system has zero inputs, one (continuous) state variable, and one output. It can be implemented in using the following code:

classdef SimpleCTExample < DrakeSystem
    function obj = SimpleCTExample()
      % call the parent class constructor:
      obj = obj@DrakeSystem(...
         1, ... % number of continuous states
         0, ... % number of discrete states
         0, ... % number of inputs
         1, ... % number of outputs
         false, ... % because the output does not depend on u
         true);  % because the dynamics and output do not depend on t
    function xdot = dynamics(obj,t,x,u)
      xdot = -x+x^3;
    function y=output(obj,t,x,u)

Discrete-Time Systems. Implementing a basic discrete-time system in is very analogous to implementing a continuous-time system. The discrete-time system given by: \begin{gather*} x[n+1] = f(n,x,u),\\ y[n] = g(n,x,u), \end{gather*} can be implemented by deriving from DrakeSystem and defining the update and output methods, as seen in the following example.

A simple discrete-time system

Consider the system \begin{gather*}x[n+1] = x^3[n],\\ y[n] = x[n].\end{gather*} This system has zero inputs, one (discrete) state variable, and one output. It can be implemented in using the following code:

classdef SimpleDTExample < DrakeSystem
    function obj = SimpleDTExample()
      % call the parent class constructor:
      obj = obj@DrakeSystem(...
         0, ... % number of continuous states
         1, ... % number of discrete states
         0, ... % number of inputs
         1, ... % number of outputs
         false, ... % because the output does not depend on u
         true);  % because the update and output do not depend on t
    function xnext = update(obj,t,x,u)
      xnext = x^3;
    function y=output(obj,t,x,u)

Mixed Discrete and Continuous Dynamics. It is also possible to implement systems that have both continuous dynamics and discrete dynamics. There are two subtleties that must be addressed. First, we'll denote the discrete states as $x_d$ and the continuous states as $x_c$, and the entire state vector $x = [x_d^T, x_c^T]^T$. Second, we must define the timing of the discrete dynamics update relative to the continuous time variable $t$; we'll denote this period with $\Delta_t$. Then a mixed system can be written as:\begin{gather*} \dot{x}_c = f_c(t,x,u),\\ x_d(t+t') = f_d(t,x,u), \quad \forall t \in \{0,\Delta_t,2\Delta_t,...\},\forall t' \in (0,\Delta_t] \\ y=g(t,x,u). \end{gather*} Note that, unlike the purely discrete time example, the solution of the discrete time variable is defined for all $t$. To implement this, derive from DrakeSystem and implement the dynamics, update, output methods for $f_c()$, $f_d()$, and $g()$, respectively. To define the timing, you must also implement the getSampleTime method. Type help DrakeSystem/getSampleTime at the MATLAB prompt for more information. Note that currently only supports a single DT sample time for each individual DrakeSystem (but multiple systems can be combined with different sampling times).

A mixed discrete- and continuous-time example

Consider the system \begin{gather*}x_1(t+t') = x_1^3(t), \quad \forall t \in \{0,1,2,..\}, \forall t' \in (0,1], \\\dot{x}_2 = -x_2(t) + x_2^3(t), \end{gather*} which is the combination of the previous two examples into a single system. It can be implemented in using the following code:

classdef SimpleMixedCTDTExample < DrakeSystem
    function obj = SimpleMixedCTDTExample()
      % call the parent class constructor:
      obj = obj@DrakeSystem(...
         1, ... % number of continuous states
         1, ... % number of discrete states
         0, ... % number of inputs
         2, ... % number of outputs
         false, ... % because the output does not depend on u
         true);  % because the update and output do not depend on t
    function ts = getSampleTime(obj)
      ts = [[0;0], ...  % continuous and discrete sample times
        [1;0]];         % with dt = 1
    function xdnext = update(obj,t,x,u)
      xdnext = x(1)^3;       % the DT state is x(1)
    function xcdot = dynamics(obj,t,x,u);
      xcdot = -x(2)+x(2)^3;  % the CT state is x(2)
    function y=output(obj,t,x,u)

Systems with Constraints. Nonlinear input-output systems with constraints can also be defined. There are two distinct types of constraints supported: state constraints that can be modeled as a function $\phi(x) = 0$ and input constraints which can be modeled as $u_{min} \le u \le u_{max}$. For instance, we would write a continuous-time system with state and input constraints as: \begin{gather*} \dot{x} = f(t,x,u),\quad y = g(t,x,u), \\ \text{subject to } \phi(x) = 0, u_{min} \le u \le u_{max}. \end{gather*} These two types of constraints are handled quite differently.

Input constraints are designed to act in the same way that an actuator limit might work for a mechanical system. These act as a saturation nonlinearity system attached to the input, where for each element: $$y_i = \begin{cases} u_{max,i} & \text{if } u_i > u_{max,i} \\ u_{min,i} & \text{if } u_i < u_{min,i} \\ u_i & \text{otherwise.}\end{cases}$$ The advantage of using the input limits instead of implementing the saturation in your own code is that the discontinuity associated with hitting or leaving a saturation is communicated to the solver correctly, allowing for more efficient and accurate simulations. Input constraints are set by calling the setInputLimits method.

State constraints are additional information that you are providing to the solver and analysis routines. They should be read as "this dynamical system will only ever visit states described by $\phi(x)=0$". Evaluating the dynamics at a vector $x$ for which $\phi(x) \ne 0$ may lead to an undefined or non-sensible output. Telling about this constraint will allow it to select initial conditions which satisfy the constraint, simulate the system more accurately (with the constraint imposed), and restrict attention during analysis to the manifold defined by the constraints. However, \emph{the state constraints function should not be used to enforce an additional constraint that is not already imposed on the system by the governing equations.}. The state constraint should be simply announcing a property that the system already has, if simulated accurately. Examples might include a passive system with a known total energy, or a four-bar linkage in a rigid body whose dynamics are written as a kinematic tree + constraint forces. State constraints are implemented by overloading the stateConstraints method \emph{and} by calling setNumStateConstraints to tell the solver what to expect in that method.

Hybrid (Event-Driven) Systems. supports systems that are modeled as smooth, discrete- or continuous- time systems which transition between discrete modes based on some event. Simulink calls these models "State Machines". Examples include a walking robot model which undergoes a discrete impulsive collision event (and possibly a change to a different model) when a foot hits the ground, or the switching controller for swinging up the underactuated double pendulum (Acrobot) which switches from an energy-shaping swing-up controller to a linear balancing controller at the moment when the state arrives in a pre-specified neighborhood around the upright. An example hybrid system is illustrated in the figure below. Note that the internal mode dynamics could also have discrete-time dynamics or mixed discrete- and continuous-time dynamics.

Event-driven systems in are described using the language from the Hybrid Systems community. Transitions between individual modes are described by a guard function, denoted by $\phi(t,x,u)$ in the figure, which triggers a transition out of the current mode when $\phi \le 0$. The dynamics of the transition are given by the reset function, $x^+ = \Delta(t,x^-,u)$. Event-driven systems are constructed by creating (or inheriting from) an empty HybridDrakeSystem, then populating the system with modes (nodes in the graph) by calling addMode, and populating the system with transitions (edges in the graph) by calling addTransition. Note that it is often useful to create guard out of a logical combination of smooth guards (e.g. $x(1)>0$ and $x(2)<.5$); to accomplish this you should use the andGuard and notGuard methods. The output of a HybridDrakeSystem is the output of the active mode.

The bouncing ball: a hybrid system example

The dynamics of a vertically bouncing ball can be described by a HybridDrakeSystem with a single continuous mode to model the flight of the ball through the air and a discrete collision event at the moment that the ball hit the ground. This can be accomplished with the following two classes (which can be found in the examples/BouncingBall directory):

classdef BallFlightPhasePlant < DrakeSystem

    function obj = BallFlightPhasePlant()
      obj = obj@DrakeSystem(...
        2, ... % number of continuous states
        0, ... % number of discrete states
        0, ... % number of inputs
        1, ... % number of outputs
        false, ... % not direct feedthrough
        true); % time invariant

    function xdot = dynamics(obj,t,x,u)
      xdot = [x(2); -obj.g];  % qddot = -g; x=[q,qdot]

    function y = output(obj,t,x,u)
      y = x(1); % height of the ball

    g = 9.81;  % gravity

classdef BallPlant < HybridDrakeSystem

    function obj = BallPlant()
      obj = obj@HybridDrakeSystem(...
        0, ...  % number of inputs
        1);     % number of outputs

      % create flight mode system
      sys = BallFlightPhasePlant();
      obj = setInputFrame(obj,sys.getInputFrame);
      obj = setOutputFrame(obj,sys.getOutputFrame);
      [obj,flight_mode] = addMode(obj,sys);  % add the single mode

      g1=inline('x(1)-obj.r','obj','t','x','u');  % q-r<=0
      g2=inline('x(2)','obj','t','x','u'); % qdot<=0
      obj = addTransition(obj, ...
        flight_mode, ...           % from mode
        andGuards(obj,g1,g2), ...  % q-r<=0 & qdot<=0
        @collisionDynamics, ...    % transition method
        false, ...                 % not direct feedthrough
        true);                     % time invariant

    function [xn,m,status] = collisionDynamics(obj,m,t,x,u)
      xn = [x(1); -obj.cor*x(2)];      % qdot = -cor*qdot

      if (xn(2)<0.01) status = 1; % stop simulating if ball has stopped
      else status = 0; end


    r = 1;  % radius of the ball
    cor = .8;  % coefficient of restitution

Stochastic Systems. also provides limited support for working with stochastic systems. This includes continuous-time stochastic models of the form \begin{gather*} \dot{x}(t) = f(t,x(t),u(t),w(t)) \\ y(t) = g(t,x(t),u(t),w(t)), \end{gather*} where $w(t)$ is the vector output of a random process which generates Gaussian white noise. It also supports discrete-time models of the form \begin{gather*} x[n+1] = f(n,x[n],u[n],w[n]) \\ y[n] = g(n,x[n],u[n],w[n]), \end{gather*} where $w[n]$ is Gaussian i.i.d. noise, and mixed continuous- and discrete-time systems as described above. These are quite general models, since nearly any distribution can be approximated by a white noise input into a nonlinear dynamical system. Note that for simulation purposes, any continuous-time white noise, $w(t)$, is approximated by a band-limited white noise signal.

Stochastic models can be implemented in by deriving from StochastiDrakeSystem and implementing the stochasticDynamics, stochasticUpdate, and stochasticOutput methods.

A simple continuous-time stochastic system

Consider the system \begin{gather*}\dot{x} = -x + w,\\ y = x.\end{gather*} This system has zero inputs, one (continuous) state variable, and one output. It can be implemented in using the following code:

classdef LinearGaussianExample < StochasticDrakeSystem

    function obj = LinearGaussianExample
      obj = obj@StochasticDrakeSystem(...
        1, ... % number of continuous states
        0, ... % number of discrete states
        0, ... % number of inputs
        1, ... % number of outputs
        false, ...  % not direct feedthrough
        true, ...   % time invariant
        1, ... % number of noise inputs
        .01);  % time constant of w(t)

    function xcdot = stochasticDynamics(obj,t,x,u,w)
      xcdot = -x + w;

    function y = stochasticOutput(obj,t,x,u,w);

Important Special Cases. There are many special cases of dynamical systems with structure in the dynamics which can be exploited by our algorithms. Special cases of dynamical systems implemented in include

  • Second-order systems, given by $\ddot{q} = f(t,q,\dot{q},u),$ $y = g(t,q,\dot{q},u)$, and $\phi_1(q)=0$ and $\phi_2(q,\dot{q})=0$.
  • Rigid-body systems, governed by the manipulator equations, \begin{gather*}H(q)\ddot{q} + C(q,\dot{q})\dot{q} + G(q) = Bu + \pd{\phi_1}{q}^T \lambda_1 + \pd{\phi_2}{\dot{q}}^T \lambda_2\\ \phi_1(q) = 0, \quad \phi_2(q,\dot{q})=0\end{gather*} where $\lambda_1$ and $\lambda_2$ are forces defined implicitly by the constraints.
  • (Rational) polynomial systems, given by $e(x)\dot{x} = f(t,x,u)$, $y = g(t,x,u)$, subject to $\phi(x)=0$, where $e(), f(), g(),$ and $\phi()$ are all polynomial.
  • Linear time-invariant systems, given by \begin{gather*} \dot{x}_c = A_c x + B_c u,\\ x_d[n+1] = A_d x + B_d u, \\ y = Cx + Du, \end{gather*} and $\phi(x)=\{\}$.

These special cases are implemented by classes derived from DrakeSystem. You should always attempt to derive from the deepest class in the hierarchy that describes your system.

In some cases it is possible to convert between these derived classes. For example, it is possible to convert a rigid-body system to a rational polynomial system (e.g., by changing coordinates through a stereographic projection). Methods which implement these conversions are provided whenever possible.

Describing a rigid-body system with URDF

URDF = Universal Robot Description Format. Details coming soon.

Using existing Simulink models/blocks

Although most users of Drake never open a Simulink GUI, is built on top of the MATLAB Simulink engine. As such, systems can be used in any Simulink model, and any existing Simulink block or Simulink model (an entire Simulink diagram) which has a single (vector) input and single (vector) output can be used with the infrastructure.

Combinations of Systems

Feedback combination
Cascade combination

Whenever possible, structure in the equations is preserved on combination. A polynomial system that is feedback combined with another polynomial system produces a new polynomial system. However, if a polynomial system is feedback combined with a Simulink Block, then the new system is a DynamicalSystem, but not a PolynomialSystem.

A combination of two systems should return a system of the type that is the least common ancestor in the class hierarchy. There are two exceptions: combinations with a hybrid system stay hybrid, and combinations with a stochastic system stay stochastic.

Coordinate frames

Every dynamical system in Drake has a number of coordinate frames associated with it, including frames which describe the inputs, states, and outputs of the system. These coordinate frames provide extra information about the vector signals, including a name for each element. Must match to allow combination. Hybrid modes automatically inherit the input and output coordinate frame of the hybrid system; the hybrid system does not have a coordinate frame for the state.


Once you have acquired a DynamicalSystem object describing the dynamics of interest, the most basic thing that you can do is to simulate it. This is accomplished with the simulate method in the DynamicalSystem class, which takes the timespan of the simulation as a 1x2 vector of the form [t0 tf] and optionally a vector initial condition as input. Type help DynamicalSystem/simulate for further documentation about simulation options.

Every simulate method outputs a instance of the Trajectory class. To inspect the output, you may want to evaluate the trajectory at any snapshot in time using eval, plot the output using fnplt, or hand the trajectory to a visualizer (described below).

Simulating the simple systems and plotting the results

Use the following code to instantiate the SimpleCTExample, simulate it, and plot the results:

sys = SimpleCTExample;
traj = simulate(sys, [0 10], .99);

The arguments passed to simulate set the initial time to $t_0=0$, final time to $t_f=5$, and initial condition to $x_0=.99$. The code looks the same for the SimpleDTExample:

sys = SimpleDTExample;
traj = simulate(sys, [0 10], .99);

These generate the plots below

Simulation of SimpleCTSystem (left) and SimpleDTSystem (right).

The simulate method sets the input, $u$, to the system to zero. In order to simulate with an input tape, you should cascade a Trajectory object describing $u(t)$ with the system, then simulate. If you do not specify the initial conditions on your call to simulate, then the getInitialState() method is called on your DynamicalSystem object. The default getInitialState method sets $x_0 = 0$; you should consider overloading this method in your DynamicalSystem class. In special cases, you may need to set initial conditions based on the initial time or input -- in this case you should overload the method getInitialStateWInput.

By default, uses the Simulink backend for simulation, and makes use of a number of Simulink's advanced features. For example, input limits can cause discontinuities in the derivative of a continuous time system, which can potentially lead to inaccuracy and inefficiency in simulation with variable-step solvers; avoids this by registering "zero-crossings" for each input saturation which allow the solver to handle the derivative discontinuity event explicitly, without reducing the step-size to achieve the accuracy tolerance. You can change the Simulink solver parameters using the setSimulinkParam method. For DrakeSystems, you can optionally use the MATLAB's ode45 suite of solvers using the method simulateODE, which can also understand (most of) the Simulink solver parameters.


Visualizers are dynamical systems, too.


Use ball bouncing as an example.

Cascading. Use the realtime block.

Outputting to a movie format

There are a number of options for creating a movie file from your simulation, depending on which Visualizer you used. Given a trajectory, traj, and a Visualizer object, v, use


If you specify an extension in your filename when you are prompted, then playbackMovie will attempt to write in the specified format. If you specify a filename with no extension, then you will be prompted with the output formats supported by your current visualizer.

Listing of examples throughout this text

The best overview we have for the remaining capabilities in are the main chapters of these notes. For convenience, here is a list of software examples that are sprinkled throughout the text:

Coming soon!

External interfaces

This section explains the most common ways that we interface with real robots - using simple network protocols. Typically one or more instances of (desktop) MATLAB are up running methods/systems in as nodes in a planning/estimation/control network. However we are rapidly porting the more mature functionality down into C++ so that it is available as a library or for stand-alone applications.

Lightweight Communications and Marshalling (LCM)

LCM is a simple, elegant, and efficient network protocol developed at MIT during the DARPA Urban Challenge (http://lcm-proj.github.io/) Huang10. For , we primarily interact with LCM via the MATLAB/Java interface and lcm-java. For some high-performance applications, we write custom C mex function which send and receive LCM messages.

LCM Coders

LCM passes messages defined in an lcmtype file which can describe an almost arbitrary data structure. Drake traffics primarily in signals, which are (timestamped) vectors of doubles. To convert between the two, we define an LCMCoder class which defines how to encode a double as the rich data structure and decode the data structure back into a double.

You can implement your LCMCoder directly in MATLAB or in Java. The LCMCoderFromType class can also construct one automatically using Java reflection -- it simply concatenates the elements of the structure into a double vector in the order that they appear in the lcmtype file.

LCM Coordinate Frames

Any coordinate frame can be associated with an LCMCoder by deriving from the LCMCoordinateFrame class.

In order to use any DynamicalSystem as a node in the LCM network, simply call runLCM(sys,...) instead of simulate(sys,...) -- this will automatically attach LCM input blocks and LCM output blocks on any input/output frames of the system that are of type LCMCoordinateFrame, and then simulate the system. The runLCM method takes a number of optional arguments allowing you to refine this interaction.

Interfacing with ROS

The Robot Operating System (ROS) is an immensely popular suite of tools for robotics research. Nodes in the ROS network communicate with a very similar message passing system to LCM -- one primary different is that ROS is based on TCP/IP and a server/client network configuration, where LCM is based on all-to-all multicast UDP. Typically when we need Drake to act as a node in a ROS network (or to subscribe to ROS messages), we send and receive LCM messages and have a thin "ROS-LCM translator" app that connects the two networks.

With the MathWork's recently announced support for ROS, implementing something like a ROSCoordinateFrame is now on our short-term todo list.

The online community

Again, please engage us on the forums on the website, http://drake.mit.edu if you have questions or find any problems with these examples. And please make sure you search the existing issues first.