Algorithms for Walking, Running, Swimming, Flying, and Manipulation

© Russ Tedrake, 2024

Last modified .

How to cite these notes, use annotations, and give
feedback.

**Note:** These are working notes used for a course being taught
at MIT. They will be updated throughout the Spring 2024 semester. Lecture videos are available on YouTube.

You can also download a PDF version of these notes (updated much less frequently) from here.

The PDF version of these notes are autogenerated from the HTML version. There are a few conversion/formatting artifacts that are easy to fix (please feel free to point them out). But there are also interactive elements in the HTML version are not easy to put into the PDF. When possible, I try to provide a link. But I consider the online HTML version to be the main version.

- Preface
- Chapter 1: Fully-actuated vs Underactuated Systems
- Motivation
- Honda's ASIMO vs. passive dynamic walkers
- Birds vs. modern aircraft
- Manipulation
- The common theme
- Definitions
- Feedback Equivalence
- Input and State Constraints
- Nonholonomic constraints
- Underactuated robotics
- Goals for the course
- Exercises
- Chapter 2: The Simple Pendulum
- Introduction
- Nonlinear dynamics with a constant torque
- The overdamped pendulum
- The undamped pendulum with zero torque
- Orbit calculations
- The undamped pendulum with a constant torque
- The torque-limited simple pendulum
- Energy-shaping control
- Exercises
- Chapter 3: Acrobots, Cart-Poles, and Quadrotors
- The Acrobot
- Equations of motion
- The Cart-Pole system
- Equations of motion
- Quadrotors
- The Planar Quadrotor
- The Full 3D Quadrotor
- Balancing
- Linearizing the manipulator equations
- Controllability of linear systems
- The special case of non-repeated eigenvalues
- A general solution
- Controllability vs. underactuated
- Stabilizability of a linear system
- LQR feedback
- Partial feedback linearization
- PFL for the Cart-Pole System
- Collocated
- Non-collocated
- General form
- Collocated linearization
- Non-collocated linearization
- Task-space partial feedback linearization
- Swing-up control
- Energy shaping
- Cart-Pole
- Acrobot
- Discussion
- Other model systems
- Exercises
- Chapter 4: Simple Models of Walking and Running
- Limit Cycles
- Poincaré Maps
- Simple Models of Walking
- The Rimless Wheel
- Stance Dynamics
- Foot Collision
- Forward simulation
- Poincaré Map
- Fixed Points and Stability
- Stability of standing still
- The Compass Gait
- The Kneed Walker
- Curved feet
- And beyond...
- Simple Models of Running
- The Spring-Loaded Inverted Pendulum (SLIP)
- Analysis on the apex-to-apex map
- SLIP Control
- SLIP extensions
- Hopping robots from the MIT Leg Laboratory
- The 2D Hopper
- Running on four legs as though they were one
- Towards human-like running
- A simple model that can walk and run
- Juggling
- Exercises
- Chapter 5: Highly-articulated Legged Robots
- A thought experiment
- A spacecraft model
- Robots with (massless) legs
- Center of pressure (CoP) and Zero-moment point (ZMP)
- The special case of flat terrain
- An aside: Zero-moment point derivation
- A note about impact dynamics
- ZMP-based planning
- Heuristic footstep planning
- Planning trajectories for the center of mass
- The ZMP "Stability" Metric
- From a CoM plan to a whole-body plan
- Centroidal dynamics
- Spatial momentum
- Generalization to multibody
- Whole-Body Control
- Footstep planning and push recovery
- Beyond ZMP planning
- Exercises
- Chapter 6: Model Systems with Stochasticity
- The Master Equation
- Stationary Distributions
- Finite Markov Decision Processes
- Dynamics of a Markov chain
- Extended Example: The Rimless Wheel on Rough Terrain
- Randomized smoothing of contact dynamics
- Noise models for real robots/systems.
- Chapter 7: Dynamic Programming
- Formulating control design as an optimization
- Additive cost
- Optimal control as graph search
- Continuous dynamic programming
- The Hamilton-Jacobi-Bellman Equation
- Solving for the minimizing control
- Numerical solutions for $J^*$
- Value iteration with function approximation
- Linear function approximators
- Value iteration on a mesh
- Neural fitted value iteration
- Continuous-time systems
- Extensions
- Discounted and average cost formulations
- Stochastic control for finite MDPs
- Stochastic interpretation of deterministic, continuous-state value iteration
- Linear Programming Dynamic Programming
- Sums-of-Squares Dynamic Programming
- Exercises
- Chapter 8: Linear Quadratic Regulators
- Basic Derivation
- Local stabilization of nonlinear systems
- Finite-horizon formulations
- Finite-horizon LQR
- Time-varying LQR
- Local trajectory stabilization for nonlinear systems
- Linear Quadratic Optimal Tracking
- Linear Final Boundary Value Problems
- Variations and extensions
- Discrete-time Riccati Equations
- LQR with input and state constraints
- LQR on a manifold
- LQR for linear systems in implicit form
- LQR as a convex optimization
- Finite-horizon LQR via least squares
- Minimum-time LQR
- Parameterized Riccati Equations
- Exercises
- Notes
- Finite-horizon LQR derivation (general form)
- Chapter 9: Lyapunov Analysis
- Lyapunov Functions
- Global Stability
- LaSalle's Invariance Principle
- Relationship to the Hamilton-Jacobi-Bellman equations
- Lyapunov functions for estimating regions of attraction
- Robustness analysis using "common Lyapunov functions"
- Barrier functions
- Lyapunov analysis with convex optimization
- Linear systems
- Global analysis for polynomial systems
- Region of attraction estimation for polynomial systems
- The S-procedure
- Basic region of attraction formulation
- The equality-constrained formulation
- Searching for $V(\bx)$
- Convex outer approximations
- Regions of attraction codes in Drake
- Robustness analysis using the S-procedure
- Piecewise-polynomial systems
- Rigid-body dynamics are (rational) polynomial
- Linear feedback and quadratic forms
- Alternatives for obtaining polynomial equations
- Verifying dynamics in implicit form
- Finite-time Reachability
- Time-varying dynamics and Lyapunov functions
- Finite-time reachability
- Reachability via Lyapunov functions
- Control design
- Control design via alternations
- Global stability
- Maximizing the region of attraction
- State feedback for linear systems
- Control-Lyapunov Functions
- Approximate dynamic programming with SOS
- Upper and lower bounds on cost-to-go
- Linear Programming Dynamic Programming
- Sums-of-Squares Dynamic Programming
- Alternative computational approaches
- Sampling Quotient-Ring Sum-of-Squares
- "Satisfiability modulo theories" (SMT)
- Mixed-integer programming (MIP) formulations
- Continuation methods
- Neural Lyapunov functions
- Contraction metrics
- Other variations and extensions
- Exercises
- Chapter 10: Trajectory Optimization
- Problem Formulation
- Convex Formulations for Linear Systems
- Direct Transcription
- Direct Shooting
- Computational Considerations
- Continuous Time
- Nonconvex Trajectory Optimization
- Direct Transcription and Direct Shooting
- Direct Collocation
- Pseudo-spectral Methods
- Dynamic constraints in implicit form
- Solution techniques
- Efficiently computing gradients
- The special case of direct shooting without state constraints
- Penalty methods and the Augmented Lagrangian
- Zero-order optimization
- Getting good solutions... in practice.
- Local Trajectory Feedback Design
- Finite-horizon LQR
- Model-Predictive Control
- Receding-horizon MPC
- Recursive feasibility
- MPC and Lyapunov functions
- Case Study: A glider that can land on a perch like a bird
- The Flat-Plate Glider Model
- Trajectory optimization
- Trajectory stabilization
- Trajectory funnels
- Beyond a single trajectory
- Pontryagin's Minimum Principle
- Lagrange multiplier derivation of the adjoint equations
- Necessary conditions for optimality in continuous time
- Variations and Extensions
- Differential Flatness
- Iterative LQR and Differential Dynamic Programming
- Leveraging combinatorial optimization
- Explicit model-predictive control
- Exercises
- Chapter 11: Policy Search
- Problem formulation
- Linear Quadratic Regulator
- Policy Evaluation
- A nonconvex objective in ${\bf K}$
- No local minima
- True gradient descent
- More convergence results and counter-examples
- Trajectory-based policy search
- Infinite-horizon objectives
- Search strategies for global optimization
- Policy Iteration
- Chapter 12: Sampling-based motion planning
- Large-scale Incremental Search
- Probabilistic RoadMaps (PRMs)
- Getting smooth trajectories
- Rapidly-exploring Random Trees (RRTs)
- RRTs for robots with dynamics
- Variations and extensions
- Decomposition methods
- Exercises
- Chapter 13: Robust and Stochastic Control
- Stochastic models
- Costs and constraints for stochastic systems
- Finite Markov Decision Processes
- Linear optimal control
- Stochastic LQR
- Non-i.i.d. disturbances
- Stochastic linear MPC
- Worst-case control w/ bounded uncertainty
- Common Lyapunov functions
- Polytope dynamics
- Robust MPC
- Polytopic containment
- Robust constrained LQR
- Disturbance-based feedback parameterizations
- $L_2$ gain
- Dissipation inequalities
- Small-gain theorem
- Model uncertainty as a special case.
- Robust LQR as $\mathcal{H}_\infty$
- Linear Exponential-Quadratic Gaussian (LEQG)
- Adaptive control
- Structured uncertainty
- Linear parameter-varying (LPV) control
- Trajectory optimization
- Monte-carlo trajectory optimization
- Iterative $\mathcal{H}_2$/iLQG
- Nonlinear analysis and control
- Domain randomization
- Extensions
- Alternative risk/robustness metrics
- Chapter 14: Feedback Motion Planning
- Parameterized feedback policies as "skills"
- The rules of composition
- Parameterized controllers and Lyapunov functions
- Probabilistic feedback coverage
- Online planning
- Chapter 15: Output Feedback (aka Pixels-to-Torques)
- Background
- The classical perspective
- From pixels to torques
- Static Output Feedback
- A hardness result
- Perhaps a history of observations?
- Partially-observable Markov Decision Processes (POMDPs)
- Linear systems w/ Gaussian noise
- Linear Quadratic Regulator w/ Gaussian Noise (LQG)
- Trajectory optimization with Iterative LQG
- Observer-based Feedback
- Luenberger Observer
- Disturbance-based feedback
- Optimizing dynamic policies
- Convex reparameterizations of $H_2$, $H_\infty$, and LQG
- Policy gradient for LQG
- Sums-of-squares alternations
- Teacher-student learning
- Feedback from pixels
- Chapter 16: Algorithms for Limit Cycles
- Trajectory optimization
- Lyapunov analysis
- Transverse coordinates
- Transverse linearization
- Region of attraction estimation using sums-of-squares
- Feedback design
- For underactuation degree one.
- Transverse LQR
- Orbital stabilization for non-periodic trajectories
- Chapter 17: Planning and Control through Contact
- (Autonomous) Hybrid Systems
- Hybrid trajectory optimization
- Given a fixed mode sequence
- Direct shooting
- Deriving hybrid models: minimal vs floating-base coordinates
- Discrete control (between events)
- Hybrid LQR
- Hybrid Lyapunov analysis
- Contact-implicit trajectory optimization
- Leveraging combinatorial optimization
- Exercises
- Chapter 18: System Identification
- Problem formulation
- Equation error vs simulation error
- Online optimization
- Learning models for control
- Parameter Identification for Mechanical Systems
- Kinematic parameters and calibration
- Estimating inertial parameters (and friction)
- Simultaneous kinematic and inertial identification via lumped parameters.
- Identification using energy instead of inverse dynamics.
- Residual physics models with linear function approximators
- Experiment design as a trajectory optimization
- Online estimation and adaptive control
- Identification with contact
- Identifying (time-domain) linear dynamical systems
- From state observations
- Model-based Iterative Learning Control (ILC)
- Compression using the dominant eigenmodes
- Linear dynamics in a nonlinear basis
- From input-output data (the state-realization problem)
- Adding stability constraints
- Autoregressive models
- Statistical analysis of learning linear models
- Identification of finite (PO)MDPs
- From state observations
- Identifying Hidden Markov Models (HMMs)
- Neural network models
- Generating training data
- From state observations
- State-space models from input-output data (recurrent networks)
- Input-output (autoregressive) models
- Particle-based models
- Object-centric models
- Modeling stochasticity
- Control design for neural network models
- Alternatives for nonlinear system identification
- Identification of hybrid systems
- Task-relevant models
- Exercises
- Chapter 19: State Estimation
- Chapter 20: Model-Free Policy Search
- Policy Gradient Methods
- The Likelihood Ratio Method (aka REINFORCE)
- Sample efficiency
- Stochastic Gradient Descent
- The Weight Pertubation Algorithm
- Weight Perturbation with an Estimated Baseline
- REINFORCE w/ additive Gaussian noise
- Summary
- Sample performance via the signal-to-noise ratio.
- Performance of Weight Perturbation
- Chapter 21: Imitation Learning
- Behavior cloning
- Visuomotor policies (aka control from pixels)
- Supervised learning in a feedback loop: dealing with distribution shift
- Dealing with suboptimal and multimodal demonstrations
- Architectures for visuomotor policies
- (Multi-modal) input encoders
- Output/action decoders
- Diffusion Policy
- Denoising Diffusion models
- Diffusion Policy
- Diffusion Policy for LQG
- Inverse reinforcement learning
- Vistas
- Multitask / foundation models for control
- Distributed decentralized learning (aka "fleet learning")
- Be rigorous
- Appendix A: Drake
- Pydrake
- Online Jupyter Notebooks
- Running on Deepnote
- Running on Google Colab
- Enabling licensed solvers
- Running on your own machine
- Getting help
- Appendix B: Multi-Body Dynamics
- Deriving the equations of motion
- The Manipulator Equations
- Recursive Dynamics Algorithms
- Bilateral Position Constraints
- Bilateral Velocity Constraints
- Hybrid models via constraint forces
- The Dynamics of Contact
- Compliant Contact Models
- Rigid Contact with Event Detection
- Impulsive Collisions
- Putting it all together
- Time-stepping Approximations for Rigid Contact
- Complementarity formulations
- Anitescu's convex formulation
- Todorov's regularization
- The Semi-Analytic Primal (SAP) solver
- Beyond Point Contact
- Variational mechanics
- Virtual work
- D'Alembert's principle and the force of inertia
- Principle of Stationary Action
- Hamiltonian Mechanics
- Exercises
- Appendix C: Optimization and Mathematical Programming
- Optimization software
- General concepts
- Convex vs nonconvex optimization
- Constrained optimization with Lagrange multipliers
- Convex optimization
- Linear Programs/Quadratic Programs/Second-Order Cones
- Semidefinite Programming and Linear Matrix Inequalities
- Semidefinite programming relaxation of general quadratic optimization
- Sums-of-squares optimization
- Sums of squares on a Semi-Algebraic Set
- Sums of squares optimization on an Algebraic Variety
- DSOS and SDSOS
- Solution techniques
- Nonlinear programming
- Second-order methods (SQP / Interior-Point)
- First-order methods (SGD / ADMM)
- Penalty methods
- Projected Gradient Descent
- Zero-order methods (CMA)
- Example: Inverse Kinematics
- Mixed-discrete (combinatorial) and continuous optimization
- Search, SAT, First order logic, SMT solvers, LP interpretation
- Mixed-integer convex optimization
- Graphs of Convex Sets
- Shortest path problems
- Applications
- "Black-box" optimization
- Appendix D: An Optimization Playbook
- Matrices
- Ellipsoids
- Polytopes
- Perspective functions
- (Mixed-)Integer Programming
- Bilinear Matrix Inequalities (BMIs)
- Geometry (SE(3), Penetration, and Contact)
- Appendix E: Miscellaneous

**Model Systems**

**Nonlinear Planning and Control**

**Estimation and Learning**

**Appendix**

You can find documentation for the source code supporting these notes here.

This book is about nonlinear dynamics and control, with a focus on mechanical systems. I've spent my career thinking about how to make robots move robustly, but also with speed, efficiency, and grace. I believe that this is best achieved through a tight coupling between mechanical design, passive dynamics, and nonlinear control synthesis. These notes contain selected material from dynamical systems theory, as well as linear and nonlinear control. But the dynamics of our robots quickly get too complex for us to handle with a pencil-and-paper approach. As a result, the primary focus of these notes is on computational approaches to control design, especially using optimization and machine learning.

When I started teaching this class, and writing these notes, the computational approach to control was far from mainstream in robotics. I had just finished my Ph.D. focused on reinforcement learning (applied to a bipedal robot), and was working on optimization-based motion planning. I remember sitting at a robotics conference dinner as a young faculty, surrounded by people I admired, talking about optimization. One of the senior faculty said "Russ: the people that talk like you aren't the people that get real robots to work." Wow, have things changed. Now almost every advanced robot is using optimization or learning in the planning/control system.

Today, the conversations about reinforcement learning (RL) are loud and
passionate enough to drown out almost every other conversation in the room.
Ironically, now I am the older professor and I find myself still believing in
RL, but not with the complete faith of my youth. There is so much one can
understand about the structure of the equations that govern our mechanical
systems; algorithms which don't make use of that structure are missing
obvious opportunities for data efficiency and robustness. The dream is to
make the learning algorithms discover this structure on their own; but even
then it pays for you (the designer) to understand the optimization landscape
the learning systems are operating on. That's why my goal for this course is
to help *you* discover this structure, and to learn how to use this
structure to develop stronger algorithms and to guide your scientific
endeavors into learning-based control.

I'll go even further. I'm willing to bet that our views of intelligence
in 10-20 years will look less like feedforward networks with a training mode
and a test mode, and more like a *system* with dynamics that ebb and
flow in a beautiful dance with streams of incoming data and the ever-changing
dynamics of the environment. These systems will move more flexibly between
perception, forward prediction / sequential decision making, storing and
retrieving long-term memories, and taking action. Dynamical systems theory
offers us a way to understand and harness the complexity of these systems
that we are building.

Although the material in the book comes from many sources, the presentation is targeted very specifically at a handful of robotics problems. Concepts are introduced only when and if they can help progress the capabilities we are trying to develop. Many of the disciplines that I am drawing from are traditionally very rigorous, to the point where the basic ideas can be hard to penetrate for someone that is new to the field. I've made a conscious effort in these notes to keep a very informal, conversational tone even when introducing these rigorous topics, and to reference the most powerful theorems but only to prove them when that proof would add particular insights without distracting from the mainstream presentation. I hope that the result is a broad but reasonably self-contained and readable manuscript that will be of use to any enthusiastic roboticist.

The material in these notes is organized into a few main parts. "Model Systems" introduces a series of increasingly complex dynamical systems and overviews some of the relevant results from the literature for each system. "Nonlinear Planning and Control" introduces quite general computational algorithms for reasoning about those dynamical systems, with optimization theory playing a central role. Many of these algorithms treat the dynamical system as known and deterministic until the last chapters in this part which introduce stochasticity and robustness. "Estimation and Learning" follows this up with techniques from statistics and machine learning which capitalize on this viewpoint to introduce additional algorithms which can operate with less assumptions on knowing the model or having perfect sensors. The book closes with an "Appendix" that provides slightly more background (and references) for the main topics used in the course.

The order of the chapters was chosen to make the book valuable as a reference. When teaching the course, however, I take a spiral trajectory through the material, introducing robot dynamics and control problems one at a time, and introducing only the techniques that are required to solve that particular problem.

All of the examples and algorithms in this book, plus many more, are now
available as a part of our open-source software project:

Please see the appendix
for specific instructions for using