# Underactuated Robotics

Algorithms for Walking, Running, Swimming, Flying, and Manipulation

Russ Tedrake

How to cite these notes, use annotations, and give feedback.

Note: These are working notes used for a course being taught at MIT. They will be updated throughout the Spring 2023 semester. Lecture videos are available on YouTube.

# Output Feedback (aka Pixels-to-Torques)

In this chapter we will finally start considering systems of the form: \begin{gather*} \bx[n+1] = {\bf f}(\bx[n], \bu[n], \bw[n]) \\ \by[n] = {\bf g}(\bx[n], \bu[n], \bv[n]),\end{gather*} where most of these symbols have been described before, but we have now added $\by[n]$ as the output of the system, and $\bv[n]$ which is representing "measurement noise" and is typically the output of a random process. In other words, we'll finally start addressing the fact that we have to make decisions based on sensor measurements -- most of our discussions until now have tacitly assumed that we have access to the true state of the system for use in our feedback controllers (and that's already been a hard problem).

In some cases, we will see that the assumption of "full-state feedback" is not so bad -- we do have good tools for state estimation from raw sensor data. But even our best state estimation algorithms do add some dynamics to the system in order to filter out noisy measurements; if the time constants of these filters is near the time constant of our dynamics, then it becomes important that we include the dynamics of the estimator in our analysis of the closed-loop system.

In other cases, it's entirely too optimistic to design a controller assuming that we will have an estimate of the full state of the system. Some state variables might be completely unobservable, others might require specific "information-gathering" actions on the part of the controller.

For me, the problem of robot manipulation is one important application domain where more flexible approaches to output feedback become critically important. Imagine you are trying to design a controller for a robot that needs to button the buttons on your shirt. Our current tools would require us to first estimate the state of the shirt (how many degrees of freedom does my shirt have?); but certainly the full state of my shirt should not be required to button a single button. Or if you want to program a robot to make a salad -- what's the state of the salad? Do I really need to know the positions and velocities of every piece of lettuce in order to be successful? These questions are (finally) getting a lot of attention in the research community these days, under the umbrella of "learning state representations". But what does it mean to be a good state representation? There are a number of simple lessons from output feedback in control that can shed light on this fundamental question.

# The classical perspective

To some extent, this idea of calling out "output feedback" as an advanced topic is a relatively new problem. Before state-space and optimization-based approaches to control ushered in "modern control", we had "classical control". Classical control focused predominantly (though not exclusively) on linear time-invariant (LTI) systems, and made very heavy use of frequency-domain analysis (e.g. via the Fourier Transform/Laplace Transform). There are many excellent books on the subject; Hespanha09+Astrom10 are nice examples of modern treatments that start with state-space representations but also treat the frequency-domain perspective. "Pole placement" and "loop shaping" are some of the tools of this trade.

What's important for us to acknowledge here is that in classical control, basically everything was built around the idea of output feedback. The fundamental concept is the transfer function of a system, which is a input-to-output map (in frequency domain) that can completely characterize an LTI system. Core concepts like pole placement and loop shaping were fundamentally addressing the challenge of output feedback that we are discussing here. Sometimes I feel that, despite all of the things we've gained with modern, optimization-based control, I worry that we've lost something in terms of considering rich characterizations of closed-loop performance (rise time, dwell time, overshoot, ...) and perhaps even in practical robustness of our systems to unmodeled errors.

Add a few examples here that capture it.

# From pixels to torques

Just like some of our oldest approaches to control were fundamentally solving an output feedback problem, some of our newest approaches to control are doing it, too. Deep learning has revolutionized computer vision, and "deep imitation learning" and "deep reinforcement learning" have been a recent source of many impressive demonstrations of control systems that can operate directly from pixels (e.g. consuming the output of a deep perception system), without explicitly representing nor estimating the full state of the system. Unfortunately, the success or failure of these methods are not yet well understood, and they often require a great deal of artisanal tuning and an embarrassing (sometimes prohibitive) amount of computation.

The synthesis of ideas between machine learning (both theoretical and applied) and control theory is one of the most exciting and productive frontiers for research today. I am highly optimistic that we will be able to uncover the underlying principles and help transition this budding field into a technology. I hope that summarizing some of the key lessons from control here can help.

# Static Output Feedback

One of the extremely important almost unstated lessons from dynamic programming with additive costs and the Bellman equation is that the optimal policy can always be represented as a function $\bu^* = \pi^*(\bx).$ So far in these notes, we've assumed that the controller has direct access to the true state, $\bx$. In this chapter, we are finally removing that assumption. Now the controller only has direct access to the potentially noisy observations $\by$.

So the natural first question to ask might be, what happens if we write our policies now as a function, $\bu = \pi(\by)?$ This is known as "static" output feedback, in contrast to "dynamic" output feedback where the controller is not a static function, but is itself another input-output dynamical system. Unfortunately, in the general case it is not the case that optimal policies can be perfectly represented with static output feedback. But one can still try to solve an optimal control problem where we restrict our search to static policies; our goal will be to find the best controller in this class to minimize the cost.

# A hardness result

We've already seen an example of a very simple linear control problem where the set of stabilizing feedback gains formed a disconnected set -- which is suggestive that it could be a difficult problem for optimization. For some other problems in control, we've been able to find a convex reparametrization.

Unfortunately, Blondel97 showed that the question of whether a stabilizing static output feedback $\bu = -\bK \by$ even exists for a given system of the form $$\dot\bx = \bA\bx + \bB\bu,\quad \by = \bC \bx,$$ is NP-hard. Many of the strongest results from $H_2$ and $H_\infty$ design, for instance, are limited to dynamic controllers that can effectively reconstruct the entire state.

Just because this problem is NP-hard doesn't mean we can't find good controllers in practice. Some of the recent results from reinforcement learning have reminded us of this. We should not expect an efficient globally optimal algorithm that works for every problem instance; but we should absolutely keep working on the problem. Perhaps the class of problems that our robots will actually encounter in the real world is a easier than this general case (the standard examples of bad cases in linear systems, e.g. with interleaved poles and zeros, do feel a bit contrived and unlikely to occur in practice).

# Via policy search

Searching for the best controller within a parametric class of policies is generally referred to as policy search. If we do policy search on a class of static output feedback policies, how well does it perform? Of course, the answer depends on the particular governing equations (for instance, $\by = \bx$ is a perfectly reasonable output, and in this case the policy can be optimal). But we also have very simple counter-examples which demonstrate that the set of even stabilizing static output feedback controllers can form a disconnected set.

Jack had another nice example in his poly search lecture: https://youtu.be/JhjROrZxBhM?t=1099 and "what goes wrong in output feedback?" https://youtu.be/JhjROrZxBhM?t=5066 Bilinear alternations with SOS

# Observer-based Feedback

Since we know so much about designing full-state feedback controllers, one of the most natural (and dominant) approaches to control is to first design an observer (aka "state estimator"), and then to use state feedback. Famously, this approach is actually optimal for the quadratic regular objective on linear-Gaussian systems (LQG) -- this is known as the "separation principle". But it is certainly not optimal in general!

# Partially-observable Markov Decision Processes

Defer most of the discussion to the state estimation chapter

# Trajectory optimization with Iterative LQG

Note: I also have a draft chapter on planning under uncertainty

# Disturbance-based feedback

There is an interesting alternative to trying to observe/estimate the true state of the system, which can in some cases lead to convex formulations of the output-feedback objective. Rather than estimate the state (or belief state), one can try to estimate instead the disturbances which cause the state to deviate from the nominal trajectory, and design a feedback controller as a function of the disturbance. This is an old but important idea which was made famous first as the Youla parameterization (alternatively "Q-parameterization"). In the time domain this typically leads to controllers which are "unrolled in time" and depend on a potentially infinite history of disturbances; common practice it to approximate these with a finite-impulse response (FIR) truncation. One could imagine extracting a state-space realization of these FIR responses using the techniques from linear system identification Anderson17.

We can understand the essence of this idea with a simple extension of the LQR with least-squares derivation... (it's a work in progress!)

Given the state space equations: \begin{gather*} \bx[n+1] = \bA\bx[n] + \bB\bu[n] + \bw[n],\end{gather*} Consider parameterizing an output feedback policy of the form $$\bu[n] = \bK_0[n] \bx_0 + \sum_{i=1}^{n-1}\bK_i[n]\bw[n-i],$$ then the closed-loop state is convex in the control parameters, $\bK$: \begin{align*}\bx[n] =& \left( {\bf A}^n + \sum_{i=0}^{n-1}{\bf A}^{n-i-1}{\bf B}{\bf K}_0[i] \right) \bx_0 + \sum_{j=0}^{n-1} \sum_{i=0}^{n-1}{\bf A}^{n-i-1}{\bf B}{\bf K}_{j}[i] \bw[i-j],\end{align*} and therefore objectives that are convex in $\bx$ and $\bu$ (like LQR) are also convex in $\bK$. Moreover, we can calculate $\bw[n]$ by the time that it is needed given our observations of $\bx[n+1], \bx[n],$ and knowledge of $\bu[n].$

We can extend this to output disturbance-based feedback: \begin{gather*} \bx[n+1] = \bA\bx[n] + \bB\bu[n] + \bw[n],\\ \by[n] = \bC\bx[n] + \bv[n], \end{gather*} by parameterizing an output feedback policy of the form $$\bu[n] = \bK_0[n] \by + \sum_{i=1}^{n-1}\bK_i[n]{\bf e}[n-i],$$ where ${\bf e}[n] = ...$ Sadraddini20.

# Optimizing dynamic policies

State-space models. ARX Models.

# Convex reparameterizations of $H_2$, $H_\infty$, and LQG

DGKF (solving two Riccati equations)Doyle88

Scherer's convex reparameterizations of LQGScherer15.

# Sums-of-squares alternations

Coming soon. See, for instance, Chou23.

# Teacher-student learning

as seen in Marco Hutter, Pulkit, ...
Task-relevant variables / learning state representations (reference "task-relevant models" section of the sysid notes.

# Feedback from pixels

In my opinion, one of the most important advances in control in the last decade has been the introduction of high-rate feedback from cameras. This advance was enabled by the revolution in computer vision that came with deep learning. Especially in the domain of robotic manipulation, the value of this feedback is undeniable. Unfortunately, these sensors break many of the synthesis tools that we've discussed in the notes -- not only are they very high dimensional, but the space of RGB images is horrible and non-smooth. As of this writing, conventional wisdom is that model-based control does not have a lot to offer to this problem -- to design control from cameras, we are often limited to either imitation learning or black-box reinforcement learning. (I personally think that we have thrown the baby out with the bathwater, and consider a highly important research area to close this gap.)

# Visuomotor policies

Visuomotor policies. Sergey's original paper; Pete/Lucas paper...

Diffusion policies Chi23. Transformer parameterizations Zhao23.