Underactuated Robotics

Algorithms for Walking, Running, Swimming, Flying, and Manipulation

Russ Tedrake

© Russ Tedrake, 2024

Note: These are working notes used for a course being taught at MIT. They will be updated throughout the Spring 2024 semester. Lecture videos are available on YouTube.


Sampling-based motion planning

The term "motion planning" is a quite general term which almost certainly encompasses the dynamic programming, feedback design, and trajectory optimization algorithms that we have already discussed. However, there are a number of algorithms and ideas that we have not yet discussed which have grown from the idea of formulating motion planning as a search problem -- for instance searching for a path from a start to a goal in a graph which is too large to fit in memory. In order to deal with the continuous domains that are common in robotics, these algorithms often rely on randomized sampling.

My goal for this chapter is to introduce these additional tools into our toolkit. For robotics, they will play a particularly valuable role when the planning problem is geometrically complex (e.g. a robot moving around obstacles in 3D) or the optimization landscape is very nonconvex, because these are the problems where the nonlinear trajectory optimization formulations that we've studied before will potentially suffer badly from local minima. Many of these algorithms were developed initially for discrete or purely kinematic planning problems; a major theme of this chapter will be the adaptations that allow them to work for underactuated systems.

LaValle06 is a very nice book on planning algorithms in general and on motion planning algorithms in particular. Compared to other planning problems, motion planning typically refers to problems where the planning domain is continuous (e.g. continuous state space, continuous action space), but many motion planning algorithms trace their origins back to ideas in discrete domains. The term kinodynamic motion planning is often used when the plant is subject to dynamic constraints like underactuation, in addition to kinematic constraints like obstacle avoidance.

Large-scale Incremental Search

For decades, many AI researchers felt that the route to creating intelligent machines was to collect large ontologies of knowledge, and then perform very efficient search. Some notable examples from this storied history include Samuel's checker players, theorem proving, Cyc, and Deep Blue playing chess.

Indeed, thanks to decades of research, planning algorithms in AI have also scaled to impressive heights, making efficient use of heuristics and factorizations to solve very large planning instances. Since 1998, the International Conference on Automated Planning and Scheduling (ICAPS) has been hosting regular planning competitions which have helped to solidify problem specification formats and to benchmark the state of the art.

These algorithms have focused primarily on logical/discrete planning (although they do support "objects", and so can have a sort of open vocabulary). In the language of underactuated, this connects back to the discrete-state, discrete-action, discrete-time planning problems that we discussed to introduce dynamic programming as graph search. Dynamic programming is a very efficient algorithm to solve for the cost-to-go from every state, but if we only need to find an (optimal) path from a single start state to the goal, we can potentially do better. In particular, in order to scale to very large planning instances, one of the essential ideas is "incremental" search, which can avoid ever putting the entire (often exponentially large and sometimes even infinite) graph into memory.

Is it possible to find an optimal path from the start to a goal without visiting every node? Yes! Indeed, one of the key insights that powers these incremental algorithms is the use of admissible heuristics to guide the search -- an approximate cost-to-go which is guaranteed to never over-estimate the cost-to-go. A great example of this would be searching for directions on a map -- the straight-line distance from the start to the goal ("as the crow flies") is guaranteed to never over-estimate the true cost to go (which has to stay on the roads). The most famous search algorithm to leverage these heuristics is A*. In robotics, we often use online planning extensions, such as D* and D*-Lite.

Discrete A*

The A* algorithm solves a shortest-path problem on a graph defined by vertices $v_i \in V$, with source vertex $v_s$, target vertex $v_t$, and (weighted) edges defined implicitly by a function GetSuccessors($v_i$) which returns the set of vertices reachable from $v_i$. We additionally define:

  • $g(v)$ is the optimal "cost to come" from $v_s$ to $v$,
  • $h(v)$ is the optimal "cost to go" from $v$ to $v_t$,
  • $f(v) = g(v) + h(v)$ is the cost of the optimal path from $v_s$ to $v_t$ that is constrained to go through $v$.
Let's also define a path $p$ as a sequence of vertices (connected by edges in the graph), starting with $v_s$. We can then define the cost-to-come and cost-to-go for $p$:
  • $\tilde{g}(p)$ is the cost of traversing the path $p$,
  • $\tilde{h}(v)$ is a "heuristic" approximation of the optimal cost to go from $v$ to $v_t$,
  • $\tilde{f}(p) = \tilde{g}(p) + \tilde{h}(v)$, where $v$ is the last vertex in the sequence $p$.
We say that a heuristic is admissible if it never over-estimates the cost to go: $$\tilde{h}(v) \le h(v), \forall v \in V.$$
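
For example, in the map-directions problem above, if each vertex $v$ has planar coordinates $p_v$, then the straight-line ("as the crow flies") distance $$\tilde{h}(v) = \| p_v - p_{v_t} \|_2$$ is admissible: any path from $v$ to $v_t$ that stays on the roads is at least as long as the straight line, so $\tilde{h}(v) \le h(v)$.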

Using these definitions, the A* algorithm can be written as follows:


\(p = [v_s]\)  (current path)
\(S = \{ v_s : p \}\)  ("visited" dictionary)
\(Q.\text{insert}(p)\)  (priority queue prioritized by lowest \(\tilde{f}\))
while not \(Q.\text{empty}()\):
  \(p = Q.\text{pop}()\)
  \(u = p.\text{last}()\)
  if \(u = v_t\), then return \(p\)
  for all \(v \in\) GetSuccessors\((u)\), where \(v \notin p\):
    \(p' = \text{Path}(p, v)\)
    if \(v \notin S\) or \(\tilde{g}(p') < \tilde{g}(S[v])\):
      \(S[v] = p'\)
      \(Q.\text{insert}(p')\)
return failure
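
To make the pseudocode concrete, here is a minimal Python sketch of the same loop. It is not tied to any particular library: get_successors and h_tilde are placeholder names for the problem-specific successor function (returning (vertex, edge cost) pairs) and heuristic, and the tiny graph at the end is only an illustration.

import heapq
import itertools

def a_star(v_s, v_t, get_successors, h_tilde):
    """Search for a shortest path from v_s to v_t.

    get_successors(u) yields (v, edge_cost) pairs; h_tilde(v) is a
    (hopefully admissible) estimate of the cost to go from v to v_t.
    """
    counter = itertools.count()  # tie-breaker so the heap never compares paths
    Q = [(h_tilde(v_s), next(counter), 0.0, [v_s])]  # entries: (f_tilde, tie, g_tilde, path)
    best_g = {v_s: 0.0}  # plays the role of the "visited" dictionary S, storing the
                         # best cost-to-come found so far for each vertex
    while Q:
        f, _, g, p = heapq.heappop(Q)
        u = p[-1]
        if u == v_t:
            return p  # solution found
        for v, cost in get_successors(u):
            if v in p:
                continue  # don't revisit vertices already on this path
            g_new = g + cost
            if v not in best_g or g_new < best_g[v]:
                best_g[v] = g_new
                heapq.heappush(Q, (g_new + h_tilde(v), next(counter), g_new, p + [v]))
    return None  # failure: no path from v_s to v_t

# Example usage on a tiny weighted graph; with h_tilde = 0 this reduces to Dijkstra.
edges = {"s": [("a", 1.0), ("b", 4.0)], "a": [("t", 5.0)], "b": [("t", 1.0)]}
print(a_star("s", "t", lambda u: edges.get(u, []), lambda v: 0.0))  # ['s', 'b', 't']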

A* has a remarkable property: if $\tilde{h}(v)$ is an admissible heuristic, then A* will never expand paths for which $\tilde{f}(p) > f(v_s).$ Take a minute to appreciate that! If the heuristic $\tilde{h}(v)$ is the optimal cost to go, then A* will only visit vertices that are on the optimal path!

Unfortunately, the flip side of this is that A* will expand all paths in the graph for which $\tilde{f}(p) < f(v_s).$ Stronger heuristics lead to more performant/scalable search.

In addition to their empirical performance, there are a handful of features that one might like to understand about incremental planning algorithms like A*. The first is the notion of completeness; a planning algorithm is complete if it is guaranteed to find a feasible path from the start to the goal if one exists. A second is what guarantees we can obtain about optimality; an optimal planning algorithm is one that is guaranteed to find the optimal path. The A* algorithm is both complete and optimal.

In addition to approximate cost-to-go functions, there are numerous other heuristics that power state-of-the-art large-scale logical search algorithms. Another important one is factorization. For a robotics example, consider a robot manipulating many possible objects -- it's reasonable to plan the manipulation of one object assuming it's independent of the other objects and then to revise that plan only when the optimal plan ends up putting two objects on intersecting paths. Many of these heuristics are summarized nicely in the Fast Downward paper Helmert06; Fast Downward has been at the forefront of the ICAPS planning competitions for many years.

Some of the most visible success stories in deep learning today still make use of planning. For example: DeepMind's AlphaGo and AlphaZero combine the planning algorithms developed over the years in discrete games, notably Monte-Carlo Tree Search (MCTS), with learned heuristics in the form of policies and value functions. Finding ways to connect Large-Language Models (LLMs) with planning to support longer-term reasoning is an extremely active area of research.

Probabilistic RoadMaps (PRMs)

How can we generalize powerful incremental algorithms like A* to continuous domains?

If you remember how we introduced dynamic programming initially as a graph search, you'll recall that there were some challenges in discretizing the state space. Let's assume that we have discretized the continuous space into some finite set of discrete nodes in our graph. Even if we are willing to discretize the action space for the robot (this might even be acceptable in practice), we had a problem where discrete actions from one node in the graph, integrated over some finite interval $h$, are extremely unlikely to land exactly on top of another node in the graph. To combat this, we had to start working on methods for interpolating the value function estimate between nodes.
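
As a small, purely hypothetical illustration of that interpolation, suppose the cost-to-go estimate is stored only at the nodes of a one-dimensional mesh; a state that lands between two nodes can still be assigned a value by linear interpolation:

import numpy as np

# Hypothetical 1D example: cost-to-go estimates stored only at fixed mesh nodes.
x_nodes = np.linspace(0.0, 10.0, 11)   # the discretized states
J_nodes = np.abs(x_nodes - 7.0)        # e.g. an estimated cost-to-go for a goal at x = 7

def J(x):
    """Linearly interpolate the cost-to-go between neighboring mesh nodes."""
    return np.interp(x, x_nodes, J_nodes)

# A discrete action integrated over a finite interval rarely lands on a node...
x_next = 2.37
print(J(x_next))  # value iteration can use this interpolated value, but a search
                  # tree has no node at x_next from which to continue expanding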

add figure illustrating the interpolation here

Interpolation can work well if you are trying to solve for the cost-to-go function over the entire state space, but it's less compatible with search methods which are trying to find just a single path through the space. If I start in node 1, and land between node 2 and node 3, then which node do I continue to expand from?

Fortunately, the incremental search algorithms from logical search already give us a way to think about this -- we can simply build a search tree as the search executes, instead of relying on a predefined mesh discretization. This tree will contain nodes rooted in the continuous space at exactly the points that the system can reach.

Another potential problem with any fixed-mesh discretization of a continuous space, or even a fixed discretization of the action space, is that unless we have specific geometric / dynamic insights into our continuous system, it can be very difficult to provide a complete planning algorithm. A planning algorithm is complete if it always finds a path (from the start to the goal) when a path exists. Even if we can show that no path to the goal exists on the tree/graph, how can we be certain that there is no path for the continuous system? Perhaps a solution would have emerged if we had discretized the system differently, or more finely?

One approach to addressing this second challenge is to toss out the notion of a fixed discretization, and replace it with random sampling (another approach would be to adaptively add resolution to the discretization as the algorithm runs). Random sampling, e.g. of the action space, can yield algorithms that are probabilistically complete for the continuous space -- if a solution to the problem exists, then a probabilistically complete algorithm will find that solution with probability 1 as the number of samples goes to infinity.

Rapidly-exploring Random Trees (RRTs)

With these motivations in mind, we can build what is perhaps the simplest probabilistically complete algorithm for finding a path from a starting state to some goal region within a continuous state and action space:

Planning with a Random Tree

Let us denote the data structure which contains the tree as ${\cal T}$. The algorithm is very simple:

  • Initialize the tree with the start state: ${\cal T} \leftarrow \bx_0$.
  • On each iteration:
    • Select a random node, $\bx_{rand}$, from the tree, ${\cal T}$
    • Select a random action, $\bu_{rand}$, from a distribution over feasible actions.
    • Compute the dynamics: $\bx_{new} = f(\bx_{rand},\bu_{rand})$
    • If $\bx_{new} \in {\cal G}$, then terminate. Solution found!
    • Otherwise add the new node to the tree, ${\cal T} \leftarrow \bx_{new}$.
It can be shown that this algorithm is, in fact, probabilistically complete. However, without strong heuristics to guide the selection of the nodes scheduled for expansion, it can be extremely inefficient. For a simple example, consider the system $\bx[n+1] = \bx[n] + \bu[n]$ with $\bx \in \Re^2$ and $\bu_i \in [-1,1]$. We'll start at the origin and take the goal region to be $\forall i, 15 \le x_i \le 20$.
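
Here is a minimal Python sketch of this straw-man planner on that toy problem (the random seed and the 1000-iteration budget are arbitrary choices, made only for illustration):

import numpy as np

rng = np.random.default_rng(0)

# Straw-man random tree for the toy system x[n+1] = x[n] + u[n], u_i in [-1, 1],
# starting from the origin with goal region 15 <= x_i <= 20 for all i.
nodes = [np.zeros(2)]   # the tree T, initialized with the start state
parents = [None]        # parent indices, so that a solution path can be extracted

solved = False
for _ in range(1000):
    i = rng.integers(len(nodes))       # select a random node x_rand from the tree
    u = rng.uniform(-1, 1, size=2)     # select a random action u_rand
    x_new = nodes[i] + u               # compute the dynamics
    if np.all((x_new >= 15) & (x_new <= 20)):
        solved = True                  # x_new is in the goal region
        break
    nodes.append(x_new)                # otherwise, add the new node to the tree
    parents.append(i)

print("Solution found!" if solved else f"No luck after {len(nodes)} nodes.")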

Although this "straw-man" algorithm is probabilistically complete, it is certainly not efficient. After expanding 1000 nodes, the tree is basically a mess of node points all right on top of each other:

We're nowhere close to the goal yet, and this is a particularly easy problem instance.

The idea of generating a tree of feasible points has clear advantages, but it seems that we have lost the ability to mark a region of space as having been sufficiently explored. It seems that, to make randomized algorithms effective, we are, at the very least, going to need some form of heuristic for encouraging the nodes to spread out and explore the space.
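
One common such heuristic, which is exactly the idea behind the RRT, is the "Voronoi bias": rather than expanding a uniformly random node of the tree, sample a point anywhere in the state space, find the nearest node already in the tree, and grow that node toward the sample. A rough sketch on the same toy problem (the sampling box below is an assumption made only for this illustration):

import numpy as np

rng = np.random.default_rng(0)

# Sketch of the RRT-style "Voronoi bias" on the same toy system: grow the node
# of the tree that is nearest to a random sample, instead of a random node.
lo, hi = np.array([-5.0, -5.0]), np.array([25.0, 25.0])  # assumed sampling box
nodes = [np.zeros(2)]
parents = [None]

for _ in range(1000):
    x_sample = rng.uniform(lo, hi)                            # sample the state space
    i = int(np.argmin([np.linalg.norm(x - x_sample) for x in nodes]))  # nearest node
    u = np.clip(x_sample - nodes[i], -1, 1)                   # action toward the sample
    x_new = nodes[i] + u
    if np.all((x_new >= 15) & (x_new <= 20)):
        print("Solution found!")
        break
    nodes.append(x_new)
    parents.append(i)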


RRTs for robots with dynamics

Variations and extensions

RRT*, RRT-sharp, RRTx, ...

Kinodynamic-RRT*, LQR-RRT(*)

Complexity bounds and dispersion limits

Decomposition methods

Cell decomposition...

Approximate decompositions for complex environments (e.g. IRIS)

Exercises

RRT Planning

In this exercise we will write code for the Rapidly-exploring Random Tree (RRT). Building on this implementation, we will also implement RRT*, a variant of RRT that converges towards an optimal solution.

  1. Implement RRT
  2. Implement RRT*

References

  1. Steven M. LaValle, "Planning Algorithms", Cambridge University Press, 2006.

  2. Malte Helmert, "The fast downward planning system", Journal of Artificial Intelligence Research, vol. 26, pp. 191--246, 2006.
