In trying to learn trajectory optimization, I've found it
difficult to understand the terminology. This is in part due to the
complexity of the field, and in part due to conflicting information
on the topic. I've organized this page by trying to look at
different aspects that all methods have in common, and what the
different options are. In general, all topics discussed here have
to to with **transcription**, the process by which an optimal
control problem is transformed into a constrained non-linear
programming problem.

My information comes from a variety of sources: journal papers (mostly either by Anil Rao or Russ Tedrake), the documentation for GPOPS II, the source code of Drake, video lectures by Russ Tedrake, John Bett's book on optimal control, and a few websites.

- Matthew Kelly

All trajectory methods can either be described as
**shooting** methods or **simultaneous (collocation)**
methods. The key difference is that shooting methods use an
explicit integration scheme, where as simultaneous methods use an
implicit integration scheme to solve the dynamics.

Simple shooting schemes might use Euler integration or the mid-point method, while fancy shooting schemes would use high-order Runge-Kutta methods. Variable-order methods, such as Matlab's ode45, can also be used, but do so with caution: they can cause problems with the optimization solver. This occurs when a change in the decision variables causes the integration method to add an extra integration step.

Simultaneous methods, also called collocation methods, use implicit integration schemes. Low-order schemes would be backwards-Euler or the trapazoid method. High-order methods could include symplectic integrators or orthogonal quadrature.

As far as I can tell, there is no clear winner between the two schemes. The most important thing is that the integration scheme be well matched to the dynamics.

In general, all transcription methods represent a continuous trajectory using piece-wise polynomial interpolation. There are exceptions, but we can ignore them for now. In this framework there are two parameters: the number of polynomial segments, and the order of the interpolating polynomial.

One class of transcription methods, know as **h-methods**,
use low-order polynomials for interpolation, and a large number of
trajectory segments. Methods such as direct transcription and
direct collocation typically fall under this scheme, as well as
some implementations of multiple shooting. Convergence in a
h-method is typically obtained by increasing the number of
trajectory segments.

A second class of method, known as **p-methods**, use
high-order polynomials for interpolation, and a small number of
trajectory segments. Orthogonal collocation methods fall into this
category. Convergence in a p-method is typically obtained by
increasing the order of the interpolating polynomial.

There is no sharp division between these two methods, but rather a continuous spectrum. For example, the transcription program GPOPS II uses a hp-adaptive meshing method. It uses an intermediate number of trajectory segments, and medium-order interpolating polynomials. Convergence is obtained through increasing both the number of segments and the order of the polynomials, based on a special algorithm.

Which method is better? It depends on the problem. Even if the dynamics of the system are continuous, it is possible to have a discontinuous solution to the problem - imagine a bang-bang controller. If the underlying solution to the problem is analytic (smooth), then the a h-method will be superior. If the solution has discontinuities, in any derivative, then a p-method is a better choice. The papers by Anil Rao discuss these trade-offs in great deal.

A decision variable is something that the optimization program
can adjust to arrive at the desired solution. In trajectory
optimization, there are two types of decision variables. The
dynamics equations are differential equations in the **state**
variables and algebraic equations in the **control** variables.
Unknown parameters can be thought of as control variables with a
constraint that they must be constant. An example of a state
variable would the the position or velocity of a point-mass, while
a control would be the force applied to the mass.

All transcription methods enforce the dynamics by some sort of
defect matching. This can be done in two ways: **integral
methods** work by matching the state at the end-points of each
trajectory segment, while **derivative methods** work by
matching derivatives at special points inside of each trajectory
segment. If the defect constraints are satisfied, then the
trajectory satisfies the dynamics.

Integral methods match defects (differences) between the state at the end-points of each trajectory segment. The first version of the state is the values of the current guess at the trajectory. The second version of the state is obtained by integrating the dynamics of the interpolated states between the two end-points. All shooting methods and some collocation methods use this type of defect constraint.

Derivative methods match defects at the collocation points along each trajectory segment. These points correspond to the roots of the underlying orthogonal polynomials, and may or may not include the end-points of the trajectory segment. One version of the derivatives is obtained by analytically differentiating the interpolating polynomials. The second version of the derivatives is obtained by evaluating the dynamics at each collocation point. Some collocation methods use this type of defect constraint.

So far, we've assume that the dynamics function is continuous.
While this makes the problem (a lot) easier, it is not true for all
problem. For example, I'm interested in walking robots. The
dynamics of the system change whenever a foot makes or breaks
contact with the ground. There are two ways to deal with
discontinuous dynamics. The first is called **through-contact
optimization**, and it is similar to the time-stepping physics
engines found in most modern simulators. The second method I will
call **phase-based optimization** , and is similar to
event-based simulation.

Through-contact optimization is relatively new, with similar methods being developed by Russ Tedrake and Emo Todorov. The basic idea is to force all discontinuities into the decision (control) variables by adding constraints. For example, consider a walking robot. The dynamics for the system are modeled as if the robot was floating in the air, with some unknown contact forces acting on the feet. A constraint is added to ensure that the contact forces are zero when the foot is in the air. The contact forces are then passed to the transcription method as control variables.

Phase-based optimization is the historical way to deal with discontinuous dynamics, and is implemented in GPOPS II. The user breaks the problem up into several phases, each of which experiences continuous dynamics. The user then defines a series of transitions between each continuous phase. We will once again use walking as an example. The first phase would the configuration where both feet are on the ground, and the second phase would be when one foot is in the air.

Which is better? It depends on the problem. Through-contact optimization can solve for arbitrary transitions and contact times, but it suffers from being slow due to the added decision variables. It also looses accuracy since the transitions must occur at a grid-point. Phase-based optimization is fast and accurate, but it cannot handle arbitrary transitions.