editing motion graphics video via motion vectorization and transformation

Motion graphics videos are widely used in Web design, digital advertising, animated logos and film title sequences, to capture a viewer’s attention.

But editing such video is challenging because the video provides a low-level sequence of pixels and frames rather than higher-level structure such as the objects in the video with their corresponding motions and occlusions.

We present a motion vectorization pipeline for converting motion graphics video into an SVG motion program that provides such structure.

The resulting SVG program can be rendered using any SVG renderer (e.g. most Web browsers) and edited using any SVG editor.

We also introduce a program transformation API that facilitates editing of a SVG motion program to create variations that adjust the timing, motions and/or appearances of objects.

We show how the API can be used to create a variety of effects including retiming object motion to match a music beat, adding motion textures to objects, and collision preserving appearance changes.

background

A more formal way of describing the problem above is to consider our data as observations of some signal which maps latent variables $(\theta_1, \ldots, \theta_m)$ to an observable space (typically high dimension Euclidean space—think image data). If each latent variable $\theta_i$ lies on a manifold $\mathcal{M}_i$, then we can model the entire latent space as a product manifold,

\[\mathcal{M} = \mathcal{M}_1 \times \cdots \times \mathcal{M}_m.\]

However, even with this information we know nothing about the geometry of the individual $\mathcal{M}_i$’s, which we call manifold factors. Is there some way of recovering this from our data? For example, can we determine that the swiss roll structure in Figure 1 is actually just a rectangle that’s been rolled up?

Unrolling a swiss roll dataset — Figure 1. One way of applying dimensionality reduction to a 3D swiss roll dataset is to "unroll" it as a rectangle.¹

The answer is yes! But to do so, we need some theoretical results first.

the laplacian over product spaces

The main theoretical result we can leverage is the way the Laplacian operator decomposes over product spaces. Recall that the Laplacian $\Delta$ of a function $f$ over Euclidean space is defind as $\Delta f = \nabla \cdot (\nabla f)$. Here, $\nabla \cdot$ denotes the divergence operator and $\nabla f$ is the gradient of $f$. For Riemannian manifolds, there actually exists a more general version called the Laplace-Beltrami operator, but for the sake of simplicity we’ll just stick with the real Laplacian here. Consider the spectrum of the Laplacian operator, i.e. the set of solutions to the Helmholtz equation

\[\Delta_\mathcal{M} f(\mathbf{x}) = \lambda f(\mathbf{x}), \qquad \forall \mathbf{x} \in \mathcal{M}\]

with Neumann boundary conditions

\[\nabla_\mathcal{M} \cdot \nu(\mathbf{x}) = 0, \qquad \forall \mathbf{x} \in \partial \mathcal{M}.\]

We call these $f(\mathbf{x})$ the Laplacian (Neumann) eigenfunctions. When $\mathcal{M}$ is a product manifold with $m$ manifold factors, we can write every $f(\mathbf{x})$ as the product

\[f(\mathbf{x}) = \prod_{i=1}^m \bigg(f^{(i)}_{k_i} \circ \pi^{(i)} \bigg),\]

where $\pi^{(i)} : \mathcal{M} \rightarrow \mathcal{M}_i$ is the canonical projection. More colloquially, each eigenfunction decomposes as the product of one eigenfunction from every manifold factor. In addition, we have another nice property:

\[\lambda_{k_1, k_2, \ldots, k_m} = \sum_{i=1}^m \lambda^{(i)}_{k_i},\]

i.e. the eigenvalue of the eigenfunction is precisely the sum of the eigenvalues of the eigenfunctions which form the product. Lastly, these eigenvalues are all nonnegative and monotonically increasing.

To illustrate with an example, consider the simplest possible product space—a rectangle, which is the product of two closed line segments. Figure 2 shows the part of the spectrum of the rectangle, broken down into a sort of infinite multiplication table of the spectrum of each line segment.

Decomposition of the Laplacian over product spaces — Figure 2. Each product eigenfunction of the rectangle is the product of an eigenfunction of the vertical line segment and an eigenfunction of the horizontal line segment.

Borrowing some fundamental terminology, we call the eigenfunctions along the top row and left column “factor eigenfunctions” and the rest of the eigenfunctions “product eigenfunctions.”

our method

Suppose we are just in the two manifold factor case right now, i.e. $\mathcal{M} = \mathcal{M}_1 \times \mathcal {M}_2$. Minus some details, our factorization algorithm can be defined in three steps:

Compute the spectrum of the Laplacian. Since we don’t have infinite data samples, we need to approximate the eigenfunctions. We can do so by computing $N$ eigenvectors of the graph Laplacian instead. For more information on the asymptotics of the eigenvectors, see Stefan Lafon’s PhD dissertation.
Factorize the eigenvectors. From step 1, we obtain a set of eigenvectors sorted by increasing eigenvalue. Using our result that product eigenvectors factorize into factor eigenvectors, for each eigenvector $\varphi_k$, we find the factorization $\varphi_i \times \varphi_j$ which is closest to $\varphi_k$. This “closeness” is measured by the absolute cosine similarity,²
\[S(\varphi_k, \varphi_i\varphi_j) := \frac{|\langle \varphi_k , \varphi_i\varphi_j\rangle|}{||\varphi_k|| \cdot ||\varphi_i\varphi_j||}.\]
This gives us a list $\mathcal{T}$ of triplets $(i,j,k)$, where each triplet contains our best approximation $\varphi_i \times \varphi_j \approx \varphi_k$.
Group factor eigenvectors by manifold factor. Define a weighted graph $H = (V, E, W)$ where $V$ is the set of unique factor eigenvectors in $\mathcal{T}$. The edge weights are given by the highest similarity score
\[W_{ij} = \max_{(i,j,k) \in \mathcal{T}} S(\varphi_k, \varphi_i\varphi_j).\]
Our problem now becomes one of separating the vertices into two groups, with higher priority given to separating vertices with high edge weights. Thus, this becomes formulated as a MAX-CUT problem. By using a MAX-CUT SDP solver with $H$ as the input, we can sort our factor eigenvectors into two bins that correspond to $\mathcal{M}_1$ and $\mathcal{M}_2$.

Figure 3. We can reformulate our problem of grouping factor eigenvectors by manifold factor as a MAX-CUT problem.

some fun results!

An interesting application of this method is in structural biology—specifically, applied to cryo-electron miscroscopy (cryo-EM) data. Cryo-EM is a microscopy technique that won the Nobel Prize in chemistry and also played a role in many of the COVID-19 studies done over the past year.

Real and synthetic cryo-EM data — Figure 4. Real Cryo-EM images (left) and our synthetic cryo-EM data (right). The molecule used to generate the data is soured from here.³

The main problem in cryo-EM is a reconstruction problem: given all these images, can we recover information about the 3D structure of the molecule being imaged? In particular, a macromolecule may be composed of several different independently moving subunits. If each of these subunits moves continuously, then the shape space of the entire molecule can be modeled as a product manifold!

In our experiments, we use a model of a molecule with two independently moving subunits (see Figure 4, right). The red subunit rotates freely around the z-axis (latent space = circle), and the blue subunit stretches linearly along a limited range of the x-axis (latent space = line segment). Applying our algorithm gives two eigenvector groups, shown in Figure 5. We can see that the group on the left corresponds to the spectrum of the blue subunit (eigenvectors of a line) and the group on the right corresponds to the spectrum of the red subunit (eigenvectors of a circle).

Figure 5. Applying our method to cryo-EM data yields two groups of factor eigenvectors which describe the eigenspaces of the red and blue subunits in the original macromolecule.

So now we have eigenvector groups corresponding to each manifold factor. What’s next? Well, one thing we can do is use these eigenvectors to visualize the structure of each manifold factor. To do so, we can plot a diffusion map embedding of each eigenspace. Figure 6 shows visualizations using our method compared to standard diffusion maps and linear ICA. Without the factorization, it’s hard to get a good understanding of the individual structures within the data.

Product manifold geometry visualizations — Figure 6. Various visualizations of the geometry of different product manifolds. Note that having the factorized eigenspace is especially useful in isolating individual structures within the data.⁴

future work

A big question for future research is what this looks like in the case of $m > 2$ manifold factors. There are at least two ways to approach this—for one, we can simply use a MAX k-CUT SDP and slightly modify how we search for factorizations and construct $H$ in steps (2) and (3). This depends mostly on the quality of the SDP. Alternatively, we can apply the $m = 2$ case iteratively, and keep factorizing the groups of eigenvectors that we get until we reach the correct number of manifold factors. In any case, there is definitely room for improvement here.

On a different note, I also had a great presenting at AISTATS! Unfortunately everything was virtual this year, but despite this we actually had a good amount of visitors come to our poster and engage in some nice discussions with us. It was great to experience a conference poster session for the first time (I was definitely a little nervous, even behind a screen), and I’m looking forward to more conferences in the future.

This work was done in collaboration with Amit Moscovich and Amit Singer.

¹Figures taken from http://www.cis.jhu.edu/~cshen/html/PublishSwissRoll.html.

²We use the absolute cosine similarity rather than the signed cosine similarity because we can't distinguish between $\varphi_i$ and $-\varphi_i$.

³Left figure taken from https://en.wikipedia.org/wiki/Cryogenic_electron_microscopy.

⁴What we call a "3D rectangle" is just a 2D rectangle observed in 3D space, with Gaussian noise added in the z-direction.

sharon zhang