# how to factorize a manifold

[papers]  [data-analysis]  [statistics]  [conferences

21 apr 2021 | 8 min read

Imagine you have some sensory data of a robotic arm moving around. Given a little bit of prior info about this mechanism, we may reasonably assume that it has several joints at which the arm moves. How can we figure out what kinds of motions each of these joints exhibits? In our recent AISTATS paper, Product Manifold Learning, we tackle this question and others like it.

### background

A more formal way of describing the problem above is to consider our data as observations of some signal which maps latent variables $(\theta_1, \ldots, \theta_m)$ to an observable space (typically high dimension Euclidean space—think image data). If each latent variable $\theta_i$ lies on a manifold $\mathcal{M}_i$, then we can model the entire latent space as a product manifold,

$\mathcal{M} = \mathcal{M}_1 \times \cdots \times \mathcal{M}_m.$

However, even with this information we know nothing about the geometry of the individual $\mathcal{M}_i$’s, which we call manifold factors. Is there some way of recovering this from our data? For example, can we determine that the swiss roll structure in Figure 1 is actually just a rectangle that’s been rolled up?

The answer is yes! But to do so, we need some theoretical results first.

### the laplacian over product spaces

The main theoretical result we can leverage is the way the Laplacian operator decomposes over product spaces. Recall that the Laplacian $\Delta$ of a function $f$ over Euclidean space is defind as $\Delta f = \nabla \cdot (\nabla f)$. Here, $\nabla \cdot$ denotes the divergence operator and $\nabla f$ is the gradient of $f$. For Riemannian manifolds, there actually exists a more general version called the Laplace-Beltrami operator, but for the sake of simplicity we’ll just stick with the real Laplacian here. Consider the spectrum of the Laplacian operator, i.e. the set of solutions to the Helmholtz equation

$\Delta_\mathcal{M} f(\mathbf{x}) = \lambda f(\mathbf{x}), \qquad \forall \mathbf{x} \in \mathcal{M}$

with Neumann boundary conditions

$\nabla_\mathcal{M} \cdot \nu(\mathbf{x}) = 0, \qquad \forall \mathbf{x} \in \partial \mathcal{M}.$

We call these $f(\mathbf{x})$ the Laplacian (Neumann) eigenfunctions. When $\mathcal{M}$ is a product manifold with $m$ manifold factors, we can write every $f(\mathbf{x})$ as the product

$f(\mathbf{x}) = \prod_{i=1}^m \bigg(f^{(i)}_{k_i} \circ \pi^{(i)} \bigg),$

where $\pi^{(i)} : \mathcal{M} \rightarrow \mathcal{M}_i$ is the canonical projection. More colloquially, each eigenfunction decomposes as the product of one eigenfunction from every manifold factor. In addition, we have another nice property:

$\lambda_{k_1, k_2, \ldots, k_m} = \sum_{i=1}^m \lambda^{(i)}_{k_i},$

i.e. the eigenvalue of the eigenfunction is precisely the sum of the eigenvalues of the eigenfunctions which form the product. Lastly, these eigenvalues are all nonnegative and monotonically increasing.

To illustrate with an example, consider the simplest possible product space—a rectangle, which is the product of two closed line segments. Figure 2 shows the part of the spectrum of the rectangle, broken down into a sort of infinite multiplication table of the spectrum of each line segment.

Borrowing some fundamental terminology, we call the eigenfunctions along the top row and left column “factor eigenfunctions” and the rest of the eigenfunctions “product eigenfunctions.”

### our method

Suppose we are just in the two manifold factor case right now, i.e. $\mathcal{M} = \mathcal{M}_1 \times \mathcal {M}_2$. Minus some details, our factorization algorithm can be defined in three steps:

1. Compute the spectrum of the Laplacian. Since we don’t have infinite data samples, we need to approximate the eigenfunctions. We can do so by computing $N$ eigenvectors of the graph Laplacian instead. For more information on the asymptotics of the eigenvectors, see Stefan Lafon’s PhD dissertation.

2. Factorize the eigenvectors. From step 1, we obtain a set of eigenvectors sorted by increasing eigenvalue. Using our result that product eigenvectors factorize into factor eigenvectors, for each eigenvector $\varphi_k$, we find the factorization $\varphi_i \times \varphi_j$ which is closest to $\varphi_k$. This “closeness” is measured by the absolute cosine similarity,2

$S(\varphi_k, \varphi_i\varphi_j) := \frac{|\langle \varphi_k , \varphi_i\varphi_j\rangle|}{||\varphi_k|| \cdot ||\varphi_i\varphi_j||}.$

This gives us a list $\mathcal{T}$ of triplets $(i,j,k)$, where each triplet contains our best approximation $\varphi_i \times \varphi_j \approx \varphi_k$.

3. Group factor eigenvectors by manifold factor. Define a weighted graph $H = (V, E, W)$ where $V$ is the set of unique factor eigenvectors in $\mathcal{T}$. The edge weights are given by the highest similarity score

$W_{ij} = \max_{(i,j,k) \in \mathcal{T}} S(\varphi_k, \varphi_i\varphi_j).$

Our problem now becomes one of separating the vertices into two groups, with higher priority given to separating vertices with high edge weights. Thus, this becomes formulated as a MAX-CUT problem. By using a MAX-CUT SDP solver with $H$ as the input, we can sort our factor eigenvectors into two bins that correspond to $\mathcal{M}_1$ and $\mathcal{M}_2$.

### some fun results!

An interesting application of this method is in structural biology—specifically, applied to cryo-electron miscroscopy (cryo-EM) data. Cryo-EM is a microscopy technique that won the Nobel Prize in chemistry and also played a role in many of the COVID-19 studies done over the past year.

The main problem in cryo-EM is a reconstruction problem: given all these images, can we recover information about the 3D structure of the molecule being imaged? In particular, a macromolecule may be composed of several different independently moving subunits. If each of these subunits moves continuously, then the shape space of the entire molecule can be modeled as a product manifold!

In our experiments, we use a model of a molecule with two independently moving subunits (see Figure 4, right). The red subunit rotates freely around the z-axis (latent space = circle), and the blue subunit stretches linearly along a limited range of the x-axis (latent space = line segment). Applying our algorithm gives two eigenvector groups, shown in Figure 5. We can see that the group on the left corresponds to the spectrum of the blue subunit (eigenvectors of a line) and the group on the right corresponds to the spectrum of the red subunit (eigenvectors of a circle).

So now we have eigenvector groups corresponding to each manifold factor. What’s next? Well, one thing we can do is use these eigenvectors to visualize the structure of each manifold factor. To do so, we can plot a diffusion map embedding of each eigenspace. Figure 6 shows visualizations using our method compared to standard diffusion maps and linear ICA. Without the factorization, it’s hard to get a good understanding of the individual structures within the data.

### future work

A big question for future research is what this looks like in the case of $m > 2$ manifold factors. There are at least two ways to approach this—for one, we can simply use a MAX k-CUT SDP and slightly modify how we search for factorizations and construct $H$ in steps (2) and (3). This depends mostly on the quality of the SDP. Alternatively, we can apply the $m = 2$ case iteratively, and keep factorizing the groups of eigenvectors that we get until we reach the correct number of manifold factors. In any case, there is definitely room for improvement here.

On a different note, I also had a great presenting at AISTATS! Unfortunately everything was virtual this year, but despite this we actually had a good amount of visitors come to our poster and engage in some nice discussions with us. It was great to experience a conference poster session for the first time (I was definitely a little nervous, even behind a screen), and I’m looking forward to more conferences in the future.

This work was done in collaboration with Amit Moscovich and Amit Singer.

1Figures taken from http://www.cis.jhu.edu/~cshen/html/PublishSwissRoll.html.
2We use the absolute cosine similarity rather than the signed cosine similarity because we can't distinguish between $\varphi_i$ and $-\varphi_i$.
3Left figure taken from https://en.wikipedia.org/wiki/Cryogenic_electron_microscopy.
4What we call a "3D rectangle" is just a 2D rectangle observed in 3D space, with Gaussian noise added in the z-direction.

## main references

1. Sharon Zhang, Amit Moscovich, Amit Singer. "Product Manifold Learning." In AISTATS 2021.
2. Stefan Lafon. "Diffusion maps and geometric harmonics." PhD Dissertation, Yale University. May 2004.
3. Mikhail Belkin and Partha Niyogi. "Towards a theoretical foundation for Laplacian-based manifold methods." Journal of Computer and System Sciences, 74 (8):1289–1308, 2008.
4. Ronald R. Coifman and Stéphane Lafon. "Diffusion maps." Applied and Computational Harmonic Analysis, 21(1):5–30, 2006.