JOVANA
Library Glossary Getting Started Three Levels Fields How it works Mission
Join the mission
All guides

Graph Laplacians and spectral clustering

Encode a graph as a matrix, take its eigenvectors, and the low-frequency ones reveal communities. The Fiedler vector cuts a network nearly in half along its weakest seam.

Two matrices that describe a graph

Number the n nodes. The adjacency matrix A has A[i,j] = 1 (or the edge weight) when i and j are connected. The degree matrix D is diagonal with each node's total connection weight. The graph Laplacian is L = D - A. It looks plain, but its spectrum is where a graph's geometry hides.

The spectrum reads off the connectivity

Because x^T L x is a sum of squares, L is symmetric positive semidefinite, so by the spectral theorem its eigenvalues are real and nonnegative: 0 = lambda_1 <= lambda_2 <= ... <= lambda_n. The all-ones vector gives x_i - x_j = 0 on every edge, so lambda_1 = 0 always. A beautiful theorem: the number of zero eigenvalues equals the number of connected components.

So if your graph is one piece, lambda_2 is the first nonzero eigenvalue. Its size is the algebraic connectivity: small means the graph is nearly two pieces joined by a thin bridge. Its eigenvector — the Fiedler vector — points across that bridge.

Cutting a graph with the Fiedler vector

Minimizing x^T L x while keeping x orthogonal to the all-ones vector and unit length is exactly a Rayleigh-quotient problem whose answer is the Fiedler vector. Intuitively it assigns each node a number so connected nodes get similar numbers — and the cheapest non-constant assignment splits the weak seam. Threshold its sign and you have two clusters.

# Two triangles joined by a single edge 3--4
#   nodes 1,2,3  fully connected ;  4,5,6 fully connected ;  edge 3-4
# Laplacian eigenvalues come out like:
#   0,  0.something (small!),  3, 3, 3, ...
# Fiedler vector v2 (the small one) looks like:
#   node:  1     2     3     4     5     6
#   v2 :  -.4   -.4   -.3   +.3   +.4   +.4
# sign split:  {1,2,3} | {4,5,6}   <- the obvious two communities
The sign of the second eigenvector names the communities.

Spectral clustering: more than two groups

  1. Build the Laplacian L (often the normalized version D^{-1/2} L D^{-1/2}).
  2. Take the eigenvectors for the k smallest eigenvalues and stack them as columns of an n-by-k matrix.
  3. Each node is now a point in R^k; run k-means on those points to get k clusters.