Tensor Decompositions for Data Science

A dataset is a multidimensional array

Drop the abstraction and a tensor is just a multidimensional array: a vector is 1-way, a matrix is 2-way, and a stack of matrices is a 3-way array of shape I x J x K. Think users x movies x time, or pixel-row x pixel-column x color, or word x document x topic. The number of indices is the order of the tensor.

CP: a sum of simple tensors

The SVD writes a matrix as a sum of R rank-one pieces sum_r sigma_r u_r v_r^T. The CP decomposition is the direct 3-way analogue: T = sum_{r=1}^R a_r tensor b_r tensor c_r, a sum of R simple tensors. The smallest R that works is precisely the tensor rank.

Matrix (order 2):  M = sum_{r=1}^R  sigma_r * (u_r v_r^T)      <- SVD
Tensor (order 3):  T = sum_{r=1}^R  a_r (x) b_r (x) c_r        <- CP

Entrywise:  T[i,j,k] = sum_{r=1}^R  A[i,r] * B[j,r] * C[k,r]
Factor matrices A (IxR), B (JxR), C (KxR) hold the component vectors.

Storage: a full IxJxK tensor costs I*J*K numbers;
         a rank-R CP model costs only R*(I+J+K)  -- a huge win when R is small.

Surprise vs. matrices: tensor rank can EXCEED every dimension,
and the best rank-R approximation may FAIL to exist (ill-posed).
That is the price of leaving the safe 2-way world.

CP generalizes the SVD's sum-of-rank-ones to a sum of simple tensors.

Tucker: a small core times factor bases

The Tucker decomposition is the other generalization of the SVD: compress each mode with its own factor matrix and glue them with a small dense core tensor G. It is a higher-order PCA — pick orthonormal axes per mode, then express the data in those axes. The multilinear ranks (one per mode) are the sizes of G.

Unfold the tensor along each mode into a matrix and take its SVD to get an orthonormal factor matrix U(1), U(2), U(3) per mode.
Keep only the top few singular directions in each mode — this is the per-mode low-rank approximation.
Project the data onto those axes to get the small core G; the model is G with the U(i) reattached.

From a first course in matrices to multiway compression of tensors, the thread never broke: the universal property gave tensors a home, index gymnastics and contraction gave them a calculus, the exterior and symmetric algebras explained determinants and quadratic forms, and CP and Tucker turn all of it into algorithms that compress video, recommend movies, and factor brain scans. That is the payoff of multilinear algebra.