\(\newcommand{\abs}[1]{\left\lvert#1\right\rvert}\) \(\newcommand{\norm}[1]{\left\lVert#1\right\rVert}\) \(\newcommand{\inner}[1]{\left\langle#1\right\rangle}\) \(\DeclareMathOperator*{\argmin}{arg\,min}\) \(\DeclareMathOperator*{\argmax}{arg\,max}\) \(\DeclareMathOperator*{\E}{\mathbb{E}}\) \(\DeclareMathOperator*{\V}{\mathbb{V}}\)
This post introduces random projections as a tool for dimensionality reduction in high-dimensional spaces. The emphasis is geometric: how random linear maps approximately preserve distances, inner products, and subspace structure. Probabilistic limit theory is used only implicitly; detailed asymptotic tools are treated separately.
Motivation: Geometry in High Dimensions
In high dimensions, many datasets exhibit redundancy: although observations live in $\mathbb{R}^N$ with $N$ large, their intrinsic dimension is often much smaller. Random projections exploit this by embedding high-dimensional data into a lower-dimensional space while approximately preserving Euclidean geometry.
The central question is:
How small can the target dimension be while preserving pairwise distances?
Random Linear Maps
Let $R \in \mathbb{R}^{n\times N}$ be a random matrix and define the linear map \(f(x) = \frac{1}{\sqrt{n}} R x.\)
Common choices for $R$ include:
- i.i.d. Gaussian entries $R_{ij}\sim \mathcal N(0,1)$,
- i.i.d. Rademacher entries $R_{ij}\in\{\pm1\}$ with equal probability,
- sparse or structured random matrices.
The normalization ensures that $\mathbb{E}\,\|f(x)\|_2^2=\|x\|_2^2$ whenever the entries of $R$ have zero mean and unit variance.
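A minimal sketch of these constructions, assuming NumPy (the dimensions $N$, $n$ are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
N, n = 10_000, 200  # ambient and target dimensions (illustrative values)

# Gaussian projection: f(x) = (1/sqrt(n)) R x with R_ij ~ N(0, 1)
R = rng.standard_normal((n, N))
f = lambda x: (R @ x) / np.sqrt(n)

# Rademacher alternative: entries +/-1 with equal probability
R_rad = rng.choice([-1.0, 1.0], size=(n, N))

x = rng.standard_normal(N)
# The squared norm is preserved in expectation; a single draw
# already concentrates around the true value for moderate n
print(np.linalg.norm(x) ** 2, np.linalg.norm(f(x)) ** 2)
```

A single projected norm fluctuates around the true norm with relative error on the order of $1/\sqrt{n}$, which is the concentration the rest of the post relies on.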
Johnson–Lindenstrauss Lemma
The foundational result in random projection theory is the Johnson–Lindenstrauss (JL) lemma.
Let $Q=\{x_1,\ldots,x_q\}\subset\mathbb{R}^N$ and let $\epsilon\in(0,1)$. There exists a linear map $f:\mathbb{R}^N\to\mathbb{R}^n$ with \(n = O\!\left(\frac{\log q}{\epsilon^2}\right)\) such that, for all $u,v\in Q$, \((1-\epsilon)\|u-v\|_2^2 \le \|f(u)-f(v)\|_2^2 \le (1+\epsilon)\|u-v\|_2^2.\)
Moreover, a random projection with i.i.d. Gaussian or Rademacher entries satisfies this property with high probability.
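The lemma can be checked empirically. The sketch below projects a random point set and measures the worst-case distortion of squared pairwise distances; the constant $4$ in the choice of $n$ is illustrative, not the sharp one from the lemma.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(1)
N, q, eps = 5_000, 50, 0.5
n = int(8 * np.log(q) / eps ** 2)  # n ~ O(log q / eps^2); constant is illustrative

Q = rng.standard_normal((q, N))    # q points in R^N
R = rng.standard_normal((n, N))
FQ = Q @ R.T / np.sqrt(n)          # projected points, one per row

# Worst-case distortion of squared pairwise distances over all pairs
ratios = [
    np.sum((FQ[i] - FQ[j]) ** 2) / np.sum((Q[i] - Q[j]) ** 2)
    for i, j in combinations(range(q), 2)
]
print(min(ratios), max(ratios))  # typically close to 1
```

Note that $n$ depends on $q$ and $\epsilon$ only; the ambient dimension $N$ can be made much larger without changing the target dimension.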
Interpretation
- The embedding dimension $n$ depends logarithmically on the number of points, not on the ambient dimension $N$.
- Pairwise Euclidean distances are approximately preserved.
- The result is non-adaptive: the same random map works for all pairs in $Q$.
This phenomenon is a manifestation of concentration of measure in high-dimensional probability.
Distance and Inner Product Preservation
Distance preservation, combined with norm preservation, implies approximate inner product preservation via the polarization identity \(\langle u,v\rangle = \frac{1}{2}\left(\|u\|_2^2 + \|v\|_2^2 - \|u-v\|_2^2\right).\)
Thus, JL embeddings preserve:
- angles between vectors,
- cosine similarity,
- norms of individual vectors.
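The polarization argument can be verified numerically. This sketch (assuming NumPy) compares the exact inner product, the polarization identity, and the inner product after a Gaussian projection; the additive error of the projected inner product scales with $\|u\|_2\|v\|_2/\sqrt{n}$.

```python
import numpy as np

rng = np.random.default_rng(2)
N, n = 2_000, 400
R = rng.standard_normal((n, N))
f = lambda x: (R @ x) / np.sqrt(n)

u, v = rng.standard_normal(N), rng.standard_normal(N)

# Polarization identity: <u, v> = (||u||^2 + ||v||^2 - ||u - v||^2) / 2
lhs = u @ v
polar = 0.5 * (u @ u + v @ v - (u - v) @ (u - v))
# Since the projection approximately preserves the three squared norms
# on the right-hand side, it approximately preserves the inner product
proj = f(u) @ f(v)
print(lhs, polar, proj)
```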
Subspace Embeddings
The JL lemma extends beyond finite point sets.
Let $S\subset\mathbb{R}^N$ be a fixed $k$-dimensional subspace. A random projection $f:\mathbb{R}^N\to\mathbb{R}^n$ with \(n = O\!\left(\frac{k}{\epsilon^2}\right)\) satisfies \((1-\epsilon)\|x\|_2^2 \le \|f(x)\|_2^2 \le (1+\epsilon)\|x\|_2^2 \quad \forall x\in S\) with high probability.
Such maps are called oblivious subspace embeddings.
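One way to certify a subspace embedding numerically: restrict $f$ to an orthonormal basis $U$ of $S$; then for every $x = Uc \in S$, the ratio $\|f(x)\|_2^2/\|x\|_2^2$ lies between the extreme squared singular values of $RU/\sqrt{n}$. A sketch assuming NumPy (the constant $8$ in the choice of $n$ is illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
N, k, eps = 2_000, 10, 0.5
n = int(8 * k / eps ** 2)  # n ~ O(k / eps^2); constant is illustrative

# Orthonormal basis U of a random k-dimensional subspace S
U, _ = np.linalg.qr(rng.standard_normal((N, k)))
R = rng.standard_normal((n, N))
M = R @ U / np.sqrt(n)  # f restricted to S, expressed in the basis U

# Extreme squared singular values of M bound ||f(x)||^2 / ||x||^2
# uniformly over all nonzero x in S
s = np.linalg.svd(M, compute_uv=False)
print(s.min() ** 2, s.max() ** 2)
```

The map is oblivious: $R$ is drawn without looking at $S$, yet the bound holds uniformly over the entire subspace, not just over a finite point set.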
Takeaways
- Random projections are geometry-preserving, not data-adaptive.
- They complement, rather than replace, structured decompositions like SVD.
- JL-type results explain why high-dimensional data often behaves “as if” it were low-dimensional.