\(\newcommand{\abs}[1]{\left\lvert#1\right\rvert}\) \(\newcommand{\norm}[1]{\left\lVert#1\right\rVert}\) \(\newcommand{\inner}[1]{\left\langle#1\right\rangle}\) \(\DeclareMathOperator*{\argmin}{arg\,min}\) \(\DeclareMathOperator*{\argmax}{arg\,max}\) \(\DeclareMathOperator*{\E}{\mathbb{E}}\) \(\DeclareMathOperator*{\V}{\mathbb{V}}\)

This post collects the core definitions and inequalities for vector and matrix norms that are used throughout optimization, numerical linear algebra, and statistical learning. We introduce common vector and matrix norms, along with their key ordering, duality, and submultiplicativity properties. All results are deterministic and finite-dimensional; asymptotic and probabilistic tools are treated separately.


Vector Norms

A norm on a real vector space $V$ is a function \(\norm{\cdot}\colon V \to [0,\infty)\) that satisfies the following axioms for all $x,y\in V$ and all scalars $\lambda\in\mathbb{R}$:

  1. Non-negativity:
    \(\norm{x} \ge 0,\) and
    positive definiteness:
    \(\norm{x} = 0 \quad \iff \quad x = 0.\)

  2. Absolute homogeneity:
    \(\norm{\lambda x} = |\lambda|\,\norm{x}.\)

  3. Triangle inequality (subadditivity):
    \(\norm{x + y} \le \norm{x} + \norm{y}.\)

Different norms induce different geometries and optimization behavior, even though they are equivalent in finite dimensions.

For $\mathrm{x} = (x_i)_{i=1}^n \in \mathbb{R}^n$:

  • General $\ell_p$-norm: $\norm{\mathrm{x}}_p = \left(\sum _{i=1}^n \abs{x_i}^p\right)^{1/p}$, $p \in [1,\infty)$
  • Manhattan $\ell_1$-norm: $\norm{\mathrm{x}}_1 = \sum _{i=1}^n \abs{x_i}$
  • Euclidean $\ell_2$-norm: $\norm{\mathrm{x}}_2 = \left(\sum _{i=1}^n x_i^2\right)^{1/2}$
  • Maximum $\ell_\infty$-norm: $\norm{\mathrm{x}}_\infty = \max_i \abs{x_i}$
  • $\ell_0$ “norm”: $\norm{\mathrm{x}}_0 = \#\{i : x_i \ne 0\}$, the number of nonzero entries.

In practice, $\ell_2$ is the usual choice for distances, $\ell_1$ is used to promote sparsity, and $\ell_\infty$ to control worst-case error.

  • $\ell _0$ is not a true norm but is standard shorthand for sparsity.
  • $\ell _\infty$ arises as the limit of $\ell _p$ as $p \to \infty$.
  • In finite dimensions, all $\ell _p$ norms are equivalent up to constants.
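As a small sketch (plain Python, no external libraries), the $\ell_p$ norms above can be computed directly from their definitions; the last line also illustrates that $\ell_p$ approaches $\ell_\infty$ as $p$ grows:

```python
import math

def lp_norm(x, p):
    """l_p norm of a vector x (a list of floats), for p in [1, inf]."""
    if p == math.inf:
        return max(abs(xi) for xi in x)  # maximum norm
    return sum(abs(xi) ** p for xi in x) ** (1.0 / p)

x = [3.0, -4.0, 0.0]
print(lp_norm(x, 1))         # 7.0  (Manhattan)
print(lp_norm(x, 2))         # 5.0  (Euclidean)
print(lp_norm(x, math.inf))  # 4.0  (maximum)
print(lp_norm(x, 50))        # already very close to 4.0
```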

Dual Norm

Given a norm \(\norm{\cdot}\) on $\mathbb{R}^n$, its dual norm $\norm{\cdot}_*$ is defined by

\[\|\mathbf{y}\|_* \;=\; \sup _{\mathbf{x}\neq 0} \frac{|\mathbf{x}^\top \mathbf{y}|}{\|\mathbf{x}\|} \;=\; \sup _{\|\mathbf{x}\|\le 1} |\mathbf{x}^\top \mathbf{y}|.\]

The dual norm of $\mathbf{y}$ measures the largest possible value of the inner product of $\mathbf{x}$ with $\mathbf{y}$ when $\mathbf{x}$ is restricted to lie in the unit ball of the original (primal) norm. Equivalently, it characterizes the maximum value of the linear functional generated by $\mathbf{y}$ over the unit ball.
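As a numeric sketch (plain Python), the supremum in the definition is attained by an explicit maximizer for two familiar pairings: the dual of $\ell_1$ is $\ell_\infty$, and $\ell_2$ is its own dual:

```python
y = [1.0, -2.0, 3.0]

# Dual of l_1 is l_inf: over the l_1 unit ball, the maximizer is the
# standard basis vector at the index of the largest |y_i|.
i = max(range(len(y)), key=lambda j: abs(y[j]))
x = [0.0] * len(y)
x[i] = 1.0  # ||x||_1 = 1
attained = abs(sum(xi * yi for xi, yi in zip(x, y)))
linf = max(abs(yi) for yi in y)
assert attained == linf  # sup over the l_1 ball equals max|y_i|

# Dual of l_2 is l_2: the maximizer is y / ||y||_2 itself.
n2 = sum(yi * yi for yi in y) ** 0.5
attained2 = abs(sum((yi / n2) * yi for yi in y))  # equals ||y||_2
assert abs(attained2 - n2) < 1e-12
```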

Key Vector Inequalities

\[\norm{\mathrm{x}}_q \le \norm{\mathrm{x}}_p \le n^{1/p - 1/q} \norm{\mathrm{x}}_q, \quad (1 \leq p < q \le \infty)\]

For example:

\[\norm{\mathrm{x}}_\infty \le \norm{\mathrm{x}}_2 \le \norm{\mathrm{x}}_1 \le \sqrt{n}\norm{\mathrm{x}}_2 \le n\norm{\mathrm{x}}_\infty\]
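A quick check of this chain for one concrete vector (plain Python, with $n = 3$):

```python
import math

x = [1.0, -2.0, 3.0]
n = len(x)

n_inf = max(abs(v) for v in x)          # 3.0
n_2 = math.sqrt(sum(v * v for v in x))  # sqrt(14) ~ 3.742
n_1 = sum(abs(v) for v in x)            # 6.0

# ||x||_inf <= ||x||_2 <= ||x||_1 <= sqrt(n) ||x||_2 <= n ||x||_inf
assert n_inf <= n_2 <= n_1 <= math.sqrt(n) * n_2 <= n * n_inf
```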

Hölder’s inequality bounds the inner product,

\[|\mathrm{x}^\top \mathrm{y}| \le \norm{\mathrm{x}}_p \norm{\mathrm{y}}_q, \quad 1/p + 1/q = 1\]

which yields the Cauchy–Schwarz inequality when $p=q=2$,

\[|\mathrm{x}^\top \mathrm{y}| \le \norm{\mathrm{x}}_2 \norm{\mathrm{y}}_2\]

and another useful special case when $p=1$ and $q=\infty$,

\[|\mathrm{x}^\top \mathrm{y}| \le \norm{\mathrm{x}}_1 \norm{\mathrm{y}}_\infty\]

By Hölder’s inequality, the dual norm associated with \(\|\cdot\|_p\) coincides with the $\ell _q$ norm, where $1/p + 1/q = 1$.
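A randomized sanity check of Hölder's inequality for one conjugate pair, here $p=3$ and $q=3/2$ (plain Python; the tolerance only absorbs floating-point rounding):

```python
import random

rng = random.Random(0)
p, q = 3.0, 1.5  # conjugate exponents: 1/3 + 2/3 = 1

for _ in range(100):
    x = [rng.uniform(-5, 5) for _ in range(4)]
    y = [rng.uniform(-5, 5) for _ in range(4)]
    lhs = abs(sum(a * b for a, b in zip(x, y)))
    rhs = (sum(abs(a) ** p for a in x) ** (1 / p)
           * sum(abs(b) ** q for b in y) ** (1 / q))
    assert lhs <= rhs + 1e-9  # |x^T y| <= ||x||_p ||y||_q
```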

Minkowski’s inequality is the triangle inequality for the $\ell_p$ norms,

\[\norm{\mathrm{x}+\mathrm{y}}_p \le \norm{\mathrm{x}}_p + \norm{\mathrm{y}}_p, \quad p \in [1,\infty]\]

As a direct consequence, one also has \(\norm{\mathrm{x}-\mathrm{y}}_p \le \norm{\mathrm{x}}_p + \norm{\mathrm{y}}_p.\)

Geometrically, this says that a direct path between two points is never longer than a detour, no matter which $p$-norm is used to measure length.

Jensen’s inequality states that for a convex function $f$, the function of a weighted average is at most the weighted average of the function,

\[f \left(\sum_{i=1}^n a_i x_i \right) \le \sum_{i=1}^n a_i f( x_i), \quad a_i \ge 0,\ \sum_{i=1}^n a_i = 1\]
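For instance, with the convex function $f(t)=t^2$ (plain Python):

```python
# Jensen's inequality for the convex function f(t) = t**2.
a = [0.2, 0.3, 0.5]   # nonnegative weights summing to 1
x = [1.0, 4.0, -2.0]

f = lambda t: t * t
lhs = f(sum(ai * xi for ai, xi in zip(a, x)))  # f of the weighted average
rhs = sum(ai * f(xi) for ai, xi in zip(a, x))  # weighted average of f
assert lhs <= rhs  # 0.16 <= 7.0
```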

Matrix Norms

A matrix norm is a norm on the vector space $\mathbb{R}^{m\times n}$; that is, a function \(\|\cdot\|:\mathbb{R}^{m\times n}\to[0,\infty)\) satisfying the same axioms as a vector norm.

Matrix norms may be defined either as induced norms from vector norms or directly in terms of the entries or singular values of the matrix.

For $A \in \mathbb{R}^{m\times n}$, let $k=\min(m,n)$ and let the singular values be $s_1(A)\ge \cdots \ge s_k(A)\ge 0$ (with $s_{r}>0$ and $s_{r+1}=\cdots=s_k=0$ if $\mathrm{rank}(A)=r$). Write $s(A)=(s_1(A),\ldots,s_k(A))$.

  • Spectral (operator) norm ($2\to 2$ norm): \(\norm{A} _2 = \max _{x\neq 0}\frac{\norm{Ax} _2}{\norm{x} _2} = \max _{\norm{x} _2=1}\norm{Ax} _2 = \max _{\norm{x} _2=1,\ \norm{y} _2=1} y^\top A x = s_1(A) = \norm{s(A)} _\infty.\)

  • Induced (operator) $p$-norm: \(\norm{A} _p := \max _{x\neq 0}\frac{\norm{Ax} _p}{\norm{x} _p}, \qquad p\in[1,\infty].\)

Special cases:

\[\norm{A} _1 = \max _{1\le j\le n}\sum _{i=1}^m |A_{ij}| \quad\text{(max column sum)}, \qquad \norm{A} _\infty = \max _{1\le i\le m}\sum _{j=1}^n |A_{ij}| \quad\text{(max row sum)}.\]
  • Frobenius (Hilbert–Schmidt) norm: \(\norm{A} _F = \sqrt{\sum _{i=1}^m\sum _{j=1}^n |A_{ij}|^2} = \sqrt{\mathrm{tr}(A^\top A)} = \sqrt{\langle A,A\rangle},\) where \(\langle A,B\rangle := \mathrm{tr}(A^\top B).\)

Equivalently:

\[\norm{A} _F = \sqrt{\sum _{i=1}^k s_i(A)^2} = \norm{s(A)} _2.\]
  • Nuclear norm: \(\norm{A} _* = \sum _{i=1}^k s_i(A) = \norm{s(A)} _1.\)

  • Schatten $p$-norms (unifies spectral/Frobenius/nuclear): \(\norm{A} _{S,p} := \norm{s(A)} _p, \qquad p\in[1,\infty].\)

In particular,

\[\norm{A} _{S,\infty}=\norm{A} _2,\qquad \norm{A} _{S,2}=\norm{A} _F,\qquad \norm{A} _{S,1}=\norm{A} _*.\]
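A numeric sketch of these identities, assuming NumPy is available (`np.linalg.svd` returns the singular values in descending order, and `np.linalg.norm` supports the `2`, `'fro'`, and `'nuc'` orders for matrices):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))

s = np.linalg.svd(A, compute_uv=False)  # singular values s_1 >= ... >= s_k

spectral = s[0]                     # ||A||_2 = ||s(A)||_inf
frobenius = np.sqrt(np.sum(s**2))   # ||A||_F = ||s(A)||_2
nuclear = np.sum(s)                 # ||A||_* = ||s(A)||_1

assert np.isclose(spectral, np.linalg.norm(A, 2))
assert np.isclose(frobenius, np.linalg.norm(A, 'fro'))
assert np.isclose(nuclear, np.linalg.norm(A, 'nuc'))
```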

Key Matrix Inequalities

Throughout, let $A\in\mathbb{R}^{m\times n}$ and $B\in\mathbb{R}^{n\times \ell}$, so that the product $AB$ is defined.

Submultiplicativity (induced norms)

For any induced matrix norm, \(\norm{AB} _p \le \norm{A} _p\,\norm{B} _p, \qquad p\in[1,\infty].\)

In particular, for the spectral norm, \(\norm{AB} _2 \le \norm{A} _2\,\norm{B} _2.\)
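A randomized check of submultiplicativity for the induced $1$-, $2$-, and $\infty$-norms, assuming NumPy (whose `np.linalg.norm` computes exactly these induced norms for `ord` equal to `1`, `2`, and `np.inf`):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 5))
B = rng.standard_normal((5, 3))

# ||AB||_p <= ||A||_p ||B||_p for the induced norms
for p in (1, 2, np.inf):
    lhs = np.linalg.norm(A @ B, p)
    rhs = np.linalg.norm(A, p) * np.linalg.norm(B, p)
    assert lhs <= rhs + 1e-9
```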


Operator norm ordering

For induced (operator) norms, \(\norm{A} _2 \;\le\; \sqrt{\norm{A} _1\,\norm{A} _\infty}.\)

One also always has \(\norm{A} _2 \le \norm{A} _F,\) since the spectral norm is the largest singular value while the Frobenius norm aggregates all of them.
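Both orderings can be checked numerically, assuming NumPy:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 4))

n1 = np.linalg.norm(A, 1)         # max column sum
ninf = np.linalg.norm(A, np.inf)  # max row sum
n2 = np.linalg.norm(A, 2)         # largest singular value
nf = np.linalg.norm(A, 'fro')

assert n2 <= np.sqrt(n1 * ninf) + 1e-9  # ||A||_2 <= sqrt(||A||_1 ||A||_inf)
assert n2 <= nf + 1e-9                  # ||A||_2 <= ||A||_F
```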


Mixed Frobenius–operator bounds

The Frobenius norm interacts with the spectral norm as

\[\norm{AB} _F \le \norm{A} _2\,\norm{B} _F, \qquad \norm{AB} _F \le \norm{A} _F\,\norm{B} _2.\]

Frobenius norm submultiplicativity (weaker form)

Although $\norm{\cdot}_F$ is not an induced norm, it is still submultiplicative: \(\norm{AB} _F \le \norm{A} _F\,\norm{B} _F.\) This follows from the mixed bounds above together with \(\norm{B} _2 \le \norm{B} _F.\)
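The mixed bounds and the Frobenius submultiplicativity can all be checked on one random example, assuming NumPy:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((4, 5))
B = rng.standard_normal((5, 3))

fab = np.linalg.norm(A @ B, 'fro')
assert fab <= np.linalg.norm(A, 2) * np.linalg.norm(B, 'fro') + 1e-9
assert fab <= np.linalg.norm(A, 'fro') * np.linalg.norm(B, 2) + 1e-9
assert fab <= np.linalg.norm(A, 'fro') * np.linalg.norm(B, 'fro') + 1e-9
```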


Singular value (Schatten norm) inequalities

Let $s(A)$ denote the vector of singular values.

  • Hölder inequality for Schatten norms: \(\norm{AB} _{S,r} \le \norm{A} _{S,p}\,\norm{B} _{S,q}, \qquad \frac{1}{p}+\frac{1}{q}=\frac{1}{r}.\)

  • In particular, \(\norm{AB} _* \le \norm{A} _2\,\norm{B} _*, \qquad \norm{AB} _* \le \norm{A} _*\,\norm{B} _2.\)


Trace inequalities

For compatible matrices $A,B$, \(|\mathrm{tr}(A^\top B)| \le \norm{A} _F\,\norm{B} _F.\)

More generally, using nuclear and spectral norms,

\[|\mathrm{tr}(A^\top B)| \le \norm{A} _*\,\norm{B} _2, \qquad |\mathrm{tr}(A^\top B)| \le \norm{A} _2\,\norm{B} _*.\]
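All three trace bounds checked numerically, assuming NumPy (`'nuc'` is the nuclear-norm order of `np.linalg.norm`):

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((5, 4))
B = rng.standard_normal((5, 4))

t = abs(np.trace(A.T @ B))
assert t <= np.linalg.norm(A, 'fro') * np.linalg.norm(B, 'fro') + 1e-9
assert t <= np.linalg.norm(A, 'nuc') * np.linalg.norm(B, 2) + 1e-9
assert t <= np.linalg.norm(A, 2) * np.linalg.norm(B, 'nuc') + 1e-9
```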

Rank-dependent norm comparisons

If $\mathrm{rank}(A)=r$, then

\[\norm{A} _2 \le \norm{A} _F \le \sqrt{r}\,\norm{A} _2,\]

and

\[\norm{A} _F \le \norm{A} _* \le \sqrt{r}\,\norm{A} _F.\]

These inequalities are matrix analogues of norm equivalence in finite dimensions.
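As a final sketch, both rank-dependent chains can be verified on an explicitly constructed rank-$2$ matrix, assuming NumPy (a product of a $6\times 2$ and a $2\times 5$ factor has rank at most $2$, and equals $2$ almost surely for Gaussian factors):

```python
import numpy as np

rng = np.random.default_rng(5)
r = 2
# Build a rank-2 matrix from a tall-thin times short-wide factorization.
A = rng.standard_normal((6, r)) @ rng.standard_normal((r, 5))

n2 = np.linalg.norm(A, 2)      # spectral
nf = np.linalg.norm(A, 'fro')  # Frobenius
nn = np.linalg.norm(A, 'nuc')  # nuclear

assert n2 <= nf + 1e-9 and nf <= np.sqrt(r) * n2 + 1e-9
assert nf <= nn + 1e-9 and nn <= np.sqrt(r) * nf + 1e-9
```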