Dual norm intuition

Here's the way I like to think about it. I'll start with the finite-dimensional space $\Bbb{R}^n$ because it looks like that's where you are, but I'll give an analogy for infinite-dimensional spaces as well.

The quantity $z^Tx$ represents a linear functional on $\Bbb{R}^n$, that is, a linear function which eats a vector and spits out a real number:

$$ f_z:\Bbb{R}^n\rightarrow\Bbb{R}\quad \text{such that }\quad f_z(\alpha x+\beta y)=\alpha f_z(x)+\beta f_z(y)\quad \forall \alpha,\beta\in\Bbb{R},\ x,y\in\Bbb{R}^n $$

Because of the Riesz Representation Theorem, we know that any linear function $f:\Bbb{R}^n\rightarrow\Bbb{R}$ will take the form $f=f_z$ for some $z\in\Bbb{R}^n$, i.e. $f(x) = z^Tx$.
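In $\Bbb{R}^n$ you don't even need the full theorem: by linearity, $f(x)=\sum_i x_i f(e_i)$, so the representing vector is simply $z_i = f(e_i)$. Here is a minimal NumPy sketch of that observation. The functional `f` and the helper `representer` are hypothetical examples made up for illustration, not part of any library.

```python
import numpy as np

def representer(f, n):
    """Hypothetical helper: recover z with f(x) = z^T x by evaluating f on e_1, ..., e_n."""
    # Iterating over np.eye(n) yields the standard basis vectors as rows.
    return np.array([f(e) for e in np.eye(n)])

# A hypothetical linear functional on R^3, written without mentioning any vector z.
def f(x):
    return 2.0 * x[0] - x[1] + 0.5 * x[2]

z = representer(f, 3)

# f and x -> z^T x should agree on an arbitrary vector.
x = np.random.default_rng(0).standard_normal(3)
print(z, np.isclose(f(x), z @ x))
```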

The question is now this: given a linear function(al) $f_z(\cdot)$, how "big" is it? Well, to measure the size of vectors, we look at norms, so the idea is simple: how big is the number $f_z(x)=z^Tx$ relative to the size (norm) of $x$? This is exactly the number

$$ \frac{z^Tx}{\|x\|} $$

We then define the dual norm of $z$ to be the largest this quantity can possibly be:

$$ \|z\|_* = \sup_{x\neq 0} \frac{z^Tx}{\|x\|} $$

In a way, this is a kind of "stretch factor", but the stretching is measured relative to $\|x\|$, which is how we're measuring the size of $x$. With a simple one-line proof, you can show that this way of defining $\|z\|_*$ is the same as the usual one, $\|z\|_* = \sup\{z^Tx : \|x\|\le 1\}$.
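For completeness, here is one way to write that short argument, assuming the definition being compared against is the common unit-ball form $\sup_{\|x\|\le 1} z^Tx$. For $x\neq 0$, homogeneity of the norm lets us normalize:

$$ \frac{z^Tx}{\|x\|} = z^T\!\left(\frac{x}{\|x\|}\right), \qquad \left\|\frac{x}{\|x\|}\right\| = 1, $$

so $\sup_{x\neq 0} z^Tx/\|x\| = \sup_{\|u\|=1} z^Tu$. Points strictly inside the ball can't do better: if $0<\|x\|<1$ and $z^Tx>0$, then $z^T(x/\|x\|) = z^Tx/\|x\| > z^Tx$, while points with $z^Tx\le 0$ (including $x=0$) can't exceed the supremum, which is nonnegative because $z^T(-x) = -z^Tx$. Hence

$$ \sup_{\|x\|\le 1} z^Tx \;=\; \sup_{\|x\|=1} z^Tx \;=\; \sup_{x\neq 0}\frac{z^Tx}{\|x\|}. $$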

This idea extends to infinite-dimensional normed spaces such as $L^p$ as well: every normed space has a "dual" space of continuous (equivalently, bounded) linear functionals, i.e. mappings which eat vectors (which might actually be functions) and spit out numbers. Each of these functionals has an associated "size", and that size is given by the dual norm:

$$ \|f\|_* = \sup_{x\neq 0}\frac{f(x)}{\|x\|} $$

To really complete the picture, and to expand on a couple of comments, it helps to think of the dual norm as a special case of an operator norm. The idea behind a general operator norm is much the same as what I described above, but for a more general linear operator $A:X\rightarrow Y$, where $X$ and $Y$ are any normed linear spaces. In the case of linear functionals, $X$ is a vector space like $\Bbb{R}^n$ or $L^p$, and $Y$ is simply the base field $\Bbb{R}$ (or $\Bbb{C}$ for complex spaces).

In general, $A$ eats vectors and spits out other vectors, and to measure the "size" of $A$ we again look at the ratio of the size of $Ax$ (measured with the $Y$ norm) to the size of $x$ (measured with the $X$ norm):

$$ \frac{\|Ax\|_Y}{\|x\|_X} $$

The supremum of these ratios over nonzero $x\in X$ is a good value for the size of $A$, because it tells us a sort of worst-case stretch factor:

$$ \|A\|=\sup_{x\neq 0}\frac{\|Ax\|_Y}{\|x\|_X} $$

This is very similar to the idea of a singular value. In fact, if we use the Euclidean norm $\|\cdot\|_2$ on both $X$ and $Y$, the operator norm of a matrix is its largest singular value!
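A quick numerical sanity check of the singular-value claim, assuming NumPy is available: `np.linalg.norm(A, 2)` computes the induced 2-norm of a matrix, and it should match the largest singular value returned by `np.linalg.svd`. The random matrix and sample count are arbitrary choices for illustration. Sampled ratios only approach the supremum from below, while the top right singular vector attains it.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))  # arbitrary example operator R^3 -> R^4

# Induced 2-norm as computed by NumPy, and the largest singular value
# (np.linalg.svd returns singular values in descending order).
op_norm = np.linalg.norm(A, 2)
sigma_max = np.linalg.svd(A, compute_uv=False)[0]

# The stretch ratio ||Ax||_2 / ||x||_2 for many random x (columns of xs)
# never exceeds the operator norm...
xs = rng.standard_normal((3, 10_000))
ratios = np.linalg.norm(A @ xs, axis=0) / np.linalg.norm(xs, axis=0)

# ...and the first right singular vector (first row of Vt) attains it.
_, _, Vt = np.linalg.svd(A)
v1 = Vt[0]
attained = np.linalg.norm(A @ v1) / np.linalg.norm(v1)

print(op_norm, sigma_max, ratios.max(), attained)
```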


The dual space is a space of linear functionals. If we want to define a norm on the dual space, we do what we always do to measure the "size" of a linear transformation: we use an operator norm.

In particular, the dual norm of $z$ is the induced (operator) norm of the $1\times n$ matrix $z^T$, viewed as a map from $(\Bbb{R}^n,\|\cdot\|)$ to $(\Bbb{R},|\cdot|)$.
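To make this concrete with NumPy (assumed available): its induced matrix norms use the same $\ell_p$ norm on the domain and codomain, and on the one-dimensional codomain every $\ell_p$ norm is just $|\cdot|$. So `np.linalg.norm` applied to $z^T$ as a $1\times n$ array returns the dual of the corresponding $\ell_p$ norm, which reproduces the familiar pairings $\ell_1\leftrightarrow\ell_\infty$ and $\ell_2\leftrightarrow\ell_2$. The vector `z` is an arbitrary example.

```python
import numpy as np

z = np.array([3.0, -4.0, 1.0])  # arbitrary example vector
zT = z.reshape(1, -1)           # z^T as a 1 x n matrix

# ord=1:      max absolute column sum -> max_i |z_i|  (dual of l1 is l_inf)
# ord=2:      largest singular value  -> ||z||_2      (l2 is self-dual)
# ord=np.inf: max absolute row sum    -> sum_i |z_i|  (dual of l_inf is l1)
dual_of_l1 = np.linalg.norm(zT, 1)
dual_of_l2 = np.linalg.norm(zT, 2)
dual_of_linf = np.linalg.norm(zT, np.inf)

print(dual_of_l1, dual_of_l2, dual_of_linf)
```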


The dual norm is a particular case of a support function: specifically, it is the support function $\sigma_B(z)=\sup_{x\in B} z^Tx$ of the unit ball $B=\{x:\|x\|\le 1\}$ of the original norm. When the unit ball is smooth enough and $z$ has unit Euclidean length, $\|z\|_*$ is the Euclidean distance from the origin to the hyperplane with normal vector $z$ that is tangent to the ball. The equation of this hyperplane is $z^Tx=\|z\|_*$.

In general, the ball can have corner points where there is no unique tangent hyperplane, but the hyperplane $z^Tx=\|z\|_*$ is still "supporting": it meets the boundary of the ball without cutting into its interior.
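To see the corner case concretely, here is a small sketch for the $\ell_1$ unit ball in $\Bbb{R}^2$ (a diamond), whose dual norm is the $\ell_\infty$ norm. The helper `support_l1_ball` is hypothetical, written only for this illustration: it evaluates $z^Tx$ at the vertices $\pm e_i$, which suffices because a linear function on a polytope attains its maximum at a vertex.

```python
import numpy as np

def support_l1_ball(z):
    """Hypothetical helper: max of z^T x over the l1 unit ball {x : ||x||_1 <= 1}.

    A linear function on a polytope attains its maximum at a vertex, and the
    vertices of the l1 ball are +/- e_i, so checking those is enough.
    Returns the maximum value and the vertex where it is attained.
    """
    n = len(z)
    vertices = np.vstack([np.eye(n), -np.eye(n)])
    values = vertices @ z
    k = int(np.argmax(values))
    return values[k], vertices[k]

z = np.array([0.6, -0.8])  # unit Euclidean length, as in the hyperplane picture
h, corner = support_l1_ball(z)

# h equals the dual norm ||z||_inf, and the touching point is a corner of
# the diamond (here -e_2), where the ball has no unique tangent line.
# Since ||z||_2 = 1, h is also the distance from 0 to {x : z^T x = h}.
distance = h / np.linalg.norm(z)
print(h, corner, distance)

# For the smooth l2 ball, the touching point z / ||z||_2 is unique and the
# hyperplane z^T x = ||z||_2 is an ordinary tangent.
x_l2 = z / np.linalg.norm(z)
print(z @ x_l2)
```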