What is the intuition for the trace norm (nuclear norm)?

One potential intuition for the trace norm is as a way of turning the rank of a matrix (which is very discontinuous) into a norm (which is continuous). Specifically, the trace norm is the unique unitarily invariant norm with the property that $\|P\|_{\mathrm{tr}} = \mathrm{rank}(P)$ for every orthogonal projection $P \in M_n(\mathbb{C})$.
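
To make the contrast between rank and trace norm concrete, here is a minimal Python sketch. It is restricted to real diagonal matrices, where the singular values are simply the absolute values of the diagonal entries, and the helper names `rank_diag` and `trace_norm_diag` are made up for this illustration.

```python
def rank_diag(d):
    """Rank of diag(d): the number of nonzero diagonal entries."""
    return sum(1 for x in d if x != 0)


def trace_norm_diag(d):
    """Trace norm of diag(d): the sum of its singular values |d_i|."""
    return sum(abs(x) for x in d)


# An orthogonal projection of rank 2 in M_3: rank and trace norm agree.
projection = [1.0, 1.0, 0.0]
print(rank_diag(projection), trace_norm_diag(projection))

# Shrink one eigenvalue towards 0. The rank stays at 2 and then jumps
# to 1 at eps = 0, while the trace norm 1 + eps moves continuously to 1.
for eps in [1.0, 0.1, 0.001, 0.0]:
    d = [1.0, eps, 0.0]
    print(eps, rank_diag(d), trace_norm_diag(d))
```

The rank column only ever changes by a jump, whereas the trace norm column tracks `eps` smoothly, which is the sense in which the trace norm is a continuous stand-in for the rank.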

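Closely related to this is the following characterization of the trace norm, which basically says that $\|X\|_{\mathrm{tr}}$ measures the "amount of rank-1 matrices" needed to construct $X$: $$ \|X\|_{\mathrm{tr}} = \inf\big\{ \sum_j \|X_j\| : X = \sum_j X_j, \ \ \mathrm{rank}(X_j) = 1 \ \ \forall j \big\}. $$ Here $\|X_j\|$ can be taken to be the operator norm, since the operator, Frobenius, and trace norms all agree on rank-1 matrices, and the infimum is attained by a singular value decomposition of $X$.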

Alternatively, just like the $\ell_1$-norm is typically the "right" norm to use when dealing with probability distributions (after all, we want probabilities to add up to $1$, not their squares to add up to $1$), the trace norm is typically the "right" norm to use when dealing with their non-commutative analog (density matrices/quantum states).
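
As a small sanity check of the density matrix remark, the sketch below compares the trace norm and the Hilbert-Schmidt (Frobenius) norm on two qubit states. It assumes $2 \times 2$ matrices only, where the trace norm has a closed form in terms of the Frobenius norm and the determinant; the function names are chosen here for illustration.

```python
import math


def frobenius_norm_2x2(m):
    """Hilbert-Schmidt (Frobenius) norm: sqrt of the sum of |entries|^2."""
    return math.sqrt(sum(abs(x) ** 2 for row in m for x in row))


def trace_norm_2x2(m):
    """Trace norm s1 + s2 of a 2x2 matrix.

    Uses s1^2 + s2^2 = ||m||_F^2 and s1 * s2 = |det m|, so that
    (s1 + s2)^2 = ||m||_F^2 + 2 |det m|. This shortcut is 2x2-specific.
    """
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return math.sqrt(frobenius_norm_2x2(m) ** 2 + 2 * abs(det))


pure_state = [[0.5, 0.5], [0.5, 0.5]]    # projection onto (1, 1)/sqrt(2)
mixed_state = [[0.5, 0.0], [0.0, 0.5]]   # maximally mixed state I/2

for name, rho in [("pure", pure_state), ("mixed", mixed_state)]:
    print(name, trace_norm_2x2(rho), frobenius_norm_2x2(rho))
```

Both states have trace norm $1$, just as every probability vector has $\ell_1$-norm $1$, while the Frobenius norm drops from $1$ for the pure state to $1/\sqrt{2}$ for the maximally mixed one.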


Another answer is that $M_n$, the space of $n\times n$ complex matrices, carries an operator norm, where the norm of a matrix is its norm as a linear operator from $\mathbb{C}^n$ to itself (giving $\mathbb{C}^n$ the Euclidean norm). For some of us, this is the most natural and useful norm on $M_n$.

With the operator norm, $M_n$ is a finite-dimensional Banach space, so it has a dual space, which is just $M_n$ equipped with the trace norm (under the pairing $\langle A, B \rangle = \mathrm{tr}(AB)$). In infinite dimensions, the trace-class operators on $H$, with the trace norm, form the (unique) predual of $B(H)$.

Edit: I should add something about how this relates to the $\ell^1$ norm. The operator norm of a diagonal matrix is the $\ell^\infty$ norm of its entries, so the operator norm can be seen as a sort of generalization of the $\ell^\infty$ norm. Indeed, $M_n$ with the operator norm contains an isometric copy of $\ell^\infty_n$ as the diagonal matrices. So it is natural that the dual norm should restrict to the $\ell^1$ norm on the diagonal matrices.
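
The diagonal case can be checked by hand, or with a few lines of Python. The sketch below only treats real diagonal matrices, represented by their list of diagonal entries, and its helper names are hypothetical; it illustrates the $\ell^\infty$/$\ell^1$ duality rather than the full duality on $M_n$.

```python
def operator_norm_diag(d):
    """Operator norm of diag(d): the l-infinity norm of the entries."""
    return max(abs(x) for x in d)


def trace_norm_diag(d):
    """Trace norm of diag(d): the l-1 norm of the entries."""
    return sum(abs(x) for x in d)


def pairing_diag(a, b):
    """tr(diag(a) diag(b)), which is just sum_i a_i * b_i."""
    return sum(x * y for x, y in zip(a, b))


b = [3.0, -1.0, 0.5]

# The sign pattern of b has operator norm 1 and attains the trace norm of b.
a_opt = [1.0 if x >= 0 else -1.0 for x in b]
print(operator_norm_diag(a_opt), pairing_diag(a_opt, b), trace_norm_diag(b))

# Any other diagonal matrix in the operator-norm unit ball pairs to at most that.
a_other = [0.5, 1.0, -1.0]
print(operator_norm_diag(a_other), abs(pairing_diag(a_other, b)))
```

In other words, the supremum of $|\mathrm{tr}(AB)|$ over diagonal $A$ with operator norm at most $1$ is exactly the $\ell^1$ norm of the diagonal of $B$.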


The trace-class norm of $A$ is about putting the $\ell^1$ norm on the singular values of $A$, whereas the Hilbert-Schmidt norm uses $\ell^2$ instead. So your question is basically: why should we care about $\ell^1$ and not only $\ell^2$? Things only become interesting in infinite-dimensional spaces, where these norms are no longer equivalent. This is why the trace-class norm usually arises when considering compact operators acting on infinite-dimensional spaces. Books like "Trace ideals and their applications" by B. Simon will tell you more about these objects, depending on what you're looking for.
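
To see the failure of equivalence numerically, one can look at the truncations $\mathrm{diag}(1, 1/2, \dots, 1/n)$ of the diagonal operator with entries $1/k$ on $\ell^2$. The following is a rough sketch (the function name is made up here); it works directly with the singular values $1/k$, so no matrix is ever built.

```python
import math


def norms_of_truncation(n):
    """Trace norm and Hilbert-Schmidt norm of diag(1, 1/2, ..., 1/n)."""
    singular_values = [1.0 / k for k in range(1, n + 1)]
    trace_norm = sum(singular_values)                       # l-1 norm
    hs_norm = math.sqrt(sum(s * s for s in singular_values))  # l-2 norm
    return trace_norm, hs_norm


for n in [10, 1000, 100000]:
    trace_norm, hs_norm = norms_of_truncation(n)
    print(n, trace_norm, hs_norm, trace_norm / hs_norm)
```

The Hilbert-Schmidt norms stay below $\pi/\sqrt{6}$, while the trace norms are the harmonic numbers and grow like $\log n$. In dimension $n$ the ratio is always at most $\sqrt{n}$, but no bound holds uniformly in $n$, so the limiting operator is Hilbert-Schmidt without being trace class.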