Characteristic Function as a Fourier Transform

Indeed, the fact that this is a Fourier transform is by and large a mathematical coincidence; the intuition comes not from interpreting it as a Fourier transform, but from considering it from another angle: that of moment generating functions.

Throughout this answer, I assume all random variables are real-valued; it seems like that's what you're concerned about anyway.

If you have done some statistics, you are almost certainly familiar with the concept of the moment generating function of $X$, $$ M_X : \mathbb R \to \mathbb R \\ M_X(t) = \mathbb E\big[e^{tX}\big]. $$ This function has many nice properties. For instance, the $n$-th moment of $X$, $\mathbb E\big[X^n\big]$, can be found by computing $M_X^{(n)}(0)$, the $n$-th derivative of $M_X$ evaluated at $0$. Another important property is that two random variables whose moment generating functions agree (and are finite) on a neighborhood of $0$ have the same distribution; that is to say, the process of determining a moment generating function is "invertible". A third and also significant property is the fact that, for any two independent random variables $X$ and $Y$, we have \begin{align*} M_{X+Y}(t) &= \mathbb E \big[e^{t(X+Y)}\big] \\ &= \mathbb E \big[e^{tX} e^{tY}\big] \\ &= \mathbb E \big[e^{tX} \big] \mathbb E \big[e^{tY} \big] \\ &= M_X(t)M_Y(t). \end{align*} (The third equality holds because $e^{tX}$ and $e^{tY}$ are themselves independent random variables, being functions of the independent variables $X$ and $Y$.) In conjunction with the fact that moment generating functions are invertible, this essentially permits us to derive a formula for the distribution of the sum of two independent random variables; hopefully, this application also makes clear why there is a seemingly arbitrary exponential in the definition of the moment generating function.
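
If it helps to see these properties concretely, here is a small symbolic sketch in Python (using sympy; the choice of normal distributions, whose MGF is $e^{\mu t + \sigma^2 t^2/2}$, is just my illustrative example, not anything from the question):

```python
import sympy as sp

t, mu = sp.symbols('t mu', real=True)
sigma = sp.symbols('sigma', positive=True)

# MGF of X ~ Normal(mu, sigma^2): M_X(t) = exp(mu*t + sigma^2*t^2/2)
M = sp.exp(mu*t + sigma**2*t**2/2)

# n-th moment of X = n-th derivative of M_X evaluated at t = 0
print(sp.diff(M, t, 1).subs(t, 0))               # mu
print(sp.simplify(sp.diff(M, t, 2).subs(t, 0)))  # mu**2 + sigma**2

# Product property: for independent X ~ N(mu1, s1^2) and Y ~ N(mu2, s2^2),
# M_X(t) * M_Y(t) is the MGF of N(mu1 + mu2, s1^2 + s2^2)
mu1, mu2 = sp.symbols('mu1 mu2', real=True)
s1, s2 = sp.symbols('s1 s2', positive=True)
MX = sp.exp(mu1*t + s1**2*t**2/2)
MY = sp.exp(mu2*t + s2**2*t**2/2)
M_sum = sp.exp((mu1 + mu2)*t + (s1**2 + s2**2)*t**2/2)
print((MX * MY).equals(M_sum))                   # True
```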

Now, the classical example of an application of moment generating functions is in the proof of the Central Limit Theorem. They are a natural candidate, because the CLT involves sums of independent random variables, and moment generating functions are well equipped to deal with such sums. However, there is a glaring issue with their use: moment generating functions do not always exist. For instance, a sufficiently heavy-tailed random variable, such as a standard Cauchy one, has $\mathbb E\big[e^{tX}\big] = \infty$ for every $t$ other than $0$.

This is where characteristic functions come in. As you know, we define the characteristic function by $$ \varphi_X : \mathbb R \to \mathbb C \\ \varphi_X(t) = \mathbb E \big[ e^{itX} \big]. $$ All of the nice properties of moment generating functions mentioned above still hold for characteristic functions. In particular:

  • the $n$-th moment of $X$ can be found as $(-i)^n \varphi_X^{(n)}(0)$, if it exists

  • two random variables with the same characteristic function have the same distribution

  • $\varphi_{X+Y}(t) = \varphi_X(t)\varphi_Y(t)$ for independent r.v.s $X$, $Y$ (this is proven essentially the same way as before; a quick numerical check of these properties is sketched just below the list).
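
Concretely (and only as a check, not a proof), here is a quick Monte Carlo sketch of the first and third bullet points; the choice of Exponential(1) variables, the sample size, and the evaluation point $t = 0.7$ are all arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# Two independent Exponential(1) samples
x = rng.exponential(1.0, n)
y = rng.exponential(1.0, n)

def cf(sample, t):
    """Monte Carlo estimate of the characteristic function E[exp(i*t*X)]."""
    return np.mean(np.exp(1j * t * sample))

# Product property: phi_{X+Y}(t) ~= phi_X(t) * phi_Y(t) ~= 1/(1 - i*t)^2
t = 0.7
print(cf(x + y, t))
print(cf(x, t) * cf(y, t))
print(1 / (1 - 1j * t) ** 2)        # exact value for Exponential(1) variables

# First moment via (-i) * phi_X'(0), estimated with a central finite difference
h = 1e-3
phi_prime_0 = (cf(x, h) - cf(x, -h)) / (2 * h)
print((-1j * phi_prime_0).real)     # ~ 1.0, the mean of Exponential(1)
```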

The critical difference from moment generating functions is this: characteristic functions always exist, at least for real-valued random variables. The intuitive reason is that the possible values taken by $e^{itX}$ all lie on the unit circle, hence are bounded; the expected value is then an average of points on the unit circle, so it lies within the closed unit disc and, in particular, is always finite. Going back to the CLT example, this allows us to complete the proof without issue; indeed, if you are interested, the proof on the Wikipedia page uses characteristic functions.
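
To make the contrast with moment generating functions concrete, here is a small numerical sketch built around the standard Cauchy distribution (my choice of example): the truncated integrals defining the MGF grow without bound as the cutoff grows, while a plain Monte Carlo average of $e^{itX}$ settles down to the known characteristic function value $e^{-|t|}$.

```python
import numpy as np
from scipy import integrate

rng = np.random.default_rng(0)
f = lambda x: 1.0 / (np.pi * (1.0 + x**2))   # standard Cauchy density
t = 0.5

# MGF: the truncated integrals of exp(t*x)*f(x) blow up as the cutoff A grows
for A in (10, 20, 40, 80):
    val, _ = integrate.quad(lambda x: np.exp(t * x) * f(x), -A, A)
    print(f"integral over [-{A}, {A}]: {val:.3e}")

# Characteristic function: |exp(i*t*X)| = 1, so a plain Monte Carlo average converges
x = rng.standard_cauchy(2_000_000)
print(np.mean(np.exp(1j * t * x)))           # ~ exp(-0.5), plus a tiny imaginary part
print(np.exp(-abs(t)))
```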

Based on this little narrative, it is pretty clear that the entire motivation for introducing the $i$ in the exponent of the characteristic function is that existence is then guaranteed for any real-valued random variable. It is not much more than a nice mathematical coincidence that the characteristic function coincides with the Fourier transform, and it makes little sense (at least in my opinion) to try to carry over intuitions from the Fourier transform to the characteristic function; instead, the intuition comes from thinking about how this function might have been discovered in the first place.


I really like the answer by @hdighfan, but I would also like to approach the OP's question from a different angle:

IMHO, one of the most interesting things about the Fourier transform is that it transforms convolutions into multiplications, and the latter are generally easier to deal with when solving actual problems.

So FTs are useful wherever convolutions are natural. Now, I cannot comment on physics due to lack of background, but I know a bit more about control theory / system theory, i.e. settings where an input goes into a system and an output comes out the other end. The beginner's case is linear, time-invariant (LTI) systems, where the output is the convolution of the input with the system's impulse response. So in this context, Fourier and Laplace transforms are very, very useful.
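
As a concrete (if toy) illustration of that convolution-to-multiplication property, here is a minimal numpy sketch; the input signal and the three-tap filter are arbitrary choices of mine:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy discrete-time LTI system: output = input convolved with the impulse response h
x = rng.normal(size=64)                 # arbitrary input signal
h = np.array([0.25, 0.5, 0.25])         # impulse response of a simple smoothing filter

y_time = np.convolve(x, h)              # direct (time-domain) convolution

# Same computation in the frequency domain: the DFT turns convolution into
# pointwise multiplication (zero-pad so circular convolution equals linear convolution)
N = len(x) + len(h) - 1
y_freq = np.fft.ifft(np.fft.fft(x, N) * np.fft.fft(h, N)).real

print(np.allclose(y_time, y_freq))      # True
```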

Another way to look at this is to ask: why is the frequency domain representation useful at all? It is because, for an LTI system, pure sinusoids form the natural basis. So you decompose an input into sinusoids (FT), multiply each by a frequency-dependent linear gain, and then reassemble (inverse FT). The reason "energy at frequency $f$" is a useful concept is (IMHO) largely that a sinusoid at frequency $f$ will go through the system and come out still as a sinusoid at frequency $f$. Add to that the fact that many real-life systems can be very usefully described (or design-specified) as low-pass, band-pass, high-pass, etc.
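
And a tiny sketch of the "sinusoid in, same-frequency sinusoid out" point, reusing the same toy filter (the sample rate and input frequency are arbitrary):

```python
import numpy as np

fs = 1000.0                              # sample rate (arbitrary)
freq = 50.0                              # input frequency in Hz (arbitrary)
w = 2 * np.pi * freq / fs                # frequency in radians per sample
n = np.arange(2000)

x = np.cos(w * n)                        # pure sinusoid in
h = np.array([0.25, 0.5, 0.25])          # the same toy low-pass filter as above
y = np.convolve(x, h)                    # system output (full convolution)

# Frequency response of the filter at this particular frequency
H = np.sum(h * np.exp(-1j * w * np.arange(len(h))))

# Away from the boundaries, the output is the same sinusoid, rescaled and phase-shifted
y_pred = np.abs(H) * np.cos(w * n + np.angle(H))
print(np.allclose(y[2:len(x)], y_pred[2:len(x)]))   # True
```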

So in short, IMHO the frequency domain interpretation is less useful in probability because there are simply fewer naturally occurring situations of this kind (e.g., no obvious analogue of low-pass systems). Regardless, FTs still turn convolutions into multiplications. Now, where do convolutions occur in probability? They occur when you sum independent random variables. That is why the characteristic function is exactly in the sweet spot for proving the CLT.
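
To tie the two halves together, here is a short sketch for the sum of two independent Uniform(0, 1) variables (my choice of example): the density of the sum is the convolution of the two densities, and the characteristic function of the sum is the product of the two characteristic functions.

```python
import numpy as np

rng = np.random.default_rng(2)

# X, Y ~ Uniform(0, 1), independent.  The density of X + Y is the convolution of the
# two uniform densities: the triangle density on [0, 2].
dx = 0.001
f = np.ones(1000)                            # Uniform(0, 1) density sampled on a grid
f_sum = np.convolve(f, f) * dx               # density of X + Y, via convolution
s_grid = np.arange(len(f_sum)) * dx
print(np.interp([0.5, 1.0, 1.5], s_grid, f_sum))   # ~ [0.5, 1.0, 0.5]

# The same fact in the "frequency domain": phi_{X+Y}(t) = phi_X(t) * phi_Y(t)
s = rng.uniform(0, 1, 1_000_000) + rng.uniform(0, 1, 1_000_000)
t = 1.3
phi_uniform = (np.exp(1j * t) - 1) / (1j * t)      # exact CF of Uniform(0, 1)
print(np.mean(np.exp(1j * t * s)))                 # Monte Carlo estimate of phi_{X+Y}(t)
print(phi_uniform ** 2)                            # product of the two CFs
```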