QM and Renormalization (layman)

The best way to explain renormalization is to consider what at first looks like a complete detour: Mandelbrot's fractal geometry. Developed in the 1960s and 1970s, fractal geometry is the key idea behind the major advances in statistical physics of the early 1970s, pioneered (mostly independently) by Leo Kadanoff and also associated with Alexander Polyakov, Michael Fisher, Kenneth Wilson, and many others in the 1970s and 1980s, building on classic work of Feynman and Onsager. These ideas give renormalization theory its modern form.

The basic idea can be summarized in one sentence: renormalization is the analysis of mathematical objects whose fractal dimensions at small distances are either different from what you expect because of nonlinear interactions, or incipiently different from what you expect, so that the naive scaling is modified by logarithms.

It really belongs to pure mathematics, but it was developed almost entirely within physics, with Mandelbrot being the exception.

Power laws

If a quantity x depends on a quantity y in such a way that a rescaling of y can be compensated by a rescaling of x, then x and y are related by a power law:

$$x = C y^\alpha$$

where C and $\alpha$ are constants. Power laws are important because they are scale free, meaning that once you choose a scale for y, the scale for x is determined by setting the coefficient of the power law to 1, but there is no absolute scale, no absolute units for y. This is best illustrated with examples.

Suppose you have a pendulum of length L with a mass swinging on the end. The period of the pendulum is

$$ T = 2\pi \sqrt{L\over g} $$

The form of this relation gives you no information about any atomic length scales. Whatever units you choose for L, you can find appropriate units for T by rescaling to make the coefficient of the relation be order 1.

On the other hand, suppose you look at the approximate density of the atmosphere as you go up in height y:

$$ \rho(y) = C e^{- Ay}$$

The dependence is exponential, so it determines a length scale, 1/A. This length scale is fixed by the other parameters of the problem, like the pressure, the density, and the acceleration of gravity, so it isn't an atomic length scale, but an emergent one.

The difference between power-laws and other relations can be understood from dimensional analysis. The coefficient of a power law mixes up units of x and units of y, so it allows a simultaneous rescaling of both by compensating amounts. The coefficients in an arbitrary relation pick out a scale for variation, so they are not scale-invariant.
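
To make the contrast concrete, here is a minimal numerical sketch (the parameter values are illustrative choices, not part of the argument above): the pendulum power law has the same exponent in any units, while the exponential atmosphere profile singles out the scale 1/A.

```python
# Sketch: power laws are unit-free, exponentials pick out a scale.
import numpy as np

g = 9.8                                   # m/s^2
L_m = np.array([0.1, 1.0, 10.0])          # pendulum lengths in meters
T = 2 * np.pi * np.sqrt(L_m / g)          # periods in seconds

# Change units: measure length in feet instead of meters.
L_ft = L_m / 0.3048
# The log-log slope (the exponent 1/2) is unchanged; only the coefficient C shifts.
slope_m = np.polyfit(np.log(L_m), np.log(T), 1)[0]
slope_ft = np.polyfit(np.log(L_ft), np.log(T), 1)[0]
print("exponent in meters:", slope_m, " exponent in feet:", slope_ft)  # both 0.5

# The exponential law rho = C * exp(-A*y) is different: the relation itself
# picks out the length 1/A (roughly 8 km for air).
A = 1.0 / 8000.0                          # 1/m
y = np.array([0.0, 8000.0, 16000.0])
rho = np.exp(-A * y)
print("density drops by a factor e every 1/A =", 1 / A, "meters:", rho)
```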

Scaling limits

When y has a tiny discreteness scale, like the length of a wire counted in number of atoms, you expect that at large numbers the behavior will be independent of the underlying discreteness, so that measuring the dependence of the period of the pendulum on its length will be useless for revealing how big atoms are.

In order for this to be true, the relation has to reach a scaling limit: the dependence of x on y has to become independent of the grain-scale at short distances which defines the continuum.

Here are some trivial examples: let $\epsilon$ be the atomic size, and suppose the parameter y is an integer multiple of the atomic scale:

$$ y = n \epsilon $$

If x is a function of y which obeys the law

$$ x(y+\epsilon) = x(y) + \epsilon y $$

Then for small $\epsilon$, you get that $x(y) = {y^2\over 2} $, and this is standard calculus. If x obeys the law

$$ x(y+\epsilon) = x(y) + \epsilon x(y) $$

Then for small $\epsilon$, you find $x(y) = Ce^y$. In both cases, the change in $x$ in each $\epsilon$ step is determined by the change in y, and the stepsize becomes irrelevant in this scaling.

But suppose you are perverse and you decide to scale the x-steps differently

$$x(y+\epsilon) = x(y) + \epsilon^2 x(y) $$

Then as $\epsilon\rightarrow 0$, you get a constant x! The quantity x stops changing as the discreteness parameter goes to zero. You need just the right power on the $\epsilon$ to get a nontrivial relation between x and y. If you chose the wrong power the other way

$$x(y+\epsilon) = x(y) + \epsilon^{.5} x(y) $$

Then x would blow up at any finite value of y as $\epsilon\rightarrow 0$. Only one exponent, namely the trivial exponent 1, gives the correct continuum limit.
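
Here is a small sketch of these three scaling choices, stepping the recursion numerically for decreasing $\epsilon$ (the starting value x(0) = 1 and the particular values of $\epsilon$ are arbitrary choices for illustration):

```python
# Sketch: step x -> x + eps**p * x from y=0 to y=1 for p = 1, 2, 0.5.
import numpy as np

def evolve(p, eps):
    """Iterate x -> x + eps**p * x over n = 1/eps steps, starting from x = 1."""
    n = int(round(1.0 / eps))
    x = 1.0
    for _ in range(n):
        x += eps**p * x
    return x

for eps in [1e-1, 1e-2, 1e-3, 1e-4]:
    print(f"eps={eps:.0e}  p=1: {evolve(1, eps):.4f}"
          f"  p=2: {evolve(2, eps):.4f}  p=0.5: {evolve(0.5, eps):.3e}")

# p=1 converges to e = 2.71828..., p=2 converges to the constant 1
# (x stops changing), and p=0.5 blows up as eps -> 0.
```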

These are the classical calculus examples of microscopic scaling. The first nontrivial example is when x(y) is the sum of a random quantity, $\eta(y)$, which is a random number between -1 and 1, at each discrete position. Then you want to take the limit of $\epsilon\rightarrow 0$ of the sum of random numbers, to get a continuous version of a random walk. You try to do the calculus thing:

$$ x(y+\epsilon) = x(y) + \epsilon \eta(y) $$

But this choice converges to a constant x in the limit of small epsilon. The reason is that the sum of N random things only grows as $\sqrt{N}$, while the $\epsilon$ term suppresses it by 1/N. So to fix this, you need a different power law on $\epsilon$

$$ x(y+ \epsilon) = x(y) + \epsilon^{1/2} \eta(y) $$

This defines the stochastic calculus limit. There is a whole field of mathematics, Ito calculus, which only studies this scaling law for the continuum limit. It is important in fields like finance, where random walks appear everywhere, since any commodity price in an efficient market with bounded fluctuations must be a random walk.
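
A short sketch of the two scalings of the random step (the number of trials and the random seed are arbitrary choices): with the naive $\epsilon$ scaling the walk collapses to a constant, while the $\sqrt{\epsilon}$ scaling gives a finite spread at y = 1.

```python
# Sketch: spread of x(1) for steps eps**power * eta, eta uniform in [-1, 1].
import numpy as np

rng = np.random.default_rng(1)

def spread_at_one(power, eps, trials=2000):
    """Standard deviation of x(1) over many realizations of the discrete walk."""
    n = int(round(1.0 / eps))
    eta = rng.uniform(-1, 1, size=(trials, n))
    x_final = (eps**power) * eta.sum(axis=1)
    return x_final.std()

for eps in [1e-1, 1e-2, 1e-3]:
    print(f"eps={eps:.0e}  naive eps scaling: {spread_at_one(1.0, eps):.4f}"
          f"  Ito sqrt(eps) scaling: {spread_at_one(0.5, eps):.4f}")

# With power 1 the spread shrinks like sqrt(eps); with power 1/2 it settles
# near sqrt(1/3) = 0.577..., a finite continuum (Brownian) limit.
```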

So when you have a discrete system, like a computer simulation taking discrete steps in time, you can find a converging continuous limit of small steps, but only if you choose the appropriate scaling law for the quantities which are changing. The scaling law for fluctuating quantities is different from the scaling law for smoothly varying quantities.

For smooth quantities, $\delta x$ scales linearly in $\delta y$, or $\epsilon$, and this is the only case studied in ordinary calculus. Stochastic Ito calculus makes $\delta x$ scale as the square-root of $\delta y$, or as $\sqrt{\epsilon}$. Mandelbrot's advisor was Paul Levy, who had developed the theory of Levy flights, or random walks with power-law distributed steps, so that there is some probability of big steps which doesn't vanish when you take a scaling limit. In Levy flights, the continuum limit is obtained by scaling $\delta x$ as $\epsilon^\alpha$ where $\alpha$ is a continuous adjustable parameter.

This means that Mandelbrot had an important new perspective--- he understood that in natural phenomena, where the continuum always emerges at long distances as an approximation to something small and grainy, the scaling laws do not have to be confined to integer powers, or even rational powers. You could have arbitrary scaling laws which define different continuum limits. This behavior would define the regularities in the fluctuations you see in nature, like the rough shape of coastlines, or the jagged shapes of mountains.

These ideas are developed by Mandelbrot in "The Fractal Geometry of Nature", in a way accessible to anyone, because it does not presume any deep prior knowledge of mathematics.

Fractal geometric scaling

Consider a fractal shape, the Koch curve for definiteness. If you calculate the length of the curve, you need to specify the length of the ruler with respect to which you calculate it. As the ruler length l becomes small, the number of rulers needed to cover the curve grows as $1/l^d$, where d is the fractal dimension of the curve, so the total measured length $l\cdot(1/l^d) = l^{1-d}$ goes to infinity.

The meaning of this is not obscure--- the shape is irregular at small distances, so that the ordinary notion of length is inapplicable. The usual scaling law for differentiable curves, that the number of copies of a ruler of length l which fit on a curve of length L grows as $L/l$, is violated, and the violation of the law is in the exponent.

When you have microscopically fractal shapes, the scaling laws you would intuitively expect from the example of differentiable shapes change, and quantities which were originally finite, like the length, become infinite. Further, the process of defining the fractal shape is most conveniently expressed using what is called in physics a regulator--- using a fictitious finite length l which is the length of the ruler to measure the shape, and looking at quantities which are stable in the limit $l\rightarrow 0$.

So the length of the Koch curve doesn't make sense, it is infinite, but the coefficient of the blow-up of the power-law relating the length to l is finite, and is the Hausdorff measure of the Koch curve, the analogous notion to length for a fractal curve.
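
A small sketch of the ruler counting for the Koch curve, using the exact self-similar counts at each refinement level rather than a drawn curve (the number of levels is an arbitrary choice):

```python
# Sketch: at refinement level n the natural ruler is l = (1/3)**n and the
# Koch curve contains 4**n ruler lengths, so the measured length diverges.
import numpy as np

levels = np.arange(1, 9)
ruler = (1.0 / 3.0) ** levels        # ruler length l
count = 4.0 ** levels                # number of rulers that fit
length = count * ruler               # measured length, grows as l -> 0

# Fit the exponent d in count ~ 1/l**d: this is the fractal dimension.
d = np.polyfit(np.log(1.0 / ruler), np.log(count), 1)[0]
print("fitted fractal dimension:", d, " exact log(4)/log(3):", np.log(4) / np.log(3))
print("measured length at the finest ruler:", length[-1])   # keeps growing as (4/3)**n
```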

Fractal fluctuations in phase transitions

Consider a statistical fluctuating quantity, like the density of fluid in thermal equilibrium. For ordinary temperatures, there are fluctuations at the atomic scale, and these fluctuations average out at the macroscopic scale, so that the fluid looks uniform.

But when you tune the pressure and temperature to the liquid/gas critical point, the fluctuations become correlated, so that big macroscopic chunks of the gas-liquid hybrid are at higher density in certain regions, while they are at low density in others. This is obvious experimentally: a clear fluid at the critical point becomes milky white, because the density fluctuations on the scale of the wavelength of light are now significant.

To describe this system, you need the average density over many atomic-sized volumes as a function of position. Define the long-distance density field $\phi(x)$ to be the average density of the fluid over a box of length $l$ centered at each point. You can make a lattice of spacing l, and there is a statistical law which tells you how likely the density is to take a given value, given the density at the neighboring positions. The statistical law takes the form of a probability distribution for the density at a site x, given the density on the neighboring sites y.

The probability law for the field can be expressed mathematically as follows:

$$ -\log P(\phi) = \sum_{\langle x,y\rangle} (\phi(x)-\phi(y))^2 + \sum_x V(\phi(x)) $$

where the first sum runs over pairs of neighboring lattice sites.

This has a simple meaning--- the density at a point has a mean value which is determined by the value of the neighbors, with an overall pull to some preferred value described by $V(\phi)$. The form of $V$ can be taken to be a polynomial (this is explained later)

$$ V(\phi) = a \phi^2 + b\phi^4 $$

where the parameter b must be positive. By tuning the parameter a, you can reach a point where the fluctuations appear at all length scales, and at this point, the lattice can be made arbitrarily small, and you find a continuum limit if you scale $\phi$ appropriately.

The limit $\epsilon\rightarrow 0$, $\phi\rightarrow \epsilon^{\alpha} \phi $ can be taken so that the fluctuations become independent of the lattice. The parameter $\alpha$ is the fractal dimension of $\phi$. For $V=0$, the fractal dimension of the field depends only on the dimension, and has one value. But for the actual form of V, the fractal dimension is altered from the naive value.
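
As a rough illustration of this kind of lattice model, here is a minimal Metropolis Monte Carlo sketch of the $\phi^4$ probability law above on a small two-dimensional lattice; the lattice size, the values of a and b, the step size, and the number of sweeps are arbitrary illustrative choices, not tuned to the critical point.

```python
# Sketch: sample the lattice phi^4 weight exp(-sum (phi_x - phi_y)^2 - sum V(phi)),
# with V(phi) = a*phi**2 + b*phi**4, via a simple Metropolis update.
import numpy as np

rng = np.random.default_rng(0)
L = 32                     # lattice size (illustrative choice)
a, b = -1.0, 1.0           # potential parameters; b > 0 as required
phi = rng.normal(size=(L, L))

def local_action(phi, i, j, value):
    """Terms of -log P that involve site (i, j) when it holds `value`."""
    nbrs = [((i+1) % L, j), ((i-1) % L, j), (i, (j+1) % L), (i, (j-1) % L)]
    grad = sum((value - phi[n]) ** 2 for n in nbrs)
    return grad + a * value**2 + b * value**4

def sweep(phi, step=0.5):
    """One Metropolis sweep over the lattice."""
    for i in range(L):
        for j in range(L):
            old, new = phi[i, j], phi[i, j] + rng.uniform(-step, step)
            dS = local_action(phi, i, j, new) - local_action(phi, i, j, old)
            if dS < 0 or rng.random() < np.exp(-dS):
                phi[i, j] = new
    return phi

for _ in range(200):       # thermalize
    sweep(phi)
print("mean field:", phi.mean(), " mean phi^2:", (phi**2).mean())
```

Near the critical value of a, configurations sampled this way develop correlated patches on all scales up to the lattice size, which is the picture described above.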

Quantum Field theory is the same thing

Quantum fields are defined by a Feynman path integral over field values. They can also be understood to describe the fluctuations of particles, but the field picture is best here.

The Feynman path integral says that one has to consider all possible quantum field fluctuations between the initial time and the final time in order to describe the quantum probability amplitude to go from one time to another. This is the fundamental formulation of quantum mechanics in the Feynman Lagrangian approach.

But there is a simple mathematical relation between Feynman quantum mechanical path integrals (at least for Bosonic fields) and statistical distributions. The two are related by a formal method called Wick rotation, or imaginary time formulation.

The Wick rotation of ordinary quantum mechanics is Ito calculus of Brownian paths. The Wick rotation of field theory makes each (bosonic real-action) field theory into a statistical system, whose scaling laws have fractal (or anomalous) dimensions. The fractal dimensions mean that the typical field in the distribution looks the same after rescaling space by L and the field by a power of L.

Renormalization logarithms

In realistic quantum field theories in four-dimensional space-time, the actual scaling laws are only modified by logarithms. These logarithms are the sign of an incipient change in exponent. The reason is that in 4 dimensions two random walks only marginally intersect: if you look at two random walks on a lattice starting at two positions a fixed distance apart, the probability that they collide goes to zero only logarithmically as the lattice spacing shrinks.

A logarithm is just the limit of an exponent for small values of the exponent. If you look at a power-law with a slightly different exponent

$$ x = y^{\alpha + \epsilon} = y^{\alpha} y^\epsilon = y^\alpha e^{\epsilon \log y} = y^{\alpha} (1 + \epsilon \log y + {\epsilon^2\over 2} \log^2 y + ...)$$

The original scale-invariance of the power-law relation seems to be broken by the logarithms, but it is just modified. If you scale y by an amount $A$, you scale $x$ by $A^{\alpha+\epsilon}$, which gives $\epsilon$ modifications to the dimension of $x$.
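
As a quick numeric check of this expansion (the values of y and $\epsilon$ are arbitrary):

```python
# Sketch: a small extra exponent looks like a logarithmic correction.
import numpy as np

y, eps = 100.0, 0.01
print(y**eps, 1 + eps * np.log(y) + 0.5 * (eps * np.log(y))**2)
# both are approximately 1.0471 -- the series in eps*log(y) reproduces the power
```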

The quantities in four dimensional quantum field theory have infinitesimally modified dimensions in this way, at infinitesimal distances where the length scale associated with the mass of the particles is no longer visible. These logarithmic corrections to scaling make the four dimensional theory both mathematically easier and conceptually more difficult, because the new fractal scaling laws are not as apparent.

In three dimensions, scalar field theories just acquire anomalous dimensions. One of the more interesting ways to calculate the fractal dimensions is to use the known logarithms in 4 dimensions to find the dependence of the fractal dimension on the dimension of space, and this gives predictions for the fluid-critical scaling which match experimental data and computer simulations very accurately.


In fact, there is no need to consider particles as points. If you think of a particle as a "cloud", there are no infinities in either the classical or the quantum theory.

For instance, when physicists build a quantum mechanical model of the hydrogen atom, they treat the electron as a cloud of negative charge smeared around the proton. The numerical quantities obtained from this model are in very good agreement with experiment.

But many modern physicists use models where particles are considered point-like. There are at least two reasons for that.

The first reason is that if one wants to use a model where the particle is not point-like, one needs to specify the structure of the particle. But nobody knows the internal structure of elementary particles, so there is nothing to put into the model. Please note that in the previous example of the hydrogen atom, physicists were dealing with an atom, not with an elementary particle. They were able to develop such a model because they knew something about the atom's internal structure (i.e. they knew that the positively charged proton is at the centre of the atom, that the electron is smeared around the proton, that the electric field of the proton was known, etc.). We cannot do the same thing for a particle because we know almost nothing about what is inside it.

The second reason is as follows: there is a model that works very well for collisions of particles at high energies. This model is used, for instance, for particle colliders such as the LHC. The distance traveled by a particle in such a collider is very large compared to any size that can be associated with the particle itself. So it is logical to treat particles as point-like objects in this model, because the size of the particle itself plays ALMOST no role.

I wrote "ALMOST" because it does play role when one is trying to apply the model not to a number of very fast particles colliding at very high energies, but to a particle ITSELF. For instance, particle at rest is not traveling a large distance, and it's total energy is not much larger than it's self energy (which is $E=mc^2$ as you probably know). In this case there is no excuse to consider particle as a point-like object, and model fails to produce meaningful results.

So, where do infinities come from? They come from the assumption that particles are point-like, and they appear in both classical and quantum theories. See what Vladimir wrote about it for details.

And the last thing related to your question: what is renormalization?

Renormalization is the following:

  1. At the first step the particle IS NOT considered a point-like object. Physicists say that it has a size $\lambda$ and perform all calculations for this "sizable" object. Of course, no infinities appear.

  2. At the second step physicists separate the terms that depend on $\lambda$ (the "size" of the particle) from the terms that do not depend on $\lambda$.

  3. The terms that do not depend on $\lambda$ have an independent physical meaning and are relevant for describing some (but not all!) properties of the particles. They are calculated accurately.

  4. At the next step the size of the particle is made smaller and smaller, i.e. $\lambda$ is taken to zero. The terms that depend on $\lambda$ are divergent: as $\lambda$ goes to zero they grow without bound. The truth is that these terms are not used for anything; they are simply dropped. So the goal of the renormalization procedure is to separate the finite terms from the equations and get rid of the divergent ones (a toy numerical illustration of this separation is sketched right after this list).
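
Here is a toy numerical sketch of steps 2-4; the regulated integral used here is just an illustrative stand-in for a cutoff-dependent calculation, not the actual field-theory expression.

```python
# Sketch: a quantity computed with a finite "size" cutoff lam splits into a
# lam-dependent divergent piece and a finite, cutoff-independent piece.
import numpy as np

def regulated(m, lam):
    """Integral of k/(k^2 + m^2) for k from 0 to 1/lam (closed form).
    It diverges like -log(lam) as lam -> 0."""
    return 0.5 * np.log(1.0 + 1.0 / (lam * m) ** 2)

m1, m2 = 1.0, 2.0
for lam in [1e-2, 1e-4, 1e-6, 1e-8]:
    divergent = regulated(m1, lam)                    # blows up as lam -> 0
    finite = regulated(m1, lam) - regulated(m2, lam)  # difference stays finite
    print(f"lam={lam:.0e}  cutoff-dependent={divergent:8.3f}  finite difference={finite:.5f}")

# The finite difference approaches log(m2/m1) = 0.69315..., independent of lam;
# the cutoff-dependent piece is the part that gets dropped (absorbed).
print("log(m2/m1) =", np.log(m2 / m1))
```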

So, by using renormalization we can make the model "free" of divergences, but we still cannot use it to calculate some important properties of the particles. For instance, the mass and the electric charge of a particle cannot be calculated, because the model gives us no criterion to identify these quantities. Moreover, particles that are known to have different masses (such as the electron and the muon) are indistinguishable in terms of this model.


The point-particle idealization that leads to the infinities is removed by introducing into the problem a small perturbation (a large energy cutoff = a small distance cutoff) that depends on the energy cutoff scale $\Lambda$. Thus we have a family of models depending on $\Lambda$ and the original parameters of the model. The physics should be independent of where precisely the cutoff is applied, as it shouldn't matter how small the particle is once it is small enough.

The physics contained in a model must be independent of the parameters that happen to be used in the particular model. In many cases of interest, the experimental data can be empirically described in terms of a few physical key parameters, such as basic observable masses and charges. These are generally different from the mass and charge coefficients that appear in particular models. To distinguish these in a general context, one refers to the model-dependent coefficients – such as the quark masses mentioned above – as bare parameters and to the model-independent parameters chosen for the physical parameterization – measurable masses, charges, etc., related directly to experiment – as renormalized or dressed parameters.

The purpose of renormalization is to reparameterize the $\Lambda$-dependent family of Hamiltonians in such a way that one can match physical parameters in a numerically robust way that is essentially independent of $\Lambda$ (once it is large enough), so that at the end of the calculations, one can take the limit $\Lambda\rightarrow\infty$ without difficulties.

How to do this is explained in elementary terms in http://www.mat.univie.ac.at/~neum/ms/ren.pdf - the simplest example is a 2-state system!

Other possibly helpful explanations (some elementary, others less so) can be found in Chapter B5: Divergences and renormalization of A theoretical physics FAQ.