How do the electric or magnetic fields contain momentum?

In terms of a photon picture, this is not really mysterious at all. The electromagnetic force is mediated, in its quantum mechanical description, by the exchange of photons. These can be real - i.e. represent real light beams - or virtual, which means that the energy for the photon's existence has been 'borrowed' for a small amount of time as allowed by the Heisenberg uncertainty principle. Electrostatic and magnetostatic fields consist, in the quantum picture, of a huge number of virtual photons flying back and forth.

Now, each of these photons carries a certain amount of momentum. They must, because they will impart a force on the charged particles that absorb or emit them. Since each photon carries momentum, it is no surprise that the field as a whole can contain some net amount of momentum! Sometimes this will be zero - the contributions from the different photons will cancel out either locally at each point or globally once all points are considered - but this need not be the case. Thus, the electromagnetic field can carry momentum.

Now, this is a nice and intuitive picture, but it draws on a very exotic concept, so I'd understand if it weirds you out a little. More than that, since the existence of electromagnetic field momentum is required within classical electrodynamics, one would also want an answer which does not require quantum mechanics to explain it. (Think about this last bit carefully - it's not a trivial argument.)

In the end, whether the field "has" momentum or not is a matter of the definition of the word "have", which is a human construct. Strictly speaking, what is true is that

  • it is possible to arrange situations where charged particles interact in a way in which their total mechanical momentum is not conserved, but once all the particles are separated again then their final total momentum equals the initial one.

This is augmented by the fact that

  • there exists a quantity, with units of momentum, and which can be calculated from the electric and magnetic fields at each point, which will give a conserved quantity if it is added to the particles' total mechanical momentum.

It is important to note that the conservation of momentum is not a given; it is a property of physical theories which any particular theory may or may not have. (As it happens, all physical theories which we observe in the real world do observe it in some form, but that is not guaranteed a priori.)

One example of this is newtonian mechanics with forces which obey Newton's third law. In this case, it is a theorem of the theory that the total mechanical moementum is conserved.

Another example is Noether's theorem, which guarantees a momentum conservation law in dynamical systems of a certain class, whose laws are translationally invariant. For certain systems this invariance exists and hence momentum is conserved; for others it is not and momentum is not conserved.

For charged mechanical particles interacting electromagnetically, Newton's third law does not hold, so our old theorem is not applicable (and in fact its conclusion is false, as the mechanical momentum is not conserved). However, this does not mean that we cannot find a smarter, more sophisticated theorem which does imply a conservation law.

One therefore needs to sit for a bit and jiggle at the maths, but the theorem is indeed provable. In essence, what you do is

  • write down the total force on the mechanical particles,
  • express it in terms of electromagnetic fields, charges and currents,
  • use Maxwell's equations to transform the charges and currents into electric and magnetic fields, and thus
  • derive an expression for the total mechanical force on the system in terms of the integral of a certain function of the electric and magnetic fields at each point.
  • One then needs to transform this quantity into the total time derivative of a simpler expression, which will be interpreted as the electromagnetic field momentum. This is possible but it leaves a remainder which depends on what volume you're integrating over.
  • One can then prove that, for localized systems, this remainder vanishes. When it does, the total dynamical momentum - mechanical plus electromagnetic - is conserved.

In general, I would discourage you from attempting this calculation until you have taken solid courses in electromagnetism and vector calculus at university level, or you will just bruise yourself up against it. Focus, instead, on the physics, on a qualitative level.

If you have more specific questions I'm happy to try and reply, but if you want details on the mathematics you do need to specify what your background is so that we can give answers you will understand.


I'll show to you "how this is the case", "how this works", but this needs time. This is a very interesting question and I'm sorry to see that books divide in two set: basic one that simply jump the problem (maybe mentioning it or reporting formula saying that calculations are boring, while they are exciting); and advanced ones, that obviously deal with a such important problem, but they solve everything with few symbols using such hard mathematics that they are difficult to be understood by less capable readers (here I am). I find that the best compromise is, as always, Griffiths (from which I take most of this answer). But for once I want criticize his book too. He put the reader in front of the traumatizing divergence of a matrix, without prepare him to handle a such object. That's why I'll put in this answer a section named "Alternative divergence theorem". Here I will always use cartesian coordinates, simply I ascertain that this is the simplest way to see why there is a momentum density of the electromagnetic field and to calculate how much it is. These calculations are long, but if the reader trust me he/she will be gratified.

Momentum current density

It is necessary to introduce the concept of momentum current density. Like current density $\mathbf{J}$ is a vector such that $\mathbf{J} \cdot d\mathbf{a} dt$ gives the amount of charge that pass through $d\mathbf{a}$ in interval $t \to t+dt$; momentum current density is a matrix such that ${\sf{M}} \cdot d\mathbf{a} dt$ gives the amount of momentum that flow through $d\mathbf{a}$ in interval $t \to t+dt$. Charge is a scalar so current density is a vector, but momentum is a vector so it sounds reasonable introduce a matrix to describe its current density (the dot product with a vector gives a vector). In this definition I don't say if I have to do ${\sf{M}} \cdot d\mathbf{a}$ or $d\mathbf{a} \cdot {\sf{M}}$, but this is the same: we will see that in cartesian coordinate momentum current density is a symmetric matrix.

An interesting way to describe the momentum conservation

Let's suppose that inside volume $V$ are moving charged particles in electromagnetic field (not necessarily all given by the charge themselves). Let's be $d\mathbf{p}$ the change, during the time $dt$, of momentum of these bodies into the volume $V$. If we can write this equation, \begin{equation} \frac{d\mathbf{p}}{dt} = - \frac{d}{dt} \int_V \mathbf{C} d \tau - \oint_S {{\sf{D}}} \cdot d \mathbf{a} \end{equation} then the only reasonable interpretation is that $\mathbf{C}$ is momentum density of the field and ${\sf{D}}$ is the momentum current density. In fact in this way the equation above says that change of field momentum in interval $t \to t+dt$ inside the volume (i.e. $d \left( \int_V \mathbf{C} d \tau \right) $) is the opposite of variation of momentum of bodies (i.e. $d\mathbf{p}$) minus the field momentum that flowed out through the surface $S$ (i.e. $ \oint_S {{\sf{D}}} \cdot d \mathbf{a} dt $). We are excluding the possibility that bodies cross through surface $S$ in time $dt$, otherwise we should generalize the equation, but we don't need to do that if we are interested to evaluate the momentum of electromagnetic field in the easier way. Take your time to reflect about this equation and these considerations, and see how reasonable they are. Then go on.

Poynting vector and Maxwell tensor definition

Later we will use the vector we define here (introduced by John Henry Poynting in 1884) \begin{equation} \mathbf{S} = \frac{\mathbf{E} \times \mathbf{B}}{\mu_0} \end{equation}

Let me introduce this matrix, at the moment it can sound a mess but later you will see that we introduce this to avoid a mess and express the theorem in an elegant way. Maxwell tensor ${\sf{T}}$ is described by \begin{equation} T_{ij} = \epsilon_0 \left( E_i E_j - \frac{1}{2} \delta_{ij} E^2 \right) +\frac{1}{\mu_0} \left( B_i B_j - \frac{1}{2} \delta_{ij} B^2 \right) \end{equation} where index $i$ and $j$ refers to three coordinates $x$, $y$ and $z$ (so that tensor has in total 9 components: all possible combinations $T_{xx}$, $T_{xy}$, $T_{xz}$, $T_{yx}$, etc.) and $\delta_{ij}$ is Kronecker delta. Explicitly we could write (note that to obtain this below we have to substitute field squared in the definition whit sum of squared components) enter image description here

Alternative divergence theorem

Given a symmetric matrix field ${\sf{M}}$, defined into volume $V$ surrounded by surface $S$, we have \begin{equation} \int_V (\nabla \cdot {\sf{M}}) d \tau = \oint_S {\sf{M}} \cdot d \mathbf{a} \end{equation} By matrix field we mean an analogous to vector field, but with matrices (each element is a function of $(x,y,z,t)$). The analogy with ordinary divergence theorem is strong (note that $\nabla \cdot$ low by one the dimension of the object to which we apply it). On the left we do a volume integral of products between a vector field and infinitesimal volumes, on the right we do a surface integral of products between a matrix field and infinitesimal surface vectors. In both sides we have an infinite sum of infinitesimal vectors: a finite vector.

Proof of alternative divergence theorem

The proof of this alternative divergence theorem is similar to the one of ordinary divergence theorem. Let's suppose for simplicity that volume $V$ is the parallelepiped $(a<x<b, c<y<d, e<z<f)$. The flux of matrix field ${\sf{M}}$ through the face orthogonal to $x$ axis is \begin{equation} \int_e^f \int_c^d {\sf{M}} (a,y,z) \cdot \begin{bmatrix} -dydz \\ 0 \\ 0 \end{bmatrix} + \int_e^f \int_c^d {\sf{M}} (b,y,z) \cdot \begin{bmatrix} dydz \\ 0 \\ 0 \end{bmatrix} \end{equation} where we used $\mathbf{\hat{x}} dy dz$ as a column vector (here "hat" mean versor). If $\mathbf{M}_1 (x,y,z)$ is the vector field we get taking first column of matrix ${\sf{M}}$, we can write the flux through the two surfaces in this way \begin{equation} \int_e^f \int_c^d [ \mathbf{M}_1 (b,y,z) - \mathbf{M}_1 (a,y,z) ] dydz \end{equation} Integrand is a vector that can be written in this way \begin{equation} \begin{bmatrix} M_{xx} (b,y,z) - M_{xx} (a,y,z) \\ M_{yx} (b,y,z) - M_{yx} (a,y,z) \\ M_{zx} (b,y,z) - M_{zx} (a,y,z) \end{bmatrix} \end{equation} And exploiting fundamental theorem of calculus we can write \begin{equation} \begin{bmatrix} \int_a^b \frac{\partial M_{xx} (x,y,z) }{\partial x} dx \\ \int_a^b \frac{\partial M_{yx} (x,y,z) }{\partial x} dx \\ \int_a^b \frac{\partial M_{zx} (x,y,z) }{\partial x} dx \end{bmatrix} \end{equation} Which is $\int_a^b \frac{\partial \mathbf{M}_1}{\partial x} dx $. By substituting we see that flux through two surface can be written $\int_V \frac{\partial \mathbf{M}_1}{\partial x} dV$. Similarly we can found the flux through other couples of side: the total flux through the parallelepiped is \begin{equation} \int_V \left( \frac{\partial \mathbf{M}_1}{\partial x} + \frac{\partial \mathbf{M}_2}{\partial y} + \frac{\partial \mathbf{M}_3}{\partial z} \right) dV \end{equation} To end the proof we only have to show that the integrand is equal to the vector we symbolically wrote as $\nabla \cdot {\sf{M}}$. That is, we have to show that \begin{equation} \frac{\partial \mathbf{M}_1}{\partial x} + \frac{\partial \mathbf{M}_2}{\partial y} + \frac{\partial \mathbf{M}_3}{\partial z} = \begin{bmatrix} \displaystyle \frac{\partial}{\partial x} & \displaystyle \frac{\partial}{\partial y} & \displaystyle \frac{\partial}{\partial z} \end{bmatrix} \left[ \begin{array}{c|c|c} & & \\ \displaystyle \mathbf{M}_1 & \displaystyle \mathbf{M}_2 & \displaystyle \mathbf{M}_3 \\ & & \\ \end{array} \right] \end{equation} At first glance this looks wrong, but ${\sf{M}}$ is symmetric by hypothesis so the equation works. If you are skeptical about this conclusion, here the proof. In left member we have the sum of three vectors: \begin{equation} \left[ \frac{\partial M_{xx}}{\partial x} + \frac{\partial M_{xy}}{\partial y} + \frac{\partial M_{xz}}{\partial z} , \frac{\partial M_{yx}}{\partial x} + \frac{\partial M_{yy}}{\partial y} + \frac{\partial M_{yz}}{\partial z} , \frac{\partial M_{zx}}{\partial x} + \frac{\partial M_{zy}}{\partial y} + \frac{\partial M_{zz}}{\partial z} \right] \end{equation} while in second member we have the product between a vector (like we do in ordinary vector analysis, it is useful use the symbolism of treating $\nabla$ as a vector) and a matrix: \begin{equation} \left[ \frac{\partial M_{xx}}{\partial x} + \frac{\partial M_{yx}}{\partial y} + \frac{\partial M_{zx}}{\partial z} , \frac{\partial M_{xy}}{\partial x} + \frac{\partial M_{yy}}{\partial y} + \frac{\partial M_{zy}}{\partial z} , \frac{\partial M_{xz}}{\partial x} + \frac{\partial M_{yz}}{\partial y} + \frac{\partial M_{zz}}{\partial z} \right] \end{equation} if matrix is symmetric ($M_{ij}=M_{ji}$) these two expressions are the same. This show that the previous equation is an identity and the proof is ended.

The calculus of electromagnetic field momentum

The electromagnetic force that acts on charges in volume $V$ is \begin{equation} \mathbf{F} = \int_V (\mathbf{E} + \mathbf{u} \times \mathbf{B}) \rho d \tau = \int_V ( \rho \mathbf{E} + \mathbf{J} \times \mathbf{B} ) d \tau \end{equation} the force per unit volume is \begin{equation} \mathbf{f} = \rho \mathbf{E} + \mathbf{J} \times \mathbf{B} \end{equation} Exploiting Maxwell equations, it is possible eliminate sources in favor of field. I'll show to you how to do it now. Using first and last Maxwell equation we have \begin{equation} \mathbf{f} = \epsilon_0 ( \nabla \cdot \mathbf{E} ) \mathbf{E} + \left( \frac{1}{\mu_0} \nabla \times \mathbf{B} - \epsilon_0 \frac{\partial \mathbf{E}}{\partial t} \right) \times \mathbf{B} \end{equation} Now, \begin{equation} \frac{\partial }{\partial t} (\mathbf{E} \times \mathbf{B}) = \left( \frac{\partial \mathbf{E} }{\partial t} \times \mathbf{B} \right) + \left( \mathbf{E} \times \frac{\partial \mathbf{B} }{\partial t} \right) \end{equation} So by exploiting Faraday law we can rearrange in this way \begin{equation} \frac{\partial \mathbf{E} }{\partial t} \times \mathbf{B} = \frac{\partial }{\partial t} (\mathbf{E} \times \mathbf{B}) + \mathbf{E} \times (\nabla \times \mathbf{E}) \end{equation} Substituting we have \begin{equation} \mathbf{f} = \epsilon_0 \left[ (\nabla \cdot \mathbf{E}) \mathbf{E} - \mathbf{E} \times (\nabla \times \mathbf{E}) \right] -\frac{1}{\mu_0} \left[ \mathbf{B} \times (\nabla \times \mathbf{B}) \right] - \epsilon_0 \frac{\partial}{\partial t} (\mathbf{E} \times \mathbf{B}) \end{equation} Right for make things more symmetrical we can throw in a term $\frac{1}{\mu_0} (\nabla \cdot \mathbf{B})\mathbf{B}$ (it's the same: magnetic field is solenoidal). Now note that applying to the same vector $\mathbf{A}$ the vector identity \begin{equation} \nabla (\mathbf{A} \cdot \mathbf{B}) = \mathbf{A} \times (\nabla \times \mathbf{B}) + \mathbf{B} \times (\nabla \times \mathbf{A}) + ( \mathbf{A} \cdot \nabla )\mathbf{B} + ( \mathbf{B} \cdot \nabla )\mathbf{A} \end{equation} will allow us to write \begin{equation} - \mathbf{A} \times (\nabla \times \mathbf{A}) = - \frac{1}{2} \nabla (A^2) + (\mathbf{A} \cdot \nabla)\mathbf{A} \end{equation} All this allow us to write previous equation in this way \begin{equation} \begin{split} \mathbf{f} = & \epsilon_0 \left[ (\nabla \cdot \mathbf{E}) \mathbf{E} + (\mathbf{E} \cdot \nabla ) \mathbf{E} \right] - \frac{1}{\mu_0} \left[ (\nabla \cdot \mathbf{B}) \mathbf{B} + (\mathbf{B} \cdot \nabla ) \mathbf{B} \right] \\ & - \frac{1}{2} \nabla \left( \epsilon_0 E^2 + \frac{1}{\mu_0} B^2 \right) - \epsilon_0 \frac{\partial}{\partial t} (\mathbf{E} \times \mathbf{B}) \end{split} \end{equation} Despite symmetries, this expression is ugly. Beauty reigns in fundamental law of nature so it must be possible write the equation in a better and clarifier way. It comes to our rescue Maxwell tensor we wrote before and the idea to use the operator $\nabla$ like if it were a vector. Doing scalar product between $\nabla$ and Maxwell tensor we get the second member of the equation written above (less than term with time derivative) \begin{equation} \begin{split} \nabla \cdot {\sf{T}} = & \epsilon_0 \left[ (\nabla \cdot \mathbf{E}) \mathbf{E} + (\mathbf{E} \cdot \nabla ) \mathbf{E} \right] - \frac{1}{\mu_0} \left[ (\nabla \cdot \mathbf{B}) \mathbf{B} + (\mathbf{B} \cdot \nabla ) \mathbf{B} \right] \\ & - \frac{1}{2} \nabla \left( \epsilon_0 E^2 + \frac{1}{\mu_0} B^2 \right) \end{split} \end{equation} Let's check that this is true: we will be careful on the only $x$ coordinate (for symmetrical reasons other coordinates will work). The $x$ coordinate of the second member of the above equation is \begin{equation} \begin{split} & \epsilon_0 \left[ \left( \frac{\partial E_x}{\partial x} + \frac{\partial E_y}{\partial y} + \frac{\partial E_z}{\partial z} \right) E_x + E_x \frac{\partial E_x}{\partial x} + E_y \frac{\partial E_x}{\partial y} + E_z \frac{\partial E_x}{\partial z} \right] \\ & + \frac{1}{\mu_0} \left[ \left( \frac{\partial B_x}{\partial x} + \frac{\partial B_y}{\partial y} + \frac{\partial B_z}{\partial z} \right) B_x + B_x \frac{\partial B_x}{\partial x} + B_y \frac{\partial B_x}{\partial y} + B_z \frac{\partial B_x}{\partial z} \right] \\ & - \frac{\epsilon_0}{2} \frac{\partial (E_x^2 + E_y^2 + E_z^2)}{\partial x} - \frac{1}{2 \mu_0} \frac{\partial (B_x^2 + B_y^2 + B_z^2)}{\partial x} \end{split} \end{equation} While the $x$ coordinate of the first member (i.e. the "scalar product" between $\nabla$ and the first column of Maxwell tensor is) \begin{equation} \begin{split} & \frac{\epsilon_0}{2} \frac{\partial (E_x^2 - E_y^2 - E_z^2)}{\partial x} + \frac{1}{2 \mu_0} \frac{\partial (B_x^2 - B_y^2 - B_z^2)}{\partial x} \\ & + \epsilon_0 \frac{ \partial ( E_x E_y)}{\partial y} + \frac{1}{\mu_0} \frac{ \partial ( B_x B_y)}{\partial y} - \epsilon_0 \frac{ \partial ( E_x E_z)}{\partial z} + \frac{1}{\mu_0} \frac{ \partial ( B_x B_z)}{\partial z} \end{split} \end{equation} Reader can easily check that the two terms are equal (for simplicity, because of the symmetry we can concentrate on only one field, for example $E$). Now we are ready to write the "ugly" equation in a beautiful and compact way: \begin{equation} \mathbf{f} = - \epsilon_0 \mu_0 \frac{\partial \mathbf{S}}{\partial t} + \nabla \cdot {\sf{T}} \end{equation} where we used Poynting vector too. Exploiting $c=\frac{1}{\sqrt{\epsilon_0 \mu_0}}$ and integrating over volume we have \begin{equation} \frac{d\mathbf{p}}{dt} = - \frac{d}{dt} \int_V \frac{\mathbf{S}}{c^2} d \tau +\oint_S {\sf{T}} \cdot d \mathbf{a} \end{equation} where we exploited the "Alternative divergence theorem". Now come back to what I wrote in the section "An interesting way to describe the momentum conservation", and reflect: the proof is ended:

  • Momentum density: $\frac{\mathbf{S}}{c^2}$
  • Momentum current density: $ - {\sf{T}}$

Why Maxwell tensor is not defined with inverted sign? Probably this is due to an unfortunate historical convention. If we want express field momentum as a function of fields we have obviously enter image description here This can be exploited to show that for electromagnetic waves we have $E=pc$ (it is not a case that this is the same link between energy and momentum we have for extreme relativistic particles).

A final comment

Note that what we found suffers from abstraction and give problems: this momentum theorem (like Poynting theorem too) is general, not only for electromagnetic wave: we didn't any special hypothesis about the nature of the field (the only one being that the field obey to Maxwell equations). This lead to very interesting (but also very complicated!) problems related to hidden momentum. As we often see in physics, each time you solve a problem you are in front of harder problems and of strange (and so, interesting) point of reflection on the big puzzle.