Why are the higher angular momentum states of a hydrogen atom closer to the nucleus?

This is a tricky bit of intuition to get right. In essence, having a lower angular momentum expands the radial range that the electron is allowed to span - the inner turning point moves inward and the outward turning point moves outward - but the electron is moving much slower at the outward turning point, which means that it spends more time there and therefore that region weighs more heavily in the $\langle r\rangle$ calculation.

To see this in detail, consider the hydrogenic wavefunctions at $n=6$, as $l$ goes from 0 to 5, and the effective potentials $V_l(r)=-\tfrac1r+\tfrac{l(l+1)}{2r^2}$ which govern the radial motion.

[Figure: radial hydrogenic wavefunctions for $n=6$ and $l=0$ through $5$]

Hydrogenic wavefunctions $R_{nl}(r)$, which obey $-\tfrac12R_{nl}''(r)+\left[-\tfrac1r+\tfrac{l(l+1)}{2r^2}\right]R_{nl}(r)=-\tfrac1{2n^2}R_{nl}(r)$, normalized to $\int_0^\infty |R_{nl}(r)|^2\mathrm dr=1$. Source code in revision history.

The red points indicate the classical turning points, at which $$V_l(r)=-\frac1r+\frac{l(l+1)}{2r^2}=-\frac{1}{2n^2}=E_n,$$ and which mark the inflection points of the wavefunction. Note, in particular, that as $l$ increases, both the inner and the outer turning points move in towards the circular orbit, closing down the available range in $r$. Qualitatively, it looks like the inner turning point moves much more than the outer one, particularly since much of the dynamics of the $l=0$ wavefunction happens in that range.

You should note, though, that in absolute terms the outer turning point moves inward by about the same amount. This feels a bit counterintuitive: why does adding an outwards centrifugal force restrict the outwards range of $r$? The answer is that this calculation is done at constant energy, which means that adding angular momentum restricts the kinetic energy available for radial motion, so the electron cannot venture as far outwards from the equilibrium position.
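To put numbers on this, the turning-point condition can be multiplied through by $2n^2r^2$ to give the quadratic $r^2-2n^2r+n^2l(l+1)=0$, with roots $r_\pm=n^2\pm n\sqrt{n^2-l(l+1)}$ in atomic units. The short Python sketch below tabulates them for $n=6$. The function name `turning_points` is just an illustrative choice, and for $l=0$ the inner root degenerates to $r=0$ because there is no centrifugal barrier.

```python
from math import sqrt

def turning_points(n, l):
    """Classical radial turning points (atomic units) where V_l(r) = E_n.

    Solves r^2 - 2 n^2 r + n^2 l(l+1) = 0, i.e. r = n^2 -/+ n sqrt(n^2 - l(l+1)).
    """
    root = n * sqrt(n * n - l * (l + 1))
    return n * n - root, n * n + root

# Tabulate inner and outer turning points for the n = 6 manifold.
for l in range(6):
    r_in, r_out = turning_points(6, l)
    print(f"l={l}: r_in={r_in:7.3f}  r_out={r_out:7.3f}  width={r_out - r_in:7.3f}")
```

The two roots always sum to $2n^2$, so the inner and outer turning points are placed symmetrically about $r=n^2$ and shift by the same absolute amount as $l$ changes.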

The more important effect, however, is the time spent in the newly-opened regions. If you go from $l=5$ to $l=0$, you open up a significant range at low $r$ and a (fairly big, but bland) range at high $r$. Although much of the dynamics of the $l=0$ wavefunction is at low $r$, the potential there is very deep under the eigenenergy, which means that the electron has a lot of kinetic energy there, so it covers that ground fast and spends relatively little time there. In the long shallow tail just before the outward $l=0$ turning point, on the other hand, the kinetic energy is small, the electron is slow, and the time spent in that range is large.

To make this a little bit less handwavy, I should note that this sort of argument can be given mathematical substance, at as intuitive a level as the WKB approximation. In fact, if you approximate your wavefunction as an action-dependent phase with an amplitude, as $\psi(x)=P(x)e^{iS(x)}$, the resulting semiclassical amplitude in the classically allowed region, $$ \psi(x)\approx \frac{\text{const}}{\sqrt[\leftroot{-2}\uproot{2}4]{\frac{2m}{\hbar^2}(E-V(x))} }e^{\pm i\int\sqrt{\frac{2m}{\hbar^2}(E-V(x))}\:\mathrm dx}, $$ directly reproduces this $\sqrt{\frac1v}=\sqrt{\frac{\mathrm dt}{\mathrm dx}}$ factor. This phenomenon - higher amplitudes just before the turning points - is universal, and it is evident e.g. in the harmonic-oscillator wavefunctions.
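To spell out that last step in one line (a sketch, writing $p(x)=\sqrt{2m\,(E-V(x))}=m\,v(x)$ for the local classical momentum in the allowed region, a notation introduced here for convenience): the WKB prefactor is proportional to $1/\sqrt{p(x)}$, so

$$ |\psi(x)|^2\,\mathrm dx \;\propto\; \frac{\mathrm dx}{p(x)} \;=\; \frac{\mathrm dx}{m\,v(x)} \;\propto\; \mathrm dt, $$

that is, once the rapid oscillations of the standing wave are averaged out, the probability of finding the particle in $\mathrm dx$ is proportional to the classical time spent there.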

Finally, a small word of warning not to take this too far. While the low-$l$ wavefunctions do, in fact, spend most of their time at larger $r$ than the high-$l$ electrons normally do, they are still the only electrons to spend considerable time in the low-$r$ parts of the atom: while the high-$l$ electrons don't go to as high an extreme of large $r$ as the low-$l$ ones, they definitely don't venture anywhere as near the core as the low-$l$ electrons do. This has an important effect in multi-electron atoms, because it means that low-$l$ electrons experience less shielding of the nuclear attraction by inner shells than high-$l$ ones do, and this has direct effects on their energy. So don't be fooled and keep on your toes here :).

One more thing: this effect is in no way exclusive to quantum mechanics. At constant energy, Keplerian orbits with lower angular momentum also roam a larger range in $r$, also spend more time near their apoapses than near their periapses, and therefore also spend on average more time at higher $r$ than higher-angular-momentum orbits do. I was going to match this with a detailed classical calculation but this post is already long enough, so that calculation is left as an exercise for the interested reader.
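For readers who want a head start on that exercise, here is a minimal numerical sketch in Python. It rests on assumptions that are mine rather than part of the argument above: atomic units, a semi-major axis $a=n^2$ fixed by the energy $E=-\tfrac1{2a}$, and the semiclassical identification $L^2=l(l+1)$, so that the eccentricity is $e=\sqrt{1-l(l+1)/n^2}$. The orbit is parametrized by the eccentric anomaly $\xi$, with $r=a(1-e\cos\xi)$ and $\mathrm dt\propto(1-e\cos\xi)\,\mathrm d\xi$ from Kepler's equation. The function name `kepler_time_avg_r` is hypothetical.

```python
from math import cos, pi, sqrt

def kepler_time_avg_r(a, ecc, steps=20000):
    """Time-averaged radius of a Kepler ellipse, via a midpoint rule in the eccentric anomaly."""
    total_r_dt = 0.0
    total_dt = 0.0
    for k in range(steps):
        xi = 2 * pi * (k + 0.5) / steps
        dt = 1 - ecc * cos(xi)          # dt/dxi, up to a constant factor
        r = a * (1 - ecc * cos(xi))     # radius at eccentric anomaly xi
        total_r_dt += r * dt
        total_dt += dt
    return total_r_dt / total_dt

n = 6
a = n * n  # semi-major axis fixed by the energy (assumption: atomic units)
for l in range(n):
    ecc = sqrt(1 - l * (l + 1) / n**2)  # assumption: L^2 = l(l+1)
    print(f"l={l}: e={ecc:.4f}  <r>_time={kepler_time_avg_r(a, ecc):.4f}")
```

Analytically the integral gives $\langle r\rangle_t=a\left(1+\tfrac12e^2\right)$, which under the identifications above becomes $\tfrac12\left[3n^2-l(l+1)\right]$: lower angular momentum at fixed energy means a larger time-averaged radius.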

Emilio Pisanty has already given a good answer. Here we offer a qualitative (as opposed to quantitative) proof of the angular momentum dependence.

  1. Recall first of all that the energy levels $$\tag{1} E_n ~=~-\frac{R_{\mu}}{n^2}$$ in the non-relativistic hydrogen atom without spin-orbit interactions are labelled by the principal quantum number $n\in\mathbb{N}$, where $R_{\mu}$ is the Rydberg energy for the reduced mass $\mu$.

  2. OP is essentially asking:

    Why, for fixed energy, does the average radius $$\tag{2} \langle r \rangle~=~\frac{a_0}{2}\left[3n^2 -\ell (\ell+1)\right]$$ decrease with angular momentum $\ell$?

  3. A related question is:

    Why, for fixed energy, is the average inverse radius $$\tag{3} \left\langle \frac{1}{r} \right\rangle~=~\frac{1}{n^2a_0}$$ independent of angular momentum $\ell$?

  4. The formula (3) is classically explained via the virial theorem, which says that the average potential energy $$\tag{4} \langle V(r) \rangle~=~2E_n$$ is twice the total energy.

  5. To intuitively explain (2), let us replace the angular momentum $\ell$ with the variable $$\tag{5} n_r ~:=~ n-\ell -1 ~\in\mathbb{N}_0. $$

  6. Then we can reformulate OP's question as

    Why, for fixed energy, does the average radius $$\tag{6} \langle r \rangle~=~n^2 a_0 \left[ 1+\frac{n_r +1/2}{n}- \frac{n_r(n_r+1)}{2n^2}\right]$$ increase with $n_r$?

  7. Now let us consider the semiclassical limit $n\gg 1$ so that we can use classical intuition for the orbital. The case $n_r=0$ then corresponds classically to circular orbits. Note that eqs. (3) and (6) agree in the naive classical sense when $n_r=0$.

  8. Increasing the parameter $n_r$ corresponds to increasing the length of the classically accessible radial region $[r_{-},r_{+}]$ between the two radial turning points $r_{-}$ and $r_{+}$.

  9. Since the hyperbola $r\mapsto \frac{1}{r}$ is convex (concave upward), if the interval $[\frac{1}{r_{+}},\frac{1}{r_{-}}]$ is distributed evenly around $\frac{1}{r_0}=\frac{1}{n^2a_0}$, cf. eq. (3), then the interval $[r_{-},r_{+}]$ is lopsided around $r_0=n^2a_0$, skewed towards bigger radii (in an appropriate quantum-mechanical/statistical sense). This effect qualitatively explains the behaviour (6) of increasing average radius.
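The expectation values quoted in eqs. (2), (3) and (6) can be checked exactly with a few lines of Python. This is only a sketch under stated assumptions: lengths are in units of $a_0$, the radial functions are taken in the standard unnormalized form $R_{n\ell}(r)\propto r^\ell e^{-r/n}L^{2\ell+1}_{n-\ell-1}(2r/n)$, and every integral reduces to $\int_0^\infty r^m e^{-2r/n}\,\mathrm dr=m!\,(n/2)^{m+1}$. The helper names `radial_poly`, `moment` and `avg_r_eq6` are hypothetical.

```python
from fractions import Fraction
from math import comb, factorial

def radial_poly(n, l):
    """Coefficients c[i] of r**i in r**l * L_{n-l-1}^{2l+1}(2r/n) (unnormalized)."""
    k, alpha = n - l - 1, 2 * l + 1
    coeffs = [Fraction(0)] * (l + k + 1)
    for i in range(k + 1):
        # Associated Laguerre: L_k^alpha(x) = sum_i (-1)^i C(k+alpha, k-i) x^i / i!
        lag = Fraction((-1) ** i * comb(k + alpha, k - i), factorial(i))
        coeffs[l + i] = lag * Fraction(2, n) ** i
    return coeffs

def moment(n, l, power):
    """Exact <r**power> in units of a_0 for the hydrogen state (n, l)."""
    c = radial_poly(n, l)

    def integral(extra):
        # int_0^inf r^(2+extra) * poly(r)^2 * exp(-2r/n) dr, term by term
        total = Fraction(0)
        for i, ci in enumerate(c):
            for j, cj in enumerate(c):
                m = i + j + 2 + extra
                total += ci * cj * factorial(m) * Fraction(n, 2) ** (m + 1)
        return total

    return integral(power) / integral(0)

def avg_r_eq6(n, n_r):
    """Right-hand side of eq. (6) in units of a_0."""
    return n * n * (1 + Fraction(2 * n_r + 1, 2 * n) - Fraction(n_r * (n_r + 1), 2 * n * n))

n = 6
for l in range(n):
    print(l, moment(n, l, 1), moment(n, l, -1), avg_r_eq6(n, n - l - 1))
```

Because the arithmetic uses `Fraction`, the comparison with $\tfrac12\left[3n^2-\ell(\ell+1)\right]$ and $1/n^2$ is exact rather than approximate, which makes it easy to see that $\langle 1/r\rangle$ stays fixed while $\langle r\rangle$ grows with $n_r$.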