Why is information in a volume proportional to its surface area?

Explaining the math behind the holographic principle would be a lengthy exercise. Is that really what you want?

A short hand-waving argument is that you can pack only a limited number of qubits (in the form of photons) into a given region of space. If you take long-wavelength photons, you can pack a lot of them together before a black hole forms. Two fundamental principles limit the number of photons you can put in a sphere of radius $R$:

1) you can't make their wavelength too long as this would prevent you from localizing the photons within the sphere, and

2) if you make their wavelengths too short, the energy content within the sphere becomes too high and a black hole forms that would have a radius larger than $R$.

The bottom line is that you can pack no more than a number of photons proportional to $R^2$ into the sphere of radius $R$, provided these have wavelengths comparable to $R$.
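
As a rough sketch of where the $R^2$ comes from (order-of-magnitude only; the numerical factors are not to be trusted, and the symbols $N$, $E$, $\lambda$ and $\ell_P$ are introduced here for illustration): a photon of wavelength $\lambda$ carries energy $E \sim \hbar c/\lambda$, and localizing it inside the sphere requires $\lambda \lesssim R$. So $N$ photons carry a total energy of at least

$$E_{\text{tot}} \sim N\,\frac{\hbar c}{R}.$$

No black hole larger than the sphere forms as long as the Schwarzschild radius of that energy stays below $R$:

$$\frac{2 G E_{\text{tot}}}{c^4} \lesssim R \quad\Longrightarrow\quad N \lesssim \frac{c^3 R^2}{2 G \hbar} = \frac{R^2}{2\,\ell_P^2}, \qquad \ell_P^2 = \frac{\hbar G}{c^3}.$$

The maximum number of photons therefore scales as the area of the sphere measured in Planck units, not as its volume.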

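If you choose massive qubits (rather than massless photons), things get worse: each one then carries its rest energy on top of the energy needed to localize it, so the black hole limit is reached with fewer of them.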