Quantum input-output theory : Why do we multiply by density of mode to have a number of photon **per unit of time**

Some remarks:

  1. From the question (emphasis mine)

    Before moving further, it is important to underline that $b^†(ω)b(ω)$ is not the number of photon of frequency $ω$ because it has the dimension of the inverse of a time.

    How so? According to

    $H_{field}=\int d\omega ~ \hbar \omega b^{\dagger}(\omega) b(\omega)$

    the quantity $\int d\omega ~ \hbar \omega \langle b^{\dagger}(\omega) b(\omega)\rangle$ must be the total energy of the bath in the absence of interactions with the system. Therefore $\langle b^{\dagger}(\omega) b(\omega)\rangle$ is simply the number of photons per unit frequency. Its dimension is inverse frequency (that is, time), not inverse time.

  2. $\sum_k \hbar \omega_k a^{\dagger}(\omega_k) a(\omega_k) = \int d \omega ~ \hbar \omega \nu(\omega) a^{\dagger}(\omega) a(\omega)$

    Note that this step involves an approximation: the discrete system modes are replaced by a continuum. The notation is also slightly abusive, since the $a^\dagger$ operator changes units between the left-hand side and the right-hand side.

  3. We define the input field as :

    $b_{in}(t)=\frac{1}{\sqrt{2 \pi}} \int d\omega ~ e^{-i \omega t} b^0(\omega)$

    It looks like a Fourier transform but the way I understand it is more : we make evolve all mode at time t in Heisenberg picture assuming they are not interacting (which is the case of the input field before the interaction), and we sum on those modes : it is the definition of the total input field.

    I quite like this interpretation. Let me rephrase it a bit: at the start you have a quantum mechanical "wave packet" that is built from a superposition of bath modes, each evolving freely at frequency $\omega$. Note that this contains the notion of being a boundary condition at time $t_0$ (indicated only by the zero superscript in this formula). In many cases this is considered asymptotically, with $t_0$ approaching the infinite past.

  4. $b_{in}(t)=\frac{1}{\sqrt{2 \pi}} e^{-i \omega t} b^0(\omega)=\frac{1}{\sqrt{2 \pi}} e^{-i \omega t} \sqrt{\nu(\omega)} a^0(\omega)$

    This formula in the question is not completely correct, especially the continuum of system modes on the right-hand side. It is certainly not what Gardiner & Collett have in their paper (see formula (2.22)). They only have a single discrete mode. If you want a continuum of system modes, there should at least be an integral over them somewhere, unless your coupling is local in frequency. But the latter would just correspond to a single-mode problem with messy notation again. Either way, the units in this formula are wrong, as pointed out in remark 2.

    I initially thought that this was where the confusion came from, but after StarBucK's comment I am adding this edit to address the real question:

EDIT:

So now that we have understood the units of $b(\omega)$, which was also nicely explained again in an answer by jgerber that was posted since, we can look at the units of $b_\textrm{in}(t)$. To understand this, let us investigate the definition of the input operators a bit further. The original bath operator $b(\omega)$ is in the Heisenberg picture (as also pointed out by jgerber). So this operator is already time dependent and could (or maybe should) be written $b(t, \omega)$. As we saw above, what $\langle b^\dagger(t, \omega) b(t, \omega) \rangle$ then represents physically is the number of photons per unit frequency (so "per mode") at time $t$ (not per unit time). In other words: $b(t, \omega)$ is our standard photon operator, just for a continuum instead of a discrete mode.
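To make the "per unit frequency" statement concrete, here is a minimal numerical sketch. It rests on assumptions that are not in the question: a toy one-dimensional cavity of length `L` with mode frequencies $\omega_k = \pi c k / L$ (so the density of modes is $\nu = L / (\pi c)$), and a hypothetical Gaussian occupation per discrete mode, `n_per_mode`. It only illustrates that summing a dimensionless occupation over discrete modes gives the same total as integrating $\nu(\omega)$ times that occupation over frequency.

```python
import numpy as np

# Toy model (an assumption for illustration): 1D cavity of length L,
# discrete mode frequencies omega_k = pi * c * k / L.
c = 1.0                       # propagation speed, arbitrary units
L = 2000.0                    # cavity length; larger L means denser modes
k = np.arange(1, 20001)
omega_k = np.pi * c * k / L   # discrete mode frequencies
nu = L / (np.pi * c)          # density of modes: number of modes per unit frequency


def n_per_mode(omega, omega0=10.0, sigma=1.0, n0=0.05):
    """Hypothetical dimensionless occupation <a^dag a> of the mode at frequency omega."""
    return n0 * np.exp(-((omega - omega0) ** 2) / (2.0 * sigma ** 2))


# Discrete picture: the total photon number is a plain sum over modes.
total_discrete = n_per_mode(omega_k).sum()

# Continuum picture: <b^dag b>(omega) = nu * <a^dag a>(omega) is a number
# of photons PER UNIT FREQUENCY, so it must be integrated over omega.
d_omega = 1e-4
omega = np.arange(0.0, omega_k[-1], d_omega)
photons_per_unit_frequency = nu * n_per_mode(omega)
total_continuum = photons_per_unit_frequency.sum() * d_omega  # Riemann sum

print(total_discrete, total_continuum)
```

The two totals agree because the occupation varies slowly on the scale of the mode spacing, which is exactly the condition under which the continuum replacement is a good approximation.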

The definition of the input operator can then be written:

$$b_{\textrm{in}}(t)=\frac{1}{\sqrt{2 \pi}} \int d\omega ~ e^{-i \omega (t-t_0)} b(t_0, \omega)$$

Note that $e^{i \omega t_0} b(t_0, \omega)$ is physically the photon operator in the interaction picture (that is with the free time evolution taken out) at time $t_0$. So if you have no interactions, then $e^{i \omega t_0} b(t_0, \omega)$ is actually independent of $t_0$. This means $b_{\textrm{in}}(t)$ is really the Fourier transform of the interaction picture operator at time $t_0$.
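The independence from $t_0$ can be spelled out in one line. This is a short sketch assuming purely free evolution of the bath, with $t_1$ an arbitrary second reference time introduced here only for illustration:

$$\begin{aligned}
\partial_t b(t, \omega) = -i \omega\, b(t, \omega)
\quad &\Rightarrow \quad
b(t_0, \omega) = e^{-i \omega (t_0 - t_1)}\, b(t_1, \omega) \\
&\Rightarrow \quad
e^{i \omega t_0}\, b(t_0, \omega) = e^{i \omega t_1}\, b(t_1, \omega).
\end{aligned}$$

So without interactions the combination $e^{i \omega t_0} b(t_0, \omega)$ takes the same value for every choice of reference time.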

To make it a bit clearer what I am saying, you can also define an input operator in the frequency domain. The definition is just a Fourier transform again and if we evaluate this Fourier transform, we get a very simple result:

$$b_{\textrm{in}}(\omega) = \frac{1}{\sqrt{2 \pi}} \int dt e^{i \omega t} b_{\textrm{in}}(t) = e^{i \omega t_0} b(t_0, \omega).$$
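For completeness, here is the intermediate step, obtained by inserting the definition of $b_{\textrm{in}}(t)$ and using $\int dt\, e^{i(\omega - \omega') t} = 2 \pi\, \delta(\omega - \omega')$. The primed variable $\omega'$ is just a dummy integration variable:

$$\begin{aligned}
b_{\textrm{in}}(\omega)
&= \frac{1}{2 \pi} \int d\omega' \left[ \int dt\, e^{i (\omega - \omega') t} \right] e^{i \omega' t_0}\, b(t_0, \omega') \\
&= \int d\omega'\, \delta(\omega - \omega')\, e^{i \omega' t_0}\, b(t_0, \omega')
= e^{i \omega t_0}\, b(t_0, \omega).
\end{aligned}$$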

So the input operator in the frequency domain is exactly the interaction picture photon operator at time $t_0$! Mathematically this is all simple, we are just doing Fourier transforms back and forth. But physically, this gives a lot of insight into what the input operators mean in my opinion.

Now suppose that the expectation value of these frequency-space input operators is some function $\langle b^\dagger_{\textrm{in}}(\omega) b_{\textrm{in}}(\omega) \rangle = I(\omega)$. I have called this function $I$ on purpose, because it represents the intensity spectrum that you send into your system at time $t_0$. So if you have some wavepacket flying towards your cavity/interaction region, the input operators give you the spectrum of this wavepacket.

The time-frequency relation then behaves very similarly to classical optics. We have

$$ \omega\textrm{-domain amplitude} \xleftarrow[]{\textrm{Expectation value}}b_{\textrm{in}}(\omega) \xrightarrow[]{\textrm{Fourier transform}} b_{\textrm{in}}(t) \xrightarrow[]{\textrm{Expectation value}} t\textrm{-domain amplitude}$$

So my advice is to think about it in terms of wavepackets. $\langle b^\dagger_{\textrm{in}}(\omega) b_{\textrm{in}}(\omega) \rangle$ gives you the number of photons per unit frequency at frequency $\omega$ in the wavepacket. $\langle b^\dagger_{\textrm{in}}(t) b_{\textrm{in}}(t) \rangle$ gives you the number of photons per unit time at time $t$ in the wavepacket. Here is a picture (source):

http://cvarin.github.io/CSci-Survival-Guide/fft.html

The picture is for classical fields, so you do not have the whole business with expectation values, but the principle is the same: $|E(\omega)|^2$ is the intensity per unit frequency at frequency $\omega$, and $|E(t)|^2$ is the intensity per unit time at time $t$.
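The wavepacket picture can also be checked numerically. The sketch below is an illustration under stated assumptions, not part of the original argument: it takes a classical (coherent-state) amplitude $\beta(t) = \langle b_{\textrm{in}}(t) \rangle$ with a hypothetical Gaussian envelope, so that $|\beta(t)|^2$ plays the role of photons per unit time, and it uses the same Fourier convention as above, $\beta(\omega) = \frac{1}{\sqrt{2\pi}} \int dt\, e^{i \omega t} \beta(t)$. The values of `n_total`, `tau` and `omega0` are arbitrary.

```python
import numpy as np

# Time grid centred on t = 0.
N = 4096
dt = 0.05
t = (np.arange(N) - N // 2) * dt

# Hypothetical Gaussian pulse amplitude beta(t), in units of 1/sqrt(time).
n_total = 3.0   # assumed mean photon number in the pulse
tau = 2.0       # assumed pulse duration
omega0 = 5.0    # assumed carrier frequency
envelope = (2.0 * np.pi * tau ** 2) ** -0.25 * np.exp(-t ** 2 / (4.0 * tau ** 2))
beta_t = np.sqrt(n_total) * envelope * np.exp(-1j * omega0 * t)

# beta(omega) = (1/sqrt(2 pi)) * integral dt e^{+i omega t} beta(t).
# np.fft.ifft uses the e^{+i...} sign and a 1/N prefactor, hence the factor N * dt.
# ifftshift puts t = 0 at index 0 so that the phases come out right.
omega = 2.0 * np.pi * np.fft.fftfreq(N, d=dt)
beta_w = N * dt / np.sqrt(2.0 * np.pi) * np.fft.ifft(np.fft.ifftshift(beta_t))
d_omega = 2.0 * np.pi / (N * dt)

# |beta(t)|^2: photons per unit time; |beta(omega)|^2: photons per unit frequency.
photons_time = np.sum(np.abs(beta_t) ** 2) * dt
photons_freq = np.sum(np.abs(beta_w) ** 2) * d_omega
peak_omega = omega[np.argmax(np.abs(beta_w) ** 2)]

print(photons_time, photons_freq, peak_omega)
```

Both integrals return the same total photon number (Parseval's theorem), and the spectrum peaks near the carrier frequency. The two densities carry different units, but they describe the same wavepacket.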

Summary: after stripping away the weirdness of the definition of input operators and the interaction picture, this is really just Fourier transforming wave packets.