How to derive annihilation and creation operators?

The way you "derive" them in real life is that you know about them from classical mechanics. It is absolutely astonishing that history seems to have forgotten this.

Classical harmonic oscillator in Hamiltonian form

Consider a classical undriven and undamped harmonic oscillator. Generally speaking, its equation of motion is $$\ddot{q} + \omega_0^2 q = 0$$ where $q$ is the coordinate of the oscillator and $\omega_0$ is the natural resonance frequency. If we guess that the Lagrangian is $$ \mathcal L = \frac{\beta}{2} \dot{q}^2 - \frac{1}{2 \alpha} q^2 \, ,$$ then applying the Euler-Lagrange equation gives \begin{align} \frac{\partial \mathcal L}{\partial q} - \frac{d}{dt} \frac{\partial \mathcal L}{\partial \dot{q}} &= 0 \\ \ddot{q} + \frac{1}{\alpha \beta} q &= 0 \, , \end{align} which is correct for any $\alpha$ and $\beta$ such that $1 / (\alpha \beta) = \omega_0^2$.
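As a quick sanity check on the claim that any $\alpha$, $\beta$ with $1/(\alpha \beta) = \omega_0^2$ works, here is a minimal Python sketch. It evaluates the Euler-Lagrange residual $\partial \mathcal L / \partial q - \frac{d}{dt} \partial \mathcal L / \partial \dot q = -q/\alpha - \beta \ddot q$ along the trial solution $q(t) = \cos(\omega_0 t)$, with $\ddot q$ taken by finite differences. The value of `omega_0`, the list of `beta` values, and the helper name `el_residual` are all illustrative choices, not part of the argument above.

```python
import math

omega_0 = 2.0  # illustrative natural frequency (assumption)

def trial_q(t):
    """Trial solution q(t) = cos(omega_0 * t) of the equation of motion."""
    return math.cos(omega_0 * t)

def el_residual(alpha, beta, t, h=1e-4):
    """Euler-Lagrange residual dL/dq - d/dt(dL/dqdot) along trial_q.

    For L = (beta/2) qdot^2 - q^2 / (2 alpha) this is -q/alpha - beta*qddot.
    qddot is estimated with a central finite difference of step h.
    """
    q_ddot = (trial_q(t + h) - 2.0 * trial_q(t) + trial_q(t - h)) / (h * h)
    return -trial_q(t) / alpha - beta * q_ddot

# Several (alpha, beta) pairs, each constrained so that 1/(alpha*beta) = omega_0^2.
worst_residuals = {}
for beta in (0.5, 1.0, 4.0):
    alpha = 1.0 / (beta * omega_0 ** 2)
    worst_residuals[beta] = max(
        abs(el_residual(alpha, beta, 0.1 * k)) for k in range(50)
    )

for beta, worst in worst_residuals.items():
    print(f"beta = {beta}: largest |residual| = {worst:.2e}")
```

The residuals should be zero up to finite-difference error for every pair, which is the numerical version of saying that only the product $\alpha \beta$ is fixed by the equation of motion.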

Following the usual procedure to find the Hamiltonian, we get $$H = \frac{1}{2 \alpha} q^2 + \frac{1}{2 \beta} p^2 $$ where the momentum $p$ is defined as $p \equiv \partial \mathcal L / \partial \dot{q} = \beta \dot{q}$. Hamilton's equations of motion are $$ \dot p = - \frac{\partial H}{\partial q} = -q / \alpha \qquad \text{and} \qquad \dot q = \frac{\partial H}{\partial p} = p / \beta $$ or combined as a matrix equation $$\frac{d}{dt} \begin{bmatrix} q \\ p \end{bmatrix} = \left( \begin{array}{cc} 0 & 1/\beta \\ - 1 / \alpha & 0 \end{array} \right) \begin{bmatrix} q \\ p \end{bmatrix} \, . $$
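To see Hamilton's equations in action, the following sketch integrates $\dot q = p/\beta$, $\dot p = -q/\alpha$ over one period $2\pi/\omega_0$ and checks that the state returns to its starting point with the energy $H$ unchanged. The leapfrog (kick-drift-kick) integrator, the step count, and the values of `alpha`, `beta`, `q0`, `p0` are my own choices for illustration.

```python
import math

def hamiltonian(q, p, alpha, beta):
    """H = q^2 / (2 alpha) + p^2 / (2 beta)."""
    return q * q / (2.0 * alpha) + p * p / (2.0 * beta)

def leapfrog(q, p, alpha, beta, dt, steps):
    """Integrate dq/dt = p/beta, dp/dt = -q/alpha with kick-drift-kick steps."""
    for _ in range(steps):
        p -= 0.5 * dt * q / alpha   # half kick from dp/dt = -q/alpha
        q += dt * p / beta          # full drift from dq/dt = p/beta
        p -= 0.5 * dt * q / alpha   # second half kick
    return q, p

alpha, beta = 0.5, 2.0                     # illustrative values (assumption)
omega_0 = 1.0 / math.sqrt(alpha * beta)    # natural frequency from 1/(alpha*beta)
period = 2.0 * math.pi / omega_0

q0, p0 = 1.0, 0.0                          # arbitrary initial condition
steps = 20000
q1, p1 = leapfrog(q0, p0, alpha, beta, period / steps, steps)

energy_drift = hamiltonian(q1, p1, alpha, beta) - hamiltonian(q0, p0, alpha, beta)
print(f"after one period: q = {q1:.8f}, p = {p1:.8f}")
print(f"energy drift: {energy_drift:.2e}")
```

Changing `alpha` and `beta` while keeping their product fixed rescales $p$ but leaves the period unchanged, which is the freedom exploited in the next section.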

Rescaled variables

If we define $x \equiv A q$ and $y \equiv B p$ with the constraint that $A/B = \sqrt{\beta / \alpha}$, then our Hamilton equations of motion become $$\frac{d}{dt} \begin{bmatrix} x \\ y \end{bmatrix} = \omega_0 \left( \begin{array}{cc} 0 & 1 \\ - 1 & 0 \end{array} \right) \begin{bmatrix} x \\ y \end{bmatrix} \, . $$ This is a set of first order coupled differential equations for $x$ and $y$. To uncouple the equations, we look for linear combinations of $x$ and $y$ that evolve independently of each other. These are given by the left (row) eigenvectors of the matrix. They are$^{[*]}$ $$a \equiv x + i y = \begin{bmatrix} 1 & i \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} \text{ with eigenvalue } -i \omega_0$$ and $$ a^* \equiv x - i y = \begin{bmatrix} 1 & -i \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} \text{ with eigenvalue } +i \omega_0 \, .$$ (The column vectors $[1, \pm i]^T$ are right eigenvectors with eigenvalues $\pm i \omega_0$, which is the opposite pairing; it is the left eigenvectors that give the decoupled coordinates.) Explicitly, $\dot a = \dot x + i \dot y = \omega_0 y - i \omega_0 x = -i \omega_0 (x + i y)$. The $a$ and $a^*$ variables therefore have very simple time dependence, i.e. $$\boxed{ \begin{array}{ll} \dot{a} = -i \omega_0 a & a(t) = a(0) e^{-i \omega_0 t} \\ \dot{a}^* = i \omega_0 a^* & a^*(t) = a^*(0) e^{i \omega_0 t} \end{array} } $$
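The rescaling and the decoupling can both be checked numerically at a single phase-space point. The sketch below first confirms that $x = Aq$, $y = Bp$ obey $\dot x = \omega_0 y$, $\dot y = -\omega_0 x$, then computes the ratio $\dot c / c$ for each of the combinations $c = x \pm i y$, so the rate attached to each combination can be read off directly rather than taken on faith. The numbers for `alpha`, `beta`, `A`, `q`, `p` and the helper name `rate` are illustrative assumptions.

```python
import math

alpha, beta = 0.5, 8.0                     # illustrative values (assumption)
omega_0 = 1.0 / math.sqrt(alpha * beta)    # natural frequency from 1/(alpha*beta)
A = 1.0                                    # overall scale is free (assumption)
B = A / math.sqrt(beta / alpha)            # enforces A/B = sqrt(beta/alpha)

q, p = 0.7, -1.3                           # arbitrary phase-space point

# Hamilton's equations in the original variables.
q_dot = p / beta
p_dot = -q / alpha

# Rescaled variables and their time derivatives.
x, y = A * q, B * p
x_dot, y_dot = A * q_dot, B * p_dot

# The rescaled equations should read x_dot = omega_0 * y, y_dot = -omega_0 * x.
rescale_error = max(abs(x_dot - omega_0 * y), abs(y_dot + omega_0 * x))

def rate(sign):
    """Return c_dot / c for the combination c = x + sign * i * y."""
    c = x + sign * 1j * y
    c_dot = x_dot + sign * 1j * y_dot
    return c_dot / c

rate_plus = rate(+1)    # rate for x + i y
rate_minus = rate(-1)   # rate for x - i y

print(f"rescaling mismatch: {rescale_error:.2e}")
print(f"(d/dt)(x + i y) / (x + i y) = {rate_plus:.6f}   (omega_0 = {omega_0})")
print(f"(d/dt)(x - i y) / (x - i y) = {rate_minus:.6f}")
```

Because each ratio is a constant multiple of $i \omega_0$ that does not depend on the chosen point $(q, p)$, each combination evolves as a pure complex exponential, independently of the other.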

Discussion

  1. The equations for $a$ and $a^*$ in terms of $x$ and $y$ (or in terms of $q$ and $p$) mirror the quantum equations for $\hat a$ and $\hat a^\dagger$.

  2. The time evolution of $a$ and $a^*$ has the same form as that of $\hat a$ and $\hat a^\dagger$ in the Heisenberg picture of quantum mechanics: each simply acquires a phase rotating at frequency $\omega_0$.

As we can see, the $\hat a$ and $\hat a^\dagger$ operators definitely didn't come just from quantum mechanics, as they are direct analogues to the variables that diagonalize the Hamiltonian evolution matrix of the classical harmonic oscillator.

Further notes

A huge part of why $a$ and $a^*$ are useful is that they make it easy to analyze coupled and driven systems, in particular using perturbative methods. For example, take the interaction term of two interacting oscillators $$H_\text{interact} = g x_1 x_2 \, .$$ Substituting $x_j = (a_j + a_j^*)/2$ gives four terms. Going to the rotating frame and making the rotating wave approximation drops the rapidly oscillating terms $a_1 a_2$ and $a_1^* a_2^*$, leaving $$H_\text{interact} = \frac{g}{4} (a_1 a_2^* + a_1^* a_2) \, ,$$ which is exactly analogous to the quantum interaction term $$\frac{g}{4} ( a_1 a_2^\dagger + a_1^\dagger a_2) \, .$$
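Here is a small numerical illustration of why the rotating wave approximation is reasonable. It assumes two oscillators with the same frequency and a coupling weak enough that, to lowest order, each $a_j$ just rotates freely. It then compares the time average of the full term $g x_1 x_2$ over one period with the value of $\frac{g}{4}(a_1 a_2^* + a_1^* a_2)$. The values of `g`, `omega_0`, and the initial amplitudes are made up for the example, and the sign of the phase in `amplitudes` is a convention that does not affect the result.

```python
import cmath
import math

g = 0.1                  # coupling strength (illustrative)
omega_0 = 1.0            # common frequency of both oscillators (assumption)
a1_0 = 1.0 + 0.5j        # arbitrary initial complex amplitudes
a2_0 = 0.3 - 0.8j

def amplitudes(t):
    """Freely rotating amplitudes; the coupling is neglected at this order."""
    phase = cmath.exp(-1j * omega_0 * t)
    return a1_0 * phase, a2_0 * phase

def full_interaction(t):
    """g * x1 * x2, using x_j = (a_j + a_j^*) / 2 = Re(a_j)."""
    a1, a2 = amplitudes(t)
    return g * a1.real * a2.real

def rwa_interaction(t):
    """(g/4) * (a1 a2^* + a1^* a2), the term kept by the approximation."""
    a1, a2 = amplitudes(t)
    return (g / 4.0) * (a1 * a2.conjugate() + a1.conjugate() * a2).real

# Average both expressions over one period with uniform sampling.
n = 1000
period = 2.0 * math.pi / omega_0
times = [k * period / n for k in range(n)]
avg_full = sum(full_interaction(t) for t in times) / n
avg_rwa = sum(rwa_interaction(t) for t in times) / n

print(f"time-averaged g*x1*x2:        {avg_full:.10f}")
print(f"time-averaged RWA expression: {avg_rwa:.10f}")
```

The two averages agree because the dropped terms $a_1 a_2$ and $a_1^* a_2^*$ oscillate at $2 \omega_0$ and average to zero, while the kept terms are constant when the two frequencies are equal.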

Footnotes

$[*]$: I wasn't careful to match the sign and normalization conventions of quantum mechanics. Before comparing with $\hat a$ and $\hat a^\dagger$, check which of $a$ and $a^*$ carries the $e^{-i \omega_0 t}$ time dependence (in the Heisenberg picture that is $\hat a$), and note that the overall scale $A$ is left free here, so the normalization need not match the quantum one. This is a pretty simple detail that could be edited by someone else, if they're interested.