Why does time evolution operator have the form $U(t) = e^{-itH}$?

For time-independent Hamiltonians, $U(t) = \mathrm{e}^{-\mathrm{i}Ht}$ follows from Stone's theorem on one-parameter unitary groups, since the Schrödinger equation $$ H\psi = \mathrm{i}\partial_t \psi$$ is just the statement that $H$ is the infinitesimal generator of a one-parameter unitary group parameterized by the time $t$.
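In finite dimensions this can be checked directly. Here is a small numerical sketch (a toy $4\times 4$ Hermitian matrix standing in for $H$, with $\hbar = 1$; all names are illustrative) showing that $U(t)=e^{-\mathrm{i}Ht}$ is unitary and that $\psi(t)=U(t)\psi(0)$ satisfies the Schrödinger equation:

```python
import numpy as np
from scipy.linalg import expm

# A random Hermitian "Hamiltonian" on a 4-dimensional Hilbert space
# (toy example, hbar = 1 throughout).
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
H = (A + A.conj().T) / 2          # Hermitian by construction

t = 0.7
U = expm(-1j * H * t)             # the claimed evolution operator

# U is unitary ...
assert np.allclose(U.conj().T @ U, np.eye(4))

# ... and psi(t) = U psi(0) solves i d/dt psi = H psi:
# check the time derivative with a central difference.
psi0 = rng.standard_normal(4) + 1j * rng.standard_normal(4)
psi0 /= np.linalg.norm(psi0)
dt = 1e-6
dpsi = (expm(-1j * H * (t + dt)) @ psi0
        - expm(-1j * H * (t - dt)) @ psi0) / (2 * dt)
assert np.allclose(1j * dpsi, H @ (U @ psi0), atol=1e-5)
```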

For time-dependent Hamiltonians $H(t) = H_0 + V(t)$, the time evolution actually depends on the start and end points, and the Schrödinger equation is solved iteratively by a Dyson series in the interaction picture, whose schematic form is $$ U_I(t_1,t_0) = \mathcal{T}\exp\left(-\mathrm{i}\int_{t_0}^{t_1}\mathrm{e}^{\mathrm{i}H_0 t}V(t)\mathrm{e}^{-\mathrm{i}H_0t}\,\mathrm{d}t\right),$$ where $\mathcal{T}$ denotes time ordering. One obtains it from the evolution equation in the interaction picture, the Tomonaga-Schwinger equation $$ \mathrm{i}\partial_t U_I(t,t_0)\psi_I(t_0)\Big|_{t=t_1} = V_I(t_1)U_I(t_1,t_0)\psi_I(t_0),$$ with $V_I(t) = \mathrm{e}^{\mathrm{i}H_0 t}V(t)\mathrm{e}^{-\mathrm{i}H_0 t}$, by iterating the equivalent integral equation $$ U_I(t_1,t_0) = 1 - \mathrm{i}\int_{t_0}^{t_1}V_I(t)\,U_I(t,t_0)\,\mathrm{d}t.$$
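The time ordering is not optional: when $[H(t_1),H(t_2)]\neq 0$, the naive guess $\exp(-\mathrm{i}\int H\,\mathrm{d}t)$ differs from the time-ordered product. A small numerical sketch (toy $2\times 2$ time-dependent Hamiltonian built from Pauli matrices; all names here are illustrative, $\hbar = 1$):

```python
import numpy as np
from scipy.linalg import expm

# Toy time-dependent Hamiltonian H(t) = H0 + V(t) in 2 dimensions.
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)
H = lambda t: sz + np.cos(3 * t) * sx   # H(t1) and H(t2) do not commute

T, n = 1.0, 2000
dt = T / n
ts = (np.arange(n) + 0.5) * dt

# Time-ordered product: later time slices multiply on the left.
U_ordered = np.eye(2, dtype=complex)
for t in ts:
    U_ordered = expm(-1j * H(t) * dt) @ U_ordered

# Naive (unordered) guess exp(-i \int H dt) -- generally wrong.
U_naive = expm(-1j * sum(H(t) for t in ts) * dt)

# Both are unitary, but they differ because [H(t1), H(t2)] != 0.
assert np.allclose(U_ordered.conj().T @ U_ordered, np.eye(2))
assert not np.allclose(U_ordered, U_naive, atol=1e-3)
```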


An intuitive answer to motivate the Stone theorem that ACuriousMind's answer cites.

Take a quantum system. For now, let it be a finite-dimensional one (the Stone theorem is needed to make the reasoning work in a separable, infinite-dimensional Hilbert space). Let it evolve for time $t_1$. The evolved state is $U(t_1)\psi$ for some linear operator $U(t_1)$ acting on the input $\psi$. Now let it evolve for $t_2$. The state is now $U(t_2)\,U(t_1)\psi$. Here $U(t)$ is an $N\times N$ square matrix. But these two evolutions together are the same as $U(t_1+t_2)$. So we have immediately:

$$U(t_1+t_2) = U(t_2) U(t_1).$$

From this functional equation, together with the postulate that the evolution through any timeslice of length $\delta t$ is the same as the evolution through any other timeslice of the same duration (i.e. we assume our system is not time-varying), we can then derive $U(q) = U(1)^q$ for any rational $q$.

The only continuous extension of such a function is the matrix exponential of the form $U(t) = \exp(K\,t)$. In this particular case, we know that our evolution must be unitary (the system has unity probability to end up in some state!), which tells us that $K=-K^\dagger$. Thus we can write $U(t) = \exp\left(-\frac{i}{\hbar}H\,t\right)$ for some Hermitian $H$.
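A quick numerical sketch of this finite-dimensional argument (a toy $3\times 3$ anti-Hermitian generator $K=-\mathrm{i}H$, with $\hbar = 1$; the matrices are illustrative), checking the group law, the rational-power relation, and unitarity:

```python
import numpy as np
from scipy.linalg import expm

# Any anti-Hermitian K (K = -K^dagger) can be written as -iH, H Hermitian.
rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
H = (A + A.conj().T) / 2
K = -1j * H
assert np.allclose(K, -K.conj().T)

U = lambda t: expm(K * t)

# The group law U(t1 + t2) = U(t2) U(t1) ...
assert np.allclose(U(0.3 + 0.4), U(0.4) @ U(0.3))

# ... the rational-power relation, e.g. U(1/2)^2 = U(1) ...
assert np.allclose(U(0.5) @ U(0.5), U(1.0))

# ... and unitarity (total probability is conserved).
assert np.allclose(U(2.0).conj().T @ U(2.0), np.eye(3))
```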


This rather detailed answer will chart out the issues and assumptions needed for an otherwise straightforward idea. Understanding how it is done might be more useful than actually doing it.

Assumption 0) The Hamiltonian is time independent.

To be honest, I never find arguments about taking derivatives of operators very appealing. We have a Hilbert space (a vector space with additional structure) and talk about operators that are mere functions from the vector space (or a subset of it) to itself; a vector operator can be a tuple of such functions. If we have a parameterized function from (a subset of) $\mathbb R$ into the vector space, then we can take an ordinary (vector-valued) derivative of that function. We will incorporate this in what follows by just having $H$ act on vectors to give vectors; if you have a parameterized function, then $H$ acts on its value at each parameter, so $(H[\vec v])(t)=H[\vec v(t)].$

Assumption 1) The Hamiltonian is self-adjoint.

This means that there is an orthonormal basis of eigenvectors. Actually, having an orthonormal basis of eigenvectors is more fundamental than being self-adjoint when it comes to vector operators, and that is what we want, so really that's what we should assume; but it's traditional to say that observables are self-adjoint. And this is a scalar operator, so it is fine to do it the traditional (backwards) way.

So fix a maximal set of orthonormal eigenvectors $\{|n\rangle:n\in\mathbb N \}$ with corresponding eigenvalues $\{E_n\}.$ Note that the eigenvectors have a dense span, and in particular, for an arbitrary vector $\vec v$ in the Hilbert space $\mathcal H$ there are unique coefficients $c_n=\langle n | v\rangle$ such that $\vec v = \lim_{N\rightarrow \infty}\Sigma_{n=0}^{n=N} c_n|n\rangle.$
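In finite dimensions this expansion is just linear algebra. A small sketch (toy $5\times 5$ Hermitian matrix; the numbers are illustrative) of expanding a vector in the eigenbasis with $c_n=\langle n|v\rangle$ and recovering it:

```python
import numpy as np

# Finite-dimensional sketch of v = sum_n c_n |n>, with c_n = <n|v>.
rng = np.random.default_rng(2)
A = rng.standard_normal((5, 5)) + 1j * rng.standard_normal((5, 5))
H = (A + A.conj().T) / 2

E, vecs = np.linalg.eigh(H)       # columns of vecs are the |n>
v = rng.standard_normal(5) + 1j * rng.standard_normal(5)

c = vecs.conj().T @ v             # c_n = <n|v>
v_reconstructed = vecs @ c        # sum_n c_n |n>
assert np.allclose(v_reconstructed, v)

# And H|n> = E_n |n> for each eigenvector:
assert np.allclose(H @ vecs, vecs * E)
```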

Note for the next part that for an eigenvector $|n\rangle,$ then $H|n\rangle=E_n|n\rangle.$

Assumption 2) Assume evolution is determined by the Schrödinger equation $ \frac{\partial}{\partial t} |\psi(t)\rangle = \frac{1}{i}H |\psi(t)\rangle$

We assume the above for every single vector, not just the one we care about (and this is essential), and so combined with assumption one we get $\frac{\partial}{\partial t} |n\rangle = \frac{1}{i}H |n\rangle= \frac{1}{i}E_n |n\rangle.$ But this means the evolution is confined to a 1d subspace of the Hilbert space, so regular 1d ordinary differential equations apply to the simple equation $\frac{d}{d t} \vec{v} = \frac{1}{i}E_n \vec v.$ By the regular 1d ODE it has solution $\vec{v}(t)=e^{-iE_nt}\left[\vec{v}(0)\right].$

If you consider that fixed maximal set of orthonormal eigenvectors $\{|n\rangle:n\in\mathbb N \}$ with corresponding eigenvalues $\{E_n\},$ then you can get a time-parameterized family of indexed vectors $\{|n\rangle(t)=e^{-iE_nt}\left[|n\rangle(0)\right]:n\in\mathbb N\}\subset \mathcal H,$ each of which is itself correctly evolving in time.
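Numerically, each eigenvector just picks up a phase under the evolution. A small sketch (toy $4\times 4$ Hermitian matrix, $\hbar=1$; illustrative only):

```python
import numpy as np
from scipy.linalg import expm

# Each eigenvector picks up a phase: |n>(t) = e^{-i E_n t} |n>(0).
rng = np.random.default_rng(3)
A = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
H = (A + A.conj().T) / 2
E, vecs = np.linalg.eigh(H)       # columns of vecs are the |n>(0)

t = 1.3
U = expm(-1j * H * t)
for n in range(4):
    assert np.allclose(U @ vecs[:, n], np.exp(-1j * E[n] * t) * vecs[:, n])
```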

Assumption 3) Assume the Hamiltonian is continuous/bounded (same thing since it is linear)

We know how $\Sigma_{n=0}^{n=N} c_n|n\rangle$ evolves in time: $\frac{d}{dt}\Sigma_{n=0}^{n=N} c_n|n\rangle=\Sigma_{n=0}^{n=N} c_n\frac{d}{dt}|n\rangle$ by linearity, and that equals $\Sigma_{n=0}^{n=N} c_n\frac{1}{i}E_n|n\rangle,$ since that is how those particular states evolve from above. But since they are eigenvectors we get $\Sigma_{n=0}^{n=N} c_n\frac{1}{i}E_n|n\rangle=\Sigma_{n=0}^{n=N} c_n\frac{1}{i}H|n\rangle,$ which by linearity equals $\frac{1}{i}H\Sigma_{n=0}^{n=N} c_n|n\rangle.$ And by assumption $H$ is continuous/bounded, so $\lim H\Sigma_{n=0}^{n=N} c_n|n\rangle=H\lim\Sigma_{n=0}^{n=N} c_n|n\rangle.$

We know how $\Sigma_{n=0}^{n=N} c_n|n\rangle(t)=\Sigma_{n=0}^{n=N} c_ne^{-iE_nt}|n\rangle(0)$ evolves in time:

$\begin{eqnarray} \frac{d}{dt}\Sigma_{n=0}^{n=N} c_n|n\rangle(t)&= \Sigma_{n=0}^{n=N} c_n\frac{d}{dt}|n\rangle(t) \\ &= \Sigma_{n=0}^{n=N} c_n\frac{1}{i}E_n|n\rangle(t) \\ &= \Sigma_{n=0}^{n=N} c_n\frac{1}{i}H|n\rangle(t)\\ &= \frac{1}{i}H\Sigma_{n=0}^{n=N} c_n|n\rangle(t). \end{eqnarray}$

Where each equal sign comes from: linearity of differentiation in the finite-dimensional subspace spanned by the vectors, the evolution of the $|n\rangle$, the eigenvector nature of the vectors, and finally linearity of the Hamiltonian. But really the whole thing followed from the Schrödinger equation, sorry about that.
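The chain of equalities above can be checked numerically for a finite combination (a toy $4\times 4$ Hermitian matrix and random coefficients; illustrative only, $\hbar=1$):

```python
import numpy as np

# Check d/dt sum_n c_n |n>(t) = (1/i) H (sum_n c_n |n>(t)) numerically.
rng = np.random.default_rng(4)
A = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
H = (A + A.conj().T) / 2
E, vecs = np.linalg.eigh(H)
c = rng.standard_normal(4) + 1j * rng.standard_normal(4)

def psi(t):
    # sum_n c_n e^{-i E_n t} |n>(0)
    return vecs @ (c * np.exp(-1j * E * t))

t, dt = 0.9, 1e-6
dpsi = (psi(t + dt) - psi(t - dt)) / (2 * dt)   # central difference
assert np.allclose(dpsi, (1 / 1j) * (H @ psi(t)), atol=1e-5)
```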

Now we can work out

$\begin{eqnarray} \frac{d}{dt}\vec v(t)&= \frac{d}{dt}\lim \Sigma_{n=0}^{n=N} c_n|n\rangle(t) \\ &= \frac{1}{i}H\lim \Sigma_{n=0}^{n=N} c_n|n\rangle(t) \\ &= \frac{1}{i}\lim H\Sigma_{n=0}^{n=N} c_n|n\rangle(t) \\ &= \lim \frac{1}{i}H\Sigma_{n=0}^{n=N} c_n|n\rangle(t) \\ &= \lim \frac{d}{dt}\Sigma_{n=0}^{n=N} c_n|n\rangle(t). \end{eqnarray}$

Where each equal sign is because of the denseness of the eigenvectors (and the definition of $v(t)$), the Schrödinger equation evolution, the continuity/boundedness of $H,$ linearity, and finally the Schrödinger equation.

So far this might seem incredibly boring, or even trivial. But if the Hamiltonian were not self-adjoint there would be problems. If it were not continuous/bounded, then either it might not act on the whole space (so some vectors would not have time derivatives), or else its value would not be determined by the eigenvectors: even if linear combinations of eigenvectors got close to a vector, $H$ acting on the limit might not be the limit of what $H$ does on the approximating sequence. So we need these assumptions to do these computations.

OK. So now we can claim that the time derivative is determined by the time derivative on the linear combinations of eigenvectors. Which is great, because we know how those behave under the operator $U(t_2,t_1)=e^{-i(t_2-t_1)H}.$ For any eigenvector $|n\rangle(t_1)$ we have $U(t_2,t_1)|n\rangle(t_1)=e^{-i(t_2-t_1)H}|n\rangle(t_1)=e^{-i(t_2-t_1)E_n}|n\rangle(t_1)=e^{-i(t_2-t_1)E_n}e^{-iE_nt_1}|n\rangle(0)=e^{-iE_nt_2}|n\rangle(0)=|n\rangle(t_2).$
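That chain of equalities is easy to verify numerically (a toy $3\times 3$ Hermitian matrix; illustrative only, $\hbar=1$):

```python
import numpy as np
from scipy.linalg import expm

# Check U(t2,t1) |n>(t1) = |n>(t2), where |n>(t) = e^{-i E_n t} |n>(0).
rng = np.random.default_rng(5)
A = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
H = (A + A.conj().T) / 2
E, vecs = np.linalg.eigh(H)

t1, t2 = 0.4, 1.1
U = expm(-1j * (t2 - t1) * H)
for n in range(3):
    n_t1 = np.exp(-1j * E[n] * t1) * vecs[:, n]
    n_t2 = np.exp(-1j * E[n] * t2) * vecs[:, n]
    assert np.allclose(U @ n_t1, n_t2)
```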

So it evolves eigenvectors correctly. But again, because of the continuity/boundedness of $H$, for any vector whatsoever $e^{-i(t_2-t_1)H}$ is determined by the eigenvectors, and the action on the limit is the limit of the action on the approximating sequence.

So what have we done? First we showed uniqueness: we found what the eigenvectors had to do, and argued that because of the continuity of $H$ the time derivative is determined by the time derivative on the eigenvectors; this evolution evolves the eigenvectors correctly, and so again by continuity it must give you that unique solution.

We didn't really have to do anything fancy.

But what if the Hamiltonian isn't self-adjoint? Some people do that, but then they just change the topology, and in the new topology it is self-adjoint, so really no one seems to actually do that.

What if it didn't follow the Schrödinger equation? As long as the time-derivative operator is anti-self-adjoint, we are fine; nothing changes. But this is strongly related to conservation of probability and unitary evolution, so changing that would be a big deal, particularly for the more popular probability-based interpretations.

And what if the Hamiltonian weren't bounded? That means you would have arbitrarily high energies. We don't really know what happens at super high energies, so if your results depend on exactly what happens at super high energies, don't trust them too much. So why not replace a Hamiltonian that has super high energies with one that doesn't? By adjusting the Hilbert space you can fit it into a bounded range of energies for your Hamiltonian, but maybe some operators you like will become defined only on a subset, which is really what happens: you have to restrict the domain of some operators, and then deal with it. So life will be more complicated if your Hamiltonian isn't bounded, one way or another.
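A finite-dimensional cartoon of "fitting into a bounded range of energies": restrict to the span of eigenvectors below some cutoff (a toy $6\times 6$ Hermitian matrix with a made-up cutoff; purely illustrative):

```python
import numpy as np

# Project a Hamiltonian onto the span of its low-energy eigenvectors.
rng = np.random.default_rng(6)
A = rng.standard_normal((6, 6)) + 1j * rng.standard_normal((6, 6))
H = (A + A.conj().T) / 2
E, vecs = np.linalg.eigh(H)

cutoff = np.median(E)             # arbitrary energy cutoff for the toy
low = vecs[:, E <= cutoff]        # basis of the low-energy subspace

# H restricted to that subspace has spectrum bounded by the cutoff:
H_sub = low.conj().T @ H @ low
assert np.linalg.eigvalsh(H_sub).max() <= cutoff + 1e-9
```

Operators that mixed high- and low-energy states on the original space are then defined only on (or must be projected onto) this subspace, which is the "restrict the domain and deal with it" part.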

What if your Hamiltonian depends on time? This is strongly related to lack of conservation of energy, which usually means you had an external system interacting with your system. You could include the external system, obtain a larger system, and then, if you track the energy flowing from the other system, energy can be conserved again. What this means is that instead of an external system that just does what it does, you set up the state of the formerly-external system and let it evolve too; then things do what they do because each part evolves the way it evolves. This is a fully correct way to do it, where the time dependence has gone away.

So you have to weigh the difficulty of incorporating a realistic (sub)system against having a time-dependent Hamiltonian. Where did our assumption of a time-independent Hamiltonian come in? We used it all along by having an operator just be a thing that acts on vectors (assumption 0).