Why is the standard definition of fidelity unnecessarily complicated?

The square root function is not analytic at the origin: zero is a branch point, and any single-valued choice of the function needs a branch cut (usually placed along the negative real axis, though this is merely a convention).

To convince yourself that this is not a mere technicality but a real problem with your alternative definition, note that the square root function has no power series expansion around zero. To define $\sqrt{\rho}$ we therefore cannot rely on a series expansion; instead we use the fact that $\rho$ can be diagonalized, $\rho = \sum_n r_n |n\rangle \langle n|$, and set $$ \sqrt{\rho} := \sum_n \sqrt{r_n} |n\rangle \langle n|.$$ Since $\rho$ is Hermitian and positive semidefinite, the $r_n$ are real and non-negative, so there is no ambiguity, and $\sqrt{\rho}$ is also Hermitian.

However, $\rho\sigma$ is in general not Hermitian (it is Hermitian only in the special case where $\rho$ and $\sigma$ commute). Therefore, we have no safe way to define $\sqrt{\rho\sigma}$ (in contrast to $\sqrt{\sqrt{\rho}\sigma\sqrt{\rho}}$, which is a square root of a Hermitian positive-semidefinite operator).
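As a rough numerical illustration of the two points above, here is a minimal NumPy sketch. The helper name `psd_sqrt` and the two qubit density matrices are made up for the example; they are not taken from the question.

```python
import numpy as np

def psd_sqrt(rho):
    """Square root of a Hermitian positive-semidefinite matrix via its eigenbasis."""
    evals, evecs = np.linalg.eigh(rho)        # real eigenvalues, orthonormal eigenvectors
    evals = np.clip(evals, 0.0, None)         # remove tiny negative round-off
    return (evecs * np.sqrt(evals)) @ evecs.conj().T

def is_hermitian(a):
    return np.allclose(a, a.conj().T)

# Two illustrative (hypothetical) density matrices that do not commute.
rho = np.array([[0.7, 0.2], [0.2, 0.3]], dtype=complex)
sigma = np.array([[0.5, -0.1j], [0.1j, 0.5]], dtype=complex)

root = psd_sqrt(rho)
product = rho @ sigma                 # the operator in the alternative definition
sandwich = root @ sigma @ root        # the operator in the standard definition

print("sqrt(rho) Hermitian, squares to rho:", is_hermitian(root), np.allclose(root @ root, rho))
print("rho sigma Hermitian:", is_hermitian(product))
print("sqrt(rho) sigma sqrt(rho) Hermitian:", is_hermitian(sandwich))
```

For these matrices the product fails the Hermiticity check while the sandwiched operator passes it, which is exactly why only the latter can be fed to the eigenbasis definition.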


Okay, this is a rather subtle situation, but I think I've figured it out. The key is to be very careful about which mathematical results about Hermitian operators do and do not hold for generic operators. Let $H$ represent an arbitrary Hermitian matrix, $N$ an arbitrary normal one, and $M$ an arbitrary matrix, all acting on an $n$-dimensional Hilbert space.

Subtlety 1: For normal $N$, the numerical range $$\left \{ \frac{\langle \psi | N | \psi \rangle}{\langle \psi|\psi\rangle} \right \}$$ taken over all nonzero $\psi$ in the Hilbert space is the convex hull of the eigenvalues of $N$. (So for $H$ Hermitian, it's the real interval $[\min \lambda, \max \lambda]$.) For generic $M$, the numerical range is still convex and contains the eigenvalues (and hence their convex hull), but it is in general strictly larger than that hull.

Subtlety 2: The term "positive (semi-)definite" is usually only defined for Hermitian matrices $H$. There, it can be defined in one of two equivalent ways:

PD1: A Hermitian matrix is positive (semi-)definite (PD1) if all of its eigenvalues are positive (nonnegative real).

PD2: A Hermitian matrix is positive (semi-)definite (PD2) if $\langle \psi | H | \psi \rangle > (\geq)\ 0$ for all nonzero $|\psi\rangle$ in the Hilbert space.

PD1 and PD2 are equivalent for Hermitian $H$, but their generalizations to generic (not necessarily Hermitian) $M$ are no longer equivalent: in the generic case, PD2 is strictly stronger than PD1. Indeed, on a complex Hilbert space PD2 (i.e. $\langle \psi | M | \psi \rangle \ge\ 0\ \forall\,|\psi\rangle$) implies that the matrix has to be Hermitian (which is why it is often considered the "standard" definition). On the other hand, there are clearly non-Hermitian matrices which satisfy PD1 (i.e. have positive eigenvalues); see here for an example of a non-Hermitian $M$ that satisfies PD1 but not PD2.
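To make the gap between PD1 and PD2 concrete, here is a small sketch with a matrix I picked for illustration (it is not necessarily the one in the linked example): an upper-triangular $M$ whose eigenvalues are both 1 but whose quadratic form goes negative.

```python
import numpy as np

# Illustrative non-Hermitian matrix chosen for this sketch.
M = np.array([[1.0, 4.0],
              [0.0, 1.0]])

eigenvalues = np.linalg.eigvals(M)        # PD1 looks only at these
psi = np.array([1.0, -1.0])
expectation = psi.conj() @ M @ psi        # PD2 looks at <psi|M|psi>

# The quadratic form on real vectors is governed by the symmetric part of M.
symmetric_part_eigs = np.linalg.eigvalsh((M + M.T) / 2)

print("eigenvalues of M:", eigenvalues)
print("<psi|M|psi> for psi = (1, -1):", expectation)
print("eigenvalues of the symmetric part:", symmetric_part_eigs)
```

The eigenvalues are positive, so PD1 holds, yet the expectation value for this $\psi$ is negative, so PD2 fails.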

Subtlety 3: There are two inequivalent definitions of a square root of a matrix.

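SR1: $R$ is a square root (SR1) of a generic matrix $M$ if $R^2 = M$. This definition makes sense for any matrix, but the number of such roots depends on $M$: there are exactly $2^n$ if the eigenvalues of $M$ are all distinct and nonzero, infinitely many in some degenerate cases (e.g. the identity for $n \geq 2$, of which every reflection is a square root), and none at all for some defective matrices (e.g. a nilpotent $2 \times 2$ Jordan block). I'm not sure whether or not there's generically a natural choice of "principal" square root in this situation (e.g. if $M$ is defective), so the notation $\sqrt{M}$ is not (as far as I know) well defined.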

SR2: $R$ is a square root (SR2) of a positive definite Hermitian matrix $P$ if $R^\dagger R = P$. (Since $P$ is Hermitian, we don't need to specify whether we mean PD1 or PD2 for "positive definite".) Under this definition, the set of square roots of a matrix $P$ is isomorphic to the Lie group $U(n)$, because if $R_1$ is a square root of $P$, then $R_2$ is a square root of $P$ iff $R_2 = U R_1$ for some unitary matrix $U$. $R$ is not necessarily Hermitian. But under this definition, we can define the natural "principal" square root of $P$, which we denote by $L = \sqrt{P}$, as the unique square root that is also Hermitian and positive (semi-)definite (again, no need to distinguish PD1 from PD2 here).

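Since the principal square root $L$ is Hermitian by definition, it respects both definitions SR1 and SR2, as $L^\dagger L = L^2 = P$. But a generic square root (SR2) of $P$ will not be a square root (SR1) of $P$. In the other direction, if $P$ has distinct eigenvalues then every SR1 root commutes with $P$, is therefore Hermitian, and so is also an SR2 root; SR1 roots that fail SR2 appear only when $P$ has degenerate eigenvalues.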
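Here is a hedged numerical sketch of how SR1 and SR2 come apart. The matrices are my own illustrative choices: a Cholesky factor serves as an SR2 root that is not an SR1 root, and a non-Hermitian involution serves as an SR1 root of the identity that is not an SR2 root.

```python
import numpy as np

# SR2 but not SR1: the Cholesky factor of a positive-definite matrix.
P = np.array([[2.0, 1.0],
              [1.0, 2.0]])
lower = np.linalg.cholesky(P)      # P = lower @ lower^dagger
R = lower.conj().T                 # so R^dagger R = P
sr2_holds = np.allclose(R.conj().T @ R, P)
sr1_holds = np.allclose(R @ R, P)
print("Cholesky factor: SR2", sr2_holds, "| SR1", sr1_holds)

# SR1 but not SR2: a non-Hermitian square root of the (degenerate) identity.
identity = np.eye(2)
S = np.array([[1.0, 1.0],
              [0.0, -1.0]])
s_sr1_holds = np.allclose(S @ S, identity)
s_sr2_holds = np.allclose(S.conj().T @ S, identity)
print("Involution S:    SR2", s_sr2_holds, "| SR1", s_sr1_holds)
```

The second example needs the degenerate spectrum of the identity; for a $P$ with distinct eigenvalues, all SR1 roots are Hermitian and no such example exists.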

For a Hermitian $H$, the usual power series expansion of the square root function about the identity, $$\sum_{n=0}^\infty \frac{(-1)^n (2n)!}{(1-2n)(n!)^2 4^n} (H - I)^n,$$ converges to $\sqrt{H}$ iff all the eigenvalues of $H$ lie in the interval $[0,2]$. For a generic $M$, this series converges to a square root of $M$ if all of the eigenvalues of $M$ lie in the open disk in the complex plane that has that interval as a diameter, and it diverges if any eigenvalue lies outside the closed disk. (On the boundary circle the scalar series still converges, because its coefficients decay like $n^{-3/2}$ and are therefore absolutely summable; for a matrix, though, convergence can fail if a boundary eigenvalue is defective, i.e. sits in a nontrivial Jordan block.)
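The convergence claim is easy to probe numerically. Below is a minimal sketch; the function name `sqrt_series`, the truncation length, and the two test matrices are assumptions of mine. It sums the series using the recurrence $c_{n+1} = c_n (1/2 - n)/(n+1)$ for the coefficients, which reproduces the closed form in the formula above.

```python
import numpy as np

def sqrt_series(M, terms=200):
    """Partial sum of the binomial series for sqrt(M), expanded about the identity."""
    dim = M.shape[0]
    X = M - np.eye(dim)
    power = np.eye(dim, dtype=complex)           # holds X**n
    coeff = 1.0                                   # binomial(1/2, 0)
    total = np.zeros((dim, dim), dtype=complex)
    for n in range(terms):
        total += coeff * power
        power = power @ X
        coeff *= (0.5 - n) / (n + 1)              # binomial(1/2, n+1) from binomial(1/2, n)
    return total

# Eigenvalues 0.5 and 1.5: inside [0, 2], so the partial sums settle down.
H_inside = np.array([[1.0, 0.5],
                     [0.5, 1.0]])
good = sqrt_series(H_inside)
print("residual inside [0, 2]:", np.linalg.norm(good @ good - H_inside))

# Eigenvalue 3: outside [0, 2], so the partial sums blow up instead.
H_outside = np.diag([3.0, 1.0])
bad = sqrt_series(H_outside)
print("residual outside [0, 2]:", np.linalg.norm(bad @ bad - H_outside))
```

Close to the boundary the convergence becomes slow, so a fixed truncation like the one above is only a reasonable check for eigenvalues well inside the interval.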

Subtlety 4: If either of two generic matrices $A$ or $B$ is invertible, then $AB$ and $BA$ are similar, but if both $A$ and $B$ are singular, then $AB$ and $BA$ are not necessarily similar (see here for a counterexample). But even in this case, $AB$ and $BA$ always have the same eigenvalues and in fact characteristic polynomials, so (for example) their traces and regions of convergence for any formal power series will be the same.
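For a quick check of Subtlety 4, here is a sketch with a pair of singular matrices that I chose for illustration (not necessarily the linked counterexample). The two products share a characteristic polynomial but have different ranks, so they cannot be similar.

```python
import numpy as np

# Both factors are singular (illustrative choice).
A = np.array([[0.0, 1.0],
              [0.0, 0.0]])
B = np.array([[1.0, 0.0],
              [0.0, 0.0]])

AB = A @ B     # the zero matrix
BA = B @ A     # a nonzero nilpotent matrix

# np.poly returns the characteristic polynomial coefficients of a square matrix.
char_AB = np.poly(AB)
char_BA = np.poly(BA)
rank_AB = np.linalg.matrix_rank(AB)
rank_BA = np.linalg.matrix_rank(BA)

print("characteristic polynomials:", char_AB, char_BA)
print("ranks:", rank_AB, rank_BA)
```

Similar matrices always have equal rank, so the rank mismatch is enough to rule out similarity even though the spectra agree.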

Subtlety 5: If we have two Hermitian positive-definite matrices $P_1$ and $P_2$, then their (generally non-Hermitian) product $P_1 P_2$ will satisfy PD1 but not necessarily PD2 (see the first link above for a counterexample), so whether it counts as "positive definite" depends on your definition.

Now we can finally try to answer my question. The standard definition of the fidelity is unambiguous, because only Hermitian positive-semidefinite operators are ever getting square rooted. Since $\rho \sigma$ is non-Hermitian, its numerical range is generically complex and it does not satisfy PD2. Moreover, we can't talk about its square roots using definition SR2. And generically, the notation $\sqrt{M}$ may not be meaningful for a non-Hermitian $M$ because it implies some natural principal branch.

But we can talk about the square roots (plural) of $\rho \sigma$ under definition SR1, as with any matrix. Moreover, $\rho \sigma$ is a highly non-generic matrix. It satisfies PD1 by Subtlety 5 (or its semidefinite analogue, if $\rho$ or $\sigma$ is singular). In fact, by Subtlety 4, $\rho \sigma$ has the same characteristic polynomial (with all roots lying in $[0,1]$) as $\sqrt{\rho} \sigma \sqrt{\rho}$. Because the eigenvalues all lie in this interval (which is obviously not generically true), there is a natural choice of principal square root: the one given by the usual power series expansion above. That series is guaranteed to converge here, including when 0 is an eigenvalue, because a product of two positive-semidefinite matrices is always diagonalizable, so no defective boundary eigenvalue can occur. So in this particular case, we can get away with defining $\sqrt{\rho \sigma}$ by the formal power series above. Then by the logic outlined in my question, we can indeed cycle the operators inside the square root and always get the right answer.
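As a sanity check on this conclusion, here is a hedged sketch comparing three ways of getting the trace that enters the fidelity (before any squaring convention): the standard sandwich form, the power series applied directly to $\rho\sigma$, and the sum of square roots of the eigenvalues of $\rho\sigma$. The helper names, truncation length, and density matrices are illustrative assumptions of mine.

```python
import numpy as np

def psd_sqrt(a):
    """Square root of a Hermitian positive-semidefinite matrix via its eigenbasis."""
    evals, evecs = np.linalg.eigh(a)
    evals = np.clip(evals, 0.0, None)             # remove tiny negative round-off
    return (evecs * np.sqrt(evals)) @ evecs.conj().T

def sqrt_series(M, terms=400):
    """Partial sum of the binomial series for sqrt(M), expanded about the identity."""
    dim = M.shape[0]
    X = M - np.eye(dim)
    power = np.eye(dim, dtype=complex)
    coeff = 1.0
    total = np.zeros((dim, dim), dtype=complex)
    for n in range(terms):
        total += coeff * power
        power = power @ X
        coeff *= (0.5 - n) / (n + 1)              # next binomial(1/2, n) coefficient
    return total

# Illustrative (hypothetical) full-rank, non-commuting density matrices.
rho = np.array([[0.7, 0.2], [0.2, 0.3]], dtype=complex)
sigma = np.array([[0.5, -0.1j], [0.1j, 0.5]], dtype=complex)

root_rho = psd_sqrt(rho)
standard = np.trace(psd_sqrt(root_rho @ sigma @ root_rho)).real
via_series = np.trace(sqrt_series(rho @ sigma)).real
via_eigs = np.sum(np.sqrt(np.linalg.eigvals(rho @ sigma))).real

print("tr sqrt(sqrt(rho) sigma sqrt(rho)):", standard)
print("tr of series sqrt(rho sigma):      ", via_series)
print("sum of sqrt(eig(rho sigma)):       ", via_eigs)
```

For nearly singular states the eigenvalues of $\rho\sigma$ approach 0, the series converges slowly, and a fixed truncation like this becomes a poor check; the eigenvalue route does not have that problem.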

TLDR: The expression $\sqrt{M}$ is not uniquely defined for a generic matrix $M$ that is not Hermitian and positive semidefinite. But in this case, the special properties of the matrix $\rho \sigma$ guarantee that the formal power series above converges, so we can use that power series to (unconventionally) define $\sqrt{\rho \sigma}$. If we use that convention, then we will indeed always get the same answer as the traditional definition. However, this is a bit of a hack, and the traditional definition's meaning is clear without needing any additional implicit definitions.