Why, when going from special to general relativity, do we just replace partial derivatives with covariant derivatives?

Replacing partial derivatives with covariant derivatives when going from Minkowski spacetime to a general spacetime is just a rule of thumb and should not be applied carelessly.

For example, when studying electromagnetism in the Lorenz gauge $(\nabla_\mu A^\mu =0)$, working from first principles, one can show that the inhomogeneous wave equation reads:

$$\nabla_\nu \nabla^\nu A^\mu - R^\mu_{\,\,\nu} A^\nu = -j^\mu$$

whereas in Minkowski the same equation reads:

$$\partial_\nu \partial^\nu A^\mu = -j^\mu$$

If we simply applied $\partial\rightarrow\nabla$ to the flat-space equation, we would miss the curvature term. Although the substitution often gives the right answer, to be safe you should derive field equations from a manifestly covariant starting point (e.g. an action principle).
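To see where the curvature term comes from, here is a sketch that assumes the following conventions: $F_{\mu\nu} = \nabla_\mu A_\nu - \nabla_\nu A_\mu$, Maxwell's equations in the form $\nabla_\nu F^{\nu\mu} = -j^\mu$, a metric-compatible, torsion-free connection, and $[\nabla_\alpha,\nabla_\beta]V^\mu = R^\mu{}_{\sigma\alpha\beta}V^\sigma$. Other curvature sign conventions flip the sign of the Ricci term. Commuting the two derivatives in the second term produces the Ricci tensor, and the Lorenz gauge removes what is left:

$$
\begin{aligned}
\nabla_\nu F^{\nu\mu} &= \nabla_\nu \nabla^\nu A^\mu - \nabla_\nu \nabla^\mu A^\nu \\
&= \nabla_\nu \nabla^\nu A^\mu - \nabla^\mu \nabla_\nu A^\nu - R^\mu_{\,\,\nu} A^\nu \\
&= \nabla_\nu \nabla^\nu A^\mu - R^\mu_{\,\,\nu} A^\nu = -j^\mu .
\end{aligned}
$$

In flat spacetime the same steps give $\partial_\nu F^{\nu\mu} = \partial_\nu\partial^\nu A^\mu - \partial^\mu\partial_\nu A^\nu$, and because partial derivatives commute, the order of the derivatives in the second term is irrelevant. Applying $\partial\to\nabla$ to the gauge-fixed wave equation therefore loses the Ricci term, while applying it to $\partial_\nu F^{\nu\mu} = -j^\mu$ keeps it. The naive rule gives different answers depending on which equivalent flat-space form you start from.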

You are right that it is not unique. The rule you mention is called minimal coupling. It is analogous to the electromagnetic case, where we replace $p_{\mu}$ by $p_{\mu} - eA_{\mu}$ in our first-order equations. It is the simplest approach one could take: you add a term describing the matter field (e.g. electromagnetism) to the action, and it then couples to gravity only through the metric (and the connection built from it). The metric appears in the volume element $\sqrt{-g}\,d^4x$ and in the index contractions.
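As a sketch of what minimal coupling means for the Maxwell field, assume a mostly-plus signature and sign conventions chosen so that varying with respect to $A_\mu$ gives $\nabla_\nu F^{\nu\mu} = -j^\mu$. The flat-space action is kept as is, with $\eta_{\mu\nu}\to g_{\mu\nu}$ and $d^4x \to \sqrt{-g}\,d^4x$:

$$
S = \int d^4x\,\sqrt{-g}\left(-\tfrac{1}{4}\,g^{\mu\alpha}g^{\nu\beta}F_{\mu\nu}F_{\alpha\beta} + A_\mu j^\mu\right),
\qquad F_{\mu\nu} = \partial_\mu A_\nu - \partial_\nu A_\mu = \nabla_\mu A_\nu - \nabla_\nu A_\mu
$$

The last equality holds for a torsion-free connection. No curvature tensor appears explicitly in $S$. The Ricci term in the wave equation arises only afterwards, when the covariant derivatives in the field equations are commuted.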

There are other options, for instance terms that contract the Ricci tensor with the field-strength tensor, but these are non-minimal. We make choices like these all the time, even in choosing the form of the connection in the covariant derivative. So, in the end, the answer is that the minimal approach agrees with experiment to current accuracy, so why complicate things?
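For illustration, non-minimal couplings of this kind could be added to the Maxwell action as extra terms. The coefficients $\xi_1, \xi_2, \xi_3$ below are hypothetical coupling constants, with dimensions of length squared in natural units, and are not values from any specific theory:

$$
S_{\text{non-min}} = \int d^4x\,\sqrt{-g}\,\Big(\xi_1\,R\,F_{\mu\nu}F^{\mu\nu} + \xi_2\,R^{\mu\nu}F_{\mu\alpha}F_{\nu}{}^{\alpha} + \xi_3\,R^{\mu\nu\alpha\beta}F_{\mu\nu}F_{\alpha\beta}\Big)
$$

Every term vanishes when the Riemann tensor does. In Minkowski spacetime, any choice of the $\xi_i$ therefore reduces to the same flat-space theory, so flat-space physics alone cannot fix them. Setting all $\xi_i = 0$ recovers minimal coupling.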