How do physicists know when it is appropriate to use $\mathrm dx$ as if it is a number?

You are a victim of a rather common way of teaching calculus, one that still treats differentials with some embarrassment, even though nowadays we have all the conceptual tools to handle them safely.

Forget about operators (which are not the natural way of looking at things in the present context) and infinitesimals (a word that means too many things, some of them obsolete and inconsistent).

Let me try to show a consistent way of thinking and using differentials.

I assume that we have the definition of the derivative of a real function of one real variable. Let's denote the derivative of a function $f$ at a point $x_0$ by $f^{\prime}(x_0)$. If the function is smooth enough (for the error estimate below it is sufficient that $f$ has a continuous second derivative), the first derivative at $x_0$ tells us how the function behaves locally in a neighborhood of $x_0$: $$ f(x) = f(x_0) + f^{\prime}(x_0) (x-x_0) + O((x-x_0)^2), $$ from which we can say that the best linear approximation to the variation of $f$ around $x_0$ is given by $f^{\prime}(x_0) (x-x_0)$.
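A quick numerical check of this expansion, as a minimal sketch: the choice $f=\sin$, $x_0=1$ is purely illustrative (it is not from the question), and `remainder` is a hypothetical helper. Dividing the error of the linear approximation by $(x-x_0)^2$ should give a number that settles near $f''(x_0)/2$, which is what the $O((x-x_0)^2)$ term says.

```python
import math

def remainder(f, fprime, x0, x):
    """Error of the linear approximation f(x0) + f'(x0) * (x - x0)."""
    return f(x) - (f(x0) + fprime(x0) * (x - x0))

# Illustrative choice: f = sin (so f' = cos) and x0 = 1.
x0 = 1.0
for h in [1e-1, 1e-2, 1e-3]:
    r = remainder(math.sin, math.cos, x0, x0 + h)
    # r / h**2 should approach f''(x0)/2 = -sin(1)/2, about -0.4207
    print(f"h={h:g}  remainder={r:.3e}  remainder/h^2={r / h**2:.4f}")
```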

We call this best linear approximation of the variation of $f$ around $x_0$ the differential of $f$ at $x_0$, and we denote it by $df_{x_0}$. Its explicit expression shows that it is a (in general non-linear) function of $x_0$ and a linear function of the increment $x-x_0$. This linear function is defined for every $x$, but only in a neighborhood of $x_0$ can we write $$ \Delta f = f(x)-f(x_0) \simeq df_{x_0}(x) = f^{\prime}(x_0) (x-x_0). $$

At this point we can observe that our definitions and notation allow us to write unambiguously $$ \Delta x = x-x_0 = dx_{x_0} = 1 \cdot (x-x_0) $$ and therefore we can write $$ \frac{\Delta f}{\Delta x} \simeq \frac{df_{x_0}}{d x_{x_0}} = f^{\prime}(x_0), $$ where the symbol $\simeq$ here means equality apart from corrections that vanish at least linearly as $(x-x_0)\rightarrow 0$.
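To make the point that differentials are ordinary functions concrete, here is a sketch assuming the illustrative choice $f=\exp$ and $x_0=0.5$. The hypothetical helper `differential` builds $df_{x_0}$ as a plain function of $x$; $dx_{x_0}$ is the same construction applied to the identity, whose derivative is 1. The ratio $df_{x_0}(x)/dx_{x_0}(x)$ equals $f'(x_0)$ (up to floating-point rounding) for every $x \neq x_0$, while $\Delta f/\Delta x$ only approaches it, with an error shrinking roughly in proportion to $x-x_0$.

```python
import math

def differential(fprime, x0):
    """Return df_{x0} as an ordinary function: x -> f'(x0) * (x - x0)."""
    return lambda x: fprime(x0) * (x - x0)

# Illustrative choice: f = exp (so f' = exp) and x0 = 0.5.
f, fprime, x0 = math.exp, math.exp, 0.5

df = differential(fprime, x0)          # df_{x0}
dx = differential(lambda t: 1.0, x0)   # dx_{x0}: the identity has derivative 1

for h in [1e-1, 1e-2, 1e-3]:
    x = x0 + h
    delta_f = f(x) - f(x0)
    delta_x = x - x0                   # same number as dx(x)
    # df(x)/dx(x) stays at f'(x0); delta_f/delta_x differs from it by O(h)
    print(f"h={h:g}  df/dx={df(x) / dx(x):.6f}  "
          f"Delta f/Delta x={delta_f / delta_x:.6f}")
```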

You see that this way of dealing with differentials is fully consistent: differentials are real-valued functions, and no strange infinitesimal quantity is around.

Generalization to differentials of functions of more than one variable, or to vector-valued functions, is possible and straightforward. For example, the line element of a curve, $ds$, that you wrote in your question can be written as $$ ds = \sqrt{dx^2+dy^2}=\sqrt{x^{\prime 2}+y^{\prime 2}}\, d\tau $$ if $\left(x(\tau),y(\tau)\right)$ is a parametrization of the curve. Each quantity written here with a $d$ is a differential and, at the end of the day, a real number.
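As an illustration, assuming a cycloid $x=\tau-\sin\tau$, $y=1-\cos\tau$ (chosen only because one arch has the known length 8), the sketch below adds up $ds=\sqrt{dx^2+dy^2}$ with $dx=x'(\tau)\,d\tau$ and $dy=y'(\tau)\,d\tau$ evaluated at a finite increment $d\tau$. The midpoint sum is a numerical stand-in for the integral, and `arc_length` is a hypothetical helper.

```python
import math

def arc_length(xp, yp, a, b, n=10_000):
    """Midpoint-rule sum of ds = sqrt(dx^2 + dy^2) over [a, b],
    where dx = x'(tau) dtau and dy = y'(tau) dtau are plain numbers."""
    dtau = (b - a) / n
    total = 0.0
    for k in range(n):
        tau = a + (k + 0.5) * dtau
        dx = xp(tau) * dtau   # differential of x at this tau, for increment dtau
        dy = yp(tau) * dtau   # differential of y at this tau, for increment dtau
        total += math.sqrt(dx**2 + dy**2)
    return total

# Cycloid x = tau - sin(tau), y = 1 - cos(tau): x' = 1 - cos(tau), y' = sin(tau).
length = arc_length(lambda t: 1 - math.cos(t), math.sin, 0.0, 2 * math.pi)
print(length)   # close to the exact arch length 8
```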

I agree that, even though everything above is nothing but mathematics, it is the physics community that seems more uneasy with manipulating differentials.

I would say that the "operation" view and "treating differentials as numbers" are really the same thing. You can view $\frac{\text dy}{\text dx}$ as the result of an "operation" denoted by $\frac{\text d}{\text dx}$ acting on a function $y(x)$, or you can think of $\frac{\text dy}{\text dx}$ as an actual ratio comparing how much the function changes ($\text dy$) when you change the input by some amount ($\text dx$). That is, the "operation" of applying $\frac{\text d}{\text dx}$ to a function $f$ means finding that ratio for every value of $x$.
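The "operation" view can be sketched as a higher-order function: the hypothetical `d_dx` below takes a function and returns a new one whose value at each $x$ is the ratio of a small output change to a small input change. It uses a central finite difference with step `h`, an assumption made for illustration, so it only approximates the exact derivative.

```python
import math

def d_dx(y, h=1e-6):
    """The 'operation' d/dx: returns a new function x -> dy/dx,
    approximated by (change in output) / (change in input)."""
    def ratio(x):
        dy = y(x + h) - y(x - h)   # how much the function changes
        dx = 2 * h                 # how much the input changes
        return dy / dx
    return ratio

dsin = d_dx(math.sin)              # apply the operation once...
for x in [0.0, 1.0, 2.0]:          # ...then read off the ratio at any x
    print(x, dsin(x), math.cos(x))
```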

So, in your final integral you are essentially adding up all contributions $\sqrt{1 +\left(\frac{\mathrm dy}{\mathrm dx}\right)^2} \mathrm dx$ where you can think of $\frac{\text dy}{\text dx}$ as a ratio of "numbers" and $\text dx$ as a "number".
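A sketch of that sum, assuming the illustrative curve $y=x^2$ on $[0,1]$: each piece contributes $\sqrt{1+(\mathrm dy/\mathrm dx)^2}\,\mathrm dx$, where $\mathrm dy/\mathrm dx$ is computed as an honest ratio of two small numbers. The helper `graph_length` is hypothetical, and the comparison value is the closed form $\frac{\sqrt5}{2}+\frac{\operatorname{arsinh}2}{4}$ of $\int_0^1\sqrt{1+4x^2}\,\mathrm dx$.

```python
import math

def graph_length(y, a, b, n=10_000):
    """Add up sqrt(1 + (dy/dx)^2) * dx over n pieces of [a, b],
    with dy/dx taken as a ratio of two small numbers."""
    dx = (b - a) / n
    total = 0.0
    for k in range(n):
        x = a + k * dx
        dy = y(x + dx) - y(x)              # a number, not a symbol
        total += math.sqrt(1 + (dy / dx) ** 2) * dx
    return total

# Illustrative curve y = x**2 on [0, 1].
approx = graph_length(lambda x: x**2, 0.0, 1.0)
exact = math.sqrt(5) / 2 + math.asinh(2) / 4   # closed form of the integral
print(approx, exact)
```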

> how do you justify when to switch between treating it as a number and an operation?

Even though these two views can be thought of as the same thing, which one you use is really more a matter of opinion. Go with whichever is easiest for you to think about. The math itself doesn't care how we think about it, as long as our thoughts remain mathematically valid.