Why can interaction with a macroscopic apparatus, such as a Stern-Gerlach machine, sometimes not cause a measurement?

It's a very good question, since indeed if the original Stern-Gerlach machine had a well-defined momentum, then you are right that there could be no coherence upon rejoining the beams! The rule of thumb for decoherence: a superposition is destroyed/decohered when information has leaked out. In this setting that would mean that if by measuring, say, the momentum of the Stern-Gerlach machine you could figure out whether the spin had curved upwards or downwards, then the quantum superposition between up and down would have been destroyed.

Let's be more exact, as it will then become clear why in practice we can preserve the quantum coherence in this kind of set-up.

Let us for simplicity suppose that the first Stern-Gerlach machine simply imparts a momentum $\pm k$ to the spin, with the sign depending on the spin's orientation. By momentum conservation, the Stern-Gerlach machine gets the opposite momentum, i.e. (using that $\hat x$ generates translation in momentum space) $$\left( |\uparrow \rangle + |\downarrow \rangle \right) \otimes |SG_1\rangle \to \left( e^{- i k \hat x} |\uparrow \rangle \otimes e^{ i k \hat x} |SG_1\rangle \right) + \left( e^{i k \hat x} |\downarrow \rangle \otimes e^{- i k \hat x} |SG_1\rangle \right) $$ Let us now attach the second (upside-down) Stern-Gerlach machine, with the final state $$\to \left( |\uparrow \rangle \otimes e^{ i k \hat x} |SG_1\rangle \otimes e^{-i k \hat x} |SG_2\rangle \right) + \left( |\downarrow \rangle \otimes e^{- i k \hat x} |SG_1\rangle \otimes e^{ i k \hat x} |SG_2\rangle \right) $$

For a clearer presentation, let me now drop the second SG machine (afterwards you can substitute it back in since nothing really changes). So we now ask the question: does the final state $\boxed{ \left( |\uparrow \rangle \otimes e^{ i k \hat x} |SG_1\rangle \right) + \left( |\downarrow \rangle \otimes e^{- i k \hat x} |SG_1\rangle \right) }$ still have quantum coherence between the up and down spins?

Let us decompose $$ e^{ -i k \hat x} |SG_1\rangle = \alpha \; e^{i k \hat x} |SG_1\rangle + |\beta \rangle $$ where by definition the two components on the right-hand side are orthogonal, i.e. $\alpha = \langle SG_1 | e^{ -2 i k \hat x} | SG_1 \rangle$. Then $|\alpha|$ measures how much of the quantum coherence we have preserved! Indeed, the final state can be rewritten as $$\boxed{ \alpha \left( |\uparrow \rangle +| \downarrow \rangle \right) \otimes e^{ i k \hat x} |SG_1\rangle + |\uparrow\rangle \otimes | \gamma \rangle + |\downarrow \rangle \otimes |\beta\rangle }$$ where $|\gamma\rangle = (1-\alpha) \, e^{ i k \hat x} |SG_1\rangle$, so that $\langle \gamma | \beta \rangle = 0$. In other words, tracing out the Stern-Gerlach machine, we get a density matrix for our spin system whose off-diagonal elements are proportional to $\alpha$: $\boxed{\hat \rho = |\alpha| \, \hat \rho_\textrm{coherent} + (1-|\alpha|) \, \hat \rho_\textrm{decohered}}$, where any phase of $\alpha$ is absorbed into the relative phase between up and down in $\hat \rho_\textrm{coherent}$.
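
To make the partial trace explicit, here is a short worked version. It assumes the spin state is normalized with a factor $1/\sqrt{2}$ (dropped in the expressions above), and it uses the shorthand $|E_\uparrow\rangle = e^{ i k \hat x} |SG_1\rangle$ and $|E_\downarrow\rangle = e^{- i k \hat x} |SG_1\rangle$, which is introduced here only for readability: $$ |\psi\rangle = \frac{1}{\sqrt 2} \left( |\uparrow\rangle \otimes |E_\uparrow\rangle + |\downarrow\rangle \otimes |E_\downarrow\rangle \right), \qquad \alpha = \langle E_\uparrow | E_\downarrow \rangle = \langle SG_1 | e^{-2 i k \hat x} | SG_1 \rangle $$ $$ \hat \rho = \textrm{Tr}_{SG} \, |\psi\rangle \langle \psi| = \frac{1}{2} \left( |\uparrow\rangle \langle \uparrow| + |\downarrow\rangle \langle \downarrow| + \alpha \, |\downarrow\rangle \langle \uparrow| + \alpha^* \, |\uparrow\rangle \langle \downarrow| \right) $$ The off-diagonal terms are what carry the coherence between up and down: they vanish when $\alpha = 0$ and are as large as they can be when $|\alpha| = 1$.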

So you see that in principle you are right: the quantum coherence is completely destroyed if the overlap between the SG machine states with different momenta is exactly zero, i.e. $\alpha = 0$. But that would only be the case if our SG machine had a perfectly well-defined momentum to begin with. Of course that is completely unphysical, since it would mean our Stern-Gerlach machine is smeared out over the universe. Analogously, if our SG machine had a perfectly well-defined position, then the momentum translation would be merely a phase factor and $|\alpha|=1$, so in this case there is zero information loss! This is equally unphysical, as it would mean our SG machine has completely random momentum to begin with. We can now begin to see why in practice there is no decoherence due to the momentum transfer: we can think of the momentum of the SG machine as being described by a Gaussian distribution around some mean value, and whilst it is true that the momentum transfer from the spin slightly shifts this mean value, there will still be a large overlap with the original distribution, and so $|\alpha| \approx 1$. So strictly speaking there is some decoherence, but it is negligible. (This is mostly due to the macroscopic nature of the SG machine. If it were much smaller, then the momentum of the spin would have a much greater relative effect.)
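
To put rough numbers on the Gaussian picture, here is a minimal Python sketch. It assumes a one-dimensional, minimum-uncertainty Gaussian wavepacket for the SG machine with position spread `sigma_x`, in units where $\hbar = 1$; the function names and parameters are hypothetical and chosen only for this illustration. Under those assumptions the overlap has the closed form $\alpha = e^{-2 k^2 \sigma_x^2} = e^{-k^2 / (2 \sigma_p^2)}$ with $\sigma_p = 1/(2\sigma_x)$, and the code checks this against a direct numerical integral.

```python
import numpy as np


def coherence_overlap(k, sigma_x, n=200001, span=12.0):
    """Numerically evaluate alpha = <SG| exp(-2 i k x) |SG>.

    Assumes |SG> is a Gaussian wavepacket centred at x = 0 with
    position spread sigma_x, in units where hbar = 1.
    """
    x = np.linspace(-span * sigma_x, span * sigma_x, n)
    dx = x[1] - x[0]
    # |psi(x)|^2 for the assumed Gaussian wavepacket
    density = np.exp(-x**2 / (2 * sigma_x**2)) / np.sqrt(2 * np.pi * sigma_x**2)
    # Simple Riemann sum; the integrand is negligible at the grid edges
    return np.sum(density * np.exp(-2j * k * x)) * dx


def analytic_overlap(k, sigma_x):
    """Closed form for the same Gaussian: exp(-2 k^2 sigma_x^2)."""
    return np.exp(-2 * k**2 * sigma_x**2)


if __name__ == "__main__":
    sigma_x = 1.0
    for k in (0.01, 0.1, 1.0, 3.0):
        numeric = coherence_overlap(k, sigma_x)
        exact = analytic_overlap(k, sigma_x)
        print(f"k = {k:5.2f}  |alpha| numeric = {abs(numeric):.6e}  analytic = {exact:.6e}")
```

In this toy model the overlap stays close to 1 as long as the kick $k$ is small compared with the machine's momentum spread $\sigma_p$ (for instance, $k \sigma_x = 0.1$ gives $|\alpha| \approx 0.98$), and it drops towards zero once $k$ is much larger than $\sigma_p$, matching the two limiting cases discussed above.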