Are all pseudoscalars secretly Goldstone bosons?

The two theories, namely the "gradient model" $\partial_\mu\pi \bar{\Psi}\gamma^5\gamma^\mu\Psi$ and the Yukawa model $g\pi\bar{\Psi}\gamma^5\Psi$ (both with a massive $\Psi$), are definitely not equivalent. They have different symmetries, spectra and scattering amplitudes, hence they are physically distinct theories. The main mistake that you (the OP) are making is to use the free equations of motion for the fermions: that is fine for the external legs, but not for the virtual ones that enter e.g. in the one-loop calculation of the $\pi$ mass or, as I'll show below, in a scattering amplitude with an intermediate virtual $\Psi$ exchanged. (The mistake that Cosmas Zachos made in his earlier answer, and in part still makes in the marginally improved one, is explained in my comments to his answer; I will not repeat it here.)

The gradient model is indeed invariant under $\pi\rightarrow \pi+\text{const}$, which clearly forbids a mass term for $\pi$. This isn't the case for the Yukawa model, where a bare mass is needed to remove the quadratically divergent mass generated by the fermion loops. The physical pole mass is therefore generically non-zero, barring fine-tuning.

More importantly, Goldstone bosons (GBs) aren't just massless particles: they have various special features. For example, scattering amplitudes vanish when a GB becomes soft, that is, in the limit of vanishing $\pi$ momentum (the so-called Adler zero condition). This is realized in the gradient theory but not in the Yukawa theory. Let's see this in more detail by looking at a physical scattering amplitude, $\pi\Psi\rightarrow \pi\Psi$. For the Yukawa theory one has $$ M^{Yukawa}_{\pi\Psi\rightarrow\pi\Psi}=-g^2 \left[\bar{u}(p_2^\prime)\frac{i(\gamma_\alpha p_1^\alpha+\gamma_\alpha p_2^\alpha-m)}{s-m^2+i\epsilon}u(p_2)+\mbox{crossed diag.}\right] $$ for $\pi(p_1)\Psi(p_2)\rightarrow\pi(p_1^\prime)\Psi(p_2^\prime)$. The two $\gamma^5$ from the vertices have been moved through the numerator of the fermion propagator and multiplied together, i.e. $\gamma^5i(\gamma_\alpha p_1^\alpha+\gamma_\alpha p_2^\alpha+m)\gamma^5=-i(\gamma_\alpha p_1^\alpha+\gamma_\alpha p_2^\alpha-m)$. We could have simplified the numerator further using $\gamma_\alpha p_2^\alpha u(p_2)=m u(p_2)$, where $m$ is the fermion mass, but this form is more convenient for what follows. The s-channel contribution is displayed explicitly; the crossed diagram is not.

(Disclaimer: I am doing this calculation by hand on an iPad; I hope it is not grossly incorrect :-), although factors of 2 and minus signs are most likely off.)

This $M^{Yukawa}_{\pi\Psi\rightarrow\pi\Psi}$ doesn't vanish for $p_1\rightarrow 0$ because, even though the numerator goes to zero (namely $(\gamma_\alpha p_1^\alpha+\gamma_\alpha p_2^\alpha-m)u(p_2)=\gamma_\alpha p_1^\alpha u(p_2)\rightarrow 0$), so does the denominator, and at the same rate ($s-m^2=2p_{1\alpha} p_2^\alpha\rightarrow0$). Here I am assuming that we have tuned the spectrum to be the same, that is, the $\pi$ mass in the Yukawa model has been tuned to zero by hand; otherwise the numerator wouldn't even vanish and the comparison between the two models would make no sense.
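
To make the counting of powers of $p_1$ explicit, here is the kinematics behind the statement above, written out as a short sketch. It assumes a massless on-shell $\pi$ ($p_1^2=0$) and an on-shell fermion ($p_2^2=m^2$, $(\gamma_\alpha p_2^\alpha-m)u(p_2)=0$); the shorthand $k\equiv p_1+p_2$ is introduced here only for convenience. $$ s-m^2=(p_1+p_2)^2-m^2=p_1^2+2p_{1\alpha}p_2^\alpha+p_2^2-m^2=2p_{1\alpha}p_2^\alpha\,,\qquad (\gamma_\alpha k^\alpha-m)u(p_2)=\gamma_\alpha p_1^\alpha u(p_2)+(\gamma_\alpha p_2^\alpha-m)u(p_2)=\gamma_\alpha p_1^\alpha u(p_2)\,. $$ Both expressions are linear in $p_1$, so their ratio stays finite as $p_1\rightarrow 0$.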

On the other hand, for the gradient theory we get $$ M^{gradient}_{\pi\Psi\rightarrow\pi\Psi}=\frac{1}{f^2}\left[\bar{u}(p_2^\prime)\frac{i(\gamma_\alpha p_1^\alpha+\gamma_\alpha p_2^\alpha-m)^3}{s-m^2+i\epsilon}u(p_2)+\mbox{crossed diag.}\right]\rightarrow 0 $$ which is not only different (hence the two theories are physically distinct, period) but also vanishes in the GB soft limit $p_1\rightarrow 0$, since the numerator can be written as $i\bar{u}(p_2^\prime)(\gamma_\alpha p_1^\alpha+\gamma_\alpha p_2^\alpha-m)^3u(p_2)=i\bar{u}(p_2^\prime)\gamma_\alpha p_1^{\prime\alpha}(\gamma_\beta p_1^\beta+\gamma_\beta p_2^\beta-m)(\gamma_\gamma p_1^\gamma)u(p_2)$, and by momentum conservation $p_1^\prime\rightarrow 0$ too.
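
The cubic numerator may look odd at first sight, so here is a sketch of where it comes from. It assumes that each gradient vertex contributes one factor of the corresponding $\pi$ momentum contracted with a $\gamma$ matrix (overall signs and factors of $i$ are not tracked), and it uses the shorthand $k\equiv p_1+p_2=p_1^\prime+p_2^\prime$, which is introduced here for convenience. The Dirac equation for the external spinors, $(\gamma_\alpha p_2^\alpha-m)u(p_2)=0$ and $\bar{u}(p_2^\prime)(\gamma_\alpha p_2^{\prime\alpha}-m)=0$, lets one trade each vertex momentum for the off-shell combination: $$ \gamma_\alpha p_1^\alpha u(p_2)=(\gamma_\alpha k^\alpha-m)u(p_2)\,,\qquad \bar{u}(p_2^\prime)\gamma_\alpha p_1^{\prime\alpha}=\bar{u}(p_2^\prime)(\gamma_\alpha k^\alpha-m)\,. $$ Together with the factor $(\gamma_\alpha k^\alpha-m)$ left over from the propagator once the two $\gamma^5$ are moved through it, this gives three powers in total, two of which are explicitly proportional to a $\pi$ momentum.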

The takeaway message is: the two models are distinct, both physically and mathematically. The gradient theory describes a GB, whereas the Yukawa theory describes a scalar whose mass has been tuned to zero.

Extra edits: I have finally found some time to add a last remark that I mentioned in the comments but that is actually worth reporting in the full answer. It is also related to the answer by @Cosmas Zachos.

Having established that the two theories are different, one may wonder how different they are and what the relation between them is, given that the simple use of the equations of motion by the OP was flawed. The answer is very simple: the two theories differ at the non-linear level in $\pi$, starting from quadratic order. In particular, the claim is that the theory given by $$ \mathcal{L}_{new}=\bar{\Psi}(i\gamma^\alpha \partial_\alpha-m e^{-2i\gamma^5\pi/f})\Psi+\frac{1}{2}(\partial_\mu\pi)^2\qquad f\equiv 2m/g\,, $$ which differs from $\mathcal{L}_{Yukawa}=\bar{\Psi}(i\gamma^\alpha \partial_\alpha-m)\Psi+ig\pi \bar{\Psi}\gamma^5\Psi+\frac{1}{2}(\partial_\mu\pi)^2$ starting from $O(\pi^2)$, is in fact equivalent to the gradient theory $$ \mathcal{L}_{gradient}=\frac{1}{2}(\partial_\mu\pi)^2+\bar{\Psi}(i\gamma^\alpha \partial_\alpha-m)\Psi+\frac{1}{f}\partial_\mu\pi \bar{\Psi}\gamma^5\gamma^\mu\Psi\,. $$ Indeed, it's enough to perform the field redefinition $\Psi\rightarrow e^{i\gamma^5 \pi/f}\Psi$ to move the $\pi$ from the non-derivative term to a gradient term generated by the $\Psi$ kinetic term.
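
For readers who want to see the redefinition at work, here is a sketch of the steps. It assumes the standard properties $(\gamma^5)^2=1$, $\{\gamma^5,\gamma^\alpha\}=0$ and $(\gamma^5)^\dagger=\gamma^5$, so that $\bar{\Psi}\rightarrow\bar{\Psi}e^{i\gamma^5\pi/f}$ when $\Psi\rightarrow e^{i\gamma^5\pi/f}\Psi$. The mass term loses its $\pi$ dependence, $$ m\,\bar{\Psi}e^{i\gamma^5\pi/f}\,e^{-2i\gamma^5\pi/f}\,e^{i\gamma^5\pi/f}\Psi=m\,\bar{\Psi}\Psi\,, $$ while the kinetic term, using $e^{i\gamma^5\pi/f}\gamma^\alpha=\gamma^\alpha e^{-i\gamma^5\pi/f}$, picks up the gradient coupling, $$ \bar{\Psi}e^{i\gamma^5\pi/f}\,i\gamma^\alpha\partial_\alpha\!\left(e^{i\gamma^5\pi/f}\Psi\right)=\bar{\Psi}i\gamma^\alpha\partial_\alpha\Psi-\frac{1}{f}\partial_\alpha\pi\,\bar{\Psi}\gamma^\alpha\gamma^5\Psi=\bar{\Psi}i\gamma^\alpha\partial_\alpha\Psi+\frac{1}{f}\partial_\alpha\pi\,\bar{\Psi}\gamma^5\gamma^\alpha\Psi\,. $$ Expanding the exponential in $\mathcal{L}_{new}$ instead gives $$ -m\,e^{-2i\gamma^5\pi/f}=-m+\frac{2im}{f}\gamma^5\pi+\frac{2m}{f^2}\pi^2+O(\pi^3)\,, $$ whose linear term reproduces $ig\pi\bar{\Psi}\gamma^5\Psi$ for $g=2m/f$, and whose quadratic term is the first place where $\mathcal{L}_{new}$ and $\mathcal{L}_{Yukawa}$ differ.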

As a final check, let's look at the behavior in the soft limit $p_1\rightarrow 0$. The contribution from two insertions of the linear-$\pi$ vertex is the same as in the Yukawa theory, but now there is also a contact term coming from expanding the exponential, $\frac{2m}{f^2}\pi^2\bar{\Psi}\Psi$, i.e. $$ M^{new}_{\pi\Psi\rightarrow\pi\Psi}=M^{Yukawa}_{\pi\Psi\rightarrow\pi\Psi}+i\frac{4m}{f^2}\bar{u}(p_2^\prime)u(p_2)\,. $$ Now, in the soft limit $p_1\rightarrow 0$, we have $p_2^\prime \rightarrow p_2$, hence the Yukawa terms give $$ M^{Yukawa}_{p_1\rightarrow 0}=-g^2 \left[\bar{u}(p_2)\frac{i\gamma_\alpha p_1^\alpha}{2 p_1\cdot p_2}u(p_2)+\mbox{crossed diag.}\right]=-2ig^2 $$ where I have used $\bar{u}(p)\gamma^\alpha u(p)=2p^\alpha$. On the other hand, the new contact term from $\mathcal{L}_{new}$ gives
$$ i\frac{4m}{f^2}\bar{u}(p_2)u(p_2)=i\frac{8m^2}{f^2}=2i g^2\,, $$ from $\bar{u}(p)u(p)=2m$ (no sum over the two spin orientations; we are considering definite polarizations). Summing the two contributions, we see that they cancel each other, in agreement with the Adler zero condition.
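
The two spinor identities used in this check, and the resulting cancellation, can be verified numerically. What follows is a minimal sketch in Python with NumPy, not part of the original calculation. It assumes the Dirac representation of the $\gamma$ matrices, a mostly-minus metric and the normalization $\bar{u}u=2m$; the helper names (`slash`, `u_spinor`, `bar`, `dot`) and the numerical values of `m`, `g` and the momenta are made up for illustration. The crossed diagram is taken to contribute the same amount as the s-channel one in the soft limit, as implied by the result $-2ig^2$ above.

```python
import numpy as np

# Pauli matrices and Dirac-representation gamma matrices (an assumed convention).
I2 = np.eye(2, dtype=complex)
Z2 = np.zeros((2, 2), dtype=complex)
SIGMA = [
    np.array([[0, 1], [1, 0]], dtype=complex),
    np.array([[0, -1j], [1j, 0]], dtype=complex),
    np.array([[1, 0], [0, -1]], dtype=complex),
]
GAMMA = [np.block([[I2, Z2], [Z2, -I2]])] + [
    np.block([[Z2, s], [-s, Z2]]) for s in SIGMA
]
METRIC = np.diag([1.0, -1.0, -1.0, -1.0])  # mostly-minus signature


def slash(p):
    """Return gamma_alpha p^alpha for a contravariant four-vector p."""
    p_lower = METRIC @ p
    return sum(GAMMA[mu] * p_lower[mu] for mu in range(4))


def u_spinor(p3, m, chi):
    """Positive-energy spinor normalized so that ubar u = 2 m."""
    energy = np.sqrt(m**2 + p3 @ p3)
    sigma_dot_p = sum(SIGMA[i] * p3[i] for i in range(3))
    lower = sigma_dot_p @ chi / (energy + m)
    return np.sqrt(energy + m) * np.concatenate([chi, lower])


def bar(u):
    """Dirac adjoint, u^dagger gamma^0."""
    return u.conj() @ GAMMA[0]


def dot(p, q):
    """Minkowski product of two contravariant four-vectors."""
    return p @ METRIC @ q


# Arbitrary illustrative numbers (not taken from the answer).
m, g = 1.3, 0.7
f = 2 * m / g
p3 = np.array([0.4, -0.7, 0.2])
p2 = np.array([np.sqrt(m**2 + p3 @ p3), *p3])
u = u_spinor(p3, m, np.array([1, 0], dtype=complex))

# A soft, lightlike pion momentum.
p1 = 1e-3 * np.array([1.0, 0.0, 0.0, 1.0])

# s-channel Yukawa term in the soft limit; crossed diagram assumed equal.
s_channel = -g**2 * 1j * (bar(u) @ slash(p1) @ u) / (2 * dot(p1, p2))
yukawa_soft = 2 * s_channel
# Contact term from the O(pi^2) piece of the exponential.
contact = 1j * 4 * m / f**2 * (bar(u) @ u)

print("ubar u           =", bar(u) @ u)
print("Yukawa soft term =", yukawa_soft)
print("contact term     =", contact)
print("sum              =", yukawa_soft + contact)
```

The sum should come out as zero up to floating-point rounding, independently of the chosen spin state `chi` and fermion momentum `p3`.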

But again, the equivalence with the gradient theory is achieved only after modifying the theory at the $O(\pi^2)$ level in the way shown above, which amounts to making the $\pi$ a GB. The Yukawa coupling alone, instead, describes a non-GB particle.


You are really asking a question about the controlling $U(1)_A$ symmetry of the one-flavor σ-model. You thus need to first display the symmetry you are really probing.

Let's start from the linear σ-model. Schematically (being cavalier with overall factors...), $$ {\cal L}_{lin}= i\bar{\Psi}\partial \!\! / ~\Psi +\tfrac{1}{2} \partial \pi \cdot \partial \pi +\tfrac{1}{2} \partial \sigma \cdot \partial \sigma + g \bar\Psi (\sigma +i\gamma_5 \pi) \Psi -V(\pi,\sigma), $$ invariant under the $U(1)_A$ symmetry, $$ \delta \Psi= \frac{i}{2}\theta \gamma_5 \Psi, \qquad \delta \sigma= \theta \pi, \qquad \delta \pi= -\theta \sigma, $$ which you must check leaves the kinetic and Yukawa terms invariant, and which you must require to leave the (stable) potential invariant as well; you may choose the latter to be Goldstone's sombrero, etc., but I assume you are familiar with the SSB monkey business and know how to carry out the requisite field shifts.
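
The check requested above is short for the Yukawa term. As a sketch, to first order in a constant $\theta$, and assuming $\delta\bar\Psi=\frac{i}{2}\theta\,\bar\Psi\gamma_5$ (which follows from $\delta\Psi$ for a Hermitian $\gamma_5$ that anticommutes with $\gamma^0$), the two fermion bilinears rotate into each other, $$ \delta(\bar\Psi\Psi)=i\theta\,\bar\Psi\gamma_5\Psi, \qquad \delta(\bar\Psi i\gamma_5\Psi)=-\theta\,\bar\Psi\Psi, $$ so that $$ \delta\left[g\bar\Psi(\sigma+i\gamma_5\pi)\Psi\right]=g\left[(\delta\sigma-\theta\pi)\,\bar\Psi\Psi+(\delta\pi+\theta\sigma)\,\bar\Psi i\gamma_5\Psi\right]=0 $$ precisely for $\delta\sigma=\theta\pi$ and $\delta\pi=-\theta\sigma$. The fermion kinetic term is invariant on its own because $\gamma_5\gamma^\mu+\gamma^\mu\gamma_5=0$, and the scalar kinetic terms because the transformation is a rotation in the $(\sigma,\pi)$ plane.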

The corresponding (on-shell) conserved current is $$ J_A^{\mu}=-\tfrac{i}{2} \bar\Psi \gamma_5 \gamma^{\mu} \Psi + \pi \partial^\mu \sigma - \sigma \partial^\mu \pi, $$ so $\partial\cdot J_A=0$.

Now, assuming the potential minimum enforces $\langle \pi\rangle =0$, $\langle \sigma\rangle=-f$, and redefining $\sigma\equiv \sigma ' -f$ such that $\langle \sigma' \rangle=0$, observe that this gives the fermion a mass $m=gf$, and the σ' a mass that depends on the arbitrary curvature of the potential $V$ at its minimum. You may take this curvature to be large, so the scalar σ' can be made arbitrarily massive and thus decouples from the low-energy model.

The result is the associated low-energy σ-model, virtually trivial, involving, crucially, a massless Goldstone boson π, $$ {\cal L}_{low}= i\bar{\Psi}\partial \!\! / ~\Psi - gf \bar\Psi \Psi +\tfrac{1}{2} \partial \pi \cdot \partial \pi + g \bar\Psi i\gamma_5 \pi \Psi +\sigma' ~ \mathrm{terms}, $$ invariant under $$\delta \Psi= \frac{i}{2}\theta \gamma_5 \Psi, \qquad \delta \pi= \theta f ~~~(-{\small \theta \sigma'}), \qquad ({\small \delta \sigma'= \theta \pi}), $$ so that $$ J_A^\mu=-\tfrac{i}{2} \bar\Psi \gamma_5 \gamma^{\mu} \Psi + f\partial^\mu \pi ~~~(+{\small \pi\partial^\mu \sigma' - \sigma'\partial^\mu \pi}). $$ This hallmark shift in the transformation law for the π identifies it as a Goldstone boson.

(Note added to address concerns of @TwoBs: in fact, a residual term in the variation here is cancelled by the $-\theta \sigma'$ originally left out of $\delta \pi$, together with the omitted $g\bar{\Psi} \sigma' \Psi$ in the Lagrangian. I am omitting the ultra-heavy σ's here, which are still necessary for the full axial invariance, albeit out of sight here. Nevertheless, the amplitudes of the linear sigma model do, of course, have Adler zeros, as is well known; however, the ππσ' vertex in the potential is involved.)

This symmetry enters into and constrains perturbation theory and "protects" the tree-level masslessness of the Goldstone mode π, since an induced mass for it would break the symmetry. (In fact, this is far more transparent here than in your derivative-coupling model, which is naturally deprecated and basically not relevant to the question. See van der Bij & Veltman 1984 for the pitfalls of the infinite-Higgs-mass limit.)

You may then see that the comparison with the unrenormalizable gradient model is completely gratuitous, and avoidable. The pseudoscalar Yukawa term by itself has all the crucial ingredients of a spontaneously broken symmetry in it, and is instantly made axially invariant with an innocuous σ scalar Yukawa term and a potential of your design, renormalizable if you wish, chosen to minimize most visible effects of the σ. The linear σ model (whose non-abelian generalization is the Higgs sector of the standard model) is the prototype that all future variant representations and tweaks are ineluctably based on.

Takeaway: A massless pseudoscalar coupling to a fermion in a parity-preserving Yukawa mode is perforce (or can be naturally promoted to) a Goldstone boson of an axial SSB and will remain such in perturbation theory, unless extraneous couplings explicitly spoil this custodial axial symmetry.

Added Ref: The historic 1960 σ-model paper reviewed in most good texts is as good as any to reset one's compass.