Obtain the conditional distribution of $X$ given $X^2=t$

Disclaimer: I'm not an expert, so please correct me if there are serious flaws in my argument.

Hand-waving/guessing

First observe that if $X^2=t$ then $X$ is either $\sqrt{t}$ or $-\sqrt{t}$, so this is a discrete distribution, and to specify the distribution you only need to compute $P(X=\sqrt{t} \mid X^2=t)$ and $P(X=-\sqrt{t} \mid X^2=t)$.

In analogy with the discrete case, you might rewrite $\{X^2 = t\} = \{X=\sqrt{t}\} \cup \{X = -\sqrt{t}\}$ and guess $$P(X=\sqrt{t} \mid X^2=t) = \frac{f_X(\sqrt{t})}{f_X(\sqrt{t}) + f_X(-\sqrt{t})}$$ $$P(X=-\sqrt{t} \mid X^2=t) = \frac{f_X(-\sqrt{t})}{f_X(\sqrt{t}) + f_X(-\sqrt{t})}$$

This would certainly be a probability distribution, since the probabilities sum to one. However, it is not clear why we can add and divide the densities in this way, since density values are not really probabilities. The discussion below gives a little more background on why this ends up being valid. In my experience, undergraduate courses are content to sweep this under the rug, since the discussion involves some rather technical measure-theoretic concepts.
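To make the guess concrete, here is a minimal Python sketch that evaluates the two guessed probabilities for a given density. The names `normal_pdf` and `guess_conditional`, the choice of an $N(\theta, 1)$ density, and the values $\theta = 0.7$, $t = 2$ are assumptions made purely for illustration.

```python
import math

def normal_pdf(x, theta=0.0):
    """Density of N(theta, 1). This choice of f_X is an illustrative assumption."""
    return math.exp(-0.5 * (x - theta) ** 2) / math.sqrt(2.0 * math.pi)

def guess_conditional(t, pdf):
    """Return the guessed (P(X = +sqrt(t) | X^2 = t), P(X = -sqrt(t) | X^2 = t))."""
    if t <= 0:
        raise ValueError("t must be positive")
    r = math.sqrt(t)
    plus, minus = pdf(r), pdf(-r)
    total = plus + minus  # normalizing constant f_X(sqrt t) + f_X(-sqrt t)
    return plus / total, minus / total

# Hypothetical example values: theta = 0.7, t = 2.
p_plus, p_minus = guess_conditional(2.0, lambda x: normal_pdf(x, theta=0.7))
print(p_plus, p_minus, p_plus + p_minus)
```

Any other density could be passed as `pdf`, as long as it is positive at one of $\pm\sqrt{t}$.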

Before I turn to the more rigorous discussion, I want to comment that the denominator in your attempt is the density $f_{X^2}$, which you correctly noted to be $f_{X^2}(t) = \frac{1}{2\sqrt{t}}(f_X(\sqrt{t}) + f_X(-\sqrt{t}))$. However, you used $f_X(\sqrt{t})$ as the numerator. Due to the change of variables, the $\frac{1}{2\sqrt{t}}$ scaling factor prevents your attempt from being a probability distribution (in that $\frac{f_X(\sqrt{t})}{f_{X^2}(t)} + \frac{f_X(-\sqrt{t})}{f_{X^2}(t)} \ne 1$ in general). This is one reason why it can be dangerous to manipulate densities as you would with discrete probabilities, even if sometimes you do get the correct answer.
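To spell out why the attempted normalization fails, the two ratios can be added directly using the formula for $f_{X^2}$ quoted in the previous paragraph (assuming $t > 0$ and $f_X(\sqrt{t}) + f_X(-\sqrt{t}) > 0$):

$$\frac{f_X(\sqrt{t})}{f_{X^2}(t)} + \frac{f_X(-\sqrt{t})}{f_{X^2}(t)} = \frac{f_X(\sqrt{t}) + f_X(-\sqrt{t})}{\frac{1}{2\sqrt{t}}\left(f_X(\sqrt{t}) + f_X(-\sqrt{t})\right)} = 2\sqrt{t}.$$

So the sum equals $1$ only at $t = \frac{1}{4}$, and the leftover factor $2\sqrt{t}$ is exactly the Jacobian that should have appeared in both numerators.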

Verifying guess with slightly more technical stuff

Conditioning on events of measure zero is a bit of a thorny discussion. One framework for handling this situation is the notion of regular conditional probabilities. The definition translates to the following. The map $(t, A) \mapsto \nu(t, A)$ is the regular conditional probability of $X$ given $X^2$ if

  1. $\nu(t, \cdot)$ is a probability measure for each $t$
  2. $\nu(\cdot, A)$ is measurable for each measurable set $A$
  3. $P(\{X \in A\} \cap \{X^2 \in B\}) = \int_B \nu(t, A) f_{X^2}(t) \, dt, \qquad \text{for all measurable sets $A,B$}.$

(Note that the right-hand side of the third condition can be written as $E[\nu(X^2, A) \mathbf{1}_{X^2 \in B}]$.)

Then $\nu(t, A)$ is what we would usually denote by $P(X \in A \mid X^2 = t)$.

Let's try the guess we made above. That is, let's try $$\nu(t, A) = \frac{f_X(\sqrt{t})}{f_X(\sqrt{t}) + f_X(-\sqrt{t})} \mathbf{1}_A(\sqrt{t}) + \frac{f_X(-\sqrt{t})}{f_X(\sqrt{t}) + f_X(-\sqrt{t})}\mathbf{1}_A(-\sqrt{t}).$$ This satisfies the first two conditions. It remains to check the third condition.

Notice that for $t > 0$, we have $f_{X^2}(t) = \frac{d}{dt} P(X^2 \le t) = \frac{d}{dt} \int_{-\sqrt{t}}^{\sqrt{t}} f_X(s) \, ds = \frac{1}{2\sqrt{t}} (f_X(\sqrt{t}) + f_X(-\sqrt{t}))$. (This is in the denominator of your attempt.) Let $B_+ := B \cap [0, \infty)$. Then, \begin{align} &\int_{B} \nu(t, A) f_{X^2}(t) \, dt \\ &= \int_{B_+} \frac{1}{2\sqrt{t}} f_X(\sqrt{t}) \mathbf{1}_A(\sqrt{t}) \, dt + \int_{B_+} \frac{1}{2\sqrt{t}} f_X(-\sqrt{t}) \mathbf{1}_A(-\sqrt{t}) \, dt \\ &= \int_0^\infty f_X(u) \mathbf{1}_A(u) \mathbf{1}_{B_+}(u^2) \, du + \int_0^\infty f_X(-u) \mathbf{1}_A(-u) \mathbf{1}_{B_+}((-u)^2) \, du \\ &= \int_{-\infty}^\infty f_X(v) \mathbf{1}_A(v) \mathbf{1}_{B_+}(v^2) \, dv \\ &= P(\{X \in A\} \cap \{X^2 \in B\}). \end{align} The first equality holds because $f_{X^2}$ vanishes on $(-\infty, 0)$ and the sum $f_X(\sqrt{t}) + f_X(-\sqrt{t})$ cancels against the denominator of $\nu$. The second uses the substitution $u = \sqrt{t}$, so that $dt = 2u \, du$. The last uses $\mathbf{1}_{B_+}(v^2) = \mathbf{1}_B(v^2)$, since $v^2 \ge 0$.
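As a sanity check on the third condition, the following sketch compares both sides numerically for one particular choice: $X \sim N(\theta, 1)$, $A = (0, \infty)$ and $B = [a, b]$ with $0 < a < b$. These choices, the function names, and the midpoint-rule integration are all assumptions for illustration; the proof above is what actually establishes the identity.

```python
import math

def normal_pdf(x, theta):
    """Density of N(theta, 1) (illustrative choice of f_X)."""
    return math.exp(-0.5 * (x - theta) ** 2) / math.sqrt(2.0 * math.pi)

def normal_cdf(x, theta):
    """CDF of N(theta, 1) via the error function."""
    return 0.5 * (1.0 + math.erf((x - theta) / math.sqrt(2.0)))

def rhs_integral(a, b, theta, n=20_000):
    """Midpoint rule for the integral over [a, b] of nu(t, A) f_{X^2}(t) dt, A = (0, inf)."""
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        t = a + (i + 0.5) * h
        r = math.sqrt(t)
        f_plus, f_minus = normal_pdf(r, theta), normal_pdf(-r, theta)
        nu = f_plus / (f_plus + f_minus)        # nu(t, (0, inf)): only +sqrt(t) lies in A
        f_x2 = (f_plus + f_minus) / (2.0 * r)   # density of X^2 at t
        total += nu * f_x2 * h
    return total

def lhs_probability(a, b, theta):
    """P(X > 0, a <= X^2 <= b) = P(sqrt(a) <= X <= sqrt(b))."""
    return normal_cdf(math.sqrt(b), theta) - normal_cdf(math.sqrt(a), theta)

# Hypothetical example values.
theta, a, b = 0.7, 0.5, 3.0
lhs = lhs_probability(a, b, theta)
rhs = rhs_integral(a, b, theta)
print(lhs, rhs)
```

The two numbers should agree up to the quadrature error of the midpoint rule.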

Alternate definition involving a limit of events with vanishing probability

Teresa Lisbon and Matthew Pilling consider an alternate way to define conditioning on a zero-probability event by approximating it with events with probability approaching zero. This is discussed on the same Wikipedia page. In our situation, it would be

$$P(X \in A \mid X^2 = t ) = \lim_{\epsilon \to 0} \frac{P(\{X \in A\} \cap \{t-\epsilon < X^2 < t + \epsilon\})}{P(t-\epsilon < X^2 < t + \epsilon)}.$$

I have not checked whether this alternate definition yields the same answer as above, but I want to note that this is not always considered a "good" way to define conditional probability. Indeed, there is a warning on that Wikipedia section with a link to a discussion that contains a simple example where this definition leads to an irregular conditional probability distribution.

Consider $U \sim \text{Uniform}(0,1)$. For any $\delta > 0$, we have \begin{align} &P(0.5-\delta < U < 0.5+\delta \mid U=0.5) \\ &= \lim_{\epsilon \to 0} \frac{P(\{0.5-\delta < U < 0.5+\delta\} \cap \{0.5-\epsilon < U < 0.5+\epsilon\})}{P(0.5-\epsilon < U < 0.5+\epsilon)} \\ &= 1\end{align} but $$P(U=0.5 \mid U=0.5) = \lim_{\epsilon \to 0} \frac{P(\{U=0.5\} \cap \{0.5-\epsilon < U < 0.5+\epsilon\})}{P(0.5-\epsilon < U < 0.5+\epsilon)} = 0.$$
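Returning to the limit formula for $P(X \in A \mid X^2 = t)$ stated earlier in this section: a quick numerical experiment can at least suggest what it gives in one case. The sketch below assumes $X \sim N(\theta, 1)$ and $A = (0, \infty)$ (both illustrative choices, as are the function names) and evaluates the ratio for shrinking $\epsilon$. It is evidence for this one example, not a proof that the two definitions agree in general.

```python
import math

def normal_cdf(x, theta):
    """CDF of N(theta, 1) (illustrative choice of distribution)."""
    return 0.5 * (1.0 + math.erf((x - theta) / math.sqrt(2.0)))

def window_ratio(t, eps, theta):
    """P(X > 0, t-eps < X^2 < t+eps) / P(t-eps < X^2 < t+eps), assuming 0 < eps < t."""
    lo, hi = math.sqrt(t - eps), math.sqrt(t + eps)
    pos = normal_cdf(hi, theta) - normal_cdf(lo, theta)    # X in (lo, hi)
    neg = normal_cdf(-lo, theta) - normal_cdf(-hi, theta)  # X in (-hi, -lo)
    return pos / (pos + neg)

# Hypothetical example values.
theta, t = 0.7, 2.0
# Value of f_X(sqrt t) / (f_X(sqrt t) + f_X(-sqrt t)) for the N(theta, 1) density.
guess = 1.0 / (1.0 + math.exp(-2.0 * theta * math.sqrt(t)))
ratios = {eps: window_ratio(t, eps, theta) for eps in (1e-1, 1e-2, 1e-3, 1e-4)}
for eps, value in ratios.items():
    print(f"eps={eps:g}  ratio={value:.8f}  guess={guess:.8f}")
```

In this example the ratios approach the value from the regular conditional probability as $\epsilon$ shrinks.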

Appendix

My understanding of conditional probabilities may be shaky so please correct any misunderstandings in my answer. I found the Wikipedia pages useful (linked above) as well as this answer by Stefan Hansen.


I'm wondering if this conditional distribution is even well-defined. Let $\epsilon_1,\epsilon_2$ be very small positive numbers and $t>0$ be arbitrary. Define $I_1=(-\sqrt{t}-\epsilon_1,-\sqrt{t}+\epsilon_1)$ and $I_2=(\sqrt{t}-\epsilon_2,\sqrt{t}+\epsilon_2)$. Assume $\epsilon_1, \epsilon_2$ are small enough to make $I_1,I_2$ disjoint. Notice how $$P\Big(X\in I_1 \Big|X\in I_1 \cup I_2\Big)=\frac{\int_{-\sqrt{t}-\epsilon_1}^{-\sqrt{t}+\epsilon_1}f_{X}(x)dx}{\int_{-\sqrt{t}-\epsilon_1}^{-\sqrt{t}+\epsilon_1}f_{X}(x)dx+\int_{\sqrt{t}-\epsilon_2}^{\sqrt{t}+\epsilon_2}f_{X}(x)dx}\approx \frac{\epsilon_1 f_{X}\big(-\sqrt{t}\big)}{\epsilon_1 f_{X}\big(-\sqrt{t}\big)+\epsilon_2 f_{X}\big(\sqrt{t}\big)}$$ One would expect the above expression to approach $P\Big(X=-\sqrt{t}\Big|X^2=t\Big)$ as $(\epsilon_1,\epsilon_2)\rightarrow 0$, but such a limit doesn't exist: the value depends on the ratio $\epsilon_1/\epsilon_2$ along which the pair tends to zero. I think the most natural conditional pmf would be $$P(X=s|X\in S)=\frac{f_{X}(s)}{\sum_{x\in S}f_{X}(x)}$$ where $X\sim f_{X}$ is any continuous random variable supported on $\mathbb{R}$, $S\subseteq \mathbb{R}$ is any finite set, and $s\in S$ is fixed, but you could repeat a similar argument to the one above and see something weird happening. FYI: This is not intended to be an answer, but my response is too long for a comment.
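The path dependence can be seen numerically. The sketch below fixes $\epsilon_1 = c\,\epsilon_2$ for a few values of $c$ and evaluates $P(X \in I_1 \mid X \in I_1 \cup I_2)$ for a small $\epsilon_2$. It assumes $X \sim N(0, 1)$ and $t = 2$ purely for illustration; the function names are hypothetical.

```python
import math

def normal_cdf(x, theta=0.0):
    """CDF of N(theta, 1) (illustrative choice of distribution)."""
    return 0.5 * (1.0 + math.erf((x - theta) / math.sqrt(2.0)))

def prob_left_window(t, eps1, eps2, theta=0.0):
    """P(X in I_1 | X in I_1 union I_2), with I_1 around -sqrt(t) and I_2 around +sqrt(t)."""
    r = math.sqrt(t)
    left = normal_cdf(-r + eps1, theta) - normal_cdf(-r - eps1, theta)
    right = normal_cdf(r + eps2, theta) - normal_cdf(r - eps2, theta)
    return left / (left + right)

# Hypothetical example values: shrink both windows while keeping eps1 = c * eps2.
t, eps2 = 2.0, 1e-4
limits = {c: prob_left_window(t, c * eps2, eps2) for c in (0.5, 1.0, 2.0)}
for c, value in limits.items():
    print(f"eps1/eps2 = {c}: {value:.6f}")
```

Since the standard normal density is symmetric, the approximation in the display above predicts values near $c/(c+1)$, so each ratio $c$ leads to a different "limit".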


MAJOR EDIT

Better check @angryavian's answer.

$\Pr\{X=x|X^2=t\}$

$\;\;\;=\Pr\{X=x|X>0, X^2=t\}\Pr\{X>0|X^2=t\}$

$\;\;\;\;\;\; +\Pr\{X=x|X<0, X^2=t\}\Pr\{X<0|X^2=t\}$

Now:

$\Pr\{X=x|X>0, X^2=t\}=1$, if $x=\sqrt{t}$, and zero otherwise,

also $\Pr\{X=x|X<0, X^2=t\}=1$, if $x=-\sqrt{t}$, and zero otherwise.

And, for $t>0$, taking $f_X$ to be the $N(\theta,1)$ density in the last step:

$\Pr\{X>0|X^2=t\} = \dfrac{\Pr\{X^2\approx t,X>0\}}{\Pr\{X^2\approx t\}}$

$\;\;\;\; =\dfrac{f_X(\sqrt{t})\left|\frac{dx}{d(x^2)}\right|_{x=\sqrt{t}}}{f_X(\sqrt{t})\left|\frac{dx}{d(x^2)}\right|_{x=\sqrt{t}}+f_X(-\sqrt{t})\left|\frac{dx}{d(x^2)}\right|_{x=-\sqrt{t}}}$

(both Jacobian factors equal $\frac{1}{2\sqrt{t}}$, so they cancel)

$\;\;\;\;= \dfrac{\frac{1}{\sqrt{2\pi}}\exp\{-0.5(\sqrt{t}-\theta)^2\}}{\frac{1}{\sqrt{2\pi}}\exp\{-0.5(\sqrt{t}-\theta)^2\}+\frac{1}{\sqrt{2\pi}}\exp\{-0.5(-\sqrt{t}-\theta)^2\}} = \dfrac{e^{\theta\sqrt{t}}}{e^{\theta\sqrt{t}}+e^{-\theta\sqrt{t}}}$

In a similar way:

$\Pr\{X<0|X^2=t\} = \dfrac{e^{-\theta\sqrt{t}}}{e^{\theta\sqrt{t}}+e^{-\theta\sqrt{t}}}$

So,

$\Pr\{X=x|X^2=t\} =\dfrac{e^{-\theta\sqrt{t}}}{e^{\theta\sqrt{t}}+e^{-\theta\sqrt{t}}}\delta(x+\sqrt{t})+\dfrac{e^{\theta\sqrt{t}}}{e^{\theta\sqrt{t}}+e^{-\theta\sqrt{t}}}\delta(x-\sqrt{t}) $
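A rough simulation can be used to sanity-check the weight on $+\sqrt{t}$. The sketch below draws samples from $N(\theta, 1)$, keeps those with $|X^2 - t| < \epsilon$, and compares the fraction of positive samples with the formula. The window width, sample size, seed, parameter values, and function name are all assumptions for illustration, and a finite window only approximates conditioning on $X^2 = t$.

```python
import math
import random

def simulate_positive_fraction(theta, t, eps, n_samples, seed=0):
    """Fraction of samples with X > 0 among those with |X^2 - t| < eps, X ~ N(theta, 1)."""
    rng = random.Random(seed)
    kept = positive = 0
    for _ in range(n_samples):
        x = rng.gauss(theta, 1.0)
        if abs(x * x - t) < eps:
            kept += 1
            positive += x > 0
    if kept == 0:
        raise ValueError("no samples fell in the window; increase eps or n_samples")
    return positive / kept, kept

# Hypothetical example values.
theta, t = 0.7, 2.0
estimate, kept = simulate_positive_fraction(theta, t, eps=0.1, n_samples=200_000)
root = math.sqrt(t)
formula = math.exp(theta * root) / (math.exp(theta * root) + math.exp(-theta * root))
print(f"kept={kept}  simulated={estimate:.4f}  formula={formula:.4f}")
```

The two values should agree up to Monte Carlo noise and a small bias from the finite window.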