Confused by Kullback-Leibler on conditional probability distributions

It depends on whether you are conditioning on a random variable or an event.

Given a random variable $x$,

$$ \operatorname{KL}[p(y \mid x) \,\|\, q(y \mid x)] \doteq \iint p(\bar{x},\bar{y}) \ln\frac{p(\bar{y} \mid \bar{x})}{q(\bar{y} \mid \bar{x})} \mathrm{d}\bar{x} \mathrm{d}\bar{y} \quad\text{or}\quad \sum_{\bar{x}}\sum_{\bar{y}} p(\bar{x},\bar{y}) \ln\frac{p(\bar{y} \mid \bar{x})}{q(\bar{y} \mid \bar{x})}. $$

Given an event $\bar{x}$,

$$ \operatorname{KL}[p(y \mid \bar{x}) \,\|\, q(y \mid \bar{x})] \doteq \int p(\bar{y} \mid \bar{x}) \ln\frac{p(\bar{y} \mid \bar{x})}{q(\bar{y} \mid \bar{x})} \mathrm{d}\bar{y} \quad\text{or}\quad \sum_{\bar{y}} p(\bar{y} \mid \bar{x}) \ln\frac{p(\bar{y} \mid \bar{x})}{q(\bar{y} \mid \bar{x})}. $$

Note how conditioning on an event is equivalent to changing the probability distribution over its variable to a point mass. This is what turns the joint into a conditional above,

$$ p'(x,y) \doteq p(y \mid x)\,\delta_{\bar{x}}(x), \qquad \int p'(x,y)\,\mathrm{d}x = p(y \mid \bar{x}). $$

To be more explicit, instead of the KL conditioned on a random variable, you can write an expectation over events of the KL conditioned on each of those events,

$$ \operatorname{KL}[p(y \mid x) \,\|\, q(y \mid x)] =\operatorname{E}_{\bar{x}\sim p(x)}\big[ \operatorname{KL}[p(y \mid \bar{x}) \,\|\, q(y \mid \bar{x})] \big]. $$
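The identity above is easy to check numerically in the discrete case. Here is a minimal sketch, assuming a small made-up joint `p_joint` and a made-up model conditional `q_cond` (both tables and all function names are hypothetical, chosen only for illustration):

```python
import math

# Hypothetical joint p(x, y) over x in {0, 1} and y in {0, 1, 2}; entries sum to 1.
p_joint = [[0.10, 0.20, 0.10],
           [0.30, 0.06, 0.24]]
# Hypothetical model conditional q(y | x); each row sums to 1.
q_cond = [[0.2, 0.5, 0.3],
          [0.4, 0.4, 0.2]]

# Marginal p(x) and conditional p(y | x) derived from the joint.
p_x = [sum(row) for row in p_joint]
p_cond = [[pxy / px for pxy in row] for row, px in zip(p_joint, p_x)]

def kl_given_event(x):
    """KL[p(y | x) || q(y | x)] for one fixed value x: a sum over y only."""
    return sum(p * math.log(p / q)
               for p, q in zip(p_cond[x], q_cond[x]) if p > 0)

def kl_given_variable():
    """KL conditioned on the random variable: a double sum weighted by the joint."""
    return sum(p_joint[x][y] * math.log(p_cond[x][y] / q_cond[x][y])
               for x in range(len(p_joint))
               for y in range(len(p_joint[x]))
               if p_joint[x][y] > 0)

# Expectation over x ~ p(x) of the per-event KL.
expected_kl = sum(p_x[x] * kl_given_event(x) for x in range(len(p_x)))

print(kl_given_variable(), expected_kl)
```

The two printed values should agree up to floating-point error, while `kl_given_event(0)` and `kl_given_event(1)` are in general different numbers.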

Mixing up random variables and events is quite common, but it is usually easy to tell from the context which is meant.


I don't quite see what confuses you. Think about how we compute, for example, a conditional expectation: $E(Z \mid X)=\sum_z z \, P(Z = z \mid X)$. That is, we sum only over the values of $Z$, and the result is a function of the conditioning variable $X$. (Put another way, for each value $x$ of $X$, $P(Z \mid X=x)$ is a different probability distribution, and hence for each value of $X$ we get different values of the expectation, variance, etc., conditioned on $X=x$.) The same happens here: read this way, the conditional KL divergence is not a number but a function of $X$.
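To make the "function of $X$" point concrete, here is a small sketch, assuming a made-up conditional table `p_z_given_x` (the values and names are hypothetical). The sum runs over the values of $Z$ only, and we get one result per value of $X$:

```python
# Hypothetical conditional table P(Z = z | X = x); each row sums to 1.
z_values = [0, 1, 2]
p_z_given_x = {
    "a": [0.5, 0.25, 0.25],
    "b": [0.1, 0.3, 0.6],
}

def conditional_expectation(x):
    """E(Z | X = x): sum over z of z * P(Z = z | X = x)."""
    return sum(z * p for z, p in zip(z_values, p_z_given_x[x]))

# One value per x, i.e. a function of the conditioning variable.
cond_exp = {x: conditional_expectation(x) for x in p_z_given_x}
print(cond_exp)
```

Averaging such a function over the distribution of $X$ is what collapses it to a single number.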