Could someone explain conditional independence?

The scenario you describe provides a good example of conditional independence, though you haven't quite described it as such. As the Wikipedia article puts it,

$R$ and $B$ are conditionally independent [given $Y$] if and only if, given knowledge of whether $Y$ occurs, knowledge of whether $R$ occurs provides no information on the likelihood of $B$ occurring, and knowledge of whether $B$ occurs provides no information on the likelihood of $R$ occurring.

In this case, $R$ and $B$ are the events of persons A and B getting home in time for dinner, and $Y$ is the event of a snow storm hitting the city. Certainly the probabilities of $R$ and $B$ will depend on whether $Y$ occurs. However, just as it's plausible to assume that if these two people have nothing to do with each other their probabilities of getting home in time are independent, it's also plausible to assume that, while they will both have a lower probability of getting home in time if a snow storm hits, these lower probabilities will nevertheless still be independent of each other. That is, if you already know that a snow storm is raging and I tell you that person A is getting home late, that gives you no new information about whether person B is getting home late. You're getting information on that from the fact that there's a snow storm, but given that fact, A getting home late doesn't make it more or less likely that B is getting home late, too.

So conditional independence is the same as normal independence, but restricted to the case where you know that a certain condition is or isn't fulfilled. Not only can you not find out about A by finding out about B in general (normal independence), but you also can't do so under the condition that there's a snow storm (conditional independence).
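To see this numerically, here's a minimal simulation sketch with made-up numbers: a 30% chance of a storm, and each person late with probability 0.6 during a storm and 0.1 otherwise, independently given the weather. Once you condition on whether the storm occurs, finding out that B is late leaves the probability that A is late unchanged.

```python
import random

random.seed(0)
N = 200_000

# Made-up numbers: P(storm) = 0.3; each person is late with probability
# 0.6 during a storm and 0.1 otherwise, independently given the weather.
def one_evening():
    storm = random.random() < 0.3
    p_late = 0.6 if storm else 0.1
    a_late = random.random() < p_late
    b_late = random.random() < p_late
    return storm, a_late, b_late

samples = [one_evening() for _ in range(N)]

def prob(event, given):
    sel = [s for s in samples if given(s)]
    return sum(event(s) for s in sel) / len(sel)

# Given that a storm is raging, knowing B is late doesn't change A's odds:
print(prob(lambda s: s[1], given=lambda s: s[0] and s[2]))  # P(A late | storm, B late) ~ 0.6
print(prob(lambda s: s[1], given=lambda s: s[0]))           # P(A late | storm)         ~ 0.6

# Same story given that there is no storm:
print(prob(lambda s: s[1], given=lambda s: (not s[0]) and s[2]))  # ~ 0.1
print(prob(lambda s: s[1], given=lambda s: not s[0]))             # ~ 0.1
```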

An example of events that are independent but not conditionally independent would be: You randomly sample two people A and B from a large population and consider the probabilities that they will get home in time. Without any further knowledge, you might plausibly assume that these probabilities are independent. Now you introduce event $Y$, which occurs if the two people live in the same neighbourhood (however that might be defined). If you know that $Y$ occurred and I tell you that A is getting home late, then that would tend to increase the probability that B is also getting home late, since they live in the same neighbourhood and any traffic-related causes of A getting home late might also delay B. So in this case the probabilities of A and B getting home in time are not conditionally independent given $Y$, since once you know that $Y$ occurred, you are able to gain information about the probability of B getting home in time by finding out whether A is getting home in time.

Strictly speaking, this scenario only works if there's always the same amount of traffic delay in the city overall and it just moves to different neighbourhoods. If that's not the case, then it wouldn't be correct to assume independence between the two probabilities, since the fact that one of the two is getting home late would already make it somewhat likelier that there's heavy traffic in the city in general, even without knowing that they live in the same neighbourhood.
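Here's a toy simulation of that scenario under the "constant total delay" assumption just described: there are $K$ neighbourhoods (a made-up $K=5$), a traffic jam hits exactly one of them each evening, and a person is late exactly when their neighbourhood is the jammed one. Marginally, A's and B's lateness are independent, but given $Y$ ("same neighbourhood") they are not.

```python
import random

random.seed(1)
K = 5             # made-up number of neighbourhoods
N = 200_000

samples = []
for _ in range(N):
    jam = random.randrange(K)        # the traffic jam hits exactly one neighbourhood
    hood_a = random.randrange(K)     # A's neighbourhood, chosen independently
    hood_b = random.randrange(K)     # B's neighbourhood, chosen independently
    a_late = hood_a == jam
    b_late = hood_b == jam
    same = hood_a == hood_b          # this is the event Y
    samples.append((a_late, b_late, same))

def prob(event, given=lambda s: True):
    sel = [s for s in samples if given(s)]
    return sum(event(s) for s in sel) / len(sel)

# Marginally independent: both close to 1/K = 0.2.
print(prob(lambda s: s[0], given=lambda s: s[1]))           # P(A late | B late)
print(prob(lambda s: s[0]))                                 # P(A late)

# Not conditionally independent given Y ("same neighbourhood"):
print(prob(lambda s: s[0], given=lambda s: s[2] and s[1]))  # P(A late | Y, B late) ~ 1.0
print(prob(lambda s: s[0], given=lambda s: s[2]))           # P(A late | Y)         ~ 0.2
```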

To give a precise example: Say you roll a blue die and a red die. The two results are independent of each other. Now you tell me that the blue result isn't a $6$ and the red result isn't a $1$. You've given me new information, but that hasn't affected the independence of the results. By taking a look at the blue die, I can't gain any knowledge about the red die; after I look at the blue die I will still have a probability of $1/5$ for each number on the red die except $1$. So the probabilities for the results are conditionally independent given the information you've given me. But if instead you tell me that the sum of the two results is even, this allows me to learn a lot about the red die by looking at the blue die. For instance, if I see a $3$ on the blue die, the red die can only be $1$, $3$ or $5$. So in this case the probabilities for the results are not conditionally independent given this other information that you've given me. This also underscores that conditional independence is always relative to the given condition -- in this case, the results of the dice rolls are conditionally independent with respect to the event "the blue result is not $6$ and the red result is not $1$", but they're not conditionally independent with respect to the event "the sum of the results is even".
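You can verify this by brute-force enumeration of the 36 equally likely outcomes (a small sketch; the event "red shows $5$" is just an arbitrary choice to probe the red die):

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely (blue, red) outcomes.
outcomes = list(product(range(1, 7), repeat=2))

def prob(event, given):
    sel = [o for o in outcomes if given(o)]
    return Fraction(sum(event(o) for o in sel), len(sel))

red_is_5 = lambda o: o[1] == 5

# Condition 1: "blue isn't 6 and red isn't 1" -- still conditionally independent.
cond1 = lambda o: o[0] != 6 and o[1] != 1
print(prob(red_is_5, given=lambda o: cond1(o) and o[0] == 3))  # 1/5
print(prob(red_is_5, given=cond1))                             # 1/5

# Condition 2: "the sum is even" -- now seeing the blue die matters.
cond2 = lambda o: (o[0] + o[1]) % 2 == 0
print(prob(red_is_5, given=lambda o: cond2(o) and o[0] == 3))  # 1/3
print(prob(red_is_5, given=cond2))                             # 1/6
```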


The example you've given (the snowstorm) is usually given as a case where you might think two events are truly independent (since the two people take totally different routes home), i.e.

$p(A|B)=p(A)$.

However, in this case they are not truly independent; they are "only" conditionally independent given the snowstorm, i.e.

$p(A|B,Z) = p(A|Z)$.

A clearer example, paraphrased from Norman Fenton's website: if Alice ($A$) and Bob ($B$) both flip the same coin, which might be biased, we cannot say

$p(A=H|B=H) = p(A=H)$

(i.e. that they are independent), because if we see Bob flip heads, the coin is more likely to be biased towards heads, and hence the left-hand probability should be higher. However, if we denote by $Z$ the event "the coin is biased towards heads", then

$p(A=H|B=H,Z)=p(A=H|Z)$

we can remove Bob from the equation because we know the coin is biased. Given the fact that the coin is biased, the two flips are conditionally independent.
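A quick sanity check of those two statements, with made-up numbers (the coin is fair with probability 0.5, otherwise heads-biased with $P(H)=0.9$):

```python
# Made-up model: the coin is fair (P(H) = 0.5) or heads-biased (P(H) = 0.9),
# each with prior probability 0.5; Alice's and Bob's flips are independent
# once the coin's bias is fixed.
p_heads = {"fair": 0.5, "biased": 0.9}
prior = {"fair": 0.5, "biased": 0.5}

# Marginal probability that Alice flips heads.
p_a = sum(prior[z] * p_heads[z] for z in prior)

# P(A=H | B=H): Bob's heads shifts belief towards the biased coin.
p_both = sum(prior[z] * p_heads[z] ** 2 for z in prior)
print(p_both / p_a)       # ~0.757 > 0.7 = P(A=H), so not independent
print(p_a)                # 0.7

# Given Z = "the coin is biased towards heads", Bob's flip adds nothing:
print(p_heads["biased"])  # P(A=H | Z) = P(A=H | B=H, Z) = 0.9
```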

This is the common form of conditional independence: you have events that are not statistically independent, but they are conditionally independent.

It is also possible for events to be statistically independent but not conditionally independent. To borrow from Wikipedia: if $A$ and $B$ each take the value $0$ or $1$ with probability $0.5$, with all four combinations equally likely, and $C$ denotes the product of the values of $A$ and $B$ ($C = A \times B$), then $A$ and $B$ are independent:

$p(A=0|B=0) = p(A=0) = 0.5$

but they are not conditionally independent given $C$:

$p(A=0|B=0,C=0) = 0.5 \neq \frac{2}{3} = p(A=0|C=0)$
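A quick brute-force enumeration (a small sketch) confirms these numbers:

```python
from fractions import Fraction
from itertools import product

# The four equally likely (A, B) pairs, with C = A * B.
outcomes = [(a, b, a * b) for a, b in product([0, 1], repeat=2)]

def prob(event, given):
    sel = [o for o in outcomes if given(o)]
    return Fraction(sum(event(o) for o in sel), len(sel))

a0 = lambda o: o[0] == 0
print(prob(a0, given=lambda o: o[1] == 0))                # P(A=0 | B=0)      = 1/2
print(prob(a0, given=lambda o: True))                     # P(A=0)            = 1/2
print(prob(a0, given=lambda o: o[1] == 0 and o[2] == 0))  # P(A=0 | B=0, C=0) = 1/2
print(prob(a0, given=lambda o: o[2] == 0))                # P(A=0 | C=0)      = 2/3
```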


Other answers have provided great responses elaborating on the intuitive meaning of conditional independence. Here, I won't add to that; instead I want to address your question about "what it does for us," focusing on computational implications.

There are three events/propositions/random variables in play, $A$, $B$, and $C$. They have a joint probability, $P(A,B,C)$. In general, a joint probability for three events can be factored in many different ways:
\begin{align}
P(A,B,C) &= P(A)P(B,C|A)\\
&= P(A)P(B|A)P(C|A,B) \;=\; P(A)P(C|A)P(B|A,C)\\
&= P(B)P(A,C|B)\\
&= P(B)P(A|B)P(C|A,B) \;=\; P(B)P(C|B)P(A|B,C)\\
&= P(C)P(A,B|C)\\
&= P(C)P(A|C)P(B|A,C) \;=\; P(C)P(B|C)P(A|B,C)
\end{align}
Something to notice here is that every expression on the RHS includes a factor involving all three variables.

Now suppose our information about the problem tells us that $A$ and $B$ are conditionally independent given $C$. A conventional notation for this is: $$ A \perp\!\!\!\perp B \,|\, C, $$ which means (among other implications), $$ P(A|B,C) = P(A|C). $$ This means that the last of the many expressions I displayed for $P(A,B,C)$ above can be written, $$ P(A,B,C) = P(C)P(B|C)P(A|C). $$ From a computational perspective, the key thing to note is that conditional independence here means we can write the 3-variable function $P(A,B,C)$ in terms of 1-variable and 2-variable functions. In a nutshell, conditional independence means that joint distributions are simpler than they might have been. When there are lots of variables, conditional independence can imply grand simplifications of joint probabilities. And if (as is often the case) you have to sum or integrate over some of the variables, conditional independence can let you pull some factors through a sum/integral, simplifying the summand/integrand.
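To make the bookkeeping concrete, here's a small numpy sketch with three binary variables and made-up conditional tables. The full joint has $2^3-1=7$ free parameters, but under $A \perp\!\!\!\perp B \,|\, C$ it is assembled from tables with only $1+2+2=5$, and the gap grows rapidly with more variables.

```python
import numpy as np

# A toy joint with A independent of B given C, all binary, built from its
# factors P(A,B,C) = P(C) P(B|C) P(A|C).  The numbers are made up.
p_c = np.array([0.6, 0.4])                       # P(C)
p_b_given_c = np.array([[0.7, 0.3],              # P(B|C=0)
                        [0.2, 0.8]])             # P(B|C=1)
p_a_given_c = np.array([[0.9, 0.1],              # P(A|C=0)
                        [0.5, 0.5]])             # P(A|C=1)

# Full joint, shape (A, B, C): 2**3 - 1 = 7 free parameters, versus
# 1 + 2 + 2 = 5 for the factored form.
joint = np.einsum('c,cb,ca->abc', p_c, p_b_given_c, p_a_given_c)
assert np.isclose(joint.sum(), 1.0)

# Conditional independence check: P(A|B,C) should not depend on B.
p_bc = joint.sum(axis=0)                         # P(B,C)
p_a_given_bc = joint / p_bc                      # P(A|B,C), shape (A, B, C)
print(np.allclose(p_a_given_bc[:, 0, :], p_a_given_bc[:, 1, :]))  # True
```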

This can be very important for computational implementation of Bayesian inference. When we want to quantify how strongly some observed data, $D$, support rival hypotheses $H_i$ (with $i$ a label distinguishing the hypotheses), you are probably used to seeing Bayes's theorem (BT) in its "posterior $\propto$ prior times likelihood" form: $$ P(H_i|D) = \frac{P(H_i)P(D|H_i)}{P(D)}, $$ where the terms in the numerator are the prior probability for $H_i$ and the sampling (or conditional predictive) probability for $D$ (aka, the likelihood for $H_i$), and the term in the denominator is the prior predictive probability for $D$ (aka the marginal likelihood, since it is the marginal of $P(D,H_i)$).

But recall that $P(H_i,D) = P(H_i)P(D|H_i)$ (in fact, one typically derives BT using this, and equating it to the alternative factorization). So BT can be written as $$ P(H_i|D) = \frac{P(H_i,D)}{P(D)}, $$ or, in words, $$ \mbox{Posterior} = \frac{\mbox{Joint for everything}}{\mbox{Marginal for observations}}. $$

In models with complex dependence structures, this turns out to be the easiest way to think of modeling: The modeler expresses the joint probability for the data and all hypotheses (possibly including latent parameters for things you don't know but need to know in order to predict the data). From the joint, you compute the marginal for the data, to normalize the joint to give you the posterior (you may not even need to do this, e.g., if you use MCMC methods that don't depend on normalization constants).
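As a small illustration with made-up numbers (two hypotheses about a coin, fair vs. heads-biased with $P(H)=0.9$, and data $D$ = "7 heads in 10 flips"), the posterior is just the joint normalized by the marginal for the data. Note that the binomial likelihood itself already exploits the flips being conditionally independent given the hypothesis.

```python
from math import comb

p_heads = {"fair": 0.5, "biased": 0.9}   # P(heads | hypothesis)
prior = {"fair": 0.5, "biased": 0.5}     # P(hypothesis)
heads, flips = 7, 10                     # the observed data D

# Joint P(hypothesis, D) = P(hypothesis) * P(D | hypothesis); the binomial
# form of P(D | hypothesis) uses conditional independence of the flips
# given the hypothesis.
joint = {h: prior[h] * comb(flips, heads) * p ** heads * (1 - p) ** (flips - heads)
         for h, p in p_heads.items()}

marginal = sum(joint.values())           # P(D), the prior predictive
posterior = {h: joint[h] / marginal for h in joint}
print(posterior)                         # ~{'fair': 0.67, 'biased': 0.33}
```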

Now you can see the value of conditional independence. Since the starting point of computation is the joint for everything, anything you can do to simplify the expression for the joint (and its sums/integrals) can be a great help to computation. Probabilistic programming languages (e.g., BUGS, JAGS, and to some degree Stan) use graphical representations of conditional independence assumptions to organize and simplify computations.