From correlation coefficient to conditional probability

As a first approximation:

We examine a pair of firms and pick one of them at random. Let $A$ be the event that this firm is the more successful one and $B$ the event that it has the stronger CEO. We are after $\mathsf P(A\mid B)$, the probability that a firm is the more successful one given that it has the stronger CEO.

The correlation coefficient of the indicator variables $\mathbf 1_A$ and $\mathbf 1_B$ is, by definition (see the PS):$$\begin{align}\rho ~=~ & \dfrac{\mathsf P(A \cap B)~-~\mathsf P(A)~\mathsf P(B)}{\sqrt{~\mathsf P(A)~(1-\mathsf P(A))~\mathsf P(B)~(1-\mathsf P(B))~}}\\[2ex] ~=~ & \dfrac{(\mathsf P(A \mid B)-\mathsf P(A))~\mathsf P(B)}{\sqrt{~\mathsf P(A)~(1-\mathsf P(A))~\mathsf P(B)~(1-\mathsf P(B))~}}\end{align}$$ where the second line uses $\mathsf P(A\cap B)=\mathsf P(A\mid B)~\mathsf P(B)$.

Now, in every pair one firm has the stronger CEO and one firm is the more successful; just not necessarily the same firm. So $\mathsf P(A)=\tfrac 12$ and $\mathsf P(B)=\tfrac 12$, the denominator is $\sqrt{1/16}=\tfrac14$, and hence:

$$\begin{align}\rho ~=~& 2~\mathsf P(A \mid B)-1 \\[2ex] \mathsf P(A\mid B) ~=~ & \dfrac{1+\rho}{2} \\[1ex] ~=~& \dfrac{1+0.30}{2} \\[1ex] ~=~& 0.65\end{align}$$
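As a quick numerical check of these steps, the Python sketch below solves the definition of $\rho$ for $\mathsf P(A\cap B)$ and then divides by $\mathsf P(B)$. The function name `cond_prob_from_corr` is only illustrative, and it assumes the given $\rho$ is feasible for the chosen marginals.

```python
import math

def cond_prob_from_corr(rho: float, p_a: float = 0.5, p_b: float = 0.5) -> float:
    """Return P(A | B) for events with marginals p_a, p_b and indicator correlation rho.

    Assumes rho is achievable for these marginals (otherwise the result is not a probability).
    """
    # Solve the definition of rho for P(A and B).
    sd_product = math.sqrt(p_a * (1 - p_a) * p_b * (1 - p_b))
    p_ab = p_a * p_b + rho * sd_product
    # Conditional probability: P(A | B) = P(A and B) / P(B).
    return p_ab / p_b

rho = 0.30
p = cond_prob_from_corr(rho)
print(round(p, 10))                    # 0.65
print(math.isclose(p, (1 + rho) / 2))  # True
```

With the default marginals of $\tfrac12$ this reduces to $(1+\rho)/2$; other marginals can be passed in to see how the answer changes when the two events are not equally likely.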

More reasonably:

We might instead model the success and the CEO strength of each company $i$ as a bivariate normal pair $(A_i, B_i)$, with $A_i$ and $B_i$ correlated, the pairs identically distributed, and different companies independent. Then for any pair of companies $(i,j)$ we are looking for $\mathsf P(A_i>A_j \mid B_i>B_j)$.

This would be obtained through a similar, but slightly more involved, procedure.
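A Monte Carlo sketch of this model, assuming standard normal marginals and correlation $\rho=0.3$ for each company's $(A_i,B_i)$, with companies drawn independently. It simulates many pairs of companies and estimates $\mathsf P(A_i>A_j\mid B_i>B_j)$ as the fraction of pairs with $B_i>B_j$ in which also $A_i>A_j$. The seed and sample size are arbitrary choices.

```python
import numpy as np

def simulate_pair_probability(rho: float, n_pairs: int = 400_000, seed: int = 1) -> float:
    """Estimate P(A_i > A_j | B_i > B_j) for independent companies i and j.

    Each company's (A, B) is assumed bivariate normal with standard normal
    marginals and correlation rho.
    """
    rng = np.random.default_rng(seed)
    cov = [[1.0, rho], [rho, 1.0]]
    # One row per company: column 0 is success A, column 1 is CEO strength B.
    firm_i = rng.multivariate_normal([0.0, 0.0], cov, size=n_pairs)
    firm_j = rng.multivariate_normal([0.0, 0.0], cov, size=n_pairs)
    stronger_ceo = firm_i[:, 1] > firm_j[:, 1]
    more_successful = firm_i[:, 0] > firm_j[:, 0]
    # Conditional frequency among pairs where company i has the stronger CEO.
    return float(more_successful[stronger_ceo].mean())

print(simulate_pair_probability(0.3))  # should land near 0.597 rather than 0.65
```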

PS:

Let $\mathbf 1_A$ be the indicator random variable of event $A$ (equal to $1$ if $A$ occurs and $0$ otherwise). Then $$\begin{split}\mathsf E(\mathbf 1_A) &= 1\,\mathsf P(A)+ 0\,\mathsf P(A^\complement)\\ &= \mathsf P(A)\\[2ex]\mathsf {Var}(\mathbf 1_A)& = \mathsf E(\mathbf 1_A^2)-\mathsf E(\mathbf 1_A)^2\\&= 1^2\,\mathsf P(A)-\mathsf P(A)^2\\&= \mathsf P(A)(1-\mathsf P(A))\\[2ex] \mathsf {Cov}(\mathbf 1_A,\mathbf 1_B) &= \mathsf E(\mathbf 1_A\mathbf 1_B)-\mathsf E(\mathbf 1_A)\mathsf E(\mathbf 1_B)\\&= \mathsf E(\mathbf 1_{A\cap B})-\mathsf E(\mathbf 1_A)\mathsf E(\mathbf 1_B)\\&= \mathsf P(A\cap B)-\mathsf P(A)\mathsf P(B)\end{split}$$ Dividing the covariance by $\sqrt{\mathsf{Var}(\mathbf 1_A)\,\mathsf{Var}(\mathbf 1_B)}$ gives the expression for $\rho$ used above.
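These identities can be checked exactly on a small example. The joint probabilities below are made up purely for illustration; the code computes the moments of the indicators from their definitions, using exact fractions, and compares them with the closed forms.

```python
from fractions import Fraction as F

# Hypothetical joint distribution of (1_A, 1_B); keys are the indicator values (a, b).
joint = {(1, 1): F(3, 10), (1, 0): F(2, 10), (0, 1): F(1, 10), (0, 0): F(4, 10)}

def expect(g):
    """E[g(1_A, 1_B)] under the joint distribution above."""
    return sum(p * g(a, b) for (a, b), p in joint.items())

p_a = expect(lambda a, b: a)                     # E(1_A) = P(A)
p_b = expect(lambda a, b: b)                     # E(1_B) = P(B)
p_ab = joint[(1, 1)]                             # P(A and B)
var_a = expect(lambda a, b: a * a) - p_a ** 2    # E(1_A^2) - E(1_A)^2
cov_ab = expect(lambda a, b: a * b) - p_a * p_b  # E(1_A 1_B) - E(1_A) E(1_B)

print(p_a, p_b)                    # 1/2 2/5
print(var_a == p_a * (1 - p_a))    # True
print(cov_ab == p_ab - p_a * p_b)  # True
```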

For a randomly selected firm let $X$ be the CEO quality and $Y$ be the firm success (however these are measured). The author's assertion follows under the assumption that $(X,Y)$ has a bivariate normal distribution with correlation $\rho=0.3$.

If $(X_1,Y_1)$ and $(X_2,Y_2)$ are measured for independently selected firms, then the difference $(X_1-X_2,Y_1-Y_2)$ is also bivariate normal, with mean zero and the same correlation $\rho$: independence makes both variances and the covariance exactly double, so their ratio is unchanged. Kahneman's claim is that $P(Y_1>Y_2\mid X_1>X_2)\approx 0.6$. This follows from a fact(*) about bivariate normal variables:

If $(A,B)$ are bivariate normal with means $\mu_A$ and $\mu_B$ respectively, and correlation $\rho$, then $$P(B>\mu_B\mid A>\mu_A)=\frac12 + \frac{\arcsin\rho}\pi.$$

If $\rho=0.3$ the RHS works out to $0.59698668$.
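The value can be reproduced in a couple of lines of Python. The helper name `normal_model` is illustrative; the loop sets the normal-model answer beside $(1+\rho)/2$ for a few correlations to show how much the distributional assumption matters.

```python
import math

def normal_model(rho: float) -> float:
    """P(B > mu_B | A > mu_A) for a bivariate normal pair with correlation rho."""
    return 0.5 + math.asin(rho) / math.pi

print(f"{normal_model(0.3):.8f}")  # 0.59698668

# Compare with (1 + rho) / 2, the answer when only the ranking within a pair is modelled.
for rho in (0.1, 0.3, 0.5, 0.9):
    print(rho, round(normal_model(rho), 4), round((1 + rho) / 2, 4))
```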

(*) The fact can be deduced from this result.
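For completeness, here is one standard way to see the fact (a sketch, not necessarily the argument in the linked result). Standardizing $A$ and $B$ changes neither $\rho$ nor the events $A>\mu_A$, $B>\mu_B$, so we may write $A=Z_1$ and $B=\rho Z_1+\sqrt{1-\rho^2}\,Z_2$ with $Z_1,Z_2$ independent standard normal. Put $\phi=\arcsin\rho\in[-\tfrac\pi2,\tfrac\pi2]$, so $\rho=\sin\phi$ and $\sqrt{1-\rho^2}=\cos\phi$, and use polar coordinates $Z_1=R\cos\theta$, $Z_2=R\sin\theta$, where $\theta$ is uniform on $(-\pi,\pi]$ because $(Z_1,Z_2)$ is rotationally symmetric: $$\begin{align}\{A>\mu_A,~B>\mu_B\} ~=~& \{\cos\theta>0\}\cap\{\sin(\theta+\phi)>0\} ~=~ \{-\phi<\theta<\tfrac\pi2\}\\[2ex] P(A>\mu_A,~B>\mu_B) ~=~& \dfrac{\pi/2+\phi}{2\pi} ~=~ \dfrac14+\dfrac{\arcsin\rho}{2\pi}\\[2ex] P(B>\mu_B\mid A>\mu_A) ~=~& \dfrac{1/4+\arcsin\rho/(2\pi)}{1/2} ~=~ \dfrac12+\dfrac{\arcsin\rho}{\pi}\end{align}$$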

I suspect that the key assumption is a bivariate normal distribution.

Suppose $X$ (the difference in CEO quality) and $Y$ (the difference in firm success) have a bivariate normal distribution with zero means and a correlation of $0.3$ (for example, they could each have variance $1$ and a covariance of $0.3$).

In that case, the probability that $X$ and $Y$ have the same sign would be about $0.597$, close enough to $0.6$. Here is a simulation in R illustrating the point:

```r
> set.seed(1)
> cases <- 1000000
> correl <- 0.3
> A <- rnorm(cases)
> B <- rnorm(cases)
> C <- rnorm(cases)
> X <- sqrt(correl)*A + sqrt(1-correl)*B
> Y <- sqrt(correl)*A + sqrt(1-correl)*C
> cor(X,Y)
[1] 0.2998031
> mean(X*Y > 0)
[1] 0.597309
```

By contrast, if $X$ and $Y$ each took the values $1$ and $-1$ with probability $\frac12$ each, with a correlation of $0.3$ between them, the probability that $X$ and $Y$ had the same sign would be $0.65$: here $XY=\pm1$ and $\mathsf E(XY)=\rho$, so $P(XY>0)=\frac{1+\rho}{2}=0.65$, matching the first approximation.
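A simulation sketch of this two-valued case, mirroring the R simulation. The construction is an assumption chosen for convenience: $X$ is a fair $\pm1$ sign, and with probability $\rho$ we set $Y=X$, otherwise $Y$ is an independent fair $\pm1$ sign. Both variances are then $1$ and $\mathsf{Cov}(X,Y)=\rho$, so the correlation is $\rho$.

```python
import numpy as np

def two_point_same_sign(rho: float, cases: int = 1_000_000, seed: int = 1):
    """Simulate +/-1 variables X, Y with correlation rho; return (cor(X, Y), P(XY > 0))."""
    rng = np.random.default_rng(seed)
    x = rng.choice([-1, 1], size=cases)
    independent = rng.choice([-1, 1], size=cases)
    # With probability rho copy X, otherwise use an independent fair sign.
    copy = rng.random(cases) < rho
    y = np.where(copy, x, independent)
    return float(np.corrcoef(x, y)[0, 1]), float(np.mean(x * y > 0))

correlation, same_sign = two_point_same_sign(0.3)
print(correlation, same_sign)  # should be near 0.3 and 0.65 respectively
```

Compared with the normal simulation, the correlation is the same but the same-sign probability is noticeably higher, which is the whole gap between $0.65$ and $0.597$.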
