Why is covariance something to care about?

The previous answer gives an interesting description of the use of covariance in portfolio theory. I would like to add some details about the reasoning behind the concept and definition of covariance, its advantages and drawbacks, its possible "geometric" interpretation, and its principal applications. I hope that the following explanations and examples could help you to better interpret covariance and its relevance.

As already mentioned, covariance is a measure that quantifies how much two real-valued random variables "vary together", i.e. how changes in one variable are associated with changes in the other. Its meaning is somewhat similar - albeit different, as explained below - to that of statistical correlation. The definition of covariance for two random variables $X$ and $Y$ is

$$Cov (X,Y)= E \left ([X-E (X)][Y-E (Y)] \right)$$

where $E(z)$ denotes the expected value of $z $.

The interpretation of this definition is as follows. The quantities $X-E(X)$ and $Y-E(Y)$ represent the deviations of each observation from the respective means. If two variables have a positive relationship, that is to say higher or lower values of the first tend to be associated with correspondingly higher or lower values of the second, then for most observations the two deviations have the same sign, and hence a positive product. The result is a positive average value of these products, i.e. a positive covariance. Similarly, if two variables vary together via a negative relationship, then for most observations the two deviations have opposite signs, and hence a negative product. The result is a negative average value of these products, i.e. a negative covariance.
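If it helps to make this concrete, here is a minimal numerical sketch (Python with NumPy, entirely made-up data; the helper `pop_cov` is only an illustration of the definition above):

```python
import numpy as np

# Made-up data: y_pos increases with x, y_neg decreases with x
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
noise = np.array([0.1, -0.2, 0.0, 0.3, -0.1])
y_pos = 2 * x + noise
y_neg = -2 * x + noise

def pop_cov(a, b):
    # population covariance: average product of the deviations from the means
    return np.mean((a - a.mean()) * (b - b.mean()))

print(pop_cov(x, y_pos))  # positive: deviations mostly share the same sign
print(pop_cov(x, y_neg))  # negative: deviations mostly have opposite signs
```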

On the other hand, if the two variables are poorly correlated, then the two deviations have the same sign for some observations and opposite signs for others. The products are then in part positive and in part negative. This results in a relatively small average value of these products, i.e. a relatively small covariance. The extreme case occurs when the two variables are independent: in this case the covariance is zero. This can be shown by observing that, expanding the product, the covariance expression reported above can be written as

$$E \left ([X-E (X)][Y-E (Y)] \right)=E (X Y) - E (X) E (Y) $$

Because under independence $E(XY) = E(X)E(Y)$, the covariance of independent variables is zero. Also note that the converse is not true, that is to say zero covariance does not imply independence. Classical examples of this are $XY$ datasets forming a circle or a square: here the covariance is zero, but the variables are clearly dependent.
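As a quick illustrative check of the "zero covariance does not imply independence" point, the following sketch (again with made-up data) places points uniformly on a circle: $Y$ is completely determined by $X$ up to a sign, yet the covariance is numerically zero.

```python
import numpy as np

theta = np.linspace(0, 2 * np.pi, 400, endpoint=False)
x = np.cos(theta)   # points on the unit circle
y = np.sin(theta)

cov_xy = np.mean((x - x.mean()) * (y - y.mean()))
print(cov_xy)  # ~0 up to floating-point error, although X and Y are clearly dependent
```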

A good way to understand the definition of covariance from a geometric point of view - as asked in the OP - might be to consider a generic $XY$-scatterplot of data, drawing a horizontal line corresponding to $E(Y)$ and a vertical line corresponding to $E(X)$. To simplify, let us translate the whole scatterplot so that these two lines coincide with the $x$-axis and the $y$-axis, respectively. Now if we take a point $(X_i,Y_i)$ of the scatterplot and draw, from this point, the two perpendicular distances to the axes, we get a rectangle whose area is equal to $|(X_i-E(X))(Y_i-E(Y))|$. In particular, if the rectangle is in the first or third quadrant, the product is positive and equal to the rectangle's area; if the rectangle is in the second or fourth quadrant, the product is equal to the negative of the rectangle's area. Repeating this for all points of the scatterplot, we create a set of rectangles. The average area of these rectangles (counting as positive the areas of those in the first or third quadrant, and as negative the areas of those in the second or fourth quadrant) is a geometric equivalent of the covariance. For example, if a dataset is closely distributed around the $Y=2X$ line, most rectangles will be drawn in the first and third quadrants, so their average area, and hence the covariance, will be positive. If a dataset is closely distributed around the $Y=-2X$ line, most rectangles will be drawn in the second and fourth quadrants, so their average area, and hence the covariance, will be negative. On the other hand, if a dataset is dispersed around the origin with no linear trend, the rectangles will be drawn in all quadrants. In this case we have to sum a more balanced amount of positive and negative quantities, leading to a smaller average area and hence a smaller covariance.
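The "average signed rectangle area" reading can also be checked numerically. The sketch below (hypothetical data scattered around $Y = 2X$) computes the signed areas directly and compares their average with the population covariance returned by `np.cov`:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=1000)
y = 2 * x + rng.normal(scale=0.5, size=1000)   # data close to the Y = 2X line

dx = x - x.mean()          # signed horizontal side of each rectangle
dy = y - y.mean()          # signed vertical side of each rectangle
signed_areas = dx * dy     # positive in quadrants I/III, negative in II/IV

print(signed_areas.mean())            # average signed rectangle area ...
print(np.cov(x, y, bias=True)[0, 1])  # ... matches the population covariance
```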

The examples above are also useful to understand two key points regarding the meaning of covariance. The first is that covariance, as a measure of correlation, is $\textbf{not scaled}$, and is therefore strongly affected by the data ranges. As such, the sign of the covariance gives us the direction of the potential relationship (positive or negative) between the two variables, but tells us nothing about the strength of the relationship. The scaled version of the covariance is the statistical correlation, which is obtained by dividing the covariance by the product of the standard deviations of the two variables. Compared to covariance, the statistical correlation is a better measure of the strength of the relationship: it standardizes the amount of interdependence between the two variables, thus quantifying how closely the two variables move together (in this regard, also note that the dimensional unit of covariance is the product of the dimensional units of the two variables, whereas the correlation is dimensionless). Accordingly, two variables with a given degree of correlation can show a large or small covariance, depending on the range of the data. For example, an $XY$-dataset formed by the points $$(-5,-5), (1,1), (4,4)$$ and another dataset formed by the points $$(-500,-500), (100,100), (400,400)$$ clearly both have a perfect correlation equal to $1$, but the covariance is $14$ in the first case and $140,000$ in the second case. Therefore, the covariance sign has a more definite meaning than its magnitude: a positive covariance implies that the variables are positively related, while a negative covariance implies that the variables are negatively related.
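The two small datasets above can be checked directly; this sketch reproduces the covariances of $14$ and $140,000$ and the common correlation of $1$:

```python
import numpy as np

x1, y1 = np.array([-5.0, 1.0, 4.0]), np.array([-5.0, 1.0, 4.0])
x2, y2 = np.array([-500.0, 100.0, 400.0]), np.array([-500.0, 100.0, 400.0])

def pop_cov(a, b):
    # population covariance: average product of the deviations from the means
    return np.mean((a - a.mean()) * (b - b.mean()))

print(pop_cov(x1, y1), pop_cov(x2, y2))                      # 14.0 and 140000.0
print(np.corrcoef(x1, y1)[0, 1], np.corrcoef(x2, y2)[0, 1])  # both 1.0
```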

The second point is that covariance is a $\textbf{measure of linearity}$. This means that the sign of the covariance only gives us information about the linear trend between the variables, and tells us little about the existence of nonlinear relationships.
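A simple way to see this limitation (with hypothetical data): take $Y = X^2$ with $X$ symmetric around $0$. The relationship is perfectly deterministic but has no linear trend, so the covariance is numerically zero.

```python
import numpy as np

x = np.linspace(-1, 1, 201)
y = x ** 2   # strong, purely nonlinear relationship

print(np.mean((x - x.mean()) * (y - y.mean())))  # ~0: covariance misses the dependence
```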

Despite these limitations, there are several scenarios and applications in which one might be interested in calculating covariance. Among these are the following:

  • problems where we need to determine the variance of the sum of two random variables (a numerical check of this identity is sketched after this list), since

$$\operatorname{var}(X+Y)=\operatorname{var}(X)+\operatorname{var}(Y)+2\operatorname{cov}(X,Y)$$

  • in the context of data embedding/dimensionality reduction procedures, where the covariance between variables in a given dataset can be useful to unmask a lower-dimensional space that still captures most of the variance in the data. This is typically performed by combining variables that are highly correlated (i.e. have high covariance), to minimize the loss of information. A classical example of this application is principal component analysis, a statistical procedure commonly used to convert a set of observations of potentially correlated variables into a smaller set of linearly uncorrelated variables (called principal components);

  • in all cases where we need to use a covariance matrix. Given two vectors ${\displaystyle X=(x_{1},\dots ,x_{n})}$ and ${\displaystyle Y=(y_{1},\dots ,y_{m})}$ of random variables, a covariance matrix is an $n \times m$ matrix whose term in the $(i, j)$ position is the covariance ${\displaystyle \operatorname {cov} (x_{i},y_{j})}$; a small covariance-matrix computation also appears in the sketch after this list. An example of this application is the standard canonical-correlation analysis, a statistical procedure aimed at finding linear combinations of the $x_i$ and $y_j$ variables that have maximum correlation with each other;

  • in genomic sciences, for the computational assessment of similarity across DNA or RNA sequencing datasets. These comparative sequence analyses are often applied, for example, to test the reproducibility of biological or technical replicates, or to detect highly conserved DNA regions across species;

  • in economics, e.g. in portfolio theory (already well described in the previous answer). Simplifying, covariance calculations can give investors important insight into how two stocks might move together in the future. The behaviour of historical prices is useful to assess whether the prices tend to move with each other or opposite to each other, which in turn helps to anticipate the potential price movement of a two-stock portfolio.
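To make the first and third items above concrete, here is a small sketch (made-up data) that checks the variance-of-a-sum identity and computes a covariance matrix with NumPy:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=10_000)
y = 0.5 * x + rng.normal(scale=0.8, size=10_000)   # made-up, correlated data

# Covariance matrix: variances on the diagonal, covariance off the diagonal
cov_matrix = np.cov(x, y, bias=True)
print(cov_matrix)

# Check var(X+Y) = var(X) + var(Y) + 2 cov(X,Y) on this sample
lhs = np.var(x + y)
rhs = np.var(x) + np.var(y) + 2 * cov_matrix[0, 1]
print(lhs, rhs)   # the two sides agree up to floating-point error
```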

This list is clearly not exhaustive, but I hope it gives you an idea of the wide range of applications of covariance.


The covariance helps you compute the variance of a linear combination of random variables. Given two random variables $X_1$ and $X_2$ with variances $\sigma_1^2$ and $\sigma_2^2$ and covariance $\sigma_{12}$, you can compute the variance of $c_1 X_1 + c_2 X_2$ as $c_1^2 \sigma_1^2 + c_2^2 \sigma_2^2 + 2 c_1 c_2 \sigma_{12}$.

One application is in portfolio theory. Suppose there are $n$ stocks. Each stock (or investment) $i$ has an expected return $\mu_i$ and variance $\sigma_i^2$. Typically, the larger the expected return, the larger the variance. Stocks also have covariances. Suppose stocks $i$ and $j$ have covariance $\sigma_{ij}$. Stocks of firms in the same business (like two oil companies) have positive covariance, since if the oil business becomes more profitable, both stocks increase in value. Some pairs of stocks have negative covariance, such as an oil firm and a solar panel manufacturer: if countries shift from oil to solar power, the stock value of the solar panel manufacturer goes up while the oil stocks go down, and vice versa. Now suppose you want to buy a portfolio of stocks. If you purchase $x_i$ units of stock $i$, then the expected value of your portfolio is $\sum_{i=1}^n \mu_i x_i$ and the variance is $\sum_{i=1}^n \sum_{j=1}^n \sigma_{ij} x_i x_j$ (using the notation $\sigma_{ii}$ for $\sigma_i^2$). So, the covariance helps you compute the variance of your portfolio. Modern portfolio theory is built on these formulas.
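Since it may help to see the portfolio formulas in action, here is a minimal sketch with made-up numbers for three stocks (the expected returns, covariance matrix, and holdings are purely illustrative):

```python
import numpy as np

mu = np.array([0.05, 0.07, 0.10])      # hypothetical expected returns mu_i
sigma = np.array([                     # hypothetical covariance matrix sigma_ij
    [0.010,  0.004, -0.002],
    [0.004,  0.020,  0.001],
    [-0.002, 0.001,  0.040],
])
x = np.array([100.0, 50.0, 25.0])      # units x_i of each stock purchased

expected_value = mu @ x                # sum_i mu_i x_i
portfolio_variance = x @ sigma @ x     # sum_i sum_j sigma_ij x_i x_j
print(expected_value, portfolio_variance)
```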