numpy covariance matrix

You have two vectors, not 25. The computer I'm on doesn't have python so I can't test this, but try:

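z = list(zip(x, y))  # 25 (x, y) pairs; list() is needed on Python 3, where zip is lazy
np.cov(z)            # each pair is treated as one variable, so the result is 25x25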

Of course, what you really want is probably more like:

import numpy as np

n = 100         # number of points in each vector
num_vects = 25  # number of vectors
vals = []
for _ in range(num_vects):
    vals.append(np.random.normal(size=n))
np.cov(vals)

This computes the covariance matrix of num_vects vectors of length n, treating each vector as one variable, so the result is num_vects x num_vects (25x25 here).
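The same matrix can also be built without the Python loop. A minimal sketch, assuming a seeded np.random.default_rng generator (the seed and the names rng and c are illustrative choices, not part of the answer above):

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility
n = 100          # number of points in each vector
num_vects = 25   # number of vectors, one variable per row

# Draw all vectors at once: one row per variable, one column per observation.
vals = rng.normal(size=(num_vects, n))
c = np.cov(vals)

print(c.shape)  # (25, 25)
```

Each diagonal entry of c is the sample variance of one vector, and the matrix is symmetric.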


Try this:

import numpy as np
x = np.random.normal(size=25)
y = np.random.normal(size=25)
z = np.vstack((x, y))  # shape (2, 25): one row per vector
c = np.cov(z.T)        # z.T has 25 rows, each treated as a variable: c is 25x25
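Transposing is not the only way to get this result. As a hedged sketch (the seed and the names c_t and c_rv are illustrative), passing rowvar=False to np.cov should give the same 25x25 matrix without the explicit transpose:

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility
x = rng.normal(size=25)
y = rng.normal(size=25)
z = np.vstack((x, y))           # shape (2, 25)

c_t = np.cov(z.T)               # explicit transpose, rows are variables (default)
c_rv = np.cov(z, rowvar=False)  # no transpose, columns are variables

print(c_t.shape)                # (25, 25)
print(np.allclose(c_t, c_rv))   # True
```

With only 2 observations per variable, this 25x25 matrix has rank at most 1, so it is rarely useful for estimation.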

Covariance matrix from sample vectors

To clear up the confusion about what a covariance matrix built from two N-dimensional vectors means, there are two possibilities.

The question you have to ask yourself is whether you consider:

  • each vector as N realizations/samples of one single variable (for example two 3-dimensional vectors [X1,X2,X3] and [Y1,Y2,Y3], where you have 3 realizations for the variables X and Y respectively)
  • each vector as 1 realization for N variables (for example two 3-dimensional vectors [X1,Y1,Z1] and [X2,Y2,Z2], where you have 1 realization for the variables X,Y and Z per vector)

Since each entry of a covariance matrix measures how two variables vary together:

  • in the first case, you have 2 variables with N sample values each, so you end up with a 2x2 matrix whose entries are computed from N samples per variable
  • in the second case, you have N variables with 2 samples each, so you end up with an NxN matrix

About the actual question, using numpy

If you consider that each vector holds 25 variables (3 instead of 25 here, to keep the example code short), that is, one realization of several variables per vector, use rowvar=0:

import numpy

# [X1,Y1,Z1]
X_realization1 = [1,2,3]

# [X2,Y2,Z2]
X_realization2 = [2,1,8]

numpy.cov([X_realization1,X_realization2],rowvar=0) # rowvar false, each column is a variable

The code returns the following, treating the data as 3 variables:

array([[ 0.5, -0.5,  2.5],
       [-0.5,  0.5, -2.5],
       [ 2.5, -2.5, 12.5]])

Otherwise, if you consider that each vector holds 25 samples of one variable, use rowvar=1 (numpy's default):

# [X1,X2,X3]
X = [1,2,3]

# [Y1,Y2,Y3]
Y = [2,1,8]

numpy.cov([X,Y],rowvar=1) # rowvar true (default), each row is a variable

The code returns the following, treating the data as 2 variables:

array([[ 1.        ,  3.        ],
       [ 3.        , 14.33333333]])
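To see where these numbers come from, the 2x2 result can be reproduced by hand. This is a minimal sketch, assuming numpy's default normalization by N - 1 (the names data, centered and manual are illustrative):

```python
import numpy as np

X = np.array([1, 2, 3], dtype=float)
Y = np.array([2, 1, 8], dtype=float)

data = np.vstack((X, Y))                            # shape (2, 3): one row per variable
centered = data - data.mean(axis=1, keepdims=True)  # subtract each variable's mean
n_samples = data.shape[1]

# Sample covariance: sum of products of deviations, divided by N - 1.
manual = centered @ centered.T / (n_samples - 1)

print(manual)
print(np.allclose(manual, np.cov(data)))  # True
```

The diagonal holds the variances of X and Y (1 and about 14.33), and the off-diagonal entry 3 is their covariance.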