scipy - generate random variables with correlations

It seems like a rejection-based sampling method such as the Metropolis-Hastings algorithm is what you want. SciPy offers a related mechanism in its scipy.optimize.basinhopping function, which uses a Metropolis acceptance criterion internally.

Rejection-based sampling methods allow you to draw samples from any given probability distribution. The idea is that you draw random samples from another "proposal" pdf that is easy to sample from (such as uniform or gaussian distributions) and then use a random test to decide if this sample from the proposal distribution should be "accepted" as representing a sample of the desired distribution.

The remaining tricks will then be:

  1. Figure out the form of the joint N-dimensional probability density function which has marginals of the form you want along each dimension, but with the correlation matrix that you want. This is easy to do for the Gaussian distribution, where the desired correlation matrix and mean vector is all you need to define the distribution. If your marginals have a simple expression, you can probably find this pdf with some straightforward-but-tedious algebra. This paper cites several others which do what you are talking about, and I'm certain that there are many more.

  2. Formulate a function for basinhopping to minimize such that its accepted "minima" amount to samples of the pdf you have defined.

Given the results of (1), (2) should be straightforward.
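To make the rejection/acceptance idea concrete, here is a minimal random-walk Metropolis sketch for an arbitrary (possibly unnormalized) target density. The bivariate-normal target, the 0.8 correlation, and the step size are illustrative placeholders, not part of the original question (a Gaussian target could of course be sampled directly):

```python
import numpy as np

def metropolis_hastings(log_pdf, x0, n_samples, step=0.5, rng=None):
    """Random-walk Metropolis sampler for an arbitrary log-density."""
    rng = np.random.default_rng(rng)
    x = np.asarray(x0, dtype=float)
    samples = np.empty((n_samples, x.size))
    log_p = log_pdf(x)
    for i in range(n_samples):
        # propose a Gaussian step around the current point
        proposal = x + step * rng.standard_normal(x.size)
        log_p_new = log_pdf(proposal)
        # accept with probability min(1, p(proposal) / p(x))
        if np.log(rng.random()) < log_p_new - log_p:
            x, log_p = proposal, log_p_new
        samples[i] = x
    return samples

# illustrative target: bivariate standard normal with correlation 0.8
cov_inv = np.linalg.inv(np.array([[1.0, 0.8], [0.8, 1.0]]))

def log_target(x):
    return -0.5 * x @ cov_inv @ x

draws = metropolis_hastings(log_target, x0=[0.0, 0.0],
                            n_samples=20_000, rng=0)
```

The same skeleton works for any joint pdf produced in step (1): only `log_target` changes.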


If you just want correlation through a Gaussian Copula (*), then it can be calculated in a few steps with numpy and scipy.

  • draw correlated standard normal variables with the desired correlation matrix using numpy.random.multivariate_normal, giving an (nobs, k_variables) array

  • apply scipy.stats.norm.cdf column by column to transform the normals into uniform random variables, so each marginal is uniform on [0, 1]

  • apply dist.ppf column by column to transform the uniform margins into the desired marginals, where dist can be any of the distributions in scipy.stats

(*) The Gaussian copula is only one choice, and it is not the best one when you are interested in tail behavior, but it is the easiest to work with; see for example http://archive.wired.com/techbiz/it/magazine/17-03/wp_quant?currentPage=all
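The three steps above can be sketched as follows; the 0.7 correlation and the gamma/beta marginals are arbitrary illustrative choices, not from the question:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
nobs = 10_000

# step 1: correlated standard normals with the desired correlation
corr = np.array([[1.0, 0.7],
                 [0.7, 1.0]])
z = rng.multivariate_normal(mean=[0.0, 0.0], cov=corr, size=nobs)

# step 2: norm.cdf maps each column to uniform marginals on [0, 1]
u = stats.norm.cdf(z)

# step 3: the ppf of any scipy.stats distribution maps the uniforms
# to the desired marginals (gamma and beta are placeholders here)
x0 = stats.gamma(a=2.0).ppf(u[:, 0])
x1 = stats.beta(a=2.0, b=5.0).ppf(u[:, 1])
```

Note that the monotone cdf/ppf transforms preserve the rank (Spearman) correlation of the underlying normals, but the resulting Pearson correlation of `x0` and `x1` will generally differ from 0.7.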

Two references:

https://stats.stackexchange.com/questions/37424/how-to-simulate-from-a-gaussian-copula

http://www.mathworks.com/products/demos/statistics/copulademo.html

(I might have done this a while ago in Python, but I don't have any scripts or functions right now.)


If you already have a positive semi-definite correlation matrix R [n x n], it is easy to build a NormalCopula taking R as input. I'll show you an example with n = 3. The code is based on the OpenTURNS library.

import openturns as ot

# you can replace this part by your matrix
dim = 3
R = ot.CorrelationMatrix(dim)
R[0,1] = 0.25
R[0,2] = 0.6
R[1,2] = 0.9

copula = ot.NormalCopula(R)

If you want to draw a sample of a given size, just write

size = 5
print(copula.getSample(size))
>>>    [ X0       X1       X2       ]
0 : [ 0.355353 0.76205  0.632379 ]
1 : [ 0.902567 0.984443 0.989552 ]
2 : [ 0.423219 0.811016 0.754304 ]
3 : [ 0.303776 0.471557 0.450188 ]
4 : [ 0.746168 0.918729 0.891347 ]

EDIT - Following the comment of @Michael_Baudin

Of course, if you want to set the marginal distributions, e.g. to LogNormal, Beta, and Uniform marginals, that's also possible:

X0 = ot.LogNormal(0.1, 1, 0)
X1 = ot.Beta()
X2 = ot.Uniform(1.0, 2.0)
distribution = ot.ComposedDistribution([X0, X1, X2], copula)  # reuse the copula built above
print(distribution.getSample(size))
>>> [ X0         X1         X2         ]
0 : [  3.97678    0.158823   1.75635   ]
1 : [  1.18929   -0.554092   1.18952   ]
2 : [  2.59542    0.0751359  1.68599   ]
3 : [  1.33363   -0.18407    1.42241   ]
4 : [  1.34084    0.198019   1.6553    ]