Multiplying Numpy/Scipy Sparse and Dense Matrices Efficiently

The reason the dot product runs into memory issues when computing r = dot(C, Y) is that numpy's dot function has no native support for sparse matrices. Numpy treats the sparse matrix C as a generic Python object rather than a numpy array. If you inspect the problem on a small scale you can see it first hand:

>>> from numpy import dot, array
>>> from scipy import sparse
>>> Y = array([[1,2],[3,4]])
>>> C = sparse.csr_matrix(array([[1,0], [0,2]]))
>>> dot(C,Y)
array([[  (0, 0)    1
  (1, 1)    2,   (0, 0) 2
  (1, 1)    4],
  [  (0, 0) 3
  (1, 1)    6,   (0, 0) 4
  (1, 1)    8]], dtype=object)

Clearly the above is not the result you are interested in. What you want instead is to compute the product with scipy's sparse-aware dot:

r = sparse.csr_matrix.dot(C, Y)

or more compactly

r = C.dot(Y)
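
For reference, repeating the small example with the sparse-aware product gives the expected dense 2x2 result (a quick check; depending on your scipy version the return value may be a numpy.matrix rather than an ndarray, hence the asarray below):

from numpy import array, asarray
from scipy import sparse

Y = array([[1, 2], [3, 4]])
C = sparse.csr_matrix(array([[1, 0], [0, 2]]))

r = asarray(C.dot(Y))  # sparse-aware product, converted to a plain ndarray
print(r)
# [[1 2]
#  [6 8]]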

Try the following, which exploits the fact that C is diagonal and mostly zero:

import numpy as np
from scipy import sparse

f = 100
n = 300000

Y = np.random.rand(n, f)
Cdiag = np.random.rand(n)            # diagonal of C
Cdiag[np.random.rand(n) < 0.99] = 0  # make C roughly 99% sparse

# Compute Y.T * C * Y, skipping zero elements
mask = np.flatnonzero(Cdiag)  # indices of the nonzero diagonal entries
Cskip = Cdiag[mask]           # the corresponding nonzero diagonal values

def ytcy_fast(Y):
    Yskip = Y[mask, :]           # keep only the rows that C does not zero out
    CY = Cskip[:, None] * Yskip  # scale each remaining row by its diagonal entry (broadcasting)
    return Yskip.T.dot(CY)       # accumulate Y.T * C * Y over the nonzero rows

%timeit ytcy_fast(Y)

# For comparison: all-sparse matrices
C_sparse = sparse.spdiags([Cdiag], [0], n, n)
Y_sparse = sparse.csr_matrix(Y)
%timeit Y_sparse.T.dot(C_sparse * Y_sparse)
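
As a sanity check (not part of the original timings), both expressions compute the same f x f matrix up to floating-point error:

# The masked dense computation and the all-sparse one should agree
print(np.allclose(ytcy_fast(Y), Y_sparse.T.dot(C_sparse * Y_sparse).toarray()))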

My timings:

In [59]: %timeit ytcy_fast(Y)
100 loops, best of 3: 16.1 ms per loop

In [18]: %timeit Y_sparse.T.dot(C_sparse * Y_sparse)
1 loops, best of 3: 282 ms per loop

First, are you really sure you need to perform a full matrix inversion in your problem? Most of the time, one only really needs to compute x = A^-1 y, which is a much easier problem to solve.
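
For example, if A is sparse and you only need x = A^-1 y for a given right-hand side y, a direct sparse solve avoids forming the inverse entirely. A minimal sketch with scipy.sparse.linalg.spsolve, using a made-up diagonally dominant matrix A and a random y:

import numpy as np
from scipy import sparse
from scipy.sparse.linalg import spsolve

n = 1000
# Made-up example: a sparse, diagonally dominant (hence well-conditioned) matrix
A = sparse.rand(n, n, density=0.01, format='csc') + n * sparse.eye(n, format='csc')
y = np.random.rand(n)

x = spsolve(A, y)                # solves A x = y without ever forming A^-1
print(np.allclose(A.dot(x), y))  # True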

If you really do need the inverse, I would consider computing an approximation of the inverse matrix instead of the full inversion, since exact matrix inversion is really costly. See, for example, the Lanczos algorithm for an efficient approximation of the inverse. As a bonus, the approximation can be stored sparsely. Moreover, it requires only matrix-vector operations, so you don't even have to store the full matrix you want to invert.
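
To illustrate the matrix-vector-only idea with standard tools, here is a hedged sketch using scipy.sparse.linalg instead of Lanczos proper: a LinearOperator exposes nothing but a matvec, and conjugate gradient (a Krylov solver closely related to Lanczos, valid for symmetric positive definite operators) applies the inverse to a vector without ever storing the matrix. The tridiagonal operator below is made up for illustration:

import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

n = 100000

def matvec(v):
    # Applies a symmetric positive definite tridiagonal operator
    # (3 on the diagonal, -1 on the off-diagonals) without storing it
    out = 3.0 * v
    out[:-1] -= v[1:]
    out[1:] -= v[:-1]
    return out

A = LinearOperator((n, n), matvec=matvec, dtype=np.float64)
y = np.random.rand(n)

x, info = cg(A, y)  # only matrix-vector products are ever performed
print(info, np.linalg.norm(A.matvec(x) - y) / np.linalg.norm(y))  # info == 0 means converged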

As an alternative, with pyoperators you can also use the .todense method to compute the matrix to invert using efficient matrix-vector operations. There is a special sparse container for diagonal matrices.
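
A hedged sketch of that idea, under the assumption that pyoperators provides a DiagonalOperator container and an Operator.todense() method (check the pyoperators documentation for the exact API of your installed version):

import numpy as np
from pyoperators import DiagonalOperator  # assumed pyoperators API

n = 1000
d = np.random.rand(n)

C = DiagonalOperator(d)  # stores only the diagonal, never the full n x n matrix
v = np.random.rand(n)
w = C(v)                 # applying the operator is just d * v

# If a dense matrix is really required, todense() materializes it
# through matrix-vector products
C_dense = C.todense()
print(np.allclose(C_dense, np.diag(d)))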

For an implementation of the Lanczos algorithm, you can have a look at pyoperators (disclaimer: I am one of the coauthors of this piece of software).