How to employ something such as openMP in Cython?

This question is from 3 years ago and nowadays Cython has available functions that support the OpenMP backend. See for example the documentation here. One very convenient function is the prange. This is one example of how a (rather naive) dot function could be implemented using prange.

Don't forget to compile passing the "/opemmp" argument to the C compiler.

import numpy as np
cimport numpy as np
import cython
from cython.parallel import prange

ctypedef np.double_t cDOUBLE
DOUBLE = np.float64

def mydot(np.ndarray[cDOUBLE, ndim=2] a, np.ndarray[cDOUBLE, ndim=2] b):

    cdef np.ndarray[cDOUBLE, ndim=2] c
    cdef int i, M, N, K

    c = np.zeros((a.shape[0], b.shape[1]), dtype=DOUBLE)
    M = a.shape[0]
    N = a.shape[1]
    K = b.shape[1]

    for i in prange(M, nogil=True):
        multiply(&a[i,0], &b[0,0], &c[i,0], N, K)

    return c

@cython.wraparound(False)
@cython.boundscheck(False)
@cython.nonecheck(False)
cdef void multiply(double *a, double *b, double *c, int N, int K) nogil:
    cdef int j, k
    for j in range(N):
        for k in range(K):
            c[k] += a[j]*b[k+j*K]

If somebody stumbles over this question:

Now, there is direct support for OpenMP in cython via the cython.parallel module, see http://docs.cython.org/src/userguide/parallelism.html