Scipy: Speeding up calculation of a 2D complex integral

You can gain a factor of about 10 in speed by using Cython, see below:

In [87]: %timeit cythonmodule.doit(lam=lam, y0=y0, zxp=zxp, z=z, k=k, ra=ra)
1 loops, best of 3: 501 ms per loop
In [85]: %timeit doit()
1 loops, best of 3: 4.97 s per loop

This is probably not enough, and the bad news is that this is probably quite close (maybe factor of 2 at most) to everything-in-C/Fortran speed --- if using the same algorithm for adaptive integration. (scipy.integrate.quad itself is already in Fortran.)

To get further, you'd need to consider different ways to do the integration. This requires some thinking --- can't offer much from the top of my head now.

Alternatively, you can reduce the tolerance up to which the integral is evaluated.

# Do in Python
#
# >>> import pyximport; pyximport.install(reload_support=True)
# >>> import cythonmodule

cimport numpy as np
cimport cython

cdef extern from "complex.h":
    double complex csqrt(double complex z) nogil
    double complex cexp(double complex z) nogil
    double creal(double complex z) nogil
    double cimag(double complex z) nogil

from libc.math cimport sqrt

from scipy.integrate import dblquad

cdef class Params:
    cdef public double lam, y0, k, zxp, z, ra

    def __init__(self, lam, y0, k, zxp, z, ra):
        self.lam = lam
        self.y0 = y0
        self.k = k
        self.zxp = zxp
        self.z = z
        self.ra = ra

@cython.cdivision(True)
def integrand_real(double x, double y, Params p):
    R1 = sqrt(x**2 + (y-p.y0)**2 + p.z**2)
    R2 = sqrt(x**2 + y**2 + p.zxp**2)
    return creal(cexp(1j*p.k*(R1-R2)) * (-1j*p.z/p.lam/R2/R1**2) * (1+1j/p.k/R1))

@cython.cdivision(True)
def integrand_imag(double x, double y, Params p):
    R1 = sqrt(x**2 + (y-p.y0)**2 + p.z**2)
    R2 = sqrt(x**2 + y**2 + p.zxp**2)
    return cimag(cexp(1j*p.k*(R1-R2)) * (-1j*p.z/p.lam/R2/R1**2) * (1+1j/p.k/R1))

def ymax(double x, Params p):
    return sqrt(p.ra**2 + x**2)

def doit(lam, y0, k, zxp, z, ra):
    p = Params(lam=lam, y0=y0, k=k, zxp=zxp, z=z, ra=ra)
    rr, err = dblquad(integrand_real, -ra, ra, lambda x: -ymax(x, p), lambda x: ymax(x, p), args=(p,))
    ri, err = dblquad(integrand_imag, -ra, ra, lambda x: -ymax(x, p), lambda x: ymax(x, p), args=(p,))
    return rr + 1j*ri

Have you considered multiprocessing (multithreading)? It seems that you don't have a need to do a final integration (over the whole set) so simple parallel processing might be the answer. Even if you did have to integrate, you can wait for running threads to finish computation before doing the final integration. That is, you can block the main thread until all workers have completed.

http://docs.python.org/2/library/multiprocessing.html