Expected distance between two uniform points in distinct rectangles

I implemented my proposal in C. It is a mixture of analytic and numeric integration. It does $10^6$ pairs of rectangles with half-percent relative precision in about 16 seconds, which is a bit better than Iosif's corresponding 30 minutes. You can also play with the parameters to trade speed for precision and vice versa. The code should be self-explanatory, but feel free to ask questions if something is unclear.

Edit: This is the best and the fastest version. $n$ is gone now and the guaranteed relative precision is $1/N^2$ (the constant $1$ is correct), so if you want $10^{-3}$ accuracy (to compare with the Mathematica time), just set $N=34$ and get $10^6$ pairs in under 10 seconds. The time is essentially proportional to $N$. For $10^{-5}$ accuracy, $N=340$ and 83 seconds suffice. I'll explain the algorithm a bit later; now it makes sense :-)

Edit 2: The outline of the algorithm.

We shall use averaging over projections. If we take $N$ lines $L_j$ through the origin with equally spaced directions, then the approximate formula is $$ |z|\approx \frac{\pi}{2}\frac 1N\sum_{j=1}^N |P_j z| $$
where $P_j$ is the orthogonal projection onto the line $L_j$. The relative accuracy of this approximation is easy to compute: the ratio of the two sides is, as I said, within $1\pm N^{-2}$. The computation of the averaged projections is going to be exact.

For each projection, we need to evaluate the convolution of $A(z)=|z|$ with four normalized characteristic functions $F_j$ of intervals $[-U_j,U_j]$ at some point $x$. We arrange the $U_j$ in increasing order, so that $U_0\le U_1\le U_2\le U_3$, and do the honest convolution of the absolute value with the third and the fourth function, so we have an explicit formula for $A*F_2*F_3$, which is a cubic spline with partition points $\pm U_3\pm U_2$. The convolution $F_0*F_1$ is just a linear spline, which, when shifted to $x$, has the partition points $x\pm U_0\pm U_1$. We thus need to integrate the product of the two splines, which is a fourth-degree spline with known partition points. This is done by arranging the partition points in increasing order and applying the 3-node Gauss quadrature, which is exact for polynomials of degree up to $5$, on each partition interval in the support of $F_0*F_1$. That's it.

I tried to implement it in the fastest way possible, so some parts may look a bit strange. The function $gghh()$ is essentially the product of $A*F_3*F_2$ and $F_1*F_0$, the function $F()$ does the integration job over a single partition interval (up to a constant) and $D()$ takes care of setting the projections and determining the partition intervals of interest. However, once the idea of the program is clear, you can certainly try to see if your code writing skills are better than mine :-)

#include <stdlib.h>
#include <stdio.h>
#include <math.h>
#include <time.h>

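/* Macros rather than file-scope const doubles: ISO C requires constant
   initializers at file scope, so pi/57.6 and sqrt(0.6)/2 there compile
   only as C++ or through compiler extensions. */
#define pi  3.14159265358979323846
#define ppi (pi/57.6)            /* (pi/2)*(5/72)*(1/2): all constant factors combined */
#define pi2 (pi/2)
#define dl  0.3872983346207417   /* sqrt(0.6)/2, Gauss node offset per unit interval length */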

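/* Integrand up to constants, with a=U3, b=U2, c=U1, d=U0:
   g = a*(A*F_3*F_2)(t) and h = 4*c*d*(F_1*F_0)(t-x). */
double gghh(double a, double b, double c, double d, double x, double t)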
{
double y=fabs(t), g=y*a, h=2*d;
if(y<=a-b) g=(a*a+y*y+b*b/3)*0.5; 
else if(y<a+b) {double z=a+b-y; g+=z*z*z/(12*b);}

y=fabs(t-x); 
if(y>c-d) h-=(y-c+d);
return g*h;
}

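/* 3-node Gauss quadrature of gghh over [aa,bb]; the weights 5/9, 8/9, 5/9
   are rescaled to 1, 1.6, 1 and the remaining constant is absorbed into ppi. */
double F(double a, double b, double c, double d, double x, double aa, double bb)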
{
double t2=(aa+bb)*0.5, bbaa=bb-aa, dt=dl*bbaa;
return bbaa*(gghh(a,b,c,d,x,t2-dt)+gghh(a,b,c,d,x,t2+dt)+1.6*gghh(a,b,c,d,x,t2))/(a*c*d);
}

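/* Expected distance between uniform points in [a1,b1]x[c1,d1] and
   [a2,b2]x[c2,d2], averaged over N projection directions. */
double D(double a1,double b1, double c1, double d1, double a2,double b2, double c2, double d2, int N)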
{
double s=0.0;
double X1=b1-a1, Y1=d1-c1, X2=b2-a2, Y2=d2-c2, S1=(a2+b2-a1-b1), S2=(c2+d2-c1-d1);

double t0=pi2/N, cs=cos(t0), ss=sin(t0), dcs=2*cs*cs-1, dss=2*cs*ss; 
double SS=fabs(S1)+fabs(S2)+fabs(X1)+fabs(X2)+fabs(Y1)+fabs(Y2);
SS*=0.00000001;
for(int k=0; k<N;++k)
{ 
double csnew=cs*dcs-ss*dss;
ss=ss*dcs+cs*dss; cs=csnew;
double U[4]={fabs(X1*cs)+SS, fabs(Y1*ss)+SS, fabs(X2*cs)+SS, fabs(Y2*ss)+SS};
double x=-fabs(S1*cs+S2*ss);


for(int kk=0;kk<3;++kk)
{
int kkk=3-kk;
for(int j=0;j<kkk;++j)
if(U[j]>U[j+1]) {double u=U[j]; U[j]=U[j+1]; U[j+1]=u;}
}


double U0=U[0], U1=U[1], U2=U[2], U3=U[3];

double V[4]={-U3-U2,-U3+U2,U3-U2,U3+U2}, 
VV[4]={x-U1-U0,x-U1+U0,x+U1-U0,x+U1+U0};

double W[8]; 
int i=0, ii=0, kstart=-1, kfinish=-1;
while(ii<4)
{
++kfinish; 
if(V[i]<VV[ii]) {W[kfinish]=V[i]; ++i;}
else {W[kfinish]=VV[ii]; if(ii==0) kstart=kfinish; ++ii;} 
}

for(int kk=kstart;kk<kfinish;++kk)
s+=F(U3,U2,U1,U0,x,W[kk],W[kk+1]);
}
return ppi*s/N;
}



double unitrand()
{
return (rand()+0.0)/RAND_MAX;
}


int main()
{
time_t now=time(0);
srand(now); 

int N=1000;

double m=100,M=0;

for(int k=0; k<1000000;++k)
{
if(k%10000==0) {printf("%d %.12f %.12f\n",k/10000,m,M);}
double 
a1=unitrand(),b1=a1+unitrand(),
a2=unitrand(),b2=a2+unitrand(),
c1=unitrand(),d1=c1+unitrand(),
c2=unitrand(),d2=c2+unitrand();

double r=D(a1,b1,c1,d1,a2,b2,c2,d2,N);

if(k%1000==0)
{
r/=D(a1,b1,c1,d1,a2,b2,c2,d2,600);
if(r<m) m=r;
if(r>M) M=r;
}
}
printf("\n%.12f",D(1,2,3,5,4,6,7,8,4000));
printf("\n%.12f",D(1,2,3,5,4,6,7,8,N)/D(1,2,3,5,4,6,7,8,2000)-1);
printf("\n%.12f",D(0,2,0,2,0,2,0,2,N)/D(0,2,0,2,0,2,0,2,2000)-1);
printf("\n%.12f",D(0,3,0,0.0001,0,3,0,0.0001,N)-1);
printf("\n%.12f",D(0,2,0,0,0,0,0,2,N)/D(0,2,0,0,0,0,0,2,2000)-1);
printf("\n%.12f",D(0,0,0,0,3,3,4,4,N)/5-1);
return 0;
}

Let $r=\frac12\sqrt{(a_1-a_2+b_1-b_2)^2 + (c_1-c_2+d_1-d_2)^2}$, which is the distance between the centers of the rectangles.

Then, by the first-order Taylor expansion of the square root about $r^2$, the distance between $(x_1,y_1)$ and $(x_2,y_2)$ is $$\frac{r}{2}+\frac{(x_1-x_2)^2+(y_1-y_2)^2}{2r}+O\left(\left((x_1-x_2)^2+(y_1-y_2)^2-r^2\right)^2\right)$$

The expected value of the constant and first-order terms simplifies to $$r+\frac{(a_1-b_1)^2+(a_2-b_2)^2+(c_1-d_1)^2+(c_2-d_2)^2}{24r}$$
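The simplification can be spelled out as follows, assuming that $x_i$ is uniform on $[a_i,b_i]$ and $y_i$ is uniform on $[c_i,d_i]$, all independent. The shorthand $s=(x_1-x_2)^2+(y_1-y_2)^2$ and $Q=(a_1-b_1)^2+(a_2-b_2)^2+(c_1-d_1)^2+(c_2-d_2)^2$ is introduced here only for this note. A uniform variable on an interval of length $\ell$ has variance $\ell^2/12$, so $$E(x_1-x_2)^2=\left(\frac{a_1+b_1}{2}-\frac{a_2+b_2}{2}\right)^2+\frac{(b_1-a_1)^2+(b_2-a_2)^2}{12},$$ and likewise for the $y$-coordinates. Adding the two gives $E\,s=r^2+Q/12$, hence $$\frac r2+\frac{E\,s}{2r}=\frac r2+\frac{r^2+Q/12}{2r}=r+\frac{Q}{24r}.$$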

As an example, the expected distance between points in $[1,2]\times[3,5]$ and $[4,6]\times[7,8]$ is actually $4.99$, and this approximation gives $5.03$. Perhaps that's good enough; it would depend on the particular parameters you have in mind.


I see no problem here with just using cubature formulas; see e.g. this paper and references there.

Mathematica computes $10^3$ expectations like this in about 2 sec. So, you can expect $10^5$ expectations like this to be computed in about 3 min.

[Image: the corresponding Mathematica notebook]