A quick algorithm for calculating the $\ell_1$-distance between two finite sets on the real line?

Of course one does not want brute force, but the case count is not as bad as suggested. I think a reasonable upper bound on the number of cases might be (a shift of) the Padovan sequence $1,1,1,2,2,3,4,5,7,9,12,\cdots$ with $p_n=p_{n-2}+p_{n-3}$, which is $O(\theta^n)$ for $\theta\approx 1.3247$, the unique real root of $t^3-t-1$. This is based on the idea that the hardest case is when the points are alternately in $A,B,A,B,\cdots$.
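As a quick numerical sanity check of the growth rate, here is a minimal Python sketch. The function name `padovan` and the choice of 60 terms are illustrative assumptions, not part of the argument; the code only generates the sequence from $p_n=p_{n-2}+p_{n-3}$ and compares the ratio of consecutive terms with the real root of $t^3-t-1$.

```python
def padovan(count):
    """Return the first `count` Padovan numbers, starting 1, 1, 1."""
    terms = [1, 1, 1]
    while len(terms) < count:
        # p_n = p_{n-2} + p_{n-3}
        terms.append(terms[-2] + terms[-3])
    return terms[:count]


terms = padovan(60)
# The ratio of consecutive terms approaches theta, the real root of t^3 - t - 1.
ratio = terms[-1] / terms[-2]
residual = ratio**3 - ratio - 1

print(padovan(11))
print(ratio, residual)
```

The residual should be zero up to floating-point error, which is consistent with the $O(\theta^n)$ bound.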

I will sketch this and give similar bounds. Consider the $n-1$ intervals $I_j=[x_j,x_{j+1}]$ (with the points $x_1 \lt x_2\lt \cdots \lt x_n$). Then $\Gamma$ is one of the $2^{n-1}$ ways to pick some of those sub-intervals. If $A,B$ are disjoint that falls to $2^{n-3}$ since we can't omit $I_1$ nor $I_{n-1}$, and in fact to $F_{n-2} \lt 2^{n-3}$ since we can't omit two in a row. Of course there is no advantage to keeping all the intervals. So far we haven't taken into account which points are in $A$ and which in $B$. Here is why strict alternation is harder than the other cases. Given a run $x_{i+1},\cdots, x_j$ all in $A$ but not $B$, we can't omit more than one of $I_i,\cdots, I_j$ and we should omit one. If one of the interior intervals $I_{i+1},\cdots,I_{j-1}$ is of the maximal length we should omit it. Otherwise the choices are the longest of those (which is second or third longest) or $I_i$ or $I_j$. (We may need to see the other choices first before deciding on the second or third longest.) A point in $A \cap B$ splits the problem into two almost disjoint ones.

Assuming the $n$ points $x_1,x_2,x_3,\cdots,x_n$ do alternate $A,B,A,B,\cdots,$ let $c_n$ be the number of possible cases for $\Gamma$ where we assume nothing about the actual values of the $x_i$ except that they increase. Then $c_2=c_3=c_4=1.$ I will now observe that $c_n=c_{n-2}+c_{n-3}$ for $n \ge 5.$

We will have to use the interval $I_1$. Record this, then shift to considering $x_2,x_3,\cdots,x_n$ where $x_2$ is now considered universal, i.e. in $A \cap B$. If we do not use $[x_2,x_3]$ then we have the $c_{n-2}$ cases for $x_3,x_4,\cdots,x_n$. If we do use $[x_2,x_3]$, it is for the benefit of $x_3$, in which case we do not use $[x_3,x_4]$. This gives the $c_{n-3}$ cases for $x_4,x_5,\cdots, x_n$.

Once the members of $A \cup B$ are listed in increasing order we can, in $O(n)$ time, list for each $x_i$ its closest neighbours $x_j \le x_i \le x_k$ in the other set, i.e. with $x_j,x_k \in B$ if $x_i \in A$, and with $x_j,x_k \in A$ if $x_i \in B$. I feel that $O(n)$ time might be enough to finish, but I could be wrong.

LATER: A few more observations.

Every complete interval in $\Gamma$ is of type $AB$, $ABA$, $BA$ or $BAB$, where type $AB$ means one or more points, all from $A$ (in increasing order), followed by one or more all from $B$. However, type $ABA$ means that with one exception every point is from $A$, and the exceptional $B$ is not an endpoint.

In particular, if the points happen to be alternately from $A$ and $B$ (the hardest case in my opinion) then $\Gamma$ consists of a subset of the intervals $I_1,\cdots,I_{n-1}$ which never uses three in a row nor skips two subintervals in a row. In other words, looking at how the points are grouped, the number of possibilities is the number of ways to write $n$ as an ordered sum of $2$s and $3$s.
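The count for the alternating case can be checked by brute force for small $n$. The sketch below is only an illustration, with hypothetical function names: it enumerates all subsets of the $n-1$ intervals, keeps those that split the points into consecutive groups of size $2$ or $3$, and compares the total with the recurrence $c_n=c_{n-2}+c_{n-3}$, $c_2=c_3=c_4=1$.

```python
from itertools import product


def count_by_recurrence(n):
    """c_n from c_2 = c_3 = c_4 = 1 and c_n = c_{n-2} + c_{n-3}."""
    c = {2: 1, 3: 1, 4: 1}
    for m in range(5, n + 1):
        c[m] = c[m - 2] + c[m - 3]
    return c[n]


def count_by_enumeration(n):
    """Count interval subsets whose point groups all have size 2 or 3."""
    total = 0
    for used in product([False, True], repeat=n - 1):
        sizes = []
        size = 1  # number of points in the current group
        for interval_used in used:
            if interval_used:
                size += 1
            else:
                sizes.append(size)
                size = 1
        sizes.append(size)
        if all(s in (2, 3) for s in sizes):
            total += 1
    return total


for n in range(2, 15):
    print(n, count_by_recurrence(n), count_by_enumeration(n))
```

The enumeration is exponential and is meant only as a check on the counting argument, not as a way to compute $\Gamma$.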

If it happens that $I_k$ is longer than $I_{k-1}$ and $I_{k+1}$ combined, then we must not use $I_k$ and hence are forced to use both $I_{k-1}$ and $I_{k+1}$.

All this could be adapted to the general case.


While it doesn’t seem to be possible to beat the $O(n\log n)$ bound for general input, the problem can be solved in time $O(n)$ if $A\cup B$ is given sorted.

This builds on the ideas in fedja’s and Aaron Meyerowitz’s answers. As implicit in the other posts, I am assuming unit-cost real RAM or a similar model, so that arithmetic operations on real numbers are exact and take constant time.

Let $A\cup B=\{x_1,\dots,x_n\}$ with $x_1<x_2<\dots<x_n$. In order to compute $d_1(A,B)$, we will compute the sequence of numbers $$\begin{align*} u_i&=d_1\bigl(A\cap[x_1,x_i],B\cap[x_1,x_i]\bigr),\\ v_i&=d_1\bigl((A\cup\{x_i\})\cap[x_1,x_i],(B\cup\{x_i\})\cap[x_1,x_i]\bigr) \end{align*}$$ for $i=1,\dots,n$. This can be done in linear time using the following recurrence: $$\begin{align*} v_1&=0,\\ u_1&=\begin{cases}0&x_1\in A\cap B,\\\infty&\text{otherwise,}\end{cases}\\ v_{i+1}&=\min\bigl\{u_i,v_i+(x_{i+1}-x_i)\bigr\},\\ u_{i+1}&=\begin{cases} v_{i+1}&x_{i+1}\in A\cap B,\\ v_i+(x_{i+1}-x_i)&x_{i+1}\in A\smallsetminus B\text{ and }x_i\in B,\text{ or v.v.,}\\ u_i+(x_{i+1}-x_i)&x_{i+1},x_i\in A\smallsetminus B,\text{ or v.v.} \end{cases} \end{align*}$$ (Some optimization is possible, treating blocks of elements belonging to the same set in bulk. However, this does not improve the worst-case asymptotic complexity.) Note that the two cases in the minimum in the computation of $v_{i+1}$ correspond to the choice whether $[x_i,x_{i+1}]$ is in $\Gamma$ or not. It is straightforward to extend the algorithm so that it also outputs a witnessing sequence of intervals.
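A minimal Python sketch of this recurrence follows. It assumes, as the recurrence suggests, that $d_1(A,B)$ is the minimal total length of a set $\Gamma$ of intervals $[x_i,x_{i+1}]$ such that every point ends up in a connected piece containing points of both $A$ and $B$. The input format (a sorted list of triples `(x, in_a, in_b)`) and the function name are hypothetical choices made for illustration, and floating-point numbers stand in for exact reals.

```python
import math


def l1_distance_sorted(points):
    """points: list of (x, in_a, in_b), sorted by strictly increasing x,
    where each point belongs to at least one of the two sets."""
    if not points:
        return 0.0
    _, a0, b0 = points[0]
    v = 0.0                                  # v_1
    u = 0.0 if (a0 and b0) else math.inf     # u_1
    for (x_prev, a_prev, b_prev), (x, a, b) in zip(points, points[1:]):
        gap = x - x_prev
        # Either skip [x_i, x_{i+1}] (cost u_i) or use it (cost v_i + gap).
        v_next = min(u, v + gap)
        if a and b:
            u_next = v_next
        elif (a and b_prev) or (b and a_prev):
            # The previous point lies in the other set: the interval is
            # forced, and x_i becomes universal.
            u_next = v + gap
        else:
            # Both points lie in the same single set: the interval is
            # forced, and x_i still needs a partner from the other set.
            u_next = u + gap
        u, v = u_next, v_next
    return u  # u_n = d_1(A, B)


# A = {0, 3}, B = {1, 2}: the intervals [0, 1] and [2, 3] suffice.
print(l1_distance_sorted([(0, True, False), (1, False, True),
                          (2, False, True), (3, True, False)]))
```

Only the two running values $u_i$ and $v_i$ are kept, so the extra memory is constant; recovering a witnessing set of intervals would need the choices made in each minimum to be recorded as well.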

Hausdorff distance can also be computed in time $O(n)$ on sorted input, using the following pseudocode:

/* for each i, find distance to the maximal y \le x_i that belongs to the other set: */
lastA := lastB := -\infty
for i := 1 to n do:
  if x_i in A then lastA := x_i
  if x_i in B then lastB := x_i
  if x_i in A then d_i := x_i - lastB
              else d_i := x_i - lastA
/* do the same thing backwards, and compute the result along the way */
lastA := lastB := \infty
d := 0
for i := n downto 1 do:
  if x_i in A then lastA := x_i
  if x_i in B then lastB := x_i
  if x_i in A then d_i := min (d_i, lastB - x_i)
              else d_i := min (d_i, lastA - x_i)
  d := max (d, d_i)
return d
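
The pseudocode above translates almost line by line into Python. The following is a sketch under the same assumed input format as before (a sorted list of triples `(x, in_a, in_b)`, a hypothetical representation), using `math.inf` for $\pm\infty$.

```python
import math


def hausdorff_sorted(points):
    """points: list of (x, in_a, in_b), sorted by increasing x."""
    n = len(points)
    dist = [math.inf] * n

    # Forward pass: distance to the nearest point of the other set on the left.
    last_a = last_b = -math.inf
    for i, (x, in_a, in_b) in enumerate(points):
        if in_a:
            last_a = x
        if in_b:
            last_b = x
        dist[i] = x - (last_b if in_a else last_a)

    # Backward pass: same on the right, taking the maximum along the way.
    last_a = last_b = math.inf
    result = 0.0
    for i in range(n - 1, -1, -1):
        x, in_a, in_b = points[i]
        if in_a:
            last_a = x
        if in_b:
            last_b = x
        dist[i] = min(dist[i], (last_b if in_a else last_a) - x)
        result = max(result, dist[i])
    return result


# A = {0, 10}, B = {1}: the point 10 is at distance 9 from B.
print(hausdorff_sorted([(0, True, False), (1, False, True), (10, True, False)]))
```

If one of the sets is empty the function returns `math.inf`, matching the convention of the pseudocode.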

Somebody should check the argument below because I'm still terribly sleepy, but it looks like you can get away with simple dynamic programming. Indeed, let's go from the left and see what's going on. First we are forced to connect the leftmost point with the next point in the other set. That creates a certain length and a leftmost "universal point" for the remaining configuration, so that any interval containing that universal point satisfies the requirement that it has both types of points in it. You can now join the next few points backward to that universal point, but once you make a gap, you start all over and are forced to create the next universal point at some particular position.

This calls for going from the right and solving, for each position of the universal point in $A\cup B$, the optimization problem for the points to the right of that universal point. We need just to record the total length and the last point joined to the universal point in the optimal configuration to keep track of everything. At each step consider all possibilities to make a gap and the next universal point ($n$ choices at most) and minimize the sum of the required length between the universal points (finding that length takes time $n$ at most, but you can utilize the information about the previous gap choice, so, probably, constant on average) and the minimum for the next universal point (already computed). So, the total time is at most $n^3$ for sure, probably $n^2$.

Again, sorry if I wrote some nonsense.