Count number of exact matching sequences

Lemma. Let $R = R(P, T)$ be the output sequence for given strings $P$ and $T$ of lengths $n$ and $2n - 1$ respectively. Then $$R = \underbrace{00\ldots\ldots\ldots\ldots\ldots\ldots0}_{\text{non-negative number of zeros}}\underbrace{1\underbrace{0\ldots0}_{\text{$k - 1$ zeros}}1\underbrace{0\ldots0}_{\text{$k - 1$ zeros}}1\ldots 1\underbrace{0\ldots0}_{\text{$k - 1$ zeros}}1}_{\text{non-negative number of ones}}\underbrace{00\ldots\ldots\ldots\ldots\ldots\ldots0}_{\text{non-negative number of zeros}}$$ for some positive integer $k$. (In other words, the distance between any two neighbouring ones in $R$ is the same.)

Proof. Suppose, for the sake of contradiction, that $$R = \underbrace{\ldots\ldots\ldots}_{\text{whatever}}1\underbrace{00\ldots0}_{\text{$k - 1$ zeros}}1\underbrace{00\ldots0}_{\text{$\ell - 1$ zeros}}1\underbrace{\ldots\ldots\ldots}_{\text{whatever}}$$ for some positive integers $k \ne \ell$. Assume $k < \ell$; otherwise we can reverse all of $P$, $T$ and $R$.

Let $P = a_0a_1\ldots a_{n - 1}$ where $a_i$ are symbols. Let $d = \gcd\{\,k, \ell\,\}$. Then $$T = T_1P_0P_1\ldots P_mT_2,$$ where $T_i$ and $P_i$ are strings such that $P = P_0 P_1 \ldots P_{x - 1}P'_x$, $|P_0| = |P_1| = \ldots = |P_{m - 1}| = d$, $0 \le |P_m| < d$ and $P'_x = P_m$ is a prefix of $P_x$. (Here $AB$ means concatenation of strings $A$ and $B$.)

Let $K = \frac{k}{d}$ and $L = \frac{\ell}{d}$. Then $P_i = P_{i + K}$ for $0 \le i < m - K$ and $P_i = P_{i + L}$ for $0 \le i < m - L$. Also $\gcd\{\,K, L\,\} = 1$.

So $P_0 = P_{K} = P_{2K} = \ldots = P_{yK}$ for $(y - 1)K < L \le yK$. If $K \mid L$ then it is easy to see that $P$ is a prefix of $P_0P_1\ldots P_x = P_1P_2 \ldots P_{x+1}$, therefore $R$ misses at least one 1. Otherwise $(y - 1)K < L < yK$ and $P_0 = P_{yK - L} = P_{yK - L + K} = \ldots$. Iterating this process we get $P_0 = P_1 = \ldots = P_{m - 1}$ and $P_m$ is a prefix of $P_0$. Therefore $R$ misses at least one 1, and this contradiction proves the lemma. $\square$

It is easy to see that every $R$ described in the statement of the lemma is achievable, so the lemma describes all possible $R$.

To compute the number of such sequences, it is convenient to count the all-zero sequence and the sequences with exactly one 1 separately, and then, for each $k$, to count the sequences with at least two 1's: $$1 + n + \sum_{k = 1}^{n - 1} \sum_{r = 0}^{k - 1}\binom{\left\lceil\frac{n - r}{k}\right\rceil}{2}.$$

Here $k$ is the distance between neighbouring ones, $r$ is the remainder of the (zero-based) position of the first 1 modulo $k$, and $\binom{\left\lceil\frac{n - r}{k}\right\rceil}{2}$ is the number of ways to choose the first and the last 1.
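As a quick sanity check, the formula can be evaluated directly. The following is a minimal Python sketch, not part of the original answer; the function name `count_by_spacing` is made up for illustration, and the ceiling is computed with integer arithmetic to avoid floating point.

```python
from math import comb

def count_by_spacing(n: int) -> int:
    """Evaluate 1 + n + sum_{k=1}^{n-1} sum_{r=0}^{k-1} C(ceil((n-r)/k), 2)."""
    total = 1 + n  # the all-zero sequence plus the n sequences with a single 1
    for k in range(1, n):              # k: distance between neighbouring ones
        for r in range(k):             # r: position of the first 1 modulo k
            slots = -(-(n - r) // k)   # ceil((n - r) / k) using integers only
            total += comb(slots, 2)    # choose the first and the last 1
    return total

print([count_by_spacing(n) for n in range(1, 8)])
```

For $n = 4$ this evaluates to $1 + 4 + 9 = 14$.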

P. S. It is possible to show that this function is asymptotically $\frac12n^2\ln n$. Let $f(n)$ be the desired number of sequences. Using the inequality $x \le \lceil x \rceil < x + 1$ we get $$1 + n + \sum_{k = 1}^{n - 1} \sum_{r = 0}^{k - 1}\binom{\frac{n - r}{k}}{2} \le f(n) < 1 + n + \sum_{k = 1}^{n - 1} \sum_{r = 0}^{k - 1}\binom{\frac{n - r}{k} + 1}{2}\\ \sum_{k = 1}^{n - 1} \sum_{r = 0}^{k - 1}\frac12\cdot\frac{n - r}{k}\left(\frac{n - r}{k} - 1\right) \le f(n) - 1 - n < \sum_{k = 1}^{n - 1} \sum_{r = 0}^{k - 1}\frac12\cdot\frac{n - r}{k}\left(\frac{n - r}{k} + 1\right)\\ \sum_{k = 1}^{n - 1} \sum_{r = 0}^{k - 1}\frac{1}{k^2}(n - r - k)(n - r) \le 2(f(n) - 1 - n) < \sum_{k = 1}^{n - 1} \sum_{r = 0}^{k - 1}\frac{1}{k^2}(n - r)(n - r + k)\\ \sum_{k = 1}^{n - 1} \frac{1}{k^2}\sum_{r = 0}^{k - 1}(n^2 + O(kn)) \le 2(f(n) - 1 - n) < \sum_{k = 1}^{n - 1} \frac{1}{k^2}\sum_{r = 0}^{k - 1}(n^2 + O(kn))\\ \sum_{k = 1}^{n - 1} \frac{1}{k^2}(kn^2 + O(k^2n)) \le 2(f(n) - 1 - n) < \sum_{k = 1}^{n - 1} \frac{1}{k^2}(kn^2 + O(k^2n))\\ n^2\sum_{k = 1}^{n - 1} \left(\frac{1}{k} + O\left(\frac1n\right)\right) \le 2(f(n) - 1 - n) < n^2\sum_{k = 1}^{n - 1} \left(\frac{1}{k} + O\left(\frac1n\right)\right)\\ n^2\ln n \sim n^2(H_{n - 1} + O(1)) \le 2(f(n) - 1 - n) < n^2(H_{n - 1} + O(1)) \sim n^2\ln n. $$ Thus $f(n) \sim \frac12n^2\ln n$.
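The limit can also be probed numerically. The sketch below is only an illustration under my own naming (`f` and `ratio` are not from the answer): it evaluates the double sum directly and prints $f(n) / \left(\frac12 n^2 \ln n\right)$ for a few $n$. Since the bounds above leave room for a correction of order $n^2$, the ratio should be expected to approach 1 only slowly, roughly like $1 + O(1/\ln n)$.

```python
from math import comb, log

def f(n: int) -> int:
    """Number of sequences, via the double sum over spacing k and offset r."""
    total = 1 + n
    for k in range(1, n):
        for r in range(k):
            total += comb(-(-(n - r) // k), 2)  # C(ceil((n - r) / k), 2)
    return total

def ratio(n: int) -> float:
    """f(n) divided by the claimed leading term (1/2) n^2 ln n."""
    return f(n) / (0.5 * n * n * log(n))

for n in (10, 100, 1000):
    print(n, f(n), round(ratio(n), 4))
```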


\begin{align} f(0) &= 1 \\ f(1) &= 2 \\ f(n) &= 2f(n-1)-f(n-2)+\sigma_0(n-1) \\ &= 1+n+\sum_{i=1}^{n-1}i\cdot\sigma_0(n-i) \end{align}

Here $\sigma_0$ is the divisor-counting function (A000005 on OEIS). I don't have a full formal proof, but I can sketch it.

Instead of looking at all possible $P$ and $T$, we're going to look directly at possible matching sequences $M$ to enumerate those. $M$ is always a possibility if it's all zeros (consider $P$ all ones, $T$ all zeros). $M$ is also always a possibility if it contains exactly one $1$ (consider $P$ all ones, $T$ all zeros except a run of $n$ ones in the right position).

What if $M$ contains multiple $1$s? If $M$ contains two $1$s which are $k$ positions apart, then all $P$ that can lead to $M$ have to have period $k$. This is easy to see if we look at an example. Consider $M=??1?1???$, where the $?$ are arbitrary, i.e. $k=2$. We have $P=abcdefgh$, where the $a$ to $h$ are (independently) $0$ or $1$. Those two $1$s in $M$ impose a certain structure on $T$:

    M = ??101???
    T = ??abcdefgh?????   imposed by first 1
    T = ????abcdefgh???   imposed by second 1

From this, we can see that $a=c=e=g$ and $b=d=f=h$ and hence $P$ needs to have period $2$.

Now since $T$ has length $2n-1$ all the constraints imposed by the $1$s in $M$ overlap in at least one position. Hence, the segment between the first $1$ and the last $1$ has to obey the same period. In other words, the distance between any two adjacent $1$s in a valid $M$ must be the same. Valid examples include $111$ and $000100100100000$, but not $1101$, $010010001$ or $010101000101$. Smylic proves this formally in their answer.
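For small $n$ the equal-spacing claim can be checked by brute force. The following Python sketch is an illustration only: it assumes a binary alphabet for both $P$ and $T$ (as in the example above), and the helper names `matching_sequence`, `achievable` and `equally_spaced` are hypothetical. It enumerates every pair $(P, T)$ and compares the resulting set of sequences with the set of equally spaced ones.

```python
from itertools import product

def matching_sequence(p, t):
    """M[i] = 1 exactly when P occurs in T starting at position i."""
    n = len(p)
    return tuple(int(t[i:i + n] == p) for i in range(n))

def achievable(n):
    """All M produced by binary P of length n and binary T of length 2n - 1."""
    return {
        matching_sequence(p, t)
        for p in product((0, 1), repeat=n)
        for t in product((0, 1), repeat=2 * n - 1)
    }

def equally_spaced(m):
    """True when the gaps between neighbouring 1s in M are all equal."""
    ones = [i for i, bit in enumerate(m) if bit]
    gaps = {b - a for a, b in zip(ones, ones[1:])}
    return len(gaps) <= 1  # zero or one 1 gives no gaps at all

for n in range(1, 5):
    found = achievable(n)
    expected = {m for m in product((0, 1), repeat=n) if equally_spaced(m)}
    print(n, len(found), found == expected)
```

The search space grows like $2^{3n-1}$, so this is only practical for very small $n$.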

With this in mind, we can construct either a recursive or an explicit formula for the number of valid $M$, $f(n)$. Let's have a look at the full list for $n=4$:

$$ 0000\\ 0001\\ 0010\\ 0011\\ 0100\\ 0101\\ 0110\\ 0111\\ 1000\\ 1001\\ 1010\\ 1100\\ 1110\\ 1111\\ $$

We can always generate a valid $M_n$ (i.e. a matching sequence of length $n$) by taking an $M_{n-1}$ and prepending a zero. That gives us the first eight $M_4$ in the list above. Their number is of course $f(n-1)$.

We can also generate a valid $M_n$ by appending a zero, but we need to make sure that we don't double-count with the previous step. The $M_{n-1}$ to which we can append a zero and obtain a new $M_n$ are those that start with a $1$. In other words, those that weren't obtained by prepending a zero. There are four of these for $M_4$ and their general number is $f(n-1) - f(n-2)$.

Finally, there are some $M_n$ that start and end with $1$. Since we're adding a new $1$, we need to make sure that their distances are all the same. But this is quite easy since we know that the $1$s span the entire $n$ positions. This means that there must be a $1$ in every $j$th position, where $j$ divides $n-1$ (e.g. $1001001$ where there's a $1$ in every third position and $3$ divides $n-1=6$). The number of ways we can write down these ones is the number of divisors of $n-1$, $\sigma_0(n-1)$.
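To make this last case concrete, here is a small Python sketch (the function name `spanning_sequences` is made up for illustration) that lists the sequences of length $n \ge 2$ which start and end with $1$, one for each divisor $j$ of $n-1$.

```python
def spanning_sequences(n):
    """Length-n sequences that start and end with 1 and have equally spaced 1s."""
    span = n - 1  # distance from the first to the last position
    result = []
    for j in range(1, span + 1):
        if span % j == 0:  # a 1 in every j-th position lands exactly on the end
            result.append("".join("1" if i % j == 0 else "0" for i in range(n)))
    return result

print(spanning_sequences(7))
# ['1111111', '1010101', '1001001', '1000001']
```

The four results correspond to the four divisors $1, 2, 3, 6$ of $n-1 = 6$.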

Taking all of that together, we arrive at the recursive formula above:

$$ f(n) = 2f(n-1)-f(n-2)+\sigma_0(n-1) $$
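The recurrence translates directly into a loop. This is a minimal Python sketch under my own naming (`divisor_count`, `f_recurrence`), using plain trial division for $\sigma_0$, which is fine for small arguments but not meant to be fast.

```python
def divisor_count(m: int) -> int:
    """sigma_0(m): number of positive divisors of m (trial division)."""
    return sum(1 for j in range(1, m + 1) if m % j == 0)

def f_recurrence(n: int) -> int:
    """f(n) = 2 f(n-1) - f(n-2) + sigma_0(n-1), with f(0) = 1 and f(1) = 2."""
    if n == 0:
        return 1
    prev2, prev1 = 1, 2  # f(0), f(1)
    for m in range(2, n + 1):
        # prepend a zero, append a zero (minus the overlap), or span all m positions
        prev2, prev1 = prev1, 2 * prev1 - prev2 + divisor_count(m - 1)
    return prev1

print([f_recurrence(n) for n in range(8)])
```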

Alternatively, we can look at it explicitly. As noted above, having all $0$s always works, which is one possibility. Having a single $1$ always works, which gives us $n$ possibilities. If there is more than one $1$, the $1$s span some substring of $M$ of length $2\leq i\leq n$. We can treat this substring the same way as the last case of the recursive derivation and find that there are $\sigma_0(i-1)$ ways to place the $1$s in that substring. Additionally, there are $n+1-i$ ways to place that substring into $M$, surrounded by $0$s. Substituting $i \to n+1-i$ in $\sum_{i=2}^{n}(n+1-i)\,\sigma_0(i-1)$ gives the multiplicity in the sum:

$$ f(n) = 1+n+\sum_{i=1}^{n-1}i\cdot\sigma_0(n-i) $$
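The explicit sum is just as easy to evaluate. The sketch below is one possible way to do it, with hypothetical helper names (`divisor_counts`, `f_explicit`); it precomputes $\sigma_0$ with a simple sieve so that computing $f(n)$ takes roughly $O(n \log n)$ time.

```python
def divisor_counts(limit: int) -> list[int]:
    """Sieve: counts[m] is the number of divisors of m for 1 <= m <= limit."""
    counts = [0] * (limit + 1)
    for j in range(1, limit + 1):
        for multiple in range(j, limit + 1, j):
            counts[multiple] += 1  # j divides this multiple
    return counts

def f_explicit(n: int) -> int:
    """f(n) = 1 + n + sum_{i=1}^{n-1} i * sigma_0(n - i)."""
    sigma0 = divisor_counts(max(n, 1))
    return 1 + n + sum(i * sigma0[n - i] for i in range(1, n))

print([f_explicit(n) for n in range(8)])
```

For $n = 4$ the sum is $1\cdot\sigma_0(3) + 2\cdot\sigma_0(2) + 3\cdot\sigma_0(1) = 2 + 4 + 3 = 9$, giving $f(4) = 14$, which matches the list for $n=4$.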