How many binary strings of length N contain the substring '11011'?

Here we are looking for binary strings of length $N$ which do not contain the substring $11011$. The result is then $2^N$ minus this number.

The so-called Goulden-Jackson Cluster Method is a convenient technique to derive a generating function for problems of this kind.

We consider words of length $N\geq 0$ built from an alphabet $$\mathcal{V}=\{0,1\}$$ and the set $\mathcal{B}=\{11011\}$ of bad words which are not allowed to be part of the words we are looking for.

We derive a generating function $F(x)$ whose coefficient of $x^N$ is the number of wanted words of length $N$. According to the paper (p.7) the generating function $F(x)$ is \begin{align*} F(x)=\frac{1}{1-dx-\text{weight}(\mathcal{C})} \end{align*} with $d=|\mathcal{V}|=2$, the size of the alphabet, and with the weight-numerator $\mathcal{C}$ given by \begin{align*} \text{weight}(\mathcal{C})=\text{weight}(\mathcal{C}[11011]) \end{align*}

We calculate according to the paper \begin{align*} \text{weight}(\mathcal{C}[11011])&=-x^5-\text{weight}(\mathcal{C}[11011])\left(x^3+x^4\right) \end{align*}
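As we read the recursion, the two terms $x^3$ and $x^4$ come from the self-overlaps of the bad word: the suffix $11$ of $11011$ equals its prefix $11$, and the suffix $1$ equals its prefix $1$. Appending a further copy of $11011$ to a cluster therefore adds $5-2=3$ or $5-1=4$ new letters. Solving the linear equation for the weight gives \begin{align*} \text{weight}(\mathcal{C}[11011])\left(1+x^3+x^4\right)&=-x^5\\ \text{weight}(\mathcal{C}[11011])&=-\frac{x^5}{1+x^3+x^4} \end{align*}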

It follows: A generating function $F(x)$ for the number of words built from $\{0,1\}$ which do not contain the subword $11011$ is \begin{align*} F(x)&=\frac{1}{1-dx-\text{weight}(\mathcal{C})}\\ &=\frac{1}{1-2x+\frac{x^5}{1+x^3+x^4}}\\ &=\frac{1+x^3+x^4}{1-2x+x^3-x^4-x^5} \end{align*}

The generating function counting the number $2^N$ of all binary strings of length $N$ is \begin{align*} \frac{1}{1-2x}=1+2x+4x^2+\cdots \end{align*}

We conclude: A generating function for the number of binary strings of length $N$ which contain the string $11011$ is

\begin{align*} \frac{1}{1-2x}-F(x)&=\frac{1}{1-2x}-\frac{1+x^3+x^4}{1-2x+x^3-x^4-x^5}\\ &=\frac{x^5}{(1-2x)(1-2x+x^3-x^4-x^5)}\\ &=x^5+4x^6+12x^7+31x^8+75x^9+175x^{10}\\ &\qquad +399x^{11}+894x^{12}+1975x^{13}+4313x^{14}+9330x^{15}+\cdots \end{align*}

The series expansion in the last line was calculated with the help of Wolfram Alpha; its coefficients give the number of strings containing $11011$ for lengths up to $N=15$.
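The coefficients can also be checked without a computer algebra system. The following is a minimal sketch, not part of the original derivation: it reads the recurrence off the denominator $1-2x+x^3-x^4-x^5$ of $F(x)$ and compares the result with a direct enumeration of all strings. The function names `containing_by_recurrence` and `containing_by_enumeration` are made up for this illustration.

```python
from itertools import product

PATTERN = "11011"


def containing_by_recurrence(max_n):
    """Return [c_0, ..., c_max_n], where c_n counts length-n strings containing PATTERN."""
    # Comparing coefficients in F(x) * (1 - 2x + x^3 - x^4 - x^5) = 1 + x^3 + x^4
    # gives a_n = [x^n](1 + x^3 + x^4) + 2a_{n-1} - a_{n-3} + a_{n-4} + a_{n-5},
    # where a_n is the number of length-n strings avoiding PATTERN.
    numerator = {0: 1, 3: 1, 4: 1}
    avoiding = []
    for n in range(max_n + 1):
        a = numerator.get(n, 0)
        if n >= 1:
            a += 2 * avoiding[n - 1]
        if n >= 3:
            a -= avoiding[n - 3]
        if n >= 4:
            a += avoiding[n - 4]
        if n >= 5:
            a += avoiding[n - 5]
        avoiding.append(a)
    # Strings containing PATTERN = all strings minus the avoiding ones.
    return [2**n - avoiding[n] for n in range(max_n + 1)]


def containing_by_enumeration(n):
    """Count length-n binary strings containing PATTERN by listing all 2^n of them."""
    return sum(PATTERN in "".join(bits) for bits in product("01", repeat=n))


if __name__ == "__main__":
    series = containing_by_recurrence(15)
    print(series[5:])
    # [1, 4, 12, 31, 75, 175, 399, 894, 1975, 4313, 9330]
    print(all(series[n] == containing_by_enumeration(n) for n in range(13)))
```

The enumeration is exponential in $N$ and only serves as a sanity check for small lengths; the recurrence is linear in $N$.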

For example the $12$ strings of length $7$ containing the substring $11011$ are

\begin{array}{cccc} \color{blue}{00}11011\quad&\quad\color{blue}{0}11011\color{blue}{0}\quad&\quad11011\color{blue}{00}\\ \color{blue}{01}11011\quad&\quad\color{blue}{0}11011\color{blue}{1}\quad&\quad11011\color{blue}{01}\\ \color{blue}{10}11011\quad&\quad\color{blue}{1}11011\color{blue}{0}\quad&\quad11011\color{blue}{10}\\ \color{blue}{11}11011\quad&\quad\color{blue}{1}11011\color{blue}{1}\quad&\quad11011\color{blue}{11}\\ \end{array}


Let $k$ be the number of blocks of 11011, let $j$ be the number of single overlaps of the blocks (giving $110111011$), and let $l$ be the number of double overlaps (giving $11011011$).

Then there are $\binom{k-1}{j}$ ways to choose the single overlaps, $\binom{k-1-j}{l}$ ways to choose the double overlaps, $\binom{n-4k+l}{k-j-l}$ ways to choose the positions of the blocks (since there are $k-j-l$ dividers and $n-5k+j+2l$ remaining digits), and $2^{n-5k+j+2l}$ ways to choose the other digits.

Using Inclusion-Exclusion, this gives $$\sum_{k=1}^{\lfloor\frac{n-2}{3}\rfloor}(-1)^{k+1}\sum_{j=0}^{k-1}\sum_{l=0}^{k-1-j}\binom{k-1}{j}\binom{k-1-j}{l}\binom{n-4k+l}{k-j-l}2^{n-5k+j+2l}.$$
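The triple sum is easy to evaluate numerically. Below is a minimal sketch under one stated assumption: every binomial coefficient $\binom{a}{b}$ with $b<0$ or $b>a$ (including negative $a$) is taken to be $0$, so that terms with a negative number $n-5k+j+2l$ of remaining digits drop out. The helper names `binom` and `count_containing` are made up for this illustration.

```python
from math import comb


def binom(a, b):
    """Binomial coefficient, taken to be 0 outside 0 <= b <= a (assumption of this sketch)."""
    return comb(a, b) if 0 <= b <= a else 0


def count_containing(n):
    """Evaluate the inclusion-exclusion sum for binary strings of length n containing 11011."""
    total = 0
    for k in range(1, (n - 2) // 3 + 1):      # k blocks of 11011
        for j in range(k):                    # j single overlaps
            for l in range(k - j):            # l double overlaps
                free = n - 5 * k + j + 2 * l  # digits not covered by any block
                if free < 0:
                    continue                  # the position binomial vanishes here
                total += (
                    (-1) ** (k + 1)
                    * binom(k - 1, j)
                    * binom(k - 1 - j, l)
                    * binom(n - 4 * k + l, k - j - l)
                    * 2**free
                )
    return total


if __name__ == "__main__":
    print([count_containing(n) for n in range(5, 13)])
    # [1, 4, 12, 31, 75, 175, 399, 894]
```

For $n=8$, for instance, the $k=1$ term contributes $\binom{4}{1}2^3=32$ and the only non-zero $k=2$ term ($j=0$, $l=1$, the string $11011011$) contributes $-1$, giving $31$.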