What is the expected number of distinct strings from a single edit operation?

Substitutions are easy – we get $n$ different substitution results, one for each position whose bit is flipped.

For insertions and deletions, we need the expected number of changes between $0$ and $1$, that is, of positions where two adjacent bits differ. There are $n-1$ potential change locations, and each is a change with probability $\frac12$, so the expected number of changes is $\frac{n-1}2$. The number of runs is one more than the number of changes, so the expected number of runs is $\frac{n-1}2+1=\frac{n+1}2$.
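As a sanity check on the run count, here is a minimal sketch that enumerates all binary strings of length $n$ and averages the number of runs exactly. The helper names `count_runs` and `expected_runs` are made up for this illustration, and the strings are assumed to be uniformly random.

```python
from fractions import Fraction
from itertools import groupby, product

def count_runs(s):
    # A run is a maximal block of equal adjacent characters.
    return sum(1 for _ in groupby(s))

def expected_runs(n):
    # Exact average over all 2^n equally likely binary strings.
    total = sum(count_runs(bits) for bits in product("01", repeat=n))
    return Fraction(total, 2 ** n)

for n in range(1, 9):
    assert expected_runs(n) == Fraction(n + 1, 2)
```

The enumeration is exponential in $n$, so it only serves to confirm the formula for small lengths.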

The result of a deletion is determined by the run in which we delete, so the expected number of deletion results is $\frac{n+1}2$.

We can count insertions separately according to whether they change the number of runs. If they don't, they just increment the length of some run, and we again expect $\frac{n+1}2$ of these. If they do increase the number of runs, the inserted bit must differ from every neighbouring bit, which fixes the bit and is possible exactly at those of the $n+1$ insertion locations that aren't change locations. We expect $\frac{n-1}2$ change locations, so we expect $n+1-\frac{n-1}2=\frac{n+3}2$ such insertions.

Thus, in total we have

$$ \mathbb E(f(X))=n+\frac{n+1}2+\frac{n+1}2+\frac{n+3}2=\frac52(n+1)\;. $$
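The total can be checked by brute force for small $n$. The sketch below assumes a binary alphabet and uniformly random strings, and the function names `single_edit_results` and `expected_f` are chosen here for illustration only.

```python
from fractions import Fraction
from itertools import product

def single_edit_results(s, alphabet="01"):
    # Collect every distinct string reachable from s by exactly one edit.
    results = set()
    for i in range(len(s)):
        results.add(s[:i] + s[i + 1:])            # deletion at position i
        for c in alphabet:
            if c != s[i]:
                results.add(s[:i] + c + s[i + 1:])  # substitution at position i
    for i in range(len(s) + 1):
        for c in alphabet:
            results.add(s[:i] + c + s[i:])        # insertion before position i
    return results

def expected_f(n):
    # Exact average of the number of distinct results over all 2^n strings.
    total = sum(len(single_edit_results("".join(bits)))
                for bits in product("01", repeat=n))
    return Fraction(total, 2 ** n)

for n in range(1, 9):
    assert expected_f(n) == Fraction(5 * (n + 1), 2)
```

Using a set takes care of duplicates, for example deleting either bit of `00` gives the same result `0`.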


It may be useful to know that for a random binary string of length $n$:

  • It has $n$ characters
  • The expected number of groups of identical characters is $\frac{n+1}2$
  • The expected number of adjacent pairs of identical characters is $\frac{n-1}2$
  • The number of ends is $2$

So for different types of edits:

  • The number of possible substitutions is $n$
  • The expected number of shrinkages of a group of identical characters is $\frac{n+1}2$
  • The expected number of expansions of a group of identical characters with the same character is $\frac{n+1}2$
  • The expected number of insertions of a different character between an adjacent pair of identical characters is $\frac{n-1}2$
  • The number of possible insertions of a different character at the beginning or end is $2$

making the expected number of distinct edit results $n+\frac{n+1}2+\frac{n+1}2+\frac{n-1}2+2 = \frac{5(n+1)}{2}$.
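The five categories above can be tallied per string and then averaged. This is a rough sketch under the same assumptions (binary alphabet, uniform random strings); the names `edit_counts` and `expected_counts` and the dictionary keys are invented for the example.

```python
from fractions import Fraction
from itertools import groupby, product

def edit_counts(s):
    # Count each category of edit result for one string s.
    groups = sum(1 for _ in groupby(s))
    identical_pairs = sum(1 for a, b in zip(s, s[1:]) if a == b)
    return {
        "substitutions": len(s),
        "shrink_group": groups,         # delete one character from a group
        "expand_group": groups,         # insert the same character into a group
        "split_pair": identical_pairs,  # insert a different character inside a group
        "ends": 2,                      # insert a different character at either end
    }

def expected_counts(n):
    # Exact per-category averages over all 2^n binary strings.
    totals = {}
    for bits in product("01", repeat=n):
        for name, value in edit_counts(bits).items():
            totals[name] = totals.get(name, 0) + value
    return {name: Fraction(total, 2 ** n) for name, total in totals.items()}

for n in range(1, 9):
    assert sum(expected_counts(n).values()) == Fraction(5 * (n + 1), 2)
```

For a single string such as `0011010`, which has 5 groups and 2 adjacent identical pairs, the counts add up to $7+5+5+2+2=21$.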