Probability concept that distinguishes likelihood of sequences 0110101011101... and 000000000000...?

The sequences themselves do have the same probability. However, what makes the first seem more likely to you is the fact that it has a more even number of ones and zeros. Instead of testing the likelihood of the exact sequences, you can test the likelihood of the number of ones and the number of zeros.

For a fair coin, this is binomial with $p = \frac{1}{2}$, meaning that the probability that the number of ones in a sequence of length $n$ is $k$ is

$P_{\frac{1}{2}}(k) = {n\choose{k}}\left(\frac{1}{2}\right)^n$

The only factor that differs between the two sequences is ${n\choose{k}}$, the number of sequences of length $n$ with $k$ ones. This gives a higher probability to the count of ones in your first sequence than to the count in the all-zero sequence.
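To put numbers on this, here is a minimal Python sketch. It assumes the 13 bits quoted in the question are the whole first sequence and compares them with an all-zero sequence of the same length. The function name `prob_of_ones_count` is made up for this illustration.

```python
from math import comb

def prob_of_ones_count(bits: str, p: float = 0.5) -> float:
    """Probability that n independent bits contain as many ones as `bits` does."""
    n = len(bits)
    k = bits.count("1")
    return comb(n, k) * p**k * (1 - p)**(n - k)

first = "0110101011101"   # assumption: the 13 bits quoted in the question
zeros = "0" * len(first)  # all-zero sequence of the same length

# Each exact sequence has probability (1/2)^13, but the counts differ a lot.
print(prob_of_ones_count(first))  # 8 ones out of 13: C(13, 8) / 2^13
print(prob_of_ones_count(zeros))  # 0 ones out of 13: C(13, 0) / 2^13
```

The ratio of the two results is exactly ${13\choose 8} = 1287$, which is the "more even mix" intuition expressed as a number.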

Another way of looking at it is to treat $p$ as unknown. For a given $p$ such that $P(x_i = 1) = p = 1 - P(x_i = 0)$ for all $i$, the probability of a sequence $x_1, \ldots, x_n$ is

$P(x_1, ... x_n | p) = p^{ |\{i : x_i = 1\}|}\cdot (1 - p)^{|\{i : x_i = 0\}|}$

Now, which $p$ maximizes the likelihood for each sequence? How far is it from the true $p = \frac{1}{2}$ for a fair coin?
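As a sketch of the answer: for independent bits, the likelihood above is maximized at $\hat{p} = k/n$, the fraction of ones. The Python below evaluates this for both sequences, again assuming the 13 quoted bits are the whole first sequence. The helper names `likelihood` and `mle_p` are made up for this illustration.

```python
def likelihood(bits: str, p: float) -> float:
    """P(x_1, ..., x_n | p) for independent bits with P(x_i = 1) = p."""
    ones = bits.count("1")
    zeros = bits.count("0")
    return p**ones * (1 - p)**zeros

def mle_p(bits: str) -> float:
    """Maximum-likelihood estimate of p: the fraction of ones."""
    return bits.count("1") / len(bits)

first = "0110101011101"   # assumption: the 13 bits quoted in the question
zeros = "0" * len(first)

for name, seq in [("first", first), ("zeros", zeros)]:
    p_hat = mle_p(seq)
    print(name, "p_hat =", p_hat, "distance from 1/2 =", abs(p_hat - 0.5))
```

The first sequence gives $\hat{p} = 8/13$, close to $\frac{1}{2}$, while the all-zero sequence gives $\hat{p} = 0$, as far from a fair coin as possible.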


You can distinguish your three sequences in terms of their period.

The constant sequences $00000000...$ and $1111111111...$ have period $1$ and can be generated by the difference equation $a(n)=a(n-1)$. Their closed forms are $a(n)=0$ and $a(n)=1$.

Sequences $010101...$ and $1010101...$ have period 2 and follow the recurrence $a(n)=a(n-2)$. The corresponding closed forms are $a(n)=\frac{1-(-1)^n}{2}$ and $a(n)=\frac{1+(-1)^n}{2}$. These are simple deterministic formulas, so these sequences are seen to be less random than the first (not random at all).

The patterns can be predicted linearly with zero prediction error after the initial values. One single value is enough for predicting sequence (2) and two values are needed for sequence (3), corresponding to the order of the linear models (first and second order).

The first sequence shows no obvious period. If it were periodic, the period would be at least $11$: in the bits shown, the last two bits $01$ repeat the first two, and no shorter shift matches.
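The bound can be checked by brute force. The sketch below looks for the smallest shift under which the visible bits agree with themselves. It assumes the 13 bits quoted in the question for the first sequence, and the function name `smallest_consistent_period` is made up for this illustration.

```python
def smallest_consistent_period(bits: str):
    """Smallest shift d such that bits[i] == bits[i + d] wherever both exist.

    Returns None if no shift shorter than the sequence itself works.
    """
    n = len(bits)
    for d in range(1, n):
        if all(bits[i] == bits[i + d] for i in range(n - d)):
            return d
    return None

print(smallest_consistent_period("0110101011101"))  # first sequence
print(smallest_consistent_period("0000000000000"))  # constant sequence
print(smallest_consistent_period("0101010101010"))  # alternating sequence
```

A finite prefix can only rule periods out, never confirm one, so the value found for the first sequence is a lower bound on any true period.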


From another point of view, let us map your sequences to numbers. Interpret each bit in a positional system, each bit having weight $2^{-n}$, where $n$ is its position.

Your second sequence maps to the integer number $0$. Your third sequence maps to the rational number $$\sum_{k=0}^\infty \frac{1}{2^{2k+1}}=\frac{2}{3}$$
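The mapping can be tried out with exact rational arithmetic. This sketch assumes positions are counted from $1$, so the first bit has weight $\frac{1}{2}$, and that the third sequence starts with a $1$, as the sum above implies. The function name `binary_fraction` is made up for this illustration.

```python
from fractions import Fraction

def binary_fraction(bits: str) -> Fraction:
    """Value of 0.b1 b2 b3 ... in base 2, where the bit at position n has weight 2^-n."""
    return sum(
        (Fraction(int(b), 2**pos) for pos, b in enumerate(bits, start=1)),
        Fraction(0),
    )

target = Fraction(2, 3)
for pairs in (1, 2, 5, 10):
    value = binary_fraction("10" * pairs)
    # The gap to 2/3 shrinks by a factor of 4 with every extra "10" pair.
    print(pairs, value, target - value)
```

Truncating after $m$ pairs gives $\frac{2}{3}\left(1 - 4^{-m}\right)$, so the partial values approach $\frac{2}{3}$ from below.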

The first sequence would likely map to an irrational number, unless some periodicity appeared later on.

Since almost all real numbers are irrational (see "Probability of Getting a Rational Number"), sequences like the first one are more likely.

Tags:

Probability