Approximation to the monkey typewriter probability

An experiment with real monkeys suggested that the keys pressed are not independent: in that case the monkeys repeated letters a lot.

Back to your question, it might be sensible to subtract $m-1$ from $n$, as you cannot get your desired string of $m$ characters within the first $m-1$ attempts.

Ignoring that point, an approximation with large $n$ is $$1 - \left(1 - \frac{1}{k^m}\right)^n \approx 1- e^{-n/k^m} $$
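As a rough numerical check, the right-hand side can be evaluated in Python without underflow. This is only a sketch and not part of the original answer: the function name `hit_probability` is made up for illustration, and it relies on `math.expm1`, which computes $e^x - 1$ accurately for tiny $x$.

```python
import math

def hit_probability(k: int, m: int, n: int) -> float:
    """Approximate 1 - (1 - 1/k**m)**n by 1 - exp(-n / k**m)."""
    rate = n / k**m            # k**m is an exact Python integer, so nothing overflows
    return -math.expm1(-rate)  # 1 - exp(-rate), accurate even when rate is tiny

print(hit_probability(26, 50, 1_000_000))
```

Writing `1 - math.exp(-rate)` instead would return exactly `0.0` for the values in the question, because `exp(-rate)` rounds to `1.0` in double precision.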

If you want the right hand side to stay constant as $k$ increases by $1$ then you want $\frac{n_1}{(k+1)^m} \approx \frac{n_0}{k^m}$ so you want $\frac{n_1}{n_0} \approx \left(1+\frac1k\right)^m$.

Whether $e^{m/k}$ is a good approximation to that ratio depends on the particular values of $m$ and $k$
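To get a feel for when the two agree, one can tabulate both ratios. The sketch below uses hypothetical helper names (`ratio_exact`, `ratio_exp`) and arbitrarily chosen sample values of $k$ and $m$.

```python
import math

def ratio_exact(k: int, m: int) -> float:
    """Required growth n1/n0 when the alphabet grows from k to k + 1."""
    return (1 + 1 / k) ** m

def ratio_exp(k: int, m: int) -> float:
    """The cruder estimate e**(m/k)."""
    return math.exp(m / k)

for k, m in [(26, 5), (26, 50), (26, 500), (1000, 50)]:
    exact, approx = ratio_exact(k, m), ratio_exp(k, m)
    print(f"k={k:5d} m={m:4d}  exact={exact:.6g}  e^(m/k)={approx:.6g}  "
          f"relative error={(approx - exact) / exact:.3%}")
```

Since $\ln\left(1+\frac1k\right) = \frac1k - \frac{1}{2k^2} + \cdots$, the estimate $e^{m/k}$ overshoots by a factor of roughly $e^{m/(2k^2)}$, so it should be reasonable when $m$ is small compared with $k^2$.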


Expression:

$$1 - (1 - \frac{1}{k^m})^n$$

Questions:

Q1. Is there an approximation we can use in this instance?

Q3. Related to 1 - how do I compute the probability for the case where k = 26, m = 50, n = 1,000,000? I couldn't do it in Python.

I will give a single response to both questions.

My response here assumes that $k = 26.$ Consideration of other values of $k$ is in Question 2.

I am unfamiliar with the math mentioned in the other responses.

I'm taking this opportunity to explore using base-10 logarithms. I will describe how the approach would work, and where it may fail.

Personally, I prefer base 10 rather than base $e$ because the logarithms involve smaller numbers, and because it becomes easier to relate to the decimal counting system.

You could reason that $\log_{10} (4)$ is just over 0.6,

and $\displaystyle\frac{1}{26}$ is just under $\displaystyle\frac{4}{100}$.

So you could estimate $\log_{10} \left(\frac{1}{26}\right)$ as about $(0.6 - 2.0)$.

Alternatively, you could have the computer provide its own approximation of $\log_{10} \left(\frac{1}{26}\right)$ to however many decimal points you want.

Once this is done, it is easy to compute $\log_{10} \left(\frac{1}{26^m}\right) = m \times \log_{10} \left(\frac{1}{26}\right)$

Next, you simply use the computer to calculate the anti-log.
That is, if $\log_{10} (x) = a,$ then $x = 10^a.$
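In Python, these first steps might look like the sketch below. The variable names are chosen here for illustration, and $m = 50$ is taken from the question.

```python
import math

k, m = 26, 50

log_inv_k = math.log10(1 / k)   # about -1.415, close to the 0.6 - 2.0 estimate
log_r = m * log_inv_k           # log10 of 1 / k**m
r = 10 ** log_r                 # anti-log: r = 1 / k**m

print(log_inv_k, log_r, r)
```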

Then, having computed the actual value of $r = \frac{1}{26^m}$,
you compute $s = \log_{10} (1-r).$

Then you have that $\log_{10} \left[\left(1 - \frac{1}{k^m}\right)^n\right] = n \times s.$

Then, you convert the above to an anti-log and subtract it from 1.
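One caveat when trying this in ordinary double-precision floating point: if $r$ is smaller than about $10^{-16}$, then $1 - r$ is stored as exactly $1$ and $s$ comes out as $0$. A possible workaround, sketched below, swaps base 10 for natural logarithms and uses Python's `math.log1p` and `math.expm1`, which are designed for arguments this close to $1$ and $0$. The values $m = 50$ and $n = 1{,}000{,}000$ are taken from the question.

```python
import math

k, m, n = 26, 50, 1_000_000

r = 1 / k**m                      # about 1.8e-71, far below double-precision epsilon

# Naive route: 1 - r rounds to exactly 1.0, so the logarithm is 0 and the
# final probability is wrongly reported as 0.
naive = 1 - 10 ** (n * math.log10(1 - r))

# Stable route (natural logs): log1p(-r) = ln(1 - r) without forming 1 - r,
# and expm1(x) = e**x - 1 without cancellation.
s = math.log1p(-r)
stable = -math.expm1(n * s)       # 1 - (1 - r)**n

print(naive, stable)
```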

Stumbling block

I have never actually experimented with or researched the use of logarithms, on a personal computer, in this type of situation, where the exponent $n$ was anywhere near 1,000,000.

This may well be totally unworkable for such a large exponent.

One option is to see whether your computer handles such numbers when scientific notation is involved:

$u \times 10^v$, where $1 \leq u < 10.$

Another option is to look for a specialized software library for your language (e.g. Python, C, Java, ...) that is designed to handle such large exponents.
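In Python, one candidate is the standard-library `decimal` module, which does arithmetic to a user-chosen number of significant digits. The sketch below assumes that 120 digits is enough headroom, given that $\frac{1}{26^{50}}$ first shows up around the 71st decimal place.

```python
from decimal import Decimal, getcontext

getcontext().prec = 120           # significant digits; an assumed safety margin

k, m, n = 26, 50, 1_000_000

r = Decimal(1) / Decimal(k) ** m  # 1 / k**m
prob = 1 - (1 - r) ** n           # evaluated directly, no logarithms needed

print(f"{prob:.6E}")
```

With too few digits of precision, `1 - r` would round to `1` and the result would collapse to `0`, so the precision has to be chosen with the size of $k^m$ in mind.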

If these approaches do not work, then I think that the logarithm approach must be scrapped.

Q2. I would like to understand how this probability varies as we add one more character - by how much must we increase n to keep the probability the same?

If you are using logarithms, you can simply examine
$\log_{10} \left(\frac{1}{26}\right)$ vs $\log_{10} \left(\frac{1}{27}\right).$

More formally, $\frac{1}{26} - \frac{1}{27} = \frac{1}{26 \times 27}.$

In general, $\frac{1}{k} - \frac{1}{(k+1)} = \frac{1}{k \times (k+1)}.$

Another viewpoint is that by switching from $\frac{1}{k}$ to $\frac{1}{(k+1)}$

you are simply applying a scaling factor of $\frac{k}{(k+1)}.$

Therefore, as $\frac{1}{k}$ goes to $\frac{1}{(k+1)}$,

$\frac{1}{k^m}$ goes to $\frac{1}{k^m} \times \left(\frac{k}{(k+1)}\right)^m.$

Edit
I just realized that I failed to address this question:

"By how much must we increase n to keep the probability the same?"

The math is very ugly. Ironically, perhaps the simplest expression involves logarithms.

You want to solve for $x$, where

$$\left[1 - \left(\frac{1}{k^m}\right)\right]^n ~=~ \left[1 - \left(\frac{1}{(k+1)^m}\right)\right]^x. $$

This means that

$$\left(\frac{k^m - 1}{k^m}\right)^n ~=~ \left(\frac{(k+1)^m - 1}{(k+1)^m}\right)^x.$$

Taking logarithms, this means that

$$x \times \log_{10} \left(\frac{(k+1)^m - 1}{(k+1)^m}\right) ~=~ n \times \log_{10} \left(\frac{k^m - 1}{k^m}\right).$$
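Dividing through gives $x$ as $n$ times a ratio of two logarithms, and the base of the logarithm cancels in that ratio. A minimal Python sketch follows. The function name `equivalent_attempts` is hypothetical, and natural logarithms via `math.log1p` are used in place of base 10, because both arguments are extremely close to $1$.

```python
import math

def equivalent_attempts(k: int, m: int, n: float) -> float:
    """Solve (1 - k**-m)**n == (1 - (k+1)**-m)**x for x."""
    log_old = math.log1p(-1 / k**m)        # ln(1 - 1/k**m)
    log_new = math.log1p(-1 / (k + 1)**m)  # ln(1 - 1/(k+1)**m)
    return n * log_old / log_new

x = equivalent_attempts(26, 50, 1_000_000)
print(x, x / 1_000_000, (27 / 26) ** 50)
```

When $\frac{1}{k^m}$ is tiny, $\ln(1 - \varepsilon) \approx -\varepsilon$, so $\frac{x}{n}$ should come out very close to $\left(\frac{k+1}{k}\right)^m$, which is the inverse of the scaling factor discussed above.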

Q4. I would like to get some intuition about how this expression varies for $m$.

This question was not listed, but was broached in the query.

My explanation for this will necessarily be both limited and convoluted.

I will limit my explanation to how $\left[1 - \frac{1}{(26)^m}\right]$ varies with $m$.

First consider $d = \left[\frac{1}{(26)^m}\right]$.

As $m \to (m+1)$, this value goes from $d \to \left[\frac{d}{26}\right].$

This means that $(1 - d)$ goes to $\left[1 - \frac{d}{26}\right].$

Therefore, each time $m$ increases by $1$, the expression moves $26$ times closer to $1$.
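A small table makes this concrete. The sketch below prints $d$ and $1 - d$ for the first few values of $m$. The helper name `gap` is made up for illustration, and the range of $m$ is arbitrary.

```python
def gap(k: int, m: int) -> float:
    """d = 1 / k**m, the distance between 1 - d and 1."""
    return 1 / k**m

for m in range(1, 7):
    d = gap(26, m)
    print(f"m={m}  d={d:.3e}  1-d={1 - d:.12f}  d(m)/d(m+1)={d / gap(26, m + 1):.1f}")
```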

Tags:

Probability