How many possible phone words exist for a phone number of length N when also counting words less than length N within that phone number?

[2015-12-10] Update: Thanks to a comment from @JohnMachacek I could include a reference to a paper proving the conjecture about tight upper bounds.


Note: This is a partial answer addressing some upper bounds and looking at some special cases. But first I would like to state the problem.

Current Situation:

We consider phone numbers as non-empty strings built from an alphabet $$\mathcal{A}=\{2,3,4,5,6,7,8,9\}$$ We ignore the digits $0$ and $1$, as the corresponding keys of OP's phone keypad do not carry any letters. Each digit from $\mathcal{A}$ is mapped to either three or four letters, so the digits are associated with weights of size $3$ or $4$.

For convenience we simplify the problem and assume the same weight $m>0$ for each digit in $\mathcal{A}$.

OP is asking for the number $\varphi(w)$ of words, including all subwords, which can be associated with a given phone number $w$ of length $n$. We can reformulate this problem by asking for all distinct substrings of $w$, where each substring of length $k$ is counted with weight $m^k$.

Example: If we look at the two phone numbers $3633$ and $3336$, both having length $n=4$, we get the following substrings:

\begin{align*} 3633&\quad\rightarrow\quad \{3633,363,633,33,36,63,3,6\}\\ 3336&\quad\rightarrow\quad \{3336,333,336,33,36,3,6\} \end{align*}

We observe that even if two words contain the same digits with the same multiplicities, the number of distinct substrings can differ, depending on how the blocks of equal digits are arranged. While $3633$ has three different substrings of length two, the string $3336$ has only two. With respect to the number of substrings we obtain: \begin{align*} \varphi(3633)&=m^4+2m^3+\color{blue}{3}m^2+2m\\ \varphi(3336)&=m^4+2m^3+\color{blue}{2}m^2+2m \end{align*}
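
The two values above can be checked mechanically. The following is a minimal sketch, assuming the uniform weight $m$ from above; the function names `distinct_substrings_by_length` and `phi` are chosen here for illustration and do not come from the question.

```python
def distinct_substrings_by_length(w: str) -> dict[int, int]:
    """Map each length k to the number of distinct substrings of w with length k."""
    n = len(w)
    return {
        k: len({w[i:i + k] for i in range(n - k + 1)})
        for k in range(1, n + 1)
    }


def phi(w: str, m: int) -> int:
    """Sum over k of (number of distinct length-k substrings of w) * m**k."""
    return sum(count * m**k for k, count in distinct_substrings_by_length(w).items())


for word in ("3633", "3336"):
    print(word, distinct_substrings_by_length(word), phi(word, 3))
```

With $m=3$ this gives $\varphi(3633)=81+54+27+6=168$ and $\varphi(3336)=81+54+18+6=159$, matching the two polynomials.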

Upper bounds

Finding a generating function which provides the distribution of $\varphi(w)$ for all different phone numbers of length $n$ is (regrettably) beyond the scope of this answer. But we can at least provide some upper bounds valid for all words of length $n$. If we consider a word $w$ of length $n$ and its distinct substrings of length $k$ with $1\leq k \leq n$, there are two limitations:

  • The number of distinct substrings of length $k$ is limited by the number $|\mathcal{A}|^k$ of all strings of length $k$ over the alphabet $\mathcal{A}$.

  • There are at most $n-k+1$ substrings of length $k$ in a word of length $n$.

Since the number of distinct substrings of length $k$ is less than or equal to $\min\{|\mathcal{A}|^k,n-k+1\}$, we conclude: An upper bound for $\varphi(w)$, where $w$ has length $n$, is \begin{align*} \varphi(w)\leq\sum_{k=1}^{n}\min\left\{|\mathcal{A}|^k,n-k+1\right\}m^k\tag{1} \end{align*}
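
As a small illustration, the right-hand side of (1) can be evaluated directly. This is only a sketch; the name `upper_bound_1` and its parameters are assumptions made for this example.

```python
def upper_bound_1(n: int, alphabet_size: int, m: int) -> int:
    """Right-hand side of (1): sum of min(|A|^k, n-k+1) * m^k for k = 1..n."""
    return sum(
        min(alphabet_size**k, n - k + 1) * m**k
        for k in range(1, n + 1)
    )


# Words of length 4 over the 8-digit alphabet with weight m = 3.
print(upper_bound_1(4, 8, 3))
```

For $n=4$, $|\mathcal{A}|=8$ and $m=3$ the bound is $4\cdot 3+3\cdot 9+2\cdot 27+81=174$, which is indeed at least $\varphi(3633)=168$ from the example.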

If we do not consider the size of the alphabet, we can provide a closed expression for a somewhat larger upper bound and claim

The following is an upper bound for $\varphi(w)$, where $w$ has length $n$ and $m\neq 1$:

\begin{align*} \varphi(w)\leq\frac{m\left(m^{n+1}-m(n+1)+n\right)}{(m-1)^2}\tag{2} \end{align*}

This holds true since, according to (1), \begin{align*} \varphi(w)&\leq\sum_{k=1}^{n}\min\left\{|\mathcal{A}|^k,n-k+1\right\}m^k\\ &\leq\sum_{k=1}^{n}(n-k+1)m^k\\ &=(n+1)\sum_{k=1}^{n}m^k-\sum_{k=1}^{n}km^k\tag{3} \end{align*} Using the formula for the finite geometric series we get \begin{align*} \sum_{k=1}^{n}m^k&=\frac{1-m^{n+1}}{1-m}-1=m\frac{1-m^n}{1-m}\\ \sum_{k=1}^{n}km^k&=m\sum_{k=1}^nkm^{k-1}\\ &=m\frac{d}{dm}\left(\sum_{k=1}^nm^k\right)\\ &=m\frac{d}{dm}\left(\frac{m-m^{n+1}}{1-m}\right)\\ &=m\frac{nm^{n+1}-(n+1)m^n+1}{(1-m)^2} \end{align*} Substituting these two results into (3) gives \begin{align*} \varphi(w)&\leq(n+1)\,m\frac{1-m^n}{1-m}-m\frac{nm^{n+1}-(n+1)m^n+1}{(1-m)^2}\\ &=\frac{m\left(m^{n+1}-m(n+1)+n\right)}{(m-1)^2} \end{align*} and the claim (2) follows.
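
The algebra can be cross-checked numerically. The sketch below compares the sum $\sum_{k=1}^{n}(n-k+1)m^k$ with the closed expression (2) for a range of integer values; the function names are hypothetical and the check assumes an integer weight $m\geq 2$.

```python
def triangular_sum(n: int, m: int) -> int:
    """Direct evaluation of sum_{k=1}^{n} (n-k+1) * m^k."""
    return sum((n - k + 1) * m**k for k in range(1, n + 1))


def closed_form_2(n: int, m: int) -> int:
    """Closed expression (2); only defined for m != 1."""
    if m == 1:
        raise ValueError("the closed form (2) requires m != 1")
    numerator = m * (m**(n + 1) - m * (n + 1) + n)
    quotient, remainder = divmod(numerator, (m - 1) ** 2)
    # For integer m the sum is an integer, so the division must be exact.
    if remainder != 0:
        raise ArithmeticError("closed form did not evaluate to an integer")
    return quotient


for m in range(2, 6):
    for n in range(1, 13):
        assert triangular_sum(n, m) == closed_form_2(n, m)
print("closed form (2) agrees with the direct sum for m = 2..5, n = 1..12")
```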

Note: The closed expression (2) is not necessarily a tight bound. If we consider, e.g., an alphabet of size $7$, the expression (1) produces smaller bounds than (2) for words of length $n>7$, since then the term $k=1$ contributes $\min\{7,n\}=7<n$.

With $|\mathcal{A}|=7$ and weight $m=3$, corresponding to three characters for each digit, we get the following upper bounds for small values of $n$:

\begin{array}{rcccccccccc} n&1&2&3&4&5&6&7&8&9&10\\ \text{upper bound (1)}&3&15&54&174&537&1629&4908&\color{blue}{14745}&\color{blue}{44265}&\color{blue}{132834}\\ \text{upper bound (2)}&3&15&54&174&537&1629&4908&14748&44271&132843\\ \end{array}

Tight upper bounds

In the comment section of the OEIS sequence A094913, an interesting conjecture claims that for binary alphabets the upper bound (1) is even tight.

In fact even more is true. For each $n>0$ and each alphabet of size $t>0$, the expression on the RHS of (1) is attained by some word of length $n$ and is therefore the maximum of $\varphi(w)$.

This is stated in Theorem 8 of the paper Strings with Maximally Many Distinct Subsequences and Substrings by A. Flaxman et al. The maximum can be achieved by a modified De Bruijn word.
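
For small parameters the tightness can also be observed by exhaustive search. The following sketch (not the construction from the paper, just a brute-force check with hypothetical function names) enumerates all words of length $n$ over an alphabet of size $t$ and compares the largest value of $\varphi$ with the RHS of (1).

```python
from itertools import product


def phi(word: tuple, m: int) -> int:
    """Weighted count of distinct substrings: sum over k of (#distinct length-k) * m^k."""
    n = len(word)
    return sum(
        len({word[i:i + k] for i in range(n - k + 1)}) * m**k
        for k in range(1, n + 1)
    )


def upper_bound_1(n: int, t: int, m: int) -> int:
    """Right-hand side of (1) for an alphabet of size t."""
    return sum(min(t**k, n - k + 1) * m**k for k in range(1, n + 1))


def max_phi(n: int, t: int, m: int) -> int:
    """Maximum of phi over all t^n words of length n (exponential, small n only)."""
    return max(phi(word, m) for word in product(range(t), repeat=n))


for n in range(1, 9):
    print(n, max_phi(n, 2, 3), upper_bound_1(n, 2, 3))
```

The search grows like $t^n$, so it is only meant as a sanity check for short words, not as a way to find maximal words in general.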