Sets of unit fractions with sum $\leq 1$

Let $n_0$ be the smallest integer such that the sum of the reciprocals of the integers from $n_0+1$ to $n$ is $<2$. It is easy to see that $n_0 \approx n/e^2$, since $\sum_{n/e^2 < j \le n} 1/j \approx \log n - \log (n/e^2) = 2$. Now for any subset $A$ of $\{n_0 +1, \ldots, n\}$, at least one of $A$ and its complement in $\{n_0+1,\ldots,n\}$ has reciprocal sum $<1$. Therefore there are at least $$ \frac 12 2^{n-n_0} \asymp 2^{n(1-1/e^2)} $$ possible sets. My guess is that this exponent $1-1/e^2$ is correct -- note that $1-1/e^2 = 0.86466\ldots$.
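For anyone who wants to experiment, here is a rough Python sketch (not part of the argument above; the function name and the pruning are ad hoc choices of mine) that counts these sets exactly for small $n$. The search is exponential and convergence to the limiting exponent is slow, so small $n$ only gives a vague indication.

```python
from fractions import Fraction
from math import e, log2

def count_subsets(n):
    """Exact count of subsets A of {1,...,n} with sum_{a in A} 1/a <= 1 (DFS with pruning)."""
    # suffix[i] = 1/i + 1/(i+1) + ... + 1/n, computed exactly
    suffix = [Fraction(0)] * (n + 2)
    for i in range(n, 0, -1):
        suffix[i] = suffix[i + 1] + Fraction(1, i)

    def rec(i, budget):
        if budget < 0:
            return 0                      # already over budget
        if i > n:
            return 1                      # one valid completion: stop here
        if suffix[i] <= budget:
            return 2 ** (n - i + 1)       # every subset of {i,...,n} still fits
        return rec(i + 1, budget) + rec(i + 1, budget - Fraction(1, i))

    return rec(1, Fraction(1))

for n in (10, 15, 20):
    c = count_subsets(n)
    print(n, c, round(log2(c) / n, 4), round(1 - 1 / e**2, 4))  # empirical exponent vs 1 - 1/e^2
```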

Maybe my first guess is not right! Here's an upper bound, which gives an exponent around $0.91\ldots$ (my numerical calculations are pretty rough). For any positive $x$, an upper bound on the quantity we want is $$ e^x \prod_{j=1}^n (1+e^{-x/j}). $$ To see this, just expand out the product: each subset $A\subseteq\{1,\ldots,n\}$ contributes $e^{x(1-\sum_{j\in A}1/j)}$, which is at least $1$ whenever $\sum_{j\in A}1/j\le 1$, and all the remaining terms are positive. Now choose $x$ so as to minimize the above (a standard idea, known in analytic number theory as Rankin's trick).
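A quick Python sketch (my own, with an ad hoc bisection and overflow cutoff) that minimizes this bound numerically for a few values of $n$, as a sanity check on the finite-$n$ exponent:

```python
import math

def rankin_log_bound(n):
    """min over x > 0 of log( e^x * prod_{j<=n} (1 + e^{-x/j}) ), found by bisection on
    the stationarity condition 1 = sum_j 1/(j (1 + e^{x/j}))."""
    def deriv(x):
        s = sum(0.0 if x / j > 700.0 else 1.0 / (j * (1.0 + math.exp(x / j)))
                for j in range(1, n + 1))
        return 1.0 - s                      # derivative of the log of the bound
    lo, hi = 0.0, 10.0 * n
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        lo, hi = (lo, mid) if deriv(mid) > 0 else (mid, hi)
    x = 0.5 * (lo + hi)
    return x + sum(math.log1p(math.exp(-x / j)) for j in range(1, n + 1))

for n in (100, 1000, 10000):
    print(n, rankin_log_bound(n) / (n * math.log(2)))   # base-2 exponent of the upper bound
```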

Calculus shows that one must choose $x$ so that $$ 1= \sum_{j=1}^{n} \frac 1j \frac{1}{1+e^{x/j}}. $$ It is natural to guess that $x$ is of the shape $\alpha n$ for a constant $\alpha$, and then for large $n$ the condition on $\alpha$ becomes $$ 1= \int_0^1 \frac{1}{1+e^{\alpha/y}} \frac{dy}{y} = \int_1^\infty \frac{1}{1+e^{\alpha y}} \frac{dy}{y}. $$ If I calculated right, this gives $\alpha \approx 0.1273$. For this choice of $\alpha$ (and so $x$), one obtains the bound (approximately) $$ \exp\Big(n\Big( \alpha + \int_0^1 \log (1+e^{-\alpha/y}) dy\Big)\Big), $$ which seems to be about $$ \exp(0.631\,n) \approx 2^{0.911\,n}. $$ (I won't swear to the numerics -- someone should check.)
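Since someone should check the numerics: the following rough Python sketch (crude midpoint quadrature, arbitrary overflow cutoff, all choices mine) solves the limiting equation for $\alpha$ by bisection and evaluates the resulting exponent; it is only meant to reproduce the approximate values quoted above.

```python
import math

def integrate(f, a, b, m=20000):
    """Composite midpoint rule on [a, b] (crude but adequate for these smooth integrands)."""
    h = (b - a) / m
    return h * sum(f(a + (k + 0.5) * h) for k in range(m))

def mean_constraint(alpha):
    """Integral_0^1 dy / ( y (1 + e^{alpha/y}) ), the large-n limit of sum_j (1/j)/(1+e^{x/j}) with x = alpha*n."""
    def f(y):
        t = alpha / y
        return 0.0 if t > 700.0 else 1.0 / (y * (1.0 + math.exp(t)))
    return integrate(f, 0.0, 1.0)

# The integral is decreasing in alpha; solve mean_constraint(alpha) = 1 by bisection.
lo, hi = 1e-6, 1.0
for _ in range(50):
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if mean_constraint(mid) > 1.0 else (lo, mid)
alpha = 0.5 * (lo + hi)

def entropy_integrand(y):
    t = alpha / y
    return 0.0 if t > 700.0 else math.log1p(math.exp(-t))

exponent = alpha + integrate(entropy_integrand, 0.0, 1.0)
print(f"alpha    ~ {alpha:.4f}")                                              # compare with ~ 0.1273
print(f"exponent ~ {exponent:.4f} nats ~ 2^({exponent / math.log(2):.4f} n)")  # compare with 0.631 / 0.911
```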

My second guess is that the upper bound is tight (and I think this could be proved with some effort). The idea is to choose $j$ to be in your set with probability $1/(1+\exp(x/j))$, with the same $x$ as in the upper bound. The expected value of the sum of the reciprocals of the chosen elements is then $1$, by the choice of $x$. An entropy calculation for this distribution then gives the exponent. (More generally, in all the situations I know, the Rankin upper bound is pretty close to optimal.)
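To spell out the entropy calculation a little: if $j$ is included independently with probability $p_j = 1/(1+e^{x/j})$, and $H(p)=-p\log p-(1-p)\log(1-p)$, then $$ \sum_{j\le n} H(p_j) = \sum_{j\le n}\Big( \log\big(1+e^{-x/j}\big) + \frac{x}{j}\cdot\frac{1}{1+e^{x/j}}\Big) = x + \sum_{j\le n}\log\big(1+e^{-x/j}\big), $$ using the equation defining $x$. So the entropy of this distribution is exactly the logarithm of the Rankin upper bound, which is why one expects a matching lower bound: heuristically, the $\approx e^{\sum_j H(p_j)}$ typical sets for this distribution have reciprocal sum close to $1$, and a positive proportion of them have sum at most $1$.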


The bound proposed by Lucia is correct. Let me add some detail and stress that it is an application of a completely standard "large deviations" argument.

Consider the independent random variables $$X_i=\begin{cases} 0 & \text{with probability } 1/2,\\ \frac{1}{i} & \text{with probability } 1/2.\end{cases}$$ Then the number of subsets is given by $$ 2^n\, \mathbb{P}\Big(\sum_{i=1}^n X_i \leq 1\Big).$$

We can then adapt the proof of the well-known Cramér theorem. For the upper bound, for any $x>0$, $$\mathbb{P}\Big(\sum_{i=1}^n X_i\leq 1\Big)=\mathbb{P}\big(e^{-x\sum_{i=1}^n X_i}\geq e^{-x}\big)\leq \frac{\mathbb{E}\big(e^{-x\sum_{i=1}^n X_i}\big)}{e^{-x}},$$ which gives, by independence of the $X_i$, the bound Lucia already stated (I did not know it was called Rankin's trick): $$\mathbb{P}\Big(\sum_{i=1}^n X_i \leq 1\Big)\leq e^{x} \prod_{i=1}^n \frac{1+e^{-x/i}}{2}.$$

There exists an $x_0>0$ which minimises the right-hand side.

Then introduce $$\tilde{X}_i=\begin{cases} 0 \text{ with } p=\frac{1}{1+e^\frac{-x_0}{i}}\\ \frac{1}{i} \text{ with } p=\frac{e^\frac{-x_0}{i}}{1+e^\frac{-x_0}{i}} \end{cases}$$

Remark that $$ \mathbb{E}\Big(\sum_i \tilde{X}_i\Big)=\frac{\mathbb{E}\big(\sum_i X_i\, e^{-x_0\sum_{i=1}^n X_i}\big)}{\mathbb{E}\big(e^{-x_0\sum_{i=1}^n X_i}\big)}=-\partial_x \big[\ln\mathbb{E}\big(e^{-x\sum_{i=1}^n X_i}\big)\big]_{x=x_0}.$$ Because $x_0$ is a minimum, $\partial_x \big[\ln\mathbb{E}(e^{-x\sum_{i=1}^n X_i})-\ln(e^{-x})\big]_{x=x_0}=0$, and therefore $\mathbb{E}(\sum_i \tilde{X}_i)=1$. By convexity, for any $\epsilon>0$, defining $\tilde{X}^\epsilon_i$ by replacing $x_0$ with $x=x_0+\epsilon n$ in the definition above gives $\mathbb{E}(\sum_i \tilde{X}^\epsilon_i)\leq 1-\delta(\epsilon)$ with $\delta(\epsilon)>0$, while by continuity $\mathbb{E}(\sum_i \tilde{X}^\epsilon_i)\geq 1-\Delta(\epsilon)$ with $\Delta(\epsilon)\to 0$ as $\epsilon\to 0$ (both uniformly in $n$ large). Moreover $\mathrm{Var}(\sum_i \tilde{X}^\epsilon_i)=\sum_i i^{-2}\,\tilde{p}_i(1-\tilde{p}_i)=O(1/n)$, where $\tilde{p}_i$ is the probability that $\tilde{X}^\epsilon_i=\frac1i$, so by Chebyshev's inequality $$\mathbb{P}\Big(1-2\Delta(\epsilon)\leq \sum_i\tilde{X}^\epsilon_i\leq 1\Big)\geq \frac12 \qquad\text{for } n \text{ large.}$$ We can then state the lower bound $$ \frac12\leq \mathbb{P}\Big(1-2\Delta(\epsilon)\leq\sum_i\tilde{X}^\epsilon_i\leq 1\Big)= \frac{\mathbb{E}\big(1_{1-2\Delta(\epsilon)\leq\sum_i X_i \leq 1}\; e^{-x \sum_i X_i}\big)}{\mathbb{E}\big(e^{-x \sum_i X_i}\big)} \leq \frac{\mathbb{E}\big(1_{\sum_i X_i \leq 1}\big)\; e^{-x(1-2\Delta(\epsilon))}}{\mathbb{E}\big(e^{-x \sum_i X_i}\big)}.$$ And to conclude, $$ \frac12\, e^{-2x\Delta(\epsilon)}\,\frac{\mathbb{E}\big(e^{-x \sum_i X_i}\big)}{e^{-x}}\leq \mathbb{P}\Big(\sum_i X_i \leq 1\Big),$$ and therefore $$\lim_n \frac{1}{n}\log\Big(\frac{\mathbb{E}(e^{-x \sum_i X_i})}{e^{-x}}\Big)-2(\alpha+\epsilon)\Delta(\epsilon)\leq \liminf_n \frac{1}{n}\log\mathbb{P}\Big(\sum_i X_i \leq 1\Big),$$ since $x/n\to\alpha+\epsilon$, where $\alpha$ is Lucia's constant (so that $x_0\sim\alpha n$).

This holds for every $\epsilon>0$, and as $\epsilon\to 0$ the left-hand side tends to $\lim_n \frac1n\log\big(e^{x_0}\,\mathbb{E}(e^{-x_0\sum_i X_i})\big)$, i.e. the exponent in the Rankin upper bound. This ends the proof that the bound is tight.
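As a concrete illustration of this change of measure (not needed for the proof), one can use the tilted variables for importance sampling and estimate the exponent numerically for values of $n$ far beyond brute force. A rough Python sketch of my own, using $x_0$ itself rather than $x_0+\epsilon n$ and ad hoc sample sizes:

```python
import math
import random

def tilted_inclusion_probs(n):
    """Solve sum_i (1/i)/(1 + e^{x/i}) = 1 for x by bisection and return x together with
    the tilted inclusion probabilities q_i = 1/(1 + e^{x/i}) (the law of the tilde X_i)."""
    def mean_recip_sum(x):
        return sum(1.0 / (i * (1.0 + math.exp(min(x / i, 700.0)))) for i in range(1, n + 1))
    lo, hi = 0.0, 10.0 * n
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if mean_recip_sum(mid) > 1.0 else (lo, mid)
    x = 0.5 * (lo + hi)
    return x, [1.0 / (1.0 + math.exp(min(x / i, 700.0))) for i in range(1, n + 1)]

def estimate_log2_count(n, samples=4000, seed=0):
    """Importance-sampling estimate of log2 #{A in {1..n} : sum_{i in A} 1/i <= 1},
    sampling each i independently with the tilted probability q_i and reweighting."""
    rng = random.Random(seed)
    x, q = tilted_inclusion_probs(n)
    # constant part of log dP/dP~ for the uniform measure P on subsets
    log_base = -n * math.log(2.0) - sum(math.log(1.0 - qi) for qi in q)
    acc = 0.0
    for _ in range(samples):
        s, log_lr = 0.0, log_base
        for i, qi in enumerate(q, start=1):
            if rng.random() < qi:
                s += 1.0 / i
                log_lr += math.log((1.0 - qi) / qi)
        if s <= 1.0:
            acc += math.exp(log_lr)          # likelihood ratio of this configuration
    p_hat = acc / samples                    # estimates P(sum X_i <= 1) under the uniform measure
    return n + math.log2(p_hat)              # log2 of the number of subsets

for n in (50, 200, 800):
    print(n, estimate_log2_count(n) / n)     # should slowly approach roughly 0.91
```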


Let $R > 1$ and $\lambda \in \mathbb{R}$ be such that $$ \int_{1}^R \tanh\Big(\frac{\lambda x}{2}\Big) \frac{d x}{x} = \log R -2. $$ Then standard techniques in large deviation theory yield $$ \frac{1}{n} \log \Big| \Big\{ I \subseteq [n R^{-1},n] : \sum_{i \in I} \frac{1}{i} \leq 1 \Big\} \Big| \longrightarrow \lambda + \int_{1}^R \phi(\lambda x)\frac{d x}{x^2}, $$ where $\phi(u) = \log2 +\log \cosh (\frac{u}{2}) - \frac{u}{2} = \log(1 + e^{-u})$. For $R = e^2$ one has $\lambda =0$ and the limit is $(1 - e^{-2})\log 2$: this recovers Lucia's first result.

Moreover, as $R$ tends to infinity, $\lambda$ tends to Lucia's constant $\alpha$ and the limit matches Lucia's upper bound. In particular Lucia's "second guess" was correct.
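Here is a rough numerical check of this $R$-dependence (a quick Python sketch of my own, with a crude midpoint quadrature and an arbitrary bisection bracket): it solves the equation defining $\lambda$ and evaluates the limit, which should increase from $(1-e^{-2})\log 2 \approx 0.599$ at $R=e^2$ towards Lucia's $\approx 0.63$ as $R$ grows.

```python
import math

def integrate(f, a, b, m=20000):
    """Composite midpoint rule on [a, b] (crude, but enough for a rough check)."""
    h = (b - a) / m
    return h * sum(f(a + (k + 0.5) * h) for k in range(m))

def exponent(R):
    """Solve int_1^R tanh(lam*x/2) dx/x = log R - 2 for lam, then return
    (lam, lam + int_1^R log(1 + e^{-lam*x}) dx/x^2)."""
    target = math.log(R) - 2.0
    def g(lam):
        return integrate(lambda x: math.tanh(lam * x / 2.0) / x, 1.0, R) - target
    lo, hi = -1.0, 5.0                       # g is increasing in lam; generous bracket
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if g(mid) < 0 else (lo, mid)
    lam = 0.5 * (lo + hi)
    return lam, lam + integrate(lambda x: math.log1p(math.exp(-lam * x)) / x**2, 1.0, R)

for R in (math.e**2, 20.0, 200.0, 2000.0):
    lam, ex = exponent(R)
    # base-2 exponent: expect roughly 0.86 at R = e^2, increasing towards roughly 0.91
    print(f"R = {R:8.1f}   lambda = {lam:.4f}   limit = {ex:.4f} nats = {ex / math.log(2):.4f} * log 2")
```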

EDIT: Some details about the "standard techniques". Let $(X_i)_i$ be independent random signs ($X_i = \pm 1$ with probability $1/2$ each), and let $(t_i)_{i=1}^n$ be positive real numbers lying in a fixed interval $(0,R)$. We are going to study the probability that $S = \sum_{i=1}^n t_i X_i$ is $\geq nc$, for some $c \in \mathbb{R}$. Let $\lambda \in \mathbb{R}$ be such that $$ \frac{1}{n} \sum_{i=1}^n t_i \tanh(\lambda t_i) = c. $$ This amounts to saying that $E[\frac{1}{n} S e^{\lambda S}] = c\, E[e^{\lambda S}]$. Such a $\lambda$ exists as soon as $|c| < \frac{1}{n} \sum_{i=1}^n t_i$.

One checks that $$ E[(\tfrac{1}{n} S)^2 e^{\lambda S}]\,E[e^{\lambda S}]^{-1} = c^2 + \frac{1}{n^2} \sum_{i=1}^n t_i^2\big(1- \tanh(\lambda t_i)^2\big), $$ so that the variance of $\frac{1}{n} S$ with respect to the measure weighted by $e^{\lambda S} E[e^{\lambda S}]^{-1}$ is at most $R^2 n^{-1}$. In particular $$ E[ \mathbb{1}_{\frac{1}{n} S \in [c,c+\epsilon]} e^{\lambda S}]\, E[e^{\lambda S}]^{-1} = \frac{1}{2} + O(R^2 \epsilon^{-2} n^{-1}). $$

Thus $$ P( S \geq nc) \geq e^{-n \lambda (c+\epsilon)}E[ \mathbb{1}_{\frac{1}{n} S \in [c,c+\epsilon]} e^{\lambda S}] = e^{-n \lambda (c+\epsilon)} E[e^{\lambda S}] \left( \frac{1}{2} + O(R^2 \epsilon^{-2} n^{-1}) \right). $$ On the other hand $$ P( S \geq nc) \leq e^{-n \lambda c} E[e^{\lambda S}], $$ so that $$ \frac{1}{n} \log P( S \geq nc) = - \lambda c + \frac{1}{n} \log E[e^{\lambda S}] + o_{R}(1). $$

For the application above, take $t_i = \frac{n}{2i}$ (indexed by $i$ between $n R^{-1}$ and $n$) and $c = \frac{1}{2} \sum_{n R^{-1} < j \leq n} \frac{1}{j} - 1$.