A community project: prove (or disprove) that $\sum_{n\geq 1}\frac{\sin(2^n)}{n}$ is convergent

Short answer: if $\frac 1 \pi$ is a normal number in base $2$, then the series converges in measure (but not necessarily in the usual sense). However, the normality of $\pi$ and of $\frac 1 \pi$ has not been proved (and it is not known whether it can be). I did not attempt to prove the converse, i.e. that convergence in measure implies normality of $\frac 1 \pi$.

Disclaimer: I would be glad if someone with a good knowledge of measure theory checked, and perhaps helped improve and make more rigorous, the part that justifies the introduction of the probability space. $\DeclareMathOperator{\E}{\mathbb{E}}$ $\DeclareMathOperator{\Var}{Var}$ $\DeclareMathOperator{\Cov}{Cov}$

First, by the periodicity of the sine, $\sin\left(2^n\right) = \sin\left(2\pi\left\{\frac{2^n}{2\pi}\right\}\right) = \sin\left(2\pi \left\{2^{n-1}\frac{1}{\pi}\right\}\right),$ where $\{\cdot\}$ denotes the fractional part of a number.
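A quick numerical sanity check of this reduction (illustrative only; floating-point error in the fractional part grows with $n$, so only moderate $n$ are tested):

```python
import math

# Sanity check (illustrative only): sin(2^n) = sin(2*pi*{2^(n-1)/pi}).
# Floating-point error in the fractional part grows with n, so only
# moderate n are tested here.
for n in range(1, 21):
    frac = (2**(n - 1) / math.pi) % 1.0      # {2^(n-1)/pi}
    assert abs(math.sin(2**n) - math.sin(2 * math.pi * frac)) < 1e-6, n
print("reduction identity holds numerically for n = 1..20")
```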

I define a number $c$ to be normal in base $2$ if $$\left (\forall(a,b)\in\left\{(a,b):a\in(0,1)\wedge\ b\in (a,1)\right\}:\lim_{N\to\infty}\left(\frac 1 N\sum_{n=1}^{N}I\left[\left\{2^n c\right\}\in(a,b)\right] \right)=b-a\right)\wedge\\\left(\forall a\in[0,1]: \lim_{N\to\infty}\left(\frac 1 N \sum_{n=1}^{N}I\left[\left\{2^nc\right\}=a\right]\right)=0\right).$$ This definition can be found, e.g., here, page 127.
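The frequency condition can be illustrated numerically: a number whose binary digits are independent fair coin flips is almost surely normal in base $2$, and its orbit $\{2^n c\}$ then visits an interval $(a,b)$ with frequency close to $b-a$. A sketch (Python, illustrative only; the interval and sample sizes are arbitrary choices, and reading the orbit directly off the digit sequence avoids the precision loss of iterating $c \mapsto \{2c\}$ in floating point):

```python
import random

# Illustration (not a proof): a number whose binary digits are fair coin
# flips is almost surely normal in base 2.  Reading the orbit {2^n c}
# directly off the digit sequence avoids floating-point precision loss.
random.seed(1)
bits = [random.randint(0, 1) for _ in range(11000)]

def orbit_point(n, precision=53):
    # {2^n c} = 0.b_{n+1} b_{n+2} ... truncated to double precision
    return sum(bits[n + m] * 2.0**-(m + 1) for m in range(precision))

a, b, N = 0.25, 0.70, 10000            # interval and number of orbit points
freq = sum(1 for n in range(N) if a < orbit_point(n) < b) / N
print(freq)                            # should be close to b - a = 0.45
```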

Below I assume that $\frac 1 \pi$ is a normal number in base $2.$

Let's introduce a probability space $(\Sigma, \mathcal{F}, P)$ satisfying Kolmogorov's axioms, taking $\Sigma = (0,1),$ $\mathcal{F}$ the Borel $\sigma$-algebra on $(0,1),$ and a probability measure $P$ such that the measure of an open interval equals its length and the measure of a point is zero. This is the probability space of a random variable uniformly distributed on $(0,1).$

There are two remarks here. First, the probability I'm talking about in this answer is frequentist probability, which draws consequences from infinite (but fixed) sequences of points sampled from a known distribution. It is not Bayesian probability, which characterizes degrees of belief. Second, there is a theory of measure spaces that generalizes the concept of a probability space with less restrictive axioms. I don't use it because the more restrictive probability axioms are enough for this problem, and at the moment I'm more familiar with probability theory than with general measure theory.

Let's define a sequence of real numbers $\xi_n$ such that $$\left (\forall(a,b)\in\left\{(a,b):a\in(0,1)\wedge\ b\in (a,1)\right\}:\lim_{N\to\infty}\left(\frac 1 N\sum_{n=1}^{N}I\left[\xi_n \in(a,b)\right] \right)=b-a\right)\wedge\\\left(\forall a\in[0,1]: \lim_{N\to\infty}\left(\frac 1 N \sum_{n=1}^{N}I\left[\xi_n=a\right]\right)=0\right)\wedge\\ \left( \forall n > 0: \left\{2\xi_{n} - \xi_{n+1}\right\} = 0 \right). $$

Such a sequence can be drawn from our probability space, so in the language of frequentist probability it is a uniformly distributed sequence drawn from it. At the same time, the sequence $\{2^n c\}$ satisfies all the conditions on $\xi_n,$ so it is possible to work with $\{2^n c\}$ as with a particular fixed sequence drawn from our probability space.

I call a series $\sum_{n=1}^\infty x_n$ convergent in measure to a value $S$ if $$\forall \varepsilon > 0: \lim\limits_{N \to \infty} \left(\frac 1 N \sum_{m=1}^{N} I \left[\left|S - \sum_{n=1}^{m} x_n \right| > \varepsilon \right] \right) = 0.$$

From this definition it follows that the series converges in measure iff $$\forall \varepsilon > 0 \ \exists N > 0: \lim\limits_{M\to\infty}\left(\frac 1 M \sum_{m=N}^{N+M} I\left[\left|\sum_{n=N}^{m} x_n\right| > \varepsilon \right] \right) = 0.$$

This is the statement I aim to prove for $x_n=\frac{\sin(2\pi \xi_n)}{n}.$

Because $\xi_n,$ as argued above, can be treated as a sample from the probability space defined above, the partial sums $\sum_{n=N}^{m} x_n$ and the indicators $I\left[\left|\sum_{n=N}^{m} x_n\right| > \varepsilon \right]$ become samples from their corresponding probability spaces too, and the properties of the spaces they are sampled from can be inferred.

Thus the criterion of convergence in measure defined above can be rephrased in terms of the corresponding probability space as $$\forall \varepsilon > 0 \ \exists N > 0: \lim\limits_{M\to\infty} P\left(\left|\sum_{n=N}^{N+M} x_n\right| > \varepsilon \right) = 0.$$ In probability theory this type of convergence is called convergence in probability, and it is weaker than convergence with probability $1.$

Let's define $\Delta_{N,M} = \sum_{n=N}^{N+M} x_n.$ Then $\E\left[\Delta_{N,M}\right] = 0,$ because the sequence $2\pi \xi_n$ is uniform on $(0, 2\pi)$ and the sine averages to zero over a full period.

From Chebyshev's inequality, $\forall \varepsilon > 0:\ P\left(\left|\Delta_{N,M}\right| > \varepsilon\right) < \frac{\E\left[\Delta_{N,M}^2\right]}{\varepsilon^2}.$ Thus, to show convergence in probability it is enough to show that $\lim\limits_{N\to\infty}\lim\limits_{M\to\infty} \E\left[\Delta_{N,M}^2\right] = 0.$
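For readers less familiar with Chebyshev's inequality, a small Monte-Carlo illustration (not part of the proof; the sample size and threshold are arbitrary choices):

```python
import random

# Monte-Carlo illustration (not part of the proof) of Chebyshev's inequality
# P(|X - E[X]| > eps) <= Var(X)/eps^2, for X uniform on (0,1):
# E[X] = 1/2 and Var(X) = 1/12.
random.seed(0)
samples = [random.random() for _ in range(100_000)]
eps = 0.4
freq = sum(1 for x in samples if abs(x - 0.5) > eps) / len(samples)
bound = (1 / 12) / eps**2
assert freq <= bound
print(f"empirical {freq:.4f} <= Chebyshev bound {bound:.4f}")
```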

Let's show that $\lim\limits_{N\to\infty}\lim\limits_{M\to\infty} \E\left[\Delta_{N,M}^2\right] = 0,$ using the idea from this question.

The variance of $\Delta_{N,M}$ (which equals $\E\left[\Delta_{N,M}^2\right],$ since its mean is zero) can be expressed as

$$\E\left[\Delta_{N,M}^2\right] = \E\left[\left(\sum\limits_{n=N}^{N+M} x_n\right)^2\right] = \sum\limits_{n=N}^{N+M} \sum\limits_{k=N}^{N+M} \E\left[ x_n x_k \right] =\\ 2\sum\limits_{n=N}^{N+M} \sum\limits_{k=n+1}^{N+M} \E\left[ x_n x_k \right] + \sum\limits_{n=N}^{N+M} \E\left[ x_n^2 \right] \leq 2\sum\limits_{n=N}^{N+M} \sum\limits_{k=0}^{N+M-n} \left|\E\left[ x_n x_{n+k} \right]\right|.$$

Here $$\left|\E\left[ x_n x_{n+k} \right]\right| = \left|\E\left[ \frac{\sin\left(2\pi \xi_n\right) \sin\left(2\pi \xi_{n+k}\right)}{n(n + k)}\right]\right|,$$ and, as shown in Appendix 1, $$\left|\E\left[ \frac{\sin\left(2\pi \xi_n\right) \sin\left(2\pi \xi_{n+k}\right)}{n(n + k)}\right]\right| \leq \frac {C\,2^{-k}}{n(n+k)}$$ for some constant $C$ independent of $n$ and $k.$
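The covariance bound can be sanity-checked numerically for the idealized model in which $\xi_n$ is uniform on $(0,1)$ and $\xi_{n+k} = \{2^k \xi_n\}$; in fact, for an exactly uniform $\xi_n$ the covariance integral vanishes for $k \geq 1$ by orthogonality of the frequencies, which is well within the claimed bound:

```python
import math

# Midpoint-rule estimate of E[sin(2 pi xi_n) sin(2 pi xi_{n+k})] under the
# idealized model: xi_n uniform on (0,1) and xi_{n+k} = {2^k xi_n}.  The
# computed values are tiny, well within the claimed C * 2^{-k} bound.
def cov(k, points=1 << 16):
    total = 0.0
    for i in range(points):
        u = (i + 0.5) / points
        total += math.sin(2 * math.pi * u) * math.sin(2 * math.pi * ((2**k * u) % 1.0))
    return total / points

for k in range(1, 8):
    assert abs(cov(k)) <= 2.0**-k, k
print("covariance bound holds for k = 1..7")
```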

So $$\E\left[\Delta_{N,M}^2\right] \leq 2C\sum\limits_{n=N}^{N+M} \sum\limits_{k=0}^{N+M-n} \frac{2^{-k}}{n(n+k)}.$$

As shown in Appendix 2, $$\lim_{N \rightarrow \infty} \lim_{M \rightarrow \infty} \sum\limits_{n=N}^M \sum\limits_{k=0}^{M - n} \frac{2^{-k}}{n(n+k)}=0,$$

and from this it follows that $\Delta_{N,M}$ converges in probability to zero as first $M \rightarrow \infty$ and then $N \rightarrow \infty.$

So the series converges in probability, i.e. in the measure introduced by the (under the normality assumption on $\frac 1 \pi$) uniformly distributed sequence $\xi_n = \left\{2^n \frac 1 \pi\right\}.$

Appendix 1

Let $\xi_n = \left\{2^n a\right\},$ where $a$ is a normal number, be written as a binary fraction $\xi_n = 0.b_{n,1}b_{n,2}b_{n,3}\ldots = \sum\limits_{m=1}^\infty b_{n,m}2^{-m},$ where each digit $b_{n,m}$ is either $0$ or $1.$ Then $\xi_{n+k} = \left\{2^k \xi_n\right\} = \sum\limits_{m=1}^\infty b_{n,m}2^{k-m}I\left[m > k\right] = \sum\limits_{m=1}^\infty b_{n,m+k}2^{-m}$ and $\xi_n = \sum\limits_{m=1}^{k}b_{n,m} 2^{-m} + 2^{-k}\xi_{n+k}.$
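The digit-shift identity $\xi_n = B + 2^{-k}\xi_{n+k}$ (with $B$ the first $k$ binary digits of $\xi_n$) is easy to confirm in floating point, since scaling by powers of two is exact:

```python
import math

# The split xi_n = B_{n,k} + 2^{-k} xi_{n+k}, where B_{n,k} keeps the first
# k binary digits of xi_n and xi_{n+k} = {2^k xi_n}.  Scaling by powers of
# two is exact in binary floating point, so the identity holds to rounding.
xi = 0.6180339887498949               # an arbitrary value in (0, 1)
for k in range(1, 10):
    tail = (2**k * xi) % 1.0              # xi_{n+k} = {2^k xi_n}
    head = math.floor(2**k * xi) / 2**k   # first k binary digits of xi
    assert abs(xi - (head + 2.0**-k * tail)) < 1e-12, k
print("digit-shift identity verified for k = 1..9")
```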

Using the same probability measure as in the main part of the answer, which treats $\xi_n$ as uniformly distributed random variables on $(0,1),$ it is possible to treat the digits $b_{n,k} = \lfloor 2^k \xi_n \rfloor \bmod 2$ as random variables too. For each $n,$ the digits $b_{n,1}$ and $b_{n,2}$ must be independent, i.e. all possible combinations of the values of $b_{n,1}$ and $b_{n,2}$ must be equiprobable; otherwise the probabilities of $\xi_n$ lying in the subsets $\left(0,\frac 1 4\right],$ $\left(\frac 1 4, \frac 1 2\right],$ $\left(\frac 1 2 , \frac 3 4 \right],$ and $\left(\frac 3 4, 1 \right )$ would not be equal, which would contradict the assumption that $\xi_n$ is uniformly distributed.

The independence of $B_{n,k} = \sum\limits_{m=1}^{k}b_{n,m} 2^{-m}$ and $b_{n,k+1},$ $k \geq 1,$ can be shown by induction on $k$ using the same argument about the uniform distribution of $\xi_n$ as in the previous paragraph. From this follows the independence of $B_{n,k}$ and $\sum\limits_{m=k+1}^{\infty}b_{n,m} 2^{-m},$ which is equivalent to the independence of $B_{n,k}$ and $\xi_{n+k}.$

Using the results obtained, let's estimate the absolute value of the covariance of $\sin \zeta_n$ and $\sin \zeta_{n+k},$ where $\zeta_n = 2\pi \xi_n:$

$$\E\left[\sin \zeta_n \sin \zeta_{n+k}\right] = \E\left[\sin\left(2\pi B_{n,k} + \zeta_{n+k}2^{-k}\right) \sin \zeta_{n+k}\right].$$

Because $\sin\left(\alpha+\beta\right) = \sin\alpha\cos\beta + \cos\alpha\sin\beta,$ $$\sin\left(2\pi B_{n,k} + \zeta_{n+k}2^{-k}\right) = \sin\left(2\pi B_{n,k}\right) \cos\left(\zeta_{n+k}2^{-k}\right) + \cos\left(2\pi B_{n,k}\right) \sin\left(\zeta_{n+k}2^{-k}\right) = \sin\left(2\pi B_{n,k}\right) + 2^{-k} \zeta_{n+k} \cos\left(2\pi B_{n,k}\right) + o(2^{-k}),$$ and $$\E\left[\sin \zeta_n \sin \zeta_{n+k}\right] = \E\left[\sin\left(2\pi B_{n,k}\right) \sin \zeta_{n+k}\right] + \E\left[2^{-k} \zeta_{n+k} \cos\left(2\pi B_{n,k}\right) \sin \zeta_{n+k}\right] + o(2^{-k}).$$

From the independence of $B_{n,k}$ and $\xi_{n+k}$ it follows that $\E\left[\sin\left(2\pi B_{n,k}\right) \sin \zeta_{n+k}\right] = \E\left[\sin\left(2\pi B_{n,k}\right)\right]\E\left[\sin \zeta_{n+k}\right] = 0,$ since $\E\left[\sin \zeta_{n+k}\right] = 0.$

The absolute value of $\E\left[\cos\left(2\pi B_{n,k}\right)\right] = \frac{1}{2^{k}}\sum\limits_{j=0}^{2^{k}-1}\cos\left(\frac{2\pi j}{2^{k}}\right)$ is bounded by $1,$ and $\E\left[ \zeta_{n+k}\sin \zeta_{n+k} \right] = \frac{1}{2\pi}\int_0^{2\pi} x \sin x \, dx = -1,$ so the absolute value of $\E\left[\sin\zeta_{n} \sin\zeta_{n+k}\right]$ is bounded by $\frac C {2^k},$ where $C$ is some constant independent of $n$ and $k.$
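The value of $\E\left[\zeta\sin\zeta\right]$ for $\zeta$ uniform on $(0,2\pi)$ can be confirmed by a midpoint-rule computation (illustrative only):

```python
import math

# Midpoint-rule check that E[zeta * sin(zeta)] = -1 for zeta uniform on
# (0, 2*pi); the exact value is (1/(2*pi)) * int_0^{2*pi} x sin(x) dx = -1.
N = 1 << 16
mean = 0.0
for i in range(N):
    x = 2 * math.pi * (i + 0.5) / N
    mean += x * math.sin(x)
mean /= N
assert abs(mean + 1.0) < 1e-6
print(round(mean, 6))  # close to -1
```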

Appendix 2

Let's prove that for the double limit

$$\lim_{N \rightarrow \infty} \lim_{M \rightarrow \infty} \sum\limits_{n=N}^M \sum\limits_{k=0}^{M - n} \frac{2^{-k}}{n(n+k)}$$

the inner limit exists and the outer limit exists and is equal to zero.

The sum $\sum\limits_{k=0}^{M - n} \frac{2^{-k}}{n(n+k)}$ is bounded from above by $I_n = \frac 1 n \sum\limits_{k=0}^{\infty} \frac{2^{-k}}{n+k} = \frac 1 n \Phi\left(\frac 1 2, 1, n\right)$ for every $n,$ where $\Phi\left(z, s, a\right)$ is the Lerch transcendent. Using property 25.14.5 from this list, $I_n$ can be rewritten as $\frac 2 n \int\limits_0^\infty \frac{e^{-nx}}{2-e^{-x}}dx.$ The integrand is bounded from above by $e^{-nx}$ (since $2-e^{-x} \geq 1$), so $I_n$ is bounded from above by $\frac 2 n \int\limits_0^\infty e^{-nx} dx = \frac 2 {n^2}.$
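The bound $I_n \leq 2/n^2$ is easy to check numerically (truncating the geometric tail, which is negligible):

```python
# Numerical check of the bound I_n = (1/n) sum_{k>=0} 2^{-k}/(n+k) <= 2/n^2,
# truncating the geometric tail at K = 200 terms (the remainder is negligible).
def I(n, K=200):
    return sum(2.0**-k / (n * (n + k)) for k in range(K))

for n in range(1, 200):
    assert I(n) <= 2.0 / n**2, n
print(I(1))  # I_1 = 2*ln(2) = 1.386...
```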

So

$$0 \leq \sum\limits_{n=N}^M \sum\limits_{k=0}^{M - n} \frac{2^{-k}}{n(n+k)} \leq 2 \sum\limits_{n=N}^M \frac {1}{n^2}.$$

The series $\sum\limits_{n=1}^\infty \frac{1}{n^2}$ converges, as can be shown using the Maclaurin–Cauchy integral test, so by the squeeze theorem the inner limit exists, and the outer limit exists and is equal to zero.




Discussion:

Following the remark that the series $\sum_{n \geq 1} \sin(\xi \, 2^{n})/n$ converges for almost every $\xi$, let $\xi = 2^{m}$ with $m \in \Bbb N$ and define the function $f(m)$: $$ \boxed{ f(m) = \sum_{n=1}^{\infty} \frac{\sin(2^{m} \space 2^{n})}{n} } \\[8mm] $$ $$ \begin{align} f(m-1)-f(m) & = \sum_{n=1}^{\infty} \left[ \frac{\sin(2^{m-1} \space 2^{n})}{n} - \frac{\sin(2^{m} \space 2^{n})}{n} \right] \\[4mm] & = \small \left[ \frac{\sin(2^{m} 2^{0})}{1} - \frac{\sin(2^{m} 2^{1})}{1} \right] + \left[ \frac{\sin(2^{m} 2^{1})}{2} - \frac{\sin(2^{m} 2^{2})}{2} \right] + \left[ \frac{\sin(2^{m} 2^{2})}{3} - \text{...} \right] + \text{...} \\[4mm] & = \small \frac{\sin(2^{m} 2^{0})}{1} - \left[ \frac{\sin(2^{m} 2^{1})}{1} - \frac{\sin(2^{m} 2^{1})}{2} \right] - \left[ \frac{\sin(2^{m} 2^{2})}{2} - \frac{\sin(2^{m} 2^{2})}{3} \right] - \text{...} \\[4mm] & = \sin(2^{m}) - \sum_{n=1}^{\infty} \frac{\sin(2^{m} \space 2^{n})}{n(n+1)} \\ \end{align} $$ Telescoping the differences: $$ f(0)-f(N) = \small \left[ f(0)-f(1) \right] + \left[ f(1)-f(2) \right] + \text{...} + \left[ f(N-1)-f(N) \right] = \normalsize \sum_{m=1}^{N} \left[ f(m-1)-f(m) \right] \\[6mm] = \sum_{m=1}^{N} \sin(2^m) - \sum_{m=1}^{N} \sum_{n=1}^{\infty} \frac{\sin(2^m \space 2^n)}{n(n+1)} = \sum_{m=1}^{N} \sin(2^m) - \sum_{n=1}^{\infty} \frac{\sum_{m=1}^{N} \sin(2^m 2^n)}{n(n+1)} $$ Which implies: $$ \boxed{ \sum_{n=1}^{N} \sin(2^n) \quad\text{bounded}\quad \iff f(0)-f(N) = \sum_{n=1}^{\infty} \frac{\sin(2^{n}) - \sin(2^{n+N})}{n} \quad\text{convergent}\quad } $$ So the question is equivalent to showing that $f(0)-f(N)$ converges. Although it is not so clear how to argue the boundedness, at least $f(0)-f(N)$ is the difference of two series that diverge at the same rate and have the same term limit. Moreover, if $f(m)$ converges for one value of $m$, then it converges for all values of $m$, and vice versa (which gives the initial remark in the special case $\xi = 2^{m}$).
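The regrouping step above can be verified for finite truncations; the finite-$K$ version carries a boundary term $\sin(2^m 2^K)/K$ that vanishes as $K \to \infty$. A quick check (Python, illustrative only):

```python
import math

# Finite-K version of the regrouping step above; the boundary term
# sin(2^m 2^K)/K disappears as K -> infinity.  Checked in double precision
# for small m, where the arguments 2^(m+K) are still exactly representable.
def lhs(m, K):
    return sum(math.sin(2**(m - 1) * 2**n) / n - math.sin(2**m * 2**n) / n
               for n in range(1, K + 1))

def rhs(m, K):
    return (math.sin(2**m)
            - sum(math.sin(2**m * 2**n) / (n * (n + 1)) for n in range(1, K))
            - math.sin(2**m * 2**K) / K)

for m in (1, 2, 3):
    assert abs(lhs(m, 25) - rhs(m, 25)) < 1e-9, m
print("finite regrouping identity holds for m = 1, 2, 3")
```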

If $\sum \sin(2^{n})/n$ converges, then $\sum \sin(2^{n+N})/n$ converges too, and everything is fine. On the other hand, assume $\sum \sin(2^{n})/n$ diverges; then $\sum \sin(2^{n+N})/n$ diverges too. And because the two series diverge at the same rate and have the same term limit, their difference "$\small \underline{\text{is potentially}}$" convergent, which would make $\sum \sin(2^{n})/n$ convergent (contradicting the assumption). The cosine case is analogous (and interesting in its own right): $$ \sum_{n=1}^{N} \cos(2^n) = \mathcal{O}(1) \iff \sum_{n=1}^{\infty} \frac{\cos(2^{n}) - \cos(2^{n+N})}{n} = 2\sum_{n=1}^{\infty} \frac{\cos^{2}(2^{n-1}) - \cos^{2}(2^{n-1+N})}{n} $$


This answer is motivated by several "complaints" in the comments, for example this one by user1952009: it is not easy to evaluate $\sum_{n < N} \sin(2^n)$ numerically, the errors accumulate.
The OP's response to this is affirmative. So it may not be such a bad idea to carry out a decent error analysis instead of trying to immediately answer the question as it is. Why is this problem such a numerical hell? Rather than issuing an answer, I shall be questioning the question.

Elementary error analysis of a (neat) function $f(x)$ commonly proceeds as follows, with $\Delta f$ the error in $f$ and $\Delta x$ the error in its argument, everything real-valued and positive: $$ \Delta f = \left| f'(x)\right| \Delta x $$ Now I know that we are not going to have such a neat function. But let's do something quick and dirty at this moment; objections and refinements later on, please. Consider: $$ f(x) = \sum_{n\geq 1}\frac{\sin(x^n)}{n} $$ Rationale: if the series is convergent, then in an incredibly small neighborhood of $\,x=2\,$ it should be convergent as well. Take the derivative (assuming for the moment that the sum is finite and cautiously taking the "limit" for $n\to\infty$ afterwards): $$ \left| f'(x=2) \right| = \left| \sum_{n\geq 1}\frac{n x^{n-1}\cos(x^n)}{n} \right|_{x=2} = \left| \sum_{n\geq 1} x^{n-1}\cos(x^n) \right|_{x=2} = \left| \frac{1}{2} \sum_{n\geq 1} 2^n\cos(2^n) \right| = \infty, $$ because it cannot be assumed that sufficiently many of the terms $\,\cos(2^n)\,$ are zero. What does this mean? It means that an infinitesimally small disturbance $\,\Delta x\,$ in $\,x=2\,$ leads to an infinitely large error in $f(x=2)$. Therefore any attempt to obtain a decent numerical approximation of the result will turn out to be futile.

I am not a mathematician. But, as a physicist by education, with some background in numerical analysis, I find the question utterly absurd, as has been motivated with the humble means I have at my disposal. I firmly believe in the consistency of real-world mathematics, especially calculus. If the error in an outcome is infinite, then it makes no sense to even talk about it.

EDIT. Here is a little Pascal program that shows how the derivatives $f'_n(x=2)$ of the partial sums explode with increasing $n$ (even much wilder than I thought):

program Jack;

procedure main(n : integer);
var S, y : double;
    k : integer;
    m : Longint;
begin
  y := cos(1); S := 0; m := 1;
  for k := 1 to n do
  begin
    y := 2*sqr(y) - 1;   { double-angle: cos(2t) = 2 cos^2(t) - 1 }
    m := m*2;
    S := S + y*m;
    Writeln(S/2);
  end;
end;

begin
  main(32);
end.
To save computation power and precision, use has been made of the cosine double-angle formula: let $\,y_n = \cos\left(2^n\right);$ then we have the recursion $\,y_{n+1} = 2 y_n^2-1$, with $\,y_0 = \cos(1)\,$ at the start.
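The same recursion can be checked in Python against direct evaluation; the error roughly quadruples per step (the map's derivative is $4y$), illustrating the sensitivity of the iteration:

```python
import math

# The double-angle recursion used by the Pascal program: y_{n+1} = 2 y_n^2 - 1
# with y_0 = cos(1) gives y_n = cos(2^n).  In double precision the error
# roughly quadruples per step, so only the first dozen iterates stay accurate.
y = math.cos(1)
for n in range(1, 13):
    y = 2 * y * y - 1
    assert abs(y - math.cos(2.0**n)) < 1e-6, n
print("recursion matches cos(2^n) for n <= 12")
```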

Output:

-4.16146836547142E-0001
-1.72343407827437E+0000
-2.30543421350882E+0000
-9.96671005609590E+0000
 3.38086371200828E+0000
 1.59202950857539E+0001
-2.84250375171363E+0001
-3.35182547883263E+0001
-2.88707602845465E+0002
 2.16817449683092E+0002
 1.18934540854247E+0003
 2.83591818496121E+0003
 4.03523441160830E+0003
-2.75211772559970E+0003
 3.35809567013370E+0003
-2.02949854476886E+0004
-1.75365217799296E+0004
-1.48144097841554E+0005
 1.10291092975492E+0005
 6.05118508207478E+0005
 1.42463158035994E+0006
 1.88943665520724E+0006
-1.89279657909947E+0006
 3.36118140293160E+0006
-2.53285957166623E+0005
-3.06929284003735E+0007
 1.26541774184908E+0007
-9.56811684138041E+0006
-2.63286281183089E+0008
 1.59073869302580E+0008
-9.62791988287838E+0007
-9.62791988287838E+0007
In order to somehow understand the wild and explosive behaviour, notice that $$ \sum_{k=1}^{n-1} 2^k < 2^n, $$ where the terms $2^k,2^n$ must be multiplied by seemingly random weights $-1 < w_{k,n} < +1$. That makes it not hard to see why the next iteration can ruin any convergence obtained so far.

EDIT. Consider the sequence $\, c_n = \cos\left(2^n \phi\right)$. The question is why the sequence is seemingly random. I think the following can be said. If there is some structure in the iterands to be revealed, then the only regularity one can think of is periodicity. Now suppose that the sequence is indeed repeating, starting with some angle $\,0 \le \phi < \pi$; then, without loss of generality: $$ \cos\left(2^n \phi\right) = \cos\left(\phi\right) \quad \Longleftrightarrow \quad 2^n \phi = \begin{cases} k\cdot 2\pi - \phi \\ l\cdot 2\pi + \phi \end{cases} \quad \Longleftrightarrow \\ \phi = \begin{cases} k/(2^n+1)\cdot 2\pi \\ l/(2^n-1)\cdot 2\pi \end{cases} \quad \mbox{with} \quad \begin{cases} k = 0,1,2,\cdots ,2^{n-1} \\ l = 1,2,\cdots , (2^{n-1}-1) \end{cases} $$ The angle we start with is $\,\phi_0 = 1$, so it can be assumed that for some positive integer $m$: $\phi = \left|2^m - j\cdot 2\pi\right|\,$ with $\,j\,$ a positive integer such that $\,0 \le \phi < \pi$. Also assume that $\,k > 0\,$ (thus skipping the trivial solution); then: $$ \left|2^m - j\cdot 2\pi\right| = \begin{cases} k/(2^n+1)\cdot 2\pi \\ l/(2^n-1)\cdot 2\pi \end{cases} \quad \Longleftrightarrow \quad \pi = \begin{cases} 2^{m-1} /\left[j\pm k/(2^n+1)\right] \\ 2^{m-1} /\left[j\pm l/(2^n-1)\right] \end{cases} $$ In any case the conclusion would be that $\,\pi\,$ is a ratio of positive integers, a rational number. But we know that $\,\pi\,$ is irrational. Hence the assumption that $\,c_n\,$ is repetitive leads to a contradiction. This proves that there is NO periodicity in the sequence; it is random in that sense.
But there is more. In the answer by user Alexander Rodin, the argument of the sine (hence of our cosine) is reduced modulo the period: $2\pi\left\{\frac{2^n}{2\pi}\right\}$. It is argued that the values of this variable are uniformly distributed. This has been checked numerically, in an affirmative sense, by making a histogram of the values $0 \le x \le \pi$ as generated by:

x := 1; y := cos(x);
while true do
begin
  y := 2*sqr(y) - 1;
  x := arccos(y);
  { accumulate x into the histogram bins here }
end;
Output with $45$ bins and $450,000$ samples:

(histogram image: the distribution over the bins looks approximately uniform)

The result can be greatly improved by enlarging the amount of samples.

Too long for a comment. The function is not differentiable; I knew it would come up.
For the sake of simplicity, consider instead the Heaviside step function $\,u(t)$: $$ u(t) = \begin{cases} 0 & \mbox{for} & t < 0\\ 1 & \mbox{for} & t > 0\end{cases} $$ The derivative of $\,u(t)\,$ is known to be the Dirac delta: $u'(t) = \delta(t)$. So despite the fact that the Heaviside function is not differentiable at $\,t=0$, it still has a derivative there, which is $\,\infty$. This is a clear signal that there is an error at $\,t=0\,$ that will not disappear with refined analysis.
Mind that I didn't even try to define $\,u(t)\,$ for $\,t=0$, because, as a physicist, I find that $\,u(t)$, rather than being well-defined, must be multi-valued at $t=0$, i.e. it is not even a function there.
Now everybody can see immediately that the error in a multi-valued $\,u(0)\,$ is simply $1$.