Why is the "correct" proof of the chain rule correct? What is actually happening here?

There are two things wrong with your original proof, and the "EDIT" section of the original post is also wrong.

First problem: To define a function $E$, you have to say how to apply $E$ to an arbitrary number $h$. You haven't done that. Here is a better definition of $E$: $E(0) = 0$, and if $h \ne 0$ then \begin{equation} E(h) = \frac{f(g(a)+h) - f(g(a))}{h} - f'(g(a)). \end{equation} For $h \ne 0$, the formula defining $E(h)$ can be rearranged to read: \begin{equation} (E(h) + f'(g(a))) \times h = f(g(a)+h) - f(g(a)). \end{equation} But notice that this last equation is also true if $h=0$, since both sides are $0$, so the equation is true for all values of $h$. Plugging in $g(x)-g(a)$ for $h$, we get \begin{equation} (E(g(x)-g(a))+f'(g(a))) \times (g(x)-g(a)) = f(g(x))-f(g(a)). \end{equation} This is (almost) the same as your "in any case" equation.
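
To see the definition in action, here is a small worked example. It assumes, purely for illustration (this choice is not from the original post), that $f(y) = y^2$, and it writes $b = g(a)$ for short. Then $f'(b) = 2b$, and for $h \ne 0$ \begin{equation} E(h) = \frac{(b+h)^2 - b^2}{h} - 2b = \frac{2bh + h^2}{h} - 2b = h. \end{equation} Together with $E(0) = 0$, this gives $E(h) = h$ for every $h$. The rearranged identity becomes \begin{equation} (h + 2b) \times h = (b+h)^2 - b^2, \end{equation} which visibly holds for all $h$, including $h = 0$.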

Second problem: In your final calculation, you are mixing up the derivative with the value of the derivative at a particular point. The limit \begin{equation} \lim_{x \to a} \frac{f(g(x))-f(g(a))}{x-a} \end{equation} doesn't give you the derivative, it gives you the value of the derivative at $a$. So the proof should end like this: \begin{align} \left.\frac{d}{dx}f(g(x))\right|_{x=a} &= \lim_{x \to a} \frac{f(g(x))-f(g(a))}{x-a}\\ &= \lim_{x \to a} (E(g(x)-g(a))+f'(g(a))) \times \frac{g(x) - g(a)}{x-a}\\ &= f'(g(a))g'(a). \end{align}

There is a subtle point in the last step that you may be missing. Since $g$ is differentiable at $a$, it is continuous at $a$, so $\lim_{x \to a} (g(x) - g(a)) = g(a)-g(a) = 0$. But why does it follow that $\lim_{x \to a}E(g(x)-g(a)) = E(0) = 0$? The answer is: because $E$ is continuous at $0$. (Look in your calculus book in the section on continuous functions. You will find a theorem that says that if $\lim_{x \to a} f(x) = L$ and $g$ is continuous at $L$, then $\lim_{x \to a} g(f(x)) = g(L)$. That theorem is being used in this step.) So to have a complete proof, you need to verify that $E$ is continuous at $0$. To verify that, check that $\lim_{h \to 0} E(h) = 0 = E(0)$. In this limit, $h$ is approaching $0$ but it is not equal to $0$, so we can use the formula for $E(h)$ when $h \ne 0$: \begin{equation} \lim_{h \to 0} E(h) = \lim_{h \to 0} \left(\frac{f(g(a)+h)-f(g(a))}{h} - f'(g(a))\right) = f'(g(a))-f'(g(a)) = 0. \end{equation}

Finally, the problem with the "EDIT" section of the original post: You seem to think that by defining $E$, we are somehow changing the meaning of the expression \begin{equation} \frac{f(g(x))-f(g(a))}{g(x)-g(a)}. \end{equation} We are not. That expression still means what it meant before, so it is undefined when $g(x) = g(a)$. All we're doing is defining a new function $E$, and it is only formulas involving the letter $E$ whose meaning is affected by that definition. No justification is needed for this: you can define a new function however you want.
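
A concrete case may help show why the undefined quotient is a real obstacle and not a technicality. The function below is a standard illustration chosen here as an assumption; it does not appear in the original post. Take $a = 0$ and \begin{equation} g(x) = x^2 \sin(1/x) \text{ for } x \ne 0, \qquad g(0) = 0. \end{equation} For every positive integer $n$ we have \begin{equation} g\left(\frac{1}{n\pi}\right) = \frac{\sin(n\pi)}{n^2\pi^2} = 0 = g(0), \end{equation} so the quotient above is undefined at points $x$ arbitrarily close to $0$, and its limit as $x \to 0$ cannot even be formed. The identity involving $E$, on the other hand, holds at those points too, where it just says $0 = 0$. Since $|g(x)/x| \le |x|$ for $x \ne 0$, the function $g$ is differentiable at $0$ with $g'(0) = 0$, so for any $f$ that is differentiable at $0$ the chain rule gives $(f \circ g)'(0) = f'(0) \cdot 0 = 0$.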


You can avoid the "correct" proof this way:

Case 1: $g'(a) \ne 0.$ Here the "fake proof" works! That's simply because $(g(x) - g(a))/(x-a)$ tends to $g'(a) \ne 0$ as $x \to a,$ so it is nonzero for $x$ close to, but not equal to, $a.$ For such $x,$ we have $g(x)\ne g(a),$ so dividing by $g(x)-g(a)$ is legitimate, and now the fake news is actually news.

Case 2: $g'(a) = 0.$ Because $f'(g(a))$ exists, there exist a constant $c>0$ and a $\delta > 0$ such that

$$\tag 1 |f(y)-f(g(a))|\le c|y-g(a)|\, \text { for } y\in (g(a)-\delta, g(a)+\delta).$$

Now $g$ is continuous at $a,$ so there exists $\gamma > 0$ such that $x\in (a-\gamma, a + \gamma)$ implies $g(x) \in (g(a)-\delta, g(a)+\delta).$ For such $x$ we can use $(1)$ to see

$$|f(g(x))-f(g(a))| \le c |g(x)-g(a)|.$$

Now divide by $|x-a|$ and let $x\to a.$ On the right we get limit $0$ because $g'(a)=0.$ Therefore the limit on the left is $0,$ which is exactly the same as saying $(f\circ g)'(a) = 0.$ That is the desired conclusion in this case.
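
For readers who want to see where $(1)$ comes from, here is one way to produce $c$ and $\delta$. This is a sketch, and the particular choice $c = |f'(b)| + 1$ is only one convenient option. Write $b = g(a).$ Applying the definition of the limit $f'(b)$ with $\varepsilon = 1$ gives a $\delta > 0$ such that

$$\left|\frac{f(y)-f(b)}{y-b} - f'(b)\right| < 1 \quad \text{for } 0 < |y-b| < \delta.$$

By the triangle inequality, for such $y$

$$|f(y)-f(b)| \le \bigl(|f'(b)|+1\bigr)\,|y-b|,$$

and this also holds trivially at $y = b.$ So $(1)$ holds with $c = |f'(b)|+1.$ The last step of Case 2 then reads

$$\left|\frac{f(g(x))-f(g(a))}{x-a}\right| \le c\,\left|\frac{g(x)-g(a)}{x-a}\right| \longrightarrow c\,|g'(a)| = 0 \quad \text{as } x \to a.$$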


Here is a "correct" proof:

From the usual definition of the derivative one immediately deduces the following

Lemma. A function $f$ is differentiable at the point $a$ with $f'(a)=A$ iff there is a function $m_{f,a}=:m$, continuous at $a$ with $m(a)=A$, such that for all $x$ in the domain of $f$ one has $$f(x)-f(a)=m(x)(x-a)\ .$$
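
A sketch of why the Lemma holds, spelled out here as an aid (the original leaves it to the reader): the identity forces the value of $m$ at every $x \ne a$, so the only possible candidate is $$m(x)=\begin{cases}\dfrac{f(x)-f(a)}{x-a} & (x\ne a),\\[2mm] A & (x=a).\end{cases}$$ With this $m$ the identity $f(x)-f(a)=m(x)(x-a)$ holds for $x\ne a$ by construction and for $x=a$ because both sides are $0$. Moreover $m$ is continuous at $a$ precisely when $$\lim_{x\to a}\frac{f(x)-f(a)}{x-a}=A\ ,$$ that is, precisely when $f$ is differentiable at $a$ with $f'(a)=A$.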

Under the hypotheses of the chain rule one therefore has $$f\bigl(g(x)\bigr)-f\bigl(g(a)\bigr)=m_{f,g(a)}\bigl(g(x)\bigr)\bigl(g(x)-g(a)\bigr)=m_{f,g(a)}\bigl(g(x)\bigr)m_{g,a}(x)(x-a)\ .$$ Since $g$ is continuous at $a$ the product $x\mapsto m_{f,g(a)}\bigl(g(x)\bigr)m_{g,a}(x)$ is continuous at $a$ as well, and takes the value $f'\bigl(g(a)\bigr)g'(a)$ there. By the reverse direction of the Lemma the chain rule follows.