Alternative proof of Taylor's formula using only the linear approximation property

A few days ago I wrote an answer with some detail about Taylor polynomials for maps between Banach spaces; you can see my answer here. What I proved there is the following (I'm sorry about the differences in notation):

Taylor Expansion Theorem:

Let $V$ and $W$ be Banach spaces over the field $\Bbb{R}$, let $U$ be an open subset of $V$, and fix a point $a \in U$. Let $f:U \to W$ be a given function which is $n$ times differentiable at $a$ (in the Fréchet sense). Define the Taylor polynomial $T_{n,f}:V \to W$ by \begin{equation} T_{n,f}(h) = f(a) + \dfrac{df_a(h)}{1!} + \dfrac{d^2f_a(h)^2}{2!} + \dots + \dfrac{d^nf_a(h)^n}{n!}, \end{equation} where $d^kf_a(h)^k$ abbreviates the $k$-th differential $d^kf_a$ evaluated on the $k$-tuple $(h, \dots, h)$. Then $f(a+h) - T_{n,f}(h) = o(\lVert h \rVert^n)$.

Explicitly, the claim is that for every $\varepsilon > 0$, there is a $\delta > 0$ such that for all $h \in V$, if $\lVert h \rVert < \delta$, then \begin{equation} \lVert f(a+h) - T_{n,f}(h) \rVert \leq \varepsilon \lVert h \rVert^{n}. \end{equation}
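Just to illustrate what this conclusion says (this is not part of the proof; the choices $f = \exp$, $a = 0$, $n = 2$ are mine), here is a quick numerical check in the scalar case $V = W = \Bbb{R}$: the remainder divided by $|h|^n$ should go to $0$ as $h \to 0$.

```python
import math

# Illustrates f(a+h) - T_{n,f}(h) = o(|h|^n) for f = exp, a = 0, n = 2,
# where T_{2,f}(h) = 1 + h + h^2/2!.
def remainder_ratio(h, n=2):
    T = sum(h**k / math.factorial(k) for k in range(n + 1))
    return abs(math.exp(h) - T) / abs(h)**n

for h in (1e-1, 1e-2, 1e-3, 1e-4):
    print(f"h = {h:.0e}:  |f(0+h) - T(h)| / h^2 = {remainder_ratio(h):.3e}")
```

The printed ratios shrink roughly like $h/6$, consistent with the first omitted term $h^3/3!$.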

The proof is pretty short if you know what you're doing. The idea is to use induction and, most importantly, the mean-value inequality for maps between Banach spaces. I don't think it is possible to derive $(3)$ directly from $(1)$ and $(2)$ alone (the numbered equations in your question), because $(2)$ talks about how the derivative $Df$ changes, while $(1)$ and, ultimately, $(3)$ talk about how the function $f$ itself changes. So you somehow have to relate changes in $Df$ to changes in $f$; this is, roughly speaking, what the mean-value inequality does.
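For reference, the mean-value inequality in the form used here (this is the standard Banach-space version; both books cited below give precise statements and proofs) says: if $f:U \to W$ is differentiable at every point of $U$, the segment $[a, a+h]$ is contained in $U$, and $\lVert df_x \rVert \leq M$ for every $x$ on that segment, then \begin{equation} \lVert f(a+h) - f(a) \rVert \leq M \lVert h \rVert. \end{equation}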

The proof I showed in my other answer is pretty much the one from Henri Cartan's excellent book *Differential Calculus*, which also contains a proof of the mean-value inequality that doesn't rely on integrals. Alternatively, you can take a look at Loomis and Sternberg's book *Advanced Calculus*. There, the mean-value inequality is proven in a rather elementary way, again without integrals, and the proof is relatively short: it is Theorem 7.4 of Chapter 3 (which uses Theorem 7.3), on pages 148–149 of the book. (I prefer this proof to Cartan's.)

For your other question, I assume you mean the following:

Does the existence of a polynomial $P$ which agrees with $f$ up to order $2$ at $x$ imply that $f$ is twice differentiable at $x$? More precisely, does the existence of a continuous linear map $A_1:E \to F$ and a symmetric continuous bilinear map $A_2:E \times E \to F$ such that for all $h \in E$, \begin{equation} f(x+h) = f(x) + A_1(h) + A_2(h,h) + o(\lVert h \rVert^2), \end{equation} imply that $f$ is twice Fréchet differentiable at $x$?

The answer to this question is no, and we can see this even in the single-variable case (the following example is from Spivak's *Calculus*, 3rd edition, page 413). Take $E = F = \Bbb{R}$, fix an integer $n \geq 2$, and define $f: \Bbb{R} \to \Bbb{R}$ by \begin{equation} f(x) = \begin{cases} x^{n+1} & \text{if $x$ is irrational} \\ 0 & \text{if $x$ is rational.} \end{cases} \end{equation} Now choose $x = 0$ and the zero polynomial $P \equiv 0$. Then \begin{equation} f(0 + h) = 0 + o(|h|^n), \end{equation} because $|f(h)| \leq |h|^{n+1}$ in both cases, so $|f(h)|/|h|^n \leq |h| \to 0$. However, if $a \neq 0$, then $f$ is not even continuous at $a$ (it vanishes along rationals approaching $a$ but tends to $a^{n+1} \neq 0$ along irrationals), so $f'(a)$ doesn't exist, and hence $f''(0)$ is not even defined. What this shows is that the existence of a well-approximating polynomial does not guarantee that the function is sufficiently differentiable.
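If you want to play with this example, note that a floating-point implementation is useless: every float is rational, so $f$ would be identically zero on machine numbers. A sketch with exact arithmetic via sympy (purely illustrative; the names and the choice $n = 2$ are mine) does work:

```python
import sympy as sp

n = 2  # any integer n >= 2 works

def f(x):
    # Spivak's function: x^(n+1) at irrational x, 0 at rational x.
    return sp.Integer(0) if x.is_rational else x**(n + 1)

# f(h) = o(|h|^n): the ratio |f(h)|/|h|^n tends to 0 along a rational
# and an irrational sequence approaching 0.
for k in (10, 100, 1000):
    for h in (sp.Rational(1, k), sp.sqrt(2) / k):
        print(f"h = {h}:  |f(h)|/|h|^n = {sp.N(abs(f(h)) / abs(h)**n)}")

# Yet f is discontinuous at every a != 0, e.g. a = 1: f(1 + 1/k) = 0,
# while f(1 + sqrt(2)/k) -> 1 != 0, so f'(1) does not exist.
for k in (10, 100, 1000):
    print(sp.N(f(1 + sp.Rational(1, k))), sp.N(f(1 + sp.sqrt(2) / k)))
```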