Analytic continuation of the Dirichlet $\eta(s)$ series to $\Re(s) \gt -1$. Why does this work?

What you did amounts to a certain (not very sophisticated, but useful) type of series acceleration. This gives an analytic continuation because for every $s \in \mathbb{C}$ the function $x \mapsto x^{-s}$ changes slowly. The transformation is slightly obscured, however, because you grouped pairs of consecutive terms in the Dirichlet series; it becomes more discernible when we write the series as $$\eta(s) = \sum_{n = 1}^{\infty} \frac{(-1)^{n-1}}{n^s}\,.$$

The transformation is "take the arithmetic mean of two successive partial sums". Now it is clear that if $(x_n)$ is a convergent sequence, then $\bigl(\frac{1}{2}(x_n + x_{n+1})\bigr)$ also converges, and to the same limit. And of course the sequence of means can converge even when the original sequence doesn't.
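To see this averaging at work on a toy example, here is a small numerical sketch (Python; Grandi's series $1 - 1 + 1 - 1 + \cdots$, whose partial sums oscillate between $1$ and $0$ while their pairwise means are constantly $\frac{1}{2}$):

```python
# Partial sums of Grandi's series 1 - 1 + 1 - 1 + ... oscillate between 1 and 0;
# the means of two successive partial sums are all exactly 1/2.
partial = []
total = 0.0
for n in range(1, 11):
    total += (-1) ** (n - 1)   # terms +1, -1, +1, ...
    partial.append(total)

means = [(a + b) / 2 for a, b in zip(partial, partial[1:])]
print(partial)  # [1.0, 0.0, 1.0, 0.0, ...]
print(means)    # [0.5, 0.5, ...]
```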

If we modify this idea to not always take the arithmetic mean of two successive terms but take the arithmetic mean of all terms so far encountered, we get Cesàro convergence (or Cesàro summability if $(x_n)$ is the sequence of partial sums of a series).
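For comparison, a sketch of the Cesàro means of the same oscillating partial sums; the running averages also tend to $\frac{1}{2}$, just more slowly:

```python
# Cesàro means: average of the first n partial sums of Grandi's series.
# The partial sums are 1, 0, 1, 0, ..., and the running averages tend to 1/2.
partial = [1, 0] * 50
cesaro = [sum(partial[:n]) / n for n in range(1, len(partial) + 1)]
print(cesaro[:4])   # [1.0, 0.5, 0.666..., 0.5]
print(cesaro[-1])   # 0.5
```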

But when we have a series of the form $$\sum_{n = 1}^{\infty} (-1)^{n-1}f(n) \tag{$\ast$}$$ where $f \colon [1,\infty) \to \mathbb{C}$ is a slowly changing function (in discussing the idea we may keep the notion fuzzy), it is better to stick with just the mean of two successive partial sums. Such a mean has the form $$\sum_{n = 1}^{k-1} (-1)^{n-1}f(n) + \frac{1}{2}(-1)^{k-1}f(k)$$ and the difference between one such mean and the next is $\frac{(-1)^{k-1}}{2}\bigl(f(k) - f(k+1)\bigr)$.

That suggests we define $\Delta f \colon x \mapsto f(x) - f(x+1)$ and write the above mean as $$\frac{f(1)}{2} + \frac{1}{2}\sum_{n = 1}^{k-1} (-1)^{n-1}\Delta f(n)\,.$$ From this it is immediate that convergence of $(\ast)$ implies the convergence of $\sum (-1)^{n-1}\Delta f(n)$, and in that case $$\sum_{n = 1}^{\infty} (-1)^{n-1}f(n) = \frac{f(1)}{2} + \frac{1}{2}\sum_{n = 1}^{\infty} (-1)^{n-1} \Delta f(n)$$ holds. Now if $f$ changes slowly, then $\lvert\Delta f(n)\rvert$ is much smaller than $\lvert f(n)\rvert$, and one may expect that the second series converges faster than the first (typically that is the case). And $\Delta f(n)$ can converge to $0$, so that the second series may converge, even if $f(n)$ doesn't tend to $0$. This happens when the partial sums of $(\ast)$ oscillate in a suitable manner around a value without converging, as is the case for the $\eta$ series when $-1 < \Re s \leqslant 0$. And if $\Delta f$ is slowly changing too, then iterating the transformation likely further improves things. On the other hand, if $f(n)$ converges to $0$ quickly, then $\Delta f(n)$ is almost the same as $f(n)$ for large $n$, so this transformation doesn't help in that case.
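A numerical sketch of the speed-up (Python, with $f(n) = 1/n$, so both series target $\eta(1) = \ln 2$; the truncation length `K` is an arbitrary choice):

```python
import math

# Truncate the alternating series sum (-1)^(n-1) f(n) and its transformed version
# f(1)/2 + (1/2) sum (-1)^(n-1) (f(n) - f(n+1)) at K terms; both target ln 2.
K = 100

def f(n):
    return 1.0 / n

direct = sum((-1) ** (n - 1) * f(n) for n in range(1, K + 1))
transformed = f(1) / 2 + 0.5 * sum(
    (-1) ** (n - 1) * (f(n) - f(n + 1)) for n in range(1, K + 1)
)
print(abs(direct - math.log(2)))       # roughly 5e-3
print(abs(transformed - math.log(2)))  # roughly 2e-5
```

One application of the transformation already buys about two orders of magnitude here, consistent with $\Delta f(n) = \frac{1}{n(n+1)}$ being much smaller than $f(n) = \frac{1}{n}$.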

Now let's come to the concrete example of the $\eta$-series, where $f_s(x) = x^{-s}$ (keeping in mind that we're interested in various values of $s$ simultaneously, so it's useful to have the parameter in the notation). This is a smooth function, and we thus can write $$\Delta f_s(x) = f_s(x) - f_s(x+1) = \int_0^1 -f_s'(x+t)\,dt = s\int_0^1 \frac{dt}{(x+t)^{s+1}}\,.$$ Hence $\Delta f_s(x) \approx \frac{s}{x}f_s(x)$. Indeed, $\lvert\Delta f_s(x)\rvert$ is much smaller than $\lvert f_s(x)\rvert$ for large $x$. While $f_s(x) \to 0$ only for $\Re s > 0$, we have $\Delta f_s(x) \to 0$ for $\Re s > -1$, and since $\Delta f_s(x)$ is then of bounded variation, the series $$\sum_{n = 1}^{\infty} (-1)^{n-1}\Delta f_s(n)$$ converges (for $\Re s > -1$) by Dedekind's criterion. It does so locally uniformly, hence $$\frac{1}{2} + \frac{1}{2}\sum_{n = 1}^{\infty} (-1)^{n-1}\Delta f_s(n) = \frac{1}{2} + \frac{1}{2}\sum_{n = 1}^{\infty} (-1)^{n-1}\biggl(\frac{1}{n^s} - \frac{1}{(n+1)^s}\biggr)$$ represents $\eta(s)$ on the half-plane $\Re s > -1$. If in this we group successive terms pairwise we obtain $$\eta(s) = \frac{1}{2} + \frac{1}{2}\sum_{n = 1}^{\infty} \bigl(\Delta f_s(2n-1) - \Delta f_s(2n)\bigr) = \frac{1}{2}\biggl(1 + \sum_{n = 1}^{\infty}\biggl(\frac{1}{(2n-1)^s} - \frac{2}{(2n)^s} + \frac{1}{(2n+1)^s}\biggr)\biggr)$$ for $\Re s > -1$, which is your average.
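This can be checked numerically; here is a sketch (Python, real $s$ only; `N` is an arbitrary truncation length) that evaluates the paired form and reproduces $\eta(2) = \pi^2/12$, $\eta(0) = \frac{1}{2}$, and a finite value at $s = -\frac{1}{2}$, where the original series diverges:

```python
import math

# Paired form: eta(s) = (1/2) * (1 + sum_n ((2n-1)^-s - 2*(2n)^-s + (2n+1)^-s)),
# absolutely convergent for Re s > -1.  Truncated at N terms.
def eta_paired(s, N=20000):
    acc = 1.0
    for n in range(1, N + 1):
        acc += (2 * n - 1) ** (-s) - 2 * (2 * n) ** (-s) + (2 * n + 1) ** (-s)
    return acc / 2

print(eta_paired(2))     # ~ pi^2 / 12 = 0.822467...
print(eta_paired(0))     # exactly 0.5: every grouped term vanishes
print(eta_paired(-0.5))  # finite, although the original series diverges here
```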

While $\sum (-1)^{n-1}\Delta f_s(n)$ converges absolutely for $\Re s > 0$ and conditionally for $-1 < \Re s \leqslant 0$ [excepting $s = 0$, when it converges absolutely since all terms vanish], your average converges absolutely for $\Re s > -1$ ($\Delta f_s$ also changes slowly, so the grouping causes cancellation).

However, in the form $\sum (-1)^{n-1} \Delta f_s(n)$ we still sum a slowly changing function with alternating signs, so the idea of doing the same thing once more forces itself upon us. This is less obvious in the paired form. Now \begin{align} \Delta^2 f_s(x) &= \Delta(\Delta f_s)(x) \\ &= \Delta f_s(x) - \Delta f_s(x+1) \\ &= s\int_0^1 \frac{1}{(x+t)^{s+1}} - \frac{1}{(x+1+t)^{s+1}}\,dt \\ &= s\int_0^1 (s+1)\int_0^1 \frac{du}{(x+t+u)^{s+2}}\,dt \end{align} and, generally, induction gives $$\Delta^m f_s(x) = \frac{\Gamma(s+m)}{\Gamma(s)} \idotsint_{[0,1]^m} \frac{dt_1\ldots dt_m}{(x + t_1 + \ldots + t_m)^{s+m}}\,,$$ which shows that $$\sum_{n = 1}^{\infty} (-1)^{n-1} \Delta^m f_s(n)$$ converges absolutely for $\Re s > 1-m$ and conditionally for $-m < \Re s \leqslant 1-m$. The convergence is locally uniform, hence we obtain the representation $$\eta(s) = \sum_{\mu = 0}^{m-1} \frac{\Delta^{\mu}f_s(1)}{2^{\mu+1}} + \frac{1}{2^m}\sum_{n = 1}^{\infty} (-1)^{n-1}\Delta^m f_s(n)$$ in the half-plane $\Re s > -m$.
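A sketch of the iterated transformation (Python, real $s$; the helper `delta` evaluates $\Delta^m f_s$ via the binomial expansion $\Delta^m f_s(x) = \sum_{k=0}^{m} (-1)^k \binom{m}{k} (x+k)^{-s}$, and `N` is an arbitrary truncation length):

```python
import math

# eta(s) = sum_{mu < m} Delta^mu f_s(1) / 2^(mu+1)
#        + (1/2^m) * sum_n (-1)^(n-1) Delta^m f_s(n),  valid for Re s > -m.
def delta(s, m, x):
    """m-fold difference Delta^m f_s(x) with f_s(x) = x^(-s)."""
    return sum((-1) ** k * math.comb(m, k) * (x + k) ** (-s) for k in range(m + 1))

def eta_iterated(s, m, N=5000):
    head = sum(delta(s, mu, 1) / 2 ** (mu + 1) for mu in range(m))
    tail = sum((-1) ** (n - 1) * delta(s, m, n) for n in range(1, N + 1))
    return head + tail / 2 ** m

print(eta_iterated(-2.0, 4))  # 0.0: eta(-2) = 0, a zero inherited from zeta
print(eta_iterated(1.0, 3))   # ~ ln 2 = 0.693147...
```

With $m = 4$ the representation covers $\Re s > -4$, so it reaches $s = -2$; there the fourth difference of the polynomial $x^2$ vanishes identically and only the finite head survives.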

And if we keep going and don't stop, we arrive at $$\eta(s) = \sum_{\mu = 0}^{\infty} \frac{\Delta^{\mu} f_s(1)}{2^{\mu+1}}$$ which is the Euler transform of the $\eta$-series and converges (absolutely and locally uniformly) on the entire plane.
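In the same notation, the full Euler transform can be sketched numerically (Python, real $s$; `M` is an arbitrary truncation order):

```python
import math

# Euler transform: eta(s) = sum_{mu >= 0} Delta^mu f_s(1) / 2^(mu+1),
# with Delta^mu f_s(1) = sum_k (-1)^k C(mu, k) (1+k)^(-s); truncated at order M.
def eta_euler(s, M=60):
    total = 0.0
    for mu in range(M + 1):
        d = sum((-1) ** k * math.comb(mu, k) * (1 + k) ** (-s) for k in range(mu + 1))
        total += d / 2 ** (mu + 1)
    return total

print(eta_euler(-1.0))  # exactly 0.25: Delta^mu f_{-1}(1) = 0 for mu >= 2
print(eta_euler(1.0))   # ~ ln 2
```

At $s = -1$ the transform even terminates: $\Delta^2$ annihilates the linear function $f_{-1}(x) = x$, leaving $\eta(-1) = \frac{1}{2} - \frac{1}{4} = \frac{1}{4}$.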

With regard to your addition, note that the average of two divergent series $\sum a_n$ and $\sum b_n$ converges if and only if, up to the terms of a convergent series, $b_n$ is the negative of $a_n$ (yes, this is tautological, but still): you need cancellation to produce convergence where originally there was none. But as a first approximation the terms of the second series are thrice the corresponding terms of the first, so there is reinforcement instead of cancellation. You would get cancellation (and convergence for $\Re s > -1$) if you subtracted the second from thrice the first (to obtain $\eta(s)$, a division by $3+(-1) = 2$ is also needed).