What's wrong with this argument that kinetic energy goes as $v$ rather than $v^2$?

In general, neither energies nor energy differences are invariant between frames. But conservation of energy is true in all frames, and we can use that to figure out where the problem is.

To recap, the person in the moving frame spends energy $E_v$ from their muscles to raise the kinetic energy of the object by $E_v$, which is just fine. The person in the original frame agrees that the moving person spends $E_v$ of chemical energy from their muscles (everybody agrees on how hard somebody is breaking a sweat), but sees the kinetic energy of the object rise by $3 E_v$.

The extra $2 E_v$ of energy comes from the fact that the moving person started with a reservoir of kinetic energy: that of their own body, which is moving at speed $v$. This energy is reduced because the person slows down due to Newton's third law; it is "harvested" to be put into the object.

There's no way to avoid putting in this extra energy. If you try to reduce the change in speed by putting them in a large car, the energy comes from the kinetic energy of the car; the argument is just the same. If the car's speed is fixed too, the energy comes from the chemical energy of the gasoline. The very same explanation holds for a rocket, where this is called the Oberth effect. In all cases, there's no contradiction in taking energy to be quadratic in speed.

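In case you're not convinced, here's an explicit calculation. We'll take the person's mass $M$ to be very large (formally, $M \to \infty$) for convenience, so that a first-order change is all we need. The loss of kinetic energy of the person is $$\Delta K = \frac{dK}{dp} \Delta p = \frac{p}{M} m v = m v^2 $$ where I used $K = p^2/2M$ for the person's kinetic energy, $p = Mv$ for their momentum, and $\Delta p = mv$ for the momentum they hand over to the object. But this is just $2 E_v$ as stated above.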
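If you'd rather see the bookkeeping with a finite mass, here is a minimal numerical sketch. It assumes a one-dimensional push in which the person (mass `M`) and the object (mass `m`) both start at speed `v` in the ground frame and the object ends at `2v`; the function name and the sample numbers are mine, purely for illustration.

```python
def energy_bookkeeping(m, M, v):
    """Ground-frame energy changes, in units of E_v = m v^2 / 2.

    Assumes a 1D push: person and object both start at speed v,
    and the object leaves at speed 2v (all in the ground frame).
    """
    E_v = 0.5 * m * v**2
    # Momentum conservation fixes the person's recoil.
    v_person_after = v - m * v / M
    dK_object = 0.5 * m * (2 * v)**2 - 0.5 * m * v**2
    dK_person = 0.5 * M * v_person_after**2 - 0.5 * M * v**2
    # The muscles must supply whatever balances the books.
    muscle = dK_object + dK_person
    return dK_object / E_v, dK_person / E_v, muscle / E_v


# The heavier the person, the closer the ratios get to (3, -2, 1).
for M in (10.0, 1e3, 1e6):
    print(M, energy_bookkeeping(m=1.0, M=M, v=3.0))
```

Working it out by hand, the three ratios are $3$, $-2 + m/M$ and $1 + m/M$, so the object gains $3E_v$, the person's own kinetic energy drops by about $2E_v$, and the muscles supply about $E_v$, in agreement with the large-$M$ calculation above.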

This question was actually settled experimentally in the $18^{\mathrm{th}}$ century in arguments over what quantity deserved the title "vis viva" (see also this discussion on StackExchange). Long story short, the argument was settled by experiments conducted by Willem 's Gravesande and elucidated by Émilie du Châtelet. Those experiments showed, basically, that the amount you could deform clay by dropping a heavy ball into it depends on what we now call kinetic energy, $\frac{1}{2}mv^2$, not momentum.

The tricky thing, of course, is that the kinetic energy content of an object is observer dependent, and failing to take that into account is where you're going wrong with statements like "So the total energy used is $2E_v$": you're adding kinetic energies observed by different people who are in different reference frames, and you can't do that. The Mythbusters actually ran headlong into this trap in a slightly different way in an episode about head-on collisions with semi-trucks, when one of them said that head-on collisions were 4 times more dangerous than hitting a brick wall because there was 4 times the energy involved. They were thinking in terms of one of the drivers, who sees double the approach speed and therefore 4 times the energy. When they went back and tested that statement using colliding pendulums with clay on them, they found that there was only twice the energy in the collision, and that the deformation in the equal head-on collision is the same as in the "running into a wall" one.

The reason? It's because to harvest the $4\times$ energy that the truck observers think there is, the collision would have to end with both trucks moving to the right (or left) with the initial speed of the right-bound (left-bound) truck. Because they both end the process in the Earth frame, the Earth frame is the one that assessed the kinetic energy available correctly. This is why in special relativity we focus so much on the "invariant mass" in collisions - that is the part of the energy that is actually available to do stuff during the collision, and all observers agree on it.
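To make that concrete, here is a small sketch under simplifying assumptions that are mine, not the show's: one dimension, Galilean (low-speed) kinematics, two equal masses, and a perfectly inelastic collision. The function name and the numbers are purely illustrative.

```python
def dissipated_energy(m1, v1, m2, v2, frame_velocity=0.0):
    """Kinetic energy before, and kinetic energy lost, in a 1D perfectly
    inelastic collision, as seen by an observer moving at frame_velocity
    relative to the Earth (Galilean transformation)."""
    u1, u2 = v1 - frame_velocity, v2 - frame_velocity
    ke_before = 0.5 * m1 * u1**2 + 0.5 * m2 * u2**2
    # The two bodies stick together; momentum conservation gives the final speed.
    u_final = (m1 * u1 + m2 * u2) / (m1 + m2)
    ke_after = 0.5 * (m1 + m2) * u_final**2
    return ke_before, ke_before - ke_after


m, v = 1.0, 10.0
print("Earth frame:      ", dissipated_energy(m, v, m, -v))
print("One truck's frame:", dissipated_energy(m, v, m, -v, frame_velocity=v))
```

In the truck's frame the initial kinetic energy is $2mv^2$, four times the $\frac{1}{2}mv^2$ of a single truck, but half of it is still there afterwards as kinetic energy of the wreck. The energy actually lost to deformation comes out as $mv^2$ in both frames, i.e. $\frac{1}{2}mv^2$ per truck, the same as one truck stopping dead against a rigid wall.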

Specifically, if you have objects labeled $1$ and $2$ on a collision course, then the energy available to deform/heat the objects (or create new particles in a particle collider) is \begin{align} E_{\mathrm{COM}} &= \sqrt{(E_1+E_2)^2 - \left(\vec{p}_1+\vec{p}_2\right)^2 c^2} \\ & = \sqrt{\left(\sqrt{\vec{p}_1^2c^2 + (m_1c^2)^2 }+\sqrt{\vec{p}_2^2c^2 + (m_2c^2)^2 }\right)^2 - \left(\vec{p}_1+\vec{p}_2\right)^2 c^2}, \end{align} where $E_{\mathrm{COM}}$ is the energy in the center of mass (or momentum) frame.

Of course, that expression includes the mass energy. To make it useful for situations where that doesn't change you have to subtract off $m_1c^2$ and $m_2c^2$ to get the "kinetic" energy. That produces \begin{align} K_{\mathrm{COM}} & = \sqrt{(m_1 c^2)^2 + (m_2 c^2)^2 + 2 E_1 E_2 - 2 \vec{p}_1\cdot\vec{p}_2 c^2 } - m_1 c^2 - m_2 c^2. \end{align} Getting a low speed approximation of $K_{\mathrm{COM}}$ is a long process that has to be handled carefully because the square root function is not analytic near $0$. As the name suggests, though, it is the kinetic energy observed by someone who is in the center of mass frame the entire time, so we can do the derivation in the low speed limit from the start using $$\vec{v}_{\mathrm{COM}} \equiv \frac{m_1\vec{v}_1 + m_2 \vec{v}_2}{m_1 + m_2}$$ producing (using Galilean transformations) $$K_{\mathrm{COM}} = \frac{m_1}{2} \left(\vec{v}_1 - \frac{m_1\vec{v}_1 + m_2 \vec{v}_2}{m_1 + m_2}\right)^2 + \frac{m_2}{2} \left(\vec{v}_2 - \frac{m_1\vec{v}_1 + m_2 \vec{v}_2}{m_1 + m_2}\right)^2. $$ While the formula is considerably more complicated, every observer agrees that when $m_1$ and $m_2$ collide, this is the amount of energy available to "do stuff" because nothing $m_1$ and $m_2$ can do, in isolation, can affect how their center of mass is moving relative to everyone else (without interacting with the rest of the universe or throwing off some mass/energy, $m_3$).
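As a sanity check on the low-speed formula, here is a short sketch that evaluates $K_{\mathrm{COM}}$ from velocities measured by two different observers related by a Galilean boost. It also compares the result with the reduced-mass form $\frac{1}{2}\mu\left|\vec{v}_1-\vec{v}_2\right|^2$, where $\mu = m_1 m_2/(m_1+m_2)$; that form is not written out above, but it is what the expression simplifies to. The helper names and the sample numbers are made up for illustration.

```python
def k_com(m1, v1, m2, v2):
    """Low-speed kinetic energy in the center-of-mass frame.

    v1 and v2 are velocity components (lists of floats) as measured
    by any one inertial observer."""
    v_com = [(m1 * a + m2 * b) / (m1 + m2) for a, b in zip(v1, v2)]
    k1 = 0.5 * m1 * sum((a - c) ** 2 for a, c in zip(v1, v_com))
    k2 = 0.5 * m2 * sum((b - c) ** 2 for b, c in zip(v2, v_com))
    return k1 + k2


def boost(v, u):
    """Galilean transformation to an observer moving with velocity u."""
    return [a - b for a, b in zip(v, u)]


m1, m2 = 2.0, 3.0
v1, v2 = [4.0, 0.0, 1.0], [-1.0, 2.0, 0.0]
u = [7.0, -3.0, 5.0]  # arbitrary second observer

mu = m1 * m2 / (m1 + m2)
reduced_mass_form = 0.5 * mu * sum((a - b) ** 2 for a, b in zip(v1, v2))

print(k_com(m1, v1, m2, v2))                      # first observer
print(k_com(m1, boost(v1, u), m2, boost(v2, u)))  # boosted observer
print(reduced_mass_form)                          # (1/2) mu |v1 - v2|^2
```

All three numbers agree up to rounding, because the boost velocity drops out of $\vec{v}_i - \vec{v}_{\mathrm{COM}}$ and only the relative velocity $\vec{v}_1 - \vec{v}_2$ survives.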

Start with a decent definition of kinetic energy. I'd go for "The KE of a body is the amount of work it can do in coming to rest". Then consider something like a laden sledge of mass $m$ moving at speed $u$ on level ground. Imagine it being brought to rest by someone pulling on a rope attached to it. If you assume the sledge's deceleration to be uniform (to make life easy) then, using $W=Fs$, $F=ma$ and $v^2=u^2+2as$, you should be able to show that the amount of work the sledge does is $\frac{1}{2}m u^2$.
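In case it helps, here is one way the steps can go, taking the sledge's direction of motion as positive and writing $s$ for the stopping distance (the sign conventions are my choice). Setting the final speed to zero in $v^2 = u^2 + 2as$ gives the deceleration, and $F = ma$ then gives the force the rope exerts on the sledge: $$0 = u^2 + 2as \quad\Rightarrow\quad a = -\frac{u^2}{2s}, \qquad F = ma = -\frac{m u^2}{2s}.$$ By Newton's third law the sledge pulls on the rope with force $-F$, in the direction of motion, so the work the sledge does while coming to rest is $$W = (-F)\,s = \frac{m u^2}{2s}\, s = \frac{1}{2} m u^2 .$$ The stopping distance cancels, so the answer depends only on $m$ and $u$, and it is quadratic in the speed.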