Is there a formal proof for the superposition theorem?

There are a few key bits of physics and math to understand here, but one does need to be very careful to avoid producing a circular argument. The key concept, I think, is that of a

  • linear circuit element, for which the output is precisely proportional to the input.

The precise meaning of 'input' and 'output' depends on the device, but this does not matter much. For a linear capacitor, if you double the charge on the plates, you double the potential difference across them. For a linear resistor, the potential drop between the terminals is proportional to the current through it. For a linear inductor, it is proportional to the rate of change of the current.
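In symbols, with capacitance $C$, resistance $R$ and inductance $L$ all constants that do not depend on the charge, voltage or current, the three relations read:

$$Q = C\,V_C, \qquad V_R = R\,I, \qquad V_L = L\,\frac{\mathrm{d}I}{\mathrm{d}t}.$$

Each is linear in the usual sense: scaling the input by a constant $k$ scales the output by $k$, and the response to a sum of two inputs is the sum of the responses. For the inductor this holds because differentiation is itself linear, $\frac{\mathrm{d}}{\mathrm{d}t}(I_1 + I_2) = \frac{\mathrm{d}I_1}{\mathrm{d}t} + \frac{\mathrm{d}I_2}{\mathrm{d}t}$.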

It is important to note that not all circuit elements are linear. A diode, for example, responds very differently if you reverse its polarity. A light bulb's resistance increases as the current through it increases, because the filament heats up. Iron-core inductors show hysteresis, so their inductance depends on whether their magnetization is increasing or decreasing. In general, most circuit elements will show some nonlinearity if you drive them hard enough (even if "hard enough" means "so hard that you fry it", which is also nonlinear behaviour).
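One quick way to see the difference numerically is to test the two properties that linearity requires, additivity $f(x_1 + x_2) = f(x_1) + f(x_2)$ and homogeneity $f(kx) = k\,f(x)$, on a model of each element. This is only an illustrative sketch: it assumes an ideal resistor obeying Ohm's law and a diode obeying the ideal Shockley equation, and the component values and helper names (`resistor_current`, `diode_current`, `is_additive`, `is_homogeneous`) are hypothetical choices, not measured data.

```python
import math

# Hypothetical component models, chosen only for illustration.
R = 1_000.0      # resistor, ohms
I_S = 1e-12      # diode saturation current, amperes (assumed value)
N_VT = 0.02585   # ideality factor (taken as 1) times thermal voltage near 300 K, volts


def resistor_current(v):
    """Ohm's law: current is proportional to voltage."""
    return v / R


def diode_current(v):
    """Ideal Shockley diode equation: current grows exponentially with voltage."""
    return I_S * (math.exp(v / N_VT) - 1.0)


def is_additive(f, v1, v2, rel_tol=1e-9):
    """Check f(v1 + v2) == f(v1) + f(v2), the property superposition relies on."""
    return math.isclose(f(v1 + v2), f(v1) + f(v2), rel_tol=rel_tol)


def is_homogeneous(f, v, k, rel_tol=1e-9):
    """Check f(k * v) == k * f(v)."""
    return math.isclose(f(k * v), k * f(v), rel_tol=rel_tol)


print(is_additive(resistor_current, 0.3, 0.4))       # True
print(is_additive(diode_current, 0.3, 0.4))          # False
print(is_homogeneous(resistor_current, 0.3, -1.0))   # True
print(is_homogeneous(diode_current, 0.3, -1.0))      # False
```

The resistor passes both checks; the diode fails both, and the $k = -1$ case is exactly the polarity asymmetry described above.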

The restriction to linear circuits, then, is part definition and part physics. You are explicitly ruling out, for your circuit, those elements which behave nonlinearly. If I hand you a circuit board with a complicated circuit printed on it, and you want to decide whether it is linear or not, you need to take it apart and measure the response curves of all its components. Are they all linear? Great! Your circuit is linear.

Thus, when you start off your proof with "let $C$ be a linear circuit ...", you are assuming that this step (which is where most of the physics is) has already been completed. This is therefore a safe assumption to use in a proof, and it comes at the price of restricting the validity of the result to only those circuits that have been empirically checked to be linear.

This is essentially all you need. You know the (linear!) equations that connect the (charge / current / rate of change of current) in each element with the potential difference across it, and you can use Kirchhoff's laws (which embody charge conservation at each node and energy conservation around each loop, and therefore hold for any lumped-element circuit) to link them up. This naturally results in a linear system of equations relating your sources to your output. A linear system has the mathematical property that its solution is the sum of the solutions you would get if each source were turned on by itself in turn, with all the others turned off, which is exactly the superposition theorem as stated on Wikipedia. If your system has the physical property corresponding to the mathematical linearity assumption, it will also have the physical property that corresponds to the mathematical result.
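To make the last step concrete: for a purely resistive DC circuit, Kirchhoff's current law at each node combined with Ohm's law gives a matrix equation $G\mathbf{v} = \mathbf{s}$, where $G$ is a fixed conductance matrix, $\mathbf{v}$ the vector of node voltages and $\mathbf{s}$ the source terms. If $G\mathbf{v}_1 = \mathbf{s}_1$ and $G\mathbf{v}_2 = \mathbf{s}_2$, then $G(\mathbf{v}_1 + \mathbf{v}_2) = \mathbf{s}_1 + \mathbf{s}_2$, which is superposition. The sketch below checks this for a hypothetical two-node circuit; the topology, component values and the `node_voltages` helper are invented for illustration, and exact fractions are used so the comparison is not muddied by rounding.

```python
from fractions import Fraction as F

# Hypothetical example circuit (topology and values chosen for illustration):
#   voltage source V1 in series with R1 feeding node A,
#   R2 from A to ground, R3 between A and B, R4 from B to ground,
#   and a current source I2 injecting current into node B.
R1, R2, R3, R4 = F(1000), F(2000), F(1000), F(3000)  # ohms


def node_voltages(v1, i2):
    """Solve KCL at nodes A and B, G @ [vA, vB] = s, by Cramer's rule."""
    # Conductance matrix G (symmetric for a network of resistors).
    g11 = 1 / R1 + 1 / R2 + 1 / R3
    g12 = -1 / R3
    g22 = 1 / R3 + 1 / R4
    # Source vector s: the V1-R1 branch enters via its Norton equivalent, V1/R1.
    s1, s2 = v1 / R1, i2
    det = g11 * g22 - g12 * g12
    va = (s1 * g22 - g12 * s2) / det
    vb = (g11 * s2 - g12 * s1) / det
    return va, vb


# Both sources on together: 10 V and 1 mA.
both = node_voltages(F(10), F(1, 1000))
# Each source alone: a "turned off" voltage source is a short (0 V),
# a "turned off" current source is an open circuit (0 A).
only_v1 = node_voltages(F(10), F(0))
only_i2 = node_voltages(F(0), F(1, 1000))

total = tuple(a + b for a, b in zip(only_v1, only_i2))
print(both == total)  # True
```

Circuits containing capacitors and inductors lead to linear differential equations rather than a single matrix equation, but the same argument goes through, provided the initial conditions are either zero or treated as additional sources.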