Motivation for vector spaces

(Disclaimer: This is not at all a historical account - it is a path of intuition that leads to vector spaces, not the only path)

Vector spaces! (a.k.a. stuff you can add and scale, but not multiply)

Vector spaces intentionally capture the notion of quantities that can be sensibly added together or scaled, but not necessarily anything else. This situation arises naturally in various phenomena - I'll discuss two simple examples: durations and displacements. Then we can discuss why a weaker set of axioms, extracted only from the goal of having addition and scaling, is a useful thing - especially when it is a suitable model of natural objects we care about. Finally, we get to $\mathbb R^n$ (and its rules for addition and scalar multiplication), which arises through the use of coordinates on (finite-dimensional) vector spaces.


Example 1: The space of durations

For durations, it's clear that we ought to be able to take two lengths of time and sum them together - certainly expressions such as $$1\text{ minute} + 17\text{ seconds}$$ $$1\text{ hour} + 34\text{ minutes}$$ make sense, where we can just imagine placing such intervals of time end to end. It also makes sense to scale a duration - I can say that an hour is sixty times as long as a minute, and generally multiplications such as $$2\cdot( 51\text{ seconds})$$ $$\frac{1}{7}\cdot (1 \text{ hour})$$ clearly refer to meaningful durations - I can take a duration and a positive scaling factor and get a new duration. What doesn't make sense*, however, is to try to multiply two durations: $$(1\text{ hour})\cdot (1\text{ hour}) =\,\,???$$ $$(60\text{ minutes})\cdot (60\text{ minutes}) =\,\,???$$ Sure, you could come up with arbitrary definitions for multiplying durations, but they wouldn't really mean anything tangible. We could sidestep this sort of issue if we all agreed that "a duration will always be represented as a real number giving its length in minutes," but this would be a completely arbitrary choice and wouldn't tell us anything new about durations. If we want to stick to the natural phenomenon, we're left with only a couple of operations on durations: we can add and we can scale. We might envision a duration as something that lives on a timeline - just some segment of that line, which we can lay end to end to add, or dilate to scale - but that's about all you can do with line segments without adding extra information.

You might generalize a bit to add some sign information to this - maybe we want to discuss how 5:00 PM is one hour before 6:00 PM, but 7:00 PM is one hour after. Okay, fine, we can revise our model to imagine that durations are more like arrows between points on a timeline - they have a length and a direction. This is much like how one gets from natural numbers to integers.

At this point all we really know is that we can add and subtract signed durations, as well as scale them by real factors. There's our first example of a vector space, albeit a one-dimensional one.
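To make the "only these operations" point concrete, here's a toy sketch in Python (my own illustration for this post, not a real units library): a signed duration stored as a number of seconds, where addition, negation, and scaling are defined but multiplication of two durations deliberately is not.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Duration:
    """A signed duration; negative values point 'before' rather than 'after'."""
    seconds: float  # storing seconds is itself an arbitrary choice of unit

    def __add__(self, other: "Duration") -> "Duration":
        # lay two intervals end to end
        return Duration(self.seconds + other.seconds)

    def __neg__(self) -> "Duration":
        # reverse direction on the timeline
        return Duration(-self.seconds)

    def scale(self, c: float) -> "Duration":
        # dilate by a real factor
        return Duration(c * self.seconds)

    # Note: no __mul__(Duration, Duration) - there's nothing sensible for it to mean.

print(Duration(60) + Duration(17))    # 1 minute + 17 seconds
print(Duration(3600).scale(1 / 7))    # a seventh of an hour
print(-Duration(3600))                # one hour *before*
```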


Example 2: The space of displacements

Displacements (changes in position) are a more canonical example of the same phenomenon - and are absolutely fundamental in any pursuit of physics. For the purposes of this post, let's imagine** that Earth is flat to avoid clever people poking holes in things. A displacement is just a relationship between points - for instance, I could say "this road is 5 miles north of that road" or "that road is 5 miles south of this road" or "that satellite is 254 miles above me." I could geometrically imagine displacements as arrows in space, where I'm not interested in where an arrow starts or ends, only in its size and direction - so that "5 miles north" is a displacement regardless of where I base it.

It should be clear that we can add these things together - certainly, "move 10 feet north, then 50 kilometers downwards" is a sensible way to describe the spatial relationship of two points, and can visually be thought of as laying two arrows end to end. This is needed to say, for instance, "my net change in position over two days is the sum of the change on the first day and the change on the second day," or, more ambitiously, to develop calculus to deal with the motion of objects through space and say "the net change in position is the integral of velocity (the rate of displacement) over time."

Similarly, it makes sense to scale these arrows - saying "it's thrice three furlongs east" is sensible, as is "it's half an angstrom above." We can also negate displacements by reversing their direction, and even define scaling by a negative factor to be scaling by the corresponding positive factor and then reversing direction.

Again, multiplication does not make sense: what is "two inches east times five meters north"? In fact, it makes even less sense than with durations, because now we have the issue of direction - where exactly is east times north? How's it relate to up times up? That's all utter nonsense!

All we really know is that we can add and subtract displacements, as well as scale them by real factors. Hm, if only we had a name for stuff that we can add and scale...


Vector spaces, abstractly

A vector space is meant to model this kind of behavior, where it's possible to add and scale things - and we can more or less work out what axioms we should expect to hold. There's certainly some choice in how to set this up, but once we've decided that we want to talk about addition and scaling as two operations, the vector space axioms are all fairly reasonable to assume, even if their necessity may not be immediately apparent. It's not so objectionable to ask that there be a zero vector, that addition be associative and commutative, or that additive inverses exist; nor is it objectionable to ask for statements like "scaling by $a$ then $b$ is the same as scaling by $ab$" or "scaling $x+y$ by $a$ should give $a\cdot x + a\cdot y$" or "scaling $x$ by $a+b$ should give $a\cdot x + b\cdot x$" or "scaling by $1$ shouldn't do anything" - and that's literally all that the vector space axioms say.
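Spelled out in one standard presentation (for a vector space $V$ over $\mathbb R$), that's: for all $u,v,w\in V$ and $a,b\in\mathbb R$, $$u+(v+w)=(u+v)+w,\qquad u+v=v+u,$$ $$v+0=v,\qquad v+(-v)=0,$$ $$a\cdot(b\cdot v)=(ab)\cdot v,\qquad 1\cdot v=v,$$ $$a\cdot(u+v)=a\cdot u+a\cdot v,\qquad (a+b)\cdot v=a\cdot v+b\cdot v.$$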

The trick of this sort of axiomatization is this: a lot of things satisfy these axioms, be it durations, displacements, weights, velocities, or accelerations - or even more exotic things such as waves, or abstract ones such as $\mathbb R^n$ or "the space of solutions to a recurrence relation" or "the space of continuous functions $\mathbb R\rightarrow\mathbb R$." This is to say: the axioms aren't so strong as to exclude lots of examples. However, they are strong enough to prove useful facts - such as bringing ideas of matrices and bases and dimension to bear on problems in any of these instances - and we know we've done something right when we find a set of axioms that gives us abstract theorems applying to loads of situations we cared about before we had the axioms.

We would lose the broad applicability of the theory if we started insisting upon multiplication or anything else - yeah, there are things such as algebras, which are vector spaces with a multiplication rule, or inner product spaces or Banach spaces - and sometimes you do need this extra structure - but not everything is like that, so we can get more general results by not including anything we don't need. This is a bit different from the situation with complex numbers where it sometimes doesn't hurt to have extra stuff in your space - we're defining a class of things rather than a single thing, and we actually benefit from not requiring anything we're not going to use.


Coordinates (a.k.a. why $\mathbb R^n$ is so ubiquitous)

Let me finish this discussion by bringing coordinates into play. The final piece of the puzzle is that vector spaces are determined (up to isomorphism) by their dimension - which is to say, they can all be given coordinates.

In the first of the two explicit examples, it should be fairly clear that a signed duration is just any expression of the form $x\text{ years}$ for some real $x$. So, durations can be represented by a single real number: adding two durations adds their numbers, and scaling a duration multiplies its number by the scaling factor. This representation isn't inherent to the space of durations - you can choose any non-zero duration to base your coordinate system on, and then say every other duration is some factor times that one - but it is possible.

Similarly, any displacement can be written uniquely as $$x\text{ meters north} + y\text{ meters east} + z\text{ meters up}$$ where we have three coordinates - and the sum of $$x_1\text{ meters north} + y_1\text{ meters east} + z_1\text{ meters up}$$ $$x_2\text{ meters north} + y_2\text{ meters east} + z_2\text{ meters up}$$ just turns out to be $$(x_1+x_2)\text{ meters north} + (y_1+y_2)\text{ meters east} + (z_1+z_2)\text{ meters up}$$ and scaling by a factor $c$ gives $$(cx)\text{ meters north} + (cy)\text{ meters east} + (cz)\text{ meters up}.$$ These are facts with real geometrical significance - they give us a way to talk about displacements mathematically. We might abbreviate these notations by just writing a tuple $(x,y,z)$ instead, so that we can tersely write: $$(x_1,y_1,z_1)+(x_2,y_2,z_2)=(x_1+x_2,y_1+y_2,z_1+z_2)$$ $$c\cdot (x_1,y_1,z_1)=(cx_1,cy_1,cz_1)$$ and then, oh look, we just invented the vector space $\mathbb R^3$, consisting of tuples of three real numbers with addition and scaling defined per component - and this turns out to be a perfectly good representation of displacements. Of course, we might immediately start to generalize to other things we care about - a velocity can be written as $$x\text{ meters/second north} + y\text{ meters/second east} + z\text{ meters/second up}$$ and it's going to have the same rules for addition and scaling of the tuples $(x,y,z)$ as before - suggesting that $\mathbb R^3$ can be used to represent these too.
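In code, the per-component rules are a one-liner each. Here's a minimal Python sketch (again my own illustration; the class and field names are made up for this post) of displacements as coordinate triples:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Displacement:
    """A displacement in (meters north, meters east, meters up) coordinates."""
    north: float
    east: float
    up: float

    def __add__(self, other: "Displacement") -> "Displacement":
        # (x1, y1, z1) + (x2, y2, z2) = (x1 + x2, y1 + y2, z1 + z2)
        return Displacement(self.north + other.north,
                            self.east + other.east,
                            self.up + other.up)

    def scale(self, c: float) -> "Displacement":
        # c * (x, y, z) = (c*x, c*y, c*z)
        return Displacement(c * self.north, c * self.east, c * self.up)

# "move 10 feet north (about 3.048 m), then 50 kilometers downwards":
leg1 = Displacement(3.048, 0.0, 0.0)
leg2 = Displacement(0.0, 0.0, -50_000.0)
print(leg1 + leg2)        # lay the arrows end to end
print(leg1.scale(-2.0))   # a negative factor reverses direction
```

Choosing (north, east, up) as the fields here is exactly the arbitrary choice of basis discussed next.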

Again, these notions aren't intrinsic to the space - I could just as well say any displacement can be written as $$x'\text{ lightyears northeast} + y'\text{ nanometers west} + z'\text{ smoots up-south}$$ where a "smoot" is some unit of distance. Sure, each displacement would have to be written using a different tuple $(x',y',z')$ than in the previous form - and we might find this form less intuitive - but it's equally correct, and it gives another way to represent the space using $\mathbb R^3$.

This process is essentially just picking a basis for our vector space - as long as we all agree on which displacement is represented by $(x,y,z)$, we can happily use that representation pretty much everywhere. Even when our end goal isn't to talk about $\mathbb R^3$ (and it hardly ever is), the fact is that a lot of spaces we do care about can easily be manipulated through coordinates, which makes the space of tuples $\mathbb R^3$ rather fundamental - and, of course, it should be no great leap to realize that we can have $\mathbb R^n$ for any $n$ we like, with similar rules for addition and scaling.

(*Okay, okay, you could get $1\text{ hour}^2$, whatever that means, and I'm sure physicists sometimes are happy to do this sort of thing - but clearly that's not a duration - it's transformed into something else)

(**Or, if you already believe this, don't stop believing it)


Here is one possible thought process.

It seems natural to attempt to generalize calculus to functions that take a list of numbers as input and return a list of numbers as output. This leads us to introduce $\mathbb R^n$. Since the fundamental strategy of calculus is to approximate a nonlinear function locally by a linear function, we must define what it means for a function to be "linear" in this new setting. This leads us to introduce the ideas of vector addition, scalar multiplication, and linear transformations (and matrices that describe linear transformations).
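To make "linear" precise in this setting: a function $T:\mathbb R^n\rightarrow\mathbb R^m$ is linear when $$T(u+v)=T(u)+T(v)\qquad\text{and}\qquad T(c\cdot v)=c\cdot T(v)$$ for all vectors $u,v$ and scalars $c$ - and every such $T$ can be described by an $m\times n$ matrix.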

Now having become familiar with $\mathbb R^n$, we might want to extend calculus to handle functions that take as input, say, a matrix, or a symmetric matrix. And we might notice that the same fundamental operations (addition and scalar multiplication) also make sense when considering, say, the set of all $m \times n$ matrices, or the set of all $n \times n$ symmetric matrices. This motivates us to generalize the linear algebra ideas that we developed in $\mathbb R^n$ to more general sets where addition and scalar multiplication make sense, leading to the idea of a vector space.
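As a quick check that, say, symmetric matrices really do support these operations: if $A^T=A$ and $B^T=B$, then $$(A+B)^T=A^T+B^T=A+B\qquad\text{and}\qquad (cA)^T=c\,A^T=cA,$$ so sums and scalar multiples of symmetric matrices are again symmetric.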

A fantastic motivating example for introducing vector spaces is the set of all possible solutions to an $n$th-order homogeneous linear ordinary differential equation. That is a wonderful vector space. The set of all solutions to the ODE $y'(t) = A y(t)$ is another great example of a vector space. What is its dimension? What is a basis for it? Arguably, the idea of a vector space was introduced first in order to understand these spaces of solutions to linear differential equations. (I don't know the true history though.)
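(For the curious, here is one standard way those questions get answered: the solutions of $y''+y=0$ form a two-dimensional space with basis $\{\cos t,\sin t\}$ - every solution is $c_1\cos t+c_2\sin t$. And if $A$ is an $n\times n$ matrix, every solution of $y'(t)=Ay(t)$ has the form $y(t)=e^{tA}y(0)$, so the solution space has dimension $n$, with the columns of the matrix exponential $e^{tA}$ giving a basis.)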

When understanding the effect of applying a linear transformation $T$ to a vector $v \in \mathbb R^n$, a reasonable strategy is to first seek out special vectors $v$ with the nice property that $T(v)$ is just a scalar multiple of $v$. While $T$ might seem complicated, it is at least simple to understand what $T$ does to such a $v$. And then if some other vector $w$ can be written as a linear combination of these special "eigenvectors," then perhaps understanding what $T$ does to $w$ is not so hard either. But when seeking eigenvectors for $T$, we often find that the entries of $v$ need to be complex. This motivates us to introduce $\mathbb C^n$ and to do linear algebra in $\mathbb C^n$.
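The $90^\circ$ rotation of the plane is the classic example: its matrix $\begin{pmatrix}0&-1\\1&0\end{pmatrix}$ fixes no real direction (a rotation moves every direction), so it has no real eigenvectors - but over $\mathbb C$ it has eigenvalues $\pm i$, with eigenvectors $(1,\mp i)$.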

Regarding multiplication, it seems that there is no natural way to multiply two vectors in $\mathbb R^n$ to get a new vector in $\mathbb R^n$, so no such operation was built into the definition, despite efforts along those lines. (You can read the history of how mathematicians introduced quaternions, but had to give up commutativity of multiplication in doing so; and then octonions were introduced, sacrificing even associativity.)