Some questions about determinants

I'll sketch how you can get from the exterior power perspective to the sum over permutations perspective, but the treatment is rather rushed and may be hard to follow or appreciate without proper study of the subject. The key idea at work is that, working concretely with the vector space $\mathbf{R}^n$, the determinant is the unique alternating multilinear map taking $n$ vectors in $\mathbf{R}^n$ to a real number that sends the standard basis $(e_1,\dots,e_n)$ to $1$. I'll explain those terms as we go along. Also, it's very much worth noting that the determinant can be understood as the signed volume of a parallelepiped. (For geometric intuition about linear algebra I recommend 3Blue1Brown's video series Essence of Linear Algebra.)

It is straightforward to verify that the tensor product is multilinear — that is, when all inputs but one are fixed, we get a linear map. So $v\otimes(u+w)=v\otimes u+v\otimes w$ and $(cv)\otimes w=c(v\otimes w)$. (In fact, the tensor product is in some sense the 'freest' such product.) Consider $(v+w)\otimes(v+w)$ under the $\wedge$ map. We find $$(v+w)\wedge(v+w)=v\wedge v+v\wedge w+w\wedge v+w\wedge w$$ by multilinearity. By definition of $\wedge$, the terms $v\wedge v$, $w\wedge w$ and $(v+w)\wedge(v+w)$ are all equal to zero, and so we have $$v\wedge w=-w\wedge v$$ in the exterior algebra. In fact, for the same reason, whenever we swap two terms in some wedge of vectors, a negative sign appears. So we can write things like $u\wedge v\wedge w=-u\wedge w\wedge v=w\wedge u\wedge v$. Roughly speaking, a map that has this property of picking up a minus sign every time we swap elements is said to be alternating. Notice that if we have a permutation of $n$ letters $\sigma\in S_n$ and $n$ vectors $v_1$, ..., $v_n$, we have the formula $$v_{\sigma(1)}\wedge\dots\wedge v_{\sigma(n)}=\operatorname{sgn}(\sigma)v_1\wedge\dots\wedge v_n,$$ where $\operatorname{sgn}(\sigma)$ is the sign of the permutation $\sigma$. Indeed, we can think of $\operatorname{sgn}(\sigma)$ as $(-1)^{T(\sigma)}$, where $T(\sigma)$ is the number of swaps in some decomposition of $\sigma$ into transpositions; the count depends on the decomposition chosen, but its parity does not, so the sign is well defined.
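If it helps to see that sign claim in action, here is a small Python sketch (purely illustrative; the helper names `sign_by_sorting` and `sign_by_inversions` are my own) that computes $\operatorname{sgn}(\sigma)$ by counting the swaps performed while sorting, and checks that this agrees with the usual inversion-count description on all of $S_4$:

```python
from itertools import permutations

def sign_by_sorting(perm):
    """Sort the permutation with adjacent swaps and count them;
    the sign is (-1) raised to the number of swaps performed."""
    p = list(perm)
    swaps = 0
    for _ in range(len(p)):
        for j in range(len(p) - 1):
            if p[j] > p[j + 1]:
                p[j], p[j + 1] = p[j + 1], p[j]
                swaps += 1
    return (-1) ** swaps

def sign_by_inversions(perm):
    """Sign as (-1)^(number of inversions), another standard description."""
    inv = sum(1 for i in range(len(perm)) for j in range(i + 1, len(perm))
              if perm[i] > perm[j])
    return (-1) ** inv

# The two descriptions agree: only the parity of the swap count is intrinsic.
for perm in permutations(range(4)):
    assert sign_by_sorting(perm) == sign_by_inversions(perm)
print("signs agree on all of S_4")
```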

A brief digression. You wrote:

It seems trivial but I'm not getting it.

I think this tensor product and exterior power material is far from trivial, and it is best appreciated within a larger framework (say, as a tool for differential geometry; see the references I provide at the end), rather than compressed into a short two-paragraph definition that cuts through the theory and produces a definition of the determinant that is rather unenlightening to readers who do not already know about tensor products and exterior powers. Okay, back to math.

Now if $V$ is an $n$-dimensional vector space, then the $n$-th exterior power of $V$ is denoted $\Lambda^nV$ and is by definition a vector space whose elements are linear combinations of objects like $v_1\wedge\dots\wedge v_n$. If $\{e_1,\dots,e_n\}$ is a basis for $V$, then we may expand each $v_j$ in the basis, so that $v_j=\sum_{1\le i\le n}v_{i,j}e_i$. By multilinearity we can now compute \begin{align*} v_1\wedge\dots\wedge v_n &=\left(\sum_{1\le i_1\le n}v_{i_1,1}e_{i_1}\right) \wedge\dots\wedge \left(\sum_{1\le i_n\le n}v_{i_n,n}e_{i_n}\right)\\ &=\sum_{1\le i_1\le n}\dots\sum_{1\le i_n\le n}v_{i_1,1}\dots v_{i_n,n} e_{i_1}\wedge\dots\wedge e_{i_n}. \end{align*} This is quite a big sum, with $n^n$ terms. But remember that the wedge product $\wedge$ is defined so that, whenever we have a repeated vector, we must get zero. (So things like $v\wedge w\wedge v=0$, since $v$ is repeated.) And so most of the terms $e_{i_1}\wedge\dots\wedge e_{i_n}$ vanish, except when $i_1$, ..., $i_n$ is a permutation of $1$, ..., $n$. So we have \begin{align*} &\sum_{1\le i_1\le n}\dots\sum_{1\le i_n\le n}v_{i_1,1}\dots v_{i_n,n} e_{i_1}\wedge\dots\wedge e_{i_n}\\ &\quad=\sum_{\sigma\in S_n}v_{\sigma(1),1}\dots v_{\sigma(n),n}e_{\sigma(1)}\wedge\dots\wedge e_{\sigma(n)}\\ &\quad=\left(\sum_{\sigma\in S_n}\operatorname{sgn}(\sigma)v_{\sigma(1),1}\dots v_{\sigma(n),n}\right)e_1\wedge\dots\wedge e_n. \end{align*}

Interesting. This large computation revealed that every element of $\Lambda^nV$ is a constant multiple of $e_1\wedge\dots\wedge e_n$, so $\dim(\Lambda^nV)\le1$; and since one can check that $e_1\wedge\dots\wedge e_n\neq0$, in fact $\dim(\Lambda^nV)=1$. And so if we have a linear map $T\colon V\to V$, and if $$\Lambda^nT\colon\Lambda^nV\to\Lambda^nV$$ is defined by sending $v_1\wedge\dots\wedge v_n$ to $T(v_1)\wedge\dots\wedge T(v_n)$, then, being a linear map on a one-dimensional space, it is just multiplication by a scalar. We call that scalar the determinant of $T$; that is, $T(e_1)\wedge\dots\wedge T(e_n)=\det(T)e_1\wedge\dots\wedge e_n$, by definition. But as we computed earlier, we already know what that scalar is:
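If you want to see the collapse from $n^n$ terms down to $n!$ terms concretely, here is a small Python sketch (illustrative only; the helper `sign` is my own, and NumPy is assumed to be available just for the comparison) that enumerates all index tuples for a random $3\times3$ array, drops the tuples with a repeated index, and checks that what remains agrees with the sum over permutations and with NumPy's determinant:

```python
from itertools import product, permutations
import math
import numpy as np

def sign(perm):
    # sgn as (-1)^(number of inversions); only the parity matters
    inv = sum(1 for i in range(len(perm)) for j in range(i + 1, len(perm))
              if perm[i] > perm[j])
    return (-1) ** inv

n = 3
rng = np.random.default_rng(0)
v = rng.standard_normal((n, n))  # v[i, j] plays the role of v_{i+1, j+1}

# Full n^n-term expansion: a term contributes 0 when a basis index repeats
# (a repeated vector wedges to zero), and otherwise contributes
# sign(indices) * v_{i_1,1} ... v_{i_n,n}.
full_sum = 0.0
for idx in product(range(n), repeat=n):
    if len(set(idx)) < n:
        continue  # e_{i_1} ^ ... ^ e_{i_n} = 0 because some e_i repeats
    full_sum += sign(idx) * math.prod(v[idx[col], col] for col in range(n))

# The collapsed sum over S_n, and NumPy's determinant for comparison.
perm_sum = sum(sign(s) * math.prod(v[s[col], col] for col in range(n))
               for s in permutations(range(n)))
print(np.isclose(full_sum, perm_sum), np.isclose(full_sum, np.linalg.det(v)))
```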

$$\det(T)=\sum_{\sigma\in S_n}\operatorname{sgn}(\sigma)T_{\sigma(1),1}\dots T_{\sigma(n),n}$$

(The fact that we get $T_{\sigma(j),j}$ instead of $T_{j,\sigma(j)}$ is inconsequential: reindexing the sum by $\sigma^{-1}$, which runs over all of $S_n$ since $S_n$ is a group, and using $\operatorname{sgn}(\sigma^{-1})=\operatorname{sgn}(\sigma)$ turns one sum into the other; this is the familiar fact that a matrix and its transpose have the same determinant. Here we are also using the fact that a matrix can be thought of as representing a linear map with respect to chosen bases, and understanding the interplay between the abstract linear-maps-between-vector-spaces perspective and the concrete matrices-as-arrays-of-numbers perspective is very useful. There are also many other useful interpretations of matrices, fit for different occasions, like as the adjacency matrix of a graph, or as encoding weights in a weighted directed graph, in which case the determinant can be understood combinatorially as counting non-intersecting systems of paths via the Lindström–Gessel–Viennot Lemma! But I am digressing...)
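Here is a quick Python sketch of that reindexing point (illustrative only; the helper `sign` is my own): the column-indexed and row-indexed permutation sums give the same number on a random matrix.

```python
from itertools import permutations
import math, random

def sign(perm):
    inv = sum(1 for i in range(len(perm)) for j in range(i + 1, len(perm))
              if perm[i] > perm[j])
    return (-1) ** inv

n = 4
T = [[random.random() for _ in range(n)] for _ in range(n)]  # T[i][j] plays the role of T_{i+1,j+1}

# Column convention: sgn(sigma) * T_{sigma(1),1} ... T_{sigma(n),n}
det_cols = sum(sign(s) * math.prod(T[s[j]][j] for j in range(n))
               for s in permutations(range(n)))
# Row convention: sgn(sigma) * T_{1,sigma(1)} ... T_{n,sigma(n)}
det_rows = sum(sign(s) * math.prod(T[j][s[j]] for j in range(n))
               for s in permutations(range(n)))

print(abs(det_cols - det_rows) < 1e-12)  # True: the two conventions agree
```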

One last thing. We did all those computations in terms of this abstract construction of a quotient of tensor spaces and whatnot, but these ideas can be dealt with more concretely, thinking about areas and volumes of parallelograms and parallelepipeds in 2D and 3D space. Let's return to the idea of an alternating multilinear map $$\det\colon\underbrace{\mathbf{R}^n\times\dots\times\mathbf{R}^n}_{\hbox{$n$ copies}}\to\mathbf{R}$$ that sends $(e_1,\dots,e_n)$ to $1$. You'll find that alternation and multilinearity are rather natural requirements for any map that takes $n$ vectors in $n$-dimensional space and returns the signed hypervolume of the parallelotope they span. (For example, in $\mathbf{R}^2$, this is a map that takes two vectors $v$ and $w$, and returns the area of the parallelogram that they span. If you stretch one of the vectors by a factor of $c$, you would expect the area of the parallelogram they span to increase by that same factor; so $\det(cv,w)=c\det(v,w)$. If a vector is repeated, like in $\det(v,v)$, then we just get a line with no area and so $\det(v,v)=0$. Similarly, you would expect the area of the unit square to be $1$; that is, $\det(e_1,e_2)=1$. And so all the relevant axioms are motivated geometrically and concretely.) From here we may then repeat what is really the same computation as above, just dressed up differently: \begin{align*} \det(v_1,\dots, v_n) &=\det\left(\sum_{1\le i_1\le n}v_{i_1,1}e_{i_1},\dots, \sum_{1\le i_n\le n}v_{i_n,n}e_{i_n}\right)\\ &=\sum_{1\le i_1\le n}\dots\sum_{1\le i_n\le n}v_{i_1,1}\dots v_{i_n,n} \det(e_{i_1},\dots, e_{i_n})\\ &=\dots \end{align*} And so the whole abstract wedge product business is really just the essence of a determinant, abstracted out into its own algebraic thing.
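If you'd like to poke at these axioms numerically, here is a small Python sketch in the plane (assuming NumPy is available; `area` is just an illustrative name I'm giving to the $2\times2$ determinant with the two vectors as columns) checking scaling, additivity, the repeated-vector rule, normalization, and the sign flip under swapping:

```python
import numpy as np

rng = np.random.default_rng(1)
u, v, w = rng.standard_normal(2), rng.standard_normal(2), rng.standard_normal(2)
c = 2.5

def area(a, b):
    # Signed area of the parallelogram spanned by a and b:
    # the 2x2 determinant with a and b as columns.
    return np.linalg.det(np.column_stack([a, b]))

e1, e2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])

print(np.isclose(area(c * v, w), c * area(v, w)))           # scaling one side scales the area
print(np.isclose(area(u + v, w), area(u, w) + area(v, w)))  # additivity in the first slot
print(np.isclose(area(v, v), 0.0))                          # a repeated vector spans no area
print(np.isclose(area(e1, e2), 1.0))                        # the unit square has area 1
print(np.isclose(area(v, w), -area(w, v)))                  # swapping flips the orientation
```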

Just for the fun of it, and to end off, let's work out the computation of the determinant in the plane. Let $e_1=\begin{pmatrix}1\\0\end{pmatrix}$ and $e_2=\begin{pmatrix}0\\1\end{pmatrix}$. Then \begin{align*} \det\left(\begin{pmatrix}a&b\\c&d\end{pmatrix}\right) &=\det\left(\begin{pmatrix}a\\c\end{pmatrix}, \begin{pmatrix}b\\d\end{pmatrix}\right)\\ &=\det(ae_1+ce_2,be_1+de_2)\\ &=\det(ae_1,be_1)+\det(ae_1,de_2)+\det(ce_2,be_1)+\det(ce_2,de_2)\\ &=ab\det(e_1,e_1)+ad\det(e_1,e_2)+bc\det(e_2,e_1)+cd\det(e_2,e_2)\\ &=ad\det(e_1,e_2)+bc\det(e_2,e_1)\\ &=ad-bc, \end{align*} just as we learned in high school. (The fact that $\det(e_2,e_1)=-1$ can be interpreted as the idea that the square going from $e_2$ to $e_1$ has orientation opposite to that of the unit square which goes from $e_1$ to $e_2$. But that's a story for another day.)
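And, just to close the loop, here is a tiny symbolic check (assuming SymPy is available; the helper `sign` is again my own) that the permutation-sum formula specializes to $ad-bc$ in the plane:

```python
import sympy as sp
from itertools import permutations

a, b, c, d = sp.symbols('a b c d')
M = sp.Matrix([[a, b], [c, d]])

def sign(perm):
    inv = sum(1 for i in range(len(perm)) for j in range(i + 1, len(perm))
              if perm[i] > perm[j])
    return (-1) ** inv

# Sum over S_2 of sgn(sigma) * M[sigma(1),1] * M[sigma(2),2]
perm_sum = sum(sign(s) * M[s[0], 0] * M[s[1], 1] for s in permutations(range(2)))
print(sp.expand(perm_sum), sp.expand(M.det()))  # both print a*d - b*c
```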

In conclusion, I think an ideal place to learn these ideas would be a text on manifolds, since the tools of tensors and exterior powers are part of the underlying machinery for geometry. See John Lee's Introduction to Smooth Manifolds or Loring Tu's An Introduction to Manifolds. See also Keith Conrad's notes on Tensor Products and Exterior Powers.