Is this explanation of normal subgroups and quotient groups correct?

It seems like you have understood the story pretty well.

To address your specific questions:

  1. Yes, seems good. I agree that you could do some work polishing this to make everything fully rigorous, including making some decisions about what you want to be a definition and what you want to be a theorem.
  2. I think this is pretty close to how most textbooks present quotient groups and the first isomorphism theorem.
  3. Non-surjective homomorphisms are surjective onto their image. The first isomorphism theorem says that if $\phi: A \to B$ is a group homomorphism, then $\textrm{Im}(\phi) \cong A/\ker(\phi)$. So $\phi$ is picking out a subgroup of $B$ which is isomorphic to the quotient group $A/\ker(\phi)$.
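
To make item 3 concrete, here is a minimal Python sketch. The map `phi` (multiplication by 3 on the integers mod 12) is an illustrative choice of mine, not something taken from the question. It is a non-surjective homomorphism $\mathbb{Z}_{12}\to\mathbb{Z}_{12}$, and the sketch checks that the cosets of its kernel match up one-to-one with the elements of its image.

```python
# Hypothetical example: phi(x) = 3x mod 12, a map Z_12 -> Z_12 (additive groups).
n = 12

def phi(x):
    return (3 * x) % n

G = range(n)

# phi respects addition, so it is a group homomorphism.
assert all(phi((a + b) % n) == (phi(a) + phi(b)) % n for a in G for b in G)

image = sorted({phi(x) for x in G})
kernel = [x for x in G if phi(x) == 0]

# Cosets of the kernel, each stored as a sorted tuple.
cosets = sorted({tuple(sorted((x + k) % n for k in kernel)) for x in G})

# The set of values phi takes on each coset (a single value if phi is constant there).
coset_to_image = {c: {phi(x) for x in c} for c in cosets}

print(image)                      # [0, 3, 6, 9]
print(kernel)                     # [0, 4, 8]
print(len(cosets) == len(image))  # True
```

The image is a proper subgroup of the codomain, yet it has exactly as many elements as there are cosets of the kernel, which is what $\textrm{Im}(\phi)\cong A/\ker(\phi)$ predicts.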

I know you've gotten one satisfactory answer, but let me weigh in here a bit.

I'll first mention that what you present is generally correct, and a valid way to approach this. The idea of "breaking down a (finite) group into smaller pieces" is in fact behind the idea of classifying finite simple groups (groups that cannot be broken down), together with the theory of group extensions (trying to understand what a group $G$ "is" if you have a normal subgroup $N\triangleleft G$, and you understand both $N$ and $G/N$).

But let me offer you a different perspective and a different way into the isomorphism theorems...

After you learn about groups, subgroups, and Lagrange's Theorem (and maybe Cauchy's Theorem), you come to a crossroads in how to try to better understand a given group.

One way to try to learn things about a given group $G$ is to just stare at it until you notice some interesting things about $G$. However, generally speaking, a much more fruitful approach in algebra is to take a less static approach and to consider two things: what the group $G$ "can do", and how it interacts with other groups.

What a group "can do" is in fact historically how groups were originally understood. The original notion of a group was a "group of permutations": a collection of operations acting on a set in specific ways. Even as late as the turn of the 20th Century, Burnside's book on groups still defines a group as a collection of "operators" acting on "some objects". It was Cayley who introduced the abstract definition of a group as a "set with a binary associative operation satisfying certain conditions", and then immediately went on to prove that this did not change the objects of study, as any "group of permutations" was a group under his new proposed definition, and any object that satisfied this new proposed definition could be understood as a "group of permutations". This is the notion behind Cayley's Theorem, and why it is, in my opinion, more important historically than practically today. But this already introduces the notion of functions: what does "can be understood as a group of permutations" mean? It means you can biject it with such a group in a way that respects the operation.

This also leads us to functions. To justify why we want to think about functions, let's consider two areas where functions play a major role: the real numbers/calculus, and linear algebra.

The key property of the real numbers is that they are "continuous": they have no 'holes'. Rather than just stare at real numbers and see if we can say interesting things about them, it turns out to be much more fruitful and interesting to consider functions from $\mathbb{R}$ to itself that respect this "continuity". And so we get the notion of continuous functions, and the study of continuous functions, as a way to shed light on the nature of the real numbers themselves.

Similarly, with Linear Algebra, staring at vector spaces only takes you so far; the real power of vector spaces only emerges when you start considering linear transformations.

In both cases, you don't just want any old function; you want functions that "preserve" whatever it is that makes your objects interesting. For real numbers, continuity; for vector spaces, the addition and scalar multiplication.

So with groups. A group is characterized by three things (bear with me): a binary operation $G\times G\to G$, that assigns to any pair of elements $g_1,g_2\in G$ their "product" $g_1g_2$. A distinguished element $e_G\in G$ with the property that $ge_G=e_Gg=g$ for all $g\in G$. And a function $G\to G$ that assigns to every element $g\in G$ its "inverse", $g^{-1}$, which has the property that $gg^{-1}=g^{-1}g=e_G$.

So if we have two groups $G$ and $H$, then a "function that preserves this structure" would be a function $f\colon G\to H$, such that

  1. Respects products: if $g_1,g_2\in G$, then $f(g_1g_2) = f(g_1)f(g_2)$.
  2. Respects the identity: $f(e_G) = e_H$.
  3. Respects inverses: if $g\in G$, then $f(g^{-1}) = (f(g))^{-1}$.

It turns out that two of these conditions are superfluous, but that is how we want to start. As you know, if 1 holds for a function between groups, then 2 and 3 will automatically hold as well. One can then define a group homomorphism as simply a function that satisfies 1 and prove 2 and 3; I prefer to define it as a function that satisfies 1, 2, and 3, and then prove that if it satisfies 1, then it must satisfy 2 and 3. The reason I prefer this is that I think it makes the definition more natural.
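
The reason 1 forces 2 is that $f(e_G)=f(e_Ge_G)=f(e_G)f(e_G)$, and cancelling in $H$ gives $f(e_G)=e_H$; then $f(g)f(g^{-1})=f(e_G)=e_H$ gives 3. As a sanity check (not a proof), here is a small brute-force sketch in Python. The choice of the additive groups $\mathbb{Z}_4$ and $\mathbb{Z}_6$, and all the function names, are just illustrative assumptions: it lists every function between them, keeps the ones satisfying 1, and confirms that those also satisfy 2 and 3.

```python
from itertools import product

m, n = 4, 6  # hypothetical choice: additive groups Z_4 and Z_6

def respects_products(f):
    return all(f[(a + b) % m] == (f[a] + f[b]) % n
               for a in range(m) for b in range(m))

def respects_identity(f):
    return f[0] == 0

def respects_inverses(f):
    return all(f[(-a) % m] == (-f[a]) % n for a in range(m))

# Every function Z_4 -> Z_6, encoded as the tuple (f[0], f[1], f[2], f[3]).
all_functions = list(product(range(n), repeat=m))

# Keep only those satisfying condition 1.
homs = [f for f in all_functions if respects_products(f)]

print(len(all_functions), len(homs))  # 1296 2
print(all(respects_identity(f) and respects_inverses(f) for f in homs))  # True
```

Only two of the 1296 functions respect the operation (the trivial map and $x\mapsto 3x$), and both automatically respect the identity and inverses.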

Okay, so these are the functions that will play the role of "linear transformations" and "continuous functions". We call them, as I mentioned above, "group homomorphisms." They also are the type of functions needed in Cayley's argument that any group "can be seen" as a group of permutations, because that corresponds to a one-to-one function $f\colon G\to S_X$ (for some set $X$) that satisfies 1, 2, and 3, so that $f(G)$ is "essentially the same" (as far as the group structure is concerned) as $G$, but now it consists of permutations on a set $X$.

Now, given a function $f\colon G\to H$ (in fact, any function between two sets $X$ and $Y$), there is a natural equivalence relation that we can define on $G$. Let us say that two elements $x,y\in G$ are "$f$-equivalent", $x\sim_f y$, if and only if $f(x)=f(y)$. This is easily verified to be an equivalence relation, and so it partitions $G$ into equivalence classes.

But because $f$ is a group homomorphism, we have the following consequences: if $x\sim_f y$ and $z\sim_f w$, then $xz\sim_f yw$, and $x^{-1}\sim_f y^{-1}$. So we can make the set of equivalence classes, $G/\sim_f$ into a group! Let $[x]_f$ be the equivalence class of $x$. Then we can define $[x]_f*[y]_f = [xy]_f$, $e_{G/\sim_f} = [e_G]_f$, and $([x]_f)^{-1} = [x^{-1}]_f$. It is then an easy exercise to show that this is indeed a group.
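
If you want to see the "well-defined" part in action, here is a minimal sketch in Python. The example is my own choice for illustration: $G=S_3$ with permutations stored as tuples, and $f$ the sign homomorphism to $\{1,-1\}$. It checks that the class of a product depends only on the classes of the factors, not on the representatives chosen.

```python
from itertools import permutations

def compose(p, q):
    """Product pq: apply q first, then p."""
    return tuple(p[q[i]] for i in range(len(q)))

def sign(p):
    """Sign of a permutation, computed by counting inversions."""
    inversions = sum(1 for i in range(len(p))
                     for j in range(i + 1, len(p)) if p[i] > p[j])
    return -1 if inversions % 2 else 1

G = list(permutations(range(3)))  # the symmetric group S_3

def cls(x):
    """The equivalence class [x]_f for f = sign."""
    return frozenset(y for y in G if sign(y) == sign(x))

classes = {cls(x) for x in G}

# Well-definedness: [xz]_f depends only on [x]_f and [z]_f.
well_defined = all(
    cls(compose(x, z)) == cls(compose(y, w))
    for x in G for y in cls(x)
    for z in G for w in cls(z)
)

print(len(classes), well_defined)  # 2 True
```

Here $G/\sim_f$ has two elements (the even and the odd permutations), and multiplying classes via representatives never gives conflicting answers.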

What relation does this group have with $G$ and with $f$? Well, that's the first isomorphism theorem: there is a bijective group homomorphism between the group $G/\sim_f$ and the image group $f(G)$, given by sending $[x]_f$ to $f(x)$.

How is this related to normal subgroups? Ah, well, these equivalence classes have an interesting property: because of the property that $x\sim_f y$ implies $x^{-1}\sim_f y^{-1}$, and if $x\sim_f y$ and $z\sim_f w$ then $xz\sim_f yw$, we have $$x\sim_f y \iff xy^{-1}\sim_f e_G.$$ That is: we can completely determine the equivalence relation by just knowing $[e_G]_f$. Moreover, this collection is a subgroup of $G$!

Will any subgroup work? No, it turns out not. If $N$ is a subgroup and we try to define an equivalence relation $x\sim y$ if and only if $xy^{-1}\in N$, we do get an equivalence relation, but in general we do not get an equivalence relation that lets you define a group structure on $G/\sim$. The condition that lets you do that is precisely that $N$ must be a normal subgroup. I go into much more detail about this in this answer.
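
To see the failure concretely, here is a small Python sketch; the specific subgroups and helper names are illustrative assumptions on my part. In $S_3$ it compares the subgroup generated by a single transposition (not normal) with $A_3$ (normal), and tests whether the relation "$xy^{-1}$ lies in the subgroup" is compatible with multiplication.

```python
from itertools import permutations

def compose(p, q):
    """Product pq: apply q first, then p."""
    return tuple(p[q[i]] for i in range(len(q)))

def inverse(p):
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)

G = list(permutations(range(3)))       # the symmetric group S_3
e = (0, 1, 2)
N = {e, (1, 0, 2)}                     # generated by one transposition; not normal
A3 = {e, (1, 2, 0), (2, 0, 1)}         # the alternating group; normal

def related(x, y, subgroup):
    """x ~ y if and only if x y^{-1} lies in the subgroup."""
    return compose(x, inverse(y)) in subgroup

def is_congruence(subgroup):
    """Does x ~ y and z ~ w always imply xz ~ yw?"""
    return all(
        related(compose(x, z), compose(y, w), subgroup)
        for x in G for y in G if related(x, y, subgroup)
        for z in G for w in G if related(z, w, subgroup)
    )

print(is_congruence(N), is_congruence(A3))  # False True
```

With the non-normal subgroup the product of classes depends on the representatives, so there is no induced operation on $G/\sim$; with $A_3$ there is.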

So, "good" equivalence relations, those coming from functions (they are called "congruences"), correspond to normal subgroups. In fact, as with Cayley's Theorem before (which gave a separate definition of "group" and then showed it was really the same as the old one), so it is with "good" equivalence relations:

Theorem. A subgroup $N$ of a group $G$ is normal in $G$ if and only if there exists a group $H$ and a homomorphism $f\colon G\to H$ such that $N=[e_G]_f$.

This then leads to the usual First Isomorphism Theorem, which says: this construction of taking quotients of a group is "essentially the same" as looking at the image of $G$ under a group homomorphism, in that given any homomorphism $f\colon G\to H$, if $N=[e_G]_f$, then $G/N = G/\sim_f$ is "essentially the same" as $f(G)$: there is a bijective group homomorphism between them.

The Third Isomorphism Theorem corresponds to compositions of morphisms: if $f\colon G\to H$ and $g\colon H\to L$ are homomorphisms, then $f(G)/\sim_g$ is essentially the same as $G/\sim_{g\circ f}$. In terms of subgroups (think of $N=[e_G]_f$ and $K=[e_G]_{g\circ f}$): if $N\triangleleft G$, $K\triangleleft G$, and $N\subseteq K$, then $K/N \triangleleft G/N$ and $(G/N)/(K/N)\cong G/K$.

The Fourth, or lattice, Isomorphism Theorem establishes a correspondence between the subgroups of $f(G)$ and the subgroups of $G$ that contain $[e]_f$. One then asks... okay, and what about other subgroups of $G$? That's what the Second Isomorphism Theorem gives you: if $f\colon G\to H$ is a homomorphism, and $K$ is an arbitrary subgroup of $G$, then $f(K)$ corresponds to $K/(K\cap N)$ (where $N=[e]_f$). And this image is "the same" as the image of $KN$. That is, $$\frac{K}{K\cap N} \cong \frac{KN}{N}.$$
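
As a quick illustration of the Second Isomorphism Theorem, here is a minimal Python sketch. The example is a hypothetical one of my choosing: the additive group $\mathbb{Z}_{12}$ with $K$ the multiples of 2 and $N$ the multiples of 3, so "$KN$" is really $K+N$. It checks that the map $k(K\cap N)\mapsto kN$ is a well-defined bijection from $K/(K\cap N)$ onto $KN/N$.

```python
n = 12  # hypothetical example inside the additive group Z_12

K = {x for x in range(n) if x % 2 == 0}  # subgroup generated by 2
N = {x for x in range(n) if x % 3 == 0}  # subgroup generated by 3 (normal: Z_12 is abelian)

def coset(x, subgroup):
    return frozenset((x + s) % n for s in subgroup)

K_meet_N = K & N
KN = {(k + m) % n for k in K for m in N}

left = {coset(k, K_meet_N) for k in K}   # K/(K ∩ N)
right = {coset(x, N) for x in KN}        # KN/N

# Two elements of K lie in the same coset of K ∩ N exactly when they lie in
# the same coset of N, so k(K ∩ N) -> kN is well defined and injective.
same_fibers = all(
    (coset(a, K_meet_N) == coset(b, K_meet_N)) == (coset(a, N) == coset(b, N))
    for a in K for b in K
)

# Every coset of N inside KN is hit by some element of K (surjectivity).
hits_everything = {coset(k, N) for k in K} == right

print(len(left), len(right), same_fibers, hits_everything)  # 3 3 True True
```

The sketch only checks the bijection; that it respects the operation follows because both sides add cosets via representatives.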

So in summary:

  1. First Isomorphism Theorem tells you that images of groups correspond to quotients and vice-versa.

  2. Third Isomorphism Theorem tells you that this correspondence plays well with composition.

  3. Fourth Isomorphism Theorem tells you that there is a very nice correspondence between the subgroups of $f(G)$ and the subgroups of $G$ that contain $[e]_f$.

  4. And the Second Isomorphism Theorem tells you how the rest of the subgroups of $G$ behave under the homomorphism $f$.

Thus, the importance of normal subgroups corresponds simply to the importance of homomorphisms. Images of a group are like "shadows" of the group, and so will hopefully sometimes be easier to understand. Simple groups are the ones that we cannot simplify this way: we'll just have to stare at them intently until we understand them. And if we can understand simple groups, and we can understand how to put groups together (group extensions) from $N$ and $G/N$, then perhaps we can leverage our (hypothetical) understanding of simple groups into an (even more hypothetical) understanding of all groups. Turns out this is too naïve a hope, unfortunately, but perhaps it can help justify why we care about morphisms, normal subgroups, quotients, etc.