Why do sheaves embed in presheaves?

The universal properties you describe (by the way, you need to either restrict to the case that $C$ is essentially small or restrict to what are called small presheaves) tell you what cocontinuous functors out of sheaves or presheaves look like. But the embedding $\text{Sh}(C) \to \text{Psh}(C)$ of sheaves into presheaves isn't cocontinuous so the universal property isn't applicable here.

To my mind, the general point here is that if $F : C \to D$ is a functor and $D$ is a cocomplete category then the induced functor $\text{Psh}(C) \to D$ is always the left adjoint of a functor $D \to \text{Psh}(C)$, namely the restricted Yoneda embedding

$$D \ni d \mapsto \text{Hom}(F(-), d) \in \text{Psh}(C).$$

Taking $D = \text{Sh}(C)$ and $F : C \to \text{Sh}(C)$ the (sheafified) Yoneda embedding, the induced functor $\text{Psh}(C) \to \text{Sh}(C)$ is sheafification, and its right adjoint is the inclusion of sheaves into presheaves. I suppose there's an additional question of why this adjoint is fully faithful.
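
To spell out the adjunction used here, a short sketch may help. The names $L$, $R$, $y$, $a$ and $i$ are notation introduced only for this sketch: $y$ is the Yoneda embedding, $L : \text{Psh}(C) \to D$ is the cocontinuous extension of $F$ (so $L \circ y \cong F$), and $R(d) = \text{Hom}(F(-), d)$. Writing a presheaf as a colimit of representables, $P \cong \text{colim}_j\, y(c_j)$, one gets

$$\text{Hom}_D(L(P), d) \cong \lim_j \text{Hom}_D(F(c_j), d) \cong \lim_j \text{Hom}_{\text{Psh}(C)}(y(c_j), R(d)) \cong \text{Hom}_{\text{Psh}(C)}(P, R(d)),$$

where the middle step is the Yoneda lemma. In the sheaf case, with $F = a \circ y$ for $a$ the sheafification and $i$ the inclusion, and assuming that maps from a presheaf into a sheaf factor uniquely through its sheafification,

$$R(S)(c) = \text{Hom}_{\text{Sh}(C)}(a(y(c)), S) \cong \text{Hom}_{\text{Psh}(C)}(y(c), i(S)) \cong S(c),$$

so $R$ agrees with $i$ on objects, naturally in $c$ and $S$.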


One small additional remark on Qiaochu Yuan's response and David Roberts's comment, to show that it is really the existence of an adjoint that is the important point here (this was really too long for a comment).

We can look at a more general situation, where the existence of an adjoint might fail. If $C$ is a non-small category, the "free co-completion of $C$" still exists: it is the category of presheaves over $C$ that are small colimits of representables (the presheaves of small co-finality). I will call them the "small presheaves"; they always form a locally small category.
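
In symbols, as a sketch (the indexing category $I$ and the notation $y$ for the Yoneda embedding are introduced here for illustration): a presheaf $P$ on $C$ is small when there is a small category $I$ and a diagram $i \mapsto c_i$ in $C$ with

$$P \cong \text{colim}_{i \in I}\, y(c_i), \qquad y(c) = \text{Hom}_C(-, c).$$

Local smallness then follows from the Yoneda lemma: for any presheaf $Q$,

$$\text{Hom}(P, Q) \cong \lim_{i \in I} \text{Hom}(y(c_i), Q) \cong \lim_{i \in I} Q(c_i),$$

which is a small limit of sets and hence a set.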

If I have a topology on $C$ then I can also try to construct a category of "small sheaves" that has the universal property you mentioned: I take the category of sheaves over $C$ which are the sheafification of small presheaves.

Depending on the topology, several things might go wrong: the sheafification can be undefined, or one can get a category that is not locally small.

But for a well-chosen topology it can happen that the sheafification of small presheaves exists and that you get a locally small category this way. In this case this would be the solution to your universal problem.

In this situation the category of "small sheaves" that you constructed is still included in the category of all presheaves, but is not necessarily included in the free co-completion. So you do have a sheafification functor from the "free co-completion" to the "free co-completion with relations", as expected, but you no longer get this surprising functor in the other direction.

Let me give an explicit example where all of this happens:

Take $C$ to be the category of all ordinals (with morphisms given by the order relation) with the atomic topology (every non-empty sieve is a covering; in this case this is the same as saying that every map is a covering).

Then one can check the following:

1) A presheaf is an ordinal-indexed collection of sets with transition maps $F(x) \leftarrow F(y)$ for $x<y$.

2) A presheaf on this category is small (a small colimit of representables) if and only if it is empty after a certain rank.

3) A sheaf for this topology is a constant presheaf. So no sheaf, apart from the empty one, is a small presheaf.

4) The sheafification functor takes a presheaf $F$ to the constant presheaf whose value is the disjoint union of all the $F(a)$ quotiented by the action of the transition maps. It is well defined at least on the category of small presheaves, and every sheaf is the sheafification of a small presheaf.

5) Moreover, as expected, the category of small presheaves is universal among co-complete categories for functors $C \rightarrow E$, while the category of "small sheaves" is universal for functors $C \rightarrow E$ which send all maps to isos.
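
The computations behind points 2) to 4) can be sketched as follows. The notation $y$ for the Yoneda embedding and $a$ for sheafification is introduced here and is not from the question. The representable presheaf at an ordinal $\alpha$ is

$$y(\alpha)(x) = \text{Hom}(x, \alpha) = \begin{cases} \{*\} & \text{if } x \le \alpha, \\ \emptyset & \text{if } x > \alpha, \end{cases}$$

so a small colimit of representables is empty above the supremum of the ordinals involved. For a small presheaf $F$, the description in 4) reads

$$a(F)(x) = \Big( \coprod_{\alpha} F(\alpha) \Big) \Big/ \sim, \qquad s \sim s|_{\beta} \ \text{ for } s \in F(\alpha),\ \beta \le \alpha,$$

which is a set because only a set of the $F(\alpha)$ are non-empty. In particular, every element of $y(\alpha)$ is identified with its restriction to the ordinal $0$, so $a(y(\alpha))$ is the constant presheaf with value $\{*\}$: all representables have the same sheafification, consistent with the requirement in 5) that all maps be sent to isos.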