On the smooth structure of $\mathbb{R}P^n$ in Milnor's book on characteristic classes.

You already got 1) in the comments, so we know that $i$ is injective. Now for the idea: if we already knew that $\mathbb{R}P^n$ is a smooth manifold, then, as Milnor and Stasheff remark, $i$ is a diffeomorphism onto its image. Therefore there must be a way to show that for an atlas of charts $U_j\subset\mathbb{R}P^n$ (in the abstract definition) of projective space, the images $i(U_j)\subset i(\mathbb{R}P^n)\subset\mathbb{R}^F$ provide an atlas of charts (in the embedded definition) too. Let's take the standard charts $$U_j = \{(x_0:x_1:\dots:x_{j-1}:1:x_{j+1}:\dots:x_n)\in\mathbb{R}P^n\},\;j=0,\dots,n,$$ with the obvious homeomorphisms $\varphi_j\colon\mathbb{R}^n\to U_j$. We should show, step by step, that

i) for all $j$, the map $i\circ\varphi_j\colon\mathbb{R}^n\to i(U_j)$ is a homeomorphism and, considered as a map $\mathbb{R}^n\to\mathbb{R}^F$, it is smooth, and that

ii) the derivatives of these maps have maximal rank $n$ everywhere.

In general, continuous injective closed maps are homeomorphisms onto their image. As $\mathbb{R}P^n$ is compact and $\mathbb{R}^F$ is Hausdorff, $i$ is indeed closed (a closed subset of $\mathbb{R}P^n$ is compact, so its image is compact and hence closed in $\mathbb{R}^F$), thus $i\colon\mathbb{R}P^n\to i(\mathbb{R}P^n)$ is a homeomorphism. Consequently, for all $j$, the composition $i\circ\varphi_j\colon\mathbb{R}^n\to i(U_j)$ is a homeomorphism, being a composition of homeomorphisms. To complete i), just have a look at the definition of smoothness of maps to $\mathbb{R}^F$.

Ad ii). For the sake of simplicity, we shall only consider the case $j=0$ and denote $\psi:=i\circ\varphi_0$. (The other cases are entirely analogous.) Here the map is given by $\psi\colon (x_1,\dots,x_n)\mapsto (f(1,x_1,\dots,x_n))_{f\in F}$.

[Unfortunately I didn't find a more illustrative way to prove ii) in this example, so let me imitate how I would prove the equivalence of the definitions we come across here in general, though this is rather technical. I'll skip some details; let me know if you want more.]

The claim is that for each $x\in\mathbb{R}^n$ the vectors $\partial\psi/\partial x_k$, $k=1,\dots,n$, are linearly independent at $x$. For this it suffices to find $n$ functions $f_1,\dots,f_n\in F$ such that the derivative of the map $\mathbb{R}^n\to \mathbb{R}^{\{f_1,\dots,f_n\}}$, $(x_1,\dots,x_n)\mapsto (f_k(1,x_1,\dots,x_n))_k$ is regular (i.e. invertible) at $x$. Observe that this derivative is the same as taking $D\psi$ at $x$ and then projecting to $\mathbb{R}^{\{f_1,\dots,f_n\}}\subset\mathbb{R}^F$; therefore, if it is surjective, $D\psi$ has to be of full rank $n$ at $x$.
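
To spell out this last step, here is a short worked version. The name $\pi$ for the coordinate projection is introduced only for this remark. Since $\pi$ is linear, the chain rule gives $$\pi\colon\mathbb{R}^F\to\mathbb{R}^{\{f_1,\dots,f_n\}},\qquad D(\pi\circ\psi)(x)=\pi\circ D\psi(x),$$ and hence $$n=\operatorname{rank}D(\pi\circ\psi)(x)\le\operatorname{rank}D\psi(x)\le n,$$ where the last inequality holds simply because the domain of $D\psi(x)$ is $\mathbb{R}^n$.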

Taking a smooth bump function $\rho\colon\mathbb{R}^n\to\mathbb{R}$ at $x\in\mathbb{R}^n$ (i.e. such that for some compact neighborhoods $V\subset U\subset\mathbb{R}^n$ of $x$ we have $\rho|_V=1$ and $\rho|_{\mathbb{R}^n-U}=0$), we find smooth functions $f_k\colon\mathbb{R}^n\to\mathbb{R}$ ($k=1,2,\dots,n$), via $f_k(y_1,\dots,y_n) = y_k\,\rho(y)$, that behave like the coordinate projections near $x$ and vanish outside $U$. Hence the induced maps $f_k\circ\varphi_0^{-1}\colon U_0\to\mathbb{R}$ extend by zero to smooth functions on the whole of $\mathbb{R}P^n$ (i.e. to elements of $F$); by abuse of notation let's denote the extensions by $f_k$ too.
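
For concreteness, one standard way to write down such a bump function is sketched here. The helper $h$ and the radii $1$ and $2$ are choices made for this remark only; any other bump function works equally well. Put $h(t)=e^{-1/t}$ for $t>0$ and $h(t)=0$ for $t\le 0$, which is smooth on $\mathbb{R}$, and set $$\rho(y)=\frac{h\bigl(4-|y-x|^2\bigr)}{h\bigl(4-|y-x|^2\bigr)+h\bigl(|y-x|^2-1\bigr)}.$$ The denominator never vanishes, because at least one of $4-|y-x|^2$ and $|y-x|^2-1$ is positive. Moreover $\rho(y)=1$ for $|y-x|\le 1$ and $\rho(y)=0$ for $|y-x|\ge 2$, so one may take $V$ and $U$ to be the closed balls of radius $1$ and $2$ around $x$.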

For $y$ sufficiently close to $x$ we have $f_k(1,y_1,\dots,y_n) = y_k$ by construction, thus $$\left(\frac{\partial f_k(1,y_1,\dots,y_n)}{\partial y_j}(x)\right)_{j} = (0,\dots,0,1,0,\dots,0)$$ with the $1$ in the $k$-th component. In short, if you prefer, $\left(\frac{\partial f_k(1,y_1,\dots,y_n)}{\partial y_j}(x)\right)_{j,k} = (\delta_{j,k})_{j,k}$, the identity matrix. This gives the desired linear independence and shows that $\psi$ is an immersion.
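
If you like to see this numerically, here is a minimal sanity check in Python. It is only a sketch under assumptions of my own: the bump function is built from $h(t)=e^{-1/t}$ with radii $1$ and $2$, the sample point is arbitrary, and the names `h`, `rho`, `f` and `jacobian` are made up for this illustration. It approximates the matrix of partial derivatives of the chart expressions of the $f_k$ at $x$ by central differences.

```python
import math


def h(t):
    """Smooth cutoff helper: exp(-1/t) for t > 0, and 0 otherwise."""
    return math.exp(-1.0 / t) if t > 0 else 0.0


def rho(y, x):
    """Bump function at x: equals 1 where |y - x| <= 1 and 0 where |y - x| >= 2."""
    s = sum((yi - xi) ** 2 for yi, xi in zip(y, x))
    return h(4.0 - s) / (h(4.0 - s) + h(s - 1.0))


def f(k, y, x):
    """Chart expression of f_k: k-th coordinate times the bump at x (k is 0-based)."""
    return y[k] * rho(y, x)


def jacobian(x, eps=1e-6):
    """Central-difference approximation of (d f_k / d y_j) at y = x; rows are indexed by k."""
    n = len(x)
    rows = []
    for k in range(n):
        row = []
        for j in range(n):
            up = list(x)
            up[j] += eps
            down = list(x)
            down[j] -= eps
            row.append((f(k, up, x) - f(k, down, x)) / (2 * eps))
        rows.append(row)
    return rows


# Arbitrary sample point in R^3 (hypothetical choice for the check).
x = [0.5, -1.25, 3.0]
J = jacobian(x)
for row in J:
    print([round(entry, 6) for entry in row])
```

Up to rounding, the printed matrix should be the identity, which is the numerical counterpart of $\delta_{j,k}$ above. Of course this checks a single point and proves nothing; it just illustrates what the construction does.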