How is the CNOT operation realized physically?

The key to making a CNOT gate is being able to turn on and off the interactions between two qubits.

Following the physical system you suggest, let's say our qubits are the spin up and spin down degrees of freedom of two electrons in some atoms. These spins can interact with an external potential, which is how we do the one-qubit rotations, but they will also interact with each other significantly if they are close enough together. The total Hamiltonian will typically look like

$H=H_1+H_2-g\,\vec{S}_1\cdot\vec{S}_2$

where $H_1$ and $H_2$ describe the interaction of each atom with the external potential, each independent of the other, and $g$ is a coupling strength that depends on the distance between the atoms and possibly other things. The physics of the last term is that, for $g>0$, two spins can lower their energy by being aligned with each other.

Let's say you can somehow turn $g$ on and off; the easiest way to imagine this is to suddenly bring the atoms very close together for a time $T$, then quickly move them apart. Up to numerical factors absorbed into $g$, and keeping only the $S^z_1S^z_2$ part of the coupling, this has the following effect:

$\begin{aligned} |\uparrow\uparrow\rangle &\rightarrow e^{igT}\,|\uparrow\uparrow\rangle \\ |\uparrow\downarrow\rangle &\rightarrow e^{-igT}\,|\uparrow\downarrow\rangle \\ |\downarrow\uparrow\rangle &\rightarrow e^{-igT}\,|\downarrow\uparrow\rangle \\ |\downarrow\downarrow\rangle &\rightarrow e^{igT}\,|\downarrow\downarrow\rangle \end{aligned}$
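In the basis $(\uparrow\uparrow,\ \uparrow\downarrow,\ \downarrow\uparrow,\ \downarrow\downarrow)$ the same map can be written as a diagonal matrix (called $G$ here), and pulling out the overall phase shows that only the relative phase between aligned and anti-aligned states matters:

$G=\mathrm{diag}\left(e^{igT},\,e^{-igT},\,e^{-igT},\,e^{igT}\right)=e^{igT}\,\mathrm{diag}\left(1,\,e^{-2igT},\,e^{-2igT},\,1\right)$

For example, $gT=\pi/4$ gives a relative phase of $e^{-i\pi/2}=-i$.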

That is, it only applies a phase shift that depends on whether the spins are aligned. Although it is not obvious, this, together with single-qubit operations, is all it takes to build a CNOT gate. Specifically, one way to do this is to set $T=\frac{\pi}{4g}$ and perform the following sequence (adapted from http://www.ohio.edu/people/diao/papers/nmr.pdf):

$\mathrm{CNOT}=Z^1_{-\pi/2}\,Y^2_{-\pi/2}\,Z^2_{-\pi/2}\,G\,Y^2_{\pi/2}$ (up to an overall phase), where $G$ is the interaction gate defined above. Each single-qubit gate is labelled by its rotation axis (the letter), its rotation angle (the subscript), and the qubit it acts on (the superscript). The product is written as it would act on a state placed to its right, so the $Y^2_{\pi/2}$ gate is applied first and the remaining gates follow from right to left.

Roughly speaking, the first gate puts the second qubit into a superposition of spin up and spin down if it started in a definite spin state, so that the interaction can give different phases to the different parts of the superposition (aligned vs. anti-aligned spins). The three rotations afterwards then turn these conditional phase shifts into a conditional bit flip.
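If you want to check the sequence numerically, here is a small sketch in plain Python. The conventions are assumptions made for this sketch, not taken from the linked paper: a rotation by angle $\theta$ about axis $n$ is $\exp(-i\theta\sigma_n/2)$, spin up is the first basis state, qubit 1 is the left factor of the tensor product, and $G$ is the diagonal phase gate above with $gT=\pi/4$. The helper names (`rot`, `on_qubit`, `interaction`, `sequence`, `equal_up_to_phase`) are made up for the example.

```python
import cmath
import math

I2 = [[1, 0], [0, 1]]
Y = [[0, -1j], [1j, 0]]
Z = [[1, 0], [0, -1]]

def matmul(a, b):
    """Plain matrix product of two nested-list matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def kron(a, b):
    """Tensor (Kronecker) product; the left factor is qubit 1."""
    return [[x * y for x in row_a for y in row_b]
            for row_a in a for row_b in b]

def rot(axis, theta):
    """Assumed convention: exp(-i * theta * axis / 2) for a Pauli matrix."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return [[c * I2[i][j] - 1j * s * axis[i][j] for j in range(2)]
            for i in range(2)]

def on_qubit(gate, qubit):
    """Embed a single-qubit gate on qubit 1 or qubit 2."""
    return kron(gate, I2) if qubit == 1 else kron(I2, gate)

def interaction(gT):
    """Diagonal gate: +gT phase for aligned spins, -gT for anti-aligned."""
    phases = [gT, -gT, -gT, gT]  # basis order: uu, ud, du, dd
    return [[cmath.exp(1j * phases[i]) if i == j else 0 for j in range(4)]
            for i in range(4)]

def sequence():
    """Z1(-pi/2) Y2(-pi/2) Z2(-pi/2) G Y2(pi/2), rightmost gate acts first."""
    gates = [
        on_qubit(rot(Z, -math.pi / 2), 1),
        on_qubit(rot(Y, -math.pi / 2), 2),
        on_qubit(rot(Z, -math.pi / 2), 2),
        interaction(math.pi / 4),
        on_qubit(rot(Y, math.pi / 2), 2),
    ]
    u = gates[0]
    for g in gates[1:]:
        u = matmul(u, g)
    return u

def equal_up_to_phase(u, target, tol=1e-9):
    """True if u = phase * target for some complex number of modulus 1."""
    i, j = next((r, c) for r in range(4) for c in range(4)
                if target[r][c] != 0)
    phase = u[i][j] / target[i][j]
    if abs(abs(phase) - 1) > tol:
        return False
    return all(abs(u[r][c] - phase * target[r][c]) < tol
               for r in range(4) for c in range(4))

# Flip qubit 2 when qubit 1 is spin up (first basis state) ...
FLIP_IF_UP = [[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
# ... versus flip qubit 2 when qubit 1 is spin down.
FLIP_IF_DOWN = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]

U = sequence()
print(equal_up_to_phase(U, FLIP_IF_UP))
print(equal_up_to_phase(U, FLIP_IF_DOWN))
```

With these particular conventions the product comes out as a flip of qubit 2 conditioned on qubit 1 being spin up, times an overall phase. If you label spin up as logical 1 that is the usual CNOT; with a different sign or labelling convention the control condition can land on the other spin value.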