Reciprocal lattice in 2D

You don't need to. In fact, you can work directly in 2D and solve things explicitly, since the condition for the reciprocal basis, $b_i\cdot a_j = 2\pi\delta_{ij}$, reads in matrix notation $$ \begin{pmatrix} b_{1x} & b_{1y} \\ b_{2x} & b_{2y} \end{pmatrix} \begin{pmatrix} a_{1x} & a_{2x} \\ a_{1y} & a_{2y} \end{pmatrix} = 2\pi \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} , $$ so all you need to do is multiply on the right by the explicit matrix inverse to get $$ \begin{pmatrix} b_{1x} & b_{2x} \\ b_{1y} & b_{2y} \end{pmatrix} = \frac{2\pi}{a_{1x}a_{2y}-a_{1y}a_{2x}} \begin{pmatrix} a_{2y} & -a_{1y} \\ -a_{2x} & a_{1x} \end{pmatrix} . $$ (Note that this equation has been transposed relative to the previous one, so the $b_i$ now appear as columns.) You can then check by hand that this matches the projection onto the $x,y$ plane of the results you get from the 3D formulas by taking the third vector along $z$.
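If it helps, here is a minimal numpy sketch of exactly this computation; the example lattice vectors are made up purely for illustration.

```python
import numpy as np

# Direct-lattice basis as the *columns* of A, matching the matrix equation above.
# Example: an oblique 2D lattice (arbitrary numbers, for illustration only).
a1 = np.array([1.0, 0.0])
a2 = np.array([0.5, 0.8])
A = np.column_stack([a1, a2])

# b_i . a_j = 2*pi*delta_ij  means  B @ A = 2*pi*I,  so  B = 2*pi * inv(A),
# with the b_i as the *rows* of B.
B = 2 * np.pi * np.linalg.inv(A)
b1, b2 = B[0], B[1]

print(b1 @ a1 / (2 * np.pi), b1 @ a2)   # 1.0, ~0.0
print(b2 @ a1, b2 @ a2 / (2 * np.pi))   # ~0.0, 1.0
```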


If, on the other hand, you do want to stick to the three-dimensional formalism (which then lets you use the same formulas for both cases) then you need to supplement your basis of the plane with a third vector to make the span three-dimensional. This third vector needs to have a nonzero $z$ component (or it would be linearly dependent on $a_1$ and $a_2$), but you need additional restrictions to specify it uniquely.

These restrictions come from the fact that

  1. you want $b_3\cdot a_1=b_3\cdot a_2=0$, so you want $b_3$ along the $z$ axis, and more importantly,
  2. you want $a_3\cdot b_1=a_3\cdot b_2=0$, where you want $b_1$ and $b_2$ to lie in the $xy$ plane because all your physics is two-dimensional, and this requires $a_3$ to lie along the $z$ axis.

As to the precise magnitude of $a_3$, it is irrelevant: it is easy to see that rescaling $a_3\mapsto \lambda a_3$ by any $\lambda \neq 0$ does not affect in any way the 3D expressions for $b_1$ and $b_2$ that you quote.
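A quick numerical check of that $\lambda$-independence, using the standard 3D formulas $\mathbf b_1 = 2\pi\,\mathbf a_2\times\mathbf a_3/(\mathbf a_1\cdot(\mathbf a_2\times\mathbf a_3))$ and cyclic, with made-up in-plane vectors:

```python
import numpy as np

def reciprocal_3d(a1, a2, a3):
    """Standard 3D reciprocal-lattice vectors b_1, b_2, b_3."""
    vol = a1 @ np.cross(a2, a3)
    return (2 * np.pi * np.cross(a2, a3) / vol,
            2 * np.pi * np.cross(a3, a1) / vol,
            2 * np.pi * np.cross(a1, a2) / vol)

a1 = np.array([1.0, 0.0, 0.0])
a2 = np.array([0.5, 0.8, 0.0])

for lam in (1.0, -3.7, 42.0):
    b1, b2, _ = reciprocal_3d(a1, a2, lam * np.array([0.0, 0.0, 1.0]))
    print(lam, b1, b2)    # b1 and b2 come out the same for every lambda
```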


So, to understand why you get what you get, it's important to understand what problem the reciprocal lattice is solving.

What is the problem the reciprocal lattice solves?

The problem is that we have a very nice formula for the dot product, $$\mathbf u \cdot \mathbf v = \sum_i u_i v_i.$$ This is derived directly from the fact that there are basis vectors $\hat e_i$ for the space such that $\mathbf u = \sum_i u_i~\hat e_i$ and $\mathbf v = \sum_i v_i~\hat e_i,$ combined with the fact that those basis vectors are orthonormal, $\hat e_i \cdot \hat e_j = \delta_{ij}$, where $\delta$ is the Kronecker delta symbol.

From those facts alone, plus the linearity of the dot product, one gets $$\mathbf u \cdot \mathbf v = \sum_i \sum_j u_i~v_j~(\hat e_i\cdot\hat e_j) = \sum_{ij} u_i ~v_j~ \delta_{ij} = \sum_i u_i v_i.$$

Unfortunately, we're talking about a crystal, which has a crystal lattice. (One runs into the same issue with, say, the moment-of-inertia tensor whenever the body axes one wants to work in are not orthonormal.) All of the properties of such a crystal are only “nice” when viewed in the basis adapted to it, the basis of lattice vectors $\mathbf a_i.$ And those lattice vectors are seldom orthonormal, so instead we get some expression $$\mathbf u\cdot \mathbf v = \sum_{ij} u_i~v_j(\mathbf a_i \cdot \mathbf a_j) = \sum_{ij} g_{ij}~u_i~v_j,$$ which is more complicated. In fact, about the only nice thing one can say about it is that, from the symmetry of the dot product, thankfully $g_{ij} = g_{ji}.$ How do we make this simpler?
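(As a concrete check of that formula, here is a short numpy sketch with made-up lattice vectors: the naive componentwise sum fails in the skewed basis, while the metric-weighted sum recovers the true dot product.)

```python
import numpy as np

# Made-up, non-orthogonal 2D lattice vectors (Cartesian components as rows).
a = np.array([[1.0, 0.0],     # a_1
              [0.5, 0.8]])    # a_2

g = a @ a.T                   # g_ij = a_i . a_j  (symmetric, as noted)

# Two vectors given by their components u_i, v_j in the lattice basis.
u = np.array([2.0, -1.0])
v = np.array([0.5,  3.0])

u_cart = u @ a                # back to Cartesian: sum_i u_i a_i
v_cart = v @ a

print(u_cart @ v_cart)        # the actual dot product
print(u @ g @ v)              # sum_ij g_ij u_i v_j -- agrees
print(u @ v)                  # naive sum_i u_i v_i -- does NOT agree
```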

Upper and lower indices

Well, there is an analogue of the above formula, but it requires being more precise with the mathematics. I will give here the version that we call “abstract index notation.” Explaining this notation is the subject of the next section; apologies if that demands more of your attention than you wanted to give.

Definition of abstract index notation

So we have vectors in some vector space over some scalars; for example, your vector space might be real 3D vectors $\mathbb R^3$ and your scalars are then generally just the real numbers $\mathbb R$. A covector (sometimes also called a one-form) is a linear mapping from vectors to scalars. Now that you have these dot products (you have, we would say, a metric on the space) you know a bunch of covectors: $(\mathbf v\cdot)$, the function which takes some other vector and dots it with a fixed vector $\mathbf v$, is a covector for any $\mathbf v$. We assume that these exist in a one-to-one correspondence, so that for every covector there is a vector, too. So we have a set of scalars $\mathcal S$, a set of vectors $\mathcal V$, a set of covectors $\overline{\mathcal V}$, and our dot product $g : \mathcal V \to \mathcal V \to \mathcal S$ can be understood as a canonical linear map $g: \mathcal V \to \overline{\mathcal V}$ which is invertible with inverse $g^{-1}: \overline{\mathcal V} \to \mathcal V.$

We define that an $[m, n]$-tensor is a multi-linear map from $m$ covectors and $n$ vectors to a scalar. Since vectors map covectors to scalars (by applying the covector to the vector) they are $[1, 0]$-tensors. Since covectors map vectors to scalars, they are $[0, 1]$-tensors. Two tensors can always be composed by the outer product, where I take an $[a,b]$-tensor and a $[c,d]$-tensor and form an $[a+c,b+d]$-tensor, as follows: I take the first $a$ covectors and $b$ vectors and feed them to the first tensor to get a scalar; I take the remaining $c$ covectors and $d$ vectors and feed them to the second tensor to get a scalar; then I multiply the two scalars together. The dot product function $g$ and its inverse $g^{-1}$ allow us to convert canonically between all $[m,n]$-tensors for constant $m+n$, so you may just want to think of $2$-tensors which can either be observed as $[2,0]$- or $[1,1]$- or $[0, 2]$-tensors depending on how you adapt their inputs.

Mathematicians who study these things make one further axiomatic assumption: that any $[m, n]$-tensor can be decomposed in terms of these outer products. Any $[m, n]$ tensor is a sum of a bunch of outer products of $m$ vectors and $n$ covectors. This means that there are ways to contract any $[m, n]$-tensor to an $[m-1, n-1]$-tensor: decompose it into outer products and then apply the covector which handles this one input to the vector which handles that other input. So we need a notation that makes all of this easy to see and understand.
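As a tiny worked instance of that decompose-then-contract recipe (the names $\mathbf u$, $w$, $T$ here are ad hoc, just for this example): take the $[1,1]$-tensor that is the outer product of one vector $\mathbf u$ and one covector $w$, so that $T(f, \mathbf x) = f(\mathbf u)\,w(\mathbf x)$. Its contraction is obtained by applying the covector factor to the vector factor, $$T \;\longmapsto\; w(\mathbf u),$$ and a general $[1,1]$-tensor, being a sum of such outer products, contracts to the corresponding sum of scalars (for matrices this is just the trace).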

That notation works this way: create copies of the space of $[m, n]$-tensors with $m$ distinct Greek letters for upper indexes, $n$ distinct Greek letters for lower indexes. So $\mathcal T^{\alpha\beta}_{\gamma}$ is a copy of the space of $[2, 1]$-tensors. The letters are not variables; they do not stand in for numbers to be substituted in later; they are just symbols to help us tell different things apart. And then we write a tensor with the symbols which identify which space it belongs to. And when we want to contract a tensor, we repeat the index top and bottom. We can identify some other things too like the relabeling isomorphism $\delta^{\alpha}_\beta,$ and our $g$ that takes two vectors and produces a scalar is now $g_{\alpha\beta}$ in $\mathcal T_{\alpha\beta},$ and its inverse can be phrased as the existence of a $g^{\alpha\beta}$ such that $$g^{\alpha\beta}g_{\beta\gamma} = \delta^\alpha_\gamma.$$

So the actual vector is now written $v^\alpha$ and its covector is $v_\alpha = g_{\alpha\beta} v^\beta.$ And now the inner product is represented very nicely as $$ \mathbf u \cdot \mathbf v = u_\alpha ~v^\alpha. $$

Is this cheating?

In some ways this is cheating: we basically just said “here is what looks nice; let’s write $(\mathbf u \cdot)$ as $u_\alpha$ and $\mathbf v$ as $v^\alpha$, and now we can write something that looks like $\sum_i u_i~v_i$. Let’s just invent notation that makes things look nice.” But actually there is something a little deeper going on here, and that deeper thing is precisely this reciprocal lattice.

So we want to single out certain vectors $e^\mu_k$ for $k=1,2,\dots, D$ as our “basis vectors” now. To do this we need some dual covectors $e^\ell_\mu$ satisfying $$e_{\mu}^\ell~e^\mu_k = \delta^\ell_k=\{1\text{ if } k=\ell \text{ else } 0\}.$$ Then any vector $v^\mu$ can be reconstructed from its upper-index components, $$\text{define } v^k = e^k_\mu ~v^\mu, ~~\text{ then }~v^\mu = \sum_{k=1}^D v^k ~e_k^\mu.$$

But there is a symmetry here between vectors and covectors, and it is clear that there must also be some lower-index components, $$\text{define } v_k = e^\mu_k~ v_\mu, ~~\text { then }~ v_\mu = \sum_{k=1}^D v_k~e^k_\mu,$$

and in terms of these components the dot product again takes the clean form $$\mathbf u \cdot \mathbf v = \sum_{k=1}^D u_k~v^k.$$

Now these $u_k$ and $v^k$ terms are not cheating by inventing notation; they are real, actual lists of numbers. The $v^k$ are the components of the vector in the $\mathbf e_k$ basis. But what are the $u_k$? In other words, what do these "dual components" of the vector really mean? The basis they are components in cannot be the $\mathbf e_k$ basis we had earlier, or we'd be able to prove the original form of the dot product law from $v^k = v_k$; so it's got to be some other vector basis for the space. In fact, it's this reciprocal lattice basis. So we take our covectors and we cast them back into vectors,

$$\text{define } \bar e^{k\alpha} = g^{\alpha\beta}~e^k_\beta, \text{ so that } v^\alpha = \sum_{k=1}^D v_k~\bar e^{k\alpha},$$

and these $\bar e^{k\alpha}$ vectors are our reciprocal lattice basis.
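Concretely, in ordinary Cartesian components (where $g$ is just the identity) these dual components and reciprocal vectors can be computed directly; the basis below is made up for illustration, and the normalization is the pure $\delta^\ell_k$ one used in this section (no $2\pi$).

```python
import numpy as np

# Made-up, non-orthogonal basis vectors e_k, as the rows of E (Cartesian components).
E = np.array([[1.0, 0.0, 0.0],
              [0.5, 0.8, 0.0],
              [0.2, 0.3, 1.1]])

# Reciprocal basis: the rows of Ebar obey  ebar^k . e_l = delta_kl.
# (In Cartesian components g is the identity, so the covectors e^k and the
#  vectors ebar^k have the same components.)
Ebar = np.linalg.inv(E).T

u = np.array([1.0, 2.0, 3.0])
v = np.array([-0.5, 0.4, 1.2])

u_lower = E @ u        # u_k = e_k . u
v_upper = Ebar @ v     # v^k = ebar^k . v

print(u @ v, u_lower @ v_upper)                 # the same number twice
print(np.allclose(v, v_upper @ E),              # v = sum_k v^k e_k
      np.allclose(v, (E @ v) @ Ebar))           # v = sum_k v_k ebar^k
```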

Geometrically this means the reciprocal lattice vector dual to $\mathbf e_1$ is constructed by the following procedure:

  1. Find the (hyper-)plane spanned by $\mathbf e_{2,3,\dots, D}$. Identify a vector $\mathbf q$ perpendicular to that plane and therefore perpendicular to all of the other basis vectors.
  2. Figure out what $\mathbf q \cdot \mathbf e_1$ is and then scale $\mathbf q$ by the reciprocal of this number to get a new vector $\bar{\mathbf e}^1$, so that $\bar{\mathbf e}^1 \cdot \mathbf e_1 = 1.$
  3. (Optional) for consistency with several solid-state textbooks, multiply by $2\pi.$

The reciprocal lattice therefore describes normal vectors $\mathbf b_i$ to planes that contain all of the vectors except the $\mathbf a_i$ that they correspond to.
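That geometric recipe can be followed literally in code, for any dimension $D$; this is only a sketch (lattice numbers made up, and the perpendicular direction of step 1 found via an SVD null space), with the $2\pi$ of step 3 included.

```python
import numpy as np

def reciprocal_basis(A, scale=2 * np.pi):
    """A: (D, D) array whose rows are the lattice vectors a_i.
    Returns B whose rows b_i obey b_i . a_j = scale * delta_ij,
    built by the perpendicular-and-rescale procedure described above."""
    D = A.shape[0]
    B = np.empty_like(A, dtype=float)
    for i in range(D):
        others = np.delete(A, i, axis=0)          # the a_j with j != i
        # Step 1: q spans the null space of `others`, i.e. q . a_j = 0 for j != i.
        _, _, vt = np.linalg.svd(others)
        q = vt[-1]
        # Steps 2 and 3: rescale so that b_i . a_i = scale.
        B[i] = scale * q / (q @ A[i])
    return B

A = np.array([[1.0, 0.0, 0.0],
              [0.5, 0.8, 0.0],
              [0.2, 0.3, 1.1]])
B = reciprocal_basis(A)
print(np.round(B @ A.T / (2 * np.pi), 10))        # identity matrix
```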

Orientations and going to a lower-dimensional space

Now one great way to find this is to look at an orientation tensor; in $n$ dimensions these have $n$ indices, so the 3D orientation tensor looks like $\epsilon_{\alpha\beta\gamma}$. The idea is that this is a totally antisymmetric tensor whose components in a right-handed orthonormal basis work out so that $\epsilon_{123} = 1$; any other component is then $+1$ if its indices are an even permutation of $123$, $-1$ if they are an odd permutation, and $0$ if they are not a permutation of $123$ at all (i.e. an index repeats).

As you may have guessed, in 3D $\epsilon_{\alpha\beta\gamma} u^\beta v^\gamma$ is in some sense the cross product between two vectors, except that it is a covector living in $\mathcal T_\alpha$ not a vector living in $\mathcal T^\alpha.$ But we have a bijection $g$ between those two spaces, so we can say $$[~\mathbf u \times \mathbf v~]^\mu = g^{\mu\alpha}~\epsilon_{\alpha\beta\gamma}~u^\beta~v^\gamma.$$

The orientation tensor, by its antisymmetry, gives us a vector orthogonal to the other lattice vectors, fulfilling step 1 above. This now just needs to be rescaled by its dot product with the remaining lattice vector, per step 2. So now you can see that the general formula must be (possibly times $2\pi$) $$ \bar e^{1\mu} = \frac{g^{\mu\alpha}~\epsilon_{\alpha\beta\gamma}~e^\beta_2 ~e^\gamma_3}{\epsilon_{\rho\sigma\tau}~e^\rho_1~e^\sigma_2~e^\tau_3}. $$ And that's where you get your definition, $$\mathbf b_1 = \frac{2\pi~\mathbf a_2 \times \mathbf a_3}{\mathbf a_1 \cdot (\mathbf a_2 \times \mathbf a_3)}.$$ The "cross product" is a shorthand for this orientation tensor, and the rest of it is just rescaling to make the dot product $\mathbf b_1 \cdot \mathbf a_1 = 2\pi.$
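In Cartesian components, where $g^{\mu\alpha}$ is just the identity, you can type that index expression almost verbatim with `np.einsum`; the little check below (lattice vectors made up) confirms it reproduces the cross-product formula for $\mathbf b_1$.

```python
import numpy as np

# Levi-Civita symbol epsilon_{abc} in 3D: +1 on even permutations of (0,1,2), -1 on odd ones.
eps = np.zeros((3, 3, 3))
for i, j, k in [(0, 1, 2), (1, 2, 0), (2, 0, 1)]:
    eps[i, j, k], eps[i, k, j] = 1.0, -1.0

a1 = np.array([1.0, 0.0, 0.0])
a2 = np.array([0.5, 0.8, 0.0])
a3 = np.array([0.2, 0.3, 1.1])

# Numerator: epsilon_{abc} a2^b a3^c ; denominator: epsilon_{rst} a1^r a2^s a3^t.
num = np.einsum('abc,b,c->a', eps, a2, a3)
den = np.einsum('rst,r,s,t->', eps, a1, a2, a3)
b1_eps = 2 * np.pi * num / den

b1_cross = 2 * np.pi * np.cross(a2, a3) / (a1 @ np.cross(a2, a3))
print(np.allclose(b1_eps, b1_cross))   # True
```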

But if we were doing 4 dimensions our orientation tensor would be $\epsilon_{\alpha\beta\gamma\delta}$, with $\epsilon_{1234} = +1$ and so forth, and we'd contract it with $a_2^\beta, a_3^\gamma, a_4^\delta$ to form the corresponding dual vector.

What happens when we go from 3D to 2D, or 4D to 3D? Well, we need to remove one of those slots of the orientation tensor. But there's a really easy way to do that: feed the unit vector along the dimension we're removing into that slot.

So $\epsilon^{xy}$ in 2D is an orientation tensor which takes two vectors and returns a scalar; but its components can be viewed as the components $\epsilon^{xy3}$ of the 3D orientation tensor (with the removed $z$ direction fed into the last slot) or $\epsilon^{xy34}$ of the 4D one (with both removed directions fed in).
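(A quick numerical version of that statement: build the 3D orientation tensor, feed the $z$ direction into its last slot, and you are left with exactly the 2D orientation tensor.)

```python
import numpy as np

# 3D Levi-Civita symbol: +1 on even permutations of (0,1,2), -1 on odd ones.
eps = np.zeros((3, 3, 3))
for i, j, k in [(0, 1, 2), (1, 2, 0), (2, 0, 1)]:
    eps[i, j, k], eps[i, k, j] = 1.0, -1.0

# Feed the removed z direction into the last slot and keep the x,y components:
print(eps[:2, :2, 2])   # [[ 0.  1.] [-1.  0.]] -- exactly the 2D orientation tensor
```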

So that's why you can view this all as $\mathbf b_1 = \operatorname{normalize}_{2\pi}\big(\mathbf a_2 \times \hat z\big)$ and so on: it's just because there's an easy relationship between the orientations of these subspaces.
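As a last sanity check tying this back to the 2D matrix-inverse recipe at the top: for a made-up 2D lattice, $\operatorname{normalize}_{2\pi}\big(\mathbf a_2\times\hat z\big)$ and $2\pi A^{-1}$ give the same $\mathbf b_1$.

```python
import numpy as np

a1 = np.array([1.0, 0.0])
a2 = np.array([0.5, 0.8])

# 2D route: B = 2*pi * inv(A), rows of B are the b_i.
B = 2 * np.pi * np.linalg.inv(np.column_stack([a1, a2]))

# 3D route: b_1 ~ a_2 x zhat, rescaled so that b_1 . a_1 = 2*pi.
zhat = np.array([0.0, 0.0, 1.0])
q = np.cross(np.append(a2, 0.0), zhat)
b1_3d = 2 * np.pi * q / (q[:2] @ a1)

print(B[0], b1_3d[:2])   # the same vector both ways
```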