Normalization of basis vectors with a continuous index?

Any good basis should be complete. If the set of all $|x\rangle$ is complete, any other vector $|\psi\rangle$ in the Hilbert space of your system should be writable as $|\psi\rangle=\sum_{x} |x\rangle\langle x|\psi\rangle$. This sum does not make sense for a continuous variable $x$, hence the need to redefine the completeness relation with an integral (as Jan's answer demonstrates nicely). Once you use integrals to define a meaningful completeness relation, the relation $\langle x|y\rangle =\delta_{x,y}$ is no longer correct, because it gives \begin{equation} |y\rangle=\int \mathrm{d}x |x\rangle\langle x|y\rangle=\int \mathrm{d}x |x\rangle\delta_{x,y}=0, \end{equation} since the integrand is non-zero only at the single point $x=y$, which has measure zero. This is inconsistent, because $|y\rangle\neq 0$. The way out is to define $\langle x|y\rangle=\delta(x-y)$, which correctly gives \begin{equation} |y\rangle =\int \mathrm{d}x |x\rangle\langle x|y\rangle=\int \mathrm{d}x |x\rangle \delta(x-y)=|y\rangle. \end{equation} Now $\langle x|y\rangle=\delta(x-y)$ leads to $\langle x|x\rangle=\delta(0)$, which you may not like, but this is the best one can do.
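A quick way to see why the Kronecker choice fails is to discretize the completeness integral. The sketch below is only an illustration under assumptions that are not in the answer: the $x$-axis is replaced by a grid $x_j=jh$, the integral by a Riemann sum with weight $h$, and $\psi(x)=e^{-x^2}$ is an arbitrary (hypothetical) test wave function. With $\langle x_i|x_j\rangle=\delta_{ij}$ the reconstructed value is $h\,\psi(y)$, which tends to $0$ as $h\to 0$; with $\langle x_i|x_j\rangle=\delta_{ij}/h$, the grid analogue of $\delta(x-y)$, the sum returns $\psi(y)$ for every $h$.

```python
import math

def reconstruct(psi, m, h, n, overlap):
    """Riemann-sum version of <y|psi> = ∫ dx <y|x><x|psi>, with y = m*h on the grid x_j = j*h, |j| <= n."""
    return sum(h * overlap(m, j, h) * psi(j * h) for j in range(-n, n + 1))

def kronecker(i, j, h):
    # <x_i|x_j> = delta_{ij}: does not depend on the grid spacing
    return 1.0 if i == j else 0.0

def discrete_dirac(i, j, h):
    # <x_i|x_j> = delta_{ij} / h: the grid analogue of delta(x - y)
    return 1.0 / h if i == j else 0.0

def psi(x):
    # hypothetical test wave function
    return math.exp(-x * x)

for h in (0.1, 0.01, 0.001):
    n = int(round(5.0 / h))  # grid covers [-5, 5]
    m = int(round(0.5 / h))  # the point y = 0.5 lies on every grid
    print(h, reconstruct(psi, m, h, n, kronecker), reconstruct(psi, m, h, n, discrete_dirac))
```

The Kronecker column shrinks in proportion to $h$, while the $\delta_{ij}/h$ column stays at $\psi(0.5)=e^{-1/4}$.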


I) We interpret OP's question (v2) as follows:

Why not normalize $$\tag{1} \langle x_1 | x_2\rangle~=~\delta_{x_1,x_2}~:=~\left\{ \begin{array}{ccl} 1 & \text{for} & x_1= x_2, \\ 0& \text{for} & x_1\neq x_2, \end{array} \right. $$ via a continuous Kronecker delta function rather than a Dirac delta distribution $$\tag{2} \langle x_1 | x_2\rangle~=~\delta(x_1-x_2)~? $$

In a nutshell, the reason is that the rhs. of eq. (1) is equal to the zero function almost everywhere wrt. the Lebesgue measure.

In this answer we would like to build intuition via Gaussian wave packets to argue that we should use a Dirac delta distribution normalization (2) (possibly modulo a conventional normalization constant) rather than a continuous Kronecker delta function normalization (1).

II) To be concrete, let us for simplicity view a ket

$$\tag{3} |\psi\rangle \quad\longleftrightarrow\quad\psi(x)$$

as a position wave function $\psi(x)\in L^2(\mathbb{R})$ in the Hilbert space

$$\tag{4} L^2(\mathbb{R})~=~{\cal L}^2(\mathbb{R})/\sim,$$

where we have modded out by an equivalence relation "$\sim$". Here ${\cal L}^2(\mathbb{R})$ is the set of square integrable functions. Two functions $\phi\sim\psi$ are equivalent iff $\phi$ and $\psi$ are equal almost everywhere (a.e.) wrt. the Lebesgue measure. See e.g. this Phys.SE post. Overlaps/inner products read

$$\tag{5} \langle \phi|\psi\rangle ~:=~ \int_{\mathbb{R}} \! \mathrm{d}x ~\phi(x)^{\ast}~\psi(x). $$

III) Now pragmatically, short of mathematical constructs such as distributions, what would represent a state localized at $x=x_1$? Let us allow the wave packet to be spread out by a tiny amount $\epsilon>0$, say smaller than any experimental resolution. We can model such a wave function by an extremely narrowly peaked Gaussian function

$$\tag{6} |x_1\rangle \quad\longleftrightarrow\quad \psi_{x_1}(x)~=~ A\epsilon^{-p} \exp\left[-\left(\frac{x-x_1}{2\epsilon}\right)^2\right], $$

where $p\in\mathbb{R}$ is some fixed power and $A>0$ is a normalization constant to be determined below. The normalization of (6) is

$$\langle x_1|x_1\rangle ~\stackrel{(5)}{=}~ \int_{\mathbb{R}} \! \mathrm{d}x ~|\psi_{x_1}(x)|^2 ~\stackrel{\text{Gauss. int.}}{=}~ \sqrt{2\pi}A^2\epsilon^{1-2p} $$ $$\tag{7}\longrightarrow ~\left\{ \begin{array}{ccl} 0 & \text{if} & p<\frac{1}{2} \\ \sqrt{2\pi}A^2& \text{if} & p=\frac{1}{2} \\ \infty& \text{if} &p>\frac{1}{2} \end{array} \right\} \text{ for } \epsilon~\to~ 0^{+}. $$

To prevent the normalization (7) from vanishing in the limit $\epsilon\to 0^{+}$, we must demand that the power $p\geq \frac{1}{2}$. The Kronecker normalization (1) [modulo an overall constant] corresponds to the power $p=\frac{1}{2}$.
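Eq. (7) can be checked numerically. The sketch below integrates $|\psi_{x_1}(x)|^2$ from eq. (6) with a plain midpoint rule and compares the result with the closed form $\sqrt{2\pi}A^2\epsilon^{1-2p}$. The choice of quadrature, the integration window of $\pm 20\epsilon$ around $x_1$, and the values $A=1$, $x_1=0$ are assumptions made for illustration only.

```python
import math

def psi(x, x1, eps, p, A=1.0):
    # Gaussian wave packet of eq. (6)
    return A * eps ** (-p) * math.exp(-(((x - x1) / (2 * eps)) ** 2))

def norm_numeric(x1, eps, p, A=1.0, n=4000):
    # midpoint rule for <x1|x1> = ∫ |psi_{x1}(x)|^2 dx on [x1 - 20 eps, x1 + 20 eps];
    # |psi|^2 is a Gaussian of standard deviation eps, so the tails outside are negligible
    a = x1 - 20 * eps
    h = 40 * eps / n
    return h * sum(psi(a + (i + 0.5) * h, x1, eps, p, A) ** 2 for i in range(n))

def norm_closed(eps, p, A=1.0):
    # right-hand side of eq. (7)
    return math.sqrt(2 * math.pi) * A ** 2 * eps ** (1 - 2 * p)

for p in (0.25, 0.5, 0.75, 1.0):
    print(p, [round(norm_numeric(0.0, eps, p), 6) for eps in (1e-1, 1e-2, 1e-3)])
```

For $p=\frac{1}{2}$ the printed values stay at $\sqrt{2\pi}\approx 2.5066$, while they shrink like $\epsilon^{1-2p}$ for $p<\frac{1}{2}$ and blow up for $p>\frac{1}{2}$, in line with the three cases of eq. (7).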

IV) More generally, if we assume the ansatz (6), then the overlap between two such kets $|x_1\rangle$ and $|x_2\rangle$ reads in a distributional sense

$$ \langle x_1|x_2\rangle~\stackrel{(5)}{=}~\int_{\mathbb{R}} \! \mathrm{d}x ~\psi_{x_1}(x)^{\ast}~\psi_{x_2}(x) ~\stackrel{(6)}{=}~A^2\int_{\mathbb{R}} \! \mathrm{d}x ~\epsilon^{-2p}~ \exp\left[-\left(\frac{x-x_1}{2\epsilon}\right)^2-\left(\frac{x-x_2}{2\epsilon}\right)^2\right] $$ $$~\stackrel{\text{Gauss. int.}}{=}~\sqrt{2\pi}A^2\epsilon^{1-2p}\exp\left[-\frac{1}{2}\left(\frac{x_1-x_2}{2\epsilon}\right)^2\right]$$ $$\tag{8}\longrightarrow ~\left\{ \begin{array}{ccl} 0\text{ almost everywhere} & \text{if} & p<1 \\ 4\pi A^2~\delta(x_1-x_2)& \text{if} & p=1 \\ \text{too singular}& \text{if} &p>1 \end{array} \right\} \text{ for } \epsilon~\to~ 0^{+}.$$

In the last step we used the heat-kernel representation of the Dirac delta distribution. The Dirac normalization (2) [modulo an overall constant] corresponds to the power $p=1$. In detail, if $f(x_1-x_2)$ is a test function with $f(0)>0$, then eq. (8) states that

$$ \tag{9}\int_{\mathbb{R}}\! \mathrm{d}x_1~ f(x_1-x_2)~ \langle x_1|x_2\rangle ~\longrightarrow ~\left\{ \begin{array}{ccl} 0 & \text{if} & p<1 \\ 4\pi A^2~ f(0)& \text{if} & p=1 \\ \infty & \text{if} &p>1 \end{array} \right\} \text{ for } \epsilon~\to~ 0^{+}.$$
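The three cases in eq. (9) can also be seen numerically. The sketch below rests on assumptions of our own: $A=1$, and the Gaussian $f(u)=e^{-u^2}$ (with $f(0)=1$) stands in for the test function. It integrates $f(u)$ against the overlap (8), written in terms of $u=x_1-x_2$, with a midpoint rule, and compares it with the exact value $4\pi A^2\epsilon^{2-2p}/\sqrt{1+8\epsilon^2}$, which follows from one more Gaussian integral.

```python
import math

def overlap(u, eps, p, A=1.0):
    # <x1|x2> from the Gaussian-integral line of eq. (8), as a function of u = x1 - x2
    return math.sqrt(2 * math.pi) * A ** 2 * eps ** (1 - 2 * p) * math.exp(-0.5 * (u / (2 * eps)) ** 2)

def f(u):
    # stand-in test function with f(0) = 1 (an assumption for illustration)
    return math.exp(-u * u)

def smeared_numeric(eps, p, A=1.0, n=4000):
    # midpoint rule for ∫ du f(u) <x1|x2>; the overlap has standard deviation 2*eps,
    # so the window [-40 eps, 40 eps] captures the integrand for eps <= 0.1
    a = -40 * eps
    h = 80 * eps / n
    return h * sum(f(a + (i + 0.5) * h) * overlap(a + (i + 0.5) * h, eps, p, A) for i in range(n))

def smeared_closed(eps, p, A=1.0):
    # exact value of the same Gaussian integral
    return 4 * math.pi * A ** 2 * eps ** (2 - 2 * p) / math.sqrt(1 + 8 * eps ** 2)

for p in (0.5, 1.0, 1.5):
    print(p, [smeared_numeric(eps, p) for eps in (1e-1, 1e-2, 1e-3)])
```

For $p=\frac{1}{2}$ the values fall like $\epsilon$, for $p=1$ they approach $4\pi A^2 f(0)=4\pi\approx 12.566$, and for $p=\frac{3}{2}$ they grow like $1/\epsilon$, matching the three rows of eq. (9).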

V) Physically, according to the Born rule, the integral (10) of the overlap

$$ \tag{10} \left| \int_{\mathbb{R}}\! \mathrm{d}x_1~ \langle x_1|x_2\rangle\right|^2 ~=~1 $$

is supposed to express the tautological statement that a particle located at position $x_2$ lies somewhere on the real axis $\mathbb{R}$ with 100% probability.

Comparing eqs. (9) and (10), we are naturally led to choose the power $p=1$, and therefore the Dirac normalization (2). Note that the power $p=1$ means that the position state $|x_1\rangle$ fails to be normalizable, and in particular, it does not belong to the Hilbert space, cf. eq. (7).
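Eq. (10) also fixes the constant $A$. For $p=1$ the $x_1$-integral of the overlap (8) is a Gaussian integral that can be done exactly for every $\epsilon>0$ (formally, this is eq. (9) with the constant function $f\equiv 1$), and since $A>0$ we get

$$ \int_{\mathbb{R}}\!\mathrm{d}x_1~\langle x_1|x_2\rangle ~\stackrel{(8)}{=}~\sqrt{2\pi}A^2\epsilon^{-1}\int_{\mathbb{R}}\!\mathrm{d}x_1~\exp\left[-\frac{(x_1-x_2)^2}{8\epsilon^2}\right] ~=~\sqrt{2\pi}A^2\epsilon^{-1}\sqrt{8\pi}\,\epsilon ~=~4\pi A^2 ~\stackrel{(10)}{=}~1 \quad\Longrightarrow\quad A~=~\frac{1}{2\sqrt{\pi}}. $$

With this value the wave packet (6) is exactly the heat kernel at time $t=\epsilon^2$, and eq. (8) reduces to the Dirac normalization (2) with no extra constant:

$$ \psi_{x_1}(x)~=~\frac{1}{2\sqrt{\pi}\,\epsilon}\exp\left[-\frac{(x-x_1)^2}{4\epsilon^2}\right], \qquad \langle x_1|x_2\rangle~\longrightarrow~\delta(x_1-x_2) \text{ for } \epsilon~\to~ 0^{+}. $$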

VI) A more rigorous discussion of eqs. (2) and (10) can be given by introducing momentum eigenstates. It turns out that ultimately eq. (10) is problematic, cf. e.g. this Phys.SE post.


Why can't these basis vectors be normalized to one, only to the delta function?

Because that would make those continuously indexed vectors unsuitable for the role of a "continuous basis" for normalizable functions. Here is the explanation. Suppose some function $\psi(\mathbf r)$ is expressed as the integral $$ \psi(\mathbf r) = \int c(k) \phi_k(\mathbf r)\,dk, $$ where the functions $\phi_k$, $\phi_{k'}$ are orthogonal for $k\neq k'$: $$ \int \phi_k^*(\mathbf r) \phi_{k'}(\mathbf r)\,d^3\mathbf r = 0 $$ (at least in the distributional sense).

The above expression of $\psi$ can be described as "linear combination of the basis functions $\phi_k(\mathbf r)$". If the function $\psi$ is to be used to calculate a probability density according to the Born rule, we have to require $$ \int \psi^*\psi\, d^3\mathbf r = 1. $$ This leads to $$ \int \int c^*(k)c(k') (\phi_k, \phi_{k'}) \,dkdk' = 1,~~~(*) $$ where $(\phi_k, \phi_{k'})$ is a scalar product of two continuously-indexed functions: $$ (\phi_k, \phi_{k'}) = \int \phi_k^*(\mathbf r) \phi_{k'}(\mathbf r)\,d^3\mathbf r. $$

Imagine the points of a plane labeled by Cartesian coordinates $k,k'$. If we had $(\phi_k, \phi_{k'}) = \delta_{kk'}$ with the ordinary Kronecker delta, the scalar product would be non-zero only on the diagonal $k=k'$, which has zero area, while it would vanish over the whole remaining area where $k\neq k'$. The integral in (*) would then vanish too and could not equal 1 as required.
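The zero-area argument can be made concrete by replacing the $(k,k')$-plane with a square grid of spacing $h$, so that each cell has area $h^2$. This is only an illustration; the grid and the hypothetical coefficients $c(k)=\pi^{-1/4}e^{-k^2/2}$ (normalized so that $\int|c|^2\,\mathrm{d}k=1$) are our own assumptions. With the Kronecker delta only the roughly $1/h$ diagonal cells contribute, so the double sum in (*) is about $h$ and tends to $0$; giving the diagonal weight $1/h$, the grid analogue of $\delta(k-k')$, keeps it at $1$.

```python
import math

def c(k):
    # hypothetical (real) expansion coefficients with ∫ |c(k)|^2 dk = 1
    return math.pi ** -0.25 * math.exp(-k * k / 2)

def double_integral(h, diagonal_weight, kmax=8.0):
    """Riemann sum of ∫∫ c*(k) c(k') (phi_k, phi_k') dk dk' on the grid k = i*h, |k| <= kmax.

    (phi_k, phi_k') vanishes off the diagonal, so only cells with i == j contribute,
    each with area h*h times the diagonal value of the scalar product.
    """
    n = int(round(kmax / h))
    return sum(h * h * c(i * h) ** 2 * diagonal_weight(h) for i in range(-n, n + 1))

def kronecker(h):
    # (phi_k, phi_k') = delta_{kk'}: diagonal value 1
    return 1.0

def grid_dirac(h):
    # grid analogue of delta(k - k'): diagonal value 1/h
    return 1.0 / h

for h in (0.1, 0.01, 0.001):
    print(h, double_integral(h, kronecker), double_integral(h, grid_dirac))
```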

One way to give the integral in (*) a non-zero value is to postulate that, for the above orthogonal functions, $(\phi_k, \phi_{k'})$ is to be regarded as a singular distribution of the kind Dirac introduced, which is concentrated on the diagonal $k=k'$ and yet contributes a finite, non-zero amount there.

In practice we choose functions $\phi_k(\mathbf r)$ such that they obey

$$ (\phi_k, \phi_{k'})= \delta(k-k'). $$

Then the integral in (*) is

$$ \int |c(k)|^2\,dk, $$ which can be non-zero and equal to 1 for a properly normalized function $c(k)$.
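A standard example of such functions, here in one dimension instead of three for simplicity, are the plane waves $\phi_k(x)=e^{ikx}/\sqrt{2\pi}$, for which $(\phi_k,\phi_{k'})=\delta(k-k')$. The sketch below (the Gaussian $\psi$, the grids and the step size are our own choices) computes $c(k)=(\phi_k,\psi)$ by quadrature and checks that $\int|c(k)|^2\,\mathrm{d}k$ equals $\int|\psi(x)|^2\,\mathrm{d}x=1$, as the $\delta(k-k')$ normalization predicts.

```python
import cmath
import math

def psi(x):
    # normalized Gaussian: ∫ |psi(x)|^2 dx = 1
    return math.pi ** -0.25 * math.exp(-x * x / 2)

def grid(L, h):
    # equally spaced points i*h covering [-L, L]
    n = int(round(L / h))
    return [i * h for i in range(-n, n + 1)]

def coefficients(xs, ks, h):
    # c(k) = (phi_k, psi) = ∫ phi_k*(x) psi(x) dx with phi_k(x) = exp(i k x) / sqrt(2 pi)
    return [h * sum(cmath.exp(-1j * k * x) * psi(x) for x in xs) / math.sqrt(2 * math.pi) for k in ks]

h = 0.05
xs = grid(10.0, h)
ks = grid(10.0, h)
cs = coefficients(xs, ks, h)
norm_x = h * sum(psi(x) ** 2 for x in xs)
norm_k = h * sum(abs(ck) ** 2 for ck in cs)
print(norm_x, norm_k)
```

Both printed norms come out as 1 to high accuracy; this is the Plancherel identity for the Fourier transform. For this particular $\psi$ one also has $c(k)=\pi^{-1/4}e^{-k^2/2}$ exactly.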

In the language of kets, the kets $|x\rangle, |y\rangle$ are meant to be such that they satisfy the relation

$$ \langle x|y\rangle = \delta(x-y), $$

because only then is the relation $$ |\psi\rangle = \int |x\rangle \langle x|\psi\rangle \, dx, $$ which is part of the motivation behind the formalism of kets, valid and consistent with

$$ \langle \psi|\psi\rangle = 1. $$
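Spelled out with $\psi(x):=\langle x|\psi\rangle$, inserting the completeness relation twice shows exactly where the Dirac normalization enters:

$$ \langle\psi|\psi\rangle ~=~ \int\!\mathrm{d}x\int\!\mathrm{d}y~\langle\psi|x\rangle\langle x|y\rangle\langle y|\psi\rangle ~=~ \int\!\mathrm{d}x\int\!\mathrm{d}y~\psi^{\ast}(x)\,\delta(x-y)\,\psi(y) ~=~ \int\!\mathrm{d}x~|\psi(x)|^2 ~=~ 1. $$

With a Kronecker delta in place of $\delta(x-y)$, the inner integral over $y$ would vanish for every $x$, and one would get $\langle\psi|\psi\rangle=0$ instead.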

The expression $$ \langle x|x\rangle $$ is not well defined and is usually not used in manipulations with the bra-ket formalism; if we applied the above relation, we would get $\delta(x-x)=\delta(0)$, which could be regarded either as positive infinity or as not a meaningful number at all (since $\delta$ is not an ordinary function and does not have ordinary number-valued function values).