Why does the expectation value of an operator $A$ take the form $\langle A\,\rangle=\int{\psi^* (x) A(x) \psi (x) dx}$ in QM?

There is a subtle but important aspect you are missing here. The expectation value of the observable $A$ is defined as

$$\langle A\rangle_\psi=\int\psi^*(x)A\psi(x)dx$$

whereas the probability of being in the configuration $\psi(x)$ is

$$P=\int\psi^*(x)\psi(x)dx$$

But I already know that equation $(3)$ is wrong since in equation $(1)$ the operator $A$ is acting on $\psi(x)$, so it doesn't make any sense to move the operator to the front of the integrand just to make it look like equation $(2)$.

Yes of course. You are right. Now, we see the part you are missing. In quantum mechanics, we define the operators representing observables as Hermitian and an operator has got certain eigen functions.

If $\psi(x)$ is such an eigen function of the operator $A$, then you will have the eigen value equation

$$A\psi(x)=a\psi(x)$$

where $a$ is the corresponding eigen value which is a real number. In such a case,

$$\langle A\rangle_\psi=\int\psi^*(x)a\psi(x)dx=a\int\psi^*(x)\psi(x)dx=aP$$

where $P$ as defined above is the probability that the system can be found in the state $\psi(x)$. Hence we can say that the expectation value of an operator w.r.t a particular state is the eigen value of that state times the probability of being in that state. That's the difference between an expectation value and the eigen value.

Unless the wavefunction is normalized ($P=1$), we will not get the eigen value of the operator as its expectation value.
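
As a quick numerical sanity check of the $\langle A\rangle_\psi=aP$ statement, here is a toy sketch. It is not part of the argument above: the $2\times 2$ matrix `A` and the vector `psi` are made-up stand-ins for the operator and an unnormalized eigen function, and the integral becomes a finite sum.

```python
# Toy stand-in (assumption): a 2x2 Hermitian matrix plays the role of A.
A = [[2.0, 0.0],
     [0.0, -1.0]]

def matvec(M, v):
    # Apply the matrix M to the vector v.
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def inner(u, v):
    # <u|v> = sum_i conj(u_i) * v_i, the discrete analogue of int u^* v dx.
    return sum(complex(ui).conjugate() * vi for ui, vi in zip(u, v))

a = 2.0
psi = [3.0, 0.0]                       # unnormalized eigenvector, A psi = a psi

P = inner(psi, psi).real               # "norm" P = <psi|psi>
expect = inner(psi, matvec(A, psi)).real

# After normalizing, P = 1 and the expectation value is the eigen value itself.
psi_n = [x / P ** 0.5 for x in psi]
expect_n = inner(psi_n, matvec(A, psi_n)).real

print(P, expect, expect_n)
```

With the unnormalized `psi` the result is $a$ times $P$; only after dividing by $\sqrt{P}$ does the expectation value coincide with $a$.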

Now, the wavefunction $\psi(x)$ need not be always an eigen function of $A$. In such cases, we expand our wavefunction as a superposition of the eigen functions of the operator $A$ in Dirac's bra-ket notation:

$$\vert \psi\rangle=\int d\zeta '\vert\zeta'\rangle\langle\zeta'\vert\psi\rangle $$

where the $\vert\zeta'\rangle$ form a complete set of eigen kets of $A$, $\displaystyle{\int d\zeta '\vert\zeta'\rangle\langle\zeta'\vert}$ is the identity operator $1$, and $\vert\zeta'\rangle\langle\zeta'\vert$ is the projection operator $\Lambda_{\zeta'}$. The operations all happen in the appropriate Hilbert space spanned by the eigen kets of the operator.

Before we proceed further, let's have a short brief on Dirac's formalism:

Short brief on Dirac's bra-ket notation: The ket, like the wavefunction, represents a particular state of the system, but it is not itself the wavefunction. It is written as $\vert\psi\rangle$. The wavefunction of the system can be derived from the ket, and the ket representing a state, called the state ket, is a vector in the vector space spanned by the eigen kets of the operator $A$, just as we speak of the eigen functions of the operator $A$. Now, for the wavefunction, we have a corresponding complex conjugate wavefunction. Similarly, the dual of a state ket is called a state bra and is represented by $\langle\psi\vert$.
So, expectation value of some operator of the quantum mechanical system is what we want to measure. The first thing we consider is that we represent the general state ket (which is of course undefined) as a linear superposition of the eigen kets of the operator (which are known, once you solve the eigen value equation). It's like writing a vector as a linear combination of the independent coordinates. However, a vector space is a different thing. But the concept is the same. So, a general state ket $\vert\alpha\rangle$ can be expanded in terms of the complete eigen vectors of the operator $A$ as: $$\vert\alpha\rangle=\sum_{a'}c_{a'}\vert a'\rangle=c_{a'}\vert a'\rangle+c_{a''}\vert a''\rangle+c_{a'''}\vert a'''\rangle+...$$ where the kets $\vert a'\rangle, \vert a''\rangle,\vert a'''\rangle...$ are the eigen kets of $A$ and are complete. The set {$a'$} are the corresponding eigen values. The expansion coefficients $c_{a'},c_{a''},...$ are the probability amplitudes of the corresponding eigen kets. This can be understood in the coming paragraphs where we define the inner product of a ket and a bra.
We represent the state of the system in question as a linear combination of the eigen kets of the observable whose expectation value is to be measured. This vector is represented as a ket and is defined in a complex vector space called the ket space. So, the ket space is spanned by the eigen kets of the operator. This means the eigen kets of the operator form the basis vectors of our vector space. Since there is a one-to-one correspondence between a ket and the corresponding bra, we can define a space spanned by the eigen bras, called the bra space. If we take the inner product of the state ket and the state bra, defined respectively in the ket space and the bra space, we will get a complete inner product space called the Hilbert space. All the quantum "mechanics" happens in the Hilbert space.
Why do we need an inner product space? Well, the ket and bra are complex vectors and they are useless unless we can extract some information from them. To obtain that, we take the inner product between a bra and a ket. The inner product between the state ket $\vert\alpha\rangle$ and the state bra $\langle\beta\vert$ is denoted as $\langle\beta\vert\alpha\rangle$. It gives the probability amplitude for the system, initially in the state $\vert\alpha\rangle$, to be found in the state $\vert\beta\rangle$; the square of its modulus gives the corresponding probability. The inner product is in general a complex number, with $\langle\beta\vert\alpha\rangle=\langle\alpha\vert\beta\rangle^*$. This probability is the fundamental thing that accompanies all the rest of the operations, which you will see in the coming discussions. The probability $\vert\langle\beta\vert\alpha\rangle\vert^2$ is a real number and must be non-negative, and so is the inner product of a ket with its own bra, $\langle\alpha\vert\alpha\rangle\geq 0$.
Now let's look back to where we defined $c_{a'}$ as the probability amplitude for the state defined by the ket $\vert\alpha\rangle$ to be found in the state $\vert a'\rangle$, which is an eigen state of the operator $A$. Taking the inner product of $\vert\alpha\rangle$ with the eigen bra $\langle a'\vert$, we get
$$\langle a'\vert\alpha\rangle=\sum_{a''}c_{a''}\langle a'\vert a''\rangle=c_{a'}$$ where we have used an important relation called the orthonormality condition of two kets. If two kets $\vert a'\rangle$ and $\vert a''\rangle$ are orthogonal (independent) and normalized (so that the inner product of the ket with its own bra gives $1$), then the orthonormality condition states that $$\langle a'\vert a''\rangle=\delta_{a',a''}$$ which is $1$ if the two kets are the same and $0$ when they are not. So, we demand the eigen kets of the operator to be orthonormal so that they satisfy the above orthonormality condition. So, we have got $c_{a'}$ as the probability amplitude of the eigen ket $\vert a'\rangle$. Hence the square of its modulus gives us the probability that the system is found to be in the eigen state $\vert a'\rangle$: $$\vert c_{a'}\vert^2=\vert\langle a'\vert\alpha\rangle\vert^2$$ Now, for a normalized state ket, we see that $$\sum_{a'} \vert c_{a'}\vert^2=\sum_{a'}\vert\langle a'\vert\alpha\rangle\vert^2=1$$ as required by conservation of probability.
Now, what happens if we take the inner product of a general ket and the corresponding bra? That answer will give us the probability to find the system to be in that state. If the state kets are normalized, then this probability will be one.
Now, while taking the inner product of a state ket with a state bra, we are combining the two spaces (the ket and bra spaces) somehow to get a complete inner product space called Hilbert space. All the information about the state is hidden in this Hilbert space. So we ask the state ket to reveal some information, for example the energy. We do this by operating on the state ket by the energy operator. Then we will get the value of energy, which is present in the Hilbert space. So, the operations on the state ket happen in the Hilbert space.
Now, let's see the operation of the operators on the state kets. It's similar to the operation of the operators on a wavefunction. The operator $A$ acting on the general ket $\vert\alpha\rangle$ is given by
$$A\vert\alpha\rangle=A\sum_{a'}c_{a'}\vert a'\rangle=A\sum_{a'}\left(\langle a'\vert\alpha\rangle\right)\vert a'\rangle=A\sum_{a'}\vert a'\rangle\langle a'\vert\alpha\rangle$$
Comparing both sides, we see that the effect of $\displaystyle{\sum_{a'}\vert a'\rangle\langle a'\vert}$ is just like operating with the identity operator $1$. Hence $\displaystyle{\sum_{a'}\vert a'\rangle\langle a'\vert}=1$ is regarded as the identity operator. Now, what does the outer product $\Lambda_{a'}=\vert a'\rangle\langle a'\vert$ give us? Even though the inner product is a scalar, the outer product is an operator. To see this, let it act on the ket $\vert\alpha\rangle$:
$$\Lambda_{a'}\vert\alpha\rangle=\vert a'\rangle\langle a'\vert\alpha\rangle=\vert a'\rangle\left(\langle a'\vert\alpha\rangle\right)=c_{a'}\vert a'\rangle.$$
The ket $\vert\alpha\rangle$ is a combination of all the possible eigen kets. When we operate on this ket with $\Lambda_{a'}$, the operator selects the portion of the ket $\vert\alpha\rangle$ parallel to $\vert a'\rangle$. Hence it is known as the projection operator. Comparing the identity operator and the projection operator, we find that
$$\sum_{a'} \Lambda_{a'}=1$$
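To make the coefficients $c_{a'}$, the projectors $\Lambda_{a'}$ and the completeness relation concrete, here is a small sketch in a two-dimensional ket space. The basis vectors and the state `alpha` are arbitrary illustrative choices (assumptions, not taken from the discussion above), and kets are represented as plain Python lists of complex numbers.

```python
import math

def inner(u, v):
    # <u|v>: conjugate the bra components, then sum.
    return sum(complex(a).conjugate() * b for a, b in zip(u, v))

def outer(u, v):
    # |u><v| as a matrix with entries u_i * conj(v_j).
    return [[ui * complex(vj).conjugate() for vj in v] for ui in u]

def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

s = 1 / math.sqrt(2)
basis = [[s, s], [s, -s]]      # orthonormal eigen kets |a'>, |a''> (assumed)
alpha = [0.6, 0.8j]            # normalized state ket (assumed)

# c_{a'} = <a'|alpha>, and the probabilities |c_{a'}|^2 add up to 1.
coeffs = [inner(e, alpha) for e in basis]
total_prob = sum(abs(c) ** 2 for c in coeffs)

# Projectors Lambda_{a'} = |a'><a'| and their sum, which should be the identity.
projectors = [outer(e, e) for e in basis]
identity = [[sum(L[i][j] for L in projectors) for j in range(2)] for i in range(2)]

# Lambda_{a'} |alpha> should equal c_{a'} |a'>.
projected = matvec(projectors[0], alpha)

print(coeffs, total_prob, identity, projected)
```

The same helper functions work for any finite orthonormal basis; only `basis` and `alpha` would change.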
Okay, now we are almost equipped with the tools for the further discussion. We have considered only the discrete spectrum case above. The same facts hold for a continuous spectrum. All we have to do is replace the summation by an integral and the Kronecker delta symbol by the Dirac delta function.
Note: This is not a complete description of Dirac's notation. There are a lot of things to see. However I've limitations here. You can find more illuminating discussions on Dirac's notation in Modern Quantum Mechanics by J. J. Sakurai.

Now, we continue. The expectation value is defined as

$$\langle A\rangle_\psi=\langle\psi\vert A\vert\psi\rangle$$

Substituting the above expansion of $\vert\psi\rangle$ in the equation, we get

$$ \begin{align} \langle A\rangle_\psi&=\iint d\zeta'd\zeta''\langle\psi\vert\zeta'\rangle\langle\zeta'\vert A \vert\zeta''\rangle\langle\zeta''\vert\psi\rangle\\ &= \iint d\zeta'd\zeta''\langle\psi\vert\zeta'\rangle\zeta' \delta\left(\zeta''-\zeta'\right)\langle\zeta''\vert\psi\rangle\\ &=\int d\zeta' \zeta' \langle\psi\vert\zeta'\rangle\langle\zeta'\vert\psi\rangle \end{align} $$

Now, $\langle\zeta'\vert\psi\rangle$ is the inner product of the bra $\langle\zeta'\vert$ with the ket $\vert\psi\rangle$. It gives the probability amplitude for the system in the state $\vert\psi\rangle$ to be found in the state $\vert\zeta'\rangle$, i.e. the transition amplitude, which in general is a complex number. If I write $\langle\zeta'\vert\psi\rangle=c_{\zeta'}$, then $\langle\psi\vert\zeta'\rangle=\langle\zeta'\vert\psi\rangle^*=c^*_{\zeta'}$. Hence

$$\langle A\rangle_\psi=\int d\zeta ' \zeta' \vert c_{\zeta'}\vert^2$$

which means the expectation value of the operator $A$ is the sum (here an integral) over the eigen values $\zeta'$ of $A$, each weighted by the probability (density) $\vert c_{\zeta'}\vert^2$ of finding the system in the corresponding eigen state of $A$.
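
For a continuous spectrum the simplest illustration is probably the position operator, where the eigen value $\zeta'$ is just $x$ and $c_{\zeta'}=\psi(x)$. The sketch below approximates $\int dx\, x\,\vert\psi(x)\vert^2$ by a Riemann sum for a Gaussian wave packet. The packet parameters `x0`, `sigma`, `k0` and the grid are hypothetical choices made only for this example.

```python
import cmath
import math

x0, sigma, k0 = 1.5, 0.7, 2.0   # hypothetical centre, width and wavenumber

def psi(x):
    # Normalized Gaussian packet times a plane-wave phase; the phase drops
    # out of |psi|^2, so it does not affect the position statistics.
    amp = (2 * math.pi * sigma ** 2) ** -0.25
    return amp * math.exp(-(x - x0) ** 2 / (4 * sigma ** 2)) * cmath.exp(1j * k0 * x)

dx = 0.01
xs = [x0 - 10 + i * dx for i in range(2001)]   # grid covering x0 - 10 .. x0 + 10

weights = [abs(psi(x)) ** 2 for x in xs]       # |c_x|^2 = |psi(x)|^2
norm = sum(weights) * dx                       # approximates int |psi|^2 dx
mean_x = sum(x * w for x, w in zip(xs, weights)) * dx   # approximates int x |psi|^2 dx

print(norm, mean_x)
```

The printed norm should be close to $1$ and the mean close to the packet centre `x0`, which is the "eigen value weighted by probability" reading of the last equation.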


An expectation value is a probability weighted average.

Recall that the eigenvalues of an operator $\hat{Q}$ are the possible results of a measurement of the related observable $q$, and that $$ \hat{Q}\left|\psi_i\right\rangle = q_i \left|\psi_i\right\rangle \;,$$ where the $\psi_i$s are the eigenstates of $\hat{Q}$. From this it follows$^1$ that given a state written in the eigenbasis of $\hat{Q}$ $$ | \psi \rangle = \sum_i c_i |\psi_i\rangle \;,$$ we can write the expectation value as \begin{align*} \left\langle\psi\right| \hat{Q}\left|\psi\right\rangle &= \sum_i \left\langle\psi_i\right|c^*_i \hat{Q} c_i \left|\psi_i\right\rangle \\ &= \sum_i c_i^* c_i \left\langle\psi_i\right| q_i \left|\psi_i\right\rangle \\ &= \sum_i c_i^* c_i q_i \left\langle\psi_i|\psi_i\right\rangle\\ &= \sum_i P(q_i) q_i \;, \tag{1} \end{align*} where $P(q_i) = c_i^* c_i$ is the Born-rule probability of finding $q_i$ in your measurement. The cross terms $\left\langle\psi_j\right|\hat{Q}\left|\psi_i\right\rangle$ with $j\neq i$ are omitted from the first line because they vanish by orthonormality of the eigenstates.

But because the normalization requirement for $\left|\psi\right\rangle$ is $\left\langle\psi|\psi\right\rangle = \sum_i \langle \psi_i|c_i^* c_i | \psi_i \rangle = \sum_i P(q_i) = 1$, equation (1) is exactly the definition of a weighted average of $q$ over the probabilities $P(q_i)$.
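
If you want to see equation (1) with actual numbers, here is a minimal sketch for a two-level system. The matrix `Q` (a Pauli-$x$-like matrix), its hand-written eigenpairs and the state `psi` are illustrative assumptions, not anything specific to your problem.

```python
import math

def inner(u, v):
    # <u|v> with the bra components conjugated.
    return sum(complex(a).conjugate() * b for a, b in zip(u, v))

# Assumed example operator and its eigenpairs (q_i, |psi_i>), written by hand.
Q = [[0.0, 1.0],
     [1.0, 0.0]]
s = 1 / math.sqrt(2)
eigen = [(+1.0, [s, s]), (-1.0, [s, -s])]

psi = [0.6, 0.8]   # normalized state (assumed)

# Born-rule probabilities P(q_i) = |<psi_i|psi>|^2 and the weighted average.
probs = [abs(inner(vec, psi)) ** 2 for _, vec in eigen]
weighted_avg = sum(q * p for (q, _), p in zip(eigen, probs))

# Direct evaluation of <psi|Q|psi> for comparison.
Q_psi = [sum(Q[i][j] * psi[j] for j in range(2)) for i in range(2)]
direct = inner(psi, Q_psi).real

print(probs, weighted_avg, direct)
```

Both routes give the same number, and the probabilities add up to one, which is all that equation (1) and the normalization condition claim.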


$^1$ I'm going to do this in the form for an operator with a discrete spectrum, but the transition to a continuous space follows in the usual way.


This is essentially the Born interpretation of the wave function.

Let $A$ be an observable. For the purpose of illustration, let me limit myself to the following simple scenario:

  1. There exists an orthonormal set of eigenfunctions $u_n(x)$ of $A$. This means that $$Au_n=a_nu_n,\\\intop \text d x u_n(x)^*u_m(x)=\delta _{nm},\\\psi =\sum _n c_n u_n. $$ The last equation is assumed to be valid, with appropriate coefficients, for every wave function $\psi$. The coefficients may be found from the second equation, if the functions $u_n$ are known.
  2. All eigenvalues are non degenerate: $a_n=a_m \implies n=m$.

Now, to your question. Born's interpretation$^1$ tells us to compute the coefficients $c_n$ of the wave function $\psi$ and to take the squares of the moduli $\vert c_n\vert^2$. If we perform an experiment which is able to determine with certainty the value of the observable $A$, the probability to find $a_n$ is $\vert c_n\vert^2.$

That's all, this is how quantum mechanics works: it tells (1) which are the possible outcomes of the measurement of $A$ - just compute the eigenvalues of $A$ - and (2) what is the probability of a certain outcome - just compute the coefficients of $\psi$.

Now, the expectation value rule is just a corollary of Born's interpretation, since any expectation value should be defined by:$$\langle A \rangle = \sum _n a_n p_n,$$ where $p_n$ is the probability of the value $a_n$. If you plug $p_n=\vert c_n\vert ^2$ and use the eigenvalue equation satisfied by every $u_n$, you will discover that this definition is equivalent to: $$\langle A\rangle = \intop \text d x \psi^* A\psi.$$
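
In case the "plug in and discover" step is not obvious, here is a sketch of it under the two assumptions listed above (a complete orthonormal set $u_n$ with $Au_n=a_nu_n$). Going from the integral back to the sum:

$$\intop \text d x\, \psi^* A\psi=\sum_{n,m}c_n^*c_m\intop \text d x\, u_n^*\,Au_m=\sum_{n,m}c_n^*c_m\,a_m\,\delta_{nm}=\sum_n a_n\vert c_n\vert^2=\sum_n a_np_n.$$

The first equality uses the expansion of $\psi$, the second uses the eigenvalue equation together with orthonormality, and the last is Born's rule $p_n=\vert c_n\vert^2$.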


$^1$ I don't know the history of quantum mechanics. No idea if this is how Born would have spelled his interpretation, probably not. This is what you will find in modern textbooks, and it is (more or less) the way physicists use $\psi$ to compute things.