How to learn integral transforms?

Laplace transforms were derived in a very strange way by Oliver Heaviside, who is considered by many to be the father of modern electrical engineering. He created 'operator' methods for solving ordinary differential equations. (The 'D' operator was Heaviside's notation, and the algebraic method was his, including the use of partial fractions and his 'cover up' method for decomposing into partial fractions.) Most of what he did was not very rigorous, but it was brilliant, it worked, and he always checked his answers. The reason you have trouble tracing back to the source is that Heaviside was so arrogant and nasty to people at the time that they vindictively set out to keep his name out of everything. Honestly. He used to openly and viciously insult Lord Kelvin. Heaviside was banned from publishing several times throughout his life for his open attacks in journal articles.

Heaviside deliberately set out to turn differentiation into multiplication, and he came up with expressions that morphed into something similar to what is now called the Laplace transform. But it didn't start off as something called the Laplace transform; when people found integral expressions similar to what Heaviside was using that could be named after someone else, they jumped at the chance to write Heaviside's name out of it. Heaviside noticed that time evolution operators for time-invariant systems (such as circuits) would have an exponential property. That is, if the solution operator $S(t)$ takes a state $x$ at time $0$ to the state $S(t)x$ at time $t$, then evolving $S(t)x$ by a further $t'$ seconds should give the same state as evolving the original state by $t+t'$ seconds. In other words, the solution operator would satisfy $S(t')S(t)x=S(t'+t)x$. Very abstract, very general for such systems, and obviously leading to something exponential. That's where the exponential in the Laplace transform comes from, and that's the level Heaviside worked at during the late 1800s! His operator methods allowed him to solve problems nobody else at the time could; otherwise people at the time would have gladly ignored Heaviside.
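As a rough numerical illustration of the property $S(t')S(t)x=S(t'+t)x$, here is a minimal sketch. The system $x'=Ax$ with $A$ the ideal oscillator matrix, the helper names `mat_mul`, `mat_vec`, `evolve`, and the truncated Taylor series for $e^{tA}$ are all my own choices for illustration, not anything Heaviside wrote down.

```python
import math

def mat_mul(P, Q):
    """Product of two matrices stored as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]

def mat_vec(P, v):
    """Matrix times vector."""
    return [sum(P[i][k] * v[k] for k in range(len(v))) for i in range(len(P))]

def evolve(A, t, terms=40):
    """Approximate S(t) = exp(tA) by a truncated Taylor series (assumed adequate for small t*A)."""
    n = len(A)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        # term becomes (tA)^k / k!
        term = mat_mul(term, [[t * a / k for a in row] for row in A])
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result

# Hypothetical time-invariant system x' = Ax (an undamped oscillator).
A = [[0.0, 1.0], [-1.0, 0.0]]
x0 = [1.0, 0.0]
t, t_prime = 0.7, 1.1

# Evolve by t, then by t', and compare with evolving by t + t' in one step.
two_steps = mat_vec(evolve(A, t_prime), mat_vec(evolve(A, t), x0))
one_step = mat_vec(evolve(A, t + t_prime), x0)
semigroup_gap = max(abs(a - b) for a, b in zip(two_steps, one_step))
print(semigroup_gap)
```

The printed gap should be at the level of floating-point rounding, which is the exponential law $e^{t'A}e^{tA}=e^{(t'+t)A}$ in action.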

We now recognize that many differential equation solution operators can be viewed in this abstract way of Heaviside. For example, take Laplace's equation on the half plane $x\in\mathbb{R}$, $y > 0$, and consider the solution operator $L(y)$ that takes boundary data $f$ at $y=0$ to the function $g=L(y)f$, which is the slice of the solution at height $y > 0$. If you then solve Laplace's equation again with $g$ as the new boundary function and look at the slice $L(y')g=L(y')L(y)f$ of the new solution, you should get $L(y'+y)f$. There's a general exponential property of time evolution operators; and there's a general exponential property connected with uniqueness of solutions of differential equations. The Laplace transform is intimately connected with these ideas. $C_{0}$ semigroup theory is based on this observation, and is also connected with the Laplace transform. The operator formalism is definitely traceable back to Heaviside.
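The half-plane example can be checked numerically, under one assumption I am adding: for bounded solutions, $L(y)$ is convolution with the Poisson kernel $P_{y}(x)=\frac{y}{\pi(x^{2}+y^{2})}$. The property $L(y')L(y)=L(y'+y)$ then says that $P_{y'}$ convolved with $P_{y}$ equals $P_{y'+y}$. The sketch below tests this at one point with a plain midpoint rule; the function names, the sample point, and the truncation window are illustrative choices.

```python
import math

def poisson_kernel(x, y):
    """Poisson kernel of the upper half plane at height y > 0."""
    return y / (math.pi * (x * x + y * y))

def convolve_at(x, y, y_prime, half_width=200.0, step=0.01):
    """Midpoint-rule approximation of (P_{y'} * P_y)(x) on [-half_width, half_width]."""
    n = int(round(2 * half_width / step))
    total = 0.0
    for i in range(n):
        u = -half_width + (i + 0.5) * step
        total += poisson_kernel(x - u, y_prime) * poisson_kernel(u, y)
    return total * step

x, y, y_prime = 0.3, 0.5, 1.5

# Slice at height y, then use it as boundary data and slice again at height y'.
two_slices = convolve_at(x, y, y_prime)
# Slice the original solution directly at height y + y'.
one_slice = poisson_kernel(x, y + y_prime)
print(two_slices, one_slice)
```

The two printed numbers should agree to within the quadrature and truncation error of this crude integration.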

Most integral transforms arise out of integral 'sums' of eigenfunctions of second order ordinary differential equations on $[0,\infty)$ or $(-\infty,\infty)$. Because the integrals use eigenfunctions, these 'transforms' turn the original operator into multiplication by the eigenvalue parameter. For example, the Fourier transform originated in trying to write a function $f$ as an integral sum of eigenfunctions of $\frac{d^{2}}{dx^{2}}$: $$ f(x) = \int_{0}^{\infty}\{a(s)\cos(sx)+b(s)\sin(sx)\}ds $$ The problem was to find the coefficient functions $a(s)$ and $b(s)$ in terms of $f$. Then $-\frac{d^{2}}{dx^{2}}$ is formally turned into multiplication of the coefficient functions by $s^{2}$, i.e., $$ -f''(x) = \int_{0}^{\infty}\{ s^{2}a(s)\cos(sx)+s^{2}b(s)\sin(sx)\}ds. $$ That's the idea behind most of the integral transforms: you start with a symmetric ordinary differential operator $Lf=-\frac{d}{dx}p\frac{d}{dx}f + qf$, you look for the eigenfunctions $Lf_{\lambda}=\lambda f_{\lambda}$, and you write a general $f$ as integral and/or discrete sums of the eigenfunctions $f_{\lambda}$, summing over $\lambda$. One old reference (out of print), written at the level of Advanced Calculus and dealing with the general theory of integral transforms, is R.V. Churchill's book, listed below with an Amazon link.
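To make the Fourier example above concrete, here is a small sketch that checks the "differentiation becomes multiplication by $s^{2}$" claim numerically. It assumes the standard Fourier integral formula $a(s)=\frac{1}{\pi}\int_{-\infty}^{\infty}f(u)\cos(su)\,du$, and it uses the test function $f(x)=e^{-x^{2}/2}$, which is my choice (it is even, so $b(s)=0$). The helper name `cosine_coefficient` and the integration window are likewise illustrative.

```python
import math

def cosine_coefficient(func, s, half_width=12.0, step=0.001):
    """Midpoint-rule approximation of a(s) = (1/pi) * integral of func(u) cos(s u) du."""
    n = int(round(2 * half_width / step))
    total = 0.0
    for i in range(n):
        u = -half_width + (i + 0.5) * step
        total += func(u) * math.cos(s * u)
    return total * step / math.pi

def f(x):
    """Test function f(x) = exp(-x^2 / 2)."""
    return math.exp(-x * x / 2.0)

def minus_f_second(x):
    """-f''(x) for the test function, computed by hand: (1 - x^2) exp(-x^2 / 2)."""
    return (1.0 - x * x) * math.exp(-x * x / 2.0)

s = 1.7
a_f = cosine_coefficient(f, s)                       # coefficient of f
a_minus_f2 = cosine_coefficient(minus_f_second, s)   # coefficient of -f''
print(a_minus_f2, s * s * a_f)
```

The two printed values should match closely: the coefficient function of $-f''$ is $s^{2}$ times the coefficient function of $f$, which is the multiplication property described above.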

R.V. Churchill, "Operational Mathematics": Amazon link

Wikipedia page for Heaviside: Oliver Heaviside

Overview of Heaviside's work, along with links to his publications: Heaviside Operator Calculus.
I highly recommend this person's web page; it's entertaining, informative, and has excellent references.