What generates the clock signal in a fast CPU and how does it work?

Actually, crystal oscillators can easily reach tens of MHz. Above that, in most cases a PLL (Phase Locked Loop) is used. At its heart is a high-frequency oscillator that is not very accurate in itself, but can be tuned (its frequency can be adjusted somewhat). The frequency of this high-frequency oscillator is divided by a suitable factor (dividing a digital signal by an integer, and especially by a power of 2, is easy and exact), and the result is then compared to the output of, let's say, a 10 MHz crystal oscillator. The comparison is used to adjust the high-frequency oscillator until the two match. Thus a high frequency is made with (almost) the accuracy of the lower-frequency crystal oscillator.
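To put numbers on that, here is a rough Python sketch of the arithmetic. The function names and the figures (10 MHz reference, 3 GHz target, 20 ppm crystal tolerance) are made up for illustration and are not taken from any particular CPU.

```python
def pll_output_hz(f_ref_hz: float, divider: int) -> float:
    """Frequency at which the loop settles.

    The loop adjusts the tunable oscillator until (output / divider)
    equals the reference, so output = reference * divider.
    """
    if divider < 1:
        raise ValueError("divider must be a positive integer")
    return f_ref_hz * divider


def divider_for(f_target_hz: float, f_ref_hz: float) -> int:
    """Nearest integer divider that gets close to the target frequency."""
    return max(1, round(f_target_hz / f_ref_hz))


def absolute_error_hz(f_hz: float, tolerance_ppm: float) -> float:
    """Worst-case frequency error for a given relative tolerance in ppm."""
    return f_hz * tolerance_ppm / 1e6


# Illustrative (assumed) numbers: 10 MHz crystal, 3 GHz core clock.
n = divider_for(3e9, 10e6)
f_out = pll_output_hz(10e6, n)
print(n, f_out)
print(absolute_error_hz(10e6, 20), absolute_error_hz(f_out, 20))
```

The absolute error in Hz grows with the multiplication, but the relative error (in ppm) is the same as that of the crystal, which is the sense in which the output inherits the reference's accuracy.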

In most cases, the circuitry to do all this is built into the processor chip, for two reasons: the frequency can then be configured under software control, and routing such a high-frequency signal between chips is a nightmare.

You don't need a crystal to build an oscillator; any reactive component, like a capacitor or inductor, combined with an amplifier can do the job. In fact, a crystal is electrically equivalent to an R, L and C in series, all in parallel with another C. The advantage of a crystal is that its resonant frequency is very precise. To generate higher frequencies, people use other resonant components (e.g. inductors and capacitors inside chips) in their oscillator circuits.
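As a rough illustration of that equivalent circuit, the sketch below computes the two resonances of the series R-L-C branch with a parallel C, using the standard lossless approximations. The component values are invented to land near 10 MHz and are not from a datasheet; the function and parameter names are likewise just for this example.

```python
import math


def crystal_resonances(l_h: float, c_series_f: float, c_parallel_f: float):
    """Return (series, parallel) resonant frequencies in Hz.

    Series resonance: the L and the series C cancel each other.
    Parallel resonance: the series branch resonates against the parallel C,
    which places it slightly above the series resonance.
    """
    f_series = 1.0 / (2.0 * math.pi * math.sqrt(l_h * c_series_f))
    f_parallel = f_series * math.sqrt(1.0 + c_series_f / c_parallel_f)
    return f_series, f_parallel


def quality_factor(r_ohm: float, l_h: float, c_series_f: float) -> float:
    """Q of the series branch; a higher Q means a sharper resonance."""
    return math.sqrt(l_h / c_series_f) / r_ohm


# Assumed values: 10 mH, 25 fF in series, 5 pF in parallel, 10 ohm loss.
f_s, f_p = crystal_resonances(10e-3, 25e-15, 5e-12)
print(f"series   {f_s / 1e6:.4f} MHz")
print(f"parallel {f_p / 1e6:.4f} MHz")
print(f"Q        {quality_factor(10.0, 10e-3, 25e-15):.0f}")
```

With values of this kind the Q comes out in the tens of thousands, far higher than an on-chip inductor and capacitor would typically manage, which is one way to see why the crystal's frequency is so well defined.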

With some oscillator circuits the frequency can be varied with an applied voltage; these are called voltage-controlled oscillators (VCOs). They are used to generate high frequencies accurately by dividing the output frequency, comparing it to an accurate low-frequency source like a crystal, and then adjusting the control voltage appropriately. A PLL (phase locked loop) is one example: it generates a voltage proportional to the difference in phase between the divided high-frequency clock and the reference clock.
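The feedback loop described above can be sketched as a small simulation. This is a heavily simplified linear model, not how any real chip implements it: the phase detector is ideal and never wraps around, the loop filter is a plain proportional-plus-integral term, and every parameter (free-running frequency, VCO gain, loop gains, time step) is an assumed value chosen so that the loop settles quickly.

```python
def simulate_pll(
    f_ref_hz: float = 10e6,          # accurate low-frequency reference
    divider: int = 100,              # feedback divider N
    f_free_hz: float = 900e6,        # VCO frequency at 0 V control (assumed)
    k_vco_hz_per_v: float = 100e6,   # VCO tuning gain (assumed)
    kp: float = 1.0,                 # proportional gain, volts per cycle
    ki: float = 2.5e5,               # integral gain, volts per cycle-second
    dt_s: float = 1e-8,              # simulation time step
    steps: int = 10_000,             # 100 microseconds in total
) -> float:
    """Return the VCO frequency after the loop has run for `steps` steps."""
    phase_err_cycles = 0.0   # reference phase minus divided-VCO phase
    integral = 0.0           # running integral of the phase error
    f_vco_hz = f_free_hz

    for _ in range(steps):
        # Phase detector: the phase difference grows at a rate equal to
        # the difference between the reference and the divided VCO output.
        phase_err_cycles += (f_ref_hz - f_vco_hz / divider) * dt_s

        # Loop filter: turn the phase error into a control voltage.
        integral += phase_err_cycles * dt_s
        control_v = kp * phase_err_cycles + ki * integral

        # VCO: the control voltage pulls the frequency up or down.
        f_vco_hz = f_free_hz + k_vco_hz_per_v * control_v

    return f_vco_hz


print(f"{simulate_pll() / 1e6:.3f} MHz")
```

With these assumed values the VCO starts 100 MHz too low and is pulled to the reference frequency times the divider. Starting from a different free-running frequency, for example `simulate_pll(f_free_hz=950e6)`, ends up at the same output, which is the point of the loop: the result is set by the reference and the divider, not by the inaccurate oscillator.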