Why doesn't "add more cores" face the same physical limitations as "make the CPU faster"?

Summary

  • Economics. It's cheaper and easier to design a CPU with more cores than with a higher clock speed, because:

      • Significant increase in power usage. CPU power consumption rises rapidly as you increase the clock speed - you can double the number of cores operating at a lower speed within the thermal budget it takes to increase the clock speed by 25%, and quadruple them for a 50% increase.

      • There are other ways to increase sequential processing speed, and CPU manufacturers make good use of them.


I'm going to be drawing heavily on the excellent answers at this question on one of our sister SE sites. So go upvote them!


Clock speed limitations

There are a few known physical limitations to clock speed:

  • Transmission time

    The time it takes for an electrical signal to traverse a circuit is limited by the speed of light. This is a hard limit, and there is no known way around it1. At gigahertz clock speeds, we are approaching this limit.

    However, we are not there yet. 1 GHz means one nanosecond per clock tick. In that time, light can travel 30cm. At 10 GHz, light can travel 3cm. A single CPU core is about 5mm wide, so we will run into these issues somewhere past 10 GHz.2
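    Those numbers are easy to sanity-check. Here's a minimal C sketch of the arithmetic (note that signals in real silicon propagate somewhat slower than light in a vacuum, so the true limit is tighter - see footnote 2):

    ```c
    #include <stdio.h>

    int main(void) {
        const double c = 3.0e8;                   /* speed of light, m/s */
        const double freqs_ghz[] = {1.0, 3.0, 10.0, 30.0};

        for (int i = 0; i < 4; i++) {
            double tick_s  = 1.0 / (freqs_ghz[i] * 1.0e9); /* one clock period */
            double dist_cm = c * tick_s * 100.0;           /* distance per tick */
            printf("%5.1f GHz: light travels %6.2f cm per tick\n",
                   freqs_ghz[i], dist_cm);
        }
        return 0;
    }
    ```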

  • Switching delay

    It's not enough to merely consider the time it takes for a signal to travel from one end to another. We also need to consider the time it takes for a logic gate within the CPU to switch from one state to another! As we increase clock speed, this can become an issue.

    Unfortunately, I'm not sure about the specifics, and cannot provide any numbers.

    Apparently, pumping more power into it can speed up switching, but this leads to both power consumption and heat dissipation issues. Also, more power means you need bulkier conduits capable of handling it without damage.
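    In the absence of hard numbers, a standard first-order timing model (a textbook sketch of my own, not something from the original answer) at least shows why more voltage buys faster switching:

    ```latex
    % The clock period must cover the critical path: N gate delays in series.
    f_{\max} \approx \frac{1}{N \cdot t_d}

    % Alpha-power law for gate delay (C_L = load capacitance,
    % V_{th} = threshold voltage, alpha typically between 1 and 2):
    t_d \propto \frac{C_L \, V_{dd}}{(V_{dd} - V_{th})^{\alpha}}

    % Raising the supply voltage V_{dd} shrinks t_d (faster switching),
    % but - as the next point shows - power rises with the square of V_{dd}.
    ```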

  • Heat dissipation/power consumption

    This is the big one. Quoting from fuzzyhair2's answer:

    Recent processors are manufactured using CMOS technology. Every time there is a clock cycle, power is dissipated. Therefore, higher processor speeds mean more heat dissipation.

    There are some lovely measurements in this AnandTech forum thread, and they even derived a formula for the power consumption (which goes hand in hand with the heat generated):

    [Image: the power-consumption formula derived in the thread. Credit to Idontcare.]

    We can visualise this in the following graph:

    [Image: graph of power consumption vs. clock speed. Credit to Idontcare.]

    As you can see, power consumption (and heat generated) rises extremely rapidly as the clock speed is increased past a certain point. This makes it impractical to boundlessly increase clock speed.

    The reason for the rapid increase in power usage is probably related to the switching delay - it's not enough to simply increase power proportional to the clock rate; the voltage must also be increased to maintain stability at higher clocks. This may not be completely correct; feel free to point out corrections in a comment, or make an edit to this answer.
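    A standard first-order CMOS model (a textbook sketch of my own, not the empirical fit from the thread) shows where that shape comes from:

    ```latex
    % Dynamic power of CMOS logic (first-order textbook model):
    %   alpha = activity factor, C = switched capacitance,
    %   V = supply voltage, f = clock frequency
    P_{\text{dyn}} = \alpha \, C \, V^{2} f

    % If V must rise roughly linearly with f to keep switching reliable,
    % then power grows with the cube of the clock:
    P_{\text{dyn}} \propto f^{3}

    % Worked example: a 25% clock bump costs 1.25^3 \approx 1.95x the
    % power - about the same budget as adding a whole second core.
    ```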


More cores?

So why more cores? Well, I can't answer that definitively. You'd have to ask the folks at Intel and AMD. But you can see above that, with modern CPUs, at some point it becomes impractical to increase clock speed.

Yes, multicore also increases the power required and the heat to dissipate. But it neatly sidesteps the transmission time and switching delay issues. And, as you can see from the graph, you can easily double the number of cores in a modern CPU within the same thermal budget as a 25% increase in clock speed.

Some people have pushed clock speeds anyway - the current overclocking world record is just shy of 9 GHz. But it is a significant engineering challenge to do so while keeping power consumption within acceptable bounds. The designers at some point decided that adding more cores to perform more work in parallel would provide a more effective boost to performance in most cases.

That's where the economics come in - it was likely cheaper (less design time, less complicated to manufacture) to go the multicore route. And it's easy to market - who doesn't love the brand new octa-core chip? (Of course, we know that multicore is pretty useless when the software doesn't make use of it...)

There is a downside to multicore: you need more physical space to put the extra cores. However, CPU process sizes keep shrinking, so there's plenty of space for two copies of a previous design - the real tradeoff is not being able to create larger, more complex single cores. Then again, increasing core complexity is a bad thing from a design standpoint - more complexity means more mistakes/bugs and manufacturing errors. We seem to have found a happy medium with efficient cores that are simple enough not to take too much space.

We've already hit a limit on the number of cores we can fit on a single die at current process sizes, and we might soon hit a limit on how far we can shrink things. So, what's next? Do we need more? That's difficult to answer, unfortunately. Anyone here a clairvoyant?


Other ways to improve performance

So, we can't increase the clock speed. And more cores have an additional disadvantage - namely, they only help when the software running on them can make use of them.

So, what else can we do? How are modern CPUs so much faster than older ones at the same clock speed?

Clock speed is really only a very rough approximation of the internal workings of a CPU. Not all components of a CPU work at that speed - some might operate once every two ticks, and so on.

What's more significant is the number of instructions you can execute per unit of time. This is a far better measure of just how much a single CPU core can accomplish. Instructions vary: some take one clock cycle, some take three. Division, for example, is considerably slower than addition.

So, we could make a CPU perform better by increasing the number of instructions it can execute per second. How? Well, you could make an instruction more efficient - maybe division now takes only two cycles. Then there's instruction pipelining. By breaking each instruction into multiple stages, it's possible to execute instructions "in parallel" - but each instruction still has a well-defined, sequential order with respect to the instructions before and after it, so it doesn't require software support the way multicore does.
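You can even see the pipeline at work from software: compare a loop whose additions form one long dependency chain against one that keeps several independent chains in flight. This is a rough demonstration sketch (results vary by compiler and CPU; build with low optimisation, e.g. gcc -O1, so the compiler doesn't restructure the loops itself):

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N 10000000L  /* 10 million floats (~40 MB), divisible by 4 */

/* One accumulator: every addition depends on the previous result,
 * so each one must wait for the last to finish its pipeline stages. */
static double sum_one_chain(const float *a) {
    double s = 0.0;
    for (long i = 0; i < N; i++) s += a[i];
    return s;
}

/* Four accumulators: four independent chains the pipeline can overlap. */
static double sum_four_chains(const float *a) {
    double s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    for (long i = 0; i < N; i += 4) {
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }
    return (s0 + s1) + (s2 + s3);
}

int main(void) {
    float *a = malloc(N * sizeof *a);
    if (!a) return 1;
    for (long i = 0; i < N; i++) a[i] = 1.0f;

    clock_t t0 = clock();
    double r1 = sum_one_chain(a);
    clock_t t1 = clock();
    double r2 = sum_four_chains(a);
    clock_t t2 = clock();

    printf("one chain:   sum=%.0f  %.3f s\n", r1, (double)(t1 - t0) / CLOCKS_PER_SEC);
    printf("four chains: sum=%.0f  %.3f s\n", r2, (double)(t2 - t1) / CLOCKS_PER_SEC);
    free(a);
    return 0;
}
```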

There is another way: more specialised instructions. We've seen things like SSE, which provides instructions to process large amounts of data at one time, and new instruction sets with similar goals are constantly being introduced. These, again, require software support and increase the complexity of the hardware, but they provide a nice performance boost. A recent example is AES-NI, which provides hardware-accelerated AES encryption and decryption, far faster than the same arithmetic implemented in software.
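For a flavour of what those instructions look like, here is a minimal sketch using SSE intrinsics: one addps instruction adds four pairs of floats at once (x86 only; SSE is enabled by default on x86-64):

```c
#include <stdio.h>
#include <xmmintrin.h>  /* SSE intrinsics */

int main(void) {
    float a[4] = { 1.0f,  2.0f,  3.0f,  4.0f};
    float b[4] = {10.0f, 20.0f, 30.0f, 40.0f};
    float r[4];

    __m128 va = _mm_loadu_ps(a);      /* load 4 floats into a 128-bit register */
    __m128 vb = _mm_loadu_ps(b);
    __m128 vr = _mm_add_ps(va, vb);   /* one instruction, four additions */
    _mm_storeu_ps(r, vr);             /* write the 4 results back to memory */

    printf("%g %g %g %g\n", r[0], r[1], r[2], r[3]);
    return 0;
}
```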


1 Not without getting quite deep into theoretical quantum physics, anyway.

2 It might actually be lower, since electrical field propagation isn't quite as fast as the speed of light in a vacuum. Also, that's just for straight-line distance - it's likely that there's at least one path that's considerably longer than a straight line.


Physics is physics. We can't keep packing more transistors into ever smaller spaces forever. At some point things get so small that you deal with weird quantum crap. At some point we can no longer pack twice as many transistors into the same area every couple of years (which is what Moore's law is about).

Raw clock speeds mean nothing. My old Pentium M was about half the clock speed of a contemporary desktop CPU (and yet in many respects faster) - and modern systems are barely approaching the clock speeds of systems from 10 years ago (and are clearly faster). Basically, 'just' bumping up the clock speed does not give real performance gains in many cases. It may help in some single-threaded operations, but you're better off spending the design budget on better efficiency everywhere else.

Multiple cores let you do two or more things at once, so you don't need to wait for one thing to finish before starting the next. In the shorter term, you can simply pop two existing cores into the same package (for example, the Pentium D and its multi-chip module, which was a transitional design) and you have a system that can do twice as much at once. Most modern implementations do share things like a memory controller, of course.

You can also build smarter in different ways. ARM does big.LITTLE - four 'weak' low-power cores working alongside four more powerful cores, so you have the best of both worlds. Intel lets you throttle down (for better power efficiency) or overclock specific cores (for better single-threaded performance). As I recall, AMD does something similar with modules.

You can also move things onto the CPU itself: memory controllers (so you have lower latency), IO-related functions (the modern CPU has no north bridge), and video (which is more important with laptops and all-in-one designs). It makes more sense to do these things than to 'just' keep ramping up the clock speed.

At some point 'more' cores may not work - though GPUs have hundreds of cores.

Multiple cores let computers work smarter in all of these ways.


Simple answer

The simplest answer to the question

Why doesn't "add more cores" face the same physical limitations as "make the CPU faster"?

is actually found within another part of your question:

I would expect the conclusion to be "therefore, we'll have to have bigger computers or run our programs on multiple computers."

In essence, multiple cores is like having multiple "computers" on the same device.

Complex answer

A "core" is the part of the computer that actually processes instructions (adding, multiplying, "and"ing, etc). A core can only execute a single instruction at one time. If you want your computer to be "more powerful" there are two basic things you can do:

  1. Increase throughput (increase clock rate, decrease physical size, etc.)
  2. Use more cores in the same computer

The physical limitations to #1 are primarily the need to dump the heat generated by processing and the speed at which a signal can traverse the circuit. Once you split off some of those transistors into a separate core, you alleviate the heat issue to a large degree.

There's an important limitation to #2: you have to be able to split your problem into multiple independent sub-problems, and then combine their answers. On a modern personal computer, this isn't really a problem, as there are loads of independent tasks all vying for computational time with the core(s) anyway. But for intensive computational problems, multiple cores only really help if the problem is amenable to concurrency, as sketched below.
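Here's a minimal sketch of that split-and-combine pattern using POSIX threads (an illustrative toy, not a recipe: summing a range of integers happens to decompose perfectly into independent slices, and many real problems don't). Compile with gcc -pthread:

```c
#include <pthread.h>
#include <stdio.h>

#define N_THREADS 4
#define N 400000000L  /* sum the integers 1..N */

struct job { long lo, hi; long long partial; };

/* Split: each thread sums its own slice, fully independently. */
static void *worker(void *arg) {
    struct job *j = arg;
    long long s = 0;
    for (long i = j->lo; i <= j->hi; i++) s += i;
    j->partial = s;
    return NULL;
}

int main(void) {
    pthread_t tid[N_THREADS];
    struct job jobs[N_THREADS];
    const long chunk = N / N_THREADS;

    for (int t = 0; t < N_THREADS; t++) {
        jobs[t].lo = t * chunk + 1;
        jobs[t].hi = (t == N_THREADS - 1) ? N : (t + 1) * chunk;
        pthread_create(&tid[t], NULL, worker, &jobs[t]);
    }

    /* Combine: wait for every thread, then add up the partial answers. */
    long long total = 0;
    for (int t = 0; t < N_THREADS; t++) {
        pthread_join(tid[t], NULL);
        total += jobs[t].partial;
    }
    printf("sum = %lld\n", total);  /* expect N*(N+1)/2 */
    return 0;
}
```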