Why do we have CPUs with all the cores at the same speeds and not combinations of different speeds?

This is known as heterogeneous multiprocessing (HMP) and is widely adopted by mobile devices. In ARM-based devices which implement big.LITTLE, the processor contains cores with different performance and power profiles, e.g. some cores run fast but draw lots of power (faster architecture and/or higher clocks) while others are energy-efficient but slow (slower architecture and/or lower clocks). This is useful because power usage tends to increase disproportionately as you increase performance once you get past a certain point. The idea here is to get performance when you need it and battery life when you don't.
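To put a rough number on "disproportionately": dynamic power in CMOS logic is commonly approximated as proportional to capacitance times voltage squared times frequency, and higher clocks generally need higher voltage. The sketch below assumes, purely for illustration, that voltage scales linearly with frequency. Real chips do not follow this exactly, and the function name is made up for this example.

```python
def relative_dynamic_power(freq_ratio, voltage_ratio=None):
    """Dynamic power relative to a baseline, using P ~ C * V^2 * f.

    Assumption (illustrative only): if no voltage ratio is given, voltage
    is taken to scale linearly with frequency.
    """
    if voltage_ratio is None:
        voltage_ratio = freq_ratio
    return voltage_ratio ** 2 * freq_ratio


for ratio in (0.5, 1.0, 1.5, 2.0):
    print(f"{ratio:.1f}x clock -> {relative_dynamic_power(ratio):.2f}x power")
```

Under that assumption, doubling the clock costs about eight times the power, while halving it cuts power to about an eighth. That is the trade the slow cores exploit.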

On desktop platforms, power consumption is much less of an issue so this is not truly necessary. Most applications expect each core to have similar performance characteristics, and scheduling processes for HMP systems is much more complex than scheduling for traditional SMP systems. (Windows 10 technically has support for HMP, but it's mainly intended for mobile devices that use ARM big.LITTLE.)

Also, most desktop and laptop processors today are not thermally or electrically limited to the point where some cores need to run faster than others even for short bursts. We've basically hit a wall on how fast we can make individual cores, so replacing some cores with slower ones won't allow the remaining cores to run faster.

While there are a few desktop processors that have one or two cores capable of running faster than the others, this capability is currently limited to certain very high-end Intel processors (as Turbo Boost Max Technology 3.0) and only involves a slight gain in performance for those cores that can run faster.

While it is certainly possible to design a traditional x86 processor with both large, fast cores and smaller, slower cores to optimize for heavily-threaded workloads, this would add considerable complexity to the processor design and applications are unlikely to properly support it.

Take a hypothetical processor with two fast Kaby Lake (7th-generation Core) cores and eight slow Goldmont (Atom) cores. You'd have a total of 10 cores, and heavily-threaded workloads optimized for this kind of processor may see a gain in performance and efficiency over a normal quad-core Kaby Lake processor. However, the different types of cores have wildly different performance levels, and the slow cores don't even support some of the instructions the fast cores support, like AVX. (ARM avoids this issue by requiring both the big and LITTLE cores to support the same instructions.)

Again, most Windows-based multithreaded applications assume that every core has the same or nearly the same level of performance and can execute the same instructions, so this kind of asymmetry is likely to result in less-than-ideal performance, or even crashes if an application uses instructions not supported by the slow cores. While Intel could modify the slow cores to add advanced instruction support so that all cores can execute all instructions, this would not resolve issues with software support for heterogeneous processors.
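As a minimal sketch of the instruction-set problem, the Python below works out which feature flags every core reports, which is the set an application could safely use no matter where it gets scheduled. It assumes the x86 Linux `/proc/cpuinfo` layout (`processor` and `flags` lines). The function names and the sample text are hypothetical, not taken from a real machine.

```python
def flags_by_cpu(cpuinfo_text):
    """Map each logical CPU number to its set of feature flags."""
    result = {}
    current = None
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank line or a line without "key: value"
        key = key.strip()
        if key == "processor":
            current = int(value.strip())
        elif key == "flags" and current is not None:
            result[current] = set(value.split())
    return result


def common_flags(cpuinfo_text):
    """Flags reported by every logical CPU, i.e. safe on any core."""
    per_cpu = flags_by_cpu(cpuinfo_text)
    return set.intersection(*per_cpu.values()) if per_cpu else set()


# Hypothetical sample: core 0 has AVX, core 1 does not.
SAMPLE = """\
processor\t: 0
flags\t\t: fpu sse2 avx avx2
processor\t: 1
flags\t\t: fpu sse2
"""
print(sorted(common_flags(SAMPLE)))
```

On a real Linux machine you would pass in `open("/proc/cpuinfo").read()`. On today's symmetric x86 processors every core should report the same flags, which is exactly the assumption applications rely on.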

A different approach to application design, closer to what you're probably thinking about in your question, would use the GPU to accelerate highly parallel portions of applications. This can be done using APIs like OpenCL and CUDA. As for a single-chip solution, AMD promotes hardware support for GPU acceleration in its APUs, which combine a traditional CPU and a high-performance integrated GPU on the same chip, under the name Heterogeneous System Architecture. This has not seen much industry uptake outside of a few specialized applications.

What you're asking is why current systems use symmetric multiprocessing rather than asymmetric multiprocessing.

Asymmetric multiprocessing was used in the old days, when a computer was enormous and housed in several units.

Modern CPUs are fabricated as one unit, on one die, where it is much simpler not to mix CPUs of different types, since they all share the same bus and RAM.

There is also the constraint of the clock that governs the CPU cycles and RAM access. Keeping everything in step becomes very difficult when mixing CPUs of different speeds. Clock-less experimental computers did exist and were even pretty fast, but the complexities of modern hardware imposed a simpler architecture.

For example, Sandy Bridge and Ivy Bridge cores can't be running at different speeds at the same time since the L3 cache bus runs at the same clock speed as the cores, so to prevent synchronization problems they all have to either run at that speed or be parked/off (link: Intel's Sandy Bridge Architecture Exposed). (Also verified in the comments below for Skylake.)

[EDIT] Some people have taken my answer to mean that mixing CPUs is impossible. For their benefit I state: mixing different CPUs is not beyond today's technology, but it is not done - "why not" is the question. As answered above, it would be technically complicated, therefore costlier, for little or no financial gain, so it does not interest the manufacturers.

Here are answers to some comments below:

Turbo boost changes CPU speeds so they can be changed

Turbo boost is done by speeding up the clock and changing some multipliers, which is exactly what people do when overclocking, except that the hardware does it for us. The clock is shared between cores on the same CPU, so this uniformly speeds up the entire CPU and all its cores.

Some phones have more than one CPU of different speeds

Such phones typically have a custom firmware and software stack associated with each CPU, more like two separate CPUs (or like a CPU and a GPU), and they lack a single view of system memory. This complexity is hard to program, so asymmetric multiprocessing was left in the mobile realm: it requires low-level, close-to-the-hardware software development, which general-purpose desktop operating systems shun. This is the reason that such configurations aren't found in the PC (except for CPU/GPU, if we stretch the definition enough).

My server with 2x Xeon E5-2670 v3 (12 cores with HT) currently has cores at 1.3 GHz, 1.5 GHz, 1.6 GHz, 2.2 GHz, 2.5 GHz, 2.7 GHz, 2.8 GHz, 2.9 GHz, and many other speeds.

A core is either active or idle. All cores that are active at the same time run at the same frequency. What you are seeing is just an artifact of either timing or averaging. I have also noticed that Windows does not park a core for long; it parks and unparks cores individually, far faster than the refresh rate of Resource Monitor. I don't know the reason for this behavior, but it is probably what lies behind the remark above.
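If you want to look at per-core readings yourself on Linux rather than in a monitoring tool, a sketch like the one below takes a single snapshot of what the kernel's cpufreq interface reports for each core. It assumes the usual sysfs layout (`/sys/devices/system/cpu/cpuN/cpufreq/scaling_cur_freq`, in kHz). The function name is made up for this example, and the values are what the kernel's cpufreq driver reports, which is not necessarily what the hardware is doing at that instant.

```python
import glob
import os
import re


def current_frequencies_khz(base="/sys/devices/system/cpu"):
    """Return {cpu_number: frequency_in_kHz} for CPUs exposing cpufreq."""
    freqs = {}
    pattern = os.path.join(base, "cpu[0-9]*", "cpufreq", "scaling_cur_freq")
    for path in glob.glob(pattern):
        # The first path component below base looks like "cpu12".
        match = re.search(r"cpu(\d+)", os.path.relpath(path, base))
        if match is None:
            continue
        try:
            with open(path) as handle:
                freqs[int(match.group(1))] = int(handle.read().strip())
        except (OSError, ValueError):
            continue  # core offline or value unreadable
    return freqs


if __name__ == "__main__":
    for cpu, khz in sorted(current_frequencies_khz().items()):
        print(f"cpu{cpu}: {khz / 1000:.0f} MHz")
```

Taking several snapshots in quick succession and comparing them gives a feel for how much of the variation comes from sampling time alone.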

Intel Haswell processors have integrated voltage regulators that enable individual voltages and frequencies for every core

Individual voltage regulation is not the same thing as individual clock speeds. Not all cores are identical - some are faster. Faster cores are given slightly less power, creating the headroom to boost the power given to weaker cores. Core voltage regulators will be set as low as possible while still maintaining the current clock speed. The Power Control Unit on the CPU regulates voltages and will override OS requests where necessary for cores that differ in quality. Summary: individual regulators are for making all cores operate economically at the same clock speed, not for setting individual core speeds.

Why do we not have variants with differing clock speeds? i.e. 2 'big' cores and lots of small cores.

It's possible that the phone in your pocket sports exactly that arrangement - the ARM big.LITTLE works exactly as you described. There it's not even just a clock speed difference, they can be entirely different core types - typically, the slower clocked ones are even "dumber" (no out-of-order execution and other CPU optimizations).

It's essentially a nice idea for saving battery, but it has its own shortcomings: the bookkeeping to move stuff between different CPUs is more complicated, the communication with the rest of the peripherals is more complicated and, most importantly, to use such cores effectively the task scheduler has to be extremely smart (and often has to "guess right").

The ideal arrangement is to run non-time-critical background tasks or relatively small interactive tasks on the "little" cores and wake the "big" ones only for big, long computations (where the extra time spent on the little cores ends up eating more battery) or for medium-sized interactive tasks, where the user feels sluggishness on the little cores.

However, the scheduler has limited information about the kind of work each task may be running, and has to resort to some heuristic (or external information, such as forcing some affinity mask on a given task) to decide where to schedule them. If it gets this wrong, you may end up wasting a lot of time/power to run a task on a slow core, and give a bad user experience, or using the "big" cores for low priority tasks, and thus wasting power/stealing them away from tasks that would need them.
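The "external information" route can be sketched with an affinity mask. The Python below uses `os.sched_getaffinity` and `os.sched_setaffinity`, which are only available on Linux, to restrict a process to a preferred set of cores. Which core numbers are the "little" ones is entirely device-specific, so the caller has to supply them. The helper name is hypothetical.

```python
import os


def pin_to_cores(preferred, pid=0):
    """Restrict a process to the preferred cores that are actually available.

    Returns the set of cores the process may run on afterwards.
    pid=0 means the calling process.
    """
    if not hasattr(os, "sched_setaffinity"):
        return set()  # platform has no affinity API (not Linux)
    available = os.sched_getaffinity(pid)
    target = available & set(preferred)
    if not target:
        return available  # none of the preferred cores exist; leave as-is
    os.sched_setaffinity(pid, target)
    return target
```

A background job might call `pin_to_cores({0, 1, 2, 3})` at startup, assuming cores 0 to 3 are the slow ones on that particular device. This only tells the scheduler where the task may run; it does nothing to help the scheduler decide on its own.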

Also, on an asymmetric multiprocessing system it's usually more costly to migrate tasks to a different core than it would be on an SMP system, so the scheduler generally has to make a good initial guess instead of trying to run on a random free core and moving it around later.

Intel's choice here is instead to have a smaller number of identical, intelligent, fast cores with very aggressive frequency scaling. When the CPU gets busy it quickly ramps up to the maximum clock speed, does the work as fast as it can and then scales back down to its lowest-power mode. This doesn't place any particular burden on the scheduler, and avoids the bad scenarios described above. Of course, even in low-clock mode these cores are "smart" ones, so they'll probably consume more than the low-clock "stupid" big.LITTLE cores.
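The trade-off between "finish fast, then idle" and "run slowly on a frugal core" comes down to energy being power multiplied by time. The toy model below compares the two over a fixed time window. Every number in it is made up for illustration, and the function is a hypothetical helper, not a measurement of any real chip.

```python
def energy_joules(work_units, speed, active_watts, idle_watts, window_s):
    """Energy used over a fixed window: run flat out, then sit idle.

    speed is in work units per second. Raises ValueError if the work
    cannot be finished within the window.
    """
    busy_s = work_units / speed
    if busy_s > window_s:
        raise ValueError("work does not fit in the window")
    return active_watts * busy_s + idle_watts * (window_s - busy_s)


# All numbers below are invented for the example.
fast = energy_joules(work_units=100, speed=100, active_watts=10.0,
                     idle_watts=0.5, window_s=10)
slow = energy_joules(work_units=100, speed=20, active_watts=3.0,
                     idle_watts=0.5, window_s=10)
print(f"fast core: {fast:.1f} J, slow core: {slow:.1f} J")
```

With these particular numbers the fast core comes out ahead (14.5 J against 17.5 J), but a lower active power for the slow core or a higher one for the fast core flips the result. That sensitivity is why neither approach wins everywhere.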
