Do multi-core CPUs share the MMU and page tables?

Take a look at this scheme. This is high level view of all that there is in a single core on a Corei7 cpu. The picture has been taken from Computer Systems: A Programmer's Perspective, Bryant and Hallaron. You can have access to diagrams in here, section 9.21.

Computer Systems: A Programmer's Perspective, 2/E (CS:APP2e)Randal E. Bryant and David R. O'Hallaron, Carnegie Mellon University


TL;DR - There is a seperate MMU per CPU, but an MMU generally has several LEVELS of page tables and these may be shared.

For instance, on an ARM the top level (PGD or page global directory name used in Linux) covers 1MB of address space. In simple systems, you can map in 1MB sections. However, this normally points to a 2nd level table (PTE or page table entry).

One way to implement multi-CPU efficiently is to have a separate top level PGD per CPU. The OS code and data will be consistent between cores. Each core will have its own TLB and L1-cache; L2/L3 caches maybe shared or may not. The maintenance of data/code caches depend on whether they are VIVT or VIPT, but that is a side issue and shouldn't affect the use of MMU and multi-core.

The process or user portion of the 2nd level page tables remain the same per process; otherwise they would have different memory or you would need to synchronize redundant tables. Individual cores may have different sets of 2nd level page tables (different top level page table pointer) when they run different processes. If it is multi-threaded, and running on two CPUs then the top level table may contain the same 2nd level page table entries for the process. In fact, the entire top level page table maybe identical (but different memory) when two CPUs run the same process. If thread local data is implemented with an MMU a single entry could differ. However, thread local data is usually implemented in other ways due to TLB and cache issue (flushing/coherency).

The image below may help. The CPU, PGD, and PTE entries in the diagram are sort of like pointers.

Multi-cpu MMU

The dashed line is the only difference between running different processes and the same processes (multi-threading case) with the MMU; it is an alternate to the solid line running from the CPU2 PGD to the process B PTE or 2nd level page table. The kernel is always a multi-threaded CPU application.

When a virtual address is translated, different bit portions are indexes into each table. If a virtual address is not in the TLB, then the CPU must do a table walk (and fetch different table memory). So a single read of a process memory would result in three memory accesses (if the TLB wasn't present).

The access permission of the kernel code/data are obviously different. In fact, there will probably be other issues such as device memory, etc. However, I think the diagram should make it obvious how the MMU manages to keep multi-threaded memory the same.

It is entirely possible that an entry in the 2nd level table could be different per thread. However, this would incur a cost when switching threads on the same CPU so normally data for all 'thread locals' is mapped and some other way to select the data is used. Normally the thread local data is found via a pointer or index register (special per CPU) which is mapped/points to data inside the 'process' or user memory. 'Thread local data' is not isolated from other threads, so if you have a memory overwrite in one thread you could kill another threads data.


Sorry for previous answer. Deleted the answer.

TI PandaBoard runs on OMAP4430 Dual Cortex A9 Processor. It has one MMU per core. It has 2 MMU for 2 cores.

http://forums.arm.com/index.php?/topic/15240-omap4430-panda-board-armcortex-a9-mp-core-mmu/

The above thread provides the info.

In Addition , some more information on ARM v7

Each core has the following features:

  1. ARM v7 CPU at 600 MHz
  2. 32 KB of L1 instruction CACHE with parity check
  3. 32 KB of L1 data CACHE with parity check
  4. Embedded FPU for single and double data precision scalar floating-point operations
  5. Memory management unit (MMU)
  6. ARM, Thumb2 and Thumb2-EE instruction set support
  7. TrustZone© security extension
  8. Program Trace Macrocell and CoreSight© component for software debug
  9. JTAG interface
  10. AMBA© 3 AXI 64-bit interface
  11. 32-bit timer with 8-bit prescaler
  12. Internal watchdog (working also as timer)

The dual core configuration is completed by a common set of components:

  1. Snoop control unit (SCU) to manage inter-process communication, cache-2-cache and system memory transfer, cache coherency
  2. Generic interrupt control (GIC) unit configured to support 128 independent interrupt sources with software configurable priority and routing between the two cores
  3. 64-bit global timer with 8-bit prescaler
  4. Asynchronous accelerator coherency port (ACP)
  5. Parity support to detect internal memory failures during runtime
  6. 512 KB of unified 8-way set associative L2 cache with support for parity check and ECC
  7. L2 Cache controller based on PL310 IP released by ARM
  8. Dual 64-bit AMBA 3 AXI interface with possible filtering on the second one to use a single port for DDR memory access

Though all these are for ARM , it will provide general idea.


Answers here so far seem to be unaware of the existence of the Translation Lookaside Buffer (TLB), which is the MMU's way of converting the virtual addresses used by a process to a physical memory address.

Note that these days the TLB itself is a complicated beast with multiple levels of caching. Just like a CPU's regular RAM caches (L1-L3), you wouldn't necessarily expect it's state at any given instant to contain info exclusively about the currently running process but for that to be moved in piecemeal on demand; see the Context Switch section of the wikipedia page.

On SMP, all processors' TLBs need to keep a consistent view of the system page table. See e.g this section of the linux kernel book for one way of handling it.