When to use (or skip) INVLPG and MOV to CR3 to minimize TLB flushing

In the simplest possible terms, the requirement is that anything the CPU's TLB could have remembered that has since changed must be invalidated before anything that relies on the change happens.

The things that the CPU could have remembered include:

  • the final permissions for the page (the combination of read/write/execute permissions from the page table entry, page directory entry, etc); including whether the page is present or not (see the warning below)
  • the physical address of the page
  • the "accessed" and "dirty" flags
  • the flags that affect caching
  • whether it's a normal page or a large (2 or 4 MiB) page or a huge (1 GiB) page

WARNING: Because Intel CPUs don't remember "not present" pages, documentation from Intel may say that you don't need to invalidate when changing a page from "not present" to "present". Intel's documentation is only correct for Intel CPUs. It is not correct for all 80x86 CPUs. Some CPUs (mostly Cyrix) do remember when a page was "not present" and because of those CPUs you do have to invalidate when changing a page from "not present" to "present".

Note that due to speculative execution you cannot cut corners. For example, if you know a page has never been accessed, you can't assume its translation is not in the TLB, because the CPU may have fetched it speculatively.

I have chosen the words "before anything that relies on the change happens" very carefully. Modern CPUs (especially for long mode) do cache the higher level paging structures (e.g. PDPT entries) and not just the final pages. This means that if you change a higher level paging structure but the page table entries themselves remain the same, you still need to invalidate.

It also means that it is possible to skip the invalidation if nothing relies on the change. A simple example of this is with the accessed and dirty flags - if you aren't relying on these flags (to determine "least recently used" and which pages to send to swap space) then it doesn't matter much if the CPU doesn't realise that you've changed them. It is also possible (not recommended for single-CPU but very much recommended for multi-CPU) to skip the TLB invalidation in cases where using the old/stale TLB information would cause a page fault, and to let the page fault handler invalidate if and only if it's actually necessary.
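The lazy approach above can be sketched as a check inside the page fault handler. This is only an illustration under stated assumptions: the names `classify_fault` and `pte_permits` are hypothetical, `effective_flags` stands for the already-combined permissions from every paging level, CR0.WP is assumed to be set, and execute permissions (NX) are ignored.

```c
#include <stdbool.h>
#include <stdint.h>

/* Flag bits in a 32-bit x86 page table entry. */
#define PTE_PRESENT   0x1u
#define PTE_WRITABLE  0x2u
#define PTE_USER      0x4u

/* Bits in the page fault error code pushed by the CPU. */
#define PF_ERR_PRESENT 0x1u  /* set: protection violation; clear: not present */
#define PF_ERR_WRITE   0x2u  /* set: the access was a write */
#define PF_ERR_USER    0x4u  /* set: the access came from user mode */

typedef enum { FAULT_SPURIOUS, FAULT_REAL } fault_kind;

/* Would the access described by err be legal under the flags that are
 * in memory right now? (Assumes CR0.WP = 1; NX is not modelled.) */
static bool pte_permits(uint32_t effective_flags, uint32_t err)
{
    if (!(effective_flags & PTE_PRESENT))
        return false;
    if ((err & PF_ERR_WRITE) && !(effective_flags & PTE_WRITABLE))
        return false;
    if ((err & PF_ERR_USER) && !(effective_flags & PTE_USER))
        return false;
    return true;
}

/* If memory says the access is fine, the fault can only have come from a
 * stale TLB entry: the handler would INVLPG the faulting address (from CR2)
 * and return so the instruction is retried. Otherwise it is a real fault. */
static fault_kind classify_fault(uint32_t effective_flags, uint32_t err)
{
    return pte_permits(effective_flags, err) ? FAULT_SPURIOUS : FAULT_REAL;
}
```

This only covers changes that make a page more permissive (for example read-only to read/write); changes that remove permissions never fault on stale information, so they still need an explicit invalidation.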

In addition, "anything the CPU's TLB could have remembered" is a little tricky. Often an OS will map the paging structures themselves into the virtual address space to allow fast/easy access to them (e.g. the common "recursive mapping" trick where you pretend the page directory is a page table). In this case, when you change a page directory entry you need to invalidate the affected normal pages (as you'd expect) but you also need to invalidate anything the change affected in any of those mappings.
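To make the "extra" addresses concrete, here is a minimal sketch of the arithmetic for plain 32-bit paging (no PAE). It assumes the recursive entry sits in the last page directory slot (index 1023); that slot choice and the helper names are assumptions for illustration, not something the text above prescribes.

```c
#include <stdint.h>

/* Assumption: the page directory maps itself in its last entry. */
#define RECURSIVE_SLOT 1023u

/* Start of the 4 MiB region controlled by the page directory entry
 * that covers vaddr (the "normal pages" affected by a PDE change). */
static uint32_t region_base_for(uint32_t vaddr)
{
    return vaddr & ~0x3FFFFFu;
}

/* Virtual address at which the page table covering vaddr shows up
 * through the recursive mapping. Changing the PDE for vaddr changes
 * what this page refers to, so it needs invalidating as well. */
static uint32_t pt_window_for(uint32_t vaddr)
{
    return (RECURSIVE_SLOT << 22) | ((vaddr >> 22) << 12);
}

/* Virtual address of the individual page table entry for vaddr,
 * as seen through the recursive mapping (4 bytes per entry). */
static uint32_t pte_address_for(uint32_t vaddr)
{
    return (RECURSIVE_SLOT << 22) + (vaddr >> 12) * 4u;
}
```

For example, with these assumptions a change to the page directory entry covering 0x00400000 affects the region starting at `region_base_for(0x00400000)` and also the page at `pt_window_for(0x00400000)`, which is 0xFFC01000.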

As for which to use (INVLPG or reloading CR3), there are several issues. For a single page INVLPG will be faster. If you change a page directory entry (affecting 1024 pages or 512 pages, depending on which flavour of paging) then using INVLPG in a loop may or may not be more expensive than just reloading CR3 (it depends on the CPU/hardware, and on the access patterns of the code following the invalidation).
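One way to express that trade-off is a range flush with a cut-over point. The sketch below is hedged in several ways: `KERNEL_BUILD` is a hypothetical macro that you would define only for ring-0 code, the counting stand-ins exist purely so the logic can be compiled and exercised in user space, and the threshold of 32 pages is an arbitrary placeholder that would have to be measured on real hardware.

```c
#include <stddef.h>
#include <stdint.h>

#ifdef KERNEL_BUILD   /* hypothetical: define only when building ring-0 code */
static inline void tlb_invlpg(uintptr_t vaddr)
{
    __asm__ volatile("invlpg (%0)" : : "r"(vaddr) : "memory");
}
static inline void tlb_reload_cr3(void)
{
    uintptr_t cr3;
    __asm__ volatile("mov %%cr3, %0" : "=r"(cr3));
    __asm__ volatile("mov %0, %%cr3" : : "r"(cr3) : "memory");
}
#else
/* User-space stand-ins that only count what would have been executed. */
static size_t invlpg_count;
static size_t cr3_reload_count;
static inline void tlb_invlpg(uintptr_t vaddr) { (void)vaddr; invlpg_count++; }
static inline void tlb_reload_cr3(void) { cr3_reload_count++; }
#endif

#define PAGE_BYTES           4096u
#define FULL_FLUSH_THRESHOLD 32u   /* assumption: tune by measurement */

/* Invalidate `pages` consecutive 4 KiB pages starting at `start`.
 * Small ranges use INVLPG per page; large ranges reload CR3 instead.
 * Note that the CR3 path does not touch global pages. */
static void tlb_flush_range(uintptr_t start, size_t pages)
{
    if (pages > FULL_FLUSH_THRESHOLD) {
        tlb_reload_cr3();
        return;
    }
    for (size_t i = 0; i < pages; i++)
        tlb_invlpg(start + i * PAGE_BYTES);
}
```

A caller that just changed one page directory entry might use `tlb_flush_range(region_base, 1024)`, which takes the CR3 path under this placeholder threshold.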

There are two more issues that come into this. The first is task switching. When switching between tasks that use different virtual address spaces you have to change CR3. This means that if you change something that affects a large area (e.g. a page directory entry) you can improve overall performance by doing a task switch early, rather than reloading CR3 now (for invalidation) and then reloading CR3 again soon after (for the task switch). Basically, it's a "kill 2 birds with one stone" optimisation.

The other thing is "global pages". Typically there are pages that are the same in all virtual address spaces (e.g. the kernel). When you reload CR3 (e.g. during a task switch) you don't want the TLB entries for pages that remain the same to be invalidated for no reason, because that would hurt performance more than necessary. To fix that and improve performance, (for Pentium and later) there's a feature called "global pages" where you get to mark these common pages as global and they are not invalidated when you reload CR3. In that case, if you need to invalidate global pages you need to use either INVLPG or change CR4 (e.g. disable and then re-enable the global pages feature). For larger areas (e.g. changing a page directory entry and not just one page) it's the same as before (messing with CR4 may be faster or slower than INVLPG in a loop).
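The "disable and then re-enable" approach can be sketched as follows. As before, this is an illustration rather than a definitive implementation: `KERNEL_BUILD` is a hypothetical macro for ring-0 builds, the fake register is a user-space stand-in, and a real kernel would want interrupts disabled around the two writes so nothing runs while global pages are switched off.

```c
#include <stdint.h>

#define CR4_PGE ((uintptr_t)1 << 7)   /* Page Global Enable bit in CR4 */

#ifdef KERNEL_BUILD   /* hypothetical: define only when building ring-0 code */
static inline uintptr_t read_cr4(void)
{
    uintptr_t v;
    __asm__ volatile("mov %%cr4, %0" : "=r"(v));
    return v;
}
static inline void write_cr4(uintptr_t v)
{
    __asm__ volatile("mov %0, %%cr4" : : "r"(v) : "memory");
}
#else
/* User-space stand-in: a fake CR4 with PGE and one other bit set. */
static uintptr_t fake_cr4 = CR4_PGE | ((uintptr_t)1 << 5);
static unsigned cr4_writes;
static inline uintptr_t read_cr4(void) { return fake_cr4; }
static inline void write_cr4(uintptr_t v) { fake_cr4 = v; cr4_writes++; }
#endif

/* Flush the whole TLB, global entries included, by toggling CR4.PGE.
 * Returns 1 if the toggle was done, or 0 if global pages are not
 * enabled, in which case a plain CR3 reload already flushes everything. */
static int tlb_flush_all_including_global(void)
{
    uintptr_t cr4 = read_cr4();
    if (!(cr4 & CR4_PGE))
        return 0;
    write_cr4(cr4 & ~CR4_PGE);   /* changing PGE invalidates all entries */
    write_cr4(cr4);              /* restore the original CR4 value */
    return 1;
}
```

For a single global page, INVLPG on that address remains the cheaper option; the toggle is only worth considering when a large number of global mappings changed.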


To your first question:

  1. You can always use INVLPG and you can do any change possible. Use of INVLPG is always safe.
  2. Reloading CR3 does not invalidate global pages in the TLB. So for global pages you must use INVLPG, as reloading CR3 has no effect on them.
  3. INVLPG must be used for every page involved. If you are changing multiple pages at a time then there comes a point where reloading CR3 is faster than a long run of INVLPG instructions.
  4. Don't forget the Address Space Identifier extension on modern CPUs (called PCID, process-context identifiers, on x86-64), which changes what a CR3 reload invalidates.

To your second question:

A page that is not mapped can not be cached in the TLB (assuming you invalidated it correctly when you unmapped it previously). So any change from not-present does not need INVLPG or CR3 reloading.