Which CPUs support the MOVBE instruction?

This instruction was originally unique to the Intel® Atom™ processor.

From Intel's side:

The Intel® Compilers 11.0 allow you to target the Intel® Atom™ processor using the /QxSSE3_ATOM or -xSSE3_ATOM compiler options. These options enable the generation of the movbe instruction which is unique to the Intel® Atom™ processor.

In other microarchitectures (http://instlatx64.atw.hu/ with uop info from https://agner.org/optimize/):

  • Mainstream Intel: Haswell and later. Including Haswell Xeon (Ex-xxxx v3).
    Decodes as 2 or 3 uops, about the same as bswap + load or store.
  • Mainstream AMD: Excavator, and Ryzen-family. Steamroller and earlier don't have it.
    Decodes efficiently to a single uop.
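
For context on the "bswap + load or store" comparison above: MOVBE does a load or store and a byte swap in one instruction. A minimal sketch of the pattern it replaces is below. The helper name `load_be32` is hypothetical, and the code assumes GCC or Clang (for `__builtin_bswap32` and `__BYTE_ORDER__`). Those compilers may turn this into a single movbe when built with `-mmovbe`, but that depends on the compiler version and options.

```c
#include <stdint.h>
#include <string.h>

/* Load a 32-bit big-endian value from memory.
 * load_be32 is a hypothetical helper name, not from any library. */
static uint32_t load_be32(const void *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);      /* plain load, safe for unaligned p */
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
    v = __builtin_bswap32(v);     /* swap bytes on little-endian hosts */
#endif
    return v;
}
```

Without MOVBE, the same source compiles to a separate load and bswap, which matches the uop counts quoted for Haswell.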

Non-mainstream CPUs:

  • Legacy in-order Intel Atom: all
  • Intel Silvermont-family out-of-order Atom: all. Decodes efficiently to a single uop.
  • AMD Jaguar. Decodes efficiently to a single uop.

  • Intel Xeon Phi: Knights Landing (based on Silvermont) and later. (Maybe not on Knights Corner.)


It appears that all Atom processors support MOVBE; at any rate, the first and least capable (the Atom 230) does. (See e.g. http://www.linuxquestions.org/questions/linux-hardware-18/proc-cpuinfo-output-816192/ for evidence.) I don't believe any non-Atom Intel processors support MOVBE; at any rate, recent Core i7 processors appear not to (see e.g. http://www.techsupportforum.com/forums/f108/i7-running-on-3-of-8-threads-522063.html and search for "movbe" for evidence).

You can detect MOVBE support at runtime using CPUID.
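
As a rough sketch of that check: MOVBE is reported in CPUID leaf 1, ECX bit 22. The code below assumes GCC or Clang on x86, using `__get_cpuid` from `<cpuid.h>`. The function names `ecx_has_movbe` and `cpu_has_movbe` are made up for illustration, and on non-x86 targets the sketch simply reports no support.

```c
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>   /* GCC/Clang helper header, x86 only */
#endif

/* Test the MOVBE feature flag in a CPUID.01H ECX value (bit 22). */
static int ecx_has_movbe(unsigned int ecx)
{
    return (int)((ecx >> 22) & 1u);
}

/* Returns 1 if the running CPU reports MOVBE, 0 otherwise.
 * Hypothetical helper name; assumes GCC or Clang. */
static int cpu_has_movbe(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;                 /* CPUID leaf 1 not available */
    return ecx_has_movbe(ecx);
#else
    return 0;                     /* not an x86 target */
#endif
}
```

A caller would check `cpu_has_movbe()` once at startup and pick a code path accordingly. On Linux, the same flag shows up as `movbe` in the flags line of /proc/cpuinfo.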