Why is x86 ugly? Why is it considered inferior when compared to other architectures?

The main knock against x86 in my mind is its CISC origins - the instruction set contains a lot of implicit interdependencies. These interdependencies make it difficult to do things like instruction reordering on the chip, because the artifacts and semantics of those interdependencies must be preserved for each instruction.

For example, most x86 integer add & subtract instructions modify the flags register. After performing an add or subtract, the next operation is often to look at the flags register to check for overflow, sign bit, etc. If there's another add after that, it's very difficult to tell whether it's safe to begin executing the second add before the outcome of the first is known: even when its data operands are independent, it writes the same flags register, so the implicit dependency has to be untangled first.

On a RISC architecture, the add instruction would specify the input operands and the output register(s), and everything about the operation would take place using only those registers. This makes it much easier to decouple add operations that are near each other because there's no bloomin' flags register forcing everything to line up and execute single file.
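
To make the contrast concrete, here's a toy C model of the two styles of add (the names and structures are invented for illustration, not any real chip's internals). Every x86-style add writes the one shared flags word, so two adds always conflict on it even when their data registers are disjoint; the RISC-style add touches only the registers it names.

    #include <stdbool.h>
    #include <stdint.h>

    /* Toy x86-ish core: eight registers plus a single shared flags word. */
    typedef struct {
        uint32_t regs[8];
        bool cf, zf, sf, of;            /* the one global flags register */
    } ToyX86;

    static void x86_add(ToyX86 *c, int dst, int src) {
        uint32_t a = c->regs[dst], b = c->regs[src], r = a + b;
        c->cf = r < a;                       /* unsigned carry out */
        c->of = ((a ^ r) & (b ^ r)) >> 31;   /* signed overflow */
        c->zf = (r == 0);
        c->sf = r >> 31;
        c->regs[dst] = r;   /* note: every add clobbers cf/zf/sf/of too */
    }

    /* Toy RISC core: an add reads rs1/rs2 and writes rd, nothing else. */
    typedef struct { uint64_t regs[32]; } ToyRisc;

    static void risc_add(ToyRisc *c, int rd, int rs1, int rs2) {
        c->regs[rd] = c->regs[rs1] + c->regs[rs2];
    }

Two calls to risc_add on disjoint registers share no state and can be issued in either order; two calls to x86_add can never be blindly swapped, because both write the flags and something in between may read them.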

The DEC Alpha AXP chip, a MIPS-style RISC design, was painfully spartan in the instructions available, but the instruction set was designed to avoid implicit inter-instruction register dependencies. There was no hardware-defined stack register. There was no hardware-defined flags register. Even the return mechanism was software-defined - if you wanted to return to the caller, you had to work out how the caller was going to let you know what address to return to. This was usually defined by the OS calling convention. On the x86, though, it's defined by the chip hardware.
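
On a machine like that, "call" and "return" are just conventions over ordinary registers. A minimal sketch (the register number follows the usual Alpha convention; everything else here is invented):

    #include <stdint.h>

    /* Toy CPU: 32 general registers and a program counter, nothing else. */
    typedef struct { uint64_t regs[32]; uint64_t pc; } ToyCpu;

    enum { RA = 26 };   /* Alpha convention: $26 holds the return address */

    /* "Call": stash the return address in an ordinary register and jump.
       The hardware doesn't care which register the software picks. */
    static void jsr(ToyCpu *c, uint64_t target) {
        c->regs[RA] = c->pc + 4;    /* address of the instruction after the call */
        c->pc = target;
    }

    /* "Return": just an indirect jump through whatever register the
       calling convention says the return address lives in. */
    static void ret(ToyCpu *c) {
        c->pc = c->regs[RA];
    }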

Anyway, over 3 or 4 generations of Alpha AXP chip designs, the hardware went from being a literal implementation of the spartan instruction set with 32 int registers and 32 float registers to a massively out-of-order execution engine with 80 internal registers, register renaming, result forwarding (where the result of a previous instruction is forwarded to a later instruction that depends on the value) and all sorts of wild and crazy performance boosters. And with all of those bells and whistles, the AXP chip die was still considerably smaller than the comparable Pentium chip die of that time, and the AXP was a hell of a lot faster.
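
A minimal sketch of the renaming idea (the structures are invented, not the AXP's actual design): every write to an architectural register gets a fresh physical register, so two nearby writes to the same register name stop being an ordering constraint.

    #include <stdint.h>

    enum { ARCH_REGS = 32, PHYS_REGS = 80 };   /* 80 per the AXP figure above */

    typedef struct {
        int map[ARCH_REGS];   /* architectural register -> current physical register */
        int next_free;        /* trivial allocator for this sketch */
    } RenameTable;

    /* On a write, hand the destination a brand-new physical register.
       Instructions that already captured the old mapping still read the
       old physical register, so they can execute in any order. A real
       core recycles physical registers as instructions retire. */
    static int rename_dst(RenameTable *t, int arch_reg) {
        t->map[arch_reg] = t->next_free++;
        return t->map[arch_reg];
    }

    static int rename_src(const RenameTable *t, int arch_reg) {
        return t->map[arch_reg];   /* reads use the current mapping */
    }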

You don't see those kinds of performance leaps in the x86 family tree largely because the x86 instruction set's complexity makes many kinds of execution optimizations prohibitively expensive, if not impossible. Intel's stroke of genius was giving up on implementing the x86 instruction set directly in hardware - all modern x86 chips are actually RISC cores that, to a certain degree, interpret the x86 instructions, translating them into internal micro-operations that preserve all the semantics of the original x86 instruction but allow RISC-style out-of-order execution and other optimizations over those micro-ops.
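
As a rough sketch of what that translation looks like (the micro-op format here is invented; real decoders are far more involved), a memory-destination add might crack into three RISC-like internal ops:

    #include <stdint.h>

    typedef enum { UOP_LOAD, UOP_ADD, UOP_STORE } UopKind;
    typedef struct { UopKind kind; int dst, src; } Uop;

    /* Crack "add [addr_reg], src_reg" into load + add + store micro-ops,
       using a temporary rename register. A register-register add would
       emit just one micro-op. */
    static int crack_add_mem(Uop *out, int addr_reg, int src_reg, int tmp) {
        out[0] = (Uop){ UOP_LOAD,  tmp,      addr_reg };  /* tmp = [addr]  */
        out[1] = (Uop){ UOP_ADD,   tmp,      src_reg  };  /* tmp += src    */
        out[2] = (Uop){ UOP_STORE, addr_reg, tmp      };  /* [addr] = tmp  */
        return 3;   /* number of micro-ops emitted */
    }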

I've written a lot of x86 assembler and can fully appreciate the convenience of its CISC roots. But I didn't fully appreciate just how complicated x86 was until I spent some time writing Alpha AXP assembler. I was gobsmacked by AXP's simplicity and uniformity. The differences are enormous and profound.


A couple of possible reasons for it:

  1. x86 is a relatively old ISA (its progenitor was the 8086, after all)
  2. x86 has evolved significantly several times, but the hardware is required to maintain backwards compatibility with old binaries. For example, modern x86 hardware still contains support for running 16-bit code natively. Additionally, several memory-addressing models exist to allow older code to interoperate on the same processor: real mode, protected mode, virtual 8086 mode, and (on amd64) long mode. This can be confusing.
  3. x86 is a CISC machine. For a long time this meant it was slower than RISC machines like MIPS or ARM, because instructions carry implicit data interdependencies and flags effects that make most forms of instruction-level parallelism difficult to implement. Modern implementations translate the x86 instructions into RISC-like instructions called "micro-ops" under the covers to make these kinds of optimizations practical to implement in hardware.
  4. In some respects, the x86 isn't inferior, it's just different. For example, input/output is handled via memory mapping on the vast majority of architectures, but not on the x86, which has a separate I/O address space. (NB: Modern x86 machines typically have some form of DMA support, and communicate with most hardware through memory mapping; but the ISA still has I/O instructions like IN and OUT. A sketch of the two styles appears after this list.)
  5. The x86 ISA has very few architectural registers, which can force programs to round-trip through memory more frequently than would otherwise be necessary. The extra instructions needed to do this consume execution resources that could be spent on useful work, although efficient store-forwarding keeps the latency low. Modern implementations with register renaming onto a large physical register file can keep many instructions in flight, but the lack of architectural registers was still a significant weakness for 32-bit x86. x86-64's increase from 8 to 16 integer and vector registers is one of the biggest factors in 64-bit code being faster than 32-bit code (along with the more efficient register-call ABI), not the increased width of each register. A further increase from 16 to 32 integer registers would help some, but not as much. (AVX-512 does increase to 32 vector registers, though, because floating-point code has higher latency and often needs more constants.) (see comment)
  6. x86 assembly code is complicated because x86 is a complicated architecture with many features. An instruction listing for a typical MIPS machine fits on a single letter-sized piece of paper. The equivalent listing for x86 fills several pages, and the instructions just do more, so you often need a bigger explanation of what they do than a listing can provide. For example, the MOVSB instruction needs a relatively large block of C code to describe what it does:

    /* DF is the direction flag; SI and DI are the source and
       destination index registers. */
    if (DF == 0)
        *(uint8_t *)DI++ = *(uint8_t *)SI++;
    else
        *(uint8_t *)DI-- = *(uint8_t *)SI--;
    

    That's a single instruction doing a load, a store, and two adds or subtracts (controlled by a flag input), each of which would be separate instructions on a RISC machine.

    While the simplicity of MIPS (and similar architectures) doesn't necessarily make them superior, it makes sense to start with a simpler ISA when teaching an introductory assembly class. Some assembly classes teach an ultra-simplified subset of x86 called y86, which is simplified to the point of not being useful for real work (e.g. no shift instructions); others teach just the basic x86 instructions.

  7. The x86 uses variable-length opcodes, which complicate instruction decoding in hardware (see the encoding sketch after this list). In the modern era this cost is becoming vanishingly small as CPUs become limited more by memory bandwidth than by raw computation, but many "x86 bashing" articles and attitudes come from an era when this cost was comparatively much larger.
    Update 2016: Anandtech has posted a discussion regarding opcode sizes under x64 and AArch64.
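
On the port-I/O point in (4), the contrast looks roughly like this in C. The port number here is arbitrary, and the port functions are the Linux/glibc x86-only helpers (which need ioperm() privileges to actually run); the memory-mapped version is just ordinary loads and stores through a pointer.

    #include <stdint.h>
    #include <sys/io.h>              /* Linux, x86 only: inb/outb, ioperm */

    /* x86 style: a separate I/O address space with dedicated IN/OUT
       instructions (port 0x42 is made up for illustration). */
    static void port_io(void) {
        outb(0x80, 0x42);            /* OUT: write value 0x80 to port 0x42 */
        unsigned char v = inb(0x42); /* IN: read a byte back from the port */
        (void)v;
    }

    /* The style used by most other architectures: device registers are
       mapped into the ordinary address space, so plain volatile
       loads/stores talk to the device. */
    static void mmio(volatile uint32_t *device_reg) {
        *device_reg = 0x80;
        uint32_t v = *device_reg;
        (void)v;
    }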
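
And on the variable-length point in (7), here are a few real x86-64 encodings laid out as byte arrays. Lengths range from 1 byte up to 15, and the decoder can't know where one instruction ends until it has parsed it; on a fixed-width RISC (MIPS, Alpha, AArch64) every instruction is 4 bytes, so instruction boundaries are free.

    /* Real x86-64 encodings of three instructions, in hex bytes. */
    static const unsigned char enc_nop[] = { 0x90 };             /* nop: 1 byte */
    static const unsigned char enc_add[] = { 0x48, 0x01, 0xd8 }; /* add rax, rbx: 3 bytes */
    static const unsigned char enc_mov[] = { 0x48, 0xb8,         /* mov rax, imm64: */
        0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77, 0x88 };        /* 10 bytes total */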

EDIT: This is not supposed to be a bash-the-x86 party! I had little choice but to do some amount of bashing given the way the question's worded. But with the exception of (1), all these things were done for good reasons (see comments). Intel designers aren't stupid -- they wanted to achieve some things with their architecture, and these are some of the taxes they had to pay to make those things a reality.