How do assembly languages work?

Your CPU doesn’t execute assembly. The assembler converts it into machine code. This process depends on both the particular assembly language and the target computer architecture. Generally those go hand in hand, but you might find different syntax flavors for the same architecture (Intel/NASM vs. AT&T, for example), which all assemble to the same machine code.

A typical (MIPS) assembly instruction such as “And immediate”

andi $t, $s, imm

would become the 32-bit machine code word

0011 00ss ssst tttt iiii iiii iiii iiii

where s and t are numbers from 0–31 which name registers, and i is a 16-bit value. It’s this bit pattern that the CPU actually executes. The 001100 in the beginning is the opcode corresponding to the andi instruction, and the bit pattern that follows — 5-bit source register, 5-bit target register, 16-bit literal — varies depending on the instruction. When this instruction is placed into the CPU, it responds appropriately by decoding the opcode, selecting the registers to be read and written, and configuring the ALU to perform the necessary arithmetic.
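This field packing is mechanical enough to sketch in a few lines of Python. The layout below follows the encoding shown above; the register numbers for $t0 and $s1 come from the standard MIPS register numbering:

```python
def encode_andi(rs: int, rt: int, imm: int) -> int:
    """Pack `andi $rt, $rs, imm` into a 32-bit MIPS word.

    Layout: opcode(6) | rs(5) | rt(5) | immediate(16)
    """
    assert 0 <= rs < 32 and 0 <= rt < 32 and 0 <= imm < (1 << 16)
    ANDI_OPCODE = 0b001100
    return (ANDI_OPCODE << 26) | (rs << 21) | (rt << 16) | imm

# andi $t0, $s1, 0x00FF  --  $t0 is register 8, $s1 is register 17
word = encode_andi(rs=17, rt=8, imm=0x00FF)
print(f"{word:#010x}")   # 0x322800ff
print(f"{word:032b}")    # the bit pattern the CPU actually sees
```

An assembler does essentially this for every instruction in your source file, after first resolving labels and symbols into concrete numbers.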


The instructions in assembly code map to the actual instruction set and register names for the CPU architecture you're targeting. mov is an x86 instruction, and eax and the others are the names of (in this case general-purpose) registers defined in the Intel x86 reference manual.

Same thing for other architectures - the assembly code maps quite directly to the actual names of the operations as defined in the chip's specifications/documentation.

That mapping is far simpler than, for instance, compiling C code.


What you see there are mnemonics, which make it easy for a programmer to write assembly; it is not executable in mnemonic form, however. When you pass these assembly instructions through an assembler, they are translated into the machine code they represent, which is what the CPU and its various co-processors interpret and execute (the CPU generally breaks it down further into smaller units, called micro-ops).
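At its core, that translation step is a table lookup plus field packing. Here's a toy sketch (a real assembler also resolves labels, expressions, and addressing modes; the two byte sequences below are real x86 encodings):

```python
# A toy "assembler": it only knows two fixed x86 instructions, but it
# shows that assembling is translation, not execution.
ENCODINGS = {
    "mov eax, ebx": bytes([0x89, 0xD8]),  # MOV r/m32, r32 with ModRM 11 011 000
    "ret":          bytes([0xC3]),
}

def assemble(lines):
    # Look up each mnemonic line and concatenate the machine-code bytes.
    return b"".join(ENCODINGS[line.strip()] for line in lines)

code = assemble(["mov eax, ebx", "ret"])
print(code.hex())  # 89d8c3
```

Feed those three bytes to an x86 disassembler and you get the mnemonics back, which is the 1-to-1 correspondence people talk about.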

If you're curious as to how exactly it does that, well that's a long process, but this has all that information.

All the semantics, etc. are handled by the assembler, which checks for validity and integrity where possible (one can still assemble invalid code, however!). Assembly is still a low-level language: it has a 1-to-1 correspondence with the machine code it produces (except when using macro-based assemblers, but even then the macros expand to instructions that map 1-to-1).


Computers are basically built out of logic gates. Though this is an abstract idealization of the real physical machinery, it is close enough to the truth that we can believe it for now. At a very basic level, these things work just like true/false predicates. Or if you've ever played Minecraft, it works a lot like redstone. The field that studies how to put together logic gates to make interesting complex circuits, like computers, is called computer architecture. It is traditionally viewed as a mixture of computer science and electrical engineering.

The most basic logic gates are things like AND and OR, which take bits and apply a boolean operation to them. By creating feedback loops between logic gates, you can store memory. One standard memory circuit is called a latch (a clocked variant is the flip-flop); it is basically a little loop of wire together with some gates and power to keep it stable. Putting together multiple latches lets you create bit vectors, and these are called registers (which are what things like eax and ebx represent). There are also many other kinds of parts, like adders, multiplexers and so on, which implement various pieces of boolean logic. Here is a directory of some circuits:

http://www.labri.fr/perso/strandh/Teaching/AMP/Common/Strandh-Tutorial/Dir.html
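The feedback-loop idea can be simulated in a few lines. Below is a rough sketch of an SR latch built from two cross-coupled NOR gates (a NOR-based variant of the gate loop described above): once set, the loop holds its bit even after both inputs go back to 0.

```python
def step(s, r, q, qn):
    """One settling step of an SR latch: two cross-coupled NOR gates."""
    q  = int(not (r or qn))  # NOR gate driving Q
    qn = int(not (s or q))   # NOR gate driving ~Q
    return q, qn

def settle(s, r, q, qn, steps=4):
    # Let the feedback loop reach a stable state for the given inputs.
    for _ in range(steps):
        q, qn = step(s, r, q, qn)
    return q, qn

q, qn = 0, 1                 # latch starts out holding 0
q, qn = settle(1, 0, q, qn)  # pulse S: latch stores 1
q, qn = settle(0, 0, q, qn)  # inputs released: it *remembers* the 1
print(q)  # 1
q, qn = settle(0, 1, q, qn)  # pulse R: latch stores 0
print(q)  # 0
```

The "memory" here is nothing but each gate's output feeding the other gate's input; that's the entire trick behind a register bit.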

Your CPU is basically a bunch of these things stuck together, all built out of the same basic logic gates. The way your computer keeps executing instructions is that a special piece of machinery called a clock emits pulses at regular intervals. When your CPU's clock emits a pulse, it sets off a sequence of reactions in these logic gates that causes the CPU to execute an instruction. For example, when it reads an instruction that says "mov eax, ebx", what ends up happening is that the state of one of these registers (ebx) gets copied over to the state of another (eax) just in time before the next pulse comes out of the clock.
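As a very rough sketch of that cycle (the register names are real x86 names, but everything else here is a toy model, with Python dicts standing in for gates), each clock tick drives one fetch-decode-execute-writeback round:

```python
# Toy clocked CPU: each tick() is one clock pulse.
regs = {"eax": 0, "ebx": 42}
program = [("mov", "eax", "ebx")]  # pre-decoded form of "mov eax, ebx"

def tick(pc):
    op, dst, src = program[pc]   # fetch + decode
    if op == "mov":
        value = regs[src]        # read the source register
        regs[dst] = value        # write-back lands before the next pulse
    return pc + 1                # program counter advances

pc = tick(0)
print(regs["eax"])  # 42
```

In real hardware each of those steps is a wave of signals settling through gates between one clock edge and the next, not a line of code, but the sequencing is the same idea.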

Of course this is a gross oversimplification, but as a high-level picture it is essentially correct. The rest of the details take a while to explain, and there are a few things here that I glossed over (for example, in a real CPU multiple instructions sometimes get executed in a single clock cycle; due to register renaming, eax isn't always the same physical register; and due to out-of-order execution, the order in which instructions run sometimes gets shuffled around; and so on). However, it is definitely worth learning the whole story, since it is actually quite amazing (or at least I like to think so!). You would be doing yourself a great favor to go out and read up on this stuff, and maybe try building a few circuits of your own (using real hardware, a simulator, or even Minecraft!)

Anyway, hope that answers a bit of your question about what mov eax, ebx does.