What is the purpose of CS and IP registers in Intel 8086 assembly?

The next instruction to be executed is the one at the memory address:

16 * CS + IP

This allows a 20-bit address space (1 MiB) to be addressed, despite registers being only 16 bits wide. It also means that most addresses can be encoded by many different CS:IP pairs.
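To get a feel for how many encodings one address can have, here is a minimal Python sketch. The helper names `physical_address` and `aliases` are made up for this illustration, and truncating to 20 bits assumes the plain 8086 with its 20 address lines.

```python
def physical_address(segment, offset):
    """Real-mode address: 16 * segment + offset, truncated to 20 bits."""
    return (16 * segment + offset) & 0xFFFFF


def aliases(address):
    """All (segment, offset) pairs with 16 * segment + offset == address.

    Wraparound past 1 MiB is ignored to keep the sketch simple.
    """
    pairs = []
    for segment in range(0x10000):
        offset = address - 16 * segment
        if 0 <= offset <= 0xFFFF:
            pairs.append((segment, offset))
    return pairs


# 0x07C0:0x0000 and 0x0000:0x7C00 both name the address where the BIOS
# loads a boot sector.
print(hex(physical_address(0x07C0, 0x0000)))
print(hex(physical_address(0x0000, 0x7C00)))
print(len(aliases(0x7C00)))
```

Both `print(hex(...))` lines show the same address, and the last line counts how many segment:offset pairs reach it.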

The effect of CS is analogous to that of the other segment registers. E.g., DS offsets data accesses (that don't specify another segment register) by 16 * DS.

CS

The instructions that modify CS are:

  • ljmp (far jump)
  • lcall (far call), which pushes CS and IP to the stack, and then far jumps
  • lret (far return), which reverses a far call
  • int, which reads the new IP / CS from the Interrupt Vector Table
  • iret, which reverses an int
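The relationship between the far jump, far call and far return can be sketched with a toy Python model. The `Cpu` class and its method names are invented for this illustration and only track CS, IP and a stack. It assumes the usual x86 order in which a far call pushes CS first and then IP, and it takes the return IP as an explicit argument instead of decoding instructions.

```python
class Cpu:
    """Toy model holding only CS, IP and a stack; not a real emulator."""

    def __init__(self, cs, ip):
        self.cs = cs
        self.ip = ip
        self.stack = []

    def ljmp(self, new_cs, new_ip):
        # Far jump: load both registers at once.
        self.cs, self.ip = new_cs, new_ip

    def lcall(self, new_cs, new_ip, return_ip):
        # Far call: save the return address (CS first, then IP), then far jump.
        self.stack.append(self.cs)
        self.stack.append(return_ip)
        self.ljmp(new_cs, new_ip)

    def lret(self):
        # Far return: pop IP, then CS, undoing the far call.
        self.ip = self.stack.pop()
        self.cs = self.stack.pop()


cpu = Cpu(cs=0x0000, ip=0x7C00)
# A direct far call is 5 bytes long in 16-bit code, so it returns to 0x7C05.
cpu.lcall(0x1000, 0x0020, return_ip=0x7C05)
print(hex(cpu.cs), hex(cpu.ip))
cpu.lret()
print(hex(cpu.cs), hex(cpu.ip))
```

After `lret` the model is back in the original segment, at the instruction following the call.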

CS cannot be modified by mov like the other segment registers. GNU GAS 2.24 encodes it with the standard identifier for CS without complaining if you write:

mov %ax, %cs

but this leads to an invalid opcode exception when executed.

To observe the effect of CS, try adding the following to a boot sector and running it in QEMU as explained at https://stackoverflow.com/a/32483545/895245:

/* $1 is the new CS, $after1 the new IP. */
ljmp $1, $after1
after1:
/* Skip 16 bytes to make up for the CS == 1. */
.skip 0x10
mov %cs, %ax
/* cs == 1 */

ljmp $2, $after2
after2:
.skip 0x20
mov %cs, %ax
/* cs == 2 */

IP

Whenever an instruction is executed, IP automatically increases by the length of that instruction's encoding: this is why the program moves forward!
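As a rough sketch of that automatic increase, the Python snippet below walks over a few bytes of straight-line 16-bit code and records the IP before each instruction. The `LENGTHS` table and the `trace_ip` helper are made up for this illustration and only know the three opcodes listed. Jumps are not modelled.

```python
# Encoded lengths in bytes of a few 16-bit instructions, keyed by opcode.
LENGTHS = {
    0x90: 1,  # nop
    0xB8: 3,  # mov $imm16, %ax (opcode + 2-byte immediate)
    0xF4: 1,  # hlt
}


def trace_ip(code, ip=0):
    """Return the IP value before each instruction in straight-line code."""
    trace = []
    pos = 0
    while pos < len(code):
        trace.append(ip)
        length = LENGTHS[code[pos]]
        pos += length
        ip = (ip + length) & 0xFFFF  # IP is only 16 bits wide
    return trace


# nop; mov $0x1234, %ax; nop; hlt
code = bytes([0x90, 0xB8, 0x34, 0x12, 0x90, 0xF4])
print([hex(ip) for ip in trace_ip(code, ip=0x7C00)])
```

The IP steps by 1, then 3, then 1, matching the encoded length of each instruction.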

IP is modified by the same instructions that modify CS, and by the non-far versions of those instructions as well (more common case).

IP cannot be observed directly, so it is harder to play with it. Check this question for alternatives: Reading Program Counter directly


Since the 8086 processor uses 20-bit addressing, it can access 1 MB of memory, but its registers are only 16 bits wide. To fetch instructions from memory, it therefore combines the values in the code segment register (CS) and the instruction pointer (IP) to generate a physical address: the value of CS is shifted 4 bits to the left and then added to the value of IP.

EXAMPLE:

Value of CS: 1234h (hexadecimal)

Value of IP: 5678h

The value of CS after shifting 4 bits to the left is 12340h. Adding the value of IP gives 179B8h, which is the physical address.
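The same arithmetic can be checked with a few lines of Python (the variable names are just for this sketch):

```python
cs = 0x1234
ip = 0x5678

shifted = cs << 4        # CS moved 4 bits to the left
physical = shifted + ip  # add the offset held in IP

print(f"{shifted:05X}h + {ip:04X}h = {physical:05X}h")
```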


Since the Instruction Pointer (IP) is 16 bits wide, it can only address 64 KiB of code (2^16 bytes), which wasn't much even in the 80s. So to expand the address space there is a second register, CS, which selects the 64 KiB block that IP points into. If CS and IP were simply concatenated into one 32-bit register, they could address 2^32 bytes, i.e. 4 GiB, which is what you get on a processor that uses 32-bit addresses. On the 8086, however, CS is only shifted left by 4 bits before IP is added, so addresses are 20 bits wide and you can access 1 MiB of memory.
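To make the 20-bit figure concrete, this small Python sketch compares the largest value that 16 * CS + IP can produce with what fits on the address bus. It assumes the original 8086 with 20 address lines, where anything above that simply wraps around. The constant and variable names are illustrative only.

```python
ADDRESS_LINES = 20  # assumption: plain 8086 with a 20-bit address bus

highest_sum = 16 * 0xFFFF + 0xFFFF  # largest possible 16 * CS + IP
addressable = 1 << ADDRESS_LINES    # bytes reachable with 20 address lines
wrapped = highest_sum & (addressable - 1)

print(hex(highest_sum))  # slightly above 1 MiB
print(addressable)       # 1048576 bytes, i.e. 1 MiB
print(hex(wrapped))      # where FFFF:FFFF lands after wrapping to 20 bits
```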


The physical address is calculated from two parts: (i) the segment address and (ii) the offset address. The CS (code segment) register addresses the code segment of memory, i.e. the region of memory where the code is stored. The IP (instruction pointer) contains the offset within that code segment. Hence CS:IP points to the location of the code in memory, i.e. it is used to calculate the physical address.

Tags: X86, Intel, X86 16