How does the CPU communicate with the GPU before drivers are loaded?

"Are there some base instructions...?" Yes, exactly. All GPUs are required to implement one of several simple interfaces - they're too primitive to be called "instruction sets" - which platform firmware ("BIOS" or "UEFI") and drivers that are included with the OS know how to talk to. The usual choice of "simple interface" these days is "VGA" ("Video Graphics Array"), the register-level interface originally defined for the video cards of that standard. (Now 30+ years old!)

For example, if Device Manager on a Windows system identifies the graphics card as the "Microsoft Basic Display Adapter", the OS was unable to find a specific driver for the card and has loaded its generic, VGA-compatible driver instead.

Well, technically, Windows always loads that driver (so it can display e.g. the boot progress screens), then (fairly late in the boot) identifies and loads the "real" driver for your graphics card.

The VGA standard only supports a few low-res graphics modes and text modes, and does not involve what I'd call "running programs" or even "instructions" on the GPU. In essence, system firmware or the "base video driver" just puts it into the desired mode and then writes bits to a bitmap; bits in the bitmap directly correspond to pixels on the screen. Any arithmetic that has to be done to draw lines or curves is done in the CPU. This is a very low-level and slow way to make stuff show up on the screen, but it's enough for firmware displays and simple interactions with the firmware, for OS installations, early boot-progress screens, etc.
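To make "the CPU does all the arithmetic" concrete, here's a rough sketch in Python (a simulation, not real driver code). The 320x200, one-byte-per-pixel layout and the names `framebuffer`, `put_pixel` and `draw_line` are my own assumptions for illustration; real VGA modes pack pixels differently. The line is drawn with Bresenham's algorithm, and the "card" does nothing except hold the bytes.

```python
# Simulated frame buffer: 320x200, one byte per pixel (an assumed layout).
WIDTH, HEIGHT = 320, 200
framebuffer = bytearray(WIDTH * HEIGHT)

def put_pixel(x, y, colour):
    """Store one pixel value; off-screen coordinates are ignored."""
    if 0 <= x < WIDTH and 0 <= y < HEIGHT:
        framebuffer[y * WIDTH + x] = colour

def draw_line(x0, y0, x1, y1, colour):
    """Bresenham's line algorithm: every step is computed by the CPU."""
    dx, dy = abs(x1 - x0), -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx + dy
    while True:
        put_pixel(x0, y0, colour)
        if x0 == x1 and y0 == y1:
            break
        e2 = 2 * err
        if e2 >= dy:   # step horizontally
            err += dy
            x0 += sx
        if e2 <= dx:   # step vertically
            err += dx
            y0 += sy

# Draw a short diagonal in colour index 15.
draw_line(0, 0, 9, 9, 15)
```

Every pixel costs the CPU a few additions, comparisons and a memory write, which is why this approach is slow compared with handing the job to a real GPU driver.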

Wikipedia: Video Graphics Array


I'll try to clear up the "voodoo" behind all this by explaining how the old hardware worked. Modern GPUs do not work like this internally, but they emulate the old CPU-to-graphics-card interface.

tl;dr

Graphics chips/cards in the 80s and early 90s had to produce output extremely quickly (relative to clock speed) so they did not execute instructions, but rather had fixed circuits. They just sucked data out of RAM as they went, so the CPU simply needed to dump data in RAM in the right place, and the graphics chip would pick it up and throw it on the screen. The CPU could also set various configuration variables on the graphics chip.

Details:

In the 80s, home computers had a really "dumb" graphics chip that had a few fixed behaviours. It will make the most sense if I go through the pipeline backwards.

CRT Monitors

These monitors needed analog inputs. In other words, higher voltage = brighter output. Colour monitors had 3 channels: red, green and blue (or, e.g., YUV or YIQ). These voltages adjusted the strength of an electron beam. Simple stuff.

CRT monitors literally used electromagnets to deflect the electron beam from left to right, then start again a little bit lower and go left to right, and so on from top to bottom. Then back to the top and repeat.

DAC

Graphics chips had a "digital to analog" converter (a very common electrical component). This converted digital values (e.g. 2, 4, or 8 bits) to voltages that could be supplied to the monitor.
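As a rough illustration of what the DAC does, here's a small Python model. It assumes a perfectly linear converter and a 0.7 V full-scale output (a typical level for analog RGB video, but treat it as an assumption here); `dac_voltage` is a made-up name, not anything from real hardware.

```python
def dac_voltage(value, bits, full_scale_volts=0.7):
    """Map an n-bit digital level linearly onto 0..full_scale_volts."""
    max_level = (1 << bits) - 1
    if not 0 <= value <= max_level:
        raise ValueError("value does not fit in the given bit width")
    return full_scale_volts * value / max_level

# A 2-bit channel can only produce four distinct brightness levels.
for level in range(4):
    print(level, round(dac_voltage(level, bits=2), 3), "V")
```

More bits per channel simply means more, finer voltage steps between "off" and "full brightness".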

Scanning

Graphics chips had to "keep up" with the electron beam, sending the right value to the DAC so that it could output the corresponding voltage at the right time. (Clocks were used for this which I won't go into.) There wasn't time to execute instructions here. Everything was hard-wired and took a small, fixed number of clock cycles.
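To get a feel for how little time there was, here's a back-of-the-envelope calculation in Python using the standard 640x480 VGA timing figures (25.175 MHz pixel clock, 800 pixel periods per scan line and 525 lines per frame, both including blanking). The variable names are just for this sketch.

```python
PIXEL_CLOCK_HZ = 25_175_000    # standard VGA dot clock for 640x480
PIXELS_PER_LINE_TOTAL = 800    # 640 visible + horizontal blanking
LINES_PER_FRAME_TOTAL = 525    # 480 visible + vertical blanking

ns_per_pixel = 1e9 / PIXEL_CLOCK_HZ
line_time_us = PIXELS_PER_LINE_TOTAL / PIXEL_CLOCK_HZ * 1e6
refresh_hz = PIXEL_CLOCK_HZ / (PIXELS_PER_LINE_TOTAL * LINES_PER_FRAME_TOTAL)

print(f"time per pixel: {ns_per_pixel:.1f} ns")
print(f"time per line:  {line_time_us:.1f} us")
print(f"refresh rate:   {refresh_hz:.2f} Hz")
```

That works out to roughly 40 ns per pixel, which is why the per-pixel work had to be hard-wired rather than programmed.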

Video modes

Early chips were not very fast and had limited RAM. Because of this, they tended to allow the selection of various modes and other configuration parameters, for example background colour, font selection, cursor location and size, palette selection, and sprites. Most offered a high-resolution "character-only" mode, and lower-resolution pixel-by-pixel modes.

The three noteworthy VGA modes are:

  • 16(ish) colour 80x25 text mode (this is essentially what a BIOS loading screen looks like)
  • 16 colour 640x480 high-res mode
  • 256-colour 320x200 high-colour mode
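A quick calculation shows how these modes trade resolution against colour to stay within limited video RAM. This is a sketch in Python; the helper names are made up, and the 2-bytes-per-cell figure for text mode assumes one character byte plus one colour byte per cell.

```python
def text_mode_bytes(columns, rows, bytes_per_cell=2):
    """One character byte plus one colour (attribute) byte per cell."""
    return columns * rows * bytes_per_cell

def bitmap_mode_bytes(width, height, bits_per_pixel):
    """Bits per pixel is log2 of the number of colours."""
    return width * height * bits_per_pixel // 8

print(text_mode_bytes(80, 25))         # 4000
print(bitmap_mode_bytes(640, 480, 4))  # 153600 (16 colours = 4 bits)
print(bitmap_mode_bytes(320, 200, 8))  # 64000  (256 colours = 8 bits)
```

Text mode is tiny because the chip expands each character into pixels on the fly using the font table, instead of storing every pixel.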

Painting pixels

Very roughly, depending on the graphics system, the pipeline looks something like this:

Current pixel location ⇒ Process character/font/sprite/pixel/config data ⇒ Pixel values ⇒ Palette ⇒ DAC

It's that 2nd step that needs to read from a couple of RAM locations. For instance, in Text Mode, a 1-byte character would be looked up. This would form an index into a font table. A bit would be looked up from this table, indicating whether that pixel should be the foreground or background colour. A third byte would be fetched to get that foreground/background colour. All in all, 3 bytes read from RAM.

But this "flow" is pretty much a set of simple fixed circuits that are arranged exactly like, well, the flow just described.
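Here's that text-mode lookup written out as a Python simulation. It's only a sketch under some assumptions of mine: 8x8 glyphs (real VGA text fonts are usually taller), a one-glyph font table, and an attribute byte split into a foreground nibble and a background nibble (ignoring the blink bit). The real chip does this with fixed circuits, not code.

```python
FONT_WIDTH, FONT_HEIGHT = 8, 8   # assumed glyph size
COLUMNS, ROWS = 80, 25

# Hypothetical font table: one byte per glyph row, MSB = leftmost pixel.
font = {ord("A"): bytes([0x18, 0x24, 0x42, 0x7E, 0x42, 0x42, 0x42, 0x00])}

# Text buffer: alternating character byte and attribute byte per cell.
text_buffer = bytearray(COLUMNS * ROWS * 2)
text_buffer[0] = ord("A")
text_buffer[1] = 0x1F            # foreground 15 (white) on background 1 (blue)

def pixel_colour(x, y):
    """Return the palette index for screen pixel (x, y) in text mode."""
    cell = ((y // FONT_HEIGHT) * COLUMNS + (x // FONT_WIDTH)) * 2
    char = text_buffer[cell]                       # read 1: character code
    glyph = font.get(char, bytes(FONT_HEIGHT))     # unknown glyphs are blank
    glyph_row = glyph[y % FONT_HEIGHT]             # read 2: font table byte
    attribute = text_buffer[cell + 1]              # read 3: colour byte
    bit = (glyph_row >> (FONT_WIDTH - 1 - x % FONT_WIDTH)) & 1
    return attribute & 0x0F if bit else attribute >> 4

# Render the top-left cell as text: '#' = foreground, '.' = background.
for y in range(FONT_HEIGHT):
    print("".join("#" if pixel_colour(x, y) == 15 else "."
                  for x in range(FONT_WIDTH)))
```

The three commented reads correspond to the three bytes mentioned above; the returned value would then go through the palette and on to the DAC.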

Memory bus interface

Intel CPUs have this annoying legacy thing called an IO bus but it's not important so I'll pretend it's not there.

CPUs access RAM by broadcasting a READ or WRITE request, and an address, on the memory bus. Although most of the valid addresses elicit a response from RAM, certain ranges are handled by devices instead. For instance, READing from a particular address might give you information about keyboard keypresses.

By writing to the right parts of the "graphics range", you can write both the screen content, and also set the graphics card configuration parameters. The "dumb" graphics chip doesn't execute any instructions. It just keeps plodding along, having a few bytes flowing through its circuits and outputting voltages.
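The idea of "certain address ranges are handled by devices" can be sketched as a toy bus in Python. The `Ram` and `Bus` classes are invented for illustration; the address ranges follow the classic PC layout, where the legacy video window sits at 0xA0000-0xBFFFF and colour text mode starts at 0xB8000.

```python
class Ram:
    """A block of byte-addressable memory (main RAM or video RAM)."""
    def __init__(self, size):
        self.cells = bytearray(size)

    def read(self, offset):
        return self.cells[offset]

    def write(self, offset, value):
        self.cells[offset] = value

class Bus:
    """Routes each address to whichever device claims that range."""
    def __init__(self):
        self.regions = []  # list of (start, end, device)

    def attach(self, start, size, device):
        self.regions.append((start, start + size, device))

    def _find(self, address):
        for start, end, device in self.regions:
            if start <= address < end:
                return device, address - start
        raise ValueError(f"nothing responds at {address:#x}")

    def read(self, address):
        device, offset = self._find(address)
        return device.read(offset)

    def write(self, address, value):
        device, offset = self._find(address)
        device.write(offset, value)

bus = Bus()
main_ram = Ram(0xA0000)    # conventional memory below the video window
video_ram = Ram(0x20000)   # video window, 0xA0000-0xBFFFF
bus.attach(0x00000, 0xA0000, main_ram)
bus.attach(0xA0000, 0x20000, video_ram)

TEXT_BASE = 0xB8000        # start of the colour text-mode buffer
bus.write(TEXT_BASE, ord("H"))   # character byte
bus.write(TEXT_BASE + 1, 0x07)   # attribute byte: light grey on black
```

From the CPU's point of view the two writes at the end are ordinary memory writes; they just happen to land in memory that the graphics chip is continuously scanning.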

With VGA, the RAM is actually on the graphics card itself, because you can configure the graphics card to pre-process data before it gets written to graphics RAM, to boost performance in some situations.

VESA

Graphics cards after VGA offered higher resolutions and greater colour depth but worked on similar principles. Many modern graphics cards still provide compatibility with these VESA modes to allow higher resolutions during booting. But VGA is the "foolproof" one that practically every card will emulate.