Does a CPU completely freeze when using a DMA?

If there is a single memory interface, there would be hardware to arbitrate between requests. Typically a processor would be given priority over I/O without starving I/O, but even with I/O always having priority the processor would have some opportunities to access memory because I/O tends to have lower bandwidth demands and to be intermittent.
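As a sketch of that arbitration, here is a toy fixed-priority scheme in which the DMA always wins a collision but, being intermittent, requests the bus only occasionally. The cycle counts and request rate are made up purely for illustration; real arbiters are more sophisticated (round-robin grants, bandwidth limiting, etc.).

```python
def arbitrate(cpu_wants, dma_wants, dma_has_priority=True):
    """Grant the bus to at most one requester this cycle."""
    if dma_wants and dma_has_priority:
        return "dma"
    if cpu_wants:
        return "cpu"
    if dma_wants:
        return "dma"
    return None

def simulate(cycles=1000, dma_period=10):
    """Assume the CPU wants the bus every cycle, while the DMA
    (intermittent, lower-bandwidth) requests it only every
    `dma_period` cycles."""
    cpu_granted = 0
    for t in range(cycles):
        dma_wants = (t % dma_period == 0)
        if arbitrate(True, dma_wants) == "cpu":
            cpu_granted += 1
    return cpu_granted / cycles

print(simulate())  # 0.9: the CPU still gets 90% of the bus cycles
```

Even with I/O always winning arbitration, the CPU is only shut out on the fraction of cycles the DMA actually uses.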

In addition, there is typically more than one interface to memory. Higher-performance processors typically have caches, which provide a separate interface for most accesses. If DMA is not coherent, the caches do not even have to be snooped; even with snooping, the overhead would generally be small because of the bandwidth difference between cache and main memory (or, when the DMA transfers into the L3 cache, between the L3 and L1 caches). Microcontrollers will often fetch instructions from a separate flash-based memory, allowing fetch to proceed during DMA to on-chip memory, and often have tightly coupled memory with an independent interface (allowing many data accesses to avoid DMA conflicts).

Even with a single memory interface, the peak bandwidth will generally be higher than the bandwidth typically used. (For instruction fetch, even a small buffer that loads wider-than-average blocks from memory would allow instruction fetch to proceed from the buffer while another agent is using the memory interface, exploiting the tendency of code not to branch.)
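A sketch of that fetch buffer, with an arbitrary 4-instruction buffer width chosen purely for illustration: one wide memory access loads several sequential instructions, so the fetches in between hit the buffer and leave the memory interface free for another agent.

```python
def fetches_needing_memory(instructions, buffer_width=4):
    """Straight-line code: only every `buffer_width`-th instruction
    fetch actually goes out over the memory interface; the rest
    are served from the buffer."""
    return -(-instructions // buffer_width)  # ceiling division

print(fetches_needing_memory(100))  # 25 memory accesses for 100 instructions
```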

Also note that because a processor accesses both instructions and data, with a single memory interface there must already be a mechanism for arbitrating between instruction accesses and data accesses.

If the processor (with a single memory interface) were forced to implement a copy from an I/O device buffer to main memory, it would also have to fetch the instructions performing the copy. This could mean two memory accesses per word transferred even in an ISA with memory-memory operations (a load-store ISA could require three memory accesses, or more if post-increment addressing is not provided); that is in addition to the I/O access, which in old systems might share the same interface as main memory. A DMA engine does not access instructions in memory, and so avoids this overhead.
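To make the per-word arithmetic concrete, assuming a simple uncached load-store CPU; the instruction counts per iteration are illustrative, and the I/O-side read is counted separately:

```python
def cpu_accesses_per_word(has_post_increment=True):
    """Main-memory accesses per word for a programmed copy loop."""
    insn_fetches = 2  # fetch the load and the store instructions
    if not has_post_increment:
        insn_fetches += 2  # plus explicit address-increment instructions
    return insn_fetches + 1  # instruction fetches + one data store

def dma_accesses_per_word():
    # A DMA engine fetches no instructions; it only writes the data.
    return 1

print(cpu_accesses_per_word())                          # 3
print(cpu_accesses_per_word(has_post_increment=False))  # 5
print(dma_accesses_per_word())                          # 1
```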


You are correct that the CPU cannot be accessing the memory during a DMA transfer. However there are two factors which in combination allow apparent parallel memory access by the CPU and the device performing the DMA transfer:

  • The CPU takes multiple clock cycles to execute an instruction. Once it has fetched the instruction, which takes maybe one or two cycles, it can often execute the entire instruction without further memory access (unless it is an instruction which itself accesses memory, such as a mov instruction with an indirect operand).
  • The device performing the DMA transfer is significantly slower than the CPU, so the CPU will not need to halt on every instruction, just occasionally when the DMA device is accessing the memory.

In combination, these two factors mean that the device performing the DMA transfer will have little impact on the CPU speed.
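A back-of-envelope model of that combination: if memory requests from the CPU and the DMA device are independent, the CPU only stalls on a cycle where both want the bus at once. Both request rates below are invented; real figures depend on the workload and the device.

```python
def expected_slowdown(cpu_mem_fraction=0.3, dma_mem_fraction=0.05):
    """Fraction of cycles lost to contention, assuming independent,
    uniformly distributed requests and DMA priority on a collision."""
    return cpu_mem_fraction * dma_mem_fraction

# A CPU touching memory on 30% of cycles and a DMA device using the
# bus on 5% of cycles collide on only about 1.5% of cycles.
print(expected_slowdown())
```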

EDIT: Forgot to mention that there's also the factor of the CPU cache: as long as the code the CPU is executing is in the cache, it won't need to access real memory to fetch instructions, so a DMA transfer is not going to get in the way (although if an instruction needs to access memory then obviously a real memory access will take place, potentially having to wait for a break in the DMA device's use of the memory).


Since there is only one bus system, which is blocked by the memory access of the DMA, the CPU cannot work while the DMA is moving data and is therefore halted.

The idea behind this is the following:

If you want to copy consecutive data from memory, the CPU would have to do something like this:

Calculate address->read data->calculate new address (+ 1 word)->read data ...

The DMA controller, on the other hand, calculates the new address in parallel with the transfer (depending on the mode) and is therefore faster. So the DMA can (theoretically) run at full bus throughput.
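As a sketch of the difference, with an invented one-cycle cost for both an address calculation and a bus transfer:

```python
def cpu_copy_cycles(words, calc=1, transfer=1):
    # Serial per-word sequence: calculate the address, then move
    # the data word.
    return words * (calc + transfer)

def dma_copy_cycles(words, transfer=1):
    # The next address is computed while the bus is busy with the
    # current word, so transfers run back to back at full bus
    # throughput.
    return words * transfer

print(cpu_copy_cycles(100))  # 200 cycles
print(dma_copy_cycles(100))  # 100 cycles
```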