Why does RAM (any type) access time decrease so slowly?

It's because it's easier and cheaper to increase the bandwidth of DRAM than to decrease its latency. To get the data out of an open row of RAM, a non-trivial amount of work is necessary.

The column address needs to be decoded, the muxes selecting which lines to access need to be driven, and the data needs to move across the chip to the output buffers. This takes a little bit of time, especially given that SDRAM chips are manufactured on a process tailored to high RAM densities rather than high logic speeds. To increase the bandwidth, say by moving to DDR (1, 2, 3, or 4), most of the logic can be either widened or pipelined and can operate at the same speed as in the previous generation. The only thing that needs to be faster is the I/O driver for the DDR pins.
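
To put numbers on this, here is a small sketch comparing DDR generations. The clock rates and CAS latencies below are typical retail figures rather than exact JEDEC values, so treat them as ballpark: the per-pin transfer rate keeps quadrupling while the absolute latency of a column access barely moves.

    /* Illustrative only: typical (approximate) DDR generations.
     * The CAS latency in clock cycles rises with each generation, so
     * the latency in nanoseconds stays roughly flat while the per-pin
     * transfer rate keeps climbing. */
    #include <stdio.h>

    int main(void) {
        struct { const char *name; double io_clock_mhz; int cas_cycles; } gen[] = {
            { "DDR-400",    200.0,  3 },   /* ~200 MHz I/O clock, CL3  */
            { "DDR2-800",   400.0,  6 },   /* ~400 MHz I/O clock, CL6  */
            { "DDR3-1600",  800.0, 11 },   /* ~800 MHz I/O clock, CL11 */
            { "DDR4-3200", 1600.0, 22 },   /* ~1600 MHz I/O clock, CL22 */
        };

        for (int i = 0; i < 4; i++) {
            double mt_per_s = 2.0 * gen[i].io_clock_mhz;                 /* DDR: 2 transfers per clock */
            double cas_ns = gen[i].cas_cycles / gen[i].io_clock_mhz * 1e3;
            printf("%-10s  %6.0f MT/s per pin, CAS latency ~%.2f ns\n",
                   gen[i].name, mt_per_s, cas_ns);
        }
        return 0;
    }

Running that prints roughly 15 ns of CAS latency for DDR-400 and about 13.75 ns for DDR4-3200, while the per-pin transfer rate goes from 400 to 3200 MT/s.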

By contrast, to decrease the latency the entire operation needs to be sped up, which is much harder. Most likely, parts of the RAM would need to be made on a process similar to that used for high-speed CPUs, increasing the cost substantially (the high-speed process is more expensive, plus each chip would need to go through two different processes).

If you compare CPU caches with RAM and hard disk/SSD, there's an inverse relationship between storage being large and storage being fast. An L1$ is very fast, but can only hold between 32 and 256 kB of data. The reason it is so fast is that it is small:

  • It can be placed very close to the CPU using it, meaning data has to travel a shorter distance to get to it
  • The wires on it can be made shorter, again meaning it takes less time for data to travel across it
  • It doesn't take up much area or many transistors, so making it on a speed-optimized process and using a lot of power per bit stored isn't that expensive

As you move up the hierarchy, each storage option gets larger in capacity, but also larger in area and farther away from the device using it, which means it has to get slower.
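
A quick way to see this effect from software is a pointer-chasing loop: each load depends on the previous one, so the measured time per load tracks the latency of whichever level of the hierarchy the working set happens to fit in. This is only a rough sketch (timer resolution, TLB misses and your exact cache sizes will all colour the numbers), but the trend is hard to miss:

    /* Pointer-chasing latency sketch. Build with: cc -O2 chase.c -o chase
     * Small working sets fit in cache and chase quickly; large ones
     * fall out to DRAM and each load takes tens of nanoseconds. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    static double now_sec(void) {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return ts.tv_sec + ts.tv_nsec * 1e-9;
    }

    int main(void) {
        /* Working sets from 16 KiB (fits in L1) up to 64 MiB (DRAM). */
        for (size_t kib = 16; kib <= 64 * 1024; kib *= 4) {
            size_t n = kib * 1024 / sizeof(size_t);
            size_t *next = malloc(n * sizeof(size_t));
            if (!next) return 1;

            /* Build one random cycle (Sattolo's algorithm) so every load
             * depends on the previous one and the prefetcher can't help. */
            for (size_t i = 0; i < n; i++) next[i] = i;
            for (size_t i = n - 1; i > 0; i--) {
                size_t j = rand() % i;
                size_t tmp = next[i]; next[i] = next[j]; next[j] = tmp;
            }

            size_t steps = 10 * 1000 * 1000, p = 0;
            double t0 = now_sec();
            for (size_t s = 0; s < steps; s++) p = next[p];
            double ns_per_load = (now_sec() - t0) / steps * 1e9;

            printf("%8zu KiB working set: %6.2f ns per load (p=%zu)\n",
                   kib, ns_per_load, p);   /* print p so the loop isn't optimized away */
            free(next);
        }
        return 0;
    }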


C_Elegans provides one part of the answer — it is hard to decrease the overall latency of a memory cycle.

The other part of the answer is that in modern hierarchical memory systems (multiple levels of caching), memory bandwidth has a much stronger influence on overall system performance than memory latency, and so that's where all of the latest development efforts have been focused.

This is true both in general computing, where many processes/threads run in parallel, and in embedded systems. For example, in the HD video work that I do, I don't care about latencies on the order of milliseconds, but I do need multiple gigabytes per second of bandwidth.
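
For a rough sense of scale, here is a back-of-the-envelope sketch. The 4-bytes-per-pixel frame format and the single pass per frame are my own simplifying assumptions; real pipelines read and write each frame several times, so the real figure is a multiple of what this prints:

    /* Bandwidth needed just to move uncompressed video frames once,
     * assuming 4 bytes per pixel. Multiply by the number of read/write
     * passes in a real processing pipeline. */
    #include <stdio.h>

    int main(void) {
        struct { const char *name; int w, h, fps; } fmt[] = {
            { "1080p60",      1920, 1080, 60 },
            { "2160p60 (4K)", 3840, 2160, 60 },
        };
        const double bytes_per_pixel = 4.0;

        for (int i = 0; i < 2; i++) {
            double bytes_per_sec = (double)fmt[i].w * fmt[i].h * fmt[i].fps * bytes_per_pixel;
            printf("%-13s  ~%.2f GB/s per stream, per pass\n",
                   fmt[i].name, bytes_per_sec / 1e9);
        }
        return 0;
    }

That works out to roughly 0.5 GB/s for a single 1080p60 stream and about 2 GB/s for 4K60, before any intermediate buffers or extra passes, which is why bandwidth rather than latency is the constraint here.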


I don't have that much insight, but I expect it is a bit of everything.

Economic

For the majority of computers/phones, the speed is more than enough. For faster data storage, SSDs have been developed. People can handle video/music and other speed-intensive tasks in (almost) real time, so there is not much need for more speed (except for specific applications like weather prediction, etc.).

Another reason is that to take advantage of very fast RAM, you also need a very fast CPU, and that comes with a lot of power usage. Since these parts tend to be used in battery-powered devices (like mobile phones), very fast RAM (and CPUs) are impractical there, which also makes them not economically worthwhile to produce.

Technical

As the feature size of chips/ICs decreases (now at the nanometre level), speed goes up, but not dramatically. The shrink is more often used to increase the amount of RAM, which is in higher demand (also an economic reason).

Fundamental

As an example (both are circuits): the easiest way to get more speed (the approach used by SSDs) is to just spread the load over multiple components; this way the 'processing' speeds add up. Compare reading from 8 USB sticks at the same time and combining the results with reading the data from a single USB stick sequentially, which takes 8 times as long.
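
Here is a rough sketch of the same idea on an ordinary PC, using threads as stand-ins for the eight sticks (the sizes and names are my own invention for illustration). Aggregate throughput grows with the number of workers until the shared memory bus saturates; with genuinely independent devices such as separate USB sticks or SSD flash channels, the scaling is closer to ideal. Note that the latency of any single access does not improve at all:

    /* Striping a fixed amount of work across N worker threads.
     * Build with: cc -O2 stripe.c -o stripe -lpthread */
    #include <pthread.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <time.h>

    #define TOTAL_BYTES (256u * 1024 * 1024)   /* 256 MiB of "data" */

    struct job { unsigned char *base; size_t len; };

    static void *fill_stripe(void *arg) {
        struct job *j = arg;
        memset(j->base, 0xAB, j->len);         /* stand-in for "reading one stick" */
        return NULL;
    }

    static double run(unsigned char *buf, int workers) {
        pthread_t tid[8];
        struct job jobs[8];
        struct timespec t0, t1;
        size_t stripe = TOTAL_BYTES / workers;

        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (int i = 0; i < workers; i++) {
            jobs[i].base = buf + (size_t)i * stripe;
            jobs[i].len = stripe;
            pthread_create(&tid[i], NULL, fill_stripe, &jobs[i]);
        }
        for (int i = 0; i < workers; i++) pthread_join(tid[i], NULL);
        clock_gettime(CLOCK_MONOTONIC, &t1);

        return (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) * 1e-9;
    }

    int main(void) {
        unsigned char *buf = malloc(TOTAL_BYTES);
        if (!buf) return 1;
        memset(buf, 0, TOTAL_BYTES);           /* fault the pages in once, up front */

        for (int workers = 1; workers <= 8; workers *= 2) {
            double sec = run(buf, workers);
            printf("%d worker(s): %.3f s, ~%.1f GB/s aggregate\n",
                   workers, sec, TOTAL_BYTES / sec / 1e9);
        }
        free(buf);
        return 0;
    }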