Why do IOPS matter?

Solution 1:

This is because most I/O activity doesn't occur as sequential transfers.

Random read/write operations are more representative of normal system activity, and those are usually bound by IOPS.

Streaming porn from one of my servers to our customers (or uploading to our CDN) is more sequential in nature and you'll see the impact of throughput there.

But maintaining the database that catalogs the porn and tracks user activity through the site is going to be random in nature, and limited by the number of small I/O operations/second that the underlying storage is capable of.

I may need 2,000 IOPS to be able to run the databases at peak usage, but may only see 30MB/s throughput at the disk level because of the type of activity. The disks are capable of 1200MB/s, but the IOPS are the limitation in the environment.
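
To see how those two numbers relate, here's a quick back-of-the-envelope calculation; the ~16 KiB average I/O size is my assumption, chosen only because it roughly reproduces the figures above.

```python
# Back-of-the-envelope arithmetic for the database example above.
# The 16 KiB average I/O size is an assumed figure for illustration.
iops = 2_000
avg_io_size_bytes = 16 * 1024   # assumed average size of a random I/O

throughput_mb_s = iops * avg_io_size_bytes / 1_000_000
print(f"{iops} IOPS x {avg_io_size_bytes // 1024} KiB = {throughput_mb_s:.0f} MB/s")
# -> roughly 33 MB/s, nowhere near the 1200 MB/s the disks can stream
```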

This is a way of describing the capacity potential of a storage system. An SSD may have the ability to do 80,000 IOPS and 600MB/s throughput. You can get that throughput with 6 regular 10k SAS disks, but they would only yield around 2,000 IOPS.

Solution 2:

Throughput

Throughput is useful when you're doing things like copying files. When you're doing almost anything else it's random reads and writes across the disk that will limit you.

IOPS

IOPS figures are typically quoted together with the size of each data packet. For example, AWS gp2 can do 10,000 IOPS with a 16KiB payload size, which multiplies out to roughly 160MiB/sec. However, it's probably unlikely that you'll use the full payload size all the time, so actual throughput will probably be lower. NB KiB is 1024 bytes, KB is 1000 bytes.

Because an IOPS figure specifies a packet size, it also gives you a total throughput figure, whereas a high throughput figure on its own doesn't tell you whether you have high IOPS.
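
As a sketch of that arithmetic, using the gp2 figures quoted above, and showing why the KiB/KB distinction matters:

```python
# Converting an IOPS figure plus I/O size into throughput,
# using the gp2 numbers quoted above (10,000 IOPS at 16 KiB).
iops = 10_000
io_size_bytes = 16 * 1024        # 16 KiB per operation

bytes_per_sec = iops * io_size_bytes
print(f"{bytes_per_sec / 1024**2:.1f} MiB/s")   # ~156.2 MiB/s (binary units)
print(f"{bytes_per_sec / 1000**2:.1f} MB/s")    # ~163.8 MB/s (decimal units)
# Both round to "about 160" - which is why the units matter when comparing specs.
```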

Scenarios

Consider these scenarios:

  • Booting your PC. Consider the difference between an SSD and a spinning disk in your computer, which is something many people have first-hand experience with. With a spinning disk the boot time can be a minute, whereas with an SSD it can come down to 10 - 15 seconds. This is because higher IOPS means lower latency when information is requested. The throughput of the spinning disk is quite good at around 150MB/sec, and though the SSD's is likely higher, that isn't why it's faster - it's the lower latency to return information.
  • Running an OS update. It's going all over the disk, adding and patching files. If you had low IOPS it would be slow, regardless of throughput.
  • Running a database, for example selecting a small amount of data from a large database. It will read from the index, read from a number of files, then return a result. Again it's going all over the disk to gather the information.
  • Playing a game on your PC. It likely loads a large number of textures from all over the disk, so in this case both IOPS and throughput matter.

LTO Tape

Consider for a moment a tape backup system. LTO6 can do 400MB/sec, but (I'm guessing here) probably can't manage even one random I/O operation per second - it could be as low as seconds per operation. On the other hand it can probably do a whole lot of sequential IOPS, if an I/O operation is defined as reading or writing a parcel of data to tape.

If you tried to boot an OS off tape it would take a long time, if it worked at all. This is why IOPS is often more helpful than throughput.

To understand a storage device you probably want to know whether its IOPS figure is for random or sequential access, and the I/O size it assumes. From that you can derive throughput.

AWS

Note that AWS does publish both IOPS and throughput figures for all of its storage types in its EBS documentation. General Purpose SSD (gp2) can do 10,000 16KiB IOPS, which gives a maximum of around 160MiB/sec. Provisioned IOPS (io1) can do 20,000 16KiB IOPS, which gives a maximum of around 320MiB/sec.

Note that with gp2 volumes you get 3 IOPS per GiB provisioned, so to get 10,000 IOPS you need a volume of roughly 3,334GiB (about 3.3TiB). I don't recall if io1 volumes have a similar limitation (it's been a while since I did the associate exams where that kind of thing is tested), but I suspect they do, with a higher IOPS-per-GiB ratio.
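
A quick sketch of that sizing calculation, using the 3 IOPS/GiB baseline and the 10,000 IOPS cap mentioned in this answer:

```python
import math

# gp2 baseline scaling as described above: 3 IOPS per GiB, capped at 10,000 IOPS.
IOPS_PER_GIB = 3
MAX_GP2_IOPS = 10_000

def gp2_volume_gib_for(target_iops: int) -> int:
    """Smallest gp2 volume (in GiB) whose baseline meets target_iops."""
    if target_iops > MAX_GP2_IOPS:
        raise ValueError("gp2 tops out at 10,000 IOPS; look at io1 instead")
    return math.ceil(target_iops / IOPS_PER_GIB)

print(gp2_volume_gib_for(10_000))   # -> 3334 GiB, roughly 3.3 TiB
```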

Conclusion

High sequential throughput is useful, and in some cases is the limiting factor to performance, but high IOPS is likely to be more important in most cases. You do still of course need reasonable throughput regardless of IOPS.


Solution 3:

While ewwhite's answer is completely correct, I wanted to provide some more concrete numbers to help put the difference in perspective.

As ewwhite already correctly stated, most non-streaming applications primarily perform non-sequential disk operations, which is why IOPS matter in addition to theoretical peak throughput.

When a coworker and I first installed SSDs in our development systems to replace the HDDs we'd previously been using, we ran some performance measurements on them that really highlighted why this matters:

SATA HDD Results:

Sequential Read Throughput: ~100 MB/s
Non-Sequential Read Throughput (2k blocks, IIRC): ~1 MB/s

PCIe-attached SSD Results:

Sequential Read Throughput: ~700 MB/s
Non-Sequential Read Throughput (2k blocks, IIRC): ~125 MB/s

As you can clearly see from the example, just listing a max throughput for each device would give an extremely inaccurate picture of how they compare. The SSD is only about 6-7x as fast as the HDD when reading large files sequentially, but it's over 100x as fast when reading small chunks of data from different parts of the disk. Of course, this limitation on the HDD is largely due to the fact that it must physically move the r/w head to the desired track and then wait for the desired data to spin under the head, while an SSD has no moving parts.

Our compile times improved much more dramatically than a simple comparison of the maximum throughputs would have suggested. Builds that previously took over 30 minutes now finished in about a minute, since the disk I/O during a large build consists of reading and writing lots of separate source files which aren't individually very large and may be scattered physically all over the disk.

By providing both throughput and IOPS numbers, you can get a much better idea of how a given workload will perform on a given storage device. If you're just streaming large amounts of data that isn't fragmented, you'll get pretty close to the maximum throughput. However, if you're doing a lot of small reads and/or writes that are not stored sequentially on the disk, you'll be limited by IOPS.
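
For anyone who wants to reproduce a comparison like the one above, here's a minimal sketch of a sequential-vs-random read test. The file path, block size, and read count are placeholders, and a real benchmark would also need to deal with the OS page cache (e.g. by dropping caches or using a file much larger than RAM).

```python
import os
import random
import time

TEST_FILE = "testfile.bin"     # placeholder: a large pre-existing file
BLOCK_SIZE = 2 * 1024          # 2 KiB blocks, as in the measurements above
NUM_RANDOM_READS = 10_000

def sequential_read(path: str, block_size: int) -> float:
    """Read the whole file front to back, return MB/s."""
    total = 0
    start = time.perf_counter()
    with open(path, "rb", buffering=0) as f:
        while chunk := f.read(block_size):
            total += len(chunk)
    return total / (time.perf_counter() - start) / 1_000_000

def random_read(path: str, block_size: int, count: int) -> float:
    """Read `count` blocks at random offsets, return MB/s."""
    size = os.path.getsize(path)
    total = 0
    start = time.perf_counter()
    with open(path, "rb", buffering=0) as f:
        for _ in range(count):
            f.seek(random.randrange(0, size - block_size))
            total += len(f.read(block_size))
    return total / (time.perf_counter() - start) / 1_000_000

print(f"sequential: {sequential_read(TEST_FILE, BLOCK_SIZE):.1f} MB/s")
print(f"random:     {random_read(TEST_FILE, BLOCK_SIZE, NUM_RANDOM_READS):.1f} MB/s")
```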


Solution 4:

To perform an I/O operation the drive must go through a series of steps. For a mechanical hard drive it needs to:

  1. Seek to the right track and select the right head.
  2. Wait for the platter to rotate to the right position.
  3. Actually transfer the data.

The time taken for step 3 depends on the size of the block of data, but the time taken for steps 1 and 2 is independent of the size of the request.

The headline throughput and IOPS figures represent extreme cases. The headline throughput figure represents the case where each operation involves a large block of data, so the drive spends most of its time actually moving data.

The headline IOPS figure represents the case where the blocks of data are very small, so the majority of time is spent seeking the heads and waiting for the platters to rotate.

For many workloads the blocks are sufficiently small that the number of blocks to be transferred is far more important than the size of the blocks.
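
A small model of those three steps makes the point concrete. The seek time, rotational speed, and transfer rate below are assumed ballpark figures for a 7,200 rpm drive, not measurements.

```python
# Toy model of a single random I/O on a mechanical drive:
#   service time = seek + rotational latency + transfer
# The drive parameters are assumed ballpark figures, not real measurements.
AVG_SEEK_S = 0.009                 # ~9 ms average seek
AVG_ROTATION_S = 0.5 * 60 / 7200   # half a revolution at 7,200 rpm (~4.2 ms)
TRANSFER_MB_S = 150                # sustained sequential transfer rate

def effective_throughput_mb_s(block_kb: float) -> float:
    transfer_s = (block_kb / 1024) / TRANSFER_MB_S
    service_s = AVG_SEEK_S + AVG_ROTATION_S + transfer_s
    return (block_kb / 1024) / service_s

for block_kb in (4, 64, 1024, 16 * 1024):
    print(f"{block_kb:>6} KiB blocks -> {effective_throughput_mb_s(block_kb):6.1f} MB/s")
# Tiny blocks are dominated by seek + rotation (a fraction of 1 MB/s, i.e. IOPS-bound);
# only very large blocks get anywhere near the headline 150 MB/s.
```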


Solution 5:

There are two types of bottleneck that you can experience on IO volumes (or IO in general in fact).

Actual performance does include a component based on the volume of data moved, scaled by the available bandwidth or similar (unitcost * size), but there is also a per-request overhead that is constant, whether that's disk, network, or numerous other things.

cost = unitcost * size + overhead: the equation of a line.

If the unit cost or the size is large, then it makes sense to charge based on that volume, as mobile phone networks do; on the other hand, sometimes the overheads are far more critical.
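
A quick illustration of that line, comparing the same shape of workload as the experiment below: one big transfer versus a huge number of tiny ones. The per-request overhead and bandwidth are made-up values purely to show the effect.

```python
# cost = unitcost * size + overhead, applied per request.
# Both parameters are invented purely to illustrate the shape of the effect.
OVERHEAD_PER_REQUEST_S = 0.005   # 5 ms fixed cost per request (assumed)
BANDWIDTH_MB_S = 100             # 100 MB/s once data is flowing (assumed)

def total_time_s(total_mb: float, request_size_mb: float) -> float:
    requests = total_mb / request_size_mb
    return requests * OVERHEAD_PER_REQUEST_S + total_mb / BANDWIDTH_MB_S

print(total_time_s(1000, request_size_mb=1000))    # one 1 GB request: ~10 s
print(total_time_s(100, request_size_mb=0.0001))   # a million 100 B requests: ~5000 s
```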

You can do a simple experiment of this yourself: create a directory with a few 1GB files (or whatever is practical - something large enough that it takes several seconds to read/write), and then create a folder with a million 100-byte files (note that's only 0.1GB of data). Then see what happens to your throughput when you try to move all this stuff, say between different partitions/disks - you'll find performance throttled by throughput for the large files, and throttled by the number of files for the smaller stuff.
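
A rough sketch of setting up that experiment is below; the directory names and counts are placeholders, and creating a million files takes a while, so you may want to scale the numbers down.

```python
import os

BIG_DIR, SMALL_DIR = "big_files", "small_files"   # placeholder directories
os.makedirs(BIG_DIR, exist_ok=True)
os.makedirs(SMALL_DIR, exist_ok=True)

# A few 1 GiB files: bulk data that favours raw throughput.
for i in range(3):
    with open(os.path.join(BIG_DIR, f"big_{i}.bin"), "wb") as f:
        for _ in range(1024):                 # 1024 x 1 MiB = 1 GiB
            f.write(os.urandom(1024 * 1024))

# A million 100-byte files: only ~0.1 GB of data, but a million operations.
for i in range(1_000_000):
    with open(os.path.join(SMALL_DIR, f"small_{i}.bin"), "wb") as f:
        f.write(os.urandom(100))

# Now time copying each directory to another disk/partition, e.g.
#   time cp -r big_files /other/disk/
#   time cp -r small_files /other/disk/
```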

I would assume Amazon is aware of both charging models and has simply found that one better represents its infrastructure's capabilities.

There is a limit on the size of a single I/O operation anyway, broadly related to the amount the store can transfer in a "cycle", so large requests still end up costing you multiple IOPS.
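
In other words, something like the sketch below; the 256 KiB cap is an example figure I've assumed, so check your provider's documentation for the real one.

```python
import math

MAX_IO_BYTES = 256 * 1024   # example cap on a single I/O operation (assumed)

def iops_consumed(request_bytes: int) -> int:
    """A request larger than the cap is split into multiple I/O operations."""
    return max(1, math.ceil(request_bytes / MAX_IO_BYTES))

print(iops_consumed(4 * 1024))       # 4 KiB read -> 1 I/O operation
print(iops_consumed(1024 * 1024))    # 1 MiB read -> 4 I/O operations
```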

There's a nice piece from Amazon themselves about IOPS and costing, and the 'savings' they pass on through optimisations:

I/O Characteristics and Monitoring

I haven't read it all, but it looks interesting if you're curious about this area.