Just what is a 'big database'?

One way to gauge it is by observing your test queries.

A small database is one where indexes don't matter.

A medium database is one where queries take longer than one second if you don't have an appropriate index in place.

A big database is one where queries often take hours to optimize, using a combination of query design, index modification, and many test cycles.


There isn't a threshold where a small database becomes medium or a medium database becomes large. Generally, when I hear these terms, I think of particular orders of magnitude in terms of total records being stored.

  • Small: Fits in a spreadsheet.
  • Medium: Fits in memory on a commodity server.
  • Large: Fits in a commodity cloud offering.
  • Very large: Fits only in a specialized environment with unusual storage, latency, or throughput characteristics.

As poster dkretz suggested, you could also think about it in terms of the properties each kind of database has. Categorizing it this way, I'd say:

  • Small: Performance is not a concern. Your queries run fine without making any special optimizations. You see only a marginal performance difference when using front-line enhancements like indexes.

  • Medium: Your database probably has one or more staff members who are assigned part-time to its maintenance and care. These people pay attention to the database's health; their primary administrative responsibility is to prevent unacceptable performance problems and minimize downtime.

  • Large: Probably has dedicated staff member(s) whose job is to work on the database and improve performance, as well as make sure that application changes don't cause schema breakage over the lifetime of the database. Metrics about the health and status of the database are monitored closely. Significant expertise is required to understand and perform optimizations.

  • Very large: The database stores vast amounts of information that must be readily accessible. Performance optimizations are absolutely required to wring every last ounce of speed out of each query; without them, the database would be much less usable or even impossible to use. The database may be using sophisticated or innovative replication or clustering techniques, pushing the boundaries of current technology.

Note that these are entirely subjective, and that someone may very well have a perfectly legitimate alternate definition of "large".
