What are some best practices and "rules of thumb" for creating database indexes?

As @David Aldridge mentioned, most databases perform many more reads than writes. In addition, appropriate indexes are often used even when performing INSERTs (to determine the correct place to insert).

The critical indexes under an unknown production workload are often hard to guess or estimate, and a set of indexes should not be treated as set-and-forget. Indexes should be monitored and altered as workloads change (that new killer report, for instance).

Nothing beats profiling; if you guess your indexes, you will often miss the really important ones.

As a general rule, if I have little idea how the database will be queried, I will create indexes on all foreign keys, profile under a workload (think UAT release), remove the indexes that are not being used, and create any important ones that are missing.
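To make the "index all foreign keys" starting point concrete, here is a minimal sketch that lists foreign key columns with no index leading on them. It assumes SQLite (via Python's standard `sqlite3` module) purely for illustration; the `customers`/`orders` schema and the helper name `unindexed_foreign_keys` are made up, and other RDBMSs expose this information through their own catalog views.

```python
import sqlite3

def unindexed_foreign_keys(conn):
    """Return (table, column) pairs for foreign key columns that no index starts with."""
    missing = []
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'")]
    for table in tables:
        # Collect the leading column of every index on this table.
        leading = set()
        for idx in conn.execute(f'PRAGMA index_list("{table}")').fetchall():
            cols = conn.execute(f'PRAGMA index_info("{idx[1]}")').fetchall()
            if cols:
                leading.add(cols[0][2])  # name of the first indexed column
        # fk[3] is the referencing ("from") column in this table.
        for fk in conn.execute(f'PRAGMA foreign_key_list("{table}")').fetchall():
            if fk[3] not in leading:
                missing.append((table, fk[3]))
    return missing

# Hypothetical schema, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id),
        placed_on TEXT
    );
""")

before = unindexed_foreign_keys(conn)
conn.execute("CREATE INDEX idx_orders_customer_id ON orders (customer_id)")
after = unindexed_foreign_keys(conn)
print(before, after)
```

This only covers the "create" half; deciding which indexes to drop still needs usage data from profiling under a real workload.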

Also, make sure that a scheduled index maintenance plan is created.


Here's a slightly simplistic overview: it's certainly true that there is an overhead to data modifications due to the presence of indexes, but you ought to consider the relative number of reads and writes to the data. In general the number of reads is far higher than the number of writes, and you should take that into account when defining an indexing strategy.

When it comes to which columns to index, I've always felt that the designer ought to know the business well enough to take a very good first pass at which columns are likely to benefit. Other than that, it really comes down to feedback from the programmers, full-scale testing, and system monitoring (preferably with extensive internal metrics on performance to capture long-running operations).


Some of my rules of thumb:

  • Index ALL primary keys (I think most RDBMSs do this automatically when the primary key is declared).
  • Index ALL foreign key columns.
  • Create more indexes ONLY if:
    • Queries are slow.
    • You know the data volume is going to increase significantly.
  • Refresh the optimizer statistics after loading a lot of data into tables.
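As a rough illustration of the last rule, here is what refreshing statistics after a bulk load looks like in SQLite, using Python's standard `sqlite3` module. SQLite is just an assumption for the sake of a runnable example, and the `orders` table is hypothetical; the equivalent command differs per RDBMS (`ANALYZE` in PostgreSQL, `UPDATE STATISTICS` in SQL Server, and so on).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)")
conn.execute("CREATE INDEX idx_orders_customer_id ON orders (customer_id)")

# Bulk load: 1000 rows spread over 10 customers.
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 10, i * 1.5) for i in range(1000)],
)

# Refresh the optimizer statistics after the load.
conn.execute("ANALYZE")

# SQLite keeps the gathered statistics in the sqlite_stat1 table,
# one row per analyzed index.
stats = {idx: stat for _tbl, idx, stat in
         conn.execute("SELECT tbl, idx, stat FROM sqlite_stat1")}
print(stats)
```

Without fresh statistics the planner has to guess how selective each index is, which can lead it to ignore a perfectly good one.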

If a query is slow, look at the execution plan and:

  • If the query only uses a few columns of a table, put all of those columns into one index (a "covering" index), so the RDBMS can answer the query from the index alone without touching the table.
  • Don't waste resources indexing tiny tables (hundreds of records).
  • In a multi-column index, order the columns from high cardinality to low: put the columns with more distinct values first, followed by the columns with fewer distinct values.
  • If a query needs to access more than 10% of the data, a full scan is normally better than an index.
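The "put all the columns into an index" rule can be checked directly in the execution plan. Below is a small sketch, again assuming SQLite through Python's `sqlite3` module; the `orders` table, its data, and the index name are invented for the example, and the exact plan wording varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, customer_id INTEGER, "
    "status TEXT, total REAL, notes TEXT)")
conn.executemany(
    "INSERT INTO orders (customer_id, status, total, notes) VALUES (?, ?, ?, ?)",
    [(i % 500, "open" if i % 3 else "closed", i * 1.5, "n/a")
     for i in range(5000)],
)

# The query reads only three columns: customer_id, status and total.
query = "SELECT status, total FROM orders WHERE customer_id = ?"

def plan(sql):
    # The fourth column of EXPLAIN QUERY PLAN output is the readable detail.
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, (42,))
    return " | ".join(row[3] for row in rows)

plan_without_index = plan(query)

# Index every column the query touches, filter column first.
conn.execute(
    "CREATE INDEX idx_orders_cust_status_total "
    "ON orders (customer_id, status, total)")
plan_with_index = plan(query)

print(plan_without_index)
print(plan_with_index)
```

Before the index exists, the plan reports a full scan of `orders`; afterwards it reports a search using a covering index, meaning the table rows themselves are never read.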