Foreign keys - link using surrogate or natural key?

Neither SQL nor the relational model is disturbed by foreign keys that reference a natural key. In fact, referencing natural keys often dramatically improves performance. You'd be surprised how often the information you need is completely contained in a natural key; referencing that key trades a join for a wider table (and consequently reduces the number of rows you can store in one page).

By definition, the information you need is always completely contained in the natural key of every "lookup" table. (The term lookup table is informal. In the relational model, all tables are just tables. A table of US postal codes might have rows that look like this: {AK, Alaska}, {AL, Alabama}, {AZ, Arizona}, etc. Most people would call that a lookup table.)
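To make the "trade a join for a wider table" point concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (states, addresses_by_id, addresses_by_code) are made up for illustration, and SQLite is only a stand-in for whatever DBMS you actually run.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript("""
    -- Hypothetical lookup table with both a surrogate and a natural key.
    CREATE TABLE states (
        state_id   INTEGER PRIMARY KEY,     -- surrogate key
        state_code TEXT NOT NULL UNIQUE,    -- natural key
        state_name TEXT NOT NULL UNIQUE
    );
    INSERT INTO states (state_id, state_code, state_name) VALUES
        (1, 'AK', 'Alaska'), (2, 'AL', 'Alabama'), (3, 'AZ', 'Arizona');

    -- Variant A: the foreign key references the surrogate key.
    CREATE TABLE addresses_by_id (
        address_id INTEGER PRIMARY KEY,
        city       TEXT NOT NULL,
        state_id   INTEGER NOT NULL REFERENCES states (state_id)
    );
    -- Variant B: the foreign key references the natural key.
    CREATE TABLE addresses_by_code (
        address_id INTEGER PRIMARY KEY,
        city       TEXT NOT NULL,
        state_code TEXT NOT NULL REFERENCES states (state_code)
    );
    INSERT INTO addresses_by_id   VALUES (1, 'Juneau', 1);
    INSERT INTO addresses_by_code VALUES (1, 'Juneau', 'AK');
""")

# Variant A needs a join to get the state code.
with_join = conn.execute("""
    SELECT a.city, s.state_code
    FROM addresses_by_id AS a
    JOIN states AS s ON s.state_id = a.state_id
""").fetchall()

# Variant B already carries the state code in the referencing row.
without_join = conn.execute(
    "SELECT city, state_code FROM addresses_by_code"
).fetchall()

# The natural-key reference is still a real foreign key: bad codes are rejected.
try:
    conn.execute("INSERT INTO addresses_by_code VALUES (2, 'Nowhere', 'ZZ')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(with_join, without_join, rejected)
```

Both queries return the same rows; the second one just never touches the states table.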

On big systems, it's not unusual to find tables that have more than one candidate key. It's also not unusual for tables that serve one part of the enterprise to reference one candidate key, and tables that serve another part of the enterprise to reference a different candidate key. This is one of the strengths of the relational model, and it's a part of the relational model that SQL supports pretty well.
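As a rough sketch of what that looks like, assume a hypothetical employees table with two candidate keys, one referenced by payroll tables and the other by door-access tables. All names here are invented, and SQLite (via Python's sqlite3) is used only because it is easy to run.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    -- Hypothetical table with two candidate keys.
    CREATE TABLE employees (
        employee_no INTEGER PRIMARY KEY,    -- candidate key used by payroll
        badge_code  TEXT NOT NULL UNIQUE,   -- candidate key used by security
        full_name   TEXT NOT NULL
    );
    -- One part of the enterprise references employee_no ...
    CREATE TABLE payroll_entries (
        entry_id     INTEGER PRIMARY KEY,
        employee_no  INTEGER NOT NULL REFERENCES employees (employee_no),
        amount_cents INTEGER NOT NULL
    );
    -- ... another part references badge_code.
    CREATE TABLE door_events (
        event_id   INTEGER PRIMARY KEY,
        badge_code TEXT NOT NULL REFERENCES employees (badge_code),
        door       TEXT NOT NULL
    );
    INSERT INTO employees VALUES (1001, 'B-7F3', 'Pat Example');
""")


def try_insert(sql, params):
    """Run one insert; return False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


ok_badge = try_insert(
    "INSERT INTO door_events (badge_code, door) VALUES (?, ?)", ("B-7F3", "Lobby"))
bad_badge = try_insert(
    "INSERT INTO door_events (badge_code, door) VALUES (?, ?)", ("B-000", "Lobby"))
ok_employee = try_insert(
    "INSERT INTO payroll_entries (employee_no, amount_cents) VALUES (?, ?)", (1001, 250000))
bad_employee = try_insert(
    "INSERT INTO payroll_entries (employee_no, amount_cents) VALUES (?, ?)", (9999, 100))

print(ok_badge, bad_badge, ok_employee, bad_employee)
```

Each referencing table gets full referential integrity against the key it actually cares about.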

You'll run into two problems when you reference natural keys in tables that also have a surrogate key.

First, you'll surprise people. Although I usually lobby strongly for the Principle of Least Surprise, this is one situation where I don't mind surprising people. When the problem is that developers are surprised by the logical use of foreign keys, the solution is education, not redesign.

Second, ORMs aren't generally designed around the relational model, and they sometimes embody assumptions that don't reflect best practice. (In fact, they often seem to be designed without ever having input from a database professional.) Requiring an ID number in every table is one of those assumptions. Another one is assuming that the ORM application "owns" the database. (So it's free to create, drop, and rename tables and columns.)

I have worked on a database system that served data to hundreds of application programs written in at least two dozen languages over a period of 30 years. That database belongs to the enterprise, not to an ORM.

A fork that introduces breaking changes should be a show-stopper.

I measured performance with both natural keys and surrogate keys at a company I used to work at. There's a tipping point at which surrogate keys begin to outperform natural keys. (Assuming no additional effort to keep natural key performance high, like partitioning, partial indexes, function-based indexes, extra tablespaces, using solid-state disks, etc.) By my estimates for that company, they'll reach that tipping point in about 2045. In the meantime, they get better performance with natural keys.

Other relevant answers: In Database Schema Confusing


The main reason I support surrogate keys is that natural keys are often subject to change, which means all related tables must be updated, and that can put quite a load on the server.
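Here is a small sketch of that update cost, assuming a hypothetical customers table keyed by email and using ON UPDATE CASCADE to propagate the change. The names and the 1000-row volume are made up, and SQLite through Python's sqlite3 is just a convenient way to show it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,    -- surrogate key, never changes
        email       TEXT NOT NULL UNIQUE    -- natural key, can change
    );
    -- References the natural key, so a key change must be propagated.
    CREATE TABLE orders_by_email (
        order_id INTEGER PRIMARY KEY,
        email    TEXT NOT NULL
                 REFERENCES customers (email) ON UPDATE CASCADE
    );
    -- References the surrogate key, so a changed email never touches it.
    CREATE TABLE orders_by_id (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers (customer_id)
    );
    INSERT INTO customers VALUES (1, 'old@example.com');
""")

with conn:
    conn.executemany("INSERT INTO orders_by_email (email) VALUES (?)",
                     [("old@example.com",)] * 1000)
    conn.executemany("INSERT INTO orders_by_id (customer_id) VALUES (?)",
                     [(1,)] * 1000)

# One logical change to the parent row ...
with conn:
    conn.execute("UPDATE customers SET email = ? WHERE customer_id = ?",
                 ("new@example.com", 1))

# ... rewrites every referencing row in the natural-key variant.
rewritten = conn.execute(
    "SELECT COUNT(*) FROM orders_by_email WHERE email = 'new@example.com'"
).fetchone()[0]
stale = conn.execute(
    "SELECT COUNT(*) FROM orders_by_email WHERE email = 'old@example.com'"
).fetchone()[0]
# The surrogate-key variant still points at customer 1; nothing was rewritten.
still_linked = conn.execute(
    "SELECT COUNT(*) FROM orders_by_id WHERE customer_id = 1"
).fetchone()[0]

print(rewritten, stale, still_linked)
```

With one child table and 1000 rows this is trivial; with many large referencing tables, the same single-row update fans out into a lot of writes.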

Further, in the 30 years I have been using a variety of databases on many topics, I have found that a true natural key is fairly rare. Things that are supposedly unique (SSN) are not; things that are unique at a particular time can become non-unique later; and some things, like email addresses and phone numbers, may be unique but can be reused for different people at a later date. Of course, some things, like names of people and corporations, simply don't have a good unique identifier.

As to avoiding joins by using a natural key: yes, that can speed up the select statements that don't need the joins, but it will slow down the places where you still need the joins, because int joins are generally faster. It will also probably slow down inserts and deletes, and it will cause performance problems on updates when the key changes. Complex queries (which are slower anyway) will be even slower. So simple queries are faster, but reporting, complex queries, and many other actions against the database can be slower. It is a balancing act that may tip one way or the other depending on how your database is queried.

So there is no one-size-fits-all answer. It depends on your database, how it will be queried, and what type of information is stored in it. You may need to do some testing to find out what works best in your own environment.
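If you want a starting point for that kind of testing, something like the following sketch might do. It is only a toy: the schema, row counts, and queries are invented, it runs against an in-memory SQLite database, and the numbers it prints say nothing about your DBMS, your data, or your hardware. Swap in your own tables and representative queries.

```python
import sqlite3
import timeit

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE parents (
        parent_id   INTEGER PRIMARY KEY,    -- surrogate key
        parent_code TEXT NOT NULL UNIQUE,   -- natural key
        label       TEXT NOT NULL
    );
    CREATE TABLE children_by_id (
        child_id  INTEGER PRIMARY KEY,
        parent_id INTEGER NOT NULL REFERENCES parents (parent_id)
    );
    CREATE TABLE children_by_code (
        child_id    INTEGER PRIMARY KEY,
        parent_code TEXT NOT NULL REFERENCES parents (parent_code)
    );
""")

# Made-up data: 1000 parents, 20 children each, in both variants.
parents = [(i, f"CODE-{i:06d}", f"label {i}") for i in range(1, 1001)]
with conn:
    conn.executemany("INSERT INTO parents VALUES (?, ?, ?)", parents)
    conn.executemany("INSERT INTO children_by_id (parent_id) VALUES (?)",
                     [(p[0],) for p in parents for _ in range(20)])
    conn.executemany("INSERT INTO children_by_code (parent_code) VALUES (?)",
                     [(p[1],) for p in parents for _ in range(20)])

QUERIES = {
    "int join": """SELECT COUNT(*) FROM children_by_id AS c
                   JOIN parents AS p ON p.parent_id = c.parent_id
                   WHERE p.label <> ''""",
    "text join": """SELECT COUNT(*) FROM children_by_code AS c
                    JOIN parents AS p ON p.parent_code = c.parent_code
                    WHERE p.label <> ''""",
    "no join": "SELECT COUNT(*) FROM children_by_code",
}


def run(sql):
    """Execute one query and return its single COUNT(*) value."""
    return conn.execute(sql).fetchone()[0]


# Sanity check first: every variant should count the same rows.
counts = {name: run(sql) for name, sql in QUERIES.items()}

# Best of 3 repeats, 5 executions per repeat, for each query.
timings = {
    name: min(timeit.repeat(lambda s=sql: run(s), number=5, repeat=3))
    for name, sql in QUERIES.items()
}
for name, seconds in timings.items():
    print(f"{name:10s} {seconds:.4f}s for 5 runs")
```

Measure inserts, deletes, and key updates the same way before drawing conclusions, since those are where the two designs differ most.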