What are the best practices regarding lookup tables in relational databases?

There is a third approach that combines some of the advantages of your two options: put an actual code in the code table. By this I mean a short character sequence that captures the essence of the full value and is unique. For your example it might be:

Idn: 1
Name: Democrats
Code: D      (or DEM)

The Code is carried into transactional tables as a foreign key. It is short, intelligible and somewhat independent of the "real" data. Incremental changes to the name would not require a code change. Should the Republicans decamp en masse, however, a change of code may be necessary, with attendant problems that a surrogate id would not incur.
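A minimal sketch of carrying the code (rather than the Idn) as the foreign key, assuming SQLite via Python's built-in sqlite3 module; the table and column names (party, voter, party_code) are hypothetical. It shows that renaming a party leaves referencing rows untouched and that unknown codes are rejected.

```python
import sqlite3

# In-memory SQLite database; all table/column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

conn.executescript("""
CREATE TABLE party (
    idn  INTEGER PRIMARY KEY,
    code TEXT NOT NULL UNIQUE,   -- short abbreviation such as 'DEM'
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE voter (
    voter_id   INTEGER PRIMARY KEY,
    party_code TEXT NOT NULL REFERENCES party(code)  -- the code, not idn, is carried
);
""")

conn.execute("INSERT INTO party (idn, code, name) VALUES (1, 'DEM', 'Democrats')")
conn.execute("INSERT INTO party (idn, code, name) VALUES (2, 'REP', 'Republicans')")
conn.execute("INSERT INTO voter (voter_id, party_code) VALUES (100, 'DEM')")

# An incremental name change does not touch the code or any referencing row.
conn.execute("UPDATE party SET name = 'Democratic Party' WHERE code = 'DEM'")

# The transactional row is readable without a join.
party_code = conn.execute(
    "SELECT party_code FROM voter WHERE voter_id = 100").fetchone()[0]

# A code that is not in the lookup table is rejected by the foreign key.
try:
    conn.execute("INSERT INTO voter (voter_id, party_code) VALUES (101, 'XYZ')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(party_code, rejected)  # DEM True
```

If the code itself ever has to change, every referencing row must change with it. Declaring the foreign key with ON UPDATE CASCADE (supported by SQLite, PostgreSQL and SQL Server, among others) automates this, but it is exactly the cost a surrogate id avoids.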

This style has been termed an abbreviation encoding. I can recommend Joe Celko's writing on this; Google Books holds several examples. Search for "Celko encoding".

Other examples: 2- or 3-letter encodings for countries, and 3-letter encodings (GBP, USD, EUR) for currencies. They are short, self-explanatory and stable, and they are standardized (ISO 3166-1 for countries, ISO 4217 for currencies).
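A sketch of using a standard code directly as the key, assuming SQLite via Python's sqlite3; the currency/price tables and the CHECK rule (exactly three upper-case characters) are illustrative assumptions, not a full ISO 4217 validation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE currency (
    -- ISO 4217 alphabetic code used directly as the primary key (no surrogate id)
    code TEXT NOT NULL PRIMARY KEY
         CHECK (length(code) = 3 AND code = upper(code)),
    name TEXT NOT NULL
);
CREATE TABLE price (
    item_id      INTEGER NOT NULL,
    currency     TEXT NOT NULL REFERENCES currency(code),
    amount_minor INTEGER NOT NULL,   -- amount in minor units, e.g. cents
    PRIMARY KEY (item_id, currency)
);
""")
conn.executemany("INSERT INTO currency VALUES (?, ?)",
                 [("GBP", "Pound sterling"), ("USD", "US dollar"), ("EUR", "Euro")])
conn.execute("INSERT INTO price VALUES (1, 'EUR', 999)")

# A lower-case code violates the CHECK constraint.
try:
    conn.execute("INSERT INTO currency VALUES ('usd', 'bad')")
    bad_code_rejected = False
except sqlite3.IntegrityError:
    bad_code_rejected = True

# The referencing row is self-explanatory without joining to currency.
rows = conn.execute("SELECT item_id, currency, amount_minor FROM price").fetchall()
print(rows, bad_code_rejected)  # [(1, 'EUR', 999)] True
```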

Each of Idn, Code and Name will be unique so each is a candidate key and any one could be chosen as the primary key. So for the example given Idn could be removed from the table definition and Code used instead. Different DBMS handle integers and strings in their own way, so there may be performance considerations. It may be useful to have Idn as the FK in some tables and Code in others.
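To make the candidate-key point concrete, here is a sketch (SQLite via Python's sqlite3; the donation and ballot_line tables are hypothetical) that declares Idn, Code and Name all unique, then references the surrogate Idn from one table and the Code from another. Both resolve to the same party.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE party (
    idn  INTEGER PRIMARY KEY,     -- candidate key 1 (chosen as PK here)
    code TEXT NOT NULL UNIQUE,    -- candidate key 2
    name TEXT NOT NULL UNIQUE     -- candidate key 3
);
-- Hypothetical table that references the surrogate id
CREATE TABLE donation (
    donation_id INTEGER PRIMARY KEY,
    party_idn   INTEGER NOT NULL REFERENCES party(idn)
);
-- Hypothetical table that references the code
CREATE TABLE ballot_line (
    line_no    INTEGER PRIMARY KEY,
    party_code TEXT NOT NULL REFERENCES party(code)
);
""")
conn.execute("INSERT INTO party VALUES (1, 'DEM', 'Democrats')")

def violates(sql):
    """Return True if the statement is rejected by a constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

dup_idn = violates("INSERT INTO party VALUES (1, 'X', 'Y')")
dup_code = violates("INSERT INTO party VALUES (2, 'DEM', 'Other')")
dup_name = violates("INSERT INTO party VALUES (3, 'DM', 'Democrats')")

conn.execute("INSERT INTO donation VALUES (10, 1)")
conn.execute("INSERT INTO ballot_line VALUES (1, 'DEM')")

via_idn = conn.execute(
    "SELECT p.name FROM donation d JOIN party p ON p.idn = d.party_idn").fetchone()[0]
via_code = conn.execute(
    "SELECT p.name FROM ballot_line b JOIN party p ON p.code = b.party_code").fetchone()[0]
print(dup_idn, dup_code, dup_name, via_idn, via_code)
```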


By IDN, I take it you mean an IDENTITY, SEQUENCE or AUTO_INCREMENT field? You should take a look here and here.

Note section 5 (Misusing Data values as Data Elements) of the first reference, underneath figure 10:

Of course you can have a separate table for the sales persons and then reference it using a foreign key, preferably with a simple surrogate key such as sales_person_id, shown above.

So this expert thinks that you should "dereference" surrogate keys, i.e. join to the lookup table to get the meaningful value. It is a very basic SQL technique and shouldn't cause problems in your day-to-day SQL. There appears to be an error in figure 10: the sales_person column in SalesData should be a surrogate key (i.e. a number), not text. I'm inferring this from the quote above.
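"Dereferencing" a surrogate key is just an ordinary join. A minimal sketch, assuming SQLite via Python's sqlite3; the sales_person and sales_data tables and their rows are hypothetical stand-ins for the figure discussed above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE sales_person (
    sales_person_id INTEGER PRIMARY KEY,   -- simple surrogate key
    name            TEXT NOT NULL
);
CREATE TABLE sales_data (
    sale_id         INTEGER PRIMARY KEY,
    sales_person_id INTEGER NOT NULL REFERENCES sales_person(sales_person_id),
    amount          INTEGER NOT NULL
);
""")
conn.executemany("INSERT INTO sales_person VALUES (?, ?)", [(1, "Alice"), (2, "Bob")])
conn.executemany("INSERT INTO sales_data VALUES (?, ?, ?)",
                 [(1, 1, 500), (2, 2, 300), (3, 1, 200)])

# Join back to sales_person to turn the numeric key into a readable name.
totals = conn.execute("""
    SELECT sp.name, SUM(sd.amount) AS total
    FROM sales_data AS sd
    JOIN sales_person AS sp ON sp.sales_person_id = sd.sales_person_id
    GROUP BY sp.name
    ORDER BY sp.name
""").fetchall()
print(totals)  # [('Alice', 700), ('Bob', 300)]
```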

What you should avoid at all costs is the temptation (very common among novice database programmers) to commit the error outlined in section (1), Common Lookup Tables. This is commonly called the MUCK (Massively Unified Code Key) approach, and not by accident :-). Joe Celko, notably, also calls it by the sarcastic name OTLT (One True Lookup Table). It leads to all sorts of difficulties. Novice programmers seem to feel that a single code/lookup/whatever table is "cleaner" and will be more efficient, when nothing could be further from the truth.

From the second reference above:

Normalization eliminates redundant data, thus making the task of enforcing data integrity vastly simpler, but the process of creating a MUCK is something else entirely. MUCK's do not eliminate redundant data, rather they are an elimination of what are PERCEIVED to be redundant tables, but as I will demonstrate, fewer tables does not equal simplicity.
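One concrete integrity problem with a MUCK can be shown in a few lines. This is a sketch under assumptions (SQLite via Python's sqlite3; the lookup, order_muck, currency and order_separate tables are hypothetical): with a single lookup table, a foreign key only proves that a row exists, not that it is the right kind of code, so a country can silently be stored as an order's currency. A dedicated currency table rejects the same mistake.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
-- MUCK / OTLT: every kind of code lives in one table
CREATE TABLE lookup (
    lookup_id   INTEGER PRIMARY KEY,
    code_type   TEXT NOT NULL,
    code        TEXT NOT NULL,
    description TEXT NOT NULL,
    UNIQUE (code_type, code)
);
CREATE TABLE order_muck (
    order_id    INTEGER PRIMARY KEY,
    currency_id INTEGER NOT NULL REFERENCES lookup(lookup_id)
);

-- One table per domain
CREATE TABLE currency (
    code TEXT NOT NULL PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE order_separate (
    order_id INTEGER PRIMARY KEY,
    currency TEXT NOT NULL REFERENCES currency(code)
);
""")
conn.executemany("INSERT INTO lookup VALUES (?, ?, ?, ?)", [
    (1, "CURRENCY", "GBP", "Pound sterling"),
    (2, "COUNTRY", "GB", "United Kingdom"),
])
conn.execute("INSERT INTO currency VALUES ('GBP', 'Pound sterling')")

# The FK is satisfied because lookup row 2 exists, even though it is a country.
conn.execute("INSERT INTO order_muck VALUES (1, 2)")
muck_type = conn.execute(
    "SELECT l.code_type FROM order_muck o JOIN lookup l ON l.lookup_id = o.currency_id"
).fetchone()[0]

# The dedicated table refuses a value that is not a currency.
try:
    conn.execute("INSERT INTO order_separate VALUES (1, 'GB')")
    separate_rejected = False
except sqlite3.IntegrityError:
    separate_rejected = True

print(muck_type, separate_rejected)  # COUNTRY True
```

The usual workaround is to carry code_type in every referencing table and use a composite foreign key on (code_type, code), pinned with a CHECK. That extra column has to be repeated for every coded attribute, which is part of the complexity the quote refers to.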

You might also want to take a look at the related EAV (Entity Attribute Value) paradigm which I deal with here.