What are the differences between a superkey and a candidate key?

Candidate key is a super key from which you cannot remove any fields.

For instance, a software release can be identified either by major/minor version, or by the build date (we assume nightly builds).

Storing date in three fields is not a good idea of course, but let's pretend it is for demonstration purposes:

year  month date  major  minor
2008  01    13     0      1
2008  04    23     0      2
2009  11    05     1      0
2010  04    05     1      1

So (year, major, minor) or (year, month, date, major) are super keys (since they are unique) but not candidate keys, since you can remove year or major and the remaining set of columns will still be a super key.

(year, month, date) and (major, minor) are candidate keys, since you cannot remove any of the fields from them without breaking uniqueness.


A super key is any combination of columns that uniquely identifies a row in a table. A candidate key is a super key which cannot have any columns removed from it without losing the unique identification property. This property is sometimes known as minimality or (better) irreducibility.

A super key ≠ a primary key in general. The primary key is simply a candidate key chosen to be the main key. However, in dependency theory, candidate keys are important and the primary key is not more important than any of the other candidate keys. Non-primary candidate keys are also known as alternative keys.

Consider this table of Elements:

CREATE TABLE elements
(
    atomic_number   INTEGER NOT NULL PRIMARY KEY
                    CHECK (atomic_number > 0 AND atomic_number < 120),
    symbol          CHAR(3) NOT NULL UNIQUE,
    name            CHAR(20) NOT NULL UNIQUE,
    atomic_weight   DECIMAL(8,4) NOT NULL,
    period          SMALLINT NOT NULL
                    CHECK (period BETWEEN 1 AND 7),
    group           CHAR(2) NOT NULL
                    -- 'L' for Lanthanoids, 'A' for Actinoids
                    CHECK (group IN ('1', '2', 'L', 'A', '3', '4', '5', '6',
                                     '7', '8', '9', '10', '11', '12', '13',
                                     '14', '15', '16', '17', '18')),
    stable          CHAR(1) DEFAULT 'Y' NOT NULL
                    CHECK (stable IN ('Y', 'N'))
);

It has three unique identifiers - atomic number, element name, and symbol. Each of these, therefore, is a candidate key. Further, unless you are dealing with a table that can only ever hold one row of data (in which case the empty set (of columns) is a candidate key), you cannot have a smaller-than-one-column candidate key, so the candidate keys are irreducible.

Consider a key made up of { atomic number, element name, symbol }. If you supply a consistent set of values for these three fields (say { 6, Carbon, C }), then you uniquely identify the entry for an element - Carbon. However, this is very much a super key that is not a candidate key because it is not irreducible; you can eliminate any two of the three fields without losing the unique identification property.

As another example, consider a key made up of { atomic number, period, group }. Again, this is a unique identifier for a row; { 6, 2, 14 } identifies Carbon (again). If it were not for the Lanthanoids and Actinoids, then the combination of { period, group } would be unique, but because of them, it is not. However, as before, atomic number on its own is sufficient to uniquely identify an element, so this is a super key and not a candidate key.

Tags:

Database