Is there a name for this database schema of key values?

It's called Entity-Attribute-Value (also sometimes 'name-value pairs') and it's a classic case of "a round peg in a square hole" when people use the EAV pattern in a relational database.

Here's a list of why you shouldn't use EAV:

  • You can't use data types. It doesn't matter if the value is a date, a number or money (decimal). It's always going to be coerced to varchar. This can be anything from a minor performance problem to a massive gut-ache (ever had to chase down a one-cent variation in a monthly roll-up report?).
  • You can't (easily) enforce constraints. It requires a ridiculous amount of code to enforce "Everyone needs to have a height between 0 and 3 metres" or "Age must be not null and >= 0", as opposed to the 1-2 lines that each of those constraints would be in a properly-modelled system.
  • Related to the above, you can't easily guarantee that you get the information you need for each client (age might be missing from one, then the next might be missing their height etc.). You can do it, but it's a hell of a lot more difficult than SELECT height, weight, age FROM Client WHERE height IS NULL OR weight IS NULL.
  • Related again, duplicate data is a lot harder to detect. What happens if they give you two ages for one client? De-EAVing the data, as below, will give you two rows of results if one attribute is doubled. If one client has two separate entries for each of two attributes, you'll get four rows from the query below.
  • You can't even guarantee that the attribute names are consistent. "Age_yr" might become "AGE_IN_YEARS" or "age". (Admittedly this is less of a problem when you're receiving an extract versus when people are inserting data, but still.)
  • Any sort of nontrivial query is a complete disaster. To relationalise a three-attribute EAV system so you can query it in a rational fashion requires three joins of the EAV table.
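To make the constraints point concrete, here's a minimal sketch using Python's built-in sqlite3 module. The table and column names (Client, Height_m, Age_yr) and the sample rows are made up for illustration; the point is only that each rule from the list is a single line once the attribute is a real, typed column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical, conventionally modelled table: every attribute is a typed
# column, and each business rule is a one-line constraint.
conn.execute("""
    CREATE TABLE Client (
        ID       INTEGER PRIMARY KEY,
        Height_m REAL    CHECK (Height_m > 0 AND Height_m <= 3),
        Age_yr   INTEGER NOT NULL CHECK (Age_yr >= 0)
    )
""")


def try_insert(row):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO Client (ID, Height_m, Age_yr) VALUES (?, ?, ?)", row
            )
        return True
    except sqlite3.IntegrityError:
        return False


results = [
    try_insert((1, 1.80, 42)),    # valid row
    try_insert((2, 4.20, 30)),    # height outside 0-3 m
    try_insert((3, 1.65, None)),  # age missing
    try_insert((4, 1.70, -5)),    # negative age
]
print(results)  # [True, False, False, False]
```

In an EAV table the same rules would have to be written against a generic Value column, per attribute name, usually in triggers or application code.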

Compare:

SELECT cID.ID AS [ID], cH.Value AS [Height], cW.Value AS [Weight], cA.Value AS [Age]
FROM (SELECT DISTINCT ID FROM Client) cID 
      LEFT OUTER JOIN 
    Client cW ON cID.ID = cW.ID AND cW.Metric = 'Wt_kg' 
      LEFT OUTER JOIN 
    Client cH ON cID.ID = cH.ID AND cH.Metric = 'Ht_cm' 
      LEFT OUTER JOIN 
    Client cA ON cID.ID = cA.ID AND cA.Metric = 'Age_yr'

To:

SELECT c.ID, c.Ht_cm, c.Wt_kg, c.Age_yr
FROM Client c
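The duplicate-row problem from the list is easy to reproduce. Below is a rough sketch, again with Python's sqlite3; the sample rows are invented, and the pivot is written so that each join filters on its own alias. Client 2 has two ages; client 3 has two heights and two ages.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# EAV layout: one row per (client, attribute), every value stored as text.
conn.execute("CREATE TABLE Client (ID INTEGER, Metric TEXT, Value TEXT)")
conn.executemany(
    "INSERT INTO Client VALUES (?, ?, ?)",
    [
        # client 1: one clean value per attribute
        (1, "Ht_cm", "180"), (1, "Wt_kg", "75"), (1, "Age_yr", "42"),
        # client 2: age supplied twice
        (2, "Ht_cm", "165"), (2, "Wt_kg", "60"),
        (2, "Age_yr", "30"), (2, "Age_yr", "31"),
        # client 3: height and age both supplied twice, weight missing
        (3, "Ht_cm", "170"), (3, "Ht_cm", "171"),
        (3, "Age_yr", "50"), (3, "Age_yr", "51"),
    ],
)

PIVOT = """
SELECT cID.ID, cH.Value AS Height, cW.Value AS Weight, cA.Value AS Age
FROM (SELECT DISTINCT ID FROM Client) cID
  LEFT OUTER JOIN Client cH ON cID.ID = cH.ID AND cH.Metric = 'Ht_cm'
  LEFT OUTER JOIN Client cW ON cID.ID = cW.ID AND cW.Metric = 'Wt_kg'
  LEFT OUTER JOIN Client cA ON cID.ID = cA.ID AND cA.Metric = 'Age_yr'
ORDER BY cID.ID
"""

rows = conn.execute(PIVOT).fetchall()

# Count how many result rows each client produced.
rows_per_client = {}
for client_id, *_ in rows:
    rows_per_client[client_id] = rows_per_client.get(client_id, 0) + 1

print(rows_per_client)  # {1: 1, 2: 2, 3: 4}
```

Note that nothing in the schema stopped the duplicates from being inserted, and every value comes back as a string.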

Here's a (very short) list of when you should use EAV:

  • When there's absolutely no way around it and you have to support schema-less data in your database.
  • When you just need to store "stuff" and don't expect to have to need it in a more structured form. Beware, though, the monster called "changing requirements".

I know I just spent this entire post detailing why EAV is a terrible idea in most cases, but there are a few cases where it's needed or unavoidable. However, most of the time (including the example above), it's going to be far more hassle than it's worth. If you have a requirement for wide support of EAV-type data input, you should look at storing it in a key-value or document store, e.g. Hadoop/HBase, CouchDB, MongoDB, Cassandra, BerkeleyDB.


Entity Attribute Value (EAV)

It is considered to be an anti-pattern by many, including me.

Here are your alternatives:

  1. Use database table inheritance.

  2. Use XML data and SQLXML functions.

  3. Use a NoSQL database, like HBase.


In PostgreSQL, one very good way to deal with EAV structures is the additional module hstore, available for version 8.4 or later. I quote the manual:

This module implements the hstore data type for storing sets of key/value pairs within a single PostgreSQL value. This can be useful in various scenarios, such as rows with many attributes that are rarely examined, or semi-structured data. Keys and values are simply text strings.

Since Postgres 9.2 there is also the json type and a host of functionality to go with it (most of it added with 9.3).

Postgres 9.4 adds the (largely superior!) "binary JSON" data type jsonb to the list of options, along with advanced index options.
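For a feel of what the one-document-per-row approach looks like, here is a small sketch. Big caveat: it uses SQLite's JSON functions through Python's sqlite3 module as a stand-in so that it runs without a server; it is not Postgres. It assumes your SQLite build includes the JSON functions (most current ones do). The table, column names and data are invented. In Postgres the Attrs column would be declared jsonb and a key would typically be read with the ->> operator rather than json_extract.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")

# One row per client; the loosely structured attributes live in one JSON value.
# (Stand-in for a Postgres jsonb column; SQLite stores the JSON as text.)
conn.execute("CREATE TABLE Client (ID INTEGER PRIMARY KEY, Attrs TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO Client (ID, Attrs) VALUES (?, ?)",
    [
        (1, json.dumps({"Ht_cm": 180, "Wt_kg": 75, "Age_yr": 42})),
        (2, json.dumps({"Ht_cm": 165, "Age_yr": 30})),  # no weight recorded
    ],
)

# No self-joins: each attribute is pulled out of the same row by key.
rows = conn.execute("""
    SELECT ID,
           json_extract(Attrs, '$.Ht_cm'),
           json_extract(Attrs, '$.Wt_kg'),
           json_extract(Attrs, '$.Age_yr')
    FROM Client
    ORDER BY ID
""").fetchall()

print(rows)  # [(1, 180, 75, 42), (2, 165, None, 30)]
```

The primary key gives you one row per client, so a doubled attribute can't multiply result rows, and a missing key simply comes back as NULL.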