Database design - empty fields

There is definitely a school of thought which holds that NULL fields are bad, in and of themselves. Relational theory demands that databases consist of facts, and NULLs are the absence of fact. So, a rigorously designed database would have no nullable columns.

Your colleague is proposing something which is on the road to 6th Normal Form, where every table consists of a primary key and at most one other column. Except that such a schema wouldn't have tables called customer_info_fr. That's not normalised, because many countries might include ENTRY_CODE in their addresses. So we would need address_entry_codes and address_floor_numbers. Not to mention address_building_number and address_building_name, as some places are identified by number and others by name.

It's completely accurate and truthful as a logical design. Alas, from a physical perspective it is Teh Suck! The simplest query - select * from addresses - becomes a multi-table join, and outer joins at that. Nullable columns are a way of reconciling ugly design with the hard truth, "you cannae break the laws of physics". Nullable columns allow us to combine disjoint data sets into a single table, albeit at the cost of handling nulls (they can affect data retrieval, index usage, maths, etc.).
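To make the join cost concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table layouts, column names and sample rows are all made up for illustration; only the names address_entry_codes and address_floor_numbers come from the discussion above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Decomposed design (hypothetical): one optional attribute per table,
    -- so no column ever needs to be nullable.
    CREATE TABLE address (id INTEGER PRIMARY KEY, street TEXT NOT NULL);
    CREATE TABLE address_entry_codes (
        address_id INTEGER PRIMARY KEY REFERENCES address(id),
        entry_code TEXT NOT NULL);
    CREATE TABLE address_floor_numbers (
        address_id INTEGER PRIMARY KEY REFERENCES address(id),
        floor_number INTEGER NOT NULL);

    -- Single-table design (hypothetical): optional attributes are nullable.
    CREATE TABLE addresses (
        id INTEGER PRIMARY KEY,
        street TEXT NOT NULL,
        entry_code TEXT,
        floor_number INTEGER);

    INSERT INTO address VALUES (1, '12 Rue de Rivoli'), (2, '221B Baker Street');
    INSERT INTO address_entry_codes VALUES (1, 'A1234');
    INSERT INTO address_floor_numbers VALUES (2, 1);

    INSERT INTO addresses VALUES (1, '12 Rue de Rivoli', 'A1234', NULL),
                                 (2, '221B Baker Street', NULL, 1);
""")

# Reassembling a full address from the decomposed tables takes one
# outer join per optional attribute.
decomposed_rows = conn.execute("""
    SELECT a.id, a.street, e.entry_code, f.floor_number
    FROM address a
    LEFT JOIN address_entry_codes e ON e.address_id = a.id
    LEFT JOIN address_floor_numbers f ON f.address_id = a.id
    ORDER BY a.id
""").fetchall()

# The nullable-column design answers the same question with no joins.
single_table_rows = conn.execute(
    "SELECT * FROM addresses ORDER BY id").fetchall()

print(decomposed_rows)
print(single_table_rows)
```

Both queries return the same rows. Note that the outer joins hand the nulls straight back to you in the result set, so the decomposed design keeps nulls out of storage but not out of your queries.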


Some designs attempt to get around the use of nulls by applying magic values. That is, if we don't know the correct value for some column we inject a default value which is a value but also means "unknown". A common instance of this is date '9999-12-31' as an open-ended TO_DATE in a FROM-TO date range. As long as everybody understands and adheres to the convention it's not a problem. It becomes a problem when some tables have date '9999-12-01' or date '9999-01-31' instead.
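A small sketch of how that drift bites, again with sqlite3. The price_periods table and its rows are hypothetical; row 2 carries the mistyped sentinel.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical FROM-TO range table using a magic "open-ended" date.
    CREATE TABLE price_periods (
        id INTEGER PRIMARY KEY,
        from_date TEXT NOT NULL,
        to_date TEXT NOT NULL);

    INSERT INTO price_periods VALUES
        (1, '2020-01-01', '9999-12-31'),  -- follows the convention
        (2, '2021-01-01', '9999-12-01'),  -- meant to be open-ended, mistyped
        (3, '2019-01-01', '2019-12-31');  -- genuinely closed period
""")

# Looking for open-ended periods by the agreed sentinel silently
# misses the row that used a slightly different magic date.
open_ended = [row[0] for row in conn.execute(
    "SELECT id FROM price_periods WHERE to_date = '9999-12-31' ORDER BY id")]

print(open_ended)
```

Only row 1 comes back, and nothing in the database tells you that row 2 was supposed to match.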

This is why magic values are not a robust solution. Consumers of our data need to know that -1 is the value we use for DofQ in our stock control system when we don't know the real value. But at least it's obviously not a valid value. Choosing, say, 20 as a magic value is deadly because it could be a real DofQ: we can no longer tell the actual values from the "don't knows".
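Here is a rough illustration of what happens when a consumer doesn't know about the convention. The stock tables and the qty column are invented for the example (qty stands in for whatever the real column is).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical stock table where "unknown" is stored as the magic value -1.
    CREATE TABLE stock_magic (item TEXT PRIMARY KEY, qty INTEGER NOT NULL);
    INSERT INTO stock_magic VALUES ('bolt', 10), ('nut', 20), ('washer', -1);

    -- The same data with "unknown" stored as NULL.
    CREATE TABLE stock_null (item TEXT PRIMARY KEY, qty INTEGER);
    INSERT INTO stock_null VALUES ('bolt', 10), ('nut', 20), ('washer', NULL);
""")

# A naive aggregate treats -1 as a real quantity and skews the result,
# unless every query remembers to filter the magic value out.
avg_magic, count_magic = conn.execute(
    "SELECT AVG(qty), COUNT(qty) FROM stock_magic").fetchone()

# SQL aggregates skip NULLs, so the unknown row is left out automatically.
avg_null, count_null = conn.execute(
    "SELECT AVG(qty), COUNT(qty) FROM stock_null").fetchone()

print(avg_magic, count_magic)
print(avg_null, count_null)
```

With the magic value the average is dragged below 10 and the count says three known quantities. With NULL the average is 15.0 over two known quantities, which is what you actually know.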

So, given a choice between nulls and magic values, choose nulls.


I'd be interested in your colleague's justification as to why empty fields are bad. As far as I'm aware, empty or null fields aren't bad in and of themselves. If you have a lot of empty values in a column that you are planning on putting an important index on, you may want to consider other options. This actually goes for any column that needs an index and holds a lot of duplicate values, as duplicates lower the cardinality of the column, making the index less selective and so less useful. In your case, I don't see it being an issue.

For this kind of data, you're likely using a VARCHAR or some kind of TEXT column anyway, which are variable-length types in the database. Whether the field is full of data or empty, you're still going to incur the overhead of a variable-length column (which isn't worth worrying about in normal circumstances). So again, there's no difference to the RDBMS.

From the sounds of what you're designing, I think if you came up with a generic method of handling address variances in a single table, it would be the way to go. Your code and structure would be much simpler at the negligible (in my opinion) cost of some empty data fields.