DB Design: 1st Normal Form and Repeating Groups

Design 2 and Design 4 are the best ways to go if the results will not always be present (i.e. Design 1 would end up with NULLs). If every test is always taken, then the first design is fine.

I believe a repeating group in SQL would actually be a column stuffed with additional values, e.g. Phone_Number contains "123-444-4444,123-333-3334".
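To make the stuffed-column case concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (contact_stuffed, contact_phone, contact_id) are made up for illustration; only the phone number string comes from the example above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical table: one column stuffed with several values.
    CREATE TABLE contact_stuffed (
        contact_id    INTEGER PRIMARY KEY,
        phone_numbers TEXT
    );
    INSERT INTO contact_stuffed VALUES (1, '123-444-4444,123-333-3334');

    -- Hypothetical replacement: one phone number per row.
    CREATE TABLE contact_phone (
        contact_id   INTEGER NOT NULL,
        phone_number TEXT    NOT NULL,
        PRIMARY KEY (contact_id, phone_number)
    );
""")

# Split the stuffed column in application code and load one row per value.
stuffed_rows = conn.execute(
    "SELECT contact_id, phone_numbers FROM contact_stuffed"
).fetchall()
for contact_id, stuffed in stuffed_rows:
    for number in stuffed.split(","):
        conn.execute(
            "INSERT INTO contact_phone VALUES (?, ?)",
            (contact_id, number.strip()),
        )

# Each number is now individually queryable and indexable.
phones = [row[0] for row in conn.execute(
    "SELECT phone_number FROM contact_phone "
    "WHERE contact_id = 1 ORDER BY phone_number"
)]
print(phones)
```

With one value per row you can search, index and constrain each phone number without string parsing.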

Anyway, the later designs are suboptimal. Take that approach to its final level and you end up with the "One True Lookup Table" http://www.dbazine.com/ofinterest/oi-articles/celko22 or Entity Attribute Value http://tonyandrews.blogspot.com/2004/10/otlt-and-eav-two-big-design-mistakes.html

http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:10678084117056

Either way, it's almost always a bad thing. Although they may share a common datatype/domain, the meaning differs -- thus they should remain individual attributes (maxtemp, mintemp, etc.)


Here's the rule on repeating groups -- what is functionally dependent?

If the statistic value is functionally dependent on SN, Test and Statistic Name, then you have three key elements and one value element. ( SN, Test, Statistic -> Value )
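As a rough sketch of what that dependency looks like as a table, the three key elements become a composite primary key and the value is the only non-key column. The table name test_statistic and the sample values are assumptions for illustration, not taken from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# ( SN, Test, Statistic -> Value ): three key columns, one value column.
conn.execute("""
    CREATE TABLE test_statistic (
        sn        TEXT NOT NULL,
        test      TEXT NOT NULL,
        statistic TEXT NOT NULL,
        value     REAL NOT NULL,
        PRIMARY KEY (sn, test, statistic)
    )
""")
conn.execute(
    "INSERT INTO test_statistic VALUES ('A100', 'Test1', 'mean', 4.2)"
)

# A second value for the same key would violate the functional dependency,
# so the primary key rejects it.
try:
    conn.execute(
        "INSERT INTO test_statistic VALUES ('A100', 'Test1', 'mean', 9.9)"
    )
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(duplicate_rejected)
```

The key constraint is what enforces "one value per (SN, Test, Statistic)"; without it the table only documents the dependency.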

In this specific case -- aggregated data (mean, sum, min, max) -- you have ambiguity because you're not dealing with atomic objects, you're dealing with aggregates. Strictly speaking, you shouldn't store aggregates, you should compute them. (Yes, I know it's impractical, but that's the relational theory.)

For other cases, it's usually obvious what's a key and what's a value for repeating groups. In this case, however, you're at the murky edge because you're storing derivable data.

For your examples, borrow a more pragmatic test from data warehouse design:

Would you Slice and Dice by the other key?

Think of your statistical fact as a point surrounded by three dimensions: (SN, Test, Statistic). Is this valid? (With summary data, it's often murky.)

Instead, let's look at the detail data we should have kept: SN, Test, Score. There are clearly two dimensions (SN, Test) and one measure (Score) at the intersection of those two dimensions. We can derive any number of statistics from this detailed data using either dimension (SN or Test).
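Here is a small sketch of that idea: keep only the detail rows and compute the statistics on demand along either dimension. The table name score and the sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Detail data only: two dimensions (sn, test) and one measure (score).
conn.execute("""
    CREATE TABLE score (
        sn    TEXT NOT NULL,
        test  TEXT NOT NULL,
        score REAL NOT NULL,
        PRIMARY KEY (sn, test)
    )
""")
conn.executemany("INSERT INTO score VALUES (?, ?, ?)", [
    ("A100", "Test1", 10.0),
    ("A100", "Test2", 20.0),
    ("B200", "Test1", 30.0),
    ("B200", "Test2", 40.0),
])

# Slice by the Test dimension: min, max and mean per test.
by_test = conn.execute("""
    SELECT test, MIN(score), MAX(score), AVG(score)
    FROM score
    GROUP BY test
    ORDER BY test
""").fetchall()

# Slice by the SN dimension: mean per serial number.
by_sn = conn.execute("""
    SELECT sn, AVG(score)
    FROM score
    GROUP BY sn
    ORDER BY sn
""").fetchall()

print(by_test)
print(by_sn)
```

Nothing derived is stored, so there is no question about whether "mean" or "max" is a key or a column; they are just expressions in a query.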

For the battery example, you probably do want to create it as an EAV database instead of a more typical relational database. Your measurements (AverageCurrent and BatteryCapacity) give you good reasons to use an Entity-Attribute-Value database design.

Note that ALL relational design is a tension between longer relations and EAV triples. You must always balance "is this a key" vs. "is this a column", because you can always label everything as an attribute key and use an EAV design.
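To illustrate the tension, the sketch below stores battery measurements as EAV triples and then pivots them back into one wide row per battery. The table name battery_eav, the battery ids and the numbers are all hypothetical; only the two attribute names come from the battery example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# EAV form: the attribute name is part of the key.
conn.execute("""
    CREATE TABLE battery_eav (
        battery_id TEXT NOT NULL,
        attribute  TEXT NOT NULL,
        value      REAL NOT NULL,
        PRIMARY KEY (battery_id, attribute)
    )
""")
conn.executemany("INSERT INTO battery_eav VALUES (?, ?, ?)", [
    ("B1", "AverageCurrent", 1.5),
    ("B1", "BatteryCapacity", 2000.0),
    ("B2", "AverageCurrent", 1.2),  # no capacity measured for B2
])

# Pivot back to the "longer relation": one column per attribute.
wide = conn.execute("""
    SELECT battery_id,
           MAX(CASE WHEN attribute = 'AverageCurrent'  THEN value END),
           MAX(CASE WHEN attribute = 'BatteryCapacity' THEN value END)
    FROM battery_eav
    GROUP BY battery_id
    ORDER BY battery_id
""").fetchall()

print(wide)
```

The missing measurement is simply an absent row in the EAV table and only turns into a NULL once you pivot to the wide form. The price is that every attribute needs its own CASE expression to get back to columns.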


I think of (and was taught) 1NF as 'all rows should be the same length' rather than 'no repeating groups'. With that view you can make a decision slightly more easily from the following:

In design 1, are both tests ALWAYS present? If so, then it isn't truly a repeating group. Are all the averages always present in design 2? Could there be more (or fewer) in a given row?

In design 4, are both those values always present? If so, it's fine. If not, then design 5 should be used.
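If you already have data, one way to answer the "always present?" question is simply to count the gaps. This is only a sketch: the table design1 and its columns test1_result and test2_result are my guess at what a two-tests-per-row layout might look like, not the actual design from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical wide layout: one column per test result.
conn.execute("""
    CREATE TABLE design1 (
        sn           TEXT PRIMARY KEY,
        test1_result REAL,
        test2_result REAL
    )
""")
conn.executemany("INSERT INTO design1 VALUES (?, ?, ?)", [
    ("A100", 10.0, 20.0),
    ("B200", 30.0, None),  # second test was never taken
])

# In SQLite, "x IS NULL" evaluates to 0 or 1, so SUM counts the missing values.
missing_test1, missing_test2 = conn.execute("""
    SELECT SUM(test1_result IS NULL), SUM(test2_result IS NULL)
    FROM design1
""").fetchone()

always_present = missing_test1 == 0 and missing_test2 == 0
print(missing_test1, missing_test2, always_present)
```

If the counts are not all zero, the rows are not really "the same length" and the one-row-per-result design is the safer choice.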