Does an excessive table violate normalization rules?

By changing to your proposed solution you lose information from the database. The existing solution says what clubs can exist in a particular school irrespective of anyone actually being in that club at any point in time. The proposed solution requires someone to join the club before the club comes into existence (i.e. before a row is written to the database).

As a practical implication, think of sign-up sheets. It's the day before term starts. The principal wants a sign-up sheet on the noticeboard for each club so students can join. It would be wasteful to print a sheet for every club type and let students join clubs which will never exist in this school. Today, before term starts, there are no students, so your proposed solution will not work. The existing solution, however, allows the principal to offer, say, a soccer club but not a water polo club.
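To make the sign-up-sheet scenario concrete, here is a minimal sketch using Python's standard-library sqlite3 module. The table and column names (School, ClubType, Club, StudentProposed) are hypothetical stand-ins for the analogy, not your actual schema. It assumes the existing design keeps one Club row per offered club type per school, while the proposed design records a club only as a value on a student row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces REFERENCES only when this is on
conn.executescript("""
CREATE TABLE School   (school_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE ClubType (club_type_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- Existing design (hypothetical names): which club types a school offers,
-- recorded independently of whether anyone has joined.
CREATE TABLE Club (
    club_id      INTEGER PRIMARY KEY,
    school_id    INTEGER NOT NULL REFERENCES School(school_id),
    club_type_id INTEGER NOT NULL REFERENCES ClubType(club_type_id),
    UNIQUE (school_id, club_type_id)
);
-- Proposed design (hypothetical): a club exists only via a student row.
CREATE TABLE StudentProposed (
    student_id   INTEGER PRIMARY KEY,
    school_id    INTEGER NOT NULL REFERENCES School(school_id),
    club_type_id INTEGER NOT NULL REFERENCES ClubType(club_type_id),
    name         TEXT NOT NULL
);
""")
conn.execute("INSERT INTO School VALUES (1, 'School1')")
conn.executemany("INSERT INTO ClubType VALUES (?, ?)",
                 [(1, 'Soccer'), (2, 'Water polo')])
# The day before term: the principal offers soccer but not water polo.
conn.execute("INSERT INTO Club (school_id, club_type_id) VALUES (1, 1)")


def sign_up_sheets_existing(conn, school_id):
    """Club types offered by a school under the existing design."""
    rows = conn.execute("""
        SELECT ct.name FROM Club c
        JOIN ClubType ct ON ct.club_type_id = c.club_type_id
        WHERE c.school_id = ? ORDER BY ct.name""", (school_id,))
    return [r[0] for r in rows]


def sign_up_sheets_proposed(conn, school_id):
    """Club types visible under the proposed design: only those with members."""
    rows = conn.execute("""
        SELECT DISTINCT ct.name FROM StudentProposed s
        JOIN ClubType ct ON ct.club_type_id = s.club_type_id
        WHERE s.school_id = ? ORDER BY ct.name""", (school_id,))
    return [r[0] for r in rows]


print(sign_up_sheets_existing(conn, 1))  # ['Soccer']
print(sign_up_sheets_proposed(conn, 1))  # []
```

With no students yet, the proposed design has nothing from which to print a sheet, while the existing design already knows that School1 offers soccer and not water polo.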

I realise you're using the school/club analogy as a proxy for your real situation, so you will have to translate everything I say to the actual problem, and some of the comments I'm about to make may not apply. That's the price you pay for analogies. If your actual "school" can choose from every "club type" at all times, then your proposed solution is adequate.

Or is it just a case of poor design?

It is not poor design. Neither is it good design. It is a design which implements some affordances but precludes others. It was written for reasons to which we do not have access, with the knowledge available at the time. It has likely passed a great many tests and survived active production use.

Now, the world may have moved on since then. The business rules may have changed; the implementation team's understanding may have improved. That design may have a performance characteristic which is not acceptable on your hardware with your workload given your data. It may be appropriate to change that design.

Normalization is about how non-key columns depend on key columns within a single table. It shows how to change the schema so that changing a single value in the real world updates a single column in a single row of the database. It has nothing to say about which scenarios from the real-world problem the schema must support.

I understand your current Student table to mean "a person as a member of a club". For that meaning the primary key will be {student id, club id}. In your current implementation the table is not normalized because Name depends only on student id and not on club id. The normalized solution would be to change the semantics of table Student to "A person" (columns student id, name) and create a new table ClubMember with columns {student id, club id}.
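A minimal sqlite3 sketch of that normalization step. StudentClub is a hypothetical name for the current "a person as a member of a club" table keyed on {student id, club id}; Student and ClubMember follow the split described above. The Club table is omitted for brevity, so ClubMember.club_id carries no foreign key here. The point is the update count: one real-world change (Fred changes his name) touches every membership row in the unnormalized table but exactly one row after the split.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
-- Unnormalized (hypothetical name): name depends only on student_id,
-- which is just part of the key, so it repeats once per club.
CREATE TABLE StudentClub (
    student_id INTEGER NOT NULL,
    club_id    INTEGER NOT NULL,
    name       TEXT NOT NULL,
    PRIMARY KEY (student_id, club_id)
);
-- Normalized: Student means "a person"; ClubMember records membership.
CREATE TABLE Student (
    student_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL
);
CREATE TABLE ClubMember (
    student_id INTEGER NOT NULL REFERENCES Student(student_id),
    club_id    INTEGER NOT NULL,  -- Club table omitted in this sketch
    PRIMARY KEY (student_id, club_id)
);
""")

# Fred (student 7) belongs to clubs 1 and 2 in both designs.
conn.executemany("INSERT INTO StudentClub VALUES (?, ?, ?)",
                 [(7, 1, 'Fred'), (7, 2, 'Fred')])
conn.execute("INSERT INTO Student VALUES (7, 'Fred')")
conn.executemany("INSERT INTO ClubMember VALUES (?, ?)", [(7, 1), (7, 2)])

# One real-world change: Fred becomes Frederick.
rows_unnormalized = conn.execute(
    "UPDATE StudentClub SET name = 'Frederick' WHERE student_id = 7").rowcount
rows_normalized = conn.execute(
    "UPDATE Student SET name = 'Frederick' WHERE student_id = 7").rowcount
print(rows_unnormalized, rows_normalized)  # 2 1
```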

The relationship from Club to ClubType is 1 to 1.

I doubt it. What are possible values of ClubType? Maybe "soccer" or "yoga"? I should think a great many schools would like to have a soccer club. Perhaps:

Each Club     is-this-schools exactly one   ClubType  
Each ClubType is-offered-in   zero or more  Club

As an ERD:

ClubType --< Club >-- School
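One way to express that ERD as tables, sketched with sqlite3. The column names are hypothetical, and the UNIQUE (school_id, club_type_id) constraint is an added assumption that a school offers each club type at most once. NOT NULL foreign keys give "each Club has exactly one ClubType (and one School)"; nothing stops a ClubType from appearing in zero, one, or many Club rows, which is the one-to-many relationship rather than one-to-one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE School   (school_id    INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE ClubType (club_type_id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
-- Each Club: exactly one School and exactly one ClubType (NOT NULL FKs).
-- Each School / ClubType: zero or more Clubs.
CREATE TABLE Club (
    club_id      INTEGER PRIMARY KEY,
    school_id    INTEGER NOT NULL REFERENCES School(school_id),
    club_type_id INTEGER NOT NULL REFERENCES ClubType(club_type_id),
    UNIQUE (school_id, club_type_id)  -- assumption: one club per type per school
);
""")
conn.executemany("INSERT INTO School VALUES (?, ?)",
                 [(1, 'School1'), (2, 'School2')])
conn.executemany("INSERT INTO ClubType VALUES (?, ?)",
                 [(1, 'Soccer'), (2, 'Yoga'), (3, 'Water polo')])
# Both schools offer soccer; only School2 offers yoga; nobody offers water polo.
conn.executemany("INSERT INTO Club (school_id, club_type_id) VALUES (?, ?)",
                 [(1, 1), (2, 1), (2, 2)])

# Clubs per type; LEFT JOIN keeps club types that no school offers.
counts = dict(conn.execute("""
    SELECT ct.name, COUNT(c.club_id) FROM ClubType ct
    LEFT JOIN Club c ON c.club_type_id = ct.club_type_id
    GROUP BY ct.club_type_id, ct.name
    ORDER BY ct.club_type_id"""))
print(counts)  # {'Soccer': 2, 'Yoga': 1, 'Water polo': 0}

# A Club without a ClubType is rejected by the NOT NULL constraint.
rejected = False
try:
    conn.execute("INSERT INTO Club (school_id, club_type_id) VALUES (1, NULL)")
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True
```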

Does the first example violate some known rule of database normalization or some other mathematical principle? Or is it just a case of poor design?

Neither. It has no obvious defects of either normalization or good design.

It sensibly models propositions like the following:

There's a school named 'School1'.
There's a ClubType named 'Spanish Club'.
School1 has a Spanish Club.
There's a student named 'Fred' at School1 who is a member of the Spanish Club there.
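Each of those propositions maps onto one row. The sketch below is hypothetical: it assumes the first example gives Student a single club_id column (as the one-club remark suggests) and reaches the student's school through that club; the ids and column names are illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE School   (school_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE ClubType (club_type_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE Club (
    club_id      INTEGER PRIMARY KEY,
    school_id    INTEGER NOT NULL REFERENCES School(school_id),
    club_type_id INTEGER NOT NULL REFERENCES ClubType(club_type_id)
);
-- Assumed shape: one club_id per student, so at most one club each.
CREATE TABLE Student (
    student_id INTEGER PRIMARY KEY,
    club_id    INTEGER NOT NULL REFERENCES Club(club_id),
    name       TEXT NOT NULL
);
""")
# One INSERT per proposition.
conn.execute("INSERT INTO School VALUES (1, 'School1')")           # There's a school named 'School1'.
conn.execute("INSERT INTO ClubType VALUES (10, 'Spanish Club')")   # There's a ClubType named 'Spanish Club'.
conn.execute("INSERT INTO Club VALUES (100, 1, 10)")               # School1 has a Spanish Club.
conn.execute("INSERT INTO Student VALUES (1000, 100, 'Fred')")     # Fred is a member of School1's Spanish Club.

# Reassemble the last proposition from the joined rows.
row = conn.execute("""
    SELECT s.name, sc.name, ct.name FROM Student s
    JOIN Club c      ON c.club_id = s.club_id
    JOIN School sc   ON sc.school_id = c.school_id
    JOIN ClubType ct ON ct.club_type_id = c.club_type_id""").fetchone()
print(row)  # ('Fred', 'School1', 'Spanish Club')
```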

The only strange thing about that model is that a Student can be a member of only one Club. That is a coherent rule; it would just be an unusual one for a real school.
