Movie Database, storing multiple genres

I would suggest you should follow the following structure:

tablename: movies

movieid, title, plot, rating, director

> sample data:
> 
> 1 titanic Bollywood   10  James Cameron

tablename: genres

genreid, genre

> sample data:
>  1    Horror
>  2    Thriller
>  3    Action
>  4    Love

tablename: moviegenres

moviegenresid, movieid, genreid

> sample data:
> 1 1   2
> 2 1   4

And the query is:

select m.*,group_concat(g.genre)
from movies m inner join moviegenres mg
on m.movieid=mg.movieid
inner join genres g
on g.genreid=mg.genreid
group by m.movieid
;

See the fiddle


What you are looking to model here is called a "many-to-many" relationship, and is very common when modelling "real world" categorisations.

There are many descriptions out there of how to work with such relationships including:

  • Praveen's answer here which is specific to your question.
  • http://en.wikipedia.org/wiki/Junction_table - the extra table linking two populations in many/may relationships is usually called an intersection table or junction table.
  • http://www.tomjewett.com/dbdesign/dbdesign.php?page=manymany.php which helpfully shows an example with the table and key/constraints design, a handy data representation diagram in case that isn't clear, and how the relationship is modelled and used in the application.
  • Any good database design book/tutorial will cover this somewhere.

Do not be tempted to skip the extra intersection table by storing multiple genres in one field for each film (a comma separated list for instance). This is a very common "anti pattern" that will cause you problems, maybe not today, maybe not tomorrow, but eventually. I recommend anyone working with database design give Bill Karwin's "SQL Antipatterns" (http://pragprog.com/book/bksqla/sql-antipatterns) a read. It is written in a way that should be accessible to a relative beginner, but contains much that those of us who should know better need reminding of from time to time (many-to-many relations, the list-in-a-field solution/problem, and what you should do instead, are one of the first things the book covers).


This answer is an elaboration of my comment on @Praveen Prasannan's answer above.

I would eliminate the arbitrary surrogate keys movieID and genreID as a way to remove necessary overhead for the relational database. Since title and genre are natural unique keys, we should use them and not ask the database to maintain the uniqueness of extra, meaningless keys and tables (the genres table in the referenced answer). This should improve the speed and performance for large relational databases and is good practice.

tablename: movies

primary key: title

title, plot, rating, director

> sample data:
> Titanic  Bollywood   10  James Cameron

tablename: moviegenres

primary key: title, genre

title, genre

> sample data:
> Titanic   Thriller
> Titanic   Romance

This also makes queries a lot easier for both the user and the machine since you don't have to join an extra table to decode the genres by the arbitrary UID.