Design of a database about Zoos

While you've done a pretty thorough job of following the textbook requirements, there are some practical considerations that you might or might not want to include, depending on the expectations of the assignment. Some of these have to do with bad requirements. I don't know if you would get extra marks or reduced marks for pointing these out.

  • Food quantity as an integer is probably not precise enough
  • Food quantities should have a unit of measure. Some foods will be dry, some will be liquid. Iguanas probably need a few grams of "Iguana Chow" whereas elephants will need multiple kilos of whatever they're eating.
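  • Amounts of money should be stored in a fixed-point type such as DECIMAL/NUMERIC (or MONEY, where your database offers it), not as an integer and not as a floating-point type.
  • If your zoos are spread across several countries, you might need to store a currency alongside each monetary amount.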
  • Storing the age of a visitor is a mistake. Visitors age between visits. You're better off storing the date of birth of the visitor and calculating their age at each visit, since you know the date of the visits
  • In a real system, some of this information is going to be optional. Not everyone is going to tell you their date of birth. Similarly, you may not know when your giant tortoise was born.
  • People's names, if they matter, are usually stored in more than one field (e.g. given name and family name)
  • It's highly unlikely that every animal will have a personal record of their dietary requirements. Some animals might, but for a lot of animals it might make more sense to store the requirements for all animals of a given species.
  • The fact that animal species is not accounted for in the model is kind of questionable, since you would think that most zoos would care about something like that
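
A few of the points above can be sketched together. This is only a minimal illustration, not a reference design: it uses Python's built-in sqlite3 module purely so the example is runnable, and every table and column name (Species, SpeciesDiet, Visitor, date_of_birth and so on) is my own assumption, not something from your assignment. The sample quantities are made up. Diet is stored per species with a decimal quantity and a unit, the date of birth is optional, and age is calculated for a given visit date.

```python
import sqlite3
from datetime import date
from decimal import Decimal
from typing import Optional

# Hypothetical schema: all table and column names are illustrative assumptions.
SCHEMA = """
CREATE TABLE Species (
    species_id  INTEGER PRIMARY KEY,
    common_name TEXT NOT NULL
);
CREATE TABLE SpeciesDiet (
    species_id INTEGER NOT NULL REFERENCES Species(species_id),
    food_name  TEXT NOT NULL,
    quantity   TEXT NOT NULL,   -- decimal stored as text to avoid float rounding
    unit       TEXT NOT NULL,   -- e.g. 'g', 'kg', 'l'
    PRIMARY KEY (species_id, food_name)
);
CREATE TABLE Visitor (
    visitor_id    INTEGER PRIMARY KEY,
    given_name    TEXT,
    family_name   TEXT,
    date_of_birth TEXT          -- ISO date; NULL when the visitor did not say
);
"""


def age_on(date_of_birth: Optional[date], visit_date: date) -> Optional[int]:
    """Age in whole years on the day of the visit, or None if the birth date is unknown."""
    if date_of_birth is None:
        return None
    had_birthday = (visit_date.month, visit_date.day) >= (date_of_birth.month, date_of_birth.day)
    return visit_date.year - date_of_birth.year - (0 if had_birthday else 1)


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO Species VALUES (1, 'Green iguana'), (2, 'African elephant')")
conn.executemany(
    "INSERT INTO SpeciesDiet VALUES (?, ?, ?, ?)",
    [(1, "Iguana Chow", "12.5", "g"), (2, "Hay", "150", "kg")],  # made-up amounts
)

# Read the quantities back as Decimal so no precision is lost.
rows = conn.execute(
    "SELECT food_name, quantity, unit FROM SpeciesDiet ORDER BY species_id"
).fetchall()
diet = [(food, Decimal(qty), unit) for food, qty, unit in rows]
```

In a database with a real DECIMAL type you would use that for the quantity column instead of text; the text column is only a workaround for SQLite's type system.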

I don't think there's much to say; it looks like you captured the requirements of the instructions you were given. One small remark: where the instructions say "a visit is defined by a unique identifier", note that "unique identifier" is sometimes used as a synonym for GUID (and in some database systems, such as Microsoft SQL Server, UNIQUEIDENTIFIER is the actual name of the GUID data type). I'm not sure it's meant to be interpreted that way in this context, but I figured I'd make you aware.

To answer your second question, about shortening your view to get the zoo_id with the fewest visits: you can use a CTE together with the ROW_NUMBER() window function, like so:

WITH CTE_ZooVisits_Sorted AS
(
    SELECT v.zoo_id, ROW_NUMBER() OVER (ORDER BY COUNT(v.zoo_id), v.zoo_id) AS SortId -- Generates a unique sequential ID, ordered by the number of Zoo visits, then by the Zoo's ID to break any ties
    FROM Visit v
    GROUP BY v.zoo_id
)

SELECT zoo_id
FROM CTE_ZooVisits_Sorted
WHERE SortId = 1 -- Returns only one row with the minimum number of zoo visits (ties broken by the lowest zoo_id, i.e. whichever zoo was created first if IDs are assigned sequentially)

Note that when there is a tie, ROW_NUMBER() orders the tied rows arbitrarily (the order is not guaranteed to be the same from one run to the next) unless you supply a unique column as a tie-breaker, which I did above with zoo_id. (Logically, this means that if two zoos are tied on number of visits, the one with the lower zoo_id is sorted first; if IDs are assigned sequentially, that is the zoo that was created first.) To get the full list of zoo_id values ordered from fewest to most visits, remove the WHERE SortId = 1 from the final SELECT and replace it with ORDER BY SortId.
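
To see the tie-breaking on concrete data, here is a small sketch that runs the same idea against an in-memory SQLite database from Python (this assumes SQLite 3.25 or later, which supports window functions). The sample rows are made up, and I have split the count into its own CTE purely to keep the example portable; the ordering logic is the same as in the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Visit (visit_id INTEGER PRIMARY KEY, zoo_id INTEGER NOT NULL)")

# Made-up data: zoo 1 has 3 visits, zoos 2 and 3 are tied with 1 visit each.
conn.executemany("INSERT INTO Visit (zoo_id) VALUES (?)", [(1,), (1,), (1,), (3,), (2,)])

ranked = conn.execute("""
    WITH VisitCounts AS (
        SELECT v.zoo_id, COUNT(v.zoo_id) AS VisitCount
        FROM Visit v
        GROUP BY v.zoo_id
    ),
    CTE_ZooVisits_Sorted AS (
        SELECT zoo_id,
               VisitCount,
               -- zoo_id is the unique tie-breaker, so the numbering is deterministic
               ROW_NUMBER() OVER (ORDER BY VisitCount, zoo_id) AS SortId
        FROM VisitCounts
    )
    SELECT zoo_id, VisitCount, SortId
    FROM CTE_ZooVisits_Sorted
    ORDER BY SortId
""").fetchall()

for zoo_id, visit_count, sort_id in ranked:
    print(zoo_id, visit_count, sort_id)
```

With this data, zoos 2 and 3 both have one visit, and zoo 2 gets SortId 1 only because its zoo_id is lower.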

If you would rather have tied zoos share the same position instead of breaking the tie, you can replace the ROW_NUMBER() window function with RANK() or DENSE_RANK(), like so:

WITH CTE_ZooVisits_Sorted AS
(
    SELECT v.zoo_id, DENSE_RANK() OVER (ORDER BY COUNT(v.zoo_id)) AS SortId -- Generates a sequential ID, ordered by the number of Zoo visits, ties will have the same sequential ID generated
    FROM Visit v
    GROUP BY v.zoo_id
)

SELECT zoo_id
FROM CTE_ZooVisits_Sorted
WHERE SortId = 1 -- Returns all rows with the minimum amount of Zoo Visits (multiple rows for when there's a tie among minimum visits between multiple Zoos)

Note that using a window function like ROW_NUMBER(), RANK(), or DENSE_RANK() has a further benefit: it lets you select any or all columns of the row that has the minimum number of zoo visits (or whatever sort criteria you want to use).
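
As a sketch of that last point, the ranked result can carry other columns along with it. The Zoo table and its name column are assumptions on my part, as is the sample data, and the example again runs on SQLite (3.25 or later) from Python just so it is self-contained. This version also starts from Zoo with a LEFT JOIN, so a zoo with no rows at all in Visit is counted as zero visits instead of being left out; whether you want that depends on how the assignment defines "fewest visits".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Zoo (zoo_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE Visit (
        visit_id INTEGER PRIMARY KEY,
        zoo_id   INTEGER NOT NULL REFERENCES Zoo(zoo_id)
    );
    -- Made-up data: zoos 3 and 4 have no visits at all.
    INSERT INTO Zoo VALUES (1, 'North Zoo'), (2, 'South Zoo'), (3, 'East Zoo'), (4, 'West Zoo');
    INSERT INTO Visit (zoo_id) VALUES (1), (1), (2);
""")

least_visited = conn.execute("""
    WITH VisitCounts AS (
        SELECT z.zoo_id, z.name, COUNT(v.visit_id) AS VisitCount  -- counts 0 for unmatched zoos
        FROM Zoo z
        LEFT JOIN Visit v ON v.zoo_id = z.zoo_id
        GROUP BY z.zoo_id, z.name
    ),
    Ranked AS (
        SELECT zoo_id, name, VisitCount,
               DENSE_RANK() OVER (ORDER BY VisitCount) AS SortId  -- ties share a rank
        FROM VisitCounts
    )
    SELECT zoo_id, name, VisitCount
    FROM Ranked
    WHERE SortId = 1
    ORDER BY zoo_id
""").fetchall()

for row in least_visited:
    print(row)
```

Both zero-visit zoos come back here, which the Visit-only queries above would not report because those zoos never appear in the Visit table.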


I recommend you split the animal part and the visitor part into two different schemas. They are conceptually two different domains and you can have different development and maintenance schedules. I think mashing the two together increases complexity and complexity correlates with cost and bugs.

The separation can also help with information security. While the visitor database may need an internet-facing presence (e.g. season/annual pass holder accounts), the animal part has no need to be on the internet and can have limited (or no) connectivity to it.

I would advocate for a third schema--business operations (maintenance, inventory, purchasing). Define a data interchange standard between the different schemas (e.g. food consumption message to the business ops schema).
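
To make the interchange idea a little more concrete, here is a rough sketch of what a food consumption message might look like. Everything in it is hypothetical: the class name, the fields, the JSON encoding and the sample values are my own assumptions, shown in Python only for illustration. The point is that the two schemas agree on a small, versioned message instead of reading each other's tables.

```python
import json
from dataclasses import asdict, dataclass
from decimal import Decimal


@dataclass(frozen=True)
class FoodConsumptionMessage:
    """Hypothetical message sent from the animal schema to business operations."""
    schema_version: int
    zoo_id: int
    food_code: str    # identifier agreed between the two schemas
    quantity: str     # decimal as a string, so no float rounding in transit
    unit: str         # e.g. 'g', 'kg', 'l'
    consumed_on: str  # ISO 8601 date


def encode(message: FoodConsumptionMessage) -> str:
    """Serialise the message to JSON with a stable key order."""
    return json.dumps(asdict(message), sort_keys=True)


def decode(payload: str) -> FoodConsumptionMessage:
    """Parse a JSON payload, rejecting quantities that are not valid decimals."""
    data = json.loads(payload)
    Decimal(data["quantity"])  # raises decimal.InvalidOperation on bad input
    return FoodConsumptionMessage(**data)


# Made-up example values.
message = FoodConsumptionMessage(
    schema_version=1,
    zoo_id=7,
    food_code="HAY",
    quantity="150.0",
    unit="kg",
    consumed_on="2024-05-01",
)
payload = encode(message)
round_tripped = decode(payload)
```

The schema_version field is there so either side can change its internal tables on its own schedule, as long as the message format stays compatible.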

In my experience the "one database schema to rule them all" becomes expensive and difficult to maintain.