Choosing primary keys: Scientific names of species or system-assigned numeric identifiers?

I would use my own identifier.

The species name may be unique, but it is too long and it is a string.

For example, in SQL Server, if the name is used as the clustered primary key, it is also stored in every non-clustered index, so the long string is repeated there. And foreign keys on child tables typically reference the primary key, which repeats it again.

Because it is a string, you also pay the overhead of sorting and comparison (case, accents, etc.).

A surrogate numeric key avoids these problems, but you must then create a unique non-clustered index on the species name so that names stay unique.
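A minimal sketch of that layout, using Python's built-in sqlite3 module rather than SQL Server so that it runs anywhere. SQLite has no clustered/non-clustered distinction, and the table, column, and index names here are assumptions made up for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when this is on

conn.executescript("""
CREATE TABLE species (
    species_id      INTEGER PRIMARY KEY,   -- surrogate numeric key
    scientific_name TEXT NOT NULL
);
-- The name is no longer the key, so uniqueness needs its own index.
CREATE UNIQUE INDEX ux_species_scientific_name
    ON species (scientific_name);

CREATE TABLE observation (
    observation_id INTEGER PRIMARY KEY,
    species_id     INTEGER NOT NULL REFERENCES species (species_id),
    observed_on    TEXT NOT NULL
);
""")

cur = conn.execute(
    "INSERT INTO species (scientific_name) VALUES (?)",
    ("Giraffa camelopardalis",),
)
giraffe_id = cur.lastrowid

# The child row carries only the small integer, not the long string.
conn.execute(
    "INSERT INTO observation (species_id, observed_on) VALUES (?, ?)",
    (giraffe_id, "2018-02-10"),
)

# A second row with the same name is rejected by the unique index.
try:
    conn.execute(
        "INSERT INTO species (scientific_name) VALUES (?)",
        ("Giraffa camelopardalis",),
    )
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(duplicate_rejected)
```

In SQL Server the equivalent would presumably be an integer identity column as the clustered primary key plus a unique non-clustered index on the name column.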

Is the species name a good long-term identifier, too? This is not my area of expertise, but don't many species have alternate names, or controversies, or get reclassified, or get labelled "maybe this species"?

Example: how many Giraffe species are there? 9? 2? 6? 8? 4?
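One way to cope with alternate names is to keep the names in their own table and point them all at one stable identifier. The following is only a sketch under that assumption: the `taxon` and `taxon_name` tables and the `find_taxon_id` helper are hypothetical, and the two giraffe names are just sample data for a population that is treated as a subspecies under one scheme and as a full species under another.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE taxon (
    taxon_id INTEGER PRIMARY KEY,          -- stable surrogate identifier
    note     TEXT
);
CREATE TABLE taxon_name (
    name         TEXT PRIMARY KEY,         -- each name maps to one taxon
    taxon_id     INTEGER NOT NULL REFERENCES taxon (taxon_id),
    is_preferred INTEGER NOT NULL DEFAULT 0
);
""")

cur = conn.execute("INSERT INTO taxon (note) VALUES (?)", ("reticulated giraffe",))
taxon_id = cur.lastrowid

# Two names for the same rows in the rest of the database.
conn.executemany(
    "INSERT INTO taxon_name (name, taxon_id, is_preferred) VALUES (?, ?, ?)",
    [
        ("Giraffa camelopardalis reticulata", taxon_id, 1),
        ("Giraffa reticulata", taxon_id, 0),
    ],
)


def find_taxon_id(name):
    """Return the surrogate id for any known name, or None if unknown."""
    row = conn.execute(
        "SELECT taxon_id FROM taxon_name WHERE name = ?", (name,)
    ).fetchone()
    return row[0] if row else None


print(find_taxon_id("Giraffa reticulata") == find_taxon_id("Giraffa camelopardalis reticulata"))
```

If the preferred name changes, only the `is_preferred` flags move; rows that reference `taxon_id` are untouched.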


As a former environmental scientist with a bit of "bugs and bunnies" background (fish and inverts specifically), I would recommend using your own identifier.

As a database administrator, you have stumbled onto what is called the "Species Problem": it has been argued that the concept of a species is more philosophical than empirical (Pigliucci 2003). Also consider that taxonomists do not get published for getting rid of species (Jones 2017), so the incentive will always be to split new species out of existing ones. Your database design needs to account for that.

Building on @gbn's answer, some organisms do not fit neatly into the species concept, and the data modelling for them could get complicated. Consider the all-female hybrid populations of Ambystomid salamanders (Wikipedia 2018). Herpetologists refer to these animals by the chromosomal constituents of their DNA. The Linnaean species approach does not work here, because what is going on with these animals is far more complicated than simple parthenogenesis (female cloning).

Given the giraffe and salamander examples, it would be wise to consult your end users about the conventions in their fields. Mycologists, for example, might have conventions of their own, and herpetologists have their own way of identifying the salamanders described above (Wikipedia 2018).

Sources:

Pigliucci, M. (2003). Species as family resemblance concepts: The (dis-)solution of the species problem? BioEssays, 25(6), 596-602.

Mole salamander: Hybrid all-female populations. In Wikipedia. Retrieved February 10, 2018.

Jones, B. (2017). A Few Bad Scientists Are Threatening to Topple Taxonomy. Smithsonian Magazine.


I do not know whether the scientific community ever changes the species name after it has been assigned.

If this does happen, it's another reason to avoid using the name as a primary key. Whenever the name changes, every reference to it has to change as well. A cascaded update takes care of references declared as foreign keys. It won't help with references that are not declared as foreign keys.
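To illustrate the difference, here is a small sketch using Python's built-in sqlite3 module. The table names and the two species names are made up for the example. One child table declares the foreign key with ON UPDATE CASCADE, the other stores the name without declaring anything.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # required for SQLite to enforce FKs

conn.executescript("""
CREATE TABLE species (
    scientific_name TEXT PRIMARY KEY       -- natural key, for the sake of argument
);
CREATE TABLE sighting (
    sighting_id     INTEGER PRIMARY KEY,
    scientific_name TEXT NOT NULL
        REFERENCES species (scientific_name) ON UPDATE CASCADE
);
CREATE TABLE field_note (
    note_id         INTEGER PRIMARY KEY,
    scientific_name TEXT NOT NULL          -- same value, but no FK declared
);
""")

# Hypothetical names, not real taxa.
old_name, new_name = "Examplea antiqua", "Examplea nova"

conn.execute("INSERT INTO species (scientific_name) VALUES (?)", (old_name,))
conn.execute("INSERT INTO sighting (scientific_name) VALUES (?)", (old_name,))
conn.execute("INSERT INTO field_note (scientific_name) VALUES (?)", (old_name,))

# The species is renamed.
conn.execute(
    "UPDATE species SET scientific_name = ? WHERE scientific_name = ?",
    (new_name, old_name),
)

sighting_name = conn.execute("SELECT scientific_name FROM sighting").fetchone()[0]
note_name = conn.execute("SELECT scientific_name FROM field_note").fetchone()[0]

print(sighting_name)  # follows the rename through the declared FK
print(note_name)      # still holds the old name
```

With a surrogate key, the same rename would be a single-row update in the species table, and neither child table would need to change.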

References to the species name outside the database are going to be a problem no matter which choice you make.