Why not always use GUIDs instead of Integer IDs?

Integers join a lot faster, for one. This is especially important when dealing with millions of rows.

For two, GUIDs take up more space than integers do. Again, very important when dealing with millions of rows.

For three, GUIDs sometimes take different formats that can cause hiccups in applications, etc. An integer is an integer, through and through.

A more in depth look can be found here, and on Jeff's blog.


GUIDs are great from a programmer's perspective - they're guaranteed to be (almost) unique, so why not use them everywhere, right?

If you look at it from the DBA perspective and from the database standpoint, at least for SQL Server, there are a few things to consider:

  • GUIDs as primary key (which is responsible for uniquely identifying a single row in your table) might be okay - after all, they're unique, right?
  • however, SQL Server also has the concept of the clustering key, which physically orders the data in your table; if you don't know about this, and don't do anything explicitly, your primary key becomes your clustering key.

Kimberly Tripp - world-known expert on SQL Server indexing and performance - has a great many blog posts on why a GUID as your clustering key is a really bad idea - check out her blog on indexes.

Most notably, her best practices for a clustering key are:

  • narrow
  • static
  • unique
  • ever-increasing

GUIDs are typically static and unique - but they're neither narrow (16 byte vs. 4 byte for a INT) nor ever-increasing. Due to their nature, they're unique and (pseudo-)random.

The narrow part is important because the clustering key will be added to each and every index page for each and every non-clustered index on your table - and if you have a few of those, and a few million rows in your table, this amounts to a massive waste of space - and not just on your disk, but also in your SQL Server's RAM.

The ever-increasing part is immportant, because the randomness of the GUIDs causes a lot of fragmentation in your indices, which negatively affects your performance all around. Even the newsequentialid() of SQL Server 2005 and up doesn't really create sequential GUIDs all around - they're sequential for a while and then there's a jump again, causing fragmentation (less than totally random GUIDs, but still).

So all in all, if you're really concerned with your SQL Server performance, using GUIDs as a clustering key is a really bad idea - use INT IDENTITY() instead, possibly using a GUID as the primary (non-clustered) key if you really have to.

Marc


  1. GUIDs are four times larger than an int, and twice as large as a bigint.

  2. GUIDs are really difficult to look at if you are trying to troubleshoot tables.