In what data type should I store an email address in a database?

I've always used VARCHAR(320). Here's why. The standard dictates the following limitations:

  • 64 characters for the "local part" (username).
  • 1 character for the @ symbol.
  • 255 characters for the domain name.
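The arithmetic behind the 320 figure can be sketched in a few lines of Python. This is only a length check, not an address validator. The function name `fits_limits` is made up for illustration, and it assumes the limits are counted in octets (as the SMTP spec words them), measured here as UTF-8 bytes.

```python
# Limits from the list above; assumed to be octets, not characters.
MAX_LOCAL = 64
MAX_DOMAIN = 255
MAX_ADDRESS = MAX_LOCAL + 1 + MAX_DOMAIN  # 64 + "@" + 255 = 320


def fits_limits(address: str) -> bool:
    """Return True if both parts of the address fit the length limits.

    Hypothetical helper: checks lengths only, not address syntax.
    """
    # Split on the last "@", since the domain cannot contain one.
    local, sep, domain = address.rpartition("@")
    if not sep or not local or not domain:
        return False
    return (
        len(local.encode("utf-8")) <= MAX_LOCAL
        and len(domain.encode("utf-8")) <= MAX_DOMAIN
    )
```

An address built from a 64-octet local part and a 255-octet domain passes this check at exactly 320 octets; one more octet on either side fails it.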

Now, some folks will say you need to support more than that. Some folks will also say that you need to support Unicode for domain names (meaning you have to switch to NVARCHAR). While the standard may change in the meantime (it's been a while since I've had skin in the game), I am quite confident that at this time most servers in the world will not accept Unicode e-mail addresses, and I am sure many servers will have issues creating and/or accepting addresses with > 320 characters.

That said, you can prepare for the worst now, if you like (and if you are using Data Compression in SQL Server 2008 R2 or later, you will benefit from Unicode compression, meaning you only pay the 2-byte penalty for characters that actually need it). This way you can make your column as wide as you want, and you can let people stuff any too-long junk in there that they want - they won't receive an e-mail if they give you junk, just like they won't receive an e-mail if the insert fails. The problem is that if you let invalid junk in, you have to deal with it. And no matter what size you make the column, the boundary gets tested: if someone will try to stuff 400 characters into a 320-character column, someone will try to stuff 1025 characters into a 1024-character column. There is no reason any sensible person should have an e-mail address > 320 characters unless they are using it to explicitly test system boundaries.

But stop asking for opinions on this - and stop looking at other implementations for guidance (it just so happens in this case that the ones you referenced did not bother to do their own homework and just picked numbers out of their, well, you know). You have direct access to the standard - make sure you consult the most current version, support that as a minimum, and stay on top of the standard so you can adapt to changes in specs.


EDIT thanks to @ypercube for the ping in chat.

As an aside, perhaps you don't want to dump the whole address into a single column in the first place. Normalization might suggest that you don't want to store @hotmail.com 15 million times when a much skinnier FK int would work just fine and not have the additional overhead of variable-length columns. You could also normalize the username, since two addresses at different domains (say, bob@hotmail.com and bob@example.com) share a common username - they don't know each other, but your database doesn't care about that.
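As a rough sketch of the domain half of that idea (not the schema from the articles linked below), here is a self-contained example using SQLite through Python's `sqlite3` module rather than SQL Server. The table names, column names, and the `add_address` helper are all hypothetical, and the username is left un-normalized to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
conn.executescript("""
CREATE TABLE domains (
    domain_id INTEGER PRIMARY KEY,
    -- length() counts characters in SQLite; fine for ASCII domains
    name      TEXT NOT NULL UNIQUE CHECK (length(name) <= 255)
);
CREATE TABLE addresses (
    local_part TEXT NOT NULL CHECK (length(local_part) <= 64),
    domain_id  INTEGER NOT NULL REFERENCES domains(domain_id),
    PRIMARY KEY (local_part, domain_id)
);
""")


def add_address(address: str) -> None:
    """Store the domain once and point each address at it by integer key."""
    local, _, domain = address.rpartition("@")
    domain = domain.lower()  # domain names are case-insensitive
    conn.execute("INSERT OR IGNORE INTO domains (name) VALUES (?)", (domain,))
    (domain_id,) = conn.execute(
        "SELECT domain_id FROM domains WHERE name = ?", (domain,)
    ).fetchone()
    conn.execute(
        "INSERT INTO addresses (local_part, domain_id) VALUES (?, ?)",
        (local, domain_id),
    )


for addr in ("alice@hotmail.com", "bob@hotmail.com", "bob@example.com"):
    add_address(addr)

domain_count = conn.execute("SELECT COUNT(*) FROM domains").fetchone()[0]
address_count = conn.execute("SELECT COUNT(*) FROM addresses").fetchone()[0]
```

Three addresses end up referencing two domain rows, so the domain text is stored once per domain rather than once per address.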

I talked about some of this here:

http://www.mssqltips.com/sqlservertip/2657/storing-email-addresses-more-efficiently-in-sql-server/

http://www.mssqltips.com/sqlservertip/2671/storing-email-addresses-more-efficiently-in-sql-server--part-2/

This introduces challenges, however, to the commonly cited 254-character limit on a full address (which follows from the 256-octet path limit in RFC 5321, angle brackets included), since there doesn't seem to be consensus about what happens when a valid 255-character domain is combined with a valid 1-character localpart. This should be accepted by most servers around the world but seems to violate this 254-character limit. So do you create a Domains table that has an artificially lower restriction on length for e-mail addresses, when the domain could be re-used as a valid 255-character URL?


There are a few considerations with this decision. First and foremost, size the column from the current and predicted limits that the data will have to conform to. There's a reason why you don't want to set every string column data type to varchar(1024) when you are just storing a string that shouldn't exceed 32 characters (emphasis on the should keyword).

If you have some sort of vulnerability where emails are all modified to become 255 characters, then you could potentially take a lasting performance hit from page splits. This may seem out of the ordinary, and it most likely is, but you need to size your data to the business requirement. Much like the age-old debate over enforcing constraints in the database vs. the application, I'm a firm believer that data type limitations and allowable values should also be enforced at the data tier.
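To illustrate enforcing the limit at the data tier, here is a minimal sketch using SQLite via Python's `sqlite3` module. It is an assumption-laden stand-in for SQL Server: SQLite does not enforce a declared VARCHAR length, so the limit is written as a CHECK constraint, and the `contacts` table, the 320 limit, and the `try_insert` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE contacts (
        -- Hypothetical table; the limit lives in the database itself.
        email TEXT NOT NULL CHECK (length(email) <= 320)
    )
""")


def try_insert(email: str) -> bool:
    """Return True if the database accepted the row, False if it refused."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO contacts (email) VALUES (?)", (email,))
        return True
    except sqlite3.IntegrityError:
        # Raised when the CHECK constraint rejects the value.
        return False
```

Whatever the application tier does or forgets to do, an oversized value never reaches the table: `try_insert("x" * 400 + "@example.com")` is refused by the database itself.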

Which leads me to my next point. The database is most likely just the data tier. What does the application tier utilize? For instance, if you have an application where you can only enter 80 characters for an email address, why would you want the data type to be any larger? The business needs to answer two questions:

  1. What can it be?
  2. What should it be?

Only then will you have your answer.

Doesn't a varchar by definition use only as much storage as needed to hold the data?

Yes and no. A variable-length column stores only the bytes the value needs, plus a small fixed overhead (2 bytes per value in SQL Server) that records the length of the data.
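To put rough numbers on that, here is a back-of-the-envelope sketch in Python. It assumes the documented SQL Server rule of actual data length plus 2 bytes, a single-byte code page for varchar, and no row or Unicode compression; the function names are made up for illustration.

```python
LENGTH_OVERHEAD = 2  # bytes per variable-length value (assumed, uncompressed)


def varchar_bytes(value: str) -> int:
    """Estimate varchar storage, assuming one byte per character."""
    return len(value) + LENGTH_OVERHEAD


def nvarchar_bytes(value: str) -> int:
    """Estimate nvarchar storage, using UTF-16 code units (2 bytes each)."""
    return len(value.encode("utf-16-le")) + LENGTH_OVERHEAD
```

Under these assumptions a 15-character address costs the same in a VARCHAR(320) column as in a VARCHAR(1024) one. The declared width only caps the value; it does not pad it.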


RFC 5321 (the current SMTP spec, which obsoletes RFC 2821) states:

"The maximum total length of a user name or other local-part is 64 octets. The maximum total length of a domain name or number is 255 octets."

So 64 + 255 + @ sign implies VARCHAR(320). You probably will never need this much but it's safe to have it, just in case.