What is the best way to store an email address in PostgreSQL?

I always use CITEXT for email, because an email address is (in practice) case insensitive, i.e. [email protected] is same as [email protected].

It is also easier to setup an unique index to prevent duplicates, as compared to text:

-- citext
CREATE TABLE address (
   id serial primary key,
   email citext UNIQUE,
   other_stuff json
);

-- text
CREATE TABLE address (
   id serial primary key,
   email text,
   other_stuff json
);
CREATE UNIQUE INDEX ON address ((lower(email)));

Comparing emails is also easier and less prone to errors:

SELECT * FROM address WHERE email = '[email protected]';

as compared to:

SELECT * FROM address WHERE lower(email) = lower('[email protected]');

CITEXT is a type defined in a standard extension module named "citext", and available by typing:

CREATE EXTENSION citext;

P.S. text and varchar are virtually the same in Postgres and there is no penalty for using text as one may expect. Check this answer: Difference between text and varchar


Custom DOMAINs

I don't think using citext (case-insensitive) is enough[1]. Using PostgreSQL we can create a custom domain which is essentially some defined constraints over a type. We can create a domain for instance over the citext type, or over text.

Using HTML5 type=email spec

Currently the most correct answer to the question what is an e-mail address is specified in RFC5322. That spec is insanely complex[2], so much so that everything breaks it. HTML5 contains a different spec for email,

This requirement is a willful violation of RFC 5322, which defines a syntax for e-mail addresses that is simultaneously too strict (before the "@" character), too vague (after the "@" character), and too lax (allowing comments, whitespace characters, and quoted strings in manners unfamiliar to most users) to be of practical use here. [...] The following JavaScript- and Perl-compatible regular expression is an implementation of the above definition.

/^[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$/

This is likely what you want, and if it's good enough for HTML5, it's probably good enough for you. We can make use of that directly in PostgreSQL. I also use citext here (which technically means you can simply the regex a bit visually by removing either the upper-case or lower-case).

CREATE EXTENSION citext;
CREATE DOMAIN email AS citext
  CHECK ( value ~ '^[a-zA-Z0-9.!#$%&''*+/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$' );

Now you can do...

SELECT '[email protected]'::email;

But not

SELECT 'asdf@foob,,ar.com'::email;
SELECT 'asd@[email protected]'::email;

Because both of those return

ERROR:  value for domain email violates check constraint "email_check"

Because this is based on citext too

SELECT '[email protected]'::email = '[email protected]';

returns true by default.

Using plperlu/Email::Valid

As an important note, there is a more correct method of doing this that is far more complex using plperlu. If you need this level of correctness you do not want citext. Email::Valid can even check if the domain has an MX record (example in docs of Email::Valid)! First, add plperlu (requires superuser).

CREATE EXTENSION plperlu;

Then create the function, notice we mark at as an IMMUTABLE:

CREATE FUNCTION valid_email(text)
  RETURNS boolean
  LANGUAGE plperlu
  IMMUTABLE LEAKPROOF STRICT AS
$$
  use Email::Valid;
  my $email = shift;
  Email::Valid->address($email) or die "Invalid email address: $email\n";
  return 'true';
$$;

Then create the domain,

CREATE DOMAIN validemail AS text NOT NULL
  CONSTRAINT validemail_check CHECK (valid_email(VALUE));

Footnotes

  1. Using citext is technically wrong. SMTP defines local-part as being case sensitive. But, again, this is a case of the spec being stupid. It contains its own identity crises. The spec says local-part (the part before the @) "MAY be case-sensitive" ... "MUST BE treated as case sensitive" ... and yet "exploiting the case sensitivity of mailbox local-parts impedes interoperability and is discouraged."
  2. The spec for an email address is so complex, it's not even self-contained. Complex is truly an understatement, those making the spec don't even understand it.. From the docs on regular-expression.info

    Neither of these regexes enforce length limits on the overall email address or the local part or the domain names. RFC 5322 does not specify any length limitations. Those stem from limitations in other protocols like the SMTP protocol for actually sending email. RFC 1035 does state that domains must be 63 characters or less, but does not include that in its syntax specification. The reason is that a true regular language cannot enforce a length limit and disallow consecutive hyphens at the same time.


I always use varchar(254) as an email address may not be longer than 254 characters.

See https://stackoverflow.com/questions/386294/what-is-the-maximum-length-of-a-valid-email-address

Postgresql has no built-in type for email addresses, though I did come across some contributed data type.

In addition, you may wish to add a trigger or some such logic to standardize email addresses in case you wish to add a unique key on it.

In particular, the domain part of the email address (which is of the form local-part@domain is case insensitive while local-part must be treated as case sensitive. See http://tools.ietf.org/html/rfc5321#section-2.4

Another consideration is if you wish to store names and email addresses in the form "Joe Bloggs" <[email protected]>, in which case you need a string longer than 254 characters and you won't be able to meaningfully use a unique constraint. I wouldn't do this and suggest storing name and email address separately. Pretty printing addresses in this format is always possible in your presentation layer.