Sql script to find invalid email addresses

I find this simple T-SQL query useful for returning valid e-mail addresses

SELECT email
FROM People
WHERE email LIKE '%_@__%.__%' 
    AND PATINDEX('%[^a-z,0-9,@,.,_]%', REPLACE(email, '-', 'a')) = 0

The PATINDEX bit eliminates all e-mail addresses containing characters that are not in the allowed a-z, 0-9, '@', '.', '_' & '-' set of characters.

It can be reversed to do what you want like this:

SELECT email
FROM People
WHERE NOT (email LIKE '%_@__%.__%' 
    AND PATINDEX('%[^a-z,0-9,@,.,_]%', REPLACE(email, '-', 'a')) = 0)

Here is a quick and easy solution:

CREATE FUNCTION dbo.vaValidEmail(@EMAIL varchar(100))

RETURNS bit as
BEGIN     
  DECLARE @bitRetVal as Bit
  IF (@EMAIL <> '' AND @EMAIL NOT LIKE '_%@__%.__%')
     SET @bitRetVal = 0  -- Invalid
  ELSE 
    SET @bitRetVal = 1   -- Valid
  RETURN @bitRetVal
END 

Then you can find all rows by using the function:

SELECT * FROM users WHERE dbo.vaValidEmail(email) = 0

If you are not happy with creating a function in your database, you can use the LIKE-clause directly in your query:

SELECT * FROM users WHERE email NOT LIKE '_%@__%.__%'

Source


select
    email 
from loginuser where
patindex ('%[ &'',":;!+=\/()<>]*%', email) > 0  -- Invalid characters
or patindex ('[@.-_]%', email) > 0   -- Valid but cannot be starting character
or patindex ('%[@.-_]', email) > 0   -- Valid but cannot be ending character
or email not like '%@%.%'   -- Must contain at least one @ and one .
or email like '%..%'        -- Cannot have two periods in a row
or email like '%@%@%'       -- Cannot have two @ anywhere
or email like '%.@%' or email like '%@.%' -- Cant have @ and . next to each other
or email like '%.cm' or email like '%.co' -- Unlikely. Probably typos 
or email like '%.or' or email like '%.ne' -- Missing last letter

This worked for me. Had to apply rtrim and ltrim to avoid false positives.

Source: http://sevenwires.blogspot.com/2008/09/sql-how-to-find-invalid-email-in-sql.html

Postgres version:

select user_guid, user_guid email_address, creation_date, email_verified, active
from user_data where
length(substring (email_address from '%[ &'',":;!+=\/()<>]%')) > 0  -- Invalid characters
or length(substring (email_address from '[@.-_]%')) > 0   -- Valid but cannot be starting character
or length(substring (email_address from '%[@.-_]')) > 0   -- Valid but cannot be ending character
or email_address not like '%@%.%'   -- Must contain at least one @ and one .
or email_address like '%..%'        -- Cannot have two periods in a row
or email_address like '%@%@%'       -- Cannot have two @ anywhere
or email_address like '%.@%' or email_address like '%@.%' -- Cant have @ and . next to each other
or email_address like '%.cm' or email_address like '%.co' -- Unlikely. Probably typos 
or email_address like '%.or' or email_address like '%.ne' -- Missing last letter
;

SELECT * FROM people WHERE email NOT LIKE '%_@__%.__%'

Anything more complex will likely return false negatives and run slower.

Validating e-mail addresses in code is virtually impossible.

EDIT: Related questions

  • I've answered a similar question some time ago: TSQL Email Validation (without regex)
  • T-SQL: checking for email format
  • Regexp recognition of email address hard?
  • many other Stack Overflow questions