Weird WHERE Clause Behavior. Why does this return a row?

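What is happening is that SQL Server, following the ANSI/ISO SQL-92 rules for string comparison, pads the shorter string with trailing spaces so that both strings are the same length before comparing them. So if you try something like: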

SELECT 1 WHERE '' = ' ' 

You will actually get 1 back, even though the two strings are not the same. However, if you do something like:

SELECT 1 WHERE 'a' = ' a'

It will not return anything, because it would be comparing 'a ' to ' a', which do not match. The padding is only ever added at the end of the string, never at the start.

However, this would return 1:

SELECT 1 WHERE 'a' = 'a '

This is because, after padding, it is comparing 'a ' to 'a '.
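A quick way to see the padding rule side by side is to put the comparisons next to LEN (which does not count trailing spaces) and DATALENGTH (which counts every byte, trailing spaces included). This is a small illustrative query; the column aliases are just labels chosen for this sketch.

```sql
-- '=' pads the shorter operand with trailing spaces before comparing,
-- so only the leading-space case reports 'different'.
SELECT
    CASE WHEN 'a' = 'a ' THEN 'equal' ELSE 'different' END AS TrailingSpaceCompare, -- 'equal'
    CASE WHEN 'a' = ' a' THEN 'equal' ELSE 'different' END AS LeadingSpaceCompare,  -- 'different'
    LEN('a ')        AS LenResult,        -- 1: LEN does not count trailing spaces
    DATALENGTH('a ') AS DataLengthResult; -- 2: DATALENGTH counts the trailing space
```

The gap between LEN and DATALENGTH is what makes DATALENGTH useful when trailing spaces actually matter.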


What if one needs to query to find '1234 ' but not '1234'

If you are using Unicode data, you can simply use LIKE. From the documentation:

When all arguments (match_expression, pattern, and escape_character, if present) are ASCII character data types, ASCII pattern matching is performed. If any one of the arguments are of Unicode data type, all arguments are converted to Unicode and Unicode pattern matching is performed. When you use Unicode data (nchar or nvarchar data types) with LIKE, trailing blanks are significant; however, for non-Unicode data, trailing blanks are not significant. Unicode LIKE is compatible with the ISO standard. ASCII LIKE is compatible with earlier versions of SQL Server.
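With nvarchar literals on both sides, the difference described in the quoted paragraph is easy to check: `=` still pads, but Unicode LIKE treats trailing blanks as significant, so the pattern N'1234 ' matches only a value with exactly one trailing space. A sketch using literals only; the column aliases are illustrative.

```sql
-- Unicode (nvarchar) operands: '=' pads, but LIKE treats trailing blanks as significant.
SELECT
    CASE WHEN N'1234'   =    N'1234 ' THEN 'match' ELSE 'no match' END AS EqualsNoSpace, -- 'match' (padded)
    CASE WHEN N'1234'   LIKE N'1234 ' THEN 'match' ELSE 'no match' END AS LikeNoSpace,   -- 'no match'
    CASE WHEN N'1234 '  LIKE N'1234 ' THEN 'match' ELSE 'no match' END AS LikeOneSpace,  -- 'match'
    CASE WHEN N'1234  ' LIKE N'1234 ' THEN 'match' ELSE 'no match' END AS LikeTwoSpaces; -- 'no match'
```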

With non-Unicode data, you can still get Unicode LIKE behaviour, at the cost of some implicit conversions (note the LIKE pattern in the second query has a leading 'N', denoting a Unicode string literal):

DECLARE @T AS table 
(
    pk integer IDENTITY PRIMARY KEY,
    v varchar(10) NOT NULL,
    UNIQUE (v, pk)
);

INSERT @T (v) 
VALUES 
    ('1234'), 
    ('1234' + SPACE(1)),
    ('1234' + SPACE(2)),
    ('1234' + SPACE(3));

SELECT REPLACE(T.v, SPACE(1), '@')
FROM @T AS T 
WHERE T.v LIKE '1234 ';

SELECT REPLACE(T.v, SPACE(1), '@')
FROM @T AS T 
WHERE T.v LIKE N'1234 ';

Results:

[Image: results of the two queries]

The code with the implicit conversion may still be able to use an index, through the magic of dynamic seek generation:

[Image: execution plan showing the dynamic seek]


Regarding the following portion of the Question:

What if one needs to query to find '1234 ' but not '1234'

Using a binary collation (i.e. one ending in _BIN2) does not help here, because trailing spaces are still padded for `=` comparisons under binary collations. Instead, you can additionally compare the DATALENGTH of the two values:

SET NOCOUNT ON;
DECLARE @T TABLE (Col1 VARCHAR(10));
INSERT INTO @T (Col1) VALUES ('12345  ');
INSERT INTO @T (Col1) VALUES ('12345');

SELECT DATALENGTH(Col1) AS [Col1Bytes], '~' + Col1 + '~' AS [Col1] FROM @T;

SELECT DATALENGTH(Col1) AS [Col1Bytes], '~' + Col1 + '~' AS [Col1] FROM @T
WHERE Col1 = '12345  '
AND   DATALENGTH(Col1) = DATALENGTH('12345  ');

Returns:

Col1Bytes    Col1
7            ~12345  ~
5            ~12345~

Col1Bytes    Col1
7            ~12345  ~
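If the search value arrives as a variable or parameter, the same check can compare against it directly instead of repeating the literal. This is a minimal sketch assuming a hypothetical @search variable declared with the same type as the column. DATALENGTH counts bytes rather than characters, so both sides should share a data type (for example, DATALENGTH(N'a') is 2 while DATALENGTH('a') is 1).

```sql
SET NOCOUNT ON;
DECLARE @T TABLE (Col1 varchar(10));
INSERT INTO @T (Col1) VALUES ('12345  '), ('12345');

-- Hypothetical search value, declared with the same type as Col1 so that
-- DATALENGTH measures both sides in the same units (bytes).
DECLARE @search varchar(10) = '12345  ';

SELECT DATALENGTH(Col1) AS [Col1Bytes], '~' + Col1 + '~' AS [Col1]
FROM @T
WHERE Col1 = @search                          -- padded comparison: true for both rows
  AND DATALENGTH(Col1) = DATALENGTH(@search); -- exact byte count: only '12345  ' survives
```

This should return the single row with 7 bytes, just like the literal version.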
