how to select rows that contains non-english characters in sql server 2005(it should filter only non-english chars, not special characters)

T-SQL's string-handling capability is pretty rudimentary.

If the "non-English" fields are distinguished by their use of Unicode UTF-16, you can try something like

SELECT * FROM MyTable WHERE MyField = Cast(MyField AS VARCHAR)

to pull only rows that are expressible in UTF-8.

The only way I know how to test whether a field is drawn from an arbitrary set of characters is with a user-defined function, like this:

CREATE FUNCTION IsAllowed (@input VARCHAR(MAX)) RETURNS BIT
-- Returns 1 if string is allowed, 0 otherwise.
-- Usages: SELECT dbo.IsAllowed('Hello'); -- returns 1
--         SELECT dbo.IsAllowed('Hello, world!'); -- returns 0
-- Note CHARINDEX is not case sensitive so @allowables doesn't need both.
--      VARCHAR(MAX) is different under SQL Server 2005 than 2008+
---     and use of defined VARCHAR size might be necessary.
AS
BEGIN
  DECLARE @allowables char(26) = 'abcdefghijklmnopqrstuvwxyz';
  DECLARE @allowed int = 0; 
  DECLARE @index int = 1;
  WHILE @index <= LEN(@input)
    BEGIN
    IF CHARINDEX(SUBSTRING(@input,@index,1),@allowables)=0
      BEGIN
      SET @allowed = 0;
      BREAK;
      END
    ELSE
      BEGIN
      SET @allowed = 1;
      SET @index = @index+1;
      END
    END
  RETURN @allowed
END

User-defined functions can be applied to columns in SELECT, like this:

SELECT * FROM MyTable WHERE dbo.IsAllowed(MyField) = 1

Note the schema name (dbo in this case) is not optional with user-defined functions.

If a T-SQL user-defined function is inadequate, you can also use a CLR Function. Then you could apply a regexp or whatever to a column. Because they break portability and pose a security risk, many sysadmins don't allow CLR functions. (This includes Microsoft's SQL Azure product.)


If you have all the allowed special characters, following select statement should select all columns with only English and the allowed special characters:

select column_name from table_name where column_name like '%[^a-z, .-™]%';

You can add all the allowed special characters inside the square brackets.


Got a great answer for my question.

     select ID, LATITUDE, LONGITUDE, REFERENCE, LOCATION, VALIDATE,
     patindex('%[^ !-~()"]%' COLLATE Latin1_General_BIN,LOCATION) as [Position],
     substring(LOCATION,patindex('%[^ !-~()"]%' COLLATE Latin1_General_BIN,LOCATION),1) as [InvalidCharacter],
     ascii(substring(LOCATION,patindex('%[^ !-~()"]%' COLLATE Latin1_General_BIN,LOCATION),1)) as [ASCIICode]from  dbo.RADAR_SEARCH where patindex('%[^ !-~()"]%' COLLATE Latin1_General_BIN,LOCATION) >0

EDIT1: Explanation for above answer

Above query filters only non-English chars in Location column in which special characters are not included.

NOTE: Tested only in MS-SQL.

Tags:

Sql Server