Can SQL Server create collisions in system generated constraint names?

Can SQL Server create collisions in system generated constraint names?

This depends on the type of constraint and version of SQL Server.

CREATE TABLE T1
(
A INT PRIMARY KEY CHECK (A > 0),
B INT DEFAULT -1 REFERENCES T1,
C INT UNIQUE,
CHECK (C > A)
)

SELECT name, 
       object_id, 
       CAST(object_id AS binary(4)) as object_id_hex,
       CAST(CASE WHEN object_id >= 16000057  THEN object_id -16000057 ELSE object_id +2131483591 END AS BINARY(4)) AS object_id_offset_hex
FROM sys.objects
WHERE parent_object_id = OBJECT_ID('T1')
ORDER BY name;

drop table T1

Example Results 2008

+--------------------------+-----------+---------------+----------------------+
|           name           | object_id | object_id_hex | object_id_offset_hex |
+--------------------------+-----------+---------------+----------------------+
| CK__T1__1D498357         | 491357015 | 0x1D498357    | 0x1C555F1E           |
| CK__T1__A__1A6D16AC      | 443356844 | 0x1A6D16AC    | 0x1978F273           |
| DF__T1__B__1B613AE5      | 459356901 | 0x1B613AE5    | 0x1A6D16AC           |
| FK__T1__B__1C555F1E      | 475356958 | 0x1C555F1E    | 0x1B613AE5           |
| PK__T1__3BD019AE15A8618F | 379356616 | 0x169C85C8    | 0x15A8618F           |
| UQ__T1__3BD019A91884CE3A | 427356787 | 0x1978F273    | 0x1884CE3A           |
+--------------------------+-----------+---------------+----------------------+

Example Results 2017

+--------------------------+------------+---------------+----------------------+
|           name           | object_id  | object_id_hex | object_id_offset_hex |
+--------------------------+------------+---------------+----------------------+
| CK__T1__59FA5E80         | 1509580416 | 0x59FA5E80    | 0x59063A47           |
| CK__T1__A__571DF1D5      | 1461580245 | 0x571DF1D5    | 0x5629CD9C           |
| DF__T1__B__5812160E      | 1477580302 | 0x5812160E    | 0x571DF1D5           |
| FK__T1__B__59063A47      | 1493580359 | 0x59063A47    | 0x5812160E           |
| PK__T1__3BD019AE0A4A6932 | 1429580131 | 0x5535A963    | 0x5441852A           |
| UQ__T1__3BD019A981F522E0 | 1445580188 | 0x5629CD9C    | 0x5535A963           |
+--------------------------+------------+---------------+----------------------+

For default constraints, check constraints and foreign key constraints the last 4 bytes of the auto generated name are a hexadecimal version of the objectid of the constraint. As objectid are guaranteed unique the name must also be unique. In Sybase too these use tabname_colname_objectid

For unique constraints and primary key constraints Sybase uses

tabname_colname_tabindid, where tabindid is a string concatenation of the table ID and index ID

This too would guarantee uniqueness.

SQL Server doesn't use this scheme.

In both SQL Server 2008 and 2017 it uses an 8 byte string at the end of the system generated name however the algorithm has changed as to how the last 4 bytes of that are generated.

In 2008 the last 4 bytes represent a signed integer counter that is offset from the object_id by -16000057 with any negative value wrapping around to max signed int. (The significance of 16000057 is that this is the increment applied between successively created object_id). This still guarantees uniqueness.

On 2012 upwards I don't see any pattern at all between the object_id of the constraint and the integer obtained by treating the last 8 characters of the name as the hexadecimal representation of a signed int.

The function names in the call stack in 2017 shows that it now creates a GUID as part of the name generation process (On 2008 I see no mention of MDConstraintNameGenerator). I guess this is to provide some source of randomness. Clearly it isn't using the whole 16 bytes from the GUID in that 4 bytes that changes between constraints however.

enter link description here

I presume the new algorithm was done for some efficiency reason at the expense of some increased possibility of collisions in extreme cases such as yours.

This is quite a pathological case as it requires the table name prefix and column name of the PK (insofar as this affects the 8 characters preceding the final 8) to be identical for tens of thousands of tables before it becomes probable but can be reproduced quite easily with the below.

CREATE OR ALTER PROC #P
AS
    SET NOCOUNT ON;

    DECLARE @I INT = 0;


    WHILE 1 = 1
      BEGIN
          EXEC ('CREATE TABLE abcdefghijklmnopqrstuvwxyz' + @I + '(C INT PRIMARY KEY)');
          SET @I +=1;
      END 

GO

EXEC #P

An example run on SQL Server 2017 against a newly created database failed in just over a minute (after 50,931 tables had been created)

Msg 2714, Level 16, State 30, Line 15 There is already an object named 'PK__abcdefgh__3BD019A8175067CE' in the database. Msg 1750, Level 16, State 1, Line 15 Could not create constraint or index. See previous errors.


Assuming I have 100 million tables, I calculate less than a 1-in-1-trillion chance of a collision

Remember this is the "birthday problem". You're not trying to generate a collision for a single given hash, but rather measuring the probability that none of the many pairs of values will collide.

So with N tables, there are N*(N-1)/2 pairs, so here about 1016 pairs. If the probability of a collision is 2-64, the probability of a single pair not colliding is 1-2-64, but with so many pairs, the probability of having no collisions here is about (1-2-64)1016, or more like 1/10,000. See eg https://preshing.com/20110504/hash-collision-probabilities/

And if it's only a 32bit hash the probablity of a collision crosses 1/2 at only 77k values.