Is it legal for SQL Server to fill PERSISTED columns with data that does not match the definition?

This is certainly a bug. The fact that the col1 values happened to be the result of an expression involving random numbers clearly doesn't change what the the correct value for col2 is supposed to be. DBCC CHECKDB returns an error if this is run against a permanent table.

create table test (
    Col1 INT,
    Contains2 AS CASE WHEN 2 IN (Col1) THEN 1 ELSE 0 END PERSISTED);

INSERT INTO test (Col1) VALUES
    (ABS(CHECKSUM(NEWID()) % 5)),
    (ABS(CHECKSUM(NEWID()) % 5)),
    (ABS(CHECKSUM(NEWID()) % 5)),
    (ABS(CHECKSUM(NEWID()) % 5)),
    (ABS(CHECKSUM(NEWID()) % 5));

DBCC CHECKDB

Gives (for my test run which had one "impossible" row)

Msg 2537, Level 16, State 106, Line 17
Table error: object ID 437576597, index ID 0, partition ID 72057594041008128, alloc unit ID 72057594046251008 (type In-row data), page (1:121), row 0. The record check (valid computed column) failed. The values are 2 and 0.
DBCC results for 'test'.
There are 5 rows in 1 pages for object "test".
CHECKDB found 0 allocation errors and 1 consistency errors in table 'test' (object ID 437576597).

It does also report that

repair_allow_data_loss is the minimum repair level for the errors found by DBCC CHECKDB

And if taken up on the repair option unceremoniously deletes the whole row as it has no way of telling which column is corrupted.

Attaching a debugger shows that the NEWID() is being evaluated twice per inserted row. Once before the CASE expression is evaluated and once inside it.

enter image description here

A possible workaround might be to use

INSERT INTO @test
            (Col1)
SELECT ( ABS(CHECKSUM(NEWID()) % 5) )
FROM   (VALUES (1),(1),(1),(1),(1)) V(X); 

Which for one reason or another avoids the issue and only evaluates the expression once per row.


Per the comment conversation, the consensus seems to be that the answer to the OP's question is that this does constitute a bug (i.e. should be illegal).

The OP references Vladimir Baranov's analysis on StackOverflow, where they state:

"First time for Col1, second time for the CASE statement of the persisted column.

Optimiser doesn't know, or doesn't care in this case that NEWID is a non-deterministic function and calls it twice."

Put another way, it should be expected that [the NEWID() within] col1 has the same value you just inserted as when you make the calculation.

This would be synonymous to what's happening with the bug, where NEWID is created for Col1 and then created again for the persisted column:

INSERT INTO @Test (Col1, Contains2) VALUES
(NEWID(), CASE WHEN (NEWID()) LIKE '%2%' THEN 1 ELSE 0 END)

In my testing, other non-deterministic functions like RAND and time values did not result in the same bug.

Per Martin, this has been raised to Microsoft (https://connect.microsoft.com/SQLServer/Feedback/Details/2751288) where there are comments back to this page and the StackOverflow analysis (below).