Unexpected results with random numbers and join types

This might give some insight until one of the smarter folks on the site chimes in.

I put the random results into a table variable, and I consistently get four rows regardless of the join type.

/* Works as expected -- always four rows */

DECLARE @Rando table
(
    RandomNumber int
);

INSERT INTO
    @Rando
(
    RandomNumber
)
-- This generates 4 random numbers from 1 to 4, endpoints inclusive
SELECT
    1 + ABS(CHECKSUM(NEWID())) % (4) AS RandomNumber
FROM
    sys.databases
WHERE
    database_id <= 4;

SELECT
    *
FROM
    @Rando AS R;

SELECT
    rando.RandomNumber
,   d.database_id
FROM 
    @Rando AS rando
    LEFT JOIN 
        sys.databases d 
        ON rando.RandomNumber = d.database_id
ORDER BY 1,2;


/* Returned a varying number of rows in the original query; always four with the table variable */

SELECT rando.RandomNumber, d.database_id
FROM 
    @Rando AS rando
    INNER JOIN 
        sys.databases d 
        ON rando.RandomNumber = d.database_id
ORDER BY 1,2;

/* Also returned a varying number of rows in the original query; always four with the table variable */

WITH rando AS 
(
    SELECT * FROM @Rando AS rando
)
SELECT r.RandomNumber, d.database_id
FROM 
    rando AS r
    INNER JOIN 
        sys.databases d 
        ON r.RandomNumber = d.database_id
ORDER BY 1,2;

If I compare the query plans for your second query and the table variable variation, there's a definite difference between the two. The red X is a No Join Predicate warning, which seems really odd to my caveman developer brain.

[Image: execution plan with the No Join Predicate warning]

If I replace the random part of the query with the constant expression 1 % (4), the plan looks better, but the Compute Scalar is gone, which led me to look closer.

[Image: execution plan for the constant expression]

It's computing the expression for the random number after the join. Whether that's expected, I still leave to the internal wizards on the site but at least that's why you're getting variable results in your join.
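One way to watch the variation is to run the inlined form several times and log how many rows come back each time. This is only a sketch: I'm assuming the original query had the NEWID expression inline in a derived table (the same expression as in my table variable insert), and @Counts, RunNumber and RowsReturned are names I made up for the demo. Wrapping the join in COUNT(*) may also give you a different plan than the plain SELECT, so check the actual plan on your own instance.

```sql
-- Hypothetical demo: log the row count of the inlined join over ten runs.
DECLARE @Counts table
(
    RunNumber int
,   RowsReturned int
);

DECLARE @i int = 1;

WHILE @i <= 10
BEGIN
    INSERT INTO
        @Counts
    (
        RunNumber
    ,   RowsReturned
    )
    SELECT
        @i
    ,   COUNT(*)
    FROM
        (
            -- Same random expression as above, but NOT materialized first
            SELECT
                1 + ABS(CHECKSUM(NEWID())) % (4) AS RandomNumber
            FROM
                sys.databases
            WHERE
                database_id <= 4
        ) AS rando
        INNER JOIN
            sys.databases AS d
            ON rando.RandomNumber = d.database_id;

    SET @i += 1;
END;

-- If the expression is evaluated after the join, RowsReturned can differ per run
SELECT
    C.RunNumber
,   C.RowsReturned
FROM
    @Counts AS C
ORDER BY
    C.RunNumber;
```

If the expression really is evaluated after the join, the counts should bounce around instead of sitting at four.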

2014

For those playing along at home, the above query plans were generated from a 2008 R2 instance. The 2014 plans look different but the Compute Scalar operation remains after the join.

This is the query plan for a 2014 instance using the constant expression.

[Image: SQL Server 2014 execution plan, constant expression]

This is the query plan for a 2014 instance using the NEWID expression.

[Image: SQL Server 2014 execution plan, NEWID expression]

This is apparently by design, according to a Connect issue. Thanks to @paulWhite for knowing that existed.


Wrapping the expression in an additional SELECT pushes the Compute Scalar evaluation deeper into the plan and gives the join its predicate. The Compute Scalar at the top then just references the earlier one.

SELECT rando.RandomNumber, d.database_id
FROM 
  (SELECT ( SELECT 1 + ABS(CHECKSUM(NEWID())) % (4)) AS RandomNumber 
   FROM sys.databases WHERE database_id <= 4) AS rando
INNER JOIN sys.databases d ON rando.RandomNumber = d.database_id

|--Compute Scalar(DEFINE:([Expr1071]=[Expr1070]))

|--Compute Scalar(DEFINE:([Expr1070]=(1)+abs(checksum(newid()))%(4)))
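To sanity check the nested SELECT version, the same kind of repeated run can be used. Again, a sketch under assumptions: @Check, RunNumber and RowsReturned are hypothetical names, and the COUNT(*) wrapper could change the plan, so I wouldn't treat a clean result as a guarantee. The optimizer is free to place the Compute Scalar elsewhere on another version or with different statistics.

```sql
-- Hypothetical demo: log the row count of the nested-SELECT form over ten runs.
DECLARE @Check table
(
    RunNumber int
,   RowsReturned int
);

DECLARE @run int = 1;

WHILE @run <= 10
BEGIN
    INSERT INTO
        @Check
    (
        RunNumber
    ,   RowsReturned
    )
    SELECT
        @run
    ,   COUNT(*)
    FROM
        (
            -- The extra scalar subquery is the only change from the inlined form
            SELECT
                (SELECT 1 + ABS(CHECKSUM(NEWID())) % (4)) AS RandomNumber
            FROM
                sys.databases
            WHERE
                database_id <= 4
        ) AS rando
        INNER JOIN
            sys.databases AS d
            ON rando.RandomNumber = d.database_id;

    SET @run += 1;
END;

-- List any run that did not return exactly four rows
SELECT
    C.RunNumber
,   C.RowsReturned
FROM
    @Check AS C
WHERE
    C.RowsReturned <> 4
ORDER BY
    C.RunNumber;
```

If the random number is computed before the join as described, the final SELECT should come back empty. Any rows it does return are runs where the expression was evaluated late again.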

I'm still digging into why the evaluation happens so late, and I'm currently reading this post by Paul White (https://sql.kiwi/2012/09/compute-scalars-expressions-and-execution-plan-performance.html). Perhaps it has something to do with the fact that NEWID is not deterministic?