Query doesn't respond when adding two columns

Summary

The main problems are:

  • The optimizer's plan selection assumes a uniform distribution of values.
  • A lack of suitable indexes means:
    • Scanning the table is the only option.
    • The join is a naive nested loops join, rather than an index nested loops join. In a naive join, the join predicates are evaluated at the join rather than being pushed down the inner side of the join.

Details

The two plans are fundamentally similar, though their performance characteristics are very different:

Plan with the extra columns

Taking first the plan with the extra columns (the one that does not complete in a reasonable time):

Slow plan

The interesting features are:

  1. The Top at node 0 limits the rows returned to 100. It also sets a row goal for the optimizer, so everything below it in the plan is chosen to return the first 100 rows quickly.
  2. The Scan at node 4 finds rows from the table where the Start_Time is not null, State is 3 or 4, and Operation_Type is one of the listed values. The table is fully scanned once, with each row being tested against the predicates mentioned. Only rows that pass all the tests flow on to the Sort. The optimizer estimates that 38,283 rows will qualify.
  3. The Sort at node 3 consumes all the rows from the Scan at node 4, and sorts them in order of Start_Time DESC. This is the final presentation order requested by the query.
  4. The optimizer estimates that 93 rows (actually 93.2791) will have to be read from the Sort in order for the whole plan to return 100 rows (accounting for the expected effect of the join).
  5. The Nested Loops join at node 2 is expected to execute its inner input (the lower branch) 94 times (actually 94.2791). The extra row is required by the stop parallelism exchange at node 1 for technical reasons.
  6. The Scan at node 5 fully scans the table on each iteration. It finds rows where Start_Time is not null and State is 3 or 4. This is estimated to produce 400,875 rows on each iteration. Over 94.2791 iterations, the total number of rows is almost 38 million.
  7. The Nested Loops join at node 2 also applies the join predicates. It checks that Operation_Type matches, that the Start_Time from node 4 is less than the Start_Time from node 5, that the Start_Time from node 5 is less than the Finish_Time from node 4, and that the two Id values do not match.
  8. The Gather Streams (stop parallelism exchange) at node 1 merges the ordered streams from each thread until 100 rows have been produced. The order-preserving nature of the merge across multiple streams is what requires the extra row mentioned in step 5.

The great inefficiency is clearly at steps 6 and 7 above. Fully scanning the table at node 5 on each iteration is even remotely reasonable only if it really does happen just 94 times, as the optimizer predicts. The roughly 38 million row comparisons at node 2 are also a significant cost.

Crucially, the 93/94-row row goal estimate is also quite likely to be wrong, since it depends on the distribution of values. The optimizer assumes a uniform distribution in the absence of more detailed information. In simple terms: if 1% of the rows in the table are expected to qualify, the optimizer reasons that it needs to read 100 rows to find 1 matching row.

If you ran this query to completion (which might take a very long time), you would most likely find that many more than 93/94 rows had to be read from the Sort in order to finally produce 100 rows. In the worst case, the 100th row would be found using the last row from the Sort. Assuming the optimizer's estimate at node 4 is correct, this means running the Scan at node 5 38,284 times, for a total of something like 15 billion rows. It could be more if the Scan estimates are also off.
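The worst-case arithmetic can be checked directly, using the plan's own estimates:

```sql
-- 38,284 executions of the inner-side scan x 400,875 rows per scan
SELECT CONVERT(bigint, 38284) * 400875 AS worst_case_rows;
-- Returns 15347098500 (about 15.3 billion)
```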

This execution plan also includes a missing index warning:

/*
The Query Processor estimates that implementing the following index
could improve the query cost by 72.7096%.

WARNING: This is only an estimate, and the Query Processor is making
this recommendation based solely upon analysis of this specific query.
It has not considered the resulting index size, or its workload-wide
impact, including its impact on INSERT, UPDATE, DELETE performance.
These factors should be taken into account before creating this index.
*/

CREATE NONCLUSTERED INDEX [<Name of Missing Index, sysname,>]
ON [dbo].[Batch_Tasks_Queue] ([Operation_Type],[State],[Start_Time])
INCLUDE ([Id],[Parameters])

The optimizer is suggesting that adding this index to the table would improve performance.
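Filled in with a hypothetical index name, the suggestion above becomes runnable as:

```sql
-- IX_BTQ_Missing is a hypothetical name; the columns are as suggested
CREATE NONCLUSTERED INDEX IX_BTQ_Missing
ON dbo.Batch_Tasks_Queue (Operation_Type, [State], Start_Time)
INCLUDE (Id, [Parameters]);
```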

Plan without the extra columns

Less slow plan

This is essentially the same plan as the previous one, with the addition of an Index Spool at node 6 and a Filter at node 5. The important differences are:

  1. The Index Spool at node 6 is an Eager Spool. It eagerly consumes the result of the scan below it, and builds a temporary index keyed on Operation_Type and Start_Time, with Id as a non-key column.
  2. The Nested Loops Join at node 2 is now an index join. No join predicates are evaluated here; instead, the per-iteration current values of Operation_Type, Start_Time, Finish_Time, and Id from the scan at node 4 are passed to the inner-side branch as outer references.
  3. The Scan at node 7 is performed only once.
  4. The Index Spool at node 6 seeks rows from the temporary index where Operation_Type matches the current outer reference value, and the Start_Time is in the range defined by the Start_Time and Finish_Time outer references.
  5. The Filter at node 5 tests Id values from the Index Spool for inequality against the current outer reference value of Id.

The key improvements are:

  • The inner side scan is performed only once
  • A temporary index on (Operation_Type, Start_Time) with Id as an included column allows an index nested loops join. The index is used to seek matching rows on each iteration rather than scanning the whole table each time.
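To make the parallel concrete: the temporary index built by the Eager Spool corresponds roughly to the following permanent index (hypothetical name), except that the spool's version is rebuilt on every execution:

```sql
-- Hypothetical name; mirrors the spool's keys and non-key column
CREATE NONCLUSTERED INDEX IX_Spool_Equivalent
ON dbo.Batch_Tasks_Queue (Operation_Type, Start_Time)
INCLUDE (Id);
```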

As before, the optimizer includes a warning about a missing index:

/*
The Query Processor estimates that implementing the following index
could improve the query cost by 24.1475%.

WARNING: This is only an estimate, and the Query Processor is making
this recommendation based solely upon analysis of this specific query.
It has not considered the resulting index size, or its workload-wide
impact, including its impact on INSERT, UPDATE, DELETE performance.
These factors should be taken into account before creating this index.
*/

CREATE NONCLUSTERED INDEX [<Name of Missing Index, sysname,>]
ON [dbo].[Batch_Tasks_Queue] ([State],[Start_Time])
INCLUDE ([Id],[Operation_Type])
GO

Conclusion

The plan without the extra columns is faster because the optimizer chose to create a temporary index for you.

The plan with the extra columns would make the temporary index more expensive to build. The [Parameters] column is nvarchar(2000), which would add up to 4000 bytes to each row of the index. The additional cost is enough to convince the optimizer that building the temporary index on each execution would not pay for itself.
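The 4,000-byte figure follows from nvarchar storing two bytes per character, which is easy to verify:

```sql
-- DATALENGTH returns bytes; a full nvarchar(2000) value is 4,000 bytes
SELECT DATALENGTH(REPLICATE(N'x', 2000)) AS bytes_used;
-- Returns 4000
```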

The optimizer warns in both cases that a permanent index would be a better solution. The ideal composition of the index depends on your wider workload. For this particular query, the suggested indexes are a reasonable starting point, but you should understand the benefits and costs involved.

Recommendation

A wide range of possible indexes would be beneficial for this query. The important takeaway is that some sort of nonclustered index is needed. From the information provided, a reasonable index in my opinion would be:

CREATE NONCLUSTERED INDEX i1
ON dbo.Batch_Tasks_Queue (Start_Time DESC)
INCLUDE (Operation_Type, [State], Finish_Time);

I would also be tempted to organize the query a little better, and delay looking up the wide [Parameters] columns in the clustered index until after the top 100 rows have been found (using Id as the key):

SELECT TOP (100)
    BTQ1.id,
    BTQ2.id,
    BTQ3.[Parameters],
    BTQ4.[Parameters]
FROM dbo.Batch_Tasks_Queue AS BTQ1
JOIN dbo.Batch_Tasks_Queue AS BTQ2 WITH (FORCESEEK)
    ON BTQ2.Operation_Type = BTQ1.Operation_Type
    AND BTQ2.Start_Time > BTQ1.Start_Time
    AND BTQ2.Start_Time < BTQ1.Finish_Time
    AND BTQ2.id != BTQ1.id
    -- Look up the [Parameters] values
JOIN dbo.Batch_Tasks_Queue AS BTQ3
    ON BTQ3.Id = BTQ1.Id
JOIN dbo.Batch_Tasks_Queue AS BTQ4
    ON BTQ4.Id = BTQ2.Id
WHERE
    BTQ1.[State] IN (3, 4)
    AND BTQ2.[State] IN (3, 4)
    AND BTQ1.Operation_Type NOT IN (23, 24, 25, 26, 27, 28, 30)
    AND BTQ2.Operation_Type NOT IN (23, 24, 25, 26, 27, 28, 30)
    -- These predicates are not strictly needed
    AND BTQ1.Start_Time IS NOT NULL
    AND BTQ2.Start_Time IS NOT NULL
ORDER BY
    BTQ1.Start_Time DESC;

Where the [Parameters] columns are not needed, the query can be simplified to:

SELECT TOP (100)
    BTQ1.id,
    BTQ2.id
FROM dbo.Batch_Tasks_Queue AS BTQ1
JOIN dbo.Batch_Tasks_Queue AS BTQ2 WITH (FORCESEEK)
    ON BTQ2.Operation_Type = BTQ1.Operation_Type
    AND BTQ2.Start_Time > BTQ1.Start_Time
    AND BTQ2.Start_Time < BTQ1.Finish_Time
    AND BTQ2.id != BTQ1.id
WHERE
    BTQ1.[State] IN (3, 4)
    AND BTQ2.[State] IN (3, 4)
    AND BTQ1.Operation_Type NOT IN (23, 24, 25, 26, 27, 28, 30)
    AND BTQ2.Operation_Type NOT IN (23, 24, 25, 26, 27, 28, 30)
    AND BTQ1.Start_Time IS NOT NULL
    AND BTQ2.Start_Time IS NOT NULL
ORDER BY
    BTQ1.Start_Time DESC;

The FORCESEEK hint is there to help ensure the optimizer chooses an index nested loops plan. Without it, there is a cost-based temptation for the optimizer to select a hash or many-to-many merge join instead, which tends not to work well with this type of query in practice: both end up with large residuals (many items per bucket for the hash join, and many rewinds for the merge join).
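If you prefer a query-level hint to a table hint, one alternative (a sketch, and less targeted than FORCESEEK) is to disallow hash and merge joins for the whole statement by appending a query hint:

```sql
-- Appended to the end of either SELECT statement above,
-- after the ORDER BY clause:
OPTION (LOOP JOIN);
```

Note this applies to every join in the statement, including the singleton Id lookups, so FORCESEEK on the one problematic table remains the more precise tool.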

Alternative

If the query (including its specific values) were particularly critical for read performance, I would consider two filtered indexes instead:

CREATE NONCLUSTERED INDEX i1
ON dbo.Batch_Tasks_Queue (Start_Time DESC)
INCLUDE (Operation_Type, [State], Finish_Time)
WHERE 
    Start_Time IS NOT NULL
    AND [State] IN (3, 4)
    AND Operation_Type <> 23
    AND Operation_Type <> 24
    AND Operation_Type <> 25
    AND Operation_Type <> 26
    AND Operation_Type <> 27
    AND Operation_Type <> 28
    AND Operation_Type <> 30;

CREATE NONCLUSTERED INDEX i2
ON dbo.Batch_Tasks_Queue (Operation_Type, [State], Start_Time)
WHERE 
    Start_Time IS NOT NULL
    AND [State] IN (3, 4)
    AND Operation_Type <> 23
    AND Operation_Type <> 24
    AND Operation_Type <> 25
    AND Operation_Type <> 26
    AND Operation_Type <> 27
    AND Operation_Type <> 28
    AND Operation_Type <> 30;

For the query that does not need the [Parameters] column, the estimated plan using the filtered indexes is:

Simple filtered index plan

The index scan automatically returns all qualifying rows without evaluating any additional predicates. For each iteration of the index nested loops join, the index seek performs two seeking operations:

  1. A seek prefix match on Operation_Type and State = 3, then seeking the range of Start_Time values, residual predicate on the Id inequality.
  2. A seek prefix match on Operation_Type and State = 4, then seeking the range of Start_Time values, residual predicate on the Id inequality.

Where the [Parameters] column is needed, the query plan simply adds at most 100 singleton lookups for each instance of the table:

Filtered index plan with extra columns

As a final note, you should consider using the built-in standard integer types instead of numeric where applicable.
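For example, if [State] is currently declared as numeric, a change along these lines (a sketch only; verify existing data, nullability, and any dependent objects first) would shrink each row and cheapen comparisons:

```sql
-- Sketch: assumes no constraints or indexes depend on the column;
-- adjust NULL/NOT NULL to match the existing definition
ALTER TABLE dbo.Batch_Tasks_Queue
    ALTER COLUMN [State] integer NOT NULL;
```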