Difference in execution plans on UAT and PROD servers

The potential size of the buffer pool affects plan selection by the query optimizer in a number of ways. As far as I know, hyper-threading does not affect plan choice (though the number of potentially available schedulers certainly can).

Workspace Memory

For plans that contain memory-consuming iterators like sorts and hashes, the size of the buffer pool (among other things) determines the maximum amount of memory grant that might be available to the query at runtime.

In SQL Server 2012 (all versions) this number is reported on the root node of a query plan, in the Optimizer Hardware Dependencies section, shown as Estimated Available Memory Grant. Versions prior to 2012 do not report this number in showplan output.

The estimated available memory grant is an input to the cost model used by the query optimizer. As a result, a plan alternative that requires a large sorting or hashing operation is more likely to be chosen on a machine with a large buffer pool setting than on a machine with a lower setting. For installations with a very large amount of memory, the cost model can go too far with this sort of thinking - choosing plans with very large sorts or hashes where an alternative strategy would be preferable (KB2413549 - Using large amounts of memory can result in an inefficient plan in SQL Server - TF2335).

Workspace memory grant is not a factor in your case, but it is something worth knowing about.
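
It is not needed for this particular problem, but if you ever want to watch workspace memory grants as they are acquired at runtime, the memory grant DMV exposes them. A minimal sketch (standard DMV names; adjust the ordering and filtering to taste):

    -- Sketch: observe workspace memory grants (sorts/hashes) as queries run.
    -- sys.dm_exec_query_memory_grants lists queries that hold, or are waiting for, a grant.
    SELECT
        mg.session_id,
        mg.requested_memory_kb,
        mg.granted_memory_kb,
        mg.used_memory_kb,
        mg.max_used_memory_kb,
        t.text AS query_text
    FROM sys.dm_exec_query_memory_grants AS mg
    CROSS APPLY sys.dm_exec_sql_text(mg.sql_handle) AS t
    ORDER BY mg.requested_memory_kb DESC;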

Data Access

The potential size of the buffer pool also affects the optimizer's cost model for data access. One of the assumptions made in the model is that every query starts with a cold cache - so the first access to a page is assumed to incur a physical I/O. The model does attempt to account for the chance that repeated access will come from cache, a factor that depends on the potential size of the buffer pool among other things.

The Clustered Index Scans in the query plans shown in the question are one example of repeated access; the scans are rewound (repeated, without a change of correlated parameter) for each iteration of the nested loops semi join. The outer input to the semi join estimates 28.7874 rows, and the query plan properties for these scans shows estimated rewinds at 27.7874 as a result.

Again, in SQL Server 2012 only, the root iterator of the plan shows the number of Estimated Pages Cached in the Optimizer Hardware Dependencies section. This number reports one of the inputs to the costing algorithm that looks to account for the chance of repeated page access coming from cache.
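
If you want to see those values without opening each plan by hand, they can be read straight out of the showplan XML. A rough sketch (SQL Server 2012 or later; scanning the whole plan cache like this can be expensive on a busy instance):

    -- Sketch: extract the Optimizer Hardware Dependencies attributes from cached plans.
    WITH XMLNAMESPACES (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
    SELECT
        cp.usecounts,
        qp.query_plan.value('(//OptimizerHardwareDependentProperties/@EstimatedAvailableMemoryGrant)[1]', 'bigint') AS est_available_memory_grant,
        qp.query_plan.value('(//OptimizerHardwareDependentProperties/@EstimatedPagesCached)[1]', 'bigint') AS est_pages_cached,
        qp.query_plan.value('(//OptimizerHardwareDependentProperties/@EstimatedAvailableDegreeOfParallelism)[1]', 'int') AS est_available_dop
    FROM sys.dm_exec_cached_plans AS cp
    CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) AS qp
    WHERE qp.query_plan IS NOT NULL;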

The effect is that, compared with an installation with a smaller configured maximum buffer pool size, an installation with a larger one reduces the estimated cost of scans (or seeks) that read the same pages more than once by a greater amount.

In simple plans, the cost reduction on a rewound scan can be seen by comparing (estimated number of executions) * (estimated CPU + estimated I/O) with the estimated operator cost, which will be lower. The calculation is more complex in the example plans due to the effect of the semi join and union.
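
As an illustration only (the per-execution costs below are made-up numbers, not taken from the plans in the question), the comparison looks like this:

    -- Hypothetical per-execution costs, purely to illustrate the check described above.
    DECLARE
        @estimated_executions float = 28.7874,   -- matches the estimated executions mentioned above
        @estimated_cpu        float = 0.0001581, -- made-up per-execution CPU cost
        @estimated_io         float = 0.003125;  -- made-up per-execution I/O cost

    -- With no caching assumed, the repeated scans would cost roughly this:
    SELECT @estimated_executions * (@estimated_cpu + @estimated_io) AS cost_without_caching;

    -- The Estimated Operator Cost shown in the plan will be lower than this figure,
    -- because the model assumes some repeated page accesses are served from cache.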

Nevertheless, the plans in the question appear to show a case where the choice between repeating the scans and creating a temporary index is quite finely balanced. On the machine with a larger buffer pool, repeating the scans is costed slightly lower than creating the index. On the machine with a smaller buffer pool, the scan cost is reduced by a smaller amount, meaning the index spool plan looks slightly cheaper to the optimizer.

Plan Choices

The optimizer's cost model makes a number of assumptions, and contains a great number of detailed calculations. It is not always (or even usually) possible to follow all the details because not all the numbers we would need are exposed, and the algorithms can change between releases. In particular, the scaling formula applied to take account of the chance of encountering a cached page is not well known.

More to the point in this particular case, the optimizer's plan choices are based on incorrect numbers anyway. The estimated number of rows from the Clustered Index Seek is 28.7874, whereas 256 rows are encountered at runtime - almost an order of magnitude out. We cannot directly see the information the optimizer has about the expected distribution of values within those 28.7874 rows, but it is very likely to be horribly wrong as well.
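
When the estimate is that far off, the first thing worth checking is the statistics the optimizer is relying on. A generic sketch, with hypothetical object and statistics names standing in for the real ones:

    -- Hypothetical names; replace with the real table and statistics/index involved in the seek.
    -- Inspect the histogram the optimizer is using for its estimate:
    DBCC SHOW_STATISTICS ('dbo.SomeTable', 'IX_SomeTable_SomeColumn');

    -- Refresh the statistics if they are stale or the sample is unrepresentative:
    UPDATE STATISTICS dbo.SomeTable IX_SomeTable_SomeColumn WITH FULLSCAN;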

When estimates are this wrong, plan selection and runtime performance are essentially no better than chance. The plan with the index spool happens to perform better than repeating the scan, but it is quite wrong to think that increasing the size of the buffer pool was the cause of the anomaly.

Where the optimizer has correct information, the chances are much better that it will produce a decent execution plan. An instance with more memory will generally perform better on a workload than another instance with less memory, but there are no guarantees, especially when plan selection is based on incorrect data.

Both instances suggested a missing index in their own way. One reported an explicit missing index, and the other used an index spool with the same characteristics. If the index provides good performance and plan stability, that might be enough. My inclination would be to rewrite the query as well, but that's probably another story.
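
For reference, the explicit missing-index suggestion can also be read from the missing index DMVs rather than from the graphical plan; a standard sketch (the figures reset when the instance restarts):

    -- Sketch: missing-index suggestions recorded by the optimizer since the last restart.
    SELECT
        d.[statement] AS table_name,
        d.equality_columns,
        d.inequality_columns,
        d.included_columns,
        s.user_seeks,
        s.avg_user_impact
    FROM sys.dm_db_missing_index_details AS d
    JOIN sys.dm_db_missing_index_groups AS g
        ON g.index_handle = d.index_handle
    JOIN sys.dm_db_missing_index_group_stats AS s
        ON s.group_handle = g.index_group_handle
    ORDER BY s.avg_user_impact DESC;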


Paul White has explained, in an excellent and lucid manner, the reason behind SQL Server's behaviour when running on servers with more memory.

Also, a huge thanks to @swasheck for first spotting the issue.

I opened a case with Microsoft, and below is what was suggested.

The problem is resolved by using trace flag 2335 (-T2335) as a startup parameter.

KB2413549 - Using large amounts of memory can result in an inefficient plan in SQL Server - describes it in more detail.

This trace flag will cause SQL Server to generate a plan that is more conservative in terms of memory consumption when executing the query. It does not limit how much memory SQL Server can use. The memory configured for SQL Server will still be used by data cache, query execution and other consumers. Please ensure that you thoroughly test this option, before rolling it into a production environment.
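
For reference, a sketch of the ways the flag can be applied; the startup parameter is what was suggested in the support case, and the table name in the last statement is only a placeholder for syntax:

    -- 1) What was suggested: add -T2335 to the startup parameters in
    --    SQL Server Configuration Manager, then restart the instance.

    -- 2) Enable the flag globally without a restart (does not persist across restarts):
    DBCC TRACEON (2335, -1);
    DBCC TRACESTATUS (2335, -1);  -- confirm it is enabled globally

    -- 3) Scope it to a single statement with a hint (requires sysadmin unless used in a plan guide):
    SELECT COUNT(*)
    FROM dbo.SomeTable          -- placeholder table name
    OPTION (QUERYTRACEON 2335);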


Max memory settings and hyper-threading can both affect plan choice.

Additionally, I notice your "set" options are different in each environment:

StatementSetOptions on UAT:

ANSI_NULLS="true" 
ANSI_PADDING="true" 
ANSI_WARNINGS="true" 
ARITHABORT="true" 
CONCAT_NULL_YIELDS_NULL="true" 
NUMERIC_ROUNDABORT="false" 
QUOTED_IDENTIFIER="true" 

StatementSetOptions on Prod:

ANSI_NULLS="true" 
ANSI_PADDING="true" 
ANSI_WARNINGS="true" 
ARITHABORT="false" 
CONCAT_NULL_YIELDS_NULL="true"
NUMERIC_ROUNDABORT="false"
QUOTED_IDENTIFIER="true" 

SQL Server can generate different plans based on the SET options in effect. This frequently happens when plans are captured from different SSMS sessions, or from different executions by the application.

Make sure the developers are using consistent connection strings.
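
Two quick ways to confirm what a connection is actually using: DBCC USEROPTIONS for the current session, and sys.dm_exec_sessions for everything else (a sketch):

    -- SET options for the current session:
    DBCC USEROPTIONS;

    -- SET options for other connections, e.g. the application's:
    SELECT
        session_id,
        program_name,
        ansi_nulls,
        ansi_padding,
        ansi_warnings,
        arithabort,
        concat_null_yields_null,
        quoted_identifier
    FROM sys.dm_exec_sessions
    WHERE is_user_process = 1;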