Shouldn't Operator cost at least be as large as I/O or CPU cost that comprises it?

Row Goals

If a row goal gets set in the query, this can affect row estimates and costing.

You can confirm whether this is causing the problem by running the query with trace flag 4138 enabled (which removes the influence of the row goal).
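For example, the trace flag can be scoped to a single statement with the QUERYTRACEON hint (the table and predicate below are placeholders; QUERYTRACEON generally requires elevated permissions):

```sql
-- Hypothetical query: TOP introduces a row goal.
-- Trace flag 4138 disables row goal adjustments for this statement only,
-- so you can compare estimates and costs with and without it.
SELECT TOP (100) *
FROM dbo.SomeTable
WHERE SomeColumn = 'SomeValue'
OPTION (QUERYTRACEON 4138);
```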

Buffer Pool Size

The estimated cost for some I/O operations can be reduced if there's a larger buffer pool available (the server with reduced cost has 14 GB of RAM, vs 6 GB on the other machine).

You can check for the influence of this behavior by looking for "EstimatedPagesCached" in the plan XML. A higher value for this property could reduce the I/O cost of parts of the execution plan that potentially access the same data.
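If you'd rather not read the XML by hand, a sketch like the following pulls the property out of cached plans (the DMVs are standard; the property itself only appears on newer versions of SQL Server, so expect NULLs elsewhere):

```sql
-- Extract EstimatedPagesCached from the top cached plans by elapsed time.
WITH XMLNAMESPACES (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
SELECT TOP (10)
    qp.query_plan.value(
        '(//OptimizerHardwareDependentProperties/@EstimatedPagesCached)[1]',
        'bigint') AS EstimatedPagesCached,
    qs.total_elapsed_time
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) AS qp
ORDER BY qs.total_elapsed_time DESC;
```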

Available Schedulers

For a parallel query, the CPU cost of an operator can be reduced by a factor of up to "number of schedulers / 2". You can check what value this has by looking for "EstimatedAvailableDegreeOfParallelism" in the plan XML.
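To compare the two servers on this point, you can check how many schedulers each instance actually sees:

```sql
-- Count the schedulers visible to SQL Server on this instance;
-- the optimizer's estimated available DOP is derived from this
-- (together with MAXDOP and resource governor settings).
SELECT COUNT(*) AS visible_schedulers
FROM sys.dm_os_schedulers
WHERE status = 'VISIBLE ONLINE';
```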

I mention this because I noticed that the "slow query" ran on a server with 4 cores, while the faster one ran on a server with 1 core.

Costs Are Weird and Broken

Forrest talks about a bunch of different ways that costs can end up not making sense on his blog: Percentage Non Grata


Shouldn't Operator cost at least be as large as I/O or CPU cost that comprises it?

It depends.

It's a shame the other person deleted their post, because I had come up with similar ideas.

Row Goals

This is not what you are experiencing based on the screenshots, but it is a factor in the calculation of Operator cost. I/O and CPU costs do not scale: they show a per-execution cost regardless of whether a row goal is in effect. The Operator cost, however, does scale to reflect the row goal. This is one case where I/O and CPU do not exactly comprise the Operator cost; the estimated number of executions also has to be taken into account. How you read these figures also depends on whether you are looking at the inner or the outer input.

Source: Inside the Optimizer: Row Goals In Depth by Paul White - August 18, 2010 (archive)
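As a concrete illustration of the kind of query where this shows up (table names are placeholders):

```sql
-- TOP (10) sets a row goal: the optimizer costs the plan as if only
-- 10 rows are needed, scaling Operator cost on operators below it,
-- while their per-execution I/O and CPU costs remain unscaled.
SELECT TOP (10) o.OrderID, d.ProductID
FROM dbo.Orders AS o
JOIN dbo.OrderDetails AS d
    ON d.OrderID = o.OrderID;
```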

Buffer Pool Usage

This could be a factor that is affecting you.

The full cost of an operation should be the number of executes multiplied by the CPU cost, plus a more involved formula for the number of I/Os required. The formula for I/O represents the probability that an I/O will already be in memory after a number of pages have already been accessed. For large tables, it also models the chance that a previously accessed page may have already been evicted when it is needed again. The sub-tree cost represents the cost of the current operation plus all operations that feed into the current operation.
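To make that concrete, here is a back-of-envelope sketch using the cost constants commonly cited from community reverse engineering (including Joe Chang's work) — these numbers are not documented by Microsoft and are an assumption on my part, so treat the result as illustrative only:

```sql
-- Hypothetical scan: 1,000 pages, 50,000 rows, single execution.
-- Community-derived constants (undocumented, may vary by version):
--   first page I/O:            0.003125
--   each subsequent page I/O:  0.000740741
--   CPU: 0.0001581 base + 0.0000011 per additional row
SELECT
    0.003125 + 999 * 0.000740741                 AS est_io_cost,
    0.0001581 + 0.0000011 * (50000 - 1)          AS est_cpu_cost,
    (0.003125 + 999 * 0.000740741)
  + (0.0001581 + 0.0000011 * (50000 - 1))        AS est_operator_cost;
```

Note that this simple "I/O + CPU" sum is exactly what stops holding once row goals, caching assumptions, or parallelism adjustments come into play, which is the point of this whole answer.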

Source: Execution Plan Cost Model by Joe Chang - July 2009 (archive)

On to your problem

We can see in your screenshots that you have a wildly different subtree cost on the server that is not performing well. What is interesting is that it has more memory to use and less CPU.

The above information indicates to me that you probably have a problem with the Subtree Cost, and the Operator cost is just a symptom.

...the Estimated Subtree Cost, is the cumulative (added up in NodeID order) costs of each individual operator.

Source: Actual Execution Plan Costs by Grant Fritchey - August 20, 2018 (archive)

I think the answer lies in these sentences:

The formula for IO represents the probability that an IO will already be in memory after a number of pages have already been accessed. For large tables, it also models the chances that a previously accessed page may have already been evicted when it is needed again.

What I think is happening to you:

  1. The hardware setup is different. RAM / CPU / disk are not the same, and that is influencing the estimates.
  2. The physical data files. How did you make the copy? I would say the only way to truly replicate this is a backup/restore so the data files match.
  3. Did you try clearing the cache and then forcing a recompile? I wonder what that would result in.

Otherwise I'd love to see the estimated and actual query plans to dive deeper into what looks like is going on.

IMPORTANT: THIS WILL HURT (you could be fired) IF YOU RUN IT IN PRODUCTION WITHOUT UNDERSTANDING WHAT WILL HAPPEN AND WITHOUT PLANNING FOR IT. That said, this is how I'd clear the cache to test again with a recompile.


Different Ways to Flush or Clear SQL Server Cache by Bhavesh Patel - March 31, 2017 (archive)

  • DBCC FREESYSTEMCACHE
  • DBCC FREESESSIONCACHE
  • DBCC FREEPROCCACHE
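If you do test this, prefer the narrowest option available. A sketch (the query text and procedure name are placeholders):

```sql
-- Targeted approach: evict a single plan rather than flushing everything.
-- First, find the plan_handle for the statement you care about:
SELECT cp.plan_handle, st.text
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
WHERE st.text LIKE '%YourQueryText%';

-- Then evict just that plan (handle copied from the result above):
-- DBCC FREEPROCCACHE (0x06000500...);

-- Or mark a single procedure for recompilation on its next execution:
EXEC sp_recompile N'dbo.YourProcedure';
```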

Can we presume the servers are genuinely identical?

I noticed a small change in the query step costs returned for a stored procedure's execution plan after amending the database compatibility level on a SQL Server 2012 instance (idle database: captured the first plan XML, applied the option change, recompiled the SP, captured the second plan XML). The plan itself appears identical, but more options become available within the optimizer, so it possibly calculates costs slightly differently. If you have different patch or compatibility levels across the two servers, it could result in the actual plans being more radically different (i.e. wrong).
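It's worth ruling this out first, since it is quick to check on both servers (the database name is a placeholder):

```sql
-- Compare optimizer-relevant settings on both servers before comparing plans.
SELECT name, compatibility_level
FROM sys.databases
WHERE name = N'YourDatabase';

SELECT SERVERPROPERTY('ProductVersion') AS product_version,  -- build number
       SERVERPROPERTY('ProductLevel')   AS patch_level;      -- RTM / SP / CU
```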