Will the Query Optimizer Ignore A Fragmented Index?

AFAIK the optimizer is not aware of index fragmentation. This can be a problem if it picks a plan that scans a fragmented index.

The optimizer is aware of the allocated data size, though. If the index pages have a lot of free space (possibly due to internal fragmentation), the index becomes less attractive to use: 50% empty space means twice the amount of IO to scan it. For random access that should not matter to any significant extent.

This is not a huge effect, but it might explain what you are seeing.
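
If you want to check how much free space the index pages actually contain, sys.dm_db_index_physical_stats reports both page density and logical fragmentation. A minimal sketch, assuming a hypothetical table dbo.MyTable (SAMPLED or DETAILED mode is needed, since LIMITED does not report page density):

SELECT index_id,
       index_level,
       avg_page_space_used_in_percent,  -- page density; low values mean mostly empty pages
       avg_fragmentation_in_percent,    -- logical fragmentation (out-of-order pages)
       page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID(N'dbo.MyTable'), NULL, NULL, 'SAMPLED');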

If this small effect is enough to flip the query plan away from the index, then the index was never great in the eyes of the query optimizer to begin with. That might be a hint that the index can be improved.

Also, the optimizer seems to make an assumption about how much of the index is cached in the buffer pool; there are some references to that in the XML execution plans. I have no detailed knowledge of how that works.
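
You can at least see how much of an index is actually sitting in the buffer pool right now (not necessarily the number the optimizer uses, but a reference point) by joining sys.dm_os_buffer_descriptors to the allocation metadata. A sketch, again with a hypothetical table name:

SELECT i.name AS index_name,
       COUNT(*) AS cached_pages,
       COUNT(*) * 8 / 1024 AS cached_mb
FROM sys.dm_os_buffer_descriptors AS bd
JOIN sys.allocation_units AS au ON au.allocation_unit_id = bd.allocation_unit_id
JOIN sys.partitions AS p ON p.hobt_id = au.container_id AND au.type IN (1, 3)  -- in-row and row-overflow data
JOIN sys.indexes AS i ON i.object_id = p.object_id AND i.index_id = p.index_id
WHERE bd.database_id = DB_ID()
  AND p.object_id = OBJECT_ID(N'dbo.MyTable')
GROUP BY i.name;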

I'm beginning to think that adding an index to this table won't help at all

I wouldn't go that far. Maybe all you need is a rebuild or a drop-DML-create sequence in the right places? Or maybe this is just a query tuning problem (ask a new question with the actual execution plan included).
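
For example (index and table names here are made up, adjust to your schema), the plain rebuild and the drop-DML-create variant look roughly like this; disabling a nonclustered index keeps its definition, so the final REBUILD plays the role of the re-create:

-- Plain rebuild; this also refreshes the index's statistics
ALTER INDEX [IX_ImportantIndex] ON [dbo].[MyTable] REBUILD;

-- Drop-DML-create around a large modification, using DISABLE/REBUILD
ALTER INDEX [IX_ImportantIndex] ON [dbo].[MyTable] DISABLE;
-- ... run the large INSERT/UPDATE/DELETE batch here ...
ALTER INDEX [IX_ImportantIndex] ON [dbo].[MyTable] REBUILD;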


The first thing that comes to mind is outdated statistics, not the fragmentation of the index as such.

Right after the index is (re)built, the statistics associated with the index are accurate; the histogram covers the full range of values. As data changes in the table, the statistics are not updated immediately. I don't remember the exact thresholds now, i.e. how many rows have to be inserted/deleted before statistics are auto-updated.
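
For what it's worth, the classic rule of thumb for the auto-update threshold on larger tables is roughly 500 modifications plus 20% of the row count, so on a big table statistics can stay stale for quite a while. You can at least check when they were last updated and what the histogram's upper bound is (table and index names here are hypothetical):

-- When were the statistics on the table last updated?
SELECT s.name AS stats_name,
       STATS_DATE(s.object_id, s.stats_id) AS last_updated
FROM sys.stats AS s
WHERE s.object_id = OBJECT_ID(N'dbo.MyTable');

-- Inspect the histogram; compare the highest RANGE_HI_KEY with the newest value in the table
DBCC SHOW_STATISTICS ('dbo.MyTable', 'IX_ImportantIndex') WITH HISTOGRAM;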

I observed similar behaviour in our system. The simplified workflow is the following.

We have a table with ~100M rows that contains data for the last N days. New rows with increasing datetime values in an indexed column are added throughout the day in batches (usually 1K-10K rows at a time). At midnight a maintenance procedure deletes all rows older than N days and rebuilds the index.
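
Roughly, the midnight step does something like this (all names and the retention period are invented for illustration; a purge of this size would normally be done in smaller batches):

DECLARE @N int = 30;  -- hypothetical retention period in days

DELETE FROM [dbo].[MyTable]
WHERE [EventTime] < DATEADD(DAY, -@N, CAST(GETDATE() AS date));

ALTER INDEX ALL ON [dbo].[MyTable] REBUILD;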

Also, every 10 minutes during the day another procedure summarizes the data and updates a summary table that contains less detailed data but is kept for longer.

I noticed that the performance of the summarizing procedure was fine in the morning but got worse later in the day. I checked the execution plans and saw that they were different: the same query produced different plans in the morning and in the evening (I used OPTION (RECOMPILE)).
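
A hypothetical query of that kind (all names invented), with OPTION (RECOMPILE) so that each run compiles a fresh plan against whatever statistics are current at that moment:

SELECT [GroupKey],
       COUNT(*)      AS row_count,
       SUM([Amount]) AS total_amount
FROM [dbo].[MyTable]
WHERE [EventTime] >= DATEADD(MINUTE, -10, GETDATE())
GROUP BY [GroupKey]
OPTION (RECOMPILE);  -- fresh plan each run, so plan differences reflect the current statistics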

So I added a procedure to update the relevant statistics throughout the day instead of relying on the built-in auto-update thresholds.

CREATE PROCEDURE [dbo].[RebuildStatisticsOnMyTable]
WITH EXECUTE AS OWNER  -- callers only need EXECUTE on the procedure, not ALTER on the table
AS
BEGIN
    SET NOCOUNT ON;
    BEGIN TRY
        -- Refresh only the statistics behind the indexes the summarizing queries rely on
        UPDATE STATISTICS [dbo].[MyTable] ([IX_ImportantIndex], [IX_AnotherIndex]);
    END TRY
    BEGIN CATCH
        -- handle errors; as a minimal placeholder, re-raise so failures are not silently swallowed
        DECLARE @ErrorMessage nvarchar(2048) = ERROR_MESSAGE();
        RAISERROR (@ErrorMessage, 16, 1);
    END CATCH;
END

With such periodic statistics updates throughout the day, the performance of the summarizing procedure is good and stable. I had to experiment a bit to find a suitable interval between updates.
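
One way to run this periodically is a SQL Server Agent job. A sketch, assuming the procedure above, a hypothetical database name, and a 30-minute interval picked purely for illustration:

USE msdb;

EXEC dbo.sp_add_job
    @job_name = N'Update MyTable statistics';

EXEC dbo.sp_add_jobstep
    @job_name      = N'Update MyTable statistics',
    @step_name     = N'Run RebuildStatisticsOnMyTable',
    @subsystem     = N'TSQL',
    @database_name = N'MyDatabase',  -- hypothetical database name
    @command       = N'EXEC [dbo].[RebuildStatisticsOnMyTable];';

EXEC dbo.sp_add_jobschedule
    @job_name             = N'Update MyTable statistics',
    @name                 = N'Every 30 minutes',
    @freq_type            = 4,   -- daily
    @freq_interval        = 1,
    @freq_subday_type     = 4,   -- unit: minutes
    @freq_subday_interval = 30;  -- every 30 minutes

EXEC dbo.sp_add_jobserver
    @job_name = N'Update MyTable statistics';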

This is on SQL Server 2008 and, as far as I know, applies to 2012 as well. 2014 has a different, improved cardinality estimator which, as far as I understand, can extrapolate the statistics and produce decent estimates when newly added rows with growing timestamps fall beyond the range of the statistics histogram. I don't remember where I saw the detailed description of this; most likely it was a blog post by Paul White or Aaron Bertrand. So it is likely that if we upgrade to 2014 there will be no need for these forced statistics updates throughout the day.