Most efficient way to retrieve a sub-query COUNT grouped by top table?

You're on a newer version of SQL Server so the actual plan gives you a lot of information. See the caution sign on the SELECT operator? That means that SQL Server generated a warning which could affect query performance. You should always look at those:

<Warnings>
<PlanAffectingConvert ConvertIssue="Seek Plan" Expression="[s].[logger_uuid]=CONVERT_IMPLICIT(nchar(17),[d].[uuid],0)" />
<PlanAffectingConvert ConvertIssue="Seek Plan" Expression="CONVERT_IMPLICIT(nvarchar(100),[d].[name],0)=[g].[name]" />
</Warnings>

There are two data type conversions caused by your schema. Based on the warnings I suspect that name is actually an NVARCHAR(100) and logger_uuid is an NCHAR(17). The posted table schema in the question may not be correct. You should understand the root cause of why these conversions are happening and fix it. Some types of data type conversions prevent index seeks, lead to cardinality estimate issues, and cause other problems.
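
For example, if logger_data.logger_uuid really is nchar(17) while the uuid column it joins to is char(17), aligning the two types removes the first conversion. This is only a sketch under that assumption; confirm which side is mistyped in your actual schema, and drop any dependent indexes or constraints before altering:

-- Hypothetical sketch: assumes logger_data.logger_uuid is the mistyped (nchar) side.
ALTER TABLE dbo.logger_data ALTER COLUMN logger_uuid char(17) NOT NULL;
-- Do the equivalent for the two name columns so both sides compare as the same type.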

Another important thing to check is wait stats. You can see those in the details of the SELECT operator as well. Here's the XML for your wait stats and the time spent by the query:

<WaitStats>
<Wait WaitType="RESOURCE_GOVERNOR_IDLE" WaitTimeMs="49515" WaitCount="3773" />
<Wait WaitType="SOS_SCHEDULER_YIELD" WaitTimeMs="57164" WaitCount="2466" />
</WaitStats>
<QueryTimeStats ElapsedTime="67135" CpuTime="10007" />
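
As an aside, you don't have to dig through the plan XML to see waits. One option (assuming SQL Server 2016 or later, which includes Azure SQL Database) is the session-level wait stats DMV, queried from the same session right after the query finishes:

-- Top waits for the current session, heaviest first.
SELECT ws.wait_type, ws.wait_time_ms, ws.waiting_tasks_count
FROM sys.dm_exec_session_wait_stats AS ws
WHERE ws.session_id = @@SPID
ORDER BY ws.wait_time_ms DESC;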

I'm not a cloud guy but it looks like your query isn't able to fully engage a CPU. That's probably related to your current Azure tier. The query only needed about 10 seconds of CPU while executing, but it took 67 seconds. I believe that 50 of those seconds were spent being throttled (RESOURCE_GOVERNOR_IDLE) and 7 of them were given to you but used by other queries that were running concurrently. The bad news is that the query is slower than it could be due to your tier. The good news is that any reduction in CPU could lead to a roughly 5X reduction in run time. In other words, if you can get the query to use 1 second of CPU then you might see a run time of around 5 seconds.
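
If you just want the headline CPU versus elapsed split without opening operator properties, SET STATISTICS TIME reports the same two numbers per statement:

SET STATISTICS TIME ON;
-- ... run your query here ...
SET STATISTICS TIME OFF;
-- The Messages tab then shows "CPU time = ... ms, elapsed time = ... ms".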

Next you can look at the Actual Time Statistics property in the operator details to see where the CPU time was spent. Your plan uses row mode, so the CPU time for an operator is the sum of the time spent by that operator and its children. This is a relatively simple plan, so it doesn't take long to discover that the clustered index scan on logger_data uses 6527 ms of CPU time. The loop join that calls it uses 10006 ms of CPU time, so effectively all of your query's CPU is spent at that step. Another clue that something is going wrong there can be found by looking at the relative thickness of the arrows:

[Image: plan fragment showing thick arrows]

A lot of rows are returned from that operator so it's worth looking at in detail. Looking at the actual number of rows for the clustered index scan you can see that 14088885 rows were returned and 14100798 rows were read. However, the table cardinality is just 484803 rows. Intuitively that seems pretty inefficient, right? The clustered index scan returns far more rows than the table contains, so some other plan with a different join type or access method on the table is likely to be more efficient.
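
If you want to double-check the table cardinality outside of the plan, the catalog metadata has it (a quick sanity check, not required for the analysis):

-- Row count for dbo.logger_data from metadata (heap or clustered index only).
SELECT SUM(p.rows) AS table_rows
FROM sys.partitions AS p
WHERE p.object_id = OBJECT_ID(N'dbo.logger_data')
  AND p.index_id IN (0, 1);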

Why did SQL Server read and return so many rows? The clustered index is on the inner side of a nested loop join. There are 38 rows returned by the outer side of the loop (the scan on the logger table), so the scan on logger_data executes 38 times. 484803 * 38 = 18422514, which is pretty close to the number of rows read. So why did SQL Server choose a plan that feels so inefficient? It even estimated that it would do 57 scans of the table, so arguably the plan you got was more efficient than the optimizer expected.

You might have been wondering why there's a TOP operator in your plan. SQL Server introduced a row goal when creating a query plan for your query. This might be more detail than you want, but the short version is that SQL Server does not always need to return all rows from a clustered index scan. Sometimes it can stop early if it only needs a fixed number of rows and it finds those rows before it reaches the end of the scan. A scan isn't as expensive if it can stop early, so when a row goal is present the operator cost is discounted by a formula, roughly in proportion to the fraction of the rows the optimizer expects to read before hitting the goal. In other words, SQL Server expects to scan the clustered index 57 times, but it thinks that each scan will find the single row that it needs very quickly. It only needs a single row from each scan due to the presence of the TOP operator.

You can make your query faster by encouraging the query optimizer to pick a plan that doesn't scan the logger_data table 38 times. This might be as simple as eliminating the data type conversions, which could allow SQL Server to do an index seek instead of a scan. If not, fix the conversions and create a covering index on logger_data:

CREATE INDEX IX ON logger_data (category_name, logger_uuid);

The query optimizer chooses a plan based on cost. Adding this index makes it unlikely that you'll get the slow plan that does many scans on logger_data, because it'll be cheaper to access the table through an index seek than a clustered index scan.

If you aren't able to add the index you can consider adding a query hint that disables the introduction of row goals: OPTION (USE HINT('DISABLE_OPTIMIZER_ROWGOAL')). You should only do this if you understand and are comfortable with the concept of row goals. Adding that hint should result in a different plan, but I can't say how efficient it'll be.
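
For reference, the hint goes in an OPTION clause at the end of the statement. A minimal sketch using table names from this question; apply the OPTION clause to your real query instead:

SELECT COUNT_BIG(*) AS row_count
FROM dbo.logger AS L
JOIN dbo.logger_data AS LD
    ON LD.logger_uuid = L.uuid
OPTION (USE HINT('DISABLE_OPTIMIZER_ROWGOAL'));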


Start by ensuring each table has all candidate keys declared, and foreign keys enforced:

CREATE TABLE dbo.categories
(
    id uniqueidentifier NOT NULL
        CONSTRAINT [UQ dbo.categories id]
        UNIQUE NONCLUSTERED,
    [name] nvarchar(50) NOT NULL 
        CONSTRAINT [PK dbo.categories name]
        PRIMARY KEY CLUSTERED
);

-- Choose a better name for this table
CREATE TABLE dbo.[group]
(
    id uniqueidentifier NOT NULL
        CONSTRAINT [PK dbo.group id]
        PRIMARY KEY CLUSTERED
);

CREATE TABLE dbo.logger
(
    id uniqueidentifier 
        CONSTRAINT [UQ dbo.logger id]
        UNIQUE NONCLUSTERED,
    group_id uniqueidentifier NOT NULL
        CONSTRAINT [FK dbo.group id]
        FOREIGN KEY (group_id)
        REFERENCES [dbo].[group] (id),
    uuid char(17) NOT NULL
        CONSTRAINT [PK dbo.logger uuid]
        PRIMARY KEY CLUSTERED
);

CREATE TABLE dbo.logger_data
(
    id uniqueidentifier 
        CONSTRAINT [PK dbo.logger_data id]
        PRIMARY KEY NONCLUSTERED,
    logger_uuid char(17) NOT NULL
        CONSTRAINT [FK dbo.logger_data uuid]
        FOREIGN KEY (logger_uuid)
        REFERENCES dbo.logger (uuid),
    category_name nvarchar(50) NOT NULL
        CONSTRAINT [FK dbo.logger_data name]
        FOREIGN KEY (category_name)
        REFERENCES dbo.categories ([name]),
    recorded_on datetime NOT NULL,

    INDEX [dbo.logger_data logger_uuid recorded_on] 
        CLUSTERED (logger_uuid, recorded_on)
);

I have also added a non-unique clustered index to logger_data on logger_uuid, recorded_on.

Next, notice that the biggest task in your execution plan is the scan of the 484,836 rows in the data table. Since you are only interested in the most recent reading for a particular logger, and there are only 48 loggers currently, it is more efficient to replace that full scan with 48 singleton seeks:

SELECT 
    category_id = C.id, 
    logger_group_count = COUNT_BIG(DISTINCT L.group_id)
FROM dbo.logger AS L
CROSS APPLY 
(
    -- Latest reading per logger
    SELECT TOP (1) 
        LD.recorded_on,
        LD.category_name
    FROM dbo.logger_data AS LD
    WHERE LD.logger_uuid = L.uuid
    ORDER BY 
        LD.recorded_on DESC
) AS LDT1
JOIN dbo.categories AS C
    ON C.[name] = LDT1.category_name
GROUP BY
    C.id
ORDER BY
    C.id;

The execution plan is:

[Image: estimated execution plan]

dbfiddle

You should also patch your instance from 2017 RTM to the latest cumulative update.
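
You can check the build you're currently on with SERVERPROPERTY:

-- Shows the build number and update level (e.g. RTM or a CU designation).
SELECT SERVERPROPERTY('ProductVersion')     AS build,
       SERVERPROPERTY('ProductUpdateLevel') AS update_level;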