How can I get running totals of recent rows faster?

I recommend testing with a bit more data to get a better idea of what's going on and to see how different approaches perform. I loaded 16 million rows into a table with the same structure. You can find the code to populate the table at the bottom of this answer.
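
The elapsed times below were measured on my machine; if you want to compare on yours, the usual approach is to turn on the session statistics first:

-- Report per-statement I/O and CPU/elapsed time in the Messages tab
SET STATISTICS IO ON;
SET STATISTICS TIME ON;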

The following approach takes 19 seconds on my machine:

SELECT TOP (10) seq
    ,value
    ,sum(value) OVER (ORDER BY seq ROWS UNBOUNDED PRECEDING) total
FROM dbo.[Table_1_BIG]
ORDER BY seq DESC;

Actual plan here. Most of the time is spent calculating the sum and doing the sort. Worryingly, the query plan does almost all of the work for the entire table and only filters down to the 10 rows that you requested at the very end. This query's runtime scales with the size of the table instead of with the size of the result set.

This option takes 23 seconds on my machine:

SELECT *
    ,(
        SELECT SUM(value)
        FROM dbo.[Table_1_BIG]
        WHERE seq <= t.seq
        ) total
FROM (
    SELECT TOP (10) seq
        ,value
    FROM dbo.[Table_1_BIG]
    ORDER BY seq DESC
    ) t
ORDER BY seq DESC;

Actual plan here. This approach scales with both the number of requested rows and the size of the table: each of the 10 correlated subqueries scans nearly the entire 16-million-row table, so almost 160 million rows are read in total.


To get correct results for the most recent rows you must sum over the entire table. Ideally you would perform that summation only once, and you can if you change the way you approach the problem: calculate the grand total for the whole table once, then subtract a descending running total from it. The running total for the Nth row is the grand total minus the sum of all rows that come after it. One way to do this:

SELECT TOP (10) seq
    ,value
    ,[value]
        - SUM([value]) OVER (ORDER BY seq DESC ROWS UNBOUNDED PRECEDING)
        + (SELECT SUM([value]) FROM dbo.[Table_1_BIG]) AS total
FROM dbo.[Table_1_BIG]
ORDER BY seq DESC;

Actual plan here. The new query runs in 644 ms on my machine. The table is scanned once to get the complete total, then an additional row is read for each row in the result set. There's no sorting, and nearly all of the time is spent calculating the sum in the parallel part of the plan.


If you'd like this query to go even faster, you just need to optimize the part that calculates the complete sum. The above query does that with a clustered index scan, but the clustered index includes all columns and you only need the [value] column. One option is to create a nonclustered index on that column; another is to create a nonclustered columnstore index on it. Both will improve performance (a sketch of both appears a little further down). If you're on Enterprise Edition, a great option is to create an indexed view like the following:

CREATE OR ALTER VIEW dbo.Table_1_BIG__SUM
WITH SCHEMABINDING
AS
SELECT SUM([value]) SUM_VALUE
, COUNT_BIG(*) FOR_U
FROM dbo.[Table_1_BIG];

GO

CREATE UNIQUE CLUSTERED INDEX CI ON dbo.Table_1_BIG__SUM (SUM_VALUE);

This view returns a single row, so it takes up almost no space. There will be a penalty when doing DML, but it shouldn't be much different from ordinary index maintenance. With the indexed view in play, the query now takes 0 ms.


Actual plan here. The best part about this approach is that the runtime no longer depends on the size of the table; the only thing that matters is how many rows are returned. For example, if you request the first 10,000 rows the query takes 18 ms to execute.
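
If you're not on Enterprise Edition, or just want the simpler alternatives mentioned above, a minimal sketch of the two index options looks like this (the index names are illustrative):

CREATE NONCLUSTERED INDEX IX_Table_1_BIG_value
ON dbo.[Table_1_BIG] ([value]);

-- or a nonclustered columnstore index on the same column:
CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_Table_1_BIG_value
ON dbo.[Table_1_BIG] ([value]);

Either one gives the grand-total subquery a much narrower structure to scan than the clustered index, with the columnstore index typically being the faster of the two for this kind of full-column aggregate. (On other editions you can also keep the indexed view and reference it directly WITH (NOEXPAND).)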

Code to populate table:

DROP TABLE IF EXISTS dbo.[Table_1_BIG];

CREATE TABLE dbo.[Table_1_BIG] (
    [seq] [int] NOT NULL,
    [value] [bigint] NOT NULL
);

DROP TABLE IF EXISTS #t;
CREATE TABLE #t (ID BIGINT);

INSERT INTO #t WITH (TABLOCK)
SELECT TOP (4000) -1 + ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM master..spt_values t1
CROSS JOIN master..spt_values t2
OPTION (MAXDOP 1);

-- Cross join 4,000 x 4,000 values to generate 16 million rows
INSERT INTO dbo.[Table_1_BIG] WITH (TABLOCK)
SELECT t1.ID * 4000 + t2.ID, 8 * t2.ID + t1.ID
FROM (SELECT TOP (4000) ID FROM #t) t1
CROSS JOIN #t t2;

ALTER TABLE dbo.[Table_1_BIG]
ADD CONSTRAINT [PK_Table_1] PRIMARY KEY ([seq]);

Difference in the first two approaches

The first plan spends about 7 of the 10 seconds in the Window Spool operator, so that's the main reason it's so slow. It's performing a lot of I/O in tempdb to build the spool's worktable. My stats I/O and time look like this:

Table 'Worktable'. Scan count 1000001, logical reads 8461526
Table 'Table_1'. Scan count 1, logical reads 2609
Table 'Worktable'. Scan count 0, logical reads 0

 SQL Server Execution Times:
   CPU time = 8641 ms,  elapsed time = 8537 ms.

The second plan is able to avoid the spool, and thus the worktable entirely. It simply grabs the top 10 rows from the clustered index, and then does a nested loops join to the aggregation (sum) coming out of a separate clustered index scan. The inner side still ends up reading the whole table, but the table is very dense, so this is reasonably efficient with a million rows.

Table 'Table_1'. Scan count 11, logical reads 26093
 SQL Server Execution Times:
   CPU time = 1563 ms,  elapsed time = 1671 ms.

Improving performance

Columnstore

If you really want the "online reporting" approach, columnstore is likely your best option.

ALTER TABLE [dbo].[Table_1] DROP CONSTRAINT [PK_Table_1];

CREATE CLUSTERED COLUMNSTORE INDEX [PK_Table_1] ON dbo.Table_1;

Then this query is ridiculously fast:

SELECT TOP 10
    seq, 
    value, 
    SUM(value) OVER (ORDER BY seq ROWS UNBOUNDED PRECEDING)
FROM dbo.Table_1
ORDER BY seq DESC;

Here are the stats from my machine:

Table 'Table_1'. Scan count 4, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 3319
Table 'Table_1'. Segment reads 1, segment skipped 0.
Table 'Worktable'. Scan count 0, logical reads 0

 SQL Server Execution Times:
   CPU time = 375 ms,  elapsed time = 205 ms.

You're likely not going to beat that (unless you're really smart - nice one, Joe). Columnstore is crazy good at scanning and aggregating large amounts of data.

Using the ROWS rather than RANGE window frame option

You can get very similar performance to your second query with this approach, which was mentioned in another answer, and which I used in the columnstore example above (execution plan):

SELECT TOP 10
    seq, 
    value, 
    SUM(value) OVER (ORDER BY seq ROWS UNBOUNDED PRECEDING)
FROM dbo.Table_1
ORDER BY seq DESC;

It results in fewer reads than your second approach and, unlike your first approach, no tempdb activity, because the window spool occurs in memory:

...RANGE uses an on-disk spool, while ROWS uses an in-memory spool

Unfortunately, runtime is about the same as your second approach.

Table 'Worktable'. Scan count 0, logical reads 0
Table 'Table_1'. Scan count 1, logical reads 2609
Table 'Worktable'. Scan count 0, logical reads 0

 SQL Server Execution Times:
   CPU time = 1984 ms,  elapsed time = 1474 ms.

Schema-based solution: async running totals

Since you're open to other ideas, you could consider updating the "running total" asynchronously: periodically take the results of one of these queries and load them into a "totals" table. So you'd do something like this:

CREATE TABLE [dbo].[Table_1_Totals]
(
    [seq] [int] NOT NULL,
    [running_total] [bigint] NOT NULL,
    CONSTRAINT [PK_Table_1_Totals] PRIMARY KEY CLUSTERED ([seq])
);

Load it every day / hour / whatever (this took about 2 seconds on my machine with 1 million rows, and could be optimized):

INSERT INTO dbo.Table_1_Totals
SELECT
    t.seq, 
    SUM(t.value) OVER (ORDER BY t.seq ROWS UNBOUNDED PRECEDING)
        -- carry the last stored total forward so repeated incremental loads stay correct
        + ISNULL((SELECT TOP (1) tot.running_total FROM dbo.Table_1_Totals tot ORDER BY tot.seq DESC), 0) AS total
FROM dbo.Table_1 t
WHERE NOT EXISTS (
            SELECT NULL 
            FROM dbo.Table_1_Totals t2
            WHERE t.seq = t2.seq);

Then your reporting query is very efficient:

SELECT TOP 10
    t.seq, 
    t.value, 
    t2.running_total
FROM dbo.Table_1 t
    INNER JOIN dbo.Table_1_Totals t2
        ON t.seq = t2.seq
ORDER BY seq DESC;

Here are the read stats:

Table 'Table_1'. Scan count 0, logical reads 35
Table 'Table_1_Totals'. Scan count 1, logical reads 3

Schema-based solution: in-row totals with constraints

A really interesting solution to this is covered in detail in this answer to the question: Writing a simple bank schema: How should I keep my balances in sync with their transaction history?

The basic approach would be to track the current running total in-row along with the previous running total and sequence number. Then you can use constraints to validate the running totals are always correct and up-to-date.

Credit to Paul White for providing a sample implementation for the schema in this Q&A:

CREATE TABLE dbo.Table_1
(
    seq integer IDENTITY(1,1) NOT NULL,
    val bigint NOT NULL,
    total bigint NOT NULL,

    prev_seq integer NULL,
    prev_total bigint NULL,

    CONSTRAINT [PK_Table_1] 
        PRIMARY KEY CLUSTERED (seq ASC),

    CONSTRAINT [UQ dbo.Table_1 seq, total]
        UNIQUE (seq, total),

    CONSTRAINT [UQ dbo.Table_1 prev_seq]
        UNIQUE (prev_seq),

    CONSTRAINT [FK dbo.Table_1 previous seq and total]
        FOREIGN KEY (prev_seq, prev_total) 
        REFERENCES dbo.Table_1 (seq, total),

    CONSTRAINT [CK dbo.Table_1 total = prev_total + val]
        CHECK (total = ISNULL(prev_total, 0) + val),

    CONSTRAINT [CK dbo.Table_1 denormalized columns all null or all not null]
        CHECK 
        (
            (prev_seq IS NOT NULL AND prev_total IS NOT NULL)
            OR
            (prev_seq IS NULL AND prev_total IS NULL)
        )
);
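
To make the mechanics concrete, here's a minimal sketch of how a new row might be inserted under this scheme, assuming inserts are serialized (the linked answer covers concurrency-safe patterns in more detail; @new_val is just an illustrative parameter):

DECLARE @new_val bigint = 10;
DECLARE @prev_seq integer, @prev_total bigint;

-- Read the current last row so the new row can chain to it
SELECT TOP (1) @prev_seq = seq, @prev_total = total
FROM dbo.Table_1
ORDER BY seq DESC;

-- The CHECK and FOREIGN KEY constraints verify the math on the way in
INSERT INTO dbo.Table_1 (val, total, prev_seq, prev_total)
VALUES (@new_val, ISNULL(@prev_total, 0) + @new_val, @prev_seq, @prev_total);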

When returning such a small subset of rows, the triangular join is a good option. However, when using window functions you have more options to increase performance. The default window frame is RANGE, but the better-performing option here is ROWS. Be aware that the two differ not only in performance but also in results when there are ties in the ORDER BY column.
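
A quick illustration of that last point, using a throwaway table variable made up purely for the demo (not part of the schema in this question):

-- Two rows tie on the ordering column
DECLARE @demo TABLE (ord int, val int);
INSERT INTO @demo VALUES (1, 10), (1, 20), (2, 30);

SELECT ord, val,
    SUM(val) OVER (ORDER BY ord RANGE UNBOUNDED PRECEDING) AS range_total, -- ties share a total: 30, 30, 60
    SUM(val) OVER (ORDER BY ord ROWS UNBOUNDED PRECEDING) AS rows_total    -- per row: 10, 30, 60 (order within ties is not guaranteed)
FROM @demo
ORDER BY ord;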

The following query is slightly faster than the ones you presented:

SELECT TOP 10 seq
    ,value
    ,sum(value) OVER (ORDER BY seq ROWS UNBOUNDED PRECEDING) total
FROM Table_1
ORDER BY seq DESC;