Calculating stock quantity based on change log

Sometimes you can improve query performance just by doing a little bit of tuning instead of changing your entire query. I noticed in your actual query plan that your query spills to tempdb in three places. Here's one example:

tempdb spills

Resolving those tempdb spills may improve performance. If Quantity is always non-negative then you can replace UNION with UNION ALL which will likely change the hash union operator to something else that doesn't require a memory grant. Your other tempdb spills are caused by issues with cardinality estimation. You're on SQL Server 2014 and using the new CE so it may be difficult to improve the cardinality estimates because the query optimizer won't use multi-column statistics. As a quick fix, consider using the MIN_MEMORY_GRANT query hint made available in SQL Server 2014 SP2. Your query's memory grant is only 49104 KB and the max available grant is 5054840 KB so hopefully bumping it up won't impact concurrency too much. 10% is a reasonable starting guess but you may need to adjust it up and done depending on your hardware and data. Putting that all together, this is what your query might look like:

WITH t AS
(
    SELECT ToPositionId AS PositionId, SUM(Quantity) AS Quantity, ProductId 
    FROM ProductPositionLog
    GROUP BY ToPositionId, ProductId
    UNION ALL
    SELECT FromPositionId AS PositionId, -SUM(Quantity) AS Quantity, ProductId 
    FROM ProductPositionLog
    GROUP BY FromPositionId, ProductId
)

SELECT t.ProductId, t.PositionId, SUM(t.Quantity) AS Quantity
FROM t
WHERE NOT t.PositionId = 0
GROUP BY t.ProductId, t.PositionId
HAVING SUM(t.Quantity) > 0
OPTION (MIN_GRANT_PERCENT = 10);

If you wish to improve performance further I recommend trying out indexed views instead of building and maintaining your own checkpoint table. Indexed views are significantly easier to get right than a custom solution involving your own materialized table or triggers. They will add a small amount of overhead to all DML operations but it may allow you to remove some of the nonclustered indexes that you currently have. Indexed views appear to be supported in the web edition of the product.

There are some restrictions on indexed views so you'll need to create a pair of them. Below is an example implementation, along with the fake data that I used for testing:

CREATE TABLE dbo.ProductPositionLog (
    LogId BIGINT NOT NULL,
    ProductId BIGINT NOT NULL,
    FromPositionId BIGINT NOT NULL,
    ToPositionId BIGINT NOT NULL,
    Quantity INT NOT NULL,
    FILLER VARCHAR(20),
    PRIMARY KEY (LogId)
);

INSERT INTO dbo.ProductPositionLog WITH (TABLOCK)
SELECT RN, RN % 100, RN % 3999, 3998 - (RN % 3999), RN % 10, REPLICATE('Z', 20)
FROM (
    SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) RN
    FROM master..spt_values t1
    CROSS JOIN master..spt_values t2
) q;

CREATE INDEX NCI1 ON dbo.ProductPositionLog (ToPositionId, ProductId) INCLUDE (Quantity);
CREATE INDEX NCI2 ON dbo.ProductPositionLog (FromPositionId, ProductId) INCLUDE (Quantity);

GO    

CREATE VIEW ProductPositionLog_1
WITH SCHEMABINDING  
AS  
   SELECT ToPositionId AS PositionId, SUM(Quantity) AS Quantity, ProductId, COUNT_BIG(*) CNT
    FROM dbo.ProductPositionLog
    WHERE ToPositionId <> 0
    GROUP BY ToPositionId, ProductId
GO  

CREATE UNIQUE CLUSTERED INDEX IDX_V1   
    ON ProductPositionLog_1 (PositionId, ProductId);  
GO  

CREATE VIEW ProductPositionLog_2
WITH SCHEMABINDING  
AS  
   SELECT FromPositionId AS PositionId, SUM(Quantity) AS Quantity, ProductId, COUNT_BIG(*) CNT
    FROM dbo.ProductPositionLog
    WHERE FromPositionId <> 0
    GROUP BY FromPositionId, ProductId
GO  

CREATE UNIQUE CLUSTERED INDEX IDX_V2   
    ON ProductPositionLog_2 (PositionId, ProductId);  
GO  

Without the indexed views the query takes about 2.7 seconds to finish on my machine. I get a similar plan to yours except mine runs in serial:

enter image description here

I believe that you'll need to query the indexed views with the NOEXPAND hint because you aren't on enterprise edition. Here's one way to do that:

WITH t AS
(
    SELECT PositionId, Quantity, ProductId 
    FROM ProductPositionLog_1 WITH (NOEXPAND)
    UNION ALL
    SELECT PositionId, Quantity, ProductId 
    FROM ProductPositionLog_2 WITH (NOEXPAND)
)
SELECT t.ProductId, t.PositionId, SUM(t.Quantity) AS Quantity
FROM t
GROUP BY t.ProductId, t.PositionId
HAVING SUM(t.Quantity) > 0;

This query has a simpler plan and finishes in under 400 ms on my machine:

enter image description here

The best part is that you won't have to change any of the application code that loads data into the ProductPositionLog table. You simply need to verify that the DML overhead of the pair of indexed views is acceptable.


I don't really think your current approach is all that inefficient. Seems like a pretty straightforward way to do it. Another approach might be to use an UNPIVOT clause, but I'm not sure it would be a performance improvement. I implemented both approaches with the below code (just over 5 million rows), and each returned in about 2 seconds on my laptop, so I'm not sure what is so different about my data set compared to the real one. I didn't even add any indexes (other than a primary key on LogId).

IF NOT EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[ProductPositionLog]') AND type in (N'U'))
BEGIN
CREATE TABLE [dbo].[ProductPositionLog] (
[LogId] int IDENTITY(1, 1) NOT NULL PRIMARY KEY,
[ProductId] int NULL,
[FromPositionId] int NULL,
[ToPositionId] int NULL,
[Date] datetime NULL,
[Quantity] int NULL
)
END;
GO

SET IDENTITY_INSERT [ProductPositionLog] ON

INSERT INTO [ProductPositionLog] ([LogId], [ProductId], [FromPositionId], [ToPositionId], [Date], [Quantity])
VALUES (1, 123, 0, 1, '2018-01-01 08:10:22', 5)
INSERT INTO [ProductPositionLog] ([LogId], [ProductId], [FromPositionId], [ToPositionId], [Date], [Quantity])
VALUES (2, 123, 0, 2, '2018-01-03 15:15:10', 9)
INSERT INTO [ProductPositionLog] ([LogId], [ProductId], [FromPositionId], [ToPositionId], [Date], [Quantity])
VALUES (3, 123, 1, 3, '2018-01-07 21:08:56', 3)
INSERT INTO [ProductPositionLog] ([LogId], [ProductId], [FromPositionId], [ToPositionId], [Date], [Quantity])
VALUES (4, 123, 3, 0, '2018-02-09 10:03:23', 2)
INSERT INTO [ProductPositionLog] ([LogId], [ProductId], [FromPositionId], [ToPositionId], [Date], [Quantity])
VALUES (5, 123, 2, 3, '2018-02-09 10:03:23', 4)
SET IDENTITY_INSERT [ProductPositionLog] OFF

GO

INSERT INTO ProductPositionLog
SELECT ProductId + 1,
  FromPositionId + CASE WHEN FromPositionId = 0 THEN 0 ELSE 1 END,
  ToPositionId + CASE WHEN ToPositionId = 0 THEN 0 ELSE 1 END,
  [Date], Quantity
FROM ProductPositionLog
GO 20

-- Henrik's original solution.
WITH t AS
(
    SELECT ToPositionId AS PositionId, SUM(Quantity) AS Quantity, ProductId 
    FROM ProductPositionLog
    GROUP BY ToPositionId, ProductId
    UNION
    SELECT FromPositionId AS PositionId, -SUM(Quantity) AS Quantity, ProductId 
    FROM ProductPositionLog
    GROUP BY FromPositionId, ProductId
)
SELECT t.ProductId, t.PositionId, SUM(t.Quantity) AS Quantity
FROM t
WHERE NOT t.PositionId = 0
GROUP BY t.ProductId, t.PositionId
HAVING SUM(t.Quantity) > 0
GO

-- Same results via unpivot
SELECT ProductId, PositionId,
  SUM(CAST(TransferType AS INT) * Quantity) AS Quantity
FROM   
   (SELECT ProductId, Quantity, FromPositionId AS [-1], ToPositionId AS [1]
   FROM ProductPositionLog) p  
  UNPIVOT  
     (PositionId FOR TransferType IN 
        ([-1], [1])
  ) AS unpvt
WHERE PositionId <> 0
GROUP BY ProductId, PositionId

As far as the checkpoints go, it seems like a reasonable idea to me. Since you say that the updates and deletes are really infrequent, I would just add a trigger on ProductPositionLog that fires on update and delete and that adjusts the checkpoint table appropriately. And just to be extra sure, I would recalculate the checkpoint and cache tables from scratch occasionally.