Improve performance of query using IN()

Given tables of the following general form:

CREATE TABLE Device 
(
    ID integer PRIMARY KEY
);

CREATE TABLE EventType
(
    ID integer PRIMARY KEY, 
    Name nvarchar(50) NOT NULL
);

CREATE TABLE [Event]
(
    ID integer PRIMARY KEY, 
    [TimeStamp] datetime NOT NULL, 
    EventTypeID integer NOT NULL REFERENCES EventType, 
    DeviceID integer NOT NULL REFERENCES Device
);

The following index is useful:

CREATE INDEX f1 
ON [Event] ([TimeStamp], EventTypeID) 
INCLUDE (DeviceID)
WHERE EventTypeID IN (2, 5, 7, 8, 9, 14);

For the query:

SELECT
  [Event].ID,
  [Event].[TimeStamp],
  EventType.Name,
  Device.ID
FROM
  [Event]
INNER JOIN EventType ON EventType.ID = [Event].EventTypeID
INNER JOIN Device ON Device.ID = [Event].DeviceID
WHERE
  [Event].[TimeStamp] BETWEEN '2011-01-28' AND '2011-01-29'
  AND Event.EventTypeID IN (2, 5, 7, 8, 9, 14);

The filter meets the AND clause requirement, the first key of the index allows a seek on [TimeStamp] for the filtered EventTypeIDs and including the DeviceID column makes the index covering (because DeviceID is required for the join to the Device table).

Finished plan

The second key of the index - EventTypeID is not strictly required (it could also be an INCLUDEd column); I have included it in the key for the reasons stated here. In general, I advise people to at least INCLUDE columns from a filtered index WHERE clause.


Based on the updated query and execution plan in the question, I agree that the more general index suggested by SSMS is likely the better choice here, unless the list of filtered EventTypeIDs is static as Aaron also mentions in his answer:

CREATE TABLE Device 
(
    ID integer PRIMARY KEY,
    Name nvarchar(50) NOT NULL UNIQUE
);

CREATE TABLE EventType
(
    ID integer PRIMARY KEY, 
    Name nvarchar(20) NOT NULL UNIQUE,
    [Description] nvarchar(100) NOT NULL
);

CREATE TABLE [Event]
(
    ID integer PRIMARY KEY, 
    PLCTimeStamp datetime NOT NULL,
    EventTypeID integer NOT NULL REFERENCES EventType, 
    DeviceID integer NOT NULL REFERENCES Device,
    IATA varchar(50) NOT NULL,
    Data1 integer NULL,
    Data2 integer NULL,
);

Suggested index (declare it unique if that is appropriate):

CREATE UNIQUE INDEX uq1
ON [Event]
    (EventTypeID, PLCTimeStamp)
INCLUDE 
    (DeviceID, IATA, Data1, Data2, ID);

Cardinality information from the execution plan (undocumented syntax, do not use in production systems):

UPDATE STATISTICS dbo.Event WITH ROWCOUNT = 4042700, PAGECOUNT = 400000;
UPDATE STATISTICS dbo.EventType WITH ROWCOUNT = 22, PAGECOUNT = 1;
UPDATE STATISTICS dbo.Device WITH ROWCOUNT = 2806, PAGECOUNT = 28;

Updated query (repeating the IN list for the EventType table helps the optimizer in this specific case):

SELECT
  Event.ID,
  Event.IATA,
  Device.Name,
  EventType.Description,
  Event.Data1,
  Event.Data2,
  Event.PLCTimeStamp,
  Event.EventTypeID
FROM
  Event
INNER JOIN EventType ON EventType.ID = Event.EventTypeID
INNER JOIN Device ON Device.ID = Event.DeviceID
WHERE
  Event.EventTypeID IN (3, 30, 40, 41, 42, 46, 49, 50)
  AND EventType.ID IN (3, 30, 40, 41, 42, 46, 49, 50)
  AND Event.PLCTimeStamp BETWEEN '2011-01-28' AND '2011-01-29'
  AND Event.IATA LIKE '%0005836217%'
ORDER BY Event.ID;

Estimated execution plan:

Second plan

The plan you get will likely be different because I am using guessed statistics. The general point is to give the optimizer as much information as you can, and provide an efficient access method (index) on the 4-million row [Event] table.


The majority of the cost is the clustered index scan, and unless this table is really wide or you don't really need all those columns in the output, I believe SQL Server that this is the optimal path in the current scenario with nothing else changed. It does use a range scan (labeled as a CI seek) to narrow down the range of rows it's interested in, but because of the output it is still going to require either a lookup or a CI scan even with the filtered index you created that is targeted at this range, and even in that case the CI scan is probably still cheapest (or at least SQL Server estimates it as such).

The execution plan does tell you that this index would be useful:

CREATE NONCLUSTERED INDEX ix_EventTypeID_PLCTimeStamp_WithIncludes
  ON [dbo].[Event] ([EventTypeID],[PLCTimeStamp])
  INCLUDE ([ID],[DeviceID],[Data1],[Data2],[IATA]);

Though depending on your data skew it might be better the other way around, e.g.:

CREATE NONCLUSTERED INDEX ix_PLCTimeStamp_EventTypeID_WithIncludes
  ON [dbo].[Event] ([PLCTimeStamp],[EventTypeID])
  INCLUDE ([ID],[DeviceID],[Data1],[Data2],[IATA]);

But I would test both to be sure which is better, if either - the difference between either of those indexes and what you have now may only be marginal (too many variables for us to know) and you have to take into account that an additional index requires extra maintenance, and this can noticeably affect your DML operations (insert/update/delete). You may also consider including the filter criteria in this index as suggested by @SQLKiwi, but only if that is the set of EventTypeID values you search for frequently. If that set changes over time, then the filtered index will only be useful for this specific query.

With such a low row count, I have to wonder how bad the performance could possibly be currently? This query returns 3 rows (but there isn't any indication of how many rows it rejected). How many rows in the table?


I just discover that SQL Server 2008 R2 actually made an index suggestion when I ran the execution plan. This suggested index makes the query run about 90% faster.

The index it suggested was the following:

CREATE NONCLUSTERED INDEX [INDEX_spBagSearch] ON [dbo].[Event] 
(
    [EventTypeID] ASC,
    [PLCTimeStamp] ASC
)
INCLUDE ( [ID],
[DeviceID],
[Data1],
[Data2],
[IATA]) WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
GO