How SQL Server caches data in memory

Test Data

CREATE TABLE dbo.Table1( A INT IDENTITY(1,1) PRIMARY KEY NOT NULL
                     ,B varchar(255),C int,D int,E  int);

INSERT  INTO dbo.Table1 WITH(TABLOCK)
(B,C,D,E)

SELECT TOP(1000000) 'Some Value ' + CAST((ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) % 400) as varchar(255))-- 400 different values
,ROW_NUMBER() OVER(ORDER BY (SELECT NULL))
,ROW_NUMBER() OVER(ORDER BY (SELECT NULL))
,ROW_NUMBER() OVER(ORDER BY (SELECT NULL))
FROM master..spt_values spt1
CROSS APPLY master..spt_values spt2;

CREATE INDEX IX_B
On dbo.Table1(B);
CREATE INDEX IX_C
On dbo.Table1(C);

Q1

When we do "select top 100 * from Table1" - whole clustered index (table) is being read from disk to memory, even if we need just 100 rows ? or just 100 rows (their data pages) are read from disk to memory ?

Only 100 rows from the clustered index are read. As a result only their data pages are cached into memory.

Example

In my case it reads downward from the clustered index, 1 - 100 values for the column A.

SET STATISTICS IO, TIME ON;
select top 100 * from dbo.Table1

The reads:

Table 'Table1'. Scan count 1, logical reads 4, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

4 logical reads = 4 pages

Why are we still seeing a scan count = 1, but only 4 logical reads you might ask?

Because it is executing a range scan:

enter image description here

enter image description here

to satisfy the top operator.


Q2

Same with nonclustered index, when we do "select top 100 * from Table1 where column B = 'some value'", whole nonclustered index + clustered index gets loaded into memory ? or just 100 rows from nonclustered index + 100 rows from clustered index ?

Example 1

SET STATISTICS IO, TIME ON;
select top 100 * from dbo.Table1
WHERE B='Some Value 200'

You might expect for the nonclustered Index to be used here, but in fact the clustered index is still used:

enter image description here

With after clearing the cache 287 logical reads & 2528 read ahead reads.

Table 'Table1'. Scan count 1, logical reads 287, physical reads 1, read-ahead reads 2528, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

We are caching these read ahead reads after reading them from disk to then read 287 pages from memory.

The read-ahead mechanism is SQL Server’s feature which brings data pages into the buffer cache even before the data is requested by the query. Source

If we checked the cached pages:

cached_pages_count  objectname  indexname   indexid
2536    Table1  PK__Table1__3BD019AE039DA497    1

We see this also shown.

So in this case we are caching some more pages to satisfy the query faster, since we are reading more rows to apply a residual predicate on them:

enter image description here

But we are only reading these pages from the clustered index.

Example 2

You could change the query in the hope of utilizing the nonclustered index:

select top 100 B from dbo.Table1
WHERE B='Some Value 200';

Which it does:

enter image description here

This again gives a small range scan:

Table 'Table1'. Scan count 1, logical reads 4, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Only the nonclustered index pages are loaded into memory here.

Example 3

We might force a key lookup to satisfy query from example 1 by using an index hint:

SET STATISTICS IO, TIME ON;
select top 100 * 
from dbo.Table1
WITH(INDEX(IX_B))
WHERE B='Some Value 200';

This again results in some read ahead reads and more logical reads than example1:

Table 'Table1'. Scan count 1, logical reads 684, physical reads 3, read-ahead reads 465, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Since we are doing a key lookup from the nonclustered to the clustered index we will be caching pages from both indexes:

enter image description here


even if we need just 100 rows ? or just 100 rows (their data pages) are read from disk to memory ?

This could be due to Logical Processing order of SQL query as the TOP clause would be considered at last in the logical sequence. for more details... So it would read less pages when filter applied in WHERE clause to return only 100 rows not by applying TOP 100.