LOB_DATA, slow table scans, and some I/O questions

Presence of XML field causes most of the table data to be located on LOB_DATA pages (in fact ~90% of table pages are LOB_DATA).

Merely having the XML column in the table does not have that effect. It is the presence of XML data that, under certain conditions, causes some portion of a row's data to be stored off row, on LOB_DATA pages. And while one (or maybe several ;-) might argue that duh, the XML column implies that there will indeed be XML data, it is not guaranteed that the XML data will need to be stored off row: unless the row is pretty much already filled outside of there being any XML data, small documents (up to 8000 bytes) might fit in-row and never go to a LOB_DATA page.

am I correct in thinking that LOB_DATA pages can cause slow scans not only because of their size, but also because SQL Server can't scan the clustered index effectively when there's a lot of LOB_DATA pages in the table?

Scanning refers to looking at all of the rows. Of course, when a data page is read, all of the in-row data is read, even if you selected a subset of the columns. The difference with LOB data is that if you don't select that column, then the off-row data won't be read. Hence it is not really fair to draw a conclusion about how efficiently SQL Server can scan this Clustered Index since you didn't exactly test that (or you tested half of it). You selected all columns, which includes the XML column, and as you mentioned, that is where most of the data is located.

So we already know that the SELECT TOP 1000 * test wasn't merely reading a series of 8k data pages, all in a row, but instead jumping to other locations per each row. The exact structure of that LOB data can vary based on how large it is. Based on research shown here ( What is the Size of the LOB Pointer for (MAX) Types Like Varchar, Varbinary, Etc? ), there are two types of off-row LOB allocations:

  1. Inline Root -- for data between 8001 and 40,000 (really 42,000) bytes, space permitting, there will be 1 to 5 pointers (24 - 72 bytes) IN ROW that point directly to the LOB page(s).
  2. TEXT_TREE -- for data over 42,000 bytes, or if the 1 to 5 pointers can't fit in-row, then there will be just a 24 byte pointer to the starting page of a list of pointers to the LOB pages (i.e. the "text_tree" page).

One of these two situations is occurring each time you retrieve LOB data that is over 8000 bytes or just didn't fit in-row. I posted a test script on PasteBin.com ( T-SQL script to test LOB allocations and reads ) that shows the 3 types of LOB allocations (based on the size of the data) as well as the effect each of those has on logical and physical reads. In your case, if the XML data really is less than 42,000 bytes per row, then none of it (or very little of it) should be in the least-efficient TEXT_TREE structure.

If you wanted to test how quickly SQL Server can scan that Clustered Index, do the SELECT TOP 1000 but specify one or more columns not including that XML column. How does that affect your results? It should be quite a bit faster.
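
For example, a minimal version of that test might look like the following (the table and column names here are placeholders, since we don't have your actual schema):

DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS; -- start from a cold Buffer Pool, as in your original test

SET STATISTICS IO, TIME ON;
-- any columns EXCEPT the XML column
SELECT TOP 1000 tab.ID, tab.Col2, tab.Col3
FROM   SchemaName.TableName tab;
SET STATISTICS IO, TIME OFF;

Assuming no other LOB columns are selected, the "lob logical reads" / "lob physical reads" counters should drop to 0, and the elapsed time should tell you how much of the 20 - 25 seconds is really about the in-row data.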

is it considered reasonable to have such a table structure/data pattern?

Given that we have an incomplete description of the actual table structure and data pattern, any answer might not be optimal depending on what those missing details are. With that in mind, I would say that there is nothing obviously unreasonable about your table structure or data pattern.

I can (in a C# app) compress XML from 20KB to ~2.5KB and store it in a VARBINARY column, preventing usage of LOB data pages. This speeds up SELECTs ~20x in my tests.

That made selecting all columns, or even just the XML data (now in VARBINARY), faster, but it hurts queries that don't select the "XML" data. Assuming you have about 50 bytes in the other columns and have a FILLFACTOR of 100, then:

  • No Compression: 15k of XML data should require 2 LOB_DATA pages, which then requires 2 pointers for the Inline Root. The first pointer is 24 bytes and the second is 12, for a total of 36 bytes stored in-row for the XML data. The total row size is 86 bytes (50 + 36), and you can fit about 93 of those rows onto an 8060-byte data page. Hence, 1 million rows requires 10,753 data pages.

  • Custom Compression: 2.5k of VARBINARY data will fit in-row. The total row size is 2610 bytes (2.5 * 1024 = 2560 bytes of compressed data, plus the 50 bytes of other columns), and you can fit only 3 of those rows onto an 8060-byte data page. Hence, 1 million rows requires 333,334 data pages.

Ergo, implementing custom compression results in a 30x increase in data pages for the Clustered Index. Meaning, all queries using a Clustered Index scan now have about 322,500 more data pages to read. Please see detailed section below for additional ramifications of doing this type of compression.
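
If you want to sanity-check that math against the real table, the split between in-row and LOB storage can be seen in the catalog views. This is just a sketch (SchemaName.TableName is a placeholder):

SELECT p.index_id,
       au.type_desc,     -- IN_ROW_DATA, LOB_DATA, or ROW_OVERFLOW_DATA
       au.total_pages,
       au.used_pages,
       au.data_pages
FROM   sys.partitions p
INNER JOIN sys.allocation_units au
        ON au.container_id = CASE au.[type]
                                  WHEN 2 THEN p.partition_id -- LOB_DATA
                                  ELSE p.hobt_id             -- IN_ROW / ROW_OVERFLOW
                             END
WHERE  p.[object_id] = OBJECT_ID(N'SchemaName.TableName');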

I would caution against doing any refactoring based on the performance of SELECT TOP 1000 *. That is not likely to be a query that the application will even issue, and should not be used as the sole basis for potentially needless optimization(s).

For more detailed info and more tests to try, please see the section below.


This Question cannot be given a definitive answer, but we can at least make some progress and suggest additional research to help move us closer to figuring out the exact issue (ideally based on evidence).

What we know:

  1. Table has approximately 1 million rows
  2. Table size is approximately 15 GB
  3. Table contains one XML column and several other columns of types: INT, BIGINT, UNIQUEIDENTIFIER, "etc"
  4. XML column "size" is, on average, approximately 15k
  5. After running DBCC DROPCLEANBUFFERS, it takes 20 - 25 seconds for the following query to complete: SELECT TOP 1000 * FROM TABLE
  6. The Clustered Index is being scanned
  7. Fragmentation on the Clustered Index is close to 0%

What we think we know:

  1. No other disk activity outside of these queries. Are you sure? Even if there are no other user queries, are there background operations taking place? Are there processes external to SQL Server running on the same machine that could be taking up some of the IO? There might not be, but it isn't clear based solely on the info provided.
  2. 15 MB of XML data is being returned. What is this number based on? An estimation derived from the 1000 rows times the average of 15k of XML data per row? Or a programmatic aggregation of what was received for that query? If it is just an estimation, I wouldn't rely upon it since the distribution of the XML data might not be even in the way that is implied by a simple average.
  3. XML Compression might help. How exactly would you do the compression in .NET? Via the GZipStream or DeflateStream classes? This is not a zero-cost option. It will certainly compress some of the data by a large percentage, but it will also require more CPU as you will need an additional process to compress / decompress the data each time. This plan would also completely remove your ability to:

    • query the XML data via the .nodes, .value, .query, and .modify XML functions.
    • index the XML data.

      Please keep in mind (since you mentioned that XML is "highly redundant") that the XML datatype is already optimized in that it stores the element and attribute names in a dictionary, assigning an integer index ID to each item, and then using that integer ID throughout the document (hence it does not repeat the full name per each usage, nor does it repeat it again as a closing tag for elements). The actual data also has extraneous white space removed. This is why extracted XML documents don't retain their original structure and why empty elements extract as <element /> even if they went in as <element></element>. So any gains from compressing via GZip (or anything else) will only be found by compressing the element and/or attribute values, which is a much smaller surface area for improvement than most would expect, and most likely not worth the loss of capabilities noted directly above (a quick way to see just how much room is left for compression is sketched below).

      Please also keep in mind that compressing the XML data and storing the VARBINARY(MAX) result won't eliminate the LOB access; it will just reduce it. Depending on the size of the rest of the data in the row, the compressed value might fit in-row, or it might still require LOB pages.
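
      As a rough illustration of that smaller surface area, you can compare the size of the internal XML representation against the size of the fully serialized text (table and column names are placeholders):

      SELECT TOP 1000
             DATALENGTH(tab.XmlColumn) AS [InternalXmlBytes],   -- internal, already-optimized binary XML
             DATALENGTH(CONVERT(NVARCHAR(MAX), tab.XmlColumn))
                                       AS [SerializedTextBytes] -- serialized text (UTF-16, 2 bytes per character)
      FROM   SchemaName.TableName tab;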

That information, while helpful, is not nearly enough. There are a lot of factors that influence query performance, so we need a much more detailed picture of what is going on.

What we don't know, but need to:

  1. Why does the performance of SELECT * matter? Is this a pattern that you use in code? If so, why?
  2. What is the performance of selecting only the XML column? What are the statistics and timing if you do just: SELECT TOP 1000 XmlColumn FROM TABLE; ?
  3. How much of the 20 - 25 seconds it takes to return these 1000 rows is related to network factors (getting the data across the wire), and how much is related to client factors (rendering that approximately 15 MB plus the rest of the non-XML data into the grid in SSMS, or possibly saving to disk)?

    Factoring out these two aspects of the operation can sometimes be done by simply not returning the data. Now, one might think to select into a Temporary Table or Table Variable, but this would just introduce a few new variables (i.e. disk I/O for tempdb, Transaction Log writes, possible auto-growth of tempdb data and/or log file, needing space in the Buffer Pool, etc). All of those new factors can actually increase the query time. Instead, I typically store the columns into variables (of the appropriate datatype; not SQL_VARIANT) that get overwritten with each new row (i.e. SELECT @Column1 = tab.Column1,...). A minimal sketch of this pattern is shown just after this list.

    HOWEVER, as was pointed out by @PaulWhite in this DBA.StackExchange Q & A, Logical reads different when accessing the same LOB data, with additional research of my own posted on PasteBin ( T-SQL script to test various scenarios for LOB reads ), LOBs are not accessed consistently between SELECT, SELECT INTO, SELECT @XmlVariable = XmlColumn, SELECT @XmlVariable = XmlColumn.query(N'/'), and SELECT @NVarCharVariable = CONVERT(NVARCHAR(MAX), XmlColumn). So our options are a little more limited here, but here is what can be done:

    1. Rule out network issues by executing the query on the server running SQL Server, either in SSMS or SQLCMD.EXE.
    2. Rule out client issues in SSMS by going to Query Options -> Results -> Grid and checking the option for "Discard results after execution". Please note that this option will prevent ALL output, including messages, but can still be useful to rule out the time it takes SSMS to allocate the memory per each row and then draw it in the grid.
      Alternatively, you could execute the query via SQLCMD.EXE and direct the output to go to nowhere via: -o NUL:.
  4. Is there a Wait Type associated with this query? If yes, what is that Wait Type?
  5. What is the actual data size for the XML columns being returned? The average size of that column across the entire table doesn't really matter if the "TOP 1000" rows contain a disproportionately large portion of the total XML data. If you want to know about the TOP 1000 rows, then look at those rows. Please run the following:

    SELECT SUM(DATALENGTH(t.XmlColumn)) / 1024.0   AS [TotalXmlKBytes],
           AVG(DATALENGTH(t.XmlColumn)) / 1024.0   AS [AverageXmlKBytes],
           STDEV(DATALENGTH(t.XmlColumn)) / 1024.0 AS [StandardDeviationForXmlKBytes]
    FROM   (SELECT TOP (1000) tab.XmlColumn
            FROM   SchemaName.TableName tab
           ) t;
    
  6. The exact table schema. Please provide the full CREATE TABLE statement, including all indexes.
  7. Query plan? Is that something that you can post? That info probably won't change anything, but it is better to know that it won't than to guess that it won't and be wrong ;-)
  8. Is there physical / external fragmentation on the data file? While this might not be a large factor here, since you are using "consumer-grade SATA" and not SSD or even Super-Expensive SATA, the effect of sub-optimally ordered sectors will be more noticeable, especially as the number of those sectors that needs to be read increases.
  9. What are the exact results of the following query:

    SELECT * FROM sys.dm_db_index_physical_stats(DB_ID(),
                              OBJECT_ID(N'SchemaName.TableName'), 1, 0, N'LIMITED');
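
Regarding the variable-assignment technique mentioned in item 3 above, a minimal sketch (again with placeholder names, and keeping in mind the LOB-access caveats from the linked Q & A) would be:

DECLARE @ID   INT,
        @Col2 BIGINT,
        @Xml  XML;

-- assign every column to a variable so that nothing is sent to the client;
-- the variables are simply overwritten on each new row
SELECT TOP 1000
       @ID   = tab.ID,
       @Col2 = tab.Col2,
       @Xml  = tab.XmlColumn
FROM   SchemaName.TableName tab;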
    

UPDATE

It occurred to me that I should try to reproduce this scenario to see if I experience similar behavior. So, I created a table with several columns (similar to the vague description in the Question), and then populated it with 1 million rows, and the XML column has approximately 15k of data per row (see code below).

What I found is that doing a SELECT TOP 1000 * FROM TABLE completed in 8 seconds the first time, and 2 - 4 seconds each time thereafter (yes, executing DBCC DROPCLEANBUFFERS before each run of the SELECT * query). And my several-year-old laptop is not fast: SQL Server 2012 SP2 Developer Edition, 64 bit, 6 GB RAM, a dual-core 2.5 GHz Core i5, and a 5400 RPM SATA drive. I am also running SSMS 2014, SQL Server Express 2014, Chrome, and several other things.

Based on the response time of my system, I will repeat that we need more info (i.e. specifics about the table and data, results of the suggested tests, etc) in order to help narrow down the cause of the 20 - 25 second response time that you are seeing.

SET ANSI_NULLS, NOCOUNT ON;
GO

IF (OBJECT_ID(N'dbo.XmlReadTest') IS NOT NULL)
BEGIN
    PRINT N'Dropping table...';
    DROP TABLE dbo.XmlReadTest;
END;

PRINT N'Creating table...';
CREATE TABLE dbo.XmlReadTest 
(
    ID INT NOT NULL IDENTITY(1, 1),
    Col2 BIGINT,
    Col3 UNIQUEIDENTIFIER,
    Col4 DATETIME,
    Col5 XML,
    CONSTRAINT [PK_XmlReadTest] PRIMARY KEY CLUSTERED ([ID])
);
GO

DECLARE @MaxSets INT = 1000,
        @CurrentSet INT = 1;

WHILE (@CurrentSet <= @MaxSets)
BEGIN
    RAISERROR(N'Populating data (1000 sets of 1000 rows); Set # %d ...',
              10, 1, @CurrentSet) WITH NOWAIT;
    INSERT INTO dbo.XmlReadTest (Col2, Col3, Col4, Col5)
        SELECT  TOP 1000
                CONVERT(BIGINT, CRYPT_GEN_RANDOM(8)),
                NEWID(),
                GETDATE(),
                N'<test>'
                  + REPLICATE(CONVERT(NVARCHAR(MAX), CRYPT_GEN_RANDOM(1), 2), 3750)
                  + N'</test>'
        FROM        [master].[sys].all_columns sac1;

    IF ((@CurrentSet % 100) = 0)
    BEGIN
        RAISERROR(N'Executing CHECKPOINT ...', 10, 1) WITH NOWAIT;
        CHECKPOINT;
    END;

    SET @CurrentSet += 1;
END;

--

SELECT COUNT(*) FROM dbo.XmlReadTest; -- Verify that we have 1 million rows

-- O.P. states that the "clustered index fragmentation is close to 0%"
ALTER INDEX [PK_XmlReadTest] ON dbo.XmlReadTest REBUILD WITH (FILLFACTOR = 90);
CHECKPOINT;

--

DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS;

SET STATISTICS IO, TIME ON;
SELECT TOP 1000 * FROM dbo.XmlReadTest;
SET STATISTICS IO, TIME OFF;

/*
Scan count 1, logical reads 21,       physical reads 1,     read-ahead reads 4436,
              lob logical reads 5676, lob physical reads 1, lob read-ahead reads 3967.

 SQL Server Execution Times:
   CPU time = 171 ms,  elapsed time = 8329 ms.
*/

And, because we want to factor out the time taken to read the non-LOB pages, I ran the following query to select all but the XML column (one of the tests I suggested above). This returns in 1.5 seconds fairly consistently.

DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS;

SET STATISTICS IO, TIME ON;
SELECT TOP 1000 ID, Col2, Col3, Col4 FROM dbo.XmlReadTest;
SET STATISTICS IO, TIME OFF;

/*
Scan count 1, logical reads 21,    physical reads 1,     read-ahead reads 4436,
              lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

 SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 1666 ms.
*/

Conclusion (for the moment)
Based on my attempt to recreate your scenario, I don't think we can point to either the SATA drive or non-sequential I/O as the main cause of the 20 - 25 seconds, especially because we still don't know how fast the query returns when not including the XML column. And I was not able to reproduce the large number of Logical Reads (non-LOB) that you are showing, but I have a feeling that I need to add more data to each row in light of that and the statement of:

~90% of table pages are LOB_DATA

My table has 1 million rows, each having just over 15k of XML data, and sys.dm_db_index_physical_stats shows that there are 2 million LOB_DATA pages. If ~90% of the table's pages were LOB_DATA, then the remaining 10% would be about 222k IN_ROW data pages, yet I only have 11,630 of those. So once again, we need more info regarding the actual table schema and actual data.


am I correct in thinking that LOB_DATA pages can cause slow scans not only because of their size, but also because SQL Server can't scan the clustered index effectively

Yes, reading LOB data that is not stored in-row leads to random IO instead of sequential IO. The disk performance metric that explains why this is fast or slow is Random Read IOPS.

LOB data is stored in a tree structure where the data page in the clustered index points to a LOB Data page with a LOB root structure that in turn points to the actual LOB data. When scanning the leaf level of the clustered index, SQL Server can pick up the in-row data with sequential reads; to get the LOB data, it has to go somewhere else on disk.
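
If you want to see this structure for yourself, the (undocumented but widely used) DBCC IND command lists the pages belonging to an index; PageType 1 is an in-row data page, while 3 and 4 are LOB (text mix / text tree) pages. The database and table names below are placeholders:

DBCC IND (N'YourDatabaseName', N'dbo.TableName', 1);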

I would guess that if you switched to an SSD you would not suffer that much from this, since random IOPS for an SSD is far higher than for a spinning disk.

is it considered reasonable to have such a table structure/data pattern?

Yes, it could be. It depends on what this table is doing for you.

Usually the performance issues with XML in SQL Server happen when you want to use T-SQL to query into the XML, and even more so when you want to use values from the XML in a predicate in a WHERE clause or JOIN. If that is the case, you could have a look at property promotion, selective XML indexes, or a redesign of your table structures that shreds the XML into relational tables instead.
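
For example, a selective XML index (available from SQL Server 2012 SP1, and requiring a clustered primary key on the table) promotes just the paths you actually filter or join on into an internal relational structure. A minimal sketch with made-up names and paths:

CREATE SELECTIVE XML INDEX [SXI_TableName_XmlColumn]
ON dbo.TableName (XmlColumn)
FOR
(
    -- promote only the paths that are used in WHERE / JOIN predicates
    pathOrderId      = '/Order/OrderId'        AS SQL INT,
    pathCustomerName = '/Order/Customer/Name'  AS SQL NVARCHAR(100)
);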

I tried the compression

I did that once in a product a bit more than 10 years ago and have regretted it ever since. I really missed being able to work with the data using T-SQL, so I would not recommend it to anyone if it can be avoided.