Why does ALTER COLUMN to NOT NULL cause massive log file growth?

When you change a column to NOT NULL, SQL Server has to touch every single page, even if there are no NULL values. Depending on your fill factor this could actually lead to a lot of page splits. Every page that is touched, of course, has to be logged, and I suspect due to the splits that two changes may have to be logged for many pages. Since it's all done in a single pass, though, the log has to account for all of the changes so that, if you hit cancel, it knows exactly what to undo.


An example. Simple table:

DROP TABLE dbo.floob;
GO

CREATE TABLE dbo.floob
(
  id INT IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED, 
  bar INT NULL
);

INSERT dbo.floob(bar) SELECT NULL UNION ALL SELECT 4 UNION ALL SELECT NULL;

ALTER TABLE dbo.floob ADD CONSTRAINT df DEFAULT(0) FOR bar

Now, let's look at the page details. First we need to find out what page and DB_ID we're dealing with. In my case I created a database called foo, and the DB_ID happened to be 5.

DBCC TRACEON(3604, -1);
DBCC IND('foo', 'dbo.floob', 1);
SELECT DB_ID();

The output indicated that I was interested in page 159 (the only row in DBCC IND output with PageType = 1).

Now, let's look some select page details as we step through the OP's scenario.

DBCC PAGE(5, 1, 159, 3);

enter image description here

UPDATE dbo.floob SET bar = 0 WHERE bar IS NULL;    
DBCC PAGE(5, 1, 159, 3);

enter image description here

ALTER TABLE dbo.floob ALTER COLUMN bar INT NOT NULL;
DBCC PAGE(5, 1, 159, 3);

enter image description here

Now, I don't have all the answers to this, as I am not a deep internals guy. But it's clear that - while both the update operation and the addition of the NOT NULL constraint undeniably write to the page - the latter does so in an entirely different way. It seems to actually change the structure of the record, rather than just fiddle with bits, by swapping out the nullable column for a non-nullable column. Why it has to do that, I'm not quite sure - a good question for the storage engine team, I guess. I do believe that SQL Server 2012 handles some of these scenarios a lot better, FWIW - but I have yet to do any exhaustive testing.


When carrying out the command

ALTER COLUMN ... NOT NULL

This seems to be implemented as an Add Column, Update, Drop Column operation.

  • A new row is inserted into sys.sysrscols to represent a new column. The status bit for 128 is set indicating the column does not allow NULLs
  • An update is carried out on every row of the table setting the new columnn value to that of the old colum value. If the "before" and "after" versions of the row are exactly the same this does not cause any thing to be written to the transaction log otherwise the update is logged.
  • The original column is marked as dropped (this is a metadata only change in sys.sysrscols. rscolid updated to a large integer and status bit 2 set on to indicated dropped)
  • The entry in sys.sysrscols for the new column is altered to give it the rscolid of the old column.

The operation which has the potential to cause lots of logging is the UPDATE of all rows in the table however that does not mean that this will always occur. If the "before" and "after" images of the row are identical then this will be treated as a non updating update and not be logged from my testing so far.

So the explanation as to why you are getting lots of logging will depend upon why exactly the "before" and "after" versions of the row are not the same.

For variable length columns stored in the FixedVar format I found that setting to NOT NULL always causes a change in the row that needs to be logged. The column count and the variable length column count both are incremented and the new column is added to the end of the variable length section duplicating the data.

datetimeoffset(0) is fixed length however and for fixed length columns stored in the FixedVar format the old and new columns both seem to be given the same slot in the fixed length data portion of the row and as they both have the same length and value the "before" and "after" versions of the row are the same. This can be seen in @Aaron's answer. Both versions of the row before and after the ALTER TABLE dbo.floob ALTER COLUMN bar INT NOT NULL; are

0x10000c00 01000000 00000000 020000

This is not logged.

Logically from my description of events the row ought in fact to be different here as the column count 02 should be increased to 03 but no such change actually happens in practice.

Some possible reasons as to why this can occur in a fixed length column are

  • If the column was originally declared as SPARSE then the new column would be stored in a different part of the row from the original causing the before and after row images to be different.
  • If you are using any of the compression options then the before and after versions of the row will be different as the column count section in the CD array is incremented.
  • On databases with one of the snapshot isolation options enabled then the versioning information in each row is updated (@SQL Kiwi points out that this can also occur in databases without SI enabled as described here).
  • There may be some previous ALTER TABLE operation that was implemented as a metadata only change and has not yet been applied to the row. For example if a new nullable variable length column was added then this is originally applied as a metadata only change and it is only actually written out to the rows when they are next updated (the writing that actually occurs in this last instance is just updates to the column count section and the NULL_BITMAP as a NULL varchar column at the end of the row doesn't take up any space)

I faced the same problem regarding a table having 200.000.000 rows. Initially I added the column nullable, then updated all the rows, and finally altered the column to NOT NULL via an ALTER TABLE ALTER COLUMN statement. This resulted in two huge transactions blowing up the logfile incredibly (170 GB growth).

The fastest way I found was the following:

  1. Add the column using a default value

    ALTER TABLE table1 ADD column1 INT NOT NULL DEFAULT (1)
    
  2. Drop the default constraint by using dynamic SQL as the constraint has not been named before:

    DECLARE 
        @constraint_name SYSNAME,
        @stmt NVARCHAR(510);
    
    SELECT @CONSTRAINT_NAME = DC.NAME
    FROM SYS.DEFAULT_CONSTRAINTS DC
    INNER JOIN SYS.COLUMNS C
        ON DC.PARENT_OBJECT_ID = C.OBJECT_ID
        AND DC.PARENT_COLUMN_ID = C.COLUMN_ID
    WHERE
        PARENT_OBJECT_ID = OBJECT_ID('table1')
        AND C.NAME = 'column1';
    

Execution time went down from > 30 minutes to 10 minutes, including replicating the changes via Transactional Replication. I am running a SQL Server 2008 installation (SP2).