Can I bulk insert into an empty page-compressed table and get full compression?

@HandyD is entirely correct; I only want to highlight some other methods to get compression while inserting into a heap.

From the same document:

When a heap is configured for page-level compression, pages receive page-level compression only in the following ways:

  • Data is bulk imported with bulk optimizations enabled.
  • Data is inserted using INSERT INTO ... WITH (TABLOCK) syntax and the table does not have a nonclustered index.
  • A table is rebuilt by executing the ALTER TABLE ... REBUILD statement with the PAGE compression option.

According to this, you could leverage minimally logged bulk inserts or use INSERT INTO ... WITH (TABLOCK) to get PAGE compression without having to do rebuilds.


(a) What is going on here? And (b) is there a way to get this extra-small compressed size directly as I load the table, without having to rebuild after the data is loaded?

There are rules to get PAGE compression when inserting into a heap: add -h "TABLOCK" to your bcp command to get compression.

ROW compression works without these prerequisites and is the lightest level of compression seen in the examples below. Thanks @DenisRubashkin for pointing that out!


Testing

Example start data & BCP out command


--Tested on SQL Server 2014 SP2

CREATE TABLE dbo.CompressedHeap_Source
(
    Val        varchar(512),
    Datefield  date,
    Tinyfield  tinyint,
    Floatfield float
)
WITH (DATA_COMPRESSION = PAGE);

INSERT INTO dbo.CompressedHeap_Source (Val, Datefield, Tinyfield, Floatfield)
SELECT 'Bla', cast(getdate() as date), 1, 1.2412
FROM master..spt_values spt1
CROSS APPLY master..spt_values spt2;

--bcp TEST.dbo.CompressedHeap_Source out E:\Data\HeapData.bcp -c -T
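
As a side note, sys.partitions reports the compression setting of the heap, which is not the same thing as its pages actually being PAGE compressed. A minimal sketch to check the setting:

--data_compression_desc shows the configured setting, not the per-page state
SELECT OBJECT_NAME(p.object_id) AS table_name,
       p.data_compression_desc
FROM sys.partitions AS p
WHERE p.object_id = OBJECT_ID('dbo.CompressedHeap_Source');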

The ROW compressed and Uncompressed size

The data size is 132272 KB when doing a standard insert into the heap; this is ROW compressed, but not PAGE compressed.

The data size without any compression is ~176216 KB for the same test data.

exec sp_spaceused 'dbo.CompressedHeap_Source'

name                    rows      reserved    data        index_size  unused
CompressedHeap_Source   6365530   132296 KB   132272 KB   8 KB        16 KB
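
You can confirm this at the page level with sys.dm_db_index_physical_stats: compressed_page_count counts the pages that actually received PAGE compression, and should be 0 here. A sketch, using DETAILED mode:

SELECT index_type_desc, page_count, compressed_page_count
FROM sys.dm_db_index_physical_stats(
         DB_ID(), OBJECT_ID('dbo.CompressedHeap_Source'),
         NULL, NULL, 'DETAILED');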

INSERT INTO ... WITH TABLOCK

Inserting WITH TABLOCK gives us the PAGE compressed data size, 69480 KB.
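
The second table, dbo.CompressedHeap_Source2, is not created anywhere above; for the example to be runnable, assume it was created exactly like the source table:

CREATE TABLE dbo.CompressedHeap_Source2
(
    Val        varchar(512),
    Datefield  date,
    Tinyfield  tinyint,
    Floatfield float
)
WITH (DATA_COMPRESSION = PAGE);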

INSERT INTO dbo.CompressedHeap_Source2 WITH (TABLOCK)
    (Val, Datefield, Tinyfield, Floatfield)
SELECT 'Bla', cast(getdate() as date), 1, 1.2412
FROM master..spt_values spt1
CROSS APPLY master..spt_values spt2;
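
The same sp_spaceused check as above can be used to verify the smaller, PAGE compressed size:

exec sp_spaceused 'dbo.CompressedHeap_Source2'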

BULK INSERT

Now we create a destination heap table that is also PAGE compressed and do a BULK INSERT with TABLOCK:

CREATE TABLE dbo.CompressedHeap_Destination
(
    Val        varchar(512),
    Datefield  date,
    Tinyfield  tinyint,
    Floatfield float
)
WITH (DATA_COMPRESSION = PAGE);

BULK INSERT dbo.CompressedHeap_Destination
FROM 'E:\Data\HeapData.bcp'
WITH (TABLOCK);

The data gets page compressed and is also at 69480 KB:

name                         rows      reserved   data       index_size  unused
CompressedHeap_Destination   6365530   69512 KB   69480 KB   8 KB        24 KB

BCP IN WITH TABLOCK

You can get the same results as BULK INSERT ... WITH (TABLOCK) by using bcp in with the -h "TABLOCK" hint. This makes sense, since they do the same thing internally.

--bcp TEST.dbo.CompressedHeap_Destination2 IN E:\Data\HeapData.bcp -c -T -h "TABLOCK"

With the resulting size again being 69480 KB.

BCP IN WITHOUT TABLOCK

Using bcp without the hint to load data from the same file into a copy of the destination table results in data that is not PAGE compressed:

--bcp TEST.dbo.CompressedHeap_Destination2 IN E:\Data\HeapData.bcp -c -T 

With the data size back at 132272 KB (ROW compressed).


According to the Docs article on compression:

New pages allocated in a heap as part of DML operations do not use PAGE compression until the heap is rebuilt. Rebuild the heap by removing and reapplying compression, or by creating and removing a clustered index.

This would seem to align with what you're seeing: you're not actually getting PAGE compression on the table until you rebuild it. You could try loading the data into an uncompressed table and see if you still average 17 rows per page, or if that number decreases. If it stays the same, then you're not getting compression and the rebuild is necessary. One way to check rows per page is sketched below.
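
A sketch of the rows-per-page check against the example table from above (sys.dm_db_partition_stats exposes row and page counts for the heap):

--approximate rows per in-row data page for the heap (index_id = 0)
SELECT p.row_count,
       p.in_row_data_page_count,
       p.row_count / NULLIF(p.in_row_data_page_count, 0) AS avg_rows_per_page
FROM sys.dm_db_partition_stats AS p
WHERE p.object_id = OBJECT_ID('dbo.CompressedHeap_Destination')
  AND p.index_id = 0;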

You could also add a clustered index to your table; that should prevent the table from ending up uncompressed (or only ROW compressed) after bulk loading your data.
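
Following the quoted documentation, both rebuild options would look roughly like this (a sketch using the example table; the index name CIX_Rebuild is made up):

--Option 1: rebuild the heap, reapplying PAGE compression
ALTER TABLE dbo.CompressedHeap_Destination
    REBUILD WITH (DATA_COMPRESSION = PAGE);

--Option 2: create and then remove a clustered index
CREATE CLUSTERED INDEX CIX_Rebuild
    ON dbo.CompressedHeap_Destination (Datefield);
DROP INDEX CIX_Rebuild ON dbo.CompressedHeap_Destination;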