Reducing the size of a database

If you are going to continue logging data in this database, the last thing on earth you want to do is shrink the database file (and then perform index maintenance that will require it to grow again). Never mind that these shrink and grow operations will interfere with performance, and that the end result is not going to leave you much better off than when you started.

Since the file is only going to grow again, this is a futile operation - much like drying off while you're still in the shower. What are you going to do with all that disk space you free up temporarily? Lease it out to another application until the database needs to grow again? Of course not. If the database grew to that size once, it will grow to that size again, and it is going to be much more efficient to reuse the space within the file than to ride this unnecessary shrink-grow-shrink-grow roller coaster.
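If you're not sure how much reusable free space is already sitting inside the data files, a quick check like this (a minimal sketch; run it in the database in question) shows it per file:

SELECT name,
       size / 128 AS size_mb,
       size / 128 - CAST(FILEPROPERTY(name, 'SpaceUsed') AS int) / 128 AS free_mb
  FROM sys.database_files;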

Even if you move the logging table to another database, you should do what you can to pre-allocate the data file to a size that will accommodate the amount of logging you want to keep (be it a week, a month, what have you). Keep this database trim by purging data every day, and stop worrying about shrinking & re-indexing. If you size it right, there should always be some free space, but not excessive free space. And if you need to re-index (you shouldn't, really, if your clustered index is a datetime or otherwise monotonically increasing key), do so after the purge (when you have the most free space), not after a shrink (when you have the least).
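Pre-allocating is a one-time operation. A sketch, assuming the new logging database is called LoggingDB with a logical data file named LoggingDB_data (both hypothetical names) and that your retention window fits comfortably in 50 GB:

ALTER DATABASE LoggingDB
  MODIFY FILE (NAME = N'LoggingDB_data', SIZE = 50GB);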

You can do what Mark suggests without the application ever being aware of the new database, and without changing the application or its interface to the database at all (one important change, of course, would be the removal of any foreign keys or other database-dependent features, since foreign keys can't span databases). You can simply create your table(s) in the new database, then add an INSTEAD OF INSERT trigger to the table in the current database (I assume there are no updates to the logging table, but you may also need an INSTEAD OF DELETE trigger in case you don't directly control the process that performs the purging). That will take care of writes, but you'll have to point reads elsewhere, since there is no such thing as an INSTEAD OF SELECT trigger. Another alternative would be to rename the existing table and create a synonym or even a view that points to the new table.
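A minimal sketch of the trigger approach, assuming the logging table is dbo.logs_table with columns (log_id, [datetime], xml_payload) and the new database is called LoggingDB - all hypothetical names, so adjust them to your schema:

-- in the new database: a foreign-key-free copy of the logging table
CREATE TABLE LoggingDB.dbo.logs_table
(
  log_id      int      NOT NULL,
  [datetime]  datetime NOT NULL,
  xml_payload xml      NULL
);
GO

-- in the current database: redirect inserts to the new table
CREATE TRIGGER dbo.logs_table_redirect
ON dbo.logs_table
INSTEAD OF INSERT
AS
BEGIN
  SET NOCOUNT ON;
  INSERT LoggingDB.dbo.logs_table (log_id, [datetime], xml_payload)
    SELECT log_id, [datetime], xml_payload FROM inserted;
END
GO

The synonym alternative needs a brief maintenance window but keeps reads working too, since the old name now resolves to the new table:

EXEC sp_rename N'dbo.logs_table', N'logs_table_old', N'OBJECT';
CREATE SYNONYM dbo.logs_table FOR LoggingDB.dbo.logs_table;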


If you're going to need to clean up the logs table that has grown, I would avoid a single atomic transaction like:

DELETE dbo.logs_table WHERE [datetime] < '20121201';

This will cause massive log growth and will take a long time. Instead you can break the cleanup into chunks, e.g.

BEGIN TRANSACTION;

SELECT 1; -- seed @@ROWCOUNT so the loop body runs at least once

WHILE @@ROWCOUNT > 0
BEGIN
  COMMIT TRANSACTION;

  -- if the database is in SIMPLE recovery: CHECKPOINT;
  -- otherwise: BACKUP LOG, so the log space can be reused between batches

  BEGIN TRANSACTION;

  -- keep the DELETE as the last statement in the loop so the WHILE check
  -- reads its row count
  DELETE TOP (1000) FROM dbo.logs_table WHERE [datetime] < '20121201';
END

COMMIT TRANSACTION; -- close the transaction opened for the final, empty batch

I picked 1000 and Dec. 1st arbitrarily; I don't know what fits your scenario best. The point is to keep transactions short and contained, and to prevent any long-term impact while you clean up the table. Another option I've used in the past: instead of deleting the 99% of the table that is junk, move the 1% you want to keep to a new table and drop the old one.

BEGIN TRANSACTION;

SELECT * 
  INTO dbo.new_logs_table 
  FROM dbo.logs_table
  WHERE [datetime] >= '20121201';

COMMIT TRANSACTION;

-- create indexes/constraints/triggers on dbo.new_logs_table

BEGIN TRANSACTION;
  DROP TABLE dbo.logs_table;
  EXEC sp_rename N'dbo.new_logs_table', N'logs_table', N'OBJECT';
COMMIT TRANSACTION;

If you say the logs have never been purged, then you may very well be in a scenario where the database is at a size you don't anticipate ever needing again (say, if you only ever intend to keep a week of logs at a time). In this case I might go along with a shrink operation, but only if truly necessary (e.g. you really do need the space for other purposes). A bunch of empty pages aren't going to affect your backups or other operations, and the pages will eventually be fully deallocated and reused.
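If you do decide a shrink is warranted, shrink the data file to an explicit target (leaving room for normal growth) rather than running DBCC SHRINKDATABASE. A sketch, assuming a hypothetical logical file name of YourDB_data and a 40 GB target:

-- target size is specified in MB
DBCC SHRINKFILE (N'YourDB_data', 40960);

Keep in mind that shrinking will fragment the indexes it moves, which is exactly the shrink-then-re-index cycle warned about above.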


The largest table stores complete XML documents from transactions between the app and APIs of sites like eBay.

I'm working with a system at the moment that does exactly the same. While Mat's comment about 230GB not being massive by today's standards is fair, it's still 200GB more than you need for the system to operate. It may be occupying buffer pool; it's certainly contributing to larger and longer backups; and, of primary concern, it will require a longer restore time in the event of a disaster.

Preferable (if the application code is accessible and in a fit state to tolerate change) would be to push any non-critical logging to a different database. You can then switch it to SIMPLE recovery and forgo transaction log backups. Obviously this is only suitable for data where loss of point-in-time recovery is deemed acceptable.
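Switching the recovery model is a one-line change once the logging tables live in their own database; a sketch, again with a hypothetical LoggingDB name:

ALTER DATABASE LoggingDB SET RECOVERY SIMPLE;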

In the case I'm looking at, the API request/response documents that are being logged are in effect duplicate data. The transaction details are being stored in the database elsewhere, the documents are for debug purposes only.

Alternatively, purge the data more frequently.

But... if your assessment is correct, the data of interest is 30GB, and the 32GB server is dedicated to this database, that should be adequate. I'm inclined to suggest you do some more in-depth analysis of where your problem is originating, rather than leaping straight into trying to resolve the logging/size issue prematurely.

Crack out the DMV queries. Glenn Berry's DMV Diagnostic scripts contain examples for identifying the procedures and ad hoc queries that are consuming the most IO or CPU time. sp_whoisactive is very useful for live analysis.
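As a rough sketch of the kind of query those scripts contain (not a replacement for them), this lists the top cached statements by logical reads; order by total_worker_time instead to rank by CPU:

SELECT TOP (25)
       qs.execution_count,
       qs.total_logical_reads,
       qs.total_worker_time AS total_cpu_microseconds,
       SUBSTRING(st.text, (qs.statement_start_offset / 2) + 1,
         ((CASE qs.statement_end_offset
             WHEN -1 THEN DATALENGTH(st.text)
             ELSE qs.statement_end_offset
           END - qs.statement_start_offset) / 2) + 1) AS statement_text
  FROM sys.dm_exec_query_stats AS qs
 CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
 ORDER BY qs.total_logical_reads DESC;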

If you find anything of interest, feel free to post a new question with the details.