Bulk Data Loading and the Transaction Log

When you see CHECKPOINT as the log_reuse_wait_desc for that database, it means no checkpoint has occurred since the log was last truncated, so the log cannot be truncated again until one runs.

You can alleviate this by issuing a manual CHECKPOINT against that database.
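A minimal sketch of checking the wait reason and forcing a checkpoint; YourDatabase is a placeholder name:

    -- See why log reuse is being delayed for each database
    SELECT name, log_reuse_wait_desc
    FROM sys.databases;

    -- If the wait reason is CHECKPOINT, force one in the affected database
    USE YourDatabase;   -- placeholder
    CHECKPOINT;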

Supporting references:
Factors That Can Delay Log Truncation
Checkpoints and the Active Portion of the Log


My first comment is that you are doing an ELT (Extract, Load, Transform) rather than an ETL (Extract, Transform, Load). While ELTs leverage set-based relational advantages and can be very fast, they are sometimes very write intensive (hard on storage), and specifically hard on the t-log, because the transform is done on disk (typically as an UPDATE or INSERT). I prefer ETL when possible, as the transform is done in the buffer and, when done correctly, requires minimal t-log writes. Buffer memory is cheap; fast storage is not. For some bulk operations, the t-log is a non-value-adding bottleneck.
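To make the t-log difference concrete, here is a rough sketch of the two shapes in T-SQL; the table and file names are made up:

    -- ELT shape: load the raw data, then transform it in place.
    -- The UPDATE is always fully logged, so the t-log takes the hit.
    BULK INSERT dbo.StageRaw FROM 'C:\load\raw.dat' WITH (TABLOCK);
    UPDATE dbo.StageRaw SET SomeCol = UPPER(SomeCol);   -- fully logged transform

    -- ETL shape: the file is already transformed upstream (SSIS, .NET, etc.),
    -- so the load itself is the only write and it can be minimally logged.
    BULK INSERT dbo.StageClean FROM 'C:\load\clean.dat' WITH (TABLOCK);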

Here are a few things you're doing that I wouldn't recommend.

  1. Bulk loading to tempdb. I'd recommend doing the bulk load into a real staging table in the destination database. Then you can size your files accordingly and not worry about impacting tempdb.
  2. Bundling independent procedures together. Split these two procedures up. The bulk load and the merge are independent of each other, and splitting them into individual procedures makes them more modular and unit-testable. A rough sketch of both points follows this list.
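A rough sketch of what that structure might look like; all object names (StageOrders, usp_LoadStage, usp_MergeStage) and columns are hypothetical:

    -- Permanent staging table in the destination database, not tempdb,
    -- so its file growth is sized and managed with the rest of the database.
    CREATE TABLE dbo.StageOrders
    (
        OrderID   int           NOT NULL,
        OrderDate datetime2     NOT NULL,
        Amount    decimal(18,2) NOT NULL,
        CONSTRAINT PK_StageOrders PRIMARY KEY CLUSTERED (OrderID)
    );
    GO

    -- One procedure per independent step: easier to test and schedule separately.
    CREATE PROCEDURE dbo.usp_LoadStage
    AS
    BEGIN
        -- bulk load into dbo.StageOrders goes here
        RETURN;
    END
    GO

    CREATE PROCEDURE dbo.usp_MergeStage
    AS
    BEGIN
        -- merge from dbo.StageOrders into the destination table goes here
        RETURN;
    END
    GO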

It looks like you have the minimal logging rules covered pretty well: you're loading into an empty B-tree with no nonclustered indexes, using trace flag 610, with the ordering key specified, under the bulk-logged recovery model. Outside of the temp table, everything looks OK here. As long as the file is actually ordered by the key, you should be good. Which log is actually filling up: tempdb's or the user database's?
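For reference, a sketch of those prerequisites pulled together in one place; the database, table, and file names are placeholders:

    -- Bulk-logged (or simple) recovery model is required for minimally logged bulk loads
    ALTER DATABASE YourDatabase SET RECOVERY BULK_LOGGED;

    -- Trace flag 610 extends minimal logging to inserts into indexed tables that already hold data
    DBCC TRACEON (610, -1);

    -- Target is an empty clustered index with no nonclustered indexes;
    -- TABLOCK plus an ORDER hint matching the clustered key keeps the load minimally logged
    TRUNCATE TABLE dbo.StageOrders;
    BULK INSERT dbo.StageOrders
    FROM 'C:\load\orders.dat'
    WITH (TABLOCK, ORDER (OrderID ASC));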

On the merge statement:

UPDATEs will always be fully logged. Are you changing a pretty significant portion of your table? If so, you might consider doing the merge in memory (an SSIS data flow task or .NET) and then bulk loading the result into a new table. This is more work, but most of it is done in the buffer and minimal t-log is used. A minimally logged insert can be faster than a fully logged update when the proportion of change is significant.
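One purely T-SQL variation on that idea, just to show the shape: write the merged result into a new table with a minimally logged insert and then swap it in. The tables (Orders, Orders_New, StageOrders) are hypothetical, and Orders_New is assumed to be pre-created and empty:

    -- Build the merged result as a minimally logged insert into the new table
    -- (bulk-logged recovery, TABLOCK, trace flag 610 as above).
    INSERT INTO dbo.Orders_New WITH (TABLOCK)
    SELECT COALESCE(s.OrderID,   o.OrderID)   AS OrderID,
           COALESCE(s.OrderDate, o.OrderDate) AS OrderDate,
           COALESCE(s.Amount,    o.Amount)    AS Amount
    FROM dbo.Orders AS o
    FULL OUTER JOIN dbo.StageOrders AS s
        ON s.OrderID = o.OrderID;            -- staging rows win where they match

    -- Swap the new table in under the old name
    BEGIN TRAN;
        EXEC sp_rename 'dbo.Orders', 'Orders_Old';
        EXEC sp_rename 'dbo.Orders_New', 'Orders';
    COMMIT;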

Since you're using trace flag 610, the insert side can be minimally logged when you use a TABLOCK hint. See here for more info on MERGE with TABLOCK: http://blogs.msdn.com/b/sqlserverstorageengine/archive/2010/06/03/minimal-logging-and-merge-statement.aspx Note that the update side will still be fully logged if you go this route.
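Roughly what that hint looks like on the statement; the tables and columns are the same hypothetical ones as above:

    MERGE INTO dbo.Orders WITH (TABLOCK) AS tgt     -- TABLOCK lets the insert side minimally log under TF 610
    USING dbo.StageOrders AS src
        ON src.OrderID = tgt.OrderID
    WHEN MATCHED THEN
        UPDATE SET tgt.OrderDate = src.OrderDate,   -- still fully logged
                   tgt.Amount    = src.Amount
    WHEN NOT MATCHED BY TARGET THEN
        INSERT (OrderID, OrderDate, Amount)
        VALUES (src.OrderID, src.OrderDate, src.Amount);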