Do "empty" updates create equal amounts of transaction logging?

My seniors have concluded that we cannot use the MERGE or UPDATE statements where all columns are processed in the same statement, since it creates excessive logging.

Well, that's nice of them to conclude that. But, have they provided any evidence, or their test script(s), showing this behavior? I would be interested in seeing such a test ;-)

The argument for this is that when you perform an UPDATE statement in SQL Server and set a column value where the new value equals the old value, it is still marked as an update in the transaction log.

This is one of those cases where a little bit of knowledge is misleading. Yes, updating a column to the exact same value is considered an update, just as testing for columns being updated via the UPDATE() function will return true as long as the column is in the SET clause, regardless of whether the value changes.

BUT, the missing pieces are:

  1. If rows are matched but none of the columns are changing in value (i.e. "Row(s) affected" > 0, yet nothing actually changed), then those rows are not actually updated. And if no rows are updated at all, then the only Transaction Log activity is two records: a LOP_BEGIN_XACT to mark the beginning of the Transaction, and a LOP_COMMIT_XACT to mark the end of the Transaction. No data pages or index pages are modified.

  2. If all rows are filtered out such that no rows are updated (i.e. "Row(s) affected" = 0), then there is no Tran Log activity.

  3. If any of the columns are changing in value, then setting additional columns to their existing values looks the same in the Transaction Log as leaving those unchanged columns out of the SET clause.

  4. Every query (unless grouped with others in an explicit Transaction) is its own Transaction, and every Transaction in the Transaction Log has, at bare minimum, the 2 entries: one for the BEGIN, and one for either COMMIT or ABORT.
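
The four points above can be checked with a small experiment. What follows is only a sketch under stated assumptions: dbo.LogTest and its columns are hypothetical names made up for illustration, it is meant for a throwaway test database, and sys.fn_dblog is an undocumented function, so its column list may differ between versions.

```sql
-- Scratch table for the experiment; dbo.LogTest is a hypothetical name.
-- Run this in a throwaway test database, not in production.
CREATE TABLE dbo.LogTest
(
    id         INT          NOT NULL PRIMARY KEY,
    first_name NVARCHAR(50) NOT NULL,
    last_name  NVARCHAR(50) NOT NULL
);
INSERT INTO dbo.LogTest (id, first_name, last_name)
VALUES (1, N'Bob', N'Smith'), (2, N'Sally', N'Jones');
GO

-- Run ONE of the UPDATE statements below, then run the final SELECT,
-- and repeat for the next statement.

-- Point 1: rows are matched ("Row(s) affected" > 0) but no value changes.
UPDATE dbo.LogTest SET first_name = first_name;

-- Point 2: the WHERE clause filters out every row ("Row(s) affected" = 0).
UPDATE dbo.LogTest SET first_name = N'Rob' WHERE id = -1;

-- Point 3a: one value changes, and an unchanged column is also in the SET clause.
UPDATE dbo.LogTest SET first_name = N'Rob', last_name = last_name WHERE id = 1;

-- Point 3b: one value changes, and only that column is in the SET clause.
UPDATE dbo.LogTest SET first_name = N'Bob' WHERE id = 1;
GO

-- Show the newest log records first.
SELECT TOP (20)
         tl.[Current LSN],
         tl.[Operation],
         tl.[Context],
         tl.[Transaction ID],
         tl.[AllocUnitName]
FROM     sys.fn_dblog(NULL, NULL) tl
ORDER BY tl.[Current LSN] DESC;
```

Comparing the [Operation] values you get after each statement against points 1 through 3 is more convincing than taking anyone's word for it, mine included.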

Ergo:

  • Your two options of "Example 1" and "Example 2" are nearly the same as far as the Tran Log is concerned. If there is at least one row to update then they should be the same. But if there are no rows where any columns are changing in value, then "Example 1" (with the WHERE clause) will result in less Tran Log activity since there will be no entries, whereas in "Example 2" (all rows "updated") there will be the BEGIN and COMMIT entries. So, I would recommend using the WHERE clause as it makes your intentions explicit, and will result in slightly less Tran Log activity.

  • Following the advice of your "seniors" is guaranteed to result in more Tran Log activity, not to mention decreased performance. Why? Because:

    • In some cases the same row will be updated twice, once by each statement, if both first name and last name have changed. Even if you wrap both UPDATE statements into a single Explicit Transaction to reduce inconsistency as well as the extra BEGIN / COMMIT log entries, you will still be updating the row multiple times in some cases, and each modification is logged.
    • Even though the data rows are in memory already, it still takes more time to rescan them for each UPDATE statement.
  • It is always better to know for sure and to see it for yourself rather than rely on conjecture or what someone else claims. To that end, you should test your various options, including the two separate updates suggested by your seniors, and after each test, check via:

    SELECT   *
    FROM     sys.fn_dblog(NULL, NULL) tl
    ORDER BY tl.[Transaction ID] DESC; -- most recent first
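
As a rough sketch of such a test, the script below compares a single UPDATE against the two separate UPDATEs. The tables dbo.T and dbo.S and their contents are hypothetical stand-ins for your [T] and [S], and the summary query assumes that sys.fn_dblog (which is undocumented) exposes [Log Record Length] and [Current LSN] on your version.

```sql
-- Hypothetical target (T) and source (S) tables; use a throwaway test database.
CREATE TABLE dbo.T (id INT NOT NULL PRIMARY KEY,
                    first_name NVARCHAR(50) NOT NULL,
                    last_name  NVARCHAR(50) NOT NULL);
CREATE TABLE dbo.S (id INT NOT NULL PRIMARY KEY,
                    first_name NVARCHAR(50) NOT NULL,
                    last_name  NVARCHAR(50) NOT NULL);
INSERT INTO dbo.T (id, first_name, last_name)
VALUES (1, N'Bob', N'Smith'), (2, N'Sally', N'Jones');
-- Row 1 differs in both columns; row 2 is identical to the target.
INSERT INTO dbo.S (id, first_name, last_name)
VALUES (1, N'Rob', N'Smyth'), (2, N'Sally', N'Jones');
GO

-- Option A: one UPDATE that sets both columns.
UPDATE T
SET    T.first_name = S.first_name,
       T.last_name  = S.last_name
FROM   dbo.T AS T
INNER JOIN dbo.S AS S ON S.id = T.id
WHERE  T.first_name <> S.first_name
OR     T.last_name  <> S.last_name;
GO

-- Put row 1 back so that Option B starts from the same data.
UPDATE dbo.T SET first_name = N'Bob', last_name = N'Smith' WHERE id = 1;
GO

-- Option B: one UPDATE per column, wrapped in an explicit transaction.
BEGIN TRANSACTION;
UPDATE T SET T.first_name = S.first_name
FROM   dbo.T AS T INNER JOIN dbo.S AS S ON S.id = T.id
WHERE  T.first_name <> S.first_name;
UPDATE T SET T.last_name = S.last_name
FROM   dbo.T AS T INNER JOIN dbo.S AS S ON S.id = T.id
WHERE  T.last_name <> S.last_name;
COMMIT TRANSACTION;
GO

-- One summary row per transaction, newest first.
SELECT   tl.[Transaction ID],
         COUNT(*)                     AS log_records,
         SUM(tl.[Log Record Length]) AS log_bytes
FROM     sys.fn_dblog(NULL, NULL) tl
GROUP BY tl.[Transaction ID]
ORDER BY MAX(tl.[Current LSN]) DESC;
```

The first row of the summary should belong to Option B, the second to the reset, and the third to Option A, so the two options can be compared side by side.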
    

P.S. I did my initial testing on SQL Server 2012 (SP3) Developer Edition. I then tested again on SQL Server 2016 (RTM) Express Edition and the behavior was the same.

P.P.S. Logically, [T].[first_name] = ISNULL(NULLIF([S].[first_name], [T].[first_name]), [T].[first_name]) is no different than [T].[first_name] = [S].[first_name]; it's just wrapped in more functions. The one exception is a NULL in [S].[first_name]: the wrapped form keeps the existing [T] value, whereas the plain assignment would set the column to NULL. But if both columns are 'A', then updating that with an 'A' from the same table as opposed to an 'A' from the other table is the exact same operation.
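
To see what the ISNULL(NULLIF(...)) wrapper actually returns, here is a small stand-alone query. The column names S_val and T_val are made up for illustration and simply stand in for [S].[first_name] and [T].[first_name].

```sql
-- Compare the wrapped expression with the plain source value.
SELECT v.S_val,
       v.T_val,
       ISNULL(NULLIF(v.S_val, v.T_val), v.T_val) AS wrapped,
       v.S_val                                   AS plain
FROM   (VALUES (N'A',  N'A'),   -- same value:      wrapped = 'A', plain = 'A'
               (N'B',  N'A'),   -- different value: wrapped = 'B', plain = 'B'
               (NULL,  N'A')    -- NULL source:     wrapped = 'A', plain = NULL
       ) AS v (S_val, T_val);
```

The two forms agree whenever the source value is not NULL, which is the only case where the wrapper changes the outcome.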

P.P.P.S. When checking for any differences in string columns, you really need to use a binary Collation. Otherwise, changes that are only in case (or width, or combining characters, etc.) can be missed, because the column's Collation will compare the two values as being the same. I do realize that you mentioned those were simplified examples, but I am just making sure that this aspect is not overlooked :-). Hence:

    WHERE [T].[first_name] <> [S].[first_name] OR [T].[last_name] <> [S].[last_name]

becomes:

    WHERE [T].[first_name] <> [S].[first_name] COLLATE Latin1_General_100_BIN2
    OR    [T].[last_name] <> [S].[last_name] COLLATE Latin1_General_100_BIN2

And:

    ISNULL(NULLIF([S].[first_name], [T].[first_name]), [T].[first_name]),
    ISNULL(NULLIF([S].[last_name], [T].[last_name]), [T].[last_name])

becomes:

    ISNULL(NULLIF([S].[first_name] COLLATE Latin1_General_100_BIN2, [T].[first_name]), [T].[first_name]),
    ISNULL(NULLIF([S].[last_name] COLLATE Latin1_General_100_BIN2, [T].[last_name]), [T].[last_name])
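
To illustrate why the binary Collation matters, here is a minimal sketch using a table variable. The names @names, S_val, and T_val are hypothetical, and the case-insensitive Collation Latin1_General_100_CI_AS is only an assumption standing in for whatever Collation your columns actually use.

```sql
-- Both columns use a case-insensitive Collation, as many real columns do.
DECLARE @names TABLE
(
    S_val NVARCHAR(50) COLLATE Latin1_General_100_CI_AS NOT NULL,
    T_val NVARCHAR(50) COLLATE Latin1_General_100_CI_AS NOT NULL
);
INSERT INTO @names (S_val, T_val)
VALUES (N'smith', N'Smith'),  -- differs only in case
       (N'Smith', N'Smith'),  -- identical
       (N'Smyth', N'Smith');  -- differs in a letter

SELECT n.S_val,
       n.T_val,
       -- Uses the columns' own (case-insensitive) Collation.
       CASE WHEN n.T_val <> n.S_val
            THEN 'different' ELSE 'same' END AS column_collation,
       -- Forces a binary comparison, so case-only changes are detected.
       CASE WHEN n.T_val <> n.S_val COLLATE Latin1_General_100_BIN2
            THEN 'different' ELSE 'same' END AS binary_collation
FROM   @names AS n;
-- 'smith' vs 'Smith': column_collation = 'same',      binary_collation = 'different'
-- 'Smith' vs 'Smith': column_collation = 'same',      binary_collation = 'same'
-- 'Smyth' vs 'Smith': column_collation = 'different', binary_collation = 'different'
```

Only the first row is affected, and that is exactly the kind of change that would silently be skipped without the COLLATE clause.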