UPDATE with JOIN on 100mm records, how to do this better? (in T-SQL)

I would take a different approach.

Instead of updating existing tables, just build a new table that has what you need in it.

This will almost certainly be faster:

SELECT DISTINCT
    AutoClassID,
    <Other fields>
INTO
    AutoDataImportStaging.dbo.Automobile
FROM
    AutoData.dbo.AutoClass

As currently written, there are a lot of logical operations happening:

  • Read all values of A.AutoClassName
  • Read all values of B.AutoClassName
  • Compare A and B values
  • Of the matching set, read all values of B.AutoClassID
  • Update existing values of A.AutoClassId to be the B.AutoClassId value through whatever indexes exist

You're trying to do this as a single (very large) transaction. Instead, do the update in smaller batches.

  • SET ROWCOUNT, but note that using it to limit UPDATE statements is deprecated as of SQL Server 2012.
  • UPDATE TOP (n)
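
A minimal sketch of the batched approach, using UPDATE TOP in a loop. It assumes that A is AutoDataImportStaging.dbo.Automobile, that B is AutoData.dbo.AutoClass, and that rows still waiting to be updated have a NULL AutoClassID. None of that is confirmed by the question, so adjust the WHERE clause to whatever marks a row as "not done yet" in your data. The batch size is only a starting point.

```sql
-- Assumption: un-updated rows in A have AutoClassID IS NULL.
-- Assumption: 1,000,000 rows per batch; tune this to your I/O and log size.
DECLARE @BatchSize int = 1000000;
DECLARE @Rows int = 1;

WHILE @Rows > 0
BEGIN
    -- Each UPDATE commits as its own transaction, so the log only
    -- has to hold one batch at a time.
    UPDATE TOP (@BatchSize) A
    SET A.AutoClassID = B.AutoClassID
    FROM AutoDataImportStaging.dbo.Automobile AS A
    INNER JOIN AutoData.dbo.AutoClass AS B
        ON A.AutoClassName = B.AutoClassName
    WHERE A.AutoClassID IS NULL
      AND B.AutoClassID IS NOT NULL;  -- guarantees every updated row leaves the filter

    SET @Rows = @@ROWCOUNT;

    -- Under the simple recovery model, a checkpoint in the current database
    -- lets the log space used by the finished batch be reused.
    CHECKPOINT;
END;
```

Which rows land in each batch is arbitrary, because UPDATE TOP does not take an ORDER BY. That is fine here, since the loop only stops once no matching rows remain. Run it from the database that holds the table being updated, because CHECKPOINT applies to the current database.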

You would also benefit from:

  • A temporary index on AutoData.dbo.AutoClass.AutoClassName
  • More RAM. Lots more RAM.
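
As a rough sketch, the temporary index could look like the following. The index name is made up for illustration, and including AutoClassID is an assumption that makes the index cover the join, so the lookup does not have to touch the base table.

```sql
-- Hypothetical index name; create it before running the update.
CREATE NONCLUSTERED INDEX IX_AutoClass_AutoClassName_Temp
    ON AutoData.dbo.AutoClass (AutoClassName)
    INCLUDE (AutoClassID);

-- ... run the update here ...

-- Drop the index afterwards if it is not needed for anything else.
DROP INDEX IX_AutoClass_AutoClassName_Temp
    ON AutoData.dbo.AutoClass;
```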

Looping down the table one row at a time will not be faster!

As suspected, and confirmed by you, this will be I/O bound. With only one disk, the reads, the writes, the transaction log and any temp work space will all be competing for the same I/O.

Under the simple recovery model the transactions are still logged, but the inactive part of the log is cleared at each checkpoint. It's possible that your initial log size and auto-growth settings are causing some I/O slowdown, because the transaction log will need to grow to accommodate the changes.
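
One way to check this, and to pre-size the log so it does not have to grow repeatedly during the update, is sketched below. It assumes the table being updated lives in AutoDataImportStaging. The logical file name and the sizes are placeholders, so take the real name from the first query and pick sizes that fit your disk.

```sql
USE AutoDataImportStaging;

-- Current size and growth settings for each file in this database.
-- size is stored in 8 KB pages; growth is in pages, or a percentage
-- when is_percent_growth = 1.
SELECT name,
       type_desc,
       CAST(size AS bigint) * 8 / 1024 AS size_mb,
       growth,
       is_percent_growth
FROM sys.database_files;

-- Hypothetical logical file name and illustrative sizes.
-- SIZE must be larger than the file's current size.
ALTER DATABASE AutoDataImportStaging
MODIFY FILE (NAME = N'AutoDataImportStaging_log', SIZE = 20GB, FILEGROWTH = 1GB);
```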

Have you tried indexing the AutoClassName field? How many different AutoClass values are there?
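
A quick way to answer the second question, assuming the lookup table is AutoData.dbo.AutoClass as named earlier in the thread:

```sql
-- Number of distinct class names in the lookup table.
SELECT COUNT(DISTINCT AutoClassName) AS DistinctAutoClassNames
FROM AutoData.dbo.AutoClass;
```

If the count is small, an index on AutoClassName in the lookup table is cheap to build and the join should be inexpensive compared with the writes.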

You may need to batch the updates, based on the limitations of your I/O. So update 1 million rows, checkpoint, and repeat.