At what point does a database update its indexes in a transaction?

I work with SQL Server and Oracle. There are probably some exceptions, but for those platforms the general answer is that data and indexes will be updated at the same time.

I think that it would be helpful to draw a distinction between when the indexes are updated for the session that owns the transaction and for other sessions. By default, other sessions will not see the updated indexes until the transaction is committed. However, the session that owns the transaction will immediately see the updated indexes.

One way to think about it: consider a table with a primary key. In SQL Server and Oracle this is implemented as an index. Most of the time we want an INSERT that would violate the primary key to fail immediately. For that to happen, the index must be updated at the same time as the data. Note that some platforms, such as Postgres and Oracle, let you declare a constraint as deferrable, in which case it can be checked only when the transaction is committed. SQL Server does not support deferred constraints.
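For contrast, here is a minimal sketch of a deferred primary key, written in Postgres syntax. The table name x_deferred is made up for illustration. As far as I know, the index entry is still written at INSERT time; only the uniqueness check is postponed to commit.

```sql
-- Postgres syntax; x_deferred is a hypothetical table used only for this sketch.
CREATE TABLE x_deferred (
    pk INT PRIMARY KEY DEFERRABLE INITIALLY DEFERRED
);

BEGIN;
INSERT INTO x_deferred VALUES (1);
INSERT INTO x_deferred VALUES (1); -- no error yet: the check is deferred
COMMIT; -- the duplicate key error is raised here and the transaction is rolled back
```

Without the DEFERRABLE INITIALLY DEFERRED clause, the second INSERT fails immediately, which matches the Oracle demo.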

Here's a quick Oracle demo showing a common case:

CREATE TABLE X_TABLE (PK INT NOT NULL, PRIMARY KEY (PK));

INSERT INTO X_TABLE VALUES (1);
INSERT INTO X_TABLE VALUES (1); -- no commit

The second INSERT statement throws an error:

SQL Error: ORA-00001: unique constraint (XXXXXX.SYS_C00384850) violated

00001. 00000 - "unique constraint (%s.%s) violated"

*Cause: An UPDATE or INSERT statement attempted to insert a duplicate key. For Trusted Oracle configured in DBMS MAC mode, you may see this message if a duplicate entry exists at a different level.

*Action: Either remove the unique restriction or do not insert the key.

If you prefer to see an index update in action, below is a simple demo in SQL Server. First, create a two-column table with one million rows and a nonclustered index on the VAL column:

DROP TABLE IF EXISTS X_TABLE_IX;

CREATE TABLE X_TABLE_IX (
    ID INT NOT NULL,
    VAL VARCHAR(10) NOT NULL,
    PRIMARY KEY (ID)
);

CREATE INDEX X_INDEX ON X_TABLE_IX (VAL);

-- insert one million rows with N from 1 to 1000000
-- dbo.Getnums is a user-defined numbers function, not a SQL Server built-in
INSERT INTO X_TABLE_IX (ID, VAL)
SELECT N, N FROM dbo.Getnums(1000000);
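If you don't have a numbers function like dbo.Getnums installed, the following is one possible substitute for the INSERT above (run one or the other, not both). It is only a sketch: it assumes that cross joining sys.all_objects with itself yields at least one million rows, which is normally the case but is not guaranteed.

```sql
-- Alternative load without dbo.Getnums (SQL Server).
-- Assumption: sys.all_objects CROSS JOIN sys.all_objects returns >= 1,000,000 rows.
WITH nums AS (
    SELECT TOP (1000000)
           ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS N
    FROM sys.all_objects AS a
    CROSS JOIN sys.all_objects AS b
)
INSERT INTO X_TABLE_IX (ID, VAL)
SELECT N, CAST(N AS VARCHAR(10))
FROM nums;
```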

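The following query can use the nonclustered index because the index is a covering index for that query: it contains VAL and, like every nonclustered index on a table with a clustered index, the clustered key (ID), so it holds all of the data needed to execute it. As expected, no rows are returned.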

SELECT *
FROM X_TABLE_IX
WHERE VAL = 'A';

[Image: query 1]

Now let's start a transaction and update VAL for almost all of the rows in the table:

BEGIN TRANSACTION

UPDATE X_TABLE_IX
SET VAL = 'A'
WHERE ID <> 1;

Here is part of the query plan for that:

[Image: query 2]

Circled in red is the update to the nonclustered index. Circled in blue is the update to the clustered index, which is essentially the table's data. Even though the transaction has not been committed, we see that the data and the index are both updated as part of the query's execution. Note that you will not always see a separate index update operator in a plan; it depends on the size of the data involved, possibly along with other factors.

With the transaction still not committed, let's revisit the SELECT query from above.

SELECT *
FROM X_TABLE_IX
WHERE VAL = 'A';


The query optimizer is still able to use the index and this time it estimates that 999999 rows will be returned. Executing the query returns the expected result.
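To see the other half of the distinction made earlier (what other sessions see), you can run something like the following from a second connection while the transaction is still open. This is a sketch, and the outcomes described in the comments depend on database settings that I am assuming rather than showing.

```sql
-- Session 2: a separate connection, while session 1's transaction is still open.
SET LOCK_TIMEOUT 5000; -- wait at most 5 seconds for locks instead of indefinitely

-- Under the default locking READ COMMITTED isolation level this should block on
-- session 1's locks and then fail with a lock timeout error.
-- If READ_COMMITTED_SNAPSHOT is ON for the database, it should instead return 0,
-- the last committed state.
SELECT COUNT(*)
FROM X_TABLE_IX
WHERE VAL = 'A';

-- A dirty read ignores those locks and should count the uncommitted rows.
SELECT COUNT(*)
FROM X_TABLE_IX WITH (NOLOCK)
WHERE VAL = 'A';
```

Either way, session 2 does not get committed-looking results that include the update until session 1 commits. When you are done, run ROLLBACK TRANSACTION in session 1 to undo the update.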

That was a simple demo but hopefully it cleared things up a bit.

As an aside, I am aware of a few cases in which it could be argued that an index is not immediately updated. This is done for performance reasons, and the end user should not be able to see inconsistent data. For example, deletes are sometimes not fully applied to an index in SQL Server: the deleted rows are only marked as ghost records, and a background process eventually cleans them up. You can read about ghost records if you're curious.
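If you want to look for ghost records yourself, one option is the sys.dm_db_index_physical_stats function, which reports a ghost_record_count column when run in SAMPLED or DETAILED mode. The sketch below assumes the demo table lives in the dbo schema of the current database. The numbers you see depend on timing, since the cleanup runs in the background.

```sql
-- Run after deleting rows from the demo table (SQL Server).
-- Assumption: X_TABLE_IX is in the dbo schema of the current database.
SELECT index_id,
       index_level,
       record_count,
       ghost_record_count
FROM sys.dm_db_index_physical_stats(
         DB_ID(),                       -- current database
         OBJECT_ID(N'dbo.X_TABLE_IX'),  -- demo table
         NULL,                          -- all indexes
         NULL,                          -- all partitions
         'DETAILED'                     -- ghost counts are NULL in LIMITED mode
     );
```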
