Bad performance using "NOT IN"

The performance difference comes down to the fact that you have an index that's well suited to the first query ("f"."eid" = 8) but not to the second one (F.eid NOT IN (8, 10)). One thing I found confusing at first is that the two queries actually use different indexes:

[dd_produccion_test2].[dbo].[files].[IX dbo.files cid, year, name : grapado IS NULL AND masterversion IS NULL]
[dd_produccion_test2].[dbo].[files].[IX dbo.files cid, year, eid, name : grapado IS NULL AND masterversion IS NULL] [F]

You don't have these indexes defined in the question. You have a definition for this one:

CREATE NONCLUSTERED INDEX [IX dbo.files cid, year, eid : grapado IS NULL AND masterversion IS NULL] ON [dbo].[archivos]
(
    [cid] ASC,
    [year] ASC,
    [eid] ASC
)
INCLUDE (   [grapado],
    [masterversion]) 
WHERE ([grapado] IS NULL AND [masterversion] IS NULL)
WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
ON PS_files_partitioning([created])

I will assume that the index definitions follow the same pattern based on their names.

Let's start with the fast query, which uses the index with key columns of cid, year, eid, and name. Your fast query has equality predicates on the first three key columns and an ORDER BY on the fourth key column. That allows you to seek directly to the rows that you want, with no residual IO. You only need to read 50 rows from the index, and the fourth key column means the data is already ordered the way you want it. All in all, the query probably does something like a few thousand index seeks, so it's not surprising that it's fast.
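
The fast query itself isn't reproduced here, but based on the predicates described above it presumably has roughly this shape (the eid = 8 literal comes from your question; the other values and the column list are placeholders):

SELECT
    F.id,
    F.[name],
    F.created
FROM dbo.files AS F
WHERE
    F.grapado IS NULL
    AND F.masterversion IS NULL
    AND F.cid = 19          -- equality on the first key column
    AND F.[year] = 2013     -- equality on the second key column
    AND F.eid = 8           -- equality on the third key column
ORDER BY
    F.[name] ASC            -- fourth key column supplies the order
OFFSET 0 ROWS
FETCH NEXT 50 ROWS ONLY;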

For the slow query, the filter on eid is no longer an equality predicate. This is a very important difference. The index with key columns on cid, year, eid, name is still covering, but using it looks like this:

  • first key column: equality filter
  • second key column: equality filter
  • third key column: not equal to filter
  • fourth key column: ORDER BY

Without an equality filter on the third key column, the engine cannot take advantage of the ordered nature of the index to avoid a sort on the fourth column (name). If the query optimizer used this index for this query, it would need to read all matching rows from the index and sort them just to find the first 50 rows. That type of operation can have a large estimated cost.

Instead, the query optimizer picks an index with key columns of cid, year, and name. With equality filters on the first two key columns, SQL Server can take advantage of the ordered nature of the index to avoid an explicit sort by name. However, the eid column is not present in this index, so the plan needs an order-preserving key lookup just to apply the filter on eid, and that is almost certainly the cause of your performance issue. I would try adding eid as an included column to [dd_produccion_test2].[dbo].[files].[IX dbo.files cid, year, name : grapado IS NULL AND masterversion IS NULL]:

CREATE NONCLUSTERED INDEX 
    [IX dbo.files cid, year, name : grapado IS NULL AND masterversion IS NULL]
ON [dbo].[files]
    ( cid, year, name )
    INCLUDE ( eid )
    WHERE ( grapado IS NULL AND masterversion IS NULL ) 
... ;

That should avoid the key lookup and improve performance, but it's possible that you'll do some residual IO because you can't immediately seek to the first 50 rows that match all of the filters.

You might be wondering why SQL Server chose such a poorly performing plan, or why a plan with an estimated cost of just 0.777646 optimizer units takes over 10 minutes to execute. The answer has to do with row goals. Let's look at the key lookup in the slow plan:

(screenshot: the key lookup operator from the slow plan)

The query optimizer thinks that it will need to do only 151 key lookups before it finds 50 rows that satisfy F.eid NOT IN (8,10). Perhaps your statistics are out of date (there are warnings on some columns in your plan), or the optimizer is simply making an overly optimistic assumption; either way, I suspect that it actually needs to do many more lookups than 151. You can try disabling row goals as a test, but I suspect that you won't see good performance without some kind of change to your index definitions.
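
If you are on SQL Server 2016 SP1 or later (or Azure SQL Database), one way to run that test is the DISABLE_OPTIMIZER_ROWGOAL query hint. A minimal sketch against the slow query's filters (the eid values come from your question; the other literals are placeholders):

SELECT
    F.id,
    F.[name],
    F.created
FROM dbo.files AS F
WHERE
    F.grapado IS NULL
    AND F.masterversion IS NULL
    AND F.cid = 19              -- placeholder
    AND F.[year] = 2013         -- placeholder
    AND F.eid NOT IN (8, 10)
ORDER BY
    F.[name] ASC
OFFSET 0 ROWS
FETCH NEXT 50 ROWS ONLY
OPTION (USE HINT ('DISABLE_OPTIMIZER_ROWGOAL'));    -- optimize without the row goal from OFFSET/FETCH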


Analysis

Going back over the history of your questions on this site, it seems you are struggling to find a design that allows you to execute a variety of queries with differing filtering and ordering requirements. Not all these filtering requirements can be supported by simple b-tree indexes (e.g. NOT IN).

If you are lucky enough to know in advance exactly which filtering conditions and orders will be needed, and the number of combinations is relatively small, a skilled database index tuner might be able to come up with a reasonable number of nonclustered indexes to meet those needs.

On the other hand, it might be that the number of indexes required would become impractical. There aren't enough details in any of your questions about the filtering/ordering requirements to make that assessment. We also don't know enough about your data relationships. But from the number of overlapping indexes you already have, it seems you are at risk of heading down this path.

Recommendation

As an alternative, I propose you drop the existing nonclustered indexes you have created so far to support your queries, and replace them with a single nonclustered columnstore index. These have recently become available on the Standard tier for Azure SQL Database. Leave the clustered primary key and any constraints or unique indexes you have for key enforcement.
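
A sketch of that cleanup, using only the two index names that are visible in your plans (your table probably has more that should go as well):

DROP INDEX [IX dbo.files cid, year, name : grapado IS NULL AND masterversion IS NULL]
    ON dbo.files;

DROP INDEX [IX dbo.files cid, year, eid, name : grapado IS NULL AND masterversion IS NULL]
    ON dbo.files;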

The basic idea is to cover all the columns you filter and order by in the nonclustered columnstore index, and optionally filter the index by any conditions that are always (or very commonly) applied. For example:

CREATE NONCLUSTERED COLUMNSTORE INDEX 
    [NC dbo.files id, name, year, cid, eid, created, grapado, masterversion]
ON dbo.files
    (id, [name], [year], cid, eid, created, grapado, masterversion)
WHERE
    grapado IS NULL
    AND masterversion IS NULL;

This will then enable you to find qualifying rows very quickly, for a very wide range of queries, some of which would be hard or impossible to accommodate with b-tree indexes.
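
For instance, a combination that none of your current b-tree indexes targets, such as filtering only on eid and paginating by created, can still be answered from the compressed columnstore index (the values below are made up for illustration):

SELECT
    F.id,
    F.[name],
    F.created
FROM dbo.files AS F
WHERE
    F.grapado IS NULL
    AND F.masterversion IS NULL
    AND F.eid NOT IN (8, 10)    -- no filter on cid or year this time
ORDER BY
    F.created DESC              -- a different ordering column
OFFSET 0 ROWS
FETCH NEXT 50 ROWS ONLY;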

Examples

Start by writing a query that returns just the primary key of each files row your query will return, plus the order-by column. For example:

SELECT
    -- files table primary key and order by column
    F.id,
    F.created,
    F.[name]
FROM dbo.files AS F
WHERE
    F.grapado IS NULL
    AND F.masterversion IS NULL
    AND F.[year] IN (0, 2013)
    AND F.cid = 19
    AND F.eid NOT IN (10, 12)
ORDER BY
    F.[name] ASC
OFFSET 0 ROWS
FETCH NEXT 50 ROWS ONLY;

You should find that this query performs very well, regardless of the filtering or ordering conditions you use. We deliberately do not return all the columns we need from the files table at this stage, in order to keep the data we have to sort as small as possible.

Now that we have the primary keys of the 50 rows we need, we can expand the query to add the remaining columns, including those from any lookup tables:

WITH FoundKeys AS
(
    SELECT
        -- files table primary key and order by column
        F.id,
        F.created,
        F.[name]
    FROM dbo.files AS F
        WITH (INDEX([NC dbo.files id, name, year, cid, eid, created, grapado, masterversion]))
    WHERE
        F.grapado IS NULL
        AND F.masterversion IS NULL
        AND F.[year] IN (0, 2013)
        AND F.cid = 19
        AND F.eid NOT IN (10, 12)
    ORDER BY
        F.[name] ASC
    OFFSET 0 ROWS
    FETCH NEXT 50 ROWS ONLY
)
SELECT
    F.id,
    F.[name],
    F.[year],
    F.cid,
    F.eid,
    F.created,
    vnVE0.keywordValueCol0_numeric
FROM FoundKeys AS FK
JOIN dbo.files AS F
    -- join on primary key
    ON F.id = FK.id
    AND F.created = FK.created
OUTER APPLY
(
    -- Lookup distinct values
    SELECT
        keywordValueCol0_numeric = 
            CASE
                WHEN VN.[value] IS NOT NULL AND VN.[value] <> ''
                THEN CONVERT(decimal(28, 2), VN.[value])
                ELSE CONVERT(decimal(28, 2), 0)
            END
    FROM dbo.value_number AS VN
    WHERE
        VN.id_file = F.id
        AND VN.id_field = 260
    GROUP BY
        VN.[value]
) AS vnVE0
ORDER BY
    FK.[name];

This should produce an execution plan like:

(screenshot: estimated execution plan)

I would also encourage you to think again about partitioning these tables. It does not seem necessary to me at all, it is over-complicating your queries, and it is compromising your uniqueness constraints. If the concern is loading new rows without blocking concurrent readers, it should be possible to batch up the rows to be inserted and apply them very quickly.
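
As a rough illustration of that last point, new rows could be landed in a staging table and moved into the main table in small batches, so that each write transaction stays short. This is only a sketch: dbo.files_staging is a hypothetical staging table with the same columns as dbo.files, the batch size is arbitrary, and it assumes explicit id values can be inserted (adjust if id is an IDENTITY column):

DECLARE @batch int = 5000;

WHILE 1 = 1
BEGIN
    BEGIN TRANSACTION;

    -- Copy one small batch from the staging table into the main table
    INSERT dbo.files (id, [name], [year], cid, eid, created, grapado, masterversion)
    SELECT TOP (@batch)
        S.id, S.[name], S.[year], S.cid, S.eid, S.created, S.grapado, S.masterversion
    FROM dbo.files_staging AS S
    ORDER BY S.id, S.created;

    IF @@ROWCOUNT = 0
    BEGIN
        -- Nothing left to move
        COMMIT TRANSACTION;
        BREAK;
    END;

    -- Remove staged rows that now exist in dbo.files (primary key is id, created)
    DELETE S
    FROM dbo.files_staging AS S
    WHERE EXISTS
    (
        SELECT 1
        FROM dbo.files AS F
        WHERE F.id = S.id
        AND F.created = S.created
    );

    COMMIT TRANSACTION;
END;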

If you want to compare the performance of the solution above with a simpler version (that may be slower to sort), try:

SELECT
    F.id,
    F.[name],
    F.[year],
    F.cid,
    F.eid,
    F.created,
    keywordValueCol0_numeric = 
        CASE
            WHEN vnVE0.[value] IS NOT NULL AND vnVE0.[value] <> ''
            THEN CONVERT(decimal(28, 2), vnVE0.[value])
            ELSE CONVERT(decimal(28, 2), 0)
        END
FROM dbo.files AS F
    WITH (INDEX([NC dbo.files id, name, year, cid, eid, created, grapado, masterversion]))
OUTER APPLY
(
    SELECT DISTINCT VN.[value]
    FROM dbo.value_number AS VN
    WHERE VN.id_file = F.id
    AND VN.id_field = 260
) AS vnVE0
WHERE
    F.grapado IS NULL
    AND F.masterversion IS NULL
    AND F.[year] IN (0, 2013)
    AND F.cid = 19
    AND F.eid NOT IN (10, 12)
ORDER BY
    F.[name] ASC
OFFSET 0 ROWS
FETCH NEXT 50 ROWS ONLY;

(screenshot: execution plan for the simpler query)