Why does adding a TOP 1 dramatically worsen performance?

Since you get the correct plan with the ORDER BY, maybe you could just roll your own TOP operator?

SELECT DOCUMENT_ID, COPIES, REQUESTOR, D_ID, FILE_NUMBER
FROM (
    SELECT dc.DOCUMENT_ID,
           dc.COPIES,
           dc.REQUESTOR,
           dc.D_ID,
           cj.FILE_NUMBER,
           ROW_NUMBER() OVER (ORDER BY cj.FILE_NUMBER) AS _rownum
    FROM DOCUMENT_QUEUE dc
    INNER JOIN CORRESPONDENCE_JOURNAL cj
        ON dc.DOCUMENT_ID = cj.DOCUMENT_ID
    WHERE dc.QUEUE_DATE <= GETDATE()
      AND dc.PRINT_LOCATION = 2
) AS sub
WHERE _rownum=1;

In my mind, the query plan for the ROW_NUMBER() above should be the same as if you had an ORDER BY. The query plan should now have a Segment, Sequence Project and finally a Filter operator, the rest should look just like your good plan.


Try forcing a hash join*

SELECT TOP 1 
       dc.DOCUMENT_ID,
       dc.COPIES,
       dc.REQUESTOR,
       dc.D_ID,
       cj.FILE_NUMBER
FROM DOCUMENT_QUEUE dc
INNER HASH JOIN CORRESPONDENCE_JOURNAL cj
        ON dc.DOCUMENT_ID = cj.DOCUMENT_ID
       AND dc.QUEUE_DATE <= GETDATE()
       AND dc.PRINT_LOCATION = 2
ORDER BY cj.FILE_NUMBER

The optimizer probably thought a loop was going to be better with top 1 and that kind of makes sense but in reality it did not work here. Just a guess here but maybe the estimated cost of that spool was off - it uses TEMPDB - you may have a poorly performing TEMPDB.


* Be careful with join hints, because they force plan table access order to match the written order of the tables in the query (just as if OPTION (FORCE ORDER) had been specified). From the documentation link:

BOL extract

This may not produce any undesirable effects in the example, but in general, it very well might. FORCE ORDER (implied or explicit) is a very powerful hint that goes beyond enforcing order; it prevents a broad range of optimizer techniques being applied, including partial aggregations and reordering.

An OPTION (HASH JOIN) query hint may be less intrusive in suitable cases, since this does not imply FORCE ORDER. It does, however, apply to all joins in the query. Other solutions are available.


Edit: +1 works in this situation because it turns out that FILE_NUMBER is a zero-padded string version of an integer. A better solution here for strings is to append '' (the empty string), as appending a value can affect order, or for numbers to add something which is a constant but contains a non-deterministic function, such as sign(rand()+1). The idea of 'breaking the sort' is still valid here, it's just that my method wasn't ideal.

+1

No, I don't mean I'm agreeing with anything, I mean that as a solution. If you change your query to ORDER BY cj.FILE_NUMBER + 1 then the TOP 1 will behave differently.

You see, with the small row goal in place for an ordered query, the system will try to consume the data in order, to avoid having a Sort operator. It will also avoid building a hash table, figuring that it probably doesn't have to do too much work to find that first row. In your case, this is wrong - from the thickness of those arrows, it looks like it's having to consume a lot of data to find a single match.

The thickness of those arrows suggests that your DOCUMENT_QUEUE (DQ) table is much smaller than your CORRESPONDENCE_JOURNAL (CJ) table. And that the best plan would actually be to check through the DQ rows until a CJ row is found. Indeed, that's what the Query Optimizer (QO) would do if it didn't have this pesky ORDER BY in there, that's nicely supported by a covering index on CJ.

So if you dropped the ORDER BY completely, I expect you'd get a plan which involved a Nested Loop, iterating over the rows in DQ, seeking into CJ to make sure the row exists. And with TOP 1, this would stop after a single row had been pulled.

But if you do actually need the first row in FILE_NUMBER order, then you could trick the system into ignoring that index which seems (incorrectly) to be so helpful, by doing ORDER BY CJ.FILE_NUMBER+1 - which we know will keep the same order as before, but importantly the QO doesn't. The QO will focus on getting the whole set out, so that a Top N Sort operator can be satisfied. This method should produce a plan which contains a Compute Scalar operator to work out the value for ordering, and a Top N Sort operator to get the first row. But to the right of these, you should see a nice Nested Loop, doing lots of Seeks on CJ. And better performance than running through a large table of rows which don't match anything in DQ.

The Hash Match isn't necessarily awful, but if the set of rows you're returning from DQ is way smaller than CJ (as I would expect it to be), then the Hash Match is going to be scanning a lot more of CJ than it needs.

Note: I used +1 instead of +0 because the query optimizer is likely to recognise that +0 changes nothing. Of course, the same thing might apply to the +1, if not now, then at some point in the future.