Optimization issue with user defined function

There are three main technical reasons you get the plan you do:

  1. The optimizer's costing framework has no real support for non-inline functions. It makes no attempt to look inside the function definition to see how expensive it might be; it simply assigns a very small fixed cost, and estimates that the function will produce 1 row of output each time it is called. Both of these modelling assumptions are very often completely unsafe. The situation is very slightly improved in 2014 with the new cardinality estimator enabled, since the fixed 1-row guess is replaced with a fixed 100-row guess. There is still no support for costing the contents of non-inline functions, however.
  2. SQL Server initially collapses joins and applies into a single internal n-ary logical join. This helps the optimizer reason about join orders later on. Expanding the single n-ary join into candidate join orders comes later, and is largely based on heuristics. For example, inner joins come before outer joins, small tables and selective joins before large tables and less selective joins, and so on.
  3. When SQL Server performs cost-based optimization, it splits the effort into optional phases to minimize the chances of spending too long optimizing low-cost queries. There are three main phases, search 0, search 1, and search 2. Each phase has entry conditions, and later phases enable more optimizer explorations than earlier ones. Your query happens to qualify for the least-capable search phase, phase 0. A low enough cost plan is found there that later stages are not entered.
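Point 1 applies to multi-statement (non-inline) functions like the one in the question. As a hedged sketch (the real body of dbo.GetGroupCode is not shown in the question, and the FACTORYGROUP table is an assumption), the general shape the optimizer cannot see inside is:

```sql
-- Hypothetical multi-statement TVF; the optimizer does not cost the body.
-- It assigns a small fixed cost and a fixed 1-row estimate per call
-- (100 rows under the SQL Server 2014 CE).
create function dbo.GetGroupCode (@Factory integer)
returns @Result table (GROUPCODE varchar(10) not null)
as
begin
    insert @Result (GROUPCODE)
    select FG.GROUPCODE
    from dbo.FACTORYGROUP as FG     -- assumed lookup table
    where FG.FACTORY = @Factory;
    return;
end;
```

An inline table-valued function (a single RETURN (SELECT ...)) would instead be expanded into the calling query and costed normally.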

Given the small cardinality estimate assigned to the UDF apply, the n-ary join expansion heuristics unfortunately reposition it earlier in the tree than you would wish.

The query also qualifies for search 0 optimization by virtue of having at least three joins (including applies). The final physical plan you get, with the odd-looking scan, is based on that heuristically-deduced join order. It is costed low enough that the optimizer considers the plan "good enough". The low cost and cardinality estimates for the UDF contribute to this early finish.

Search 0 (also known as the Transaction Processing phase) targets low-cardinality OLTP-type queries, with final plans that usually feature nested loops joins. More importantly, search 0 runs only a relatively small subset of the optimizer's exploration abilities. This subset does not include pulling an apply up the query tree over a join (rule PullApplyOverJoin). This is exactly what is required in the test case to reposition the UDF apply above the joins, to appear last in the sequence of operations (as it were).
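If you want to see which phase produced a plan, the server-wide counters in sys.dm_exec_query_optimizer_info can be snapshotted before and after compiling the query (best done on a quiet test instance, since the counters are instance-wide):

```sql
-- Snapshot these counters, compile the query (e.g. with OPTION (RECOMPILE)),
-- then snapshot again; the phase whose 'occurrence' count increased is the
-- one that produced the final plan.
select oi.counter, oi.occurrence, oi.value
from sys.dm_exec_query_optimizer_info as oi
where oi.counter in (N'trivial plan', N'search 0', N'search 1', N'search 2');
```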

There is also an issue where the optimizer can choose between a naive nested loops join (join predicate on the join itself) and a correlated indexed join (apply), where the correlation predicate is applied on the inner side of the join using an index seek. The latter is usually the desired plan shape, but the optimizer is capable of exploring both. With incorrect costing and cardinality estimates, it can choose the non-apply NL join, as in the submitted plans (explaining the scan).
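As a hedged illustration (not a general recommendation), the documented FORCESEEK table hint on PRODUCT rules out the scan-based naive nested loops alternative, leaving only seek-based access paths. It shapes only the access method, not the position of the UDF apply in the tree, so it may not fix everything:

```sql
-- Table and column names as in the question; FORCESEEK (SQL Server 2008+)
-- disallows the scan of PRODUCT seen in the bad plan.
select S.GROUPCODE,
       H.ORDERCATEGORY
from ORDERLINE as L
    join ORDERHDR as H on H.ORDERID = L.ORDERID
    join PRODUCT as P with (forceseek) on P.PRODUCT = L.PRODUCT
    cross apply dbo.GetGroupCode(P.FACTORY) as S
where L.ORDERNUMBER = 'XXX/YYY-123456'
    and L.RMPHASE = '0'
    and L.ORDERLINE = '01';
```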

So, there are multiple interacting reasons involving several general optimizer features that normally work well to find good plans in a short period of time without using excessive resources. Avoiding any one of the reasons is enough to produce the 'expected' plan shape for the sample query, even with empty tables:

[Execution plan image: plan on empty tables with search 0 disabled]

There is no supported way to avoid search 0 plan selection or early optimizer termination, or to improve the costing of UDFs (aside from the limited enhancements in the SQL Server 2014 CE model). That leaves things like plan guides, manual query rewrites (including the TOP (1) idea or using intermediate temporary tables), and avoiding poorly-costed 'black boxes' (from a QO point of view) like non-inline functions.
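For completeness, a plan-guide sketch (the guide name is illustrative, and @stmt must match the submitted statement text exactly) that pins a previously captured good plan:

```sql
-- The USE PLAN XML is elided; it would be the ShowPlan XML of the desired
-- plan, captured from a system where that plan was actually produced.
exec sp_create_plan_guide
    @name            = N'GetGroupCode_plan_guide',  -- illustrative name
    @stmt            = N'select S.GROUPCODE, ...',  -- exact statement text
    @type            = N'SQL',
    @module_or_batch = NULL,
    @params          = NULL,
    @hints           = N'OPTION (USE PLAN N''...'')';
```

Plan guides are fragile in the same way as other plan-forcing techniques: they need retesting after patches and upgrades.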

Rewriting CROSS APPLY as OUTER APPLY can also work, since it currently prevents some of the early join-collapsing work, but you have to be careful to preserve the original query semantics (e.g. rejecting any NULL-extended rows that might be introduced, without the optimizer collapsing back to a cross apply). Be aware, though, that this behaviour is not guaranteed to remain stable, so you would need to retest any such observed behaviours each time you patch or upgrade SQL Server.

Overall, the right solution for you depends on a variety of factors that we cannot judge for you. I would, however, encourage you to consider solutions that are guaranteed to always work in future, and that work with (rather than against) the optimizer wherever possible.


It looks like this is a cost-based decision by the optimizer, but a rather bad one.

If you add 50000 rows to PRODUCT, the optimizer decides the scan is too much work and gives you a plan with three seeks and one call to the UDF.

The plan I get with 6655 rows in PRODUCT:

[execution plan image]

With 50000 rows in PRODUCT I get this plan instead.

[execution plan image]

I guess the cost for calling the UDF is grossly underestimated.

One workaround that works fine in this case is to change the query to use outer apply against the UDF. I get the good plan no matter how many rows there are in the table PRODUCT.

select  
    S.GROUPCODE,
    H.ORDERCATEGORY
from    
    ORDERLINE L
    join ORDERHDR H on H.ORDERID = L.ORDERID
    join PRODUCT P  on P.PRODUCT = L.PRODUCT    
    outer apply dbo.GetGroupCode (P.FACTORY) S
where   
    L.ORDERNUMBER = 'XXX/YYY-123456' and
    L.RMPHASE = '0' and
    L.ORDERLINE = '01' and
    S.GROUPCODE is not null -- reject NULL-extended rows: keeps outer apply equivalent to cross apply

[execution plan image]

The best workaround in your case is probably to get the values you need into a temp table and then query the temp table with a cross apply to the UDF. That way you are sure that the UDF will not be executed more than necessary.

select  
    P.FACTORY,
    H.ORDERCATEGORY
into #T
from    
    ORDERLINE L
    join ORDERHDR H on H.ORDERID = L.ORDERID
    join PRODUCT P  on P.PRODUCT = L.PRODUCT
where   
    L.ORDERNUMBER = 'XXX/YYY-123456' and
    L.RMPHASE = '0' and
    L.ORDERLINE = '01'

select  
    S.GROUPCODE,
    T.ORDERCATEGORY
from #T as T
  cross apply dbo.GetGroupCode (T.FACTORY) S

drop table #T

Instead of persisting to a temp table, you can use TOP() in a derived table to force SQL Server to evaluate the result of the joins before the UDF is called. Just use a really high number in the TOP, so that SQL Server has to apply the row limit for that part of the query before it can go on and use the UDF.

select S.GROUPCODE,
       T.ORDERCATEGORY
from (
     select top(2147483647) -- max int; forces the joins to be evaluated first
         P.FACTORY,
         H.ORDERCATEGORY
     from    
         ORDERLINE L
         join ORDERHDR H on H.ORDERID = L.ORDERID
         join PRODUCT P  on P.PRODUCT = L.PRODUCT    
     where   
         L.ORDERNUMBER = 'XXX/YYY-123456' and
         L.RMPHASE = '0' and
         L.ORDERLINE = '01'
     ) as T
  cross apply dbo.GetGroupCode (T.FACTORY) S

[execution plan image]

"I would like to understand what could be the reason for this, since all the operations are done using primary keys, and how to fix it if it happens in a more complex query that can't be solved this easily."

I really can't answer that, but I thought I should share what I know anyway. I don't know why a scan of the PRODUCT table is considered at all. There might be cases where that is the best thing to do, and there is stuff concerning how the optimizer treats UDFs that I don't know about.

One extra observation: your query gets a good plan in SQL Server 2014 with the new cardinality estimator. That is because the estimated number of rows for each call to the UDF is 100 instead of 1, as it is in SQL Server 2012 and before. But it will still make the same cost-based decision between the scan version and the seek version of the plan. With fewer than 500 rows (497 in my case) in PRODUCT, you get the scan version of the plan even in SQL Server 2014.
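If you want to compare the two estimators side by side on SQL Server 2014, the documented QUERYTRACEON hints can select a CE model per query (sysadmin-level permission is required for QUERYTRACEON):

```sql
-- TF 9481 forces the legacy (pre-2014) CE: 1-row guess per UDF call.
-- TF 2312 forces the new 2014 CE: 100-row guess per UDF call.
select S.GROUPCODE,
       H.ORDERCATEGORY
from ORDERLINE as L
    join ORDERHDR as H on H.ORDERID = L.ORDERID
    join PRODUCT as P on P.PRODUCT = L.PRODUCT
    cross apply dbo.GetGroupCode(P.FACTORY) as S
where L.ORDERNUMBER = 'XXX/YYY-123456'
    and L.RMPHASE = '0'
    and L.ORDERLINE = '01'
option (querytraceon 9481);
```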