Is support for Parallel Scalar UDF a reasonable feature request?

It is fairly well documented that UDFs force an overall serial plan.

I'm not certain it is all that well documented.

A scalar T-SQL function prevents parallelism anywhere in the plan.
A scalar CLR function can be executed in parallel, so long as it does not access the database.
A multi-statement table-valued T-SQL function forces a serial zone in a plan that may use parallelism elsewhere.
An inline table-valued T-SQL function is expanded like a view, so has no direct effect.
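To make the difference between the first and last cases concrete, here is a minimal T-SQL sketch. The table `dbo.Orders` and both function names are hypothetical, invented for illustration only. The same calculation is written once as a scalar function and once as an inline table-valued function. On versions without scalar UDF inlining (or where the function is not inlined), the first query is limited to a serial plan, whereas the second is expanded into the calling query and remains eligible for parallelism.

```sql
-- Hypothetical objects for illustration; none of these names come from the question.
CREATE TABLE dbo.Orders
(
    OrderID int           NOT NULL PRIMARY KEY,
    Amount  decimal(19,4) NOT NULL,
    TaxRate decimal(5,4)  NOT NULL
);
GO

-- Scalar T-SQL function: invoked once per row in its own T-SQL context.
CREATE FUNCTION dbo.GrossAmount_Scalar
(
    @Amount  decimal(19,4),
    @TaxRate decimal(5,4)
)
RETURNS decimal(19,4)
AS
BEGIN
    RETURN @Amount * (1 + @TaxRate);
END;
GO

-- Inline table-valued function: a single SELECT, expanded like a view.
CREATE FUNCTION dbo.GrossAmount_Inline
(
    @Amount  decimal(19,4),
    @TaxRate decimal(5,4)
)
RETURNS TABLE
AS
RETURN
    SELECT GrossAmount = @Amount * (1 + @TaxRate);
GO

-- Calls the scalar function: the restriction described above applies.
SELECT O.OrderID,
       dbo.GrossAmount_Scalar(O.Amount, O.TaxRate) AS GrossAmount
FROM dbo.Orders AS O;

-- Calls the inline function via APPLY: no direct effect on parallelism.
SELECT O.OrderID,
       G.GrossAmount
FROM dbo.Orders AS O
CROSS APPLY dbo.GrossAmount_Inline(O.Amount, O.TaxRate) AS G;
```

Whether the optimizer actually chooses a parallel plan for the second query still depends on cost and server settings; the point is only that the inline form does not rule it out.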

See Forcing a Parallel Execution Plan and/or Craig Freedman's Parallel Execution presentation.

There are claims that UDFs are a black box and must use a cursor.

These claims are not correct.

Extra points for explaining why the engine forces the whole plan to be serial instead of just the UDF calculation stage.

My understanding is that the current restrictions are purely the result of certain implementation details. There is no fundamental reason why functions could not be executed using parallelism.

Specifically, T-SQL scalar functions execute inside a separate T-SQL context, which complicates correct operation, coordination and shutdown (especially in the case of an error) significantly.

Equally, table variables do support parallel reads (but not writes) in general, but the table variable exposed by a table-valued function is not able to support parallel reads for implementation-specific reasons. You would need someone with source code access (and the freedom to share details) to provide an authoritative answer, I'm afraid.

Is support for parallel UDF a reasonable feature to request?

Of course, if you can make a strong-enough case. My own feeling is that the work involved would be extensive, so your proposal would have to meet an extremely high bar. For example, a related (and much simpler) request to provide inline scalar functions has great support, but has languished unimplemented for years now.

You might like to read the Microsoft paper:

Froid: Optimization of Imperative Programs in a Relational Database (pdf)

...which outlines the approach Microsoft look to be taking to address T-SQL scalar function performance issues in the release after SQL Server 2017.

The goal of Froid is to enable developers to use the abstractions of UDFs and procedures without compromising on performance. Froid achieves this goal using a novel technique to automatically convert imperative programs into equivalent relational algebraic forms whenever possible. Froid models blocks of imperative code as relational expressions, and systematically combines them into a single expression using the Apply operator, thereby enabling the query optimizer to choose efficient set-oriented, parallel query plans.

(emphasis mine)

Inline scalar T-SQL functions are now implemented in SQL Server 2019.

As Paul has rightly mentioned in his answer, there is no fundamental reason why scalar UDFs could not be executed using parallelism. However, apart from the implementation challenges, there is another reason for forcing them to be serial. The Froid paper cited by Paul gives more information about this.

Quoting from the paper (Section 2.3):

Currently, SQL Server does not use intra-query parallelism in queries that invoke UDFs. Methods can be designed to mitigate this limitation, but they introduce additional challenges, such as picking the right degree of parallelism for each invocation of the UDF.

For instance, consider a UDF that invokes other SQL queries, such as the one in Figure 1. Each such query may itself use parallelism, and therefore, the optimizer has no way of knowing how to share threads across them, unless it looks into the UDF and decides the degree of parallelism for each query within (which could potentially change from one invocation to another). With nested and recursive UDFs, this issue becomes even more difficult to manage.

The approach of Froid, as described in the paper, will not only result in parallel plans, but also adds many more benefits for queries with UDFs. In essence, it subsumes your request for parallel execution of UDFs.

Update: Froid is now available as a feature of SQL Server 2019 preview. The feature is called "Scalar UDF Inlining". More details here: https://blogs.msdn.microsoft.com/sqlserverstorageengine/2018/11/07/introducing-scalar-udf-inlining/
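As a rough sketch of how to work with the feature on SQL Server 2019 or later (database compatibility level 150), the statements below list the scalar functions the engine considers inlineable and show the documented switches for turning inlining off and on. The table and function names in the last query are hypothetical and stand in for your own objects.

```sql
-- List scalar T-SQL functions and whether the engine considers them inlineable.
SELECT OBJECT_SCHEMA_NAME(M.object_id) AS SchemaName,
       OBJECT_NAME(M.object_id)        AS FunctionName,
       M.is_inlineable,
       M.inline_type
FROM sys.sql_modules AS M
JOIN sys.objects     AS O
    ON O.object_id = M.object_id
WHERE O.type = 'FN';   -- 'FN' = SQL scalar function

-- Disable or re-enable scalar UDF inlining for the current database.
ALTER DATABASE SCOPED CONFIGURATION SET TSQL_SCALAR_UDF_INLINING = OFF;
ALTER DATABASE SCOPED CONFIGURATION SET TSQL_SCALAR_UDF_INLINING = ON;

-- Disable inlining for a single query only.
-- dbo.Orders and dbo.GrossAmount_Scalar are hypothetical names.
SELECT O.OrderID,
       dbo.GrossAmount_Scalar(O.Amount, O.TaxRate) AS GrossAmount
FROM dbo.Orders AS O
OPTION (USE HINT('DISABLE_TSQL_SCALAR_UDF_INLINING'));
```

Comparing the execution plan of the last query with and without the hint is a simple way to see whether inlining is what makes a parallel plan possible for a given function.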

[Disclosure: I am a co-author of the Froid paper]
