Measuring the complexity of SQL statements

I'm not sure that retrieving the query plans will answer the question: query plans hide part of the complexity of the computation performed on the data before it is returned (or used in a filter), and they require a significant database to be relevant. In fact, complexity and length of execution are somewhat opposed; something like "Good, Fast, Cheap - Pick any two".

Ultimately, it's about the chances of making a mistake, or of not understanding the code I've written?

Something like (a worked example follows the list):

  • number of tables times (1
  • +1 per join expression (an extra +1 per outer join?)
  • +1 per predicate in the WHERE or HAVING clause
  • +1 per GROUP BY expression
  • +1 per UNION or INTERSECT
  • +1 per function call
  • +1 per CASE expression
  • )
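
For instance, here is that tally applied to a small made-up query (the table and column names are placeholders, and the weights are exactly the ones proposed above):

    -- Two tables, scored with the heuristic above:
    SELECT c.region,
           SUM(o.total) AS region_total        -- +1 function call (SUM)
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id   -- +1 join expression
    WHERE o.status = 'shipped'                 -- +1 predicate
      AND o.total > 100                        -- +1 predicate
    GROUP BY c.region;                         -- +1 GROUP BY expression
    -- score = 2 tables * (1 + 1 + 2 + 1 + 1) = 12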

Common measures of software complexity include Cyclomatic Complexity (a measure of how complicated the control flow is) and Halstead complexity (a measure of how complex the arithmetic is).

The "control flow" in a SQL query is best related to "and" and "or" operators in query.

The "computational complexity" is best related to operators such as SUM or implicit JOINS.

Once you've decided how to categorize each unit of SQL syntax as either "control flow" or "computation", you can straightforwardly compute Cyclomatic or Halstead measures.
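
As a sketch of that idea on SQL Server, the query below counts "control flow" tokens (AND, OR, CASE) and "computation" tokens (JOIN, SUM) in each stored procedure body read from sys.sql_modules. The keyword lists, column names, and string-replacement counting are all illustrative assumptions: the counts ignore comments, string literals, and word boundaries, but the relative ordering between procedures is what matters.

    -- Crude per-procedure counts of "control flow" vs. "computation" tokens.
    -- Counting by string replacement is only an approximation: it does not
    -- skip comments or string literals, so treat the numbers as relative.
    SELECT
        OBJECT_NAME(m.object_id) AS procedure_name,
        -- decision points: AND, OR, CASE (a cyclomatic-style count)
          (LEN(m.definition) - LEN(REPLACE(UPPER(m.definition), ' AND ', ''))) / 5
        + (LEN(m.definition) - LEN(REPLACE(UPPER(m.definition), ' OR ', ''))) / 4
        + (LEN(m.definition) - LEN(REPLACE(UPPER(m.definition), 'CASE', ''))) / 4
            AS control_flow_score,
        -- computation: JOIN, SUM (a Halstead-style operator count)
          (LEN(m.definition) - LEN(REPLACE(UPPER(m.definition), 'JOIN', ''))) / 4
        + (LEN(m.definition) - LEN(REPLACE(UPPER(m.definition), 'SUM(', ''))) / 4
            AS computation_score
    FROM sys.sql_modules AS m
    JOIN sys.objects AS o ON o.object_id = m.object_id
    WHERE o.type = 'P'                                   -- stored procedures only
    ORDER BY control_flow_score DESC, computation_score DESC;  -- most complex first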

What the SQL optimizer does to queries is, I think, absolutely irrelevant. The purpose of complexity measures is to characterize how hard a query is for a person to understand, not how efficiently it can be evaluated.

Similarly, what the DDL says, or whether views are involved, shouldn't be included in such complexity measures. The assumption behind these metrics is that the complexity of the machinery inside a used abstraction isn't interesting when you simply invoke it, because presumably that abstraction does something well understood by the coder. This is why Halstead and Cyclomatic measures don't include called subroutines in their counting, and I think you can make a good case that views and DDL information are exactly those "invoked" abstractions.

Finally, how perfectly right or wrong these complexity numbers are doesn't matter much, as long as they reflect some truth about complexity and can be compared relative to one another. That way you can sort your SQL fragments from most to least complex and focus your testing attention on the most complicated ones.


Please feel free to try my script, which gives an overview of stored procedure size, the number of object dependencies, and the number of parameters:

Calculate TSQL Stored Procedure Complexity