Function Performance

Strictly speaking, the term "stored procedures" refers to SQL procedures in Postgres, introduced with Postgres 11. Related:

  • When to use stored procedure / user-defined function?

There are also functions, which do almost but not quite the same thing, and those have been around from the beginning.

Functions with LANGUAGE sql are basically just batch files with plain SQL commands in a function wrapper (and therefore atomic: they always run inside a single transaction), accepting parameters. All statements in an SQL function are planned at once, which is subtly different from executing one statement after the other and may affect the order in which locks are taken.
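To illustrate, here is a minimal sketch of a multi-statement SQL function. The table names (`orders`, `orders_archive`) and column (`created_at`) are made up for the example; later statements see the effects of earlier ones, and the result of the last statement is returned:

```sql
-- Hypothetical example: move old rows to an archive table and
-- return the number of rows deleted. All statements run atomically.
CREATE OR REPLACE FUNCTION archive_old_orders(_cutoff date)
  RETURNS bigint
  LANGUAGE sql AS
$func$
   INSERT INTO orders_archive
   SELECT * FROM orders WHERE created_at < _cutoff;

   WITH del AS (
      DELETE FROM orders WHERE created_at < _cutoff
      RETURNING 1
   )
   SELECT count(*) FROM del;   -- result of the last statement is returned
$func$;
```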

For anything more, the most mature language is PL/pgSQL (LANGUAGE plpgsql). It works well and has been improved with every release over the last decade, but it serves best as glue for SQL commands. It is not meant for heavy computations (other than with SQL commands).
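A typical "glue" use case might look like this sketch - an upsert that returns the new value. The table `counters` and its columns are assumptions for the example:

```sql
-- Hypothetical sketch: PL/pgSQL as thin glue around a single SQL command.
CREATE OR REPLACE FUNCTION bump_counter(_key text)
  RETURNS bigint
  LANGUAGE plpgsql AS
$func$
DECLARE
   _new_count bigint;
BEGIN
   INSERT INTO counters (key, count)
   VALUES (_key, 1)
   ON     CONFLICT (key)
   DO     UPDATE SET count = counters.count + 1
   RETURNING count INTO _new_count;   -- capture the updated value

   RETURN _new_count;
END
$func$;
```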

PL/pgSQL functions execute queries like prepared statements. Re-using cached query plans cuts some planning overhead and makes them a bit faster than equivalent plain SQL statements, which may be a noticeable effect depending on circumstances. It may also have side effects, as in this related question:

  • PL/pgSQL issues when function used twice (caching problem ?)

This carries the advantages and disadvantages of prepared statements, as discussed in the manual. For queries on tables with irregular data distribution and varying parameters, dynamic SQL with EXECUTE may perform better when the gain from an optimized execution plan for the given parameter(s) outweighs the cost of re-planning.
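A sketch of that technique, assuming a hypothetical table `big_tbl` with a skewed distribution in `category`. With EXECUTE ... USING, the statement is planned afresh on every call, so the planner sees the actual parameter value:

```sql
-- Sketch: force re-planning per call with dynamic SQL.
CREATE OR REPLACE FUNCTION rows_for_category(_cat text)
  RETURNS bigint
  LANGUAGE plpgsql AS
$func$
DECLARE
   _ct bigint;
BEGIN
   EXECUTE 'SELECT count(*) FROM big_tbl WHERE category = $1'
   INTO  _ct
   USING _cat;      -- one-shot plan: the actual value is visible to the planner
   RETURN _ct;
END
$func$;
```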

Since Postgres 9.2, generic execution plans are still cached for the session, but, quoting the manual:

This occurs immediately for prepared statements with no parameters; otherwise it occurs only after five or more executions produce plans whose estimated cost average (including planning overhead) is more expensive than the generic plan cost estimate.

We get the best of both worlds most of the time (less some added overhead) without (ab)using EXECUTE. Details in What's new in PostgreSQL 9.2 of the PostgreSQL Wiki.

Postgres 12 introduces the additional server variable plan_cache_mode to force generic or custom plans. For special cases, use with care.
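For completeness, a sketch of how that setting is used (the table `tbl` is a placeholder):

```sql
-- Sketch: override the automatic plan choice (Postgres 12+).
SET plan_cache_mode = force_custom_plan;   -- always re-plan with actual values
-- other values: force_generic_plan, auto (the default)

PREPARE my_stmt(int) AS SELECT * FROM tbl WHERE id = $1;
EXECUTE my_stmt(1);

RESET plan_cache_mode;
```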

You can win big with server-side functions that prevent additional round trips to the database server from your application. Have the server execute as much as possible at once and only return a well-defined result.

Avoid nesting complex functions, especially table functions (RETURNS SETOF record or TABLE (...)). Functions are black boxes that act as optimization barriers to the query planner. They are optimized separately, not in the context of the outer query, which makes planning simpler but may result in less than perfect plans. Also, the cost and result size of functions cannot be predicted reliably.

The exception to this rule are simple SQL functions (LANGUAGE sql), which can be "inlined" - if some preconditions are met. Read more about how the query planner works in this presentation by Neil Conway (advanced stuff).
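A sketch of an inlinable function, assuming a hypothetical table `users` with a boolean column `active`. A single SELECT, no side effects, not volatile - so the planner can merge it into the outer query:

```sql
-- Sketch: a simple SQL function the planner can inline.
CREATE OR REPLACE FUNCTION active_users()
  RETURNS SETOF users
  LANGUAGE sql STABLE AS
'SELECT * FROM users WHERE active';

-- The WHERE clause of the outer query can be pushed into the function body:
SELECT * FROM active_users() WHERE created_at > '2020-01-01';
```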

In PostgreSQL, a function always automatically runs inside a single transaction. It succeeds as a whole or not at all. If an exception occurs, everything is rolled back. But there is error handling ...
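A minimal sketch of such error handling: a BEGIN ... EXCEPTION block sets an implicit savepoint, so only the work inside the block is rolled back when the handler fires:

```sql
-- Sketch: trap a specific error condition inside a function.
CREATE OR REPLACE FUNCTION safe_divide(_a numeric, _b numeric)
  RETURNS numeric
  LANGUAGE plpgsql AS
$func$
BEGIN
   RETURN _a / _b;
EXCEPTION
   WHEN division_by_zero THEN
      RETURN NULL;   -- roll back to the implicit savepoint, return NULL instead
END
$func$;
```

Note that an EXCEPTION clause makes the block noticeably more expensive, so only use it where you actually need to trap errors.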

That's also why functions are not exactly "stored procedures" (even though that term is used sometimes, misleadingly). Some commands like VACUUM, CREATE INDEX CONCURRENTLY or CREATE DATABASE cannot run inside a transaction block, so they are not allowed in functions. (Neither in SQL procedures, yet, as of Postgres 11. That might be added later.)

I have written thousands of plpgsql functions over the years.


Some DO's:

  • Use SQL as the function language when possible, as PG can inline the statements
  • Use IMMUTABLE / STABLE / VOLATILE correctly, as PG can avoid repeated evaluation of immutable or stable functions (e.g. pre-evaluating immutable calls with constant arguments at plan time)
  • Use STRICT correctly, as PG can simply return NULL without running the function if any input is NULL
  • Consider PL/V8 when you can't use SQL as the function language. It is faster than PL/pgSQL in some unscientific tests that I ran
  • Use LISTEN / NOTIFY for longer-running processes that can happen out-of-transaction
  • Consider using functions to implement pagination as key-based pagination can be faster than LIMIT based pagination
  • Make sure you unit-test your functions
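The pagination point above can be sketched like this, assuming a hypothetical table `posts` with a unique, indexed `id` column. Keyset ("seek") pagination walks the index instead of skipping OFFSET rows:

```sql
-- Sketch: keyset pagination wrapped in a function.
CREATE OR REPLACE FUNCTION next_page(_after_id bigint, _page_size int = 20)
  RETURNS SETOF posts
  LANGUAGE sql STABLE AS
$func$
   SELECT *
   FROM   posts
   WHERE  id > _after_id    -- seek past the last row of the previous page
   ORDER  BY id
   LIMIT  _page_size;
$func$;

-- Usage: pass the last id seen by the client, e.g.
-- SELECT * FROM next_page(1234);
```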

Generally speaking, moving application logic into the database will mean it is faster - after all, it will be running closer to the data.

I believe (but am not 100% sure) that SQL language functions are faster than those using any other languages because they do not require context switching. The downside is that no procedural logic is allowed.

PL/pgSQL is the most mature and feature-complete of the built-in languages - but for performance, C can be used (though it will only benefit computationally intensive functions).