Why is array_agg() slower than the non-aggregate ARRAY() constructor?

There is nothing "old school" or "outdated" about an ARRAY constructor (That's what ARRAY(SELECT x FROM foobar) is). It's modern as ever. Use it for simple array aggregation.

The manual:

It is also possible to construct an array from the results of a subquery. In this form, the array constructor is written with the key word ARRAY followed by a parenthesized (not bracketed) subquery.

The aggregate function array_agg() is more versatile in that it can be integrated in a SELECT list with more columns, possibly more aggregations in the same SELECT, and arbitrary groups can be formed with GROUP BY. While an ARRAY constructor can only return a single array from a SELECT returning a single column.

I did not study the source code, but it would seem obvious that a much more versatile tool is also more expensive.

One notable difference: the ARRAY constructor returns an empty array ({}) if no rows qualify. array_agg() returns NULL for the same.

I believe the accepted answer by Erwin could be added with the following.

Usually, we are working with regular tables with indices, instead of temporary tables (without indices) as in the original question. It's useful to note that aggregations, such as ARRAY_AGG, cannot leverage existing indices when the sorting is done during the aggregation.

For example, assume the following query:

SELECT ARRAY(SELECT c FROM t ORDER BY id)

If we have an index on t(id, ...), the index could be used, in favor of a sequential scan on t followed by a sort on t.id. Additionally, if the output column being wrapped in the array (here c) is part of the index (such as an index on t(id, c) or an include index on t(id) include(c)), this could even be an index-only scan.

Now, let's rewrite that query as following:

SELECT ARRAY_AGG(c ORDER BY id) FROM t

Now, the aggregation will not use the index and it has to sort the rows in memory (or even worse for large data sets, on disk). This will always be a sequential scan on t followed by aggregation+sort.

As far as I know, this is not documented in the official documentation, but can be derived from the source. This should be the case for all current versions, v11 included.

Why is array_agg() slower than the non-aggregate ARRAY() constructor?

Tags:

Postgresql

Aggregate

Array

Benchmark

Related

Recent Posts