How to unnest and GROUP BY elements of a JSON array?

The data type of the column people is json, as is the result of json_array_elements(people). And there is no equality operator (=) for the data type json. So you also cannot run GROUP BY on it. More:

How to remove known elements from a JSON[] array in PostgreSQL?

jsonb has an equality operator, so the "workaround" in your answer is to cast to jsonb and use the equivalent jsonb_array_elements(). The cast adds cost:

jsonb_array_elements(people::jsonb)

Since Postgres 9.4 we also have json_array_elements_text(json) returning array elements as text. Related:

How to turn JSON array into Postgres array?

So:

SELECT p.name, count(*) AS c
FROM   band b, json_array_elements_text(b.people) p(name)
GROUP  BY p.name;

It seems more convenient to get names as text instead of jsonb objects (double-quoted in text representation) and your "desired output" indicates you want / need text in the result to begin with.

GROUP BY on text data is also cheaper than on jsonb, so this alternative "workaround" should be faster for two reasons. (Test with EXPLAIN (ANALYZE, TIMING OFF).)

For the record, there was nothing wrong with your original answer. The comma (,) is just as "correct" as CROSS JOIN LATERAL. Having been defined earlier in standard SQL does no make it inferior. See:

What does [FROM x, y] mean in Postgres?

Neither is it more portable to other RDBMS, and since jsonb_array_elements() or json_array_elements_text() aren't portable to other RDBMS to begin with, that's also irrelevant. The short query does not get any clearer with CROSS JOIN LATERAL IMO, but the last bit is just my personal opinion.

I used the more explicit table and column alias p(name) and the table-qualified reference p.name to defend against possible duplicate names. name is such a common word, it might also pop up as column name in the underlying table band, in which case it would silently resolve to band.name. The simple form json_array_elements_text(people) name only attaches a table alias, the column name is still value, as returned from the function. But name resolves to it's single column value when used in the SELECT list. It happens to work as expected. But a true column name name (if band.name should exist) would bind first. While that won't bite in the given example, it can be a a loaded foot gun in other cases.

Don't use the generic "name" as identifier to begin with. Maybe that was just for the simple test case.

If the column people can hold anything but a plain JSON array, either query would trigger an exception. If you cannot guarantee data integrity, you might want to defend with json_typeof():

SELECT p.name, count(*) AS c
FROM   band b, json_array_elements_text(b.people) p(name)
WHERE  json_typeof(b.people) = 'array'
GROUP  BY 1; -- optional short syntax since you seem to prefer short syntax

Excludes violating rows from the query.

When can we use an identifier number instead of its name in PostgreSQL?

Based on @ypercubeᵀᴹ comment I ended up with :

SELECT name, count(*) as c
FROM band 
CROSS JOIN LATERAL jsonb_array_elements(people::jsonb) as name
GROUP BY name;

Just used jsonb_array_elements instead of unnest.

How to unnest and GROUP BY elements of a JSON array?

Tags:

Postgresql

Json

Group By

Array

Related

Recent Posts