PostgreSQL joining using JSONB

This would be more efficient:

With json and json_array_elements() in pg 9.3

SELECT p.id AS p_id, p.data AS p_data
     , c.id AS c_id, c.data AS c_data
FROM   test p
LEFT   JOIN LATERAL json_array_elements(p.data->'children') pc(child) ON TRUE
LEFT   JOIN test c ON c.id = pc.child::text::int;
  • Use the -> operator instead of ->> in the reference to children. ->> returns text, so the way you have it, you first convert json / jsonb to text and then cast it back to json.

  • The clean way to call a set-returning function is in the FROM list with LEFT [OUTER] JOIN LATERAL ... ON TRUE. This keeps rows without children. To exclude those, change to [INNER] JOIN LATERAL ... ON TRUE or CROSS JOIN, or use the shorthand syntax with a comma:

      , json_array_elements(p.data->'children') pc(child)
    
  • Use column aliases to avoid duplicate column names in the result.
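Spelled out in full, the inner-join variant could look like this. It is a sketch: the CTE named `test` only stands in for the real table (it shadows a table of the same name within this query), and its five rows are assumed from the sample output further down. Parents whose `children` array is empty (nodes 3, 4 and 5) produce no row at all.

```sql
-- The CTE stands in for the real table "test" (sample rows assumed from the
-- output further down); remove it to run against the actual table.
WITH test(id, data) AS (
   VALUES (1, '{"parent": null, "children": [2, 3]}'::json)
        , (2, '{"parent": 1, "children": [4, 5]}')
        , (3, '{"parent": 1, "children": []}')
        , (4, '{"parent": 2, "children": []}')
        , (5, '{"parent": 2, "children": []}')
   )
SELECT p.id AS p_id, p.data AS p_data
     , c.id AS c_id, c.data AS c_data
FROM   test p
CROSS  JOIN LATERAL json_array_elements(p.data->'children') pc(child)  -- inner join: empty arrays drop the parent
LEFT   JOIN test c ON c.id = pc.child::text::int;
```

CROSS JOIN LATERAL is used here instead of the comma so the join order reads left to right together with the following LEFT JOIN.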

SQL Fiddle.
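To reproduce the queries locally instead of in the fiddle, a minimal setup might look like this. The column types (`id int`, `data json`) are an assumption inferred from the queries and casts above, and the rows are taken from the sample output further down. A temporary table avoids touching any existing `test` table; it shadows a permanent table of the same name for the rest of the session.

```sql
-- Assumed schema: integer id plus a json document with "parent" and "children".
-- TEMP: the table lives only for this session and shadows any permanent "test".
CREATE TEMP TABLE test (
   id   int PRIMARY KEY
 , data json
);

INSERT INTO test (id, data) VALUES
   (1, '{"parent": null, "children": [2, 3]}')
 , (2, '{"parent": 1, "children": [4, 5]}')
 , (3, '{"parent": 1, "children": []}')
 , (4, '{"parent": 2, "children": []}')
 , (5, '{"parent": 2, "children": []}');
```

With this data, the LEFT JOIN LATERAL query above returns seven rows: four parent/child pairs plus one row each for the childless nodes 3, 4 and 5, with NULL in the child columns.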

With jsonb and jsonb_array_elements() in pg 9.4

EXPLAIN 
SELECT p.id AS p_id, p.data AS p_data
     , c.id AS c_id, c.data AS c_data
FROM   test p
LEFT   JOIN LATERAL jsonb_array_elements(p.data->'children') pc(child) ON TRUE
LEFT   JOIN test c ON c.id = pc.child::text::int;
                                         QUERY PLAN
-------------------------------------------------------------------------------------------
 Hash Left Join  (cost=37.69..4826.24 rows=123000 width=72)
   Hash Cond: (((pc.child)::text)::integer = c.id)
   ->  Nested Loop Left Join  (cost=0.01..2482.31 rows=123000 width=68)
         ->  Seq Scan on test p  (cost=0.00..22.30 rows=1230 width=36)
         ->  Function Scan on jsonb_array_elements pc  (cost=0.01..1.01 rows=100 width=32)
   ->  Hash  (cost=22.30..22.30 rows=1230 width=36)
         ->  Seq Scan on test c  (cost=0.00..22.30 rows=1230 width=36)
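With jsonb in 9.4 there is also `jsonb_array_elements_text()`, which returns the array elements as `text` directly, so the intermediate `::text` cast goes away. A sketch, again with a CTE standing in for the table (this time with a `jsonb` column) and rows assumed from the sample output further down:

```sql
-- The CTE stands in for the real table; data is jsonb here.
WITH test(id, data) AS (
   VALUES (1, '{"parent": null, "children": [2, 3]}'::jsonb)
        , (2, '{"parent": 1, "children": [4, 5]}')
        , (3, '{"parent": 1, "children": []}')
        , (4, '{"parent": 2, "children": []}')
        , (5, '{"parent": 2, "children": []}')
   )
SELECT p.id AS p_id, p.data AS p_data
     , c.id AS c_id, c.data AS c_data
FROM   test p
LEFT   JOIN LATERAL jsonb_array_elements_text(p.data->'children') pc(child) ON TRUE
LEFT   JOIN test c ON c.id = pc.child::int;  -- child is already text: one cast instead of two
```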

Aside: A normalized DB design with basic data types would be way more efficient for this.
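To illustrate the aside, here is a minimal sketch of such a design, using a hypothetical table `node` in which each row stores its parent as a plain integer column with a foreign key. Children are then found with an ordinary, indexable join instead of unnesting a JSON array for every row. The table and column names are made up for illustration and are not from the question.

```sql
-- Hypothetical normalized schema: one row per node; the parent link is a plain
-- integer with a self-referencing foreign key, and children are derived.
CREATE TEMP TABLE node (
   id        int PRIMARY KEY
 , parent_id int REFERENCES node (id)
);
CREATE INDEX ON node (parent_id);  -- supports looking up the children of a node

-- Same tree as the sample data
INSERT INTO node (id, parent_id) VALUES
   (1, NULL), (2, 1), (3, 1), (4, 2), (5, 2);

-- Parent/child pairs, equivalent to the JSON-based LEFT JOIN above
SELECT p.id AS p_id, c.id AS c_id
FROM   node p
LEFT   JOIN node c ON c.parent_id = p.id
ORDER  BY p.id, c.id;
```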


Never mind, I found a way:

SELECT *
 FROM ( SELECT *, json_array_elements((data->>'children')::JSON) child FROM test) x1
   LEFT JOIN test x2
    ON x1.child::TEXT::INT = x2.id
;

 id |                 data                 | child | id |               data
----+--------------------------------------+-------+----+-----------------------------------
  1 | {"parent": null, "children": [2, 3]} | 2     |  2 | {"parent": 1, "children": [4, 5]}
  1 | {"parent": null, "children": [2, 3]} | 3     |  3 | {"parent": 1, "children": []}
  2 | {"parent": 1, "children": [4, 5]}    | 4     |  4 | {"parent": 2, "children": []}
  2 | {"parent": 1, "children": [4, 5]}    | 5     |  5 | {"parent": 2, "children": []}

                                                QUERY PLAN                                                 
-----------------------------------------------------------------------------------------------------------
 Hash Left Join  (cost=37.67..4217.38 rows=123000 width=104)
   Hash Cond: ((((json_array_elements(((test.data ->> 'children'::text))::json)))::text)::integer = x2.id)
   ->  Seq Scan on test  (cost=0.00..643.45 rows=123000 width=36)
   ->  Hash  (cost=22.30..22.30 rows=1230 width=36)
         ->  Seq Scan on test x2  (cost=0.00..22.30 rows=1230 width=36)
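Note that this version is not equivalent to the LEFT JOIN LATERAL query at the top: a set-returning function in the SELECT list produces no row for a parent with an empty `children` array, which is why nodes 3, 4 and 5 never appear on the left side of the output. A quick way to see the difference, sketched with a CTE holding the same sample rows (and using -> instead of the ->> / ::JSON round trip):

```sql
-- The CTE stands in for the real table with the sample rows shown above.
WITH test(id, data) AS (
   VALUES (1, '{"parent": null, "children": [2, 3]}'::json)
        , (2, '{"parent": 1, "children": [4, 5]}')
        , (3, '{"parent": 1, "children": []}')
        , (4, '{"parent": 2, "children": []}')
        , (5, '{"parent": 2, "children": []}')
   )
SELECT (SELECT count(*)
        FROM  (SELECT json_array_elements(data->'children') AS child
               FROM   test) x) AS srf_in_select_list  -- 4: empty arrays yield no row
     , (SELECT count(*)
        FROM   test p
        LEFT   JOIN LATERAL json_array_elements(p.data->'children') pc(child) ON TRUE
       ) AS left_join_lateral;                         -- 7: every parent kept at least once
```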

or

SELECT *
 FROM test x1
    LEFT JOIN ( SELECT *, json_array_elements((data->>'children')::JSON) child FROM test) x2
    ON x1.id = x2.child::TEXT::INT
;

 id |                 data                 | id |                 data                 | child 
----+--------------------------------------+----+--------------------------------------+-------
  2 | {"parent": 1, "children": [4, 5]}    |  1 | {"parent": null, "children": [2, 3]} | 2
  3 | {"parent": 1, "children": []}        |  1 | {"parent": null, "children": [2, 3]} | 3
  4 | {"parent": 2, "children": []}        |  2 | {"parent": 1, "children": [4, 5]}    | 4
  5 | {"parent": 2, "children": []}        |  2 | {"parent": 1, "children": [4, 5]}    | 5
  1 | {"parent": null, "children": [2, 3]} |    |                                      | 

                                                QUERY PLAN                                                 
-----------------------------------------------------------------------------------------------------------
 Hash Right Join  (cost=37.67..4217.38 rows=123000 width=104)
   Hash Cond: ((((json_array_elements(((test.data ->> 'children'::text))::json)))::text)::integer = x1.id)
   ->  Seq Scan on test  (cost=0.00..643.45 rows=123000 width=36)
   ->  Hash  (cost=22.30..22.30 rows=1230 width=36)
         ->  Seq Scan on test x1  (cost=0.00..22.30 rows=1230 width=36)
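Since each document in the sample also carries a `"parent"` key, the child-to-parent direction of this second query could alternatively be written as a plain join on that key, with no unnesting at all. This is a sketch that assumes `parent` is always kept consistent with the `children` arrays, which nothing in the schema enforces; the CTE again stands in for the table with the sample rows.

```sql
-- The CTE stands in for the real table with the sample rows shown above.
WITH test(id, data) AS (
   VALUES (1, '{"parent": null, "children": [2, 3]}'::json)
        , (2, '{"parent": 1, "children": [4, 5]}')
        , (3, '{"parent": 1, "children": []}')
        , (4, '{"parent": 2, "children": []}')
        , (5, '{"parent": 2, "children": []}')
   )
SELECT x1.id, x1.data
     , x2.id AS parent_id, x2.data AS parent_data
FROM   test x1
LEFT   JOIN test x2 ON x2.id = (x1.data->>'parent')::int;  -- JSON null becomes SQL NULL: no match, row kept
```

It returns the same five pairs as the query above, including node 1 without a parent.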