How can I import a JSON file into PostgreSQL?

In near big-data cases, the most efficient way to import JSON from a file, without using any external tool, appears to be not importing a single JSON document but rather a single-column CSV: a list of one-line JSON objects:

data.json.csv:

{"id": 23635,"name": "Jerry Green","comment": "Imported from facebook."}
{"id": 23636,"name": "John Wayne","comment": "Imported from facebook."}

then, under psql:

create table t ( j jsonb );
\copy t from 'd:\path\data.json.csv'

One record per JSON object (i.e. per line) will be added to the t table.

"\copy from" import was made for csv, and as such loads data line by line. As a result reading one json per line rather than a single json array to be later splited, will not use any intermediate table and will achieve high throughput.

On top of that, you are less likely to hit the maximum input line-size limitation that would arise if your input JSON file were too big.

I would thus first convert your input into a single-column CSV and then import it using the copy command.
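
Once the file is loaded, the jsonb rows can be unpacked into typed columns in a second step. A minimal sketch, assuming a hypothetical target table customer (id integer, name text, comment text):

insert into customer (id, name, comment)
select (j ->> 'id')::integer,
       j ->> 'name',
       j ->> 'comment'
from t;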


You can feed the JSON into a SQL statement that extracts the information and inserts it into the table. If the JSON attributes have exactly the same names as the table columns, you can do something like this:

with customer_json (doc) as (
   values 
    ('[
      {
        "id": 23635,
        "name": "Jerry Green",
        "comment": "Imported from facebook."
      },
      {
        "id": 23636,
        "name": "John Wayne",
        "comment": "Imported from facebook."
      }
    ]'::json)
)
insert into customer (id, name, comment)
select p.*
from customer_json l
  cross join lateral json_populate_recordset(null::customer, doc) as p
on conflict (id) do update 
  set name = excluded.name, 
      comment = excluded.comment;

New customers will be inserted, existing ones will be updated. The "magic" part is json_populate_recordset(null::customer, doc), which generates a relational representation of the JSON objects.
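
To see what that function returns on its own, you can run it against a small literal array (the values here are made up purely for illustration):

select *
from json_populate_recordset(null::customer,
       '[{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]');

Each object becomes one row typed like the customer table; attributes missing from the JSON (here: comment) come back as NULL.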


The above assumes a table definition like this:

create table customer 
(
  id        integer primary key,
  name      text not null,
  comment   text
);

If the data is provided as a file, you first need to load that file into a table in the database. Something like this:

create unlogged table customer_import (doc json);

Then upload the file into a single row of that table, e.g. using the \copy command in psql (or whatever your SQL client offers):

\copy customer_import from 'customers.json' ....
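
The options to use depend on your file and are left open above. As one possible sketch, if the whole JSON document happens to sit on a single line, CSV format with a quote and delimiter character that cannot appear in the data loads that line verbatim into one row (the control characters below are an assumption, not part of the original answer):

\copy customer_import from 'customers.json' with (format csv, quote e'\x01', delimiter e'\x02')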

Then you can use the above statement; just remove the CTE and use the staging table instead:

insert into customer (id, name, comment)
select p.*
from customer_import l
  cross join lateral json_populate_recordset(null::customer, doc) as p
on conflict (id) do update 
  set name = excluded.name, 
      comment = excluded.comment;

It turns out there's an easy way to import a multi-line JSON object into a JSON column in a PostgreSQL database using the command-line psql tool, without needing to explicitly embed the JSON in the SQL statement. The technique is documented in the PostgreSQL docs, but it's a bit hidden.

The trick is to load the JSON into a psql variable using backticks. For example, given a multi-line JSON file in /tmp/test.json such as:

{
  "dog": "cat",
  "frog": "frat"
}

We can use the following SQL to load it into a temporary table:

sql> \set content `cat /tmp/test.json`
sql> create temp table t ( j jsonb );
sql> insert into t values (:'content');
sql> select * from t;

which gives the result:

               j                
────────────────────────────────
 {"dog": "cat", "frog": "frat"}
(1 row)

You can also perform operations on the data directly:

sql> select :'content'::jsonb -> 'dog';
 ?column? 
──────────
 "cat"
(1 row)

Under the covers this is just embedding the JSON in the SQL, but it's a lot neater to let psql perform the interpolation itself.
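
The variable technique also combines naturally with the json_populate_recordset approach shown earlier. A sketch, assuming a file /tmp/customers.json containing a JSON array and the customer table defined above:

\set content `cat /tmp/customers.json`
insert into customer (id, name, comment)
select p.*
from json_populate_recordset(null::customer, :'content'::json) as p
on conflict (id) do update
  set name = excluded.name,
      comment = excluded.comment;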