How to convert a JSON file to an SQLite database

A way do this without CSV or a 3rd party tool is to use the JSON1 extension of SQLite combined with the readfile extension that is provided in the sqlite3 CLI tool. As well as overall being a "more direct" solution, this has the advantage of handling JSON NULL values more consistently than CSV, which will otherwise import them as empty strings.

If the input file is a well-formed JSON file, e.g. the example given as an array:

[
{"uri":"/","user_agent":"example1"},
{"uri":"/foobar","user_agent":"example1"},
{"uri":"/","user_agent":"example2"},
{"uri":"/foobar","user_agent":"example3"}
]

Then this can be read into the corresponding my_table table as follows. Open the SQLite database file my_db.db using the sqlite3 CLI:

sqlite3 my_db.db

then create my_table using:

CREATE TABLE my_table(uri TEXT, user_agent TEXT);

Finally, the JSON data in my_data.json can be inserted into the table with the CLI command:

INSERT INTO my_table SELECT 
  json_extract(value, '$.uri'), 
  json_extract(value, '$.user_agent')
FROM json_each(readfile('my_data.json'));

If the initial JSON file is newline separated JSON elements, then this can be converted first using jq using:

jq -s <my_data_raw.json >my_data.json

It's likely there is a way to do this directly in SQLite using JSON1, but I didn't pursue that given that I was already using jq to massage the data prior to import to SQLite.


sqlitebiter appears to provide a python solution:

A CLI tool to convert CSV/Excel/HTML/JSON/LTSV/Markdown/SQLite/TSV/Google-Sheets to a SQLite database file. http://sqlitebiter.rtfd.io/

docs: http://sqlitebiter.readthedocs.io/en/latest/

project: https://github.com/thombashi/sqlitebiter

  • last update approximately 3 months ago
  • last issue closed approximately 1 month ago, none open
  • noted today, 2018-03-14

I found the easiest way to do this is by using jq and CSV as an intermediary format.

Edit: As pointed out (thanks @Leo), the original question did show newline delimited JSON objects, which each on their own conform to rfc4627, but not all together in that format. jq can handle a single JSON array of objects much the same way though by preprocessing the file using jq '.[]' <input.json >preprocessed.json. If you happen to be dealing with JSON text sequences (rfc7464) luckily jq has got your back too with the --seq parameter.

Edit 2: Both the newline separated JSON and the JSON text sequences have one important advantage; they reduce memory requirements down to O(1), meaning your total memory requirement is only dependent on your longest line of input, whereas putting the entire input in a single array requires that either your parser can handle late errors (i.e. after the first 100k elements there's a syntax error), which generally isn't the case to my knowledge, or it will have to parse the entire file twice (first validating syntax, then parsing, in the process discarding previous elements, as is the case with jq --stream) which also happens rarely to my knowledge, or it will try to parse the whole input at once and return the result in one step (think of receiving a Python dict which contains the entirety of your say 50G input data plus overhead) which is usually memory backed, hence raising your memory footprint by just about your total data size.

Edit 3: If you hit any obstacles, try using keys_unsorted instead of keys. I haven't tested that myself (I kind of assume my columns were already sorted), however @Kyle Barron reports that this was needed.

Getting the CSV

First write your data to a file. I will assume data.json here.

Then construct the header using jq:

% head -1 data.json | jq -r 'keys | @csv'
"uri","user_agent"

The head -1 is because we only want one line. jq's -r makes the output a plain string instead of a JSON-String wrapping the CSV. We then call the internal function keys to get the keys of the input as an array. This we send to the @csv formatter which outputs us a single string with the headers in quoted CSV format.

We then need to construct the data.

% jq -r '[.[]] | @csv' < data.json
"/","example1"
"/foobar","example1"
"/","example2"
"/foobar","example3"

We now take the whole input and deconstruct the associative array (map) using .[] and then put it back into a simple array […]. This basically converts our dictionary to an array of keys. Sent to the @csv formatter, we again get some CSV.

Putting it all together we get a single one-liner in the form of:

% (head -1 data.json | jq -r 'keys | @csv' && jq -r '[.[]] | @csv' < data.json) > data.csv

If you need to convert the data on the fly, i.e. without a file, try this:

% cat data.json | (read -r first && jq -r '(keys | @csv),( [.[]] | @csv)' <<<"${first}" && jq -r '[.[]] | @csv')

Loading it into SQLite

Open an SQLite database:

sqlite3 somedb.sqlite

Now in the interactive shell do the following (assuming you wrote the CSV to data.csv and want it in a table called my_table):

.mode csv
.import data.csv my_table

Now close the shell and open it again for a clean environment. You can now easily SELECT from the database and do whatever you want to.

Putting it all together

Have an asciinema recording right there:

asciicast

Tags:

Sqlite

Json