Reading large JSON file in Deno

I think a package like stream-json would be as useful on Deno as it is on Node.js, so one option is to take the source code of that package and port it to Deno. (This answer will probably be outdated soon: plenty of people do such ports, and it won't take long until someone, maybe you, publishes a version that can be imported into any Deno script.)

Alternatively, although this doesn't directly answer your question, a common pattern for handling large JSON data sets is to store them as files that contain one JSON object per line, separated by newlines. Hadoop, Spark, AWS S3 Select, and probably many other tools use this format. If you can get your input data in that format, you will be able to use many more tools. You could then also stream the data with the readString('\n') method of BufReader in Deno's standard library: https://github.com/denoland/deno_std/blob/master/io/bufio.ts
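To make the one-object-per-line format concrete, here is a minimal sketch that turns an in-memory array into newline-delimited JSON. The function name `toNdjson` is made up for illustration. It relies only on `JSON.stringify`, which (when called without an indent argument) escapes newlines inside strings, so each record stays on a single line.

```typescript
// Hypothetical helper: serialize records as newline-delimited JSON,
// one object per line, each line terminated by "\n".
function toNdjson(records: Record<string, unknown>[]): string {
  // Without an indent argument, JSON.stringify emits no raw newlines;
  // a newline inside a string value is written as the two characters \n.
  return records.map((record) => JSON.stringify(record) + "\n").join("");
}

const sample = toNdjson([
  { id: 1, title: "first\nrecord" },
  { id: 2, title: "second record" },
]);
console.log(sample);
```

Each line of the result can then be parsed on its own, which is what makes the format stream-friendly.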

Streaming line by line with readString('\n') has the additional advantage of depending less on third-party packages. Example code:

    import { BufReader } from "https://deno.land/std/io/bufio.ts";

    async function stream_file(filename: string) {
        const file = await Deno.open(filename);
        const bufReader = new BufReader(file);
        console.log('Reading data...');
        let line: string | null;
        let lineCount: number = 0;
        // Since Deno 1.0, readString() resolves to null at end of file (Deno.EOF was removed).
        while ((line = await bufReader.readString('\n')) !== null) {
            lineCount++;
            // do something with `line` (it still includes the trailing '\n').
        }
        file.close();
        console.log(`${lineCount} lines read.`);
    }
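For newline-delimited JSON, the "do something with `line`" step is typically a `JSON.parse` call. Here is a minimal sketch, assuming each non-blank line holds one complete JSON value. `parseJsonLine` is a hypothetical helper name, not part of Deno's standard library.

```typescript
// Hypothetical helper: parse one line of newline-delimited JSON.
// Returns undefined for blank lines so callers can skip them.
function parseJsonLine(line: string): unknown {
  // A line reader that keeps the delimiter hands over "...\n" (or "...\r\n"),
  // so trim surrounding whitespace before parsing.
  const trimmed = line.trim();
  if (trimmed === "") return undefined;
  return JSON.parse(trimmed); // throws SyntaxError on a malformed line
}

// Example input: the kind of strings a line reader would produce.
const lines = ['{"id":1}\n', "\n", '{"id":2,"tags":["a","b"]}\r\n'];
const parsed = lines
  .map((line) => parseJsonLine(line))
  .filter((value) => value !== undefined);
console.log(parsed.length);
```

Because a malformed line throws, a caller that wants to tolerate bad records would wrap the call in `try`/`catch` and decide whether to skip or abort.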

This is the code I used for a file with 13,147,089 lines of text. Note that it is the same as Roberts's code, except that it uses readLine() instead of readString('\n'). Keep in mind that readLine() is a low-level line-reading primitive; most callers should use readString('\n') or a Scanner instead.

    import { BufReader } from "https://deno.land/std/io/bufio.ts";

    export async function stream_file(filename: string) {
      const file = await Deno.open(filename);
      const bufReader = new BufReader(file);
      console.log("Reading data...");
      // Since Deno 1.0, readLine() resolves to { line: Uint8Array, more: boolean },
      // or to null at end of file (Deno.EOF was removed).
      let result;
      let lineCount: number = 0;
      while ((result = await bufReader.readLine()) !== null) {
        lineCount++;
        // do something with `result.line` (raw bytes, without the line ending).
      }
      file.close();
      console.log(`${lineCount} lines read.`);
    }
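Unlike `readString('\n')`, `readLine()` does not hand back a string. In the Deno 1.x standard library it resolves to an object with a `line` byte array and a `more` flag, which is set when the line did not fit into the reader's buffer. Below is a minimal sketch of reassembling full lines from such results. It is written against a local `LineChunk` type that mirrors that shape (an assumption worth checking against the std version in use), and `LineAssembler` is a hypothetical name.

```typescript
// Local stand-in for the shape returned by BufReader.readLine()
// (assumed here to be { line: Uint8Array, more: boolean }).
interface LineChunk {
  line: Uint8Array;
  more: boolean;
}

// Hypothetical helper: collects partial chunks until a full line is available.
class LineAssembler {
  private decoder = new TextDecoder();
  private pending = "";

  // Returns the complete line once `more` is false, otherwise undefined.
  push(chunk: LineChunk): string | undefined {
    // stream: true keeps a multi-byte character intact when it is split
    // across two chunks; the final chunk of a line flushes the decoder.
    this.pending += this.decoder.decode(chunk.line, { stream: chunk.more });
    if (chunk.more) return undefined;
    const full = this.pending;
    this.pending = "";
    return full;
  }
}

const encoder = new TextEncoder();
const assembler = new LineAssembler();
console.log(assembler.push({ line: encoder.encode('{"id":'), more: true })); // undefined
console.log(assembler.push({ line: encoder.encode("1}"), more: false }));    // {"id":1}
```

For short lines `more` is always false, so the assembler simply decodes each chunk. It only matters for lines longer than the reader's buffer.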

Circling back on this now that Deno 1.0 is out, in case anyone else is interested in doing something like this. I was able to piece together a small class that works for my use case. It's not nearly as robust as something like the stream-json package, but it handles large JSON arrays of objects just fine.

    import { EventEmitter } from "https://deno.land/std/node/events.ts";

    export class JSONStream extends EventEmitter {

        private openBraceCount = 0;
        private inString = false; // true while inside a JSON string literal
        private escaped = false;  // true right after a backslash inside a string
        private tempUint8Array: number[] = [];
        private decoder = new TextDecoder();

        constructor (private filepath: string) {
            super();
            this.stream();
        }

        async stream() {
            console.time("Run Time");
            let file = await Deno.open(this.filepath);
            //creates iterator from reader, default buffer size is 32kb
            for await (const buffer of Deno.iter(file)) {

                for (let i = 0, len = buffer.length; i < len; i++) {
                    const uint8 = buffer[ i ];

                    //inside a string: keep every byte, including spaces and braces
                    if (this.inString) {
                        this.tempUint8Array.push(uint8);
                        if (this.escaped) this.escaped = false;
                        else if (uint8 === 92) this.escaped = true;   // backslash
                        else if (uint8 === 34) this.inString = false; // closing quote
                        continue;
                    }

                    //remove whitespace outside strings (tab, LF, CR, space)
                    if (uint8 === 9 || uint8 === 10 || uint8 === 13 || uint8 === 32) continue;

                    //opening quote
                    if (uint8 === 34) this.inString = true;

                    //open brace
                    if (uint8 === 123) {
                        if (this.openBraceCount === 0) this.tempUint8Array = [];
                        this.openBraceCount++;
                    }

                    this.tempUint8Array.push(uint8);

                    //close brace
                    if (uint8 === 125) {
                        this.openBraceCount--;
                        if (this.openBraceCount === 0) {
                            const uint8Ary = new Uint8Array(this.tempUint8Array);
                            const jsonString = this.decoder.decode(uint8Ary);
                            const object = JSON.parse(jsonString);
                            this.emit('object', object);
                        }
                    }
                }
            }
            file.close();
            console.timeEnd("Run Time");
        }
    }

Example usage:

    const stream = new JSONStream('test.json');

    stream.on('object', (object: any) => {
        // do something with each object
    });
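The brace-counting idea can also be tried out without a file or an event emitter. The following is a minimal sketch, not the class above: `createObjectScanner` is a hypothetical helper that is fed byte chunks and calls a callback for each complete top-level object. It assumes UTF-8 input consisting of a JSON array of objects, and it tracks string literals so that braces and spaces inside string values are left alone.

```typescript
// Hypothetical helper: returns a function that accepts byte chunks and calls
// `onObject` for every complete top-level {...} object seen so far.
function createObjectScanner(
  onObject: (value: unknown) => void,
): (chunk: Uint8Array) => void {
  const decoder = new TextDecoder();
  let depth = 0;        // current brace nesting depth
  let inString = false; // inside a JSON string literal?
  let escaped = false;  // previous byte was a backslash inside a string?
  let bytes: number[] = [];

  return (chunk: Uint8Array): void => {
    for (let i = 0; i < chunk.length; i++) {
      const byte = chunk[i];
      if (inString) {
        if (depth > 0) bytes.push(byte);
        if (escaped) escaped = false;
        else if (byte === 0x5c) escaped = true;   // backslash
        else if (byte === 0x22) inString = false; // closing quote
        continue;
      }
      if (byte === 0x22) inString = true;         // opening quote
      if (byte === 0x7b) {                        // "{"
        if (depth === 0) bytes = [];
        depth++;
      }
      if (depth > 0) bytes.push(byte);
      if (byte === 0x7d && depth > 0) {           // "}"
        depth--;
        if (depth === 0) onObject(JSON.parse(decoder.decode(new Uint8Array(bytes))));
      }
    }
  };
}

// Feed an in-memory document in 16-byte chunks to mimic reading from a file.
const data = new TextEncoder().encode(
  '[{"id":1,"title":"a {b} c"}, {"id":2,"urls":{"main":"https://example.com"}}]',
);
const found: unknown[] = [];
const scan = createObjectScanner((value) => { found.push(value); });
for (let start = 0; start < data.length; start += 16) {
  scan(data.subarray(start, start + 16));
}
console.log(found.length);
```

Because the scanner keeps its state between calls, an object may span any number of chunks. Whitespace outside strings is passed through to `JSON.parse`, which ignores it.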

Processing a ~4.8 MB JSON file with ~20,000 small objects in it:

    [
        {
          "id": 1,
          "title": "in voluptate sit officia non nesciunt quis",
          "urls": {
            "main": "https://www.placeholder.com/600/1b9d08",
            "thumbnail": "https://www.placeholder.com/150/1b9d08"
          }
        },
        {
          "id": 2,
          "title": "error quasi sunt cupiditate voluptate ea odit beatae",
          "urls": {
            "main": "https://www.placeholder.com/600/1b9d08",
            "thumbnail": "https://www.placeholder.com/150/1b9d08"
          }
        }
        ...
    ]

Took 127 ms.

    ❯ deno run -A parser.ts
    Run Time: 127ms

Tags:

Deno