speed of parsing json structures

It's definitely much, much slower than MySQL (on a server) or SQLite (on a client), which are preferable.

Also, JSON speed depends almost entirely on the implementation. For instance, you could eval() it, but not only is that very risky, it's also slower than a real parser. In any case, there are probably far better-optimized XML parsers than JSON parsers, simply because XML is the more widely used format. (So grab a GB-sized XML file and imagine the same results, but slower.)
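The eval() risk is easy to demonstrate. The original point is about JavaScript's eval(), but the hazard is identical in Python, so here is a minimal sketch:

```python
import json

payload = '__import__("os").getcwd()'  # attacker-controlled "data"

# eval() treats the string as code and executes it:
print(eval(payload))  # actually runs os.getcwd() -- a parser never executes anything

# A real JSON parser only accepts literal JSON and rejects everything else:
try:
    json.loads(payload)
except json.JSONDecodeError:
    print("rejected: not valid JSON")
```

The payload here is harmless, but it could just as easily delete files or exfiltrate data; that is why eval() was never an acceptable JSON parser.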

Seriously, JSON was never meant for big things. Use a real database if possible.

Edit: why is JSON much slower than a database?

Many reasons. I'll try to list a few.

  • JSON relies on matching delimiters such as {}s (much like XML's <>s)

This means a parser has to scan ahead to find where an object block ends. The same applies to []s and ""s. In a conventional database there's no "ending tag" or "ending bracket", so it's easier to read.
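The "find the matching close" cost can be sketched in a few lines of Python: a parser cannot know where an object ends without scanning every character up to the balancing brace, string contents included (this is a simplified illustration, not a full JSON tokenizer):

```python
def find_object_end(text, start):
    """Return the index just past the '}' matching the '{' at `start`.
    Sketch only: handles nesting, quoted strings, and backslash escapes."""
    depth = 0
    in_string = False
    i = start
    while i < len(text):
        c = text[i]
        if in_string:
            if c == '\\':
                i += 1              # skip the escaped character
            elif c == '"':
                in_string = False
        elif c == '"':
            in_string = True
        elif c == '{':
            depth += 1
        elif c == '}':
            depth -= 1
            if depth == 0:
                return i + 1
        i += 1
    raise ValueError("unbalanced braces")

doc = '{"a": {"b": "has a } inside"}, "c": 1}'
print(find_object_end(doc, 0))  # must scan the entire document to answer
```

Note that the '}' inside the quoted string must not be counted, which is exactly why the scan can't take shortcuts.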

  • JSON parsers need to read each and every character before being able to understand the whole object structure.

So before you can read even part of the JSON, you have to read the whole file. For the sizes you mentioned that means waiting a few minutes at best, while a database is ready to be queried in under a second (because the hierarchy is stored at the beginning).
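This all-or-nothing behaviour is easy to see with Python's standard json module: even to read the first element, the whole document must parse, and a truncated tail makes everything unreadable.

```python
import json

# Build a document with 100,000 records.
doc = '{"users": [' + ",".join('{"id": %d}' % i for i in range(100000)) + ']}'

# Even to read just the first user, the whole document is parsed:
data = json.loads(doc)
print(data["users"][0])  # {'id': 0}

# Truncate the tail and *everything* becomes unreadable, first user included:
try:
    json.loads(doc[:-2])
except json.JSONDecodeError:
    print("cannot read any of it")
```

Streaming JSON parsers exist, but the default everywhere is whole-document parsing, whereas a database never needs to touch rows you didn't ask for.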

  • In JSON you can't precalculate offsets.

In a database, size is traded for performance. You can declare VARCHAR(512) and all strings will be null-padded to occupy 512 bytes. Why? Because then you know, for example, that the 4th value is at offset 2048. You can't do that with JSON, hence performance suffers.
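The fixed-width trick is just arithmetic. A sketch with Python's struct module, assuming 512-byte null-padded records like the VARCHAR(512) example above:

```python
import struct

RECORD = struct.Struct("512s")  # one fixed 512-byte, null-padded string field

names = ["alice", "bob", "carol", "dave", "erin"]
table = b"".join(RECORD.pack(n.encode()) for n in names)

def read_record(buf, index):
    # Record i always starts at offset i * 512 -- no scanning required.
    offset = index * RECORD.size
    (raw,) = RECORD.unpack_from(buf, offset)
    return raw.rstrip(b"\x00").decode()

print(read_record(table, 3))  # 'dave', found by pure offset arithmetic
```

With JSON, finding the 4th value means parsing past the first three, however long they happen to be.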

  • JSON is optimized for small filesizes.

...Because it's a web format.
This may look like a pro but it's a con from a performance perspective.

  • JSON is a JavaScript subset.

So some parsers may accept and process data that isn't strictly necessary, such as comments. Chrome's native JSON parser used to allow comments, for example (not anymore).
No database engine uses eval(), right?
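For what it's worth, strict parsers do reject this kind of thing. Python's json.loads accepts only the JSON grammar, so comments fail immediately:

```python
import json

try:
    json.loads('{"a": 1 /* a comment */}')
except json.JSONDecodeError as e:
    print("strict parser rejects comments:", e)
```

Parsers that tolerate comments are doing extra work outside the spec, and that leniency is exactly the JavaScript heritage the bullet describes.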

  • JSON is meant to have some error resilience.

People might put anything into a JSON file, so parsers are defensive and sometimes try to read invalid files. Databases aren't supposed to silently repair a broken file.
You might hand-write a JSON file, but not a database!

  • JSON is a new, unsupported and badly tested format

There are bugs in some native parsers (like IE8's), and support in most browsers is still very preliminary and slower than, say, the fastest XML parser out there. Simply because XML has been used for ages, and Steve Ballmer has an XML fetish, so companies please him by making almost everything under the sun XML-compatible. JSON, meanwhile, is one of Crockford's successful weekend pastimes.

  • The best JSON parsers are in browsers

If you pick a random open-source JSON parser for your favourite language, what are the chances that it's the best possible parser under the sun? Well, for XML you do have awesome parsers like this. But what is there for JSON?

Need more reasons why JSON should be relegated to its intended use case?


Benchmarks of JSON, XML, and lots of other things can be found in the JVM Serializers project. The results are too complicated to reproduce here, but the best JSON results (comparing both manual and databound classes) are quite a bit better than the best XML results. That comparison isn't complete, but it's a starting point.

EDIT: as of right now (2012-10-30), there are no published results, because the benchmark is being revised. However, there are some preliminary results available.


If you consider JSON as an intermediate format for data transfer, you might want to consider binary alternatives as well: they need less disk space and network bandwidth (both compressed and uncompressed), and parsing may be faster simply because the input to parse is shorter.

  • MessagePack
  • BSON (binary JSON)
  • Google Protocol Buffers
  • Apache & Facebook Thrift
  • Python Pickle implemented as the `cPickle' module, with the highest protocol version
  • Python Marshal (very fast but architecture- and version-dependent)
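Of the options above, pickle and marshal ship with Python, so a size comparison is easy to run yourself (exact byte counts vary with the data and the Python version, so treat the numbers as illustrative):

```python
import json
import marshal
import pickle

data = {"ids": list(range(1000)), "name": "example", "ok": True}

encoded = {
    "json":    json.dumps(data).encode(),
    "pickle":  pickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL),
    "marshal": marshal.dumps(data),
}
for name, blob in encoded.items():
    print(f"{name:7s} {len(blob):6d} bytes")
```

(On Python 3, `cPickle` has been folded into the standard pickle module; passing pickle.HIGHEST_PROTOCOL is the modern equivalent of "the highest version". Remember that marshal output is architecture- and version-dependent, as noted above, and that unpickling untrusted data is unsafe.)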

If you run your own benchmark, make sure to compare multiple parsers for the same language. A JSON parser implemented in pure Python is expected to be much slower than one written in C, but you may also find significant speed differences (a factor of 2, sometimes 5) between different implementations in the same programming language.
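You can see this effect without leaving the standard library, assuming CPython, whose json module ships both a C-accelerated and a pure-Python scanner (forcing the pure-Python path as below relies on the module's internals, so it's a sketch, not a supported API):

```python
import json
import json.decoder
import json.scanner
import timeit

doc = json.dumps([{"id": i, "name": "user%d" % i} for i in range(1000)])

# Default decoder: uses the C-accelerated scanner when available.
fast = json.JSONDecoder()

# Force the pure-Python implementation for comparison.
slow = json.JSONDecoder()
slow.parse_string = json.decoder.py_scanstring
slow.scan_once = json.scanner.py_make_scanner(slow)

for label, dec in (("C-accelerated", fast), ("pure Python", slow)):
    t = timeit.timeit(lambda: dec.decode(doc), number=200)
    print(f"{label:14s} {t:.3f}s")
```

Both decoders return identical results; only the speed differs, which is the whole point of benchmarking implementations rather than formats.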