A Lightning Fast JSON Parser Library

Written by Nikos Vaggalis

Thursday, 14 October 2021

simdjson is a C++ library that can parse JSON documents very fast. Version 1.0 has just been released. How does it compare?

Does parsing 3 gigabytes of JSON per second sound fast enough?
This library achieves it. In last year's benchmark against the fastest standard-compliant C++ JSON parsers, RapidJSON and sajson, simdjson outperformed them by a wide margin. It can parse 4x faster than RapidJSON and 25x faster than JSON for Modern C++.

This efficiency comes mainly from the library's use of SIMD instructions under the hood. SIMD excels at data-level parallelism: a single instruction applies the same operation to many data elements at once, even on a single core.
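To make the idea of data-level parallelism concrete, here is a minimal sketch. It is not simdjson code and it uses no real SIMD instructions; it only imitates them with ordinary 64-bit integer arithmetic (a trick often called SWAR, "SIMD within a register"), so that it stays portable. The function name count_byte_swar is hypothetical. It counts how often one byte, such as a quote character, occurs in a buffer while examining eight bytes per loop step instead of one.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

// Illustrative only: count occurrences of `target` in `text`,
// testing 8 bytes per loop iteration with plain 64-bit arithmetic.
std::size_t count_byte_swar(const std::string& text, char target) {
  const std::uint64_t kLow7 = 0x7F7F7F7F7F7F7F7FULL;
  const std::uint64_t kOnes = 0x0101010101010101ULL;
  // The target byte replicated into all eight lanes of a 64-bit word.
  const std::uint64_t pattern = kOnes * static_cast<unsigned char>(target);

  std::size_t count = 0;
  std::size_t i = 0;
  for (; i + 8 <= text.size(); i += 8) {
    std::uint64_t word;
    std::memcpy(&word, text.data() + i, 8);  // safe unaligned load
    const std::uint64_t x = word ^ pattern;  // matching lanes become 0x00
    // Exact zero-lane test: 0x80 in every lane of x that is zero, else 0x00.
    const std::uint64_t zero_mask = ~(((x & kLow7) + kLow7) | x | kLow7);
    // Turn each 0x80 into 0x01 and sum the eight lanes into the top byte.
    count += static_cast<std::size_t>(((zero_mask >> 7) * kOnes) >> 56);
  }
  // Handle the last few bytes one at a time.
  for (; i < text.size(); ++i) {
    if (text[i] == target) ++count;
  }
  return count;
}
```

Real SIMD registers are wider (16, 32 or 64 bytes) and come with dedicated comparison instructions, but the principle is the same: one operation inspects many input bytes at once.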

Fast parsing is not only relevant to huge dumps of JSON, which is what the related benchmarks measure. The same speedup applies when parsing millions of small JSON documents per second. That aside, the library can also minify JSON files by stripping spaces, tabs, newlines, and carriage returns, thereby saving a great deal of space.
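What minification means can be sketched in a few lines. The function below is a plain scalar illustration, not the library's SIMD implementation, and the name minify_json is hypothetical. It assumes the input is valid JSON and removes the four whitespace characters mentioned above wherever they appear outside of string values.

```cpp
#include <string>

// Illustrative only: drop spaces, tabs, newlines and carriage returns
// that appear outside of JSON strings. Assumes the input is valid JSON.
std::string minify_json(const std::string& in) {
  std::string out;
  out.reserve(in.size());
  bool in_string = false;  // currently between two unescaped quotes
  bool escaped = false;    // previous character was a backslash in a string
  for (char c : in) {
    if (in_string) {
      out.push_back(c);  // string contents are copied verbatim
      if (escaped) {
        escaped = false;
      } else if (c == '\\') {
        escaped = true;
      } else if (c == '"') {
        in_string = false;
      }
    } else if (c == '"') {
      in_string = true;
      out.push_back(c);
    } else if (c != ' ' && c != '\t' && c != '\n' && c != '\r') {
      out.push_back(c);
    }
  }
  return out;
}
```

Whitespace inside a string such as "a b" is part of the data and is therefore left untouched.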

Version 1.0 offers a new "On Demand" frontend in addition to the standard DOM-based one. On Demand is flagged as the default from now on.

What's the difference, you ask?

In the DOM-based approach, the document is parsed entirely and materialized in an in-memory construction. The On Demand approach feels like a DOM approach, but it sidesteps the construction of the DOM tree. It is entirely lazy: it decodes only the parts of the document that you access.

And what is the actual advantage in that?

With On Demand, if you open a file containing 1000 numbers and you need just one of these numbers, only one number is parsed. If you need to put the numbers into your own data structure, they are materialized there directly, without being first written to a temporary tree.

Thus we expect that simdjson On Demand might often provide superior performance when you do not need to materialize a DOM tree.
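The "one number out of 1000" scenario can be imitated with a toy function. This is only a sketch of the lazy idea and says nothing about how simdjson implements On Demand. The name nth_number_lazy is hypothetical, and the function assumes the input is a flat JSON array that contains nothing but numbers. It skips over the unwanted elements by looking for commas and converts a single element to a double.

```cpp
#include <cstddef>
#include <cstdlib>
#include <string>

// Illustrative only: fetch element `index` from a flat JSON array of
// numbers, converting just that one element from text to double.
// Returns false if the array has no such element.
bool nth_number_lazy(const std::string& json, std::size_t index, double& out) {
  std::size_t pos = json.find('[');
  if (pos == std::string::npos) return false;
  ++pos;
  // Skip `index` elements by locating commas; no digits are decoded here.
  for (std::size_t skipped = 0; skipped < index; ++skipped) {
    pos = json.find(',', pos);
    if (pos == std::string::npos) return false;
    ++pos;
  }
  // Decode exactly one number (strtod skips leading whitespace).
  const char* begin = json.c_str() + pos;
  char* end = nullptr;
  const double value = std::strtod(begin, &end);
  if (end == begin) return false;  // nothing numeric at this position
  out = value;
  return true;
}
```

A DOM-style parser would instead convert every element and store it in a tree before the caller could ask for any of them.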

Other than that, release 1.0.0 adds several other key features:

In big data analytics, it is common to serialize large sets of records as multiple JSON documents separated by white space. You can now get the benefits of On Demand while parsing almost infinitely long streams of JSON records. At each step, you have access to the current document, while a secondary thread indexes the following block. You can thus access enormous files while using a small amount of memory and achieve record-breaking speeds.
In some cases, JSON documents contain numbers embedded within strings (e.g., "3.1416"). You can access these numbers directly using methods such as get_double_in_string().
Given an On Demand instance (value, array, object, etc.), you can now convert it to a JSON string using the to_json_string method, which returns a string view in the original document for unbeatable speeds.
The On Demand front-end now supports the JSON Pointer specification. You can request a specific value using a JSON Pointer within a large document.
Arrays in On Demand now have a count_elements() method. Objects have a count_fields() method. Arrays and objects have a reset method for when you need to iterate through them more than once. Document instances now have a rewind method in case you need to process the same document multiple times.
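For readers unfamiliar with JSON Pointer (RFC 6901), mentioned in the list above, a pointer such as /search_metadata/count names a path of keys or array indexes from the document root. The sketch below shows only how such a pointer breaks down into its reference tokens, including the two escape sequences defined by the specification (~1 for a slash, ~0 for a tilde). It is not simdjson's code, and split_json_pointer is a hypothetical name.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Illustrative only: split a JSON Pointer (RFC 6901) into its
// unescaped reference tokens.
std::vector<std::string> split_json_pointer(const std::string& pointer) {
  std::vector<std::string> tokens;
  if (pointer.empty()) return tokens;  // "" refers to the whole document
  if (pointer[0] != '/') {
    throw std::invalid_argument("a non-empty JSON Pointer must start with '/'");
  }
  std::string current;
  for (std::size_t i = 1; i < pointer.size(); ++i) {
    const char c = pointer[i];
    if (c == '/') {
      tokens.push_back(current);  // a slash ends the current token
      current.clear();
    } else if (c == '~') {
      if (i + 1 >= pointer.size()) {
        throw std::invalid_argument("dangling '~' in JSON Pointer");
      }
      const char next = pointer[++i];
      if (next == '0') {
        current.push_back('~');  // "~0" encodes '~'
      } else if (next == '1') {
        current.push_back('/');  // "~1" encodes '/'
      } else {
        throw std::invalid_argument("invalid escape in JSON Pointer");
      }
    } else {
      current.push_back(c);
    }
  }
  tokens.push_back(current);
  return tokens;
}
```

Each token is then used in turn as an object key or an array index while walking down from the root of the document.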

As a simple example, this is how you use the library to parse with the DOM-based frontend:

You might think that, since it is a C++ library, only developers writing in C++ can benefit. This is not true, as there are already bindings for other languages such as Go, Ruby, Python and more, and there is even a port for PostgreSQL in pg_simdjson.

ZippyJSON: Swift bindings for the simdjson project.
libpy_simdjson: high-speed Python bindings for simdjson using libpy.
pysimdjson: Python bindings for the simdjson project.
cysimdjson: high-speed Python bindings for the simdjson project.
simdjson-rs: Rust port.
simdjson-rust: Rust wrapper (bindings).
SimdJsonSharp: C# version for .NET Core (bindings and full port).
simdjson_nodejs: Node.js bindings for the simdjson project.
simdjson_php: PHP bindings for the simdjson project.
simdjson_ruby: Ruby bindings for the simdjson project.
fast_jsonparser: Ruby bindings for the simdjson project.
simdjson-go: Go port using Golang assembly.
rcppsimdjson: R bindings.
simdjson_erlang: Erlang bindings.
lua-simdjson: Lua bindings.

Therefore, no matter what language you are writing code in, you can still leverage simdjson's advantages.

More Information

simdjson: Parsing gigabytes of JSON per second


Last Updated ( Saturday, 16 October 2021 )
