Python JSON benchmarking and correctness.

Overview

json_benchmark

This repository contains benchmarks for Python JSON readers & writers. What's the fastest Python JSON parser? Let's find out.

To run the tests yourself:

git clone [email protected]:TkTech/json_benchmark.git && cd json_benchmark
<setup a virtualenv using your tool of choice>
pip install -r requirements.txt
pytest

Candidate Libraries

The following libraries are the current candidates for benchmarking. Feel free to request or add new libraries.

Library Reader Writer Version
simdjson Yes No 4.0.3
cysimdjson Yes No 21.11
yyjson Yes Yes 1.0.0
orjson Yes Yes 3.6.7
rapidjson Yes Yes 1.6
ujson Yes Yes 5.2.0
json Yes Yes 3.10 (Python stdlib)
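
To benchmark these libraries uniformly, the harness needs a single `loads`-style entry point per candidate. A minimal sketch of such an adapter, using only the stdlib (the `CANDIDATES` table and `get_loads` helper are illustrative, not the repo's actual conftest; module names are the commonly published import names):

```python
import importlib

# Map benchmark names to (module, attribute) pairs. Only candidates that
# are actually installed will resolve; the rest are skipped gracefully.
CANDIDATES = {
    "json": ("json", "loads"),
    "ujson": ("ujson", "loads"),
    "orjson": ("orjson", "loads"),
}

def get_loads(name):
    """Return the loads callable for a candidate, or None if not installed."""
    module_name, attr = CANDIDATES[name]
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, attr)
```

With an adapter like this, each parametrized test can skip candidates that are not installed rather than failing the whole run.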

Correctness

It doesn't matter how fast a JSON parser is if it's not going to give you the correct results. We run the JSON minefield against each library. To see the complete line-by-line results, see the minefield_reports/ directory.
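
The per-library tallies below follow the JSONTestSuite naming convention: files prefixed y_ must parse, n_ must be rejected, and i_ are implementation-defined (either outcome is acceptable). A minimal sketch of how a single case is categorized, using the stdlib parser (the `categorize` helper is illustrative, not the repo's actual test code):

```python
import json

def categorize(filename: str, payload: str, loads=json.loads) -> str:
    """Classify one minefield case by its filename prefix and parse outcome."""
    try:
        loads(payload)
        parsed = True
    except Exception:
        parsed = False

    if filename.startswith("y_"):  # must parse
        return "expected result" if parsed else "should have succeeded but failed"
    if filename.startswith("n_"):  # must be rejected
        return "should have failed but succeeded" if parsed else "expected result"
    # i_ prefix: implementation-defined, either outcome is fine
    return "undefined, parsing succeeded" if parsed else "undefined, parsing failed"
```

For example, `categorize("n_trailing_comma.json", '[1,]')` reports "expected result" for the stdlib parser, which correctly rejects trailing commas.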

yyjson

count result
283 🎉 expected result
0 🔥 parsing should have failed but succeeded
0 🔥 parsing should have succeeded but failed
6 ➕ result undefined, parsing succeeded
29 ➖ result undefined, parsing failed

rapidjson

count result
277 🎉 expected result
6 🔥 parsing should have failed but succeeded
0 🔥 parsing should have succeeded but failed
6 ➕ result undefined, parsing succeeded
29 ➖ result undefined, parsing failed

orjson

count result
283 🎉 expected result
0 🔥 parsing should have failed but succeeded
0 🔥 parsing should have succeeded but failed
5 ➕ result undefined, parsing succeeded
30 ➖ result undefined, parsing failed

simdjson

count result
283 🎉 expected result
0 🔥 parsing should have failed but succeeded
0 🔥 parsing should have succeeded but failed
8 ➕ result undefined, parsing succeeded
27 ➖ result undefined, parsing failed

ujson

count result
257 🎉 expected result
26 🔥 parsing should have failed but succeeded
0 🔥 parsing should have succeeded but failed
16 ➕ result undefined, parsing succeeded
19 ➖ result undefined, parsing failed

Performance

Complete load of data/canada.json

Sample file is 2251051 bytes.

library min (ms) max (ms) mean (ms)
yyjson 5.2855 14.6124 9.5136
simdjson 5.7374 14.3674 9.5140
cysimdjson 5.9193 10.7559 8.2352
orjson 8.4625 18.6653 11.8414
ujson 10.1251 18.8607 14.3548
json 23.1732 30.2110 26.2201
rapidjson 25.4896 34.1050 29.1916
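
Numbers in this shape (per-call min/max/mean in milliseconds) can be reproduced in spirit with a stdlib-only sketch; the actual suite is driven by pytest, and `bench_loads` plus the inline payload here are illustrative:

```python
import json
import timeit

def bench_loads(payload: str, loads=json.loads, repeat=5, number=10):
    """Time repeated full parses of `payload`; return (min, max, mean) in ms per call."""
    totals = timeit.repeat(lambda: loads(payload), repeat=repeat, number=number)
    per_call_ms = [t / number * 1000 for t in totals]
    return min(per_call_ms), max(per_call_ms), sum(per_call_ms) / len(per_call_ms)

# Example: time the stdlib parser on a tiny document. For a real run you
# would read one of the data/ sample files instead.
lo, hi, mean = bench_loads('{"type": "FeatureCollection", "features": []}')
```

Each `repeat` gives an independent timing sample; taking the minimum is the usual way to suppress scheduler noise, while the max/min spread shows how noisy the machine was.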

Complete load of data/citm_catalog.json

Sample file is 1727030 bytes.

library min (ms) max (ms) mean (ms)
simdjson 3.0496 10.2288 4.5276
yyjson 3.0725 12.9260 4.7278
cysimdjson 3.2350 9.2566 4.6005
orjson 3.2869 8.9607 4.3607
rapidjson 4.8654 10.7684 6.0624
ujson 4.9233 12.8696 6.5945
json 5.2351 9.3376 6.1055

Complete load of data/twitter.json

Sample file is 567916 bytes.

library min (ms) max (ms) mean (ms)
orjson 1.3714 6.0291 1.4893
yyjson 1.3791 9.2868 1.5647
simdjson 1.3860 8.8234 1.5560
cysimdjson 1.3954 6.9549 1.6465
rapidjson 2.1363 8.4668 2.2876
ujson 2.2646 9.1984 2.4557
json 2.2850 6.2415 2.3875

Complete load of data/verysmall.json

Sample file is 7 bytes.

library min (ms) max (ms) mean (ms)
orjson 0.0002 0.0028 0.0002
ujson 0.0002 0.0003 0.0002
yyjson 0.0002 0.0007 0.0002
rapidjson 0.0003 0.0009 0.0003
cysimdjson 0.0004 0.0010 0.0004
simdjson 0.0004 0.0011 0.0005
json 0.0010 0.0115 0.0011

FAQ

Why doesn't this run benchmarks using GitHub Actions?

Generally, unless you control the CI runners with self-hosted boxes (which are unsafe on public GitHub projects!), you have no idea what machine you're going to get, or how many other jobs may be running on the same machine. This can cause the benchmarks to vary drastically between runs, even minutes apart. For this reason, benchmarks are always run locally on a consistent machine, in a consistent state.

What machine is used for the x64 benchmarks?

x64 tests are run on an AMD Ryzen 7 5800X, capped at base clock speed with 64GB of Corsair CMW32GX4M2E3200C16.

JSON is terrible, use X!

This repository isn't for arguing the pros or cons of JSON versus some other exchange format. You frequently have no choice but to work with JSON, and if you can read or write responses 15% faster, that means you can handle more requests per second with the same hardware.

Comments
  • Update msgspec dependency

    Bumps msgspec version to recently released 0.7.0.

    I didn't re-render the benchmarks, since I think you want to do that on your own machine, but did check that things ran correctly.

    opened by jcrist 4
  • Benchmark results are out of date

    In #3 the version of msgspec in these benchmarks was updated, but the results are still showing for an older version. There's also a new-ish version of orjson out with some noticeable performance improvements (it's now mostly a thin wrapper around yyjson). It'd be good to get accurate results up.

    Perhaps adding a "last run" date near the top of the report would be helpful in showing how up-to-date the results are?

    opened by jcrist 2
  • Add `msgspec` to the benchmarks

    First, thanks for setting up a nice repo for benchmarking JSON + Python performance. I stumbled upon this while looking at pysimdjson (from doing a quick benchmark myself), and thought I'd push up a commit. Feel free to ignore, I recognize that this benchmarks repo is very new :).

    This adds msgspec (https://github.com/jcrist/msgspec) to the list of repos measured. I also modified the scripts a bit to work on python 3.9 as well (happy to revert if needed). A few assorted comments:

    • msgspec doesn't do that well in these tests, because it's mainly optimized for loading json where you already know the schema (a common case for services and apis). I'm not sure if expanding the benchmarks to include another row for msgspec + schema would make sense. With a schema it's much more performant.
    • msgspec only supports decoding from a bytes-like object, str is not supported. I utf-8 encode the content as part of the benchmark, AFAICT there should be no harm in that.

    Anyway, thanks for starting on this work. If you'd like additional help in any way I'd be happy to contribute.

    opened by jcrist 2
  • Add Streaming Parser Benchmarks

    This benchmark should take a medium-sized test document, and benchmark the time to emit all events from start to finish, with full recursion.

    • [ ] json-stream
    • [ ] ijson
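
    A stdlib-only sketch of what "emit all events from start to finish, with full recursion" would measure. The event names mimic ijson's model (start_map, map_key, etc.; real streaming parsers also emit typed value events and read from a stream rather than a parsed tree), and the `events` generator is purely illustrative:

    ```python
    import json

    def events(node):
        """Recursively yield (event, value) pairs for a parsed JSON document,
        mimicking the event stream a streaming parser would emit."""
        if isinstance(node, dict):
            yield ("start_map", None)
            for key, value in node.items():
                yield ("map_key", key)
                yield from events(value)
            yield ("end_map", None)
        elif isinstance(node, list):
            yield ("start_array", None)
            for item in node:
                yield from events(item)
            yield ("end_array", None)
        else:
            yield ("value", node)

    # Draining this generator over a medium-sized document approximates
    # the work a streaming-parser benchmark would time.
    all_events = list(events(json.loads('{"a": [1]}')))
    ```
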
    labels: enhancement, good first issue
    opened by TkTech 0
  • Expanding JSON benchmarks

    I've started work on what is turning out to be a fairly intensive effort to generate high quality quantitative json benchmarks with graphs, tables, and multiple statistical analyses.

    I'm currently working on the code as a PR to ujson: https://github.com/ultrajson/ultrajson/pull/542, but it's getting involved, so it might make sense to target it as a standalone repo.

    Adding simdjson to my current benchmarks shows an interesting (and expected) result: the value of SIMD kicks in when the data gets large:

    [figure omitted: loads vs. dumps timing graphs]

    (Notice the red line in the left "loads" graph)

    For dumps, because pysimdjson does not wrap the "dumps" and just uses "json.dumps", we get a line that maps almost exactly on top of the "json" line.

    I'm writing this issue as an attempt to gauge interest in integrating my work here instead of "ujson" itself (or making yet another json benchmarking repo).

    opened by Erotemic 1