Research

"Parsing JSON is a Minefield 💣" — Nicolas Seriot

fixjson.org · October 2016

The Research

In October 2016, Swiss security researcher Nicolas Seriot published "Parsing JSON is a Minefield 💣", a comprehensive survey of JSON parser behaviour across 30+ implementations spanning C, C++, Objective-C, Python, Ruby, Java, JavaScript, Go, PHP, and others. The paper is among the most cited pieces of applied security research in the data interchange format space.

Seriot's methodology was to construct a large corpus of JSON test cases — valid inputs, invalid inputs, and edge cases in "grey zones" where the specification was ambiguous — and feed each to all tested parsers, recording which accepted, rejected, or crashed on each input. The divergences were striking.

Categories of Parser Disagreement

The study identified several recurring categories where parsers disagreed:

Number precision: A 64-bit float can represent integers exactly up to 2⁵³. Numbers larger than this are silently rounded in parsers backed by IEEE 754 doubles. Seriot found wide variation: some parsers promoted to BigDecimal or arbitrary precision, others rounded silently, others rejected large numbers entirely.
Duplicate keys: RFC 4627 (then-current) said implementations "SHOULD" not produce duplicate keys but gave no guidance on parsing them. Some parsers returned the first value, some the last, some merged them into a list, some threw exceptions.
Unicode handling: Lone surrogates (code points in the range U+D800 to U+DFFF) are illegal in UTF-8 but can appear in JSON strings as \uD800 escape sequences. Parsers differed wildly on whether to accept, reject, or transform them.
Control characters: Unescaped control characters (U+0000–U+001F) in strings are invalid JSON. Some parsers accepted them; others rejected only NUL.
Comments: JSON has no comment syntax, but several parsers accepted // ... or /* ... */ comments without any configuration flag.

Security Implications

The research's most consequential section demonstrates how parser disagreement translates directly to exploitable security vulnerabilities. The canonical attack pattern:

A payload is parsed by Parser A (a WAF, authentication middleware, or API gateway).
The same payload is then parsed by Parser B (the application backend).
A and B disagree on the meaning of the payload.
The attacker crafts a payload that appears safe to A but malicious to B.

For example: an authentication token containing duplicate role keys, where the gateway checks the first ("user") and the application reads the last ("admin"). The gateway allows the request; the application grants elevated access.

These issues were later documented in production by Bishop Fox's 2022 research and directly influenced the IETF JSONBIS working group charter.

The JSONTestSuite

The lasting contribution of Seriot's work is the JSONTestSuite on GitHub — a structured collection of test cases with expected outcomes (accept, reject, or undefined/implementation-defined). The test suite is actively maintained and used by JSON parser authors to verify conformance.

Each test case is a file containing a JSON fragment. The filename prefix indicates the expected result: y_ for must accept, n_ for must reject, i_ for implementation-defined. Running your parser against the suite and examining where it diverges from RFC 8259 requirements is one of the most effective ways to find parser bugs.

For a practical introduction to JSON's grammar and where it differs from JavaScript, see What Is JSON? For hands-on repair of the parser errors this research documents, online JSON fixers handle the most common variants in the browser.

The Research

Categories of Parser Disagreement

Security Implications

The JSONTestSuite

Sources

Related on fixjson.org