Tree-sitter's JSON Grammar: How Editors Read JSON Differently from JSON.parse

What Tree-sitter Is

Tree-sitter is a parser generator and incremental parsing library, originally written for the Atom editor and now used by Neovim, Helix, Emacs, GitHub's blob viewer, and some language servers. When a Tree-sitter-based editor highlights JSON syntax, folds a long array, or jumps between matching braces, it typically does so by running a Tree-sitter grammar over the file.

The reference JSON grammar lives at tree-sitter/tree-sitter-json and is maintained as part of the Tree-sitter organization. It defines the shape of a JSON document as a context-free grammar that produces a concrete syntax tree — useful for editors, but not the same thing as a strict RFC 8259 validator.

The JSON Grammar

The grammar is small enough to read in a single sitting. The top-level rule, document, is defined in terms of _value. A _value is one of object, array, number, string, true, false, or null. Strings allow standard JSON escape sequences plus a hook for editor-specific extensions. Numbers follow the JSON number grammar: an optional minus sign, an integer part, an optional fraction, and an optional exponent.
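The number shape described above can be sketched as a regular expression. This is a minimal illustration that follows the number production in RFC 8259; it is not the rule used inside tree-sitter-json, and the names JSON_NUMBER and isJsonNumber are hypothetical.

```javascript
// Illustrative sketch of the RFC 8259 number production.
// Not the tree-sitter-json rule; names here are made up for the example.
//   -?                 optional minus sign (a leading "+" is not allowed)
//   (?:0|[1-9]\d*)     integer part with no leading zeros
//   (?:\.\d+)?         optional fraction, at least one digit after the dot
//   (?:[eE][+-]?\d+)?  optional exponent
const JSON_NUMBER = /^-?(?:0|[1-9]\d*)(?:\.\d+)?(?:[eE][+-]?\d+)?$/;

function isJsonNumber(text) {
  return JSON_NUMBER.test(text);
}
```

Under this sketch, isJsonNumber("-12.5e+3") is true, while "01", "+1", ".5", and "1." are all rejected, which matches what a strict parser does with those literals.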

The grammar tolerates a few things that strict JSON does not, because Tree-sitter grammars are designed to keep producing a parse tree even when the input has minor errors (the goal is "good enough syntax highlighting while you're still typing"). That tolerance is the source of the discrepancies developers notice.

Where It Differs from JSON.parse

Three places where a file accepted by Tree-sitter's JSON grammar can be rejected by JSON.parse:

  • Error recovery. Tree-sitter inserts (ERROR) nodes around unexpected tokens and keeps parsing, so a JSON file with a trailing comma or a missing quote still shows highlighted siblings. The CST has the error marked, but the rest of the document looks parsed. JSON.parse throws on the first such token and returns nothing.
  • Comments. The tree-sitter-json grammar defines a comment token, so a file containing // or /* */ comments can still produce a clean tree, much as it does in JSONC modes such as VS Code's JSON with Comments. The same file pasted into a strict JSON parser fails immediately.
  • Trailing commas. Editor extensions frequently relax the array/object grammar to accept a trailing comma after the last item so that highlighting and folding still work mid-edit. Strict JSON forbids this.

The grammar itself is conservative on these — most differences come from the editor-specific configuration on top of it, not from the upstream grammar. Either way, the developer-visible effect is the same: the file looks fine in the editor, and JSON.parse throws.
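The strict side of this comparison is easy to reproduce. The sketch below runs JSON.parse over one valid input and three inputs of the kind a tolerant editor grammar may still highlight. The helper name tryStrictParse and the sample strings are illustrative assumptions, and the exact error wording depends on the JavaScript engine and version.

```javascript
// Try a strict parse and report the outcome instead of throwing.
// tryStrictParse is a hypothetical helper name used for illustration.
function tryStrictParse(text) {
  try {
    return { ok: true, value: JSON.parse(text) };
  } catch (err) {
    return { ok: false, message: err.message };
  }
}

// Sample inputs: one valid document and three that strict JSON forbids.
const samples = {
  valid: '{"a": [1, 2]}',
  trailingComma: '{"a": [1, 2,]}',
  lineComment: '{"a": 1} // note',
  missingQuote: '{"a: 1}',
};

for (const [name, text] of Object.entries(samples)) {
  const result = tryStrictParse(text);
  console.log(name, result.ok ? "accepted" : `rejected: ${result.message}`);
}
```

Only the first sample is accepted. For the other three, JSON.parse stops at the first offending token and returns no partial result.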

Why This Matters for Parse Errors

When a developer reports "my JSON looks fine in VS Code but my Node script throws Unexpected token", the most common explanations are:

  1. The editor is configured to read the file as JSONC, not strict JSON.
  2. The Tree-sitter grammar reported errors but the editor kept showing siblings highlighted.
  3. The file has a BOM or other invisible character at the top.
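The third cause is the hardest to spot by eye, so a small check can help. The following is a minimal sketch, assuming the file has already been read into a string. The helper names and the particular list of suspect characters are assumptions chosen for illustration, not an exhaustive set.

```javascript
// U+FEFF at the start of a string is a byte order mark. JSON.parse does not
// treat it as whitespace, so it has to be removed before parsing.
const BOM = "\uFEFF";

function stripBom(text) {
  return text.startsWith(BOM) ? text.slice(1) : text;
}

// Report characters that usually render as nothing (or as a plain space).
// The set below is an illustrative assumption. These characters are legal
// inside JSON strings, so a hit is a hint to inspect, not proof of an error.
function findInvisibleCharacters(text) {
  const suspects = /[\uFEFF\u200B\u200C\u200D\u2060\u00A0]/g;
  const hits = [];
  for (const match of text.matchAll(suspects)) {
    const hex = match[0].codePointAt(0).toString(16).toUpperCase();
    hits.push({ index: match.index, codePoint: "U+" + hex.padStart(4, "0") });
  }
  return hits;
}
```

A typical use is to call findInvisibleCharacters on the raw text first to see what is there, then pass stripBom(text) to JSON.parse.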

The fix is always the same: feed the file to a strict parser and read its error message. The position of the parser's error usually points within a few characters of the real problem. The JSON validator on this site reports the exact line and column; the parse-errors hub maps each common error string to a dedicated article.
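As a rough sketch of what "read its error message" can look like in a script: some engines include a character offset in the JSON.parse error text (for example "at position 12"), and that offset can be turned into a line and column. The message format is engine- and version-dependent and is not specified by the language, so the regular expression below is an assumption and the helper returns null when no offset is present. All function names are hypothetical.

```javascript
// Convert a zero-based character offset into a one-based line and column.
function offsetToLineColumn(text, offset) {
  let line = 1;
  let lineStart = 0;
  for (let i = 0; i < offset && i < text.length; i++) {
    if (text[i] === "\n") {
      line++;
      lineStart = i + 1;
    }
  }
  return { line, column: offset - lineStart + 1 };
}

// Pull "position N" out of an error message if the engine included it.
// Assumption: the wording varies between engines and versions.
function extractOffset(message) {
  const match = /position (\d+)/.exec(message);
  return match ? Number(match[1]) : null;
}

// Strict-parse the text; return null on success, or the error details.
function locateParseError(text) {
  try {
    JSON.parse(text);
    return null;
  } catch (err) {
    const offset = extractOffset(err.message);
    return {
      message: err.message,
      position: offset === null ? null : offsetToLineColumn(text, offset),
    };
  }
}
```

When the engine does not report an offset, the message alone still names the unexpected token, which is usually enough to search for in the file.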

Tree-sitter's grammar is a useful piece of infrastructure for editor tooling — but it's not a stand-in for an RFC 8259 validator. Treat what the editor renders as a hint, and what the strict parser says as the truth.
