
How to Compare Two JSON Files: Algorithms and Tools

A plain text diff flags key reordering and whitespace changes as differences. Learn how a proper JSON diff works: LCS line diffing, semantic tree comparison, key normalisation, and the tradeoffs of each approach.

Two JSON responses look almost identical, but something changed between deploys and your tests are failing. Or you're reviewing a pull request that touches a large config file and you need to know exactly which values shifted. Comparing JSON files sounds simple — until you realise that a naive text diff treats key-reordering as a change, and that deeply nested structures need a smarter approach. This article explains the algorithms that make a proper JSON diff work.

Why a Plain Text Diff Falls Short

The simplest way to compare two files is to run diff or compare them line-by-line. For plain text this works well. For JSON it breaks in at least two ways.

Problem 1: Key order

JSON objects are unordered by specification. The following two documents are semantically identical:

// Document A
{ "name": "Ada", "plan": "pro", "active": true }

// Document B
{ "active": true, "name": "Ada", "plan": "pro" }

A line-by-line text diff would report these documents as different, even though no value has changed. A proper JSON diff reports zero differences.

Problem 2: Formatting noise

Indentation, whitespace, and whether a long array is written on one line or many lines are all irrelevant to the data. A text diff treats every whitespace difference as a change.

The solution to both problems is the same: parse first, diff the structure, not the text.
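To make the contrast concrete, here is a minimal sketch that compares the raw text of documents A and B (with B pretty-printed) and then compares the parsed values structurally. jsonEqual is a hypothetical helper written for this example, not a library function; it ignores key order but keeps array order.

```javascript
// Hypothetical helper: structural equality for parsed JSON values (key order ignored)
function jsonEqual(a, b) {
  if (a === b) return true;                                   // identical primitives
  if (typeof a !== 'object' || typeof b !== 'object' || a === null || b === null) {
    return false;                                             // a primitive or null on one side, and a !== b
  }
  if (Array.isArray(a) !== Array.isArray(b)) return false;    // array vs object
  if (Array.isArray(a)) {
    return a.length === b.length && a.every((v, i) => jsonEqual(v, b[i]));
  }
  const keysA = Object.keys(a), keysB = Object.keys(b);
  return keysA.length === keysB.length &&
    keysA.every(k => Object.prototype.hasOwnProperty.call(b, k) && jsonEqual(a[k], b[k]));
}

const docA = '{ "name": "Ada", "plan": "pro", "active": true }';
const docB = '{\n  "active": true,\n  "name": "Ada",\n  "plan": "pro"\n}';

console.log(docA === docB);                                 // false: the text differs
console.log(jsonEqual(JSON.parse(docA), JSON.parse(docB))); // true: the data is the same
```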

Step 1 — Parse and Normalise

Before doing any comparison, both documents are parsed into JavaScript objects with JSON.parse() and then re-serialised with sorted keys and consistent indentation:

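function normalise(json) {
  const value = JSON.parse(json);
  return JSON.stringify(sortKeys(value), null, 2); // 2-space indent
}

// Recursively rebuild objects with their keys in sorted order
function sortKeys(value) {
  if (Array.isArray(value)) return value.map(sortKeys); // arrays keep their element order
  if (value !== null && typeof value === 'object') {
    return Object.keys(value)
      .sort() // UTF-16 code-unit order: deterministic, unlike locale-aware localeCompare
      .reduce((acc, key) => {
        acc[key] = sortKeys(value[key]);
        return acc;
      }, {});
  }
  return value; // strings, numbers, booleans, null
}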

After normalisation, documents A and B above both produce the same string:

{
  "active": true,
  "name": "Ada",
  "plan": "pro"
}

Now a text diff between the two normalised strings will only highlight actual data differences, not formatting or key-ordering noise.
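The normalise() function above builds sorted copies by hand. An alternative sketch (a different technique from the one above, shown only for comparison) gets the same result from JSON.stringify's replacer argument: the replacer is called for every value during serialisation, and when it returns a key-sorted copy of a plain object, that copy is what gets serialised, so keys end up sorted at every depth. canonical is a hypothetical name.

```javascript
// Hypothetical canonical(): key-sorted, 2-space-indented serialisation via a replacer.
// JSON.stringify serialises whatever the replacer returns and then applies the
// replacer to that object's own properties, so nested objects are sorted too.
function canonical(value) {
  return JSON.stringify(value, (key, v) => {
    if (v === null || typeof v !== 'object' || Array.isArray(v)) return v; // leaves and arrays unchanged
    return Object.fromEntries(
      Object.entries(v).sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0))  // code-unit order
    );
  }, 2);
}

const docA = '{ "name": "Ada", "plan": "pro", "active": true }';
const docB = '{"active":true,"name":"Ada","plan":"pro"}';

console.log(canonical(JSON.parse(docA)) === canonical(JSON.parse(docB))); // true
```

Both approaches share one quirk of JavaScript objects: integer-like keys such as "10" are always enumerated first, in ascending numeric order, whatever order they were inserted in. Because both documents get the same treatment, this does not introduce spurious differences.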

Step 2 — Line-Level Diff with LCS

With normalised text, the problem becomes: given two sequences of lines, find the minimal set of changes (insertions and deletions) that transforms one into the other. This is the classic Longest Common Subsequence (LCS) problem.

What is LCS?

The LCS of two sequences is the longest subsequence that appears in both, in the same order, without necessarily being contiguous. For example:

Before: ["  active: true", "  name: Ada",   "  plan: pro" ]
After:  ["  active: true", "  name: Ada",   "  plan: team"]

LCS:    ["  active: true", "  name: Ada"]   // 2 lines in common

// Result:
//   same:    "  active: true"
//   same:    "  name: Ada"
//   deleted: "  plan: pro"
//   added:   "  plan: team"

The LCS gives us exactly the diff we want: lines that stayed the same, lines that were deleted, and lines that were added.

The DP algorithm

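LCS is solved with dynamic programming. For two arrays of length m and n, we build an (m + 1) × (n + 1) table where dp[i][j] is the length of the LCS of the suffixes before[i..] and after[j..] (everything from index i or j onwards). Row m and column n represent empty suffixes and stay at 0, so the table is filled from the bottom-right corner and dp[0][0] ends up holding the LCS length of the full arrays: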

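// LCS line diff of two arrays of lines (the name lcsDiff is illustrative)
function lcsDiff(before, after) {
  const m = before.length, n = after.length;
  // dp[i][j] = LCS length of before[i..] and after[j..]; row m and column n stay 0
  const dp = Array.from({ length: m + 1 }, () => new Array(n + 1).fill(0));

  // Fill the DP table (bottom-up)
  for (let i = m - 1; i >= 0; i--) {
    for (let j = n - 1; j >= 0; j--) {
      dp[i][j] = before[i] === after[j]
        ? dp[i + 1][j + 1] + 1                    // lines match
        : Math.max(dp[i + 1][j], dp[i][j + 1]);   // take the better branch
    }
  }

  // Walk the table from dp[0][0] to recover the diff operations
  const ops = [];
  let i = 0, j = 0;
  while (i < m && j < n) {
    if (before[i] === after[j]) {
      ops.push({ type: 'same', line: before[i++] }); j++;
    } else if (dp[i + 1][j] >= dp[i][j + 1]) {
      ops.push({ type: 'del', line: before[i++] });
    } else {
      ops.push({ type: 'add', line: after[j++] });
    }
  }

  // One side ran out first: whatever remains on the other side is deleted or added
  while (i < m) ops.push({ type: 'del', line: before[i++] });
  while (j < n) ops.push({ type: 'add', line: after[j++] });

  return ops;
}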

Time complexity: O(m × n). Space complexity: O(m × n). For typical JSON documents this is fast enough. For very large documents (say, more than 2000 lines each, which means a table of over 4 million cells), the table can consume significant memory. At that scale it's worth switching to Myers' diff algorithm, which runs in O((m + n) × d) time in the worst case, where d is the number of differences (O(m + n + d²) expected), and has a linear-space variant.
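For reference, here is a minimal sketch of the greedy forward pass from Myers' O(ND) algorithm, a different technique from the LCS table above. It only returns d, the length of the shortest edit script counted as deletions plus insertions; recovering the actual operations requires saving the v array for every d and walking back, which is omitted here. myersDistance is a hypothetical name.

```javascript
// Hypothetical myersDistance(): length of the shortest edit script (deletions +
// insertions) between arrays a and b, using the greedy forward pass of Myers' algorithm.
// Worst-case time is O((n + m) × d): the outer loop stops as soon as d is found.
function myersDistance(a, b) {
  const n = a.length, m = b.length;
  const max = n + m;
  const offset = max;                       // shifts diagonal k (which may be negative) into range
  const v = new Int32Array(2 * max + 2);    // v[k + offset] = furthest x reached on diagonal k

  for (let d = 0; d <= max; d++) {
    for (let k = -d; k <= d; k += 2) {
      let x;
      if (k === -d || (k !== d && v[k - 1 + offset] < v[k + 1 + offset])) {
        x = v[k + 1 + offset];              // step down from diagonal k + 1 (an insertion)
      } else {
        x = v[k - 1 + offset] + 1;          // step right from diagonal k - 1 (a deletion)
      }
      let y = x - k;
      // Follow the "snake": runs of matching elements cost nothing
      while (x < n && y < m && a[x] === b[y]) { x++; y++; }
      v[k + offset] = x;
      if (x >= n && y >= m) return d;       // reached the end of both sequences
    }
  }
  return max; // not reached in practice: d = n + m always succeeds
}

console.log(myersDistance(['a', 'n', 'p'], ['a', 'n', 't']));        // 2
console.log(myersDistance('ABCABBA'.split(''), 'CBABAC'.split(''))); // 5
```

For deletion/insertion scripts, d equals m + n − 2 × (LCS length), so the answer agrees with what the LCS table implies; the difference is that the work grows with the number of differences instead of with m × n.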

Memory optimisation: typed arrays

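A plain JavaScript array of arrays stores each number in a tagged slot (commonly 8 bytes on 64-bit engines, though the exact layout varies) and adds object overhead for every row. A single flat Int32Array uses exactly 4 bytes per cell in one contiguous buffer, so it typically needs about half the memory and has better cache locality:

const W  = n + 1;                        // row width
const dp = new Int32Array((m + 1) * W);  // flat buffer, zero-initialised

// dp[i][j] is stored at dp[i * W + j]
for (let i = m - 1; i >= 0; i--) {
  for (let j = n - 1; j >= 0; j--) {
    dp[i * W + j] = before[i] === after[j]
      ? dp[(i + 1) * W + (j + 1)] + 1
      : Math.max(dp[(i + 1) * W + j], dp[i * W + (j + 1)]);
  }
}

For a 500-line document on each side, the table has 501 × 501 ≈ 251,000 cells: about 2 MB at 8 bytes per slot in nested arrays, versus about 1 MB of contiguous typed memory at 4 bytes per cell.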

Step 3 — Pair Adjacent Changes into "Modified" Rows

The raw LCS output only knows about deletions and additions. But in a side-by-side diff view, it looks much better to show a deleted line paired with the added line that replaced it — a "modified" row:

// Raw ops from LCS:
del  "  plan: pro"
add  "  plan: team"

// After pairing:
modified  left: "  plan: pro"   right: "  plan: team"

The pairing algorithm collects consecutive blocks of del and add operations and zips them together. Unmatched deletions get an empty placeholder on the right; unmatched additions get an empty placeholder on the left:

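// Turn LCS ops into display rows, pairing each del/add block into "modified" rows
function pairRows(ops) {
  const rows = [];
  let i = 0;
  while (i < ops.length) {
    if (ops[i].type === 'same') {
      rows.push({ type: 'same', left: ops[i].line, right: ops[i].line });
      i++;
      continue;
    }

    // Collect a consecutive del/add block (stop at the next 'same' op or the end)
    const dels = [], adds = [];
    while (i < ops.length && ops[i].type !== 'same') {
      if (ops[i].type === 'del') dels.push(ops[i].line);
      else                        adds.push(ops[i].line);
      i++;
    }

    // Zip into modified pairs
    const pairs = Math.min(dels.length, adds.length);
    for (let k = 0; k < pairs; k++)
      rows.push({ type: 'modified', left: dels[k], right: adds[k] });

    // Remaining unmatched deletions (empty placeholder on the right)
    for (let k = pairs; k < dels.length; k++)
      rows.push({ type: 'deleted', left: dels[k] });

    // Remaining unmatched additions (empty placeholder on the left)
    for (let k = pairs; k < adds.length; k++)
      rows.push({ type: 'added', right: adds[k] });
  }
  return rows;
}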

Step 4 — Semantic Diff for the Summary Counts

The line diff tells you what text changed. A semantic diff tells you which fields changed and how — useful for a summary like "3 added, 1 removed, 2 changed".

The semantic diff recursively walks both parsed objects simultaneously:

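// A plain JSON object: not null and not an array
const isObject = v => v !== null && typeof v === 'object' && !Array.isArray(v);

// Own properties only, so keys like "constructor" don't pick up inherited values
const own = (obj, k) => (Object.prototype.hasOwnProperty.call(obj, k) ? obj[k] : undefined);

function diffValue(key, path, before, after) {
  // Every node records its key and path so later walks (e.g. JSON Patch) can use them
  if (before === undefined) return { key, path, status: 'added',   after  };
  if (after  === undefined) return { key, path, status: 'removed', before };

  if (isObject(before) && isObject(after)) {
    // Union of both key sets, sorted, so the walk doesn't depend on key order
    const keys     = [...new Set([...Object.keys(before), ...Object.keys(after)])].sort();
    const children = keys.map(k =>
      diffValue(k, path + '.' + k, own(before, k), own(after, k))
    );
    return {
      key, path,
      status: children.every(c => c.status === 'unchanged') ? 'unchanged' : 'changed',
      children,
    };
  }

  if (Array.isArray(before) && Array.isArray(after)) {
    // Arrays are compared position by position
    const len      = Math.max(before.length, after.length);
    const children = Array.from({ length: len }, (_, i) =>
      diffValue(i, path + '[' + i + ']', before[i], after[i])
    );
    return {
      key, path,
      status: children.every(c => c.status === 'unchanged') ? 'unchanged' : 'changed',
      children,
    };
  }

  // Leaves, or a type mismatch such as object vs array. Containers of the same kind
  // were handled above, so strict equality is enough to tell unchanged from changed.
  return before === after
    ? { key, path, status: 'unchanged', before, after }
    : { key, path, status: 'changed',   before, after };
}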

Walking the resulting tree and counting leaf nodes by status gives the summary numbers. Unlike the line diff, the semantic diff is key-order-agnostic by construction — it compares values at the same key path, regardless of the order they appear in the JSON.
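summarise() is called in the pipeline below but never defined in this article. A minimal sketch, assuming the node shape diffValue() returns (a status plus an optional children array): container nodes are not counted themselves, and an added or removed subtree counts as a single entry because it has no children. The tree literal is hand-built for illustration.

```javascript
// Hypothetical summarise(): count leaf nodes of a diffValue-style tree by status
function summarise(node, counts = { added: 0, removed: 0, changed: 0, unchanged: 0 }) {
  if (Array.isArray(node.children)) {
    for (const child of node.children) summarise(child, counts); // recurse into containers
  } else {
    counts[node.status] += 1;                                     // leaf, or whole added/removed subtree
  }
  return counts;
}

// Hand-built tree for { "name": "Ada", "plan": "pro" } → { "name": "Ada", "plan": "team", "seats": 5 }
const tree = {
  status: 'changed',
  children: [
    { key: 'name',  status: 'unchanged', before: 'Ada', after: 'Ada' },
    { key: 'plan',  status: 'changed',   before: 'pro', after: 'team' },
    { key: 'seats', status: 'added',     after: 5 },
  ],
};

console.log(summarise(tree)); // { added: 1, removed: 0, changed: 1, unchanged: 1 }
```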

Putting It All Together

A complete JSON diff tool chains these steps:

function jsonDiff(leftText, rightText) {
  // 1. Parse
  const leftVal  = JSON.parse(leftText);
  const rightVal = JSON.parse(rightText);

  // 2. Semantic diff → summary counts
  const summary = summarise(diffValue('root', '$', leftVal, rightVal));

  // 3. Normalise (sort keys, consistent indent)
  const leftNorm  = JSON.stringify(sortKeys(leftVal),  null, 2);
  const rightNorm = JSON.stringify(sortKeys(rightVal), null, 2);

  // 4. LCS line diff
  const rows = buildLineDiff(leftNorm, rightNorm);

  return { summary, rows };
}

Deep-Equal vs Structural Diff

A common shortcut is to compare two JSON documents with a deep-equality check (Node's util.isDeepStrictEqual or assert.deepStrictEqual, Lodash's isEqual): quick, but essentially a yes/no answer. It tells you that the documents differ; it doesn't tell you where or how.

A structural diff walks both trees and produces a report (counts, paths, before/after values) and optionally a JSON Patch. Use deepEqual when the answer is yes/no — in tests, in cache invalidation. Use a structural diff when a human needs to act on the differences.
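For the yes/no case, Node.js ships util.isDeepStrictEqual, which returns a boolean. This short sketch (plain Node.js, no dependencies) shows both sides of the tradeoff: key order is ignored, but when the answer is false there is no hint about which field changed.

```javascript
// Node.js built-in: util.isDeepStrictEqual returns true or false and nothing else
const util = require('node:util');

const before = JSON.parse('{ "name": "Ada", "plan": "pro", "active": true }');
const same   = JSON.parse('{ "active": true, "name": "Ada", "plan": "pro" }');
const after  = JSON.parse('{ "name": "Ada", "plan": "team", "active": true }');

console.log(util.isDeepStrictEqual(before, same));  // true: key order is ignored
console.log(util.isDeepStrictEqual(before, after)); // false, with no indication that "plan" changed
```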

Producing a JSON Patch From the Diff

Once you have a semantic diff tree, the same walk that counts added / removed / changed nodes can emit a JSON Patch (RFC 6902) — a portable list of add / remove / replace operations that turns the "before" document into the "after." That's how you take a diff and ship it as the body of an HTTP PATCH request, or apply it later somewhere else.

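function toJsonPatch(node, pointer = '') {
  const ops = [];
  if (node.status === 'added')   ops.push({ op: 'add',     path: pointer, value: node.after });
  if (node.status === 'removed') ops.push({ op: 'remove',  path: pointer });
  if (node.status === 'changed' && !node.children) {
    ops.push({ op: 'replace', path: pointer, value: node.after });
  }
  const children = node.children ?? [];
  // A positional array diff only removes elements from the tail. Each RFC 6902
  // 'remove' shifts later indices, so emit those removals highest index first.
  const shrinking = children.some(c => typeof c.key === 'number' && c.status === 'removed');
  for (const child of shrinking ? [...children].reverse() : children) {
    // Escape the key as a JSON Pointer segment: '~' → '~0' first, then '/' → '~1'
    const seg = String(child.key).replace(/~/g, '~0').replace(/\//g, '~1');
    ops.push(...toJsonPatch(child, pointer + '/' + seg));
  }
  return ops;
}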

The path segments follow JSON Pointer (RFC 6901) rules — note the ~0 / ~1 escapes for ~ and / in keys. Libraries like fast-json-patch generate the same thing if you'd rather not roll your own. When the patch is sent over HTTP, use the application/json-patch+json content type; for the simpler "overlay" alternative where null means delete, see JSON Patch vs JSON Merge Patch.
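To illustrate the escaping rules, here is a minimal sketch of encoding and decoding JSON Pointer segments, plus a small resolver. resolvePointer is a hypothetical helper, not part of fast-json-patch or any other library. Decoding undoes the escapes in the reverse order, '~1' first and then '~0'; doing it the other way round would turn a key containing the literal text "~1" into "/".

```javascript
// RFC 6901 reference-token escaping: '~' must be escaped before '/'
const encodeSegment = key => String(key).replace(/~/g, '~0').replace(/\//g, '~1');

// Decoding runs in the opposite order: '~1' → '/' first, then '~0' → '~'
const decodeSegment = token => token.replace(/~1/g, '/').replace(/~0/g, '~');

// Hypothetical helper: resolve a JSON Pointer against a parsed document
function resolvePointer(doc, pointer) {
  if (pointer === '') return doc;                          // '' refers to the whole document
  if (!pointer.startsWith('/')) throw new SyntaxError('JSON Pointer must start with "/"');
  return pointer
    .slice(1)
    .split('/')
    .map(decodeSegment)
    .reduce(
      (node, token) =>
        node !== null && typeof node === 'object' && Object.prototype.hasOwnProperty.call(node, token)
          ? node[token]
          : undefined,                                     // missing key or index
      doc
    );
}

const doc = { 'a/b': { 'm~n': [10, 20] } };
const ptr = '/' + encodeSegment('a/b') + '/' + encodeSegment('m~n') + '/1';
console.log(ptr);                      // /a~1b/m~0n/1
console.log(resolvePointer(doc, ptr)); // 20
```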

Edge Cases Worth Knowing

  • Array element reordering: the semantic diff compares arrays positionally ([0] vs [0]), so if an array is sorted differently between versions, every element may appear "changed" even if the data is the same. Handling this well requires array-level LCS, which adds significant complexity.
  • Very large documents: the O(m × n) LCS table can exhaust memory for documents with tens of thousands of lines. A practical heuristic: if m × n exceeds a threshold (say, 2 million), fall back to displaying all lines as changed rather than computing the full diff.
  • Numeric precision: JSON.parse() converts all numbers to IEEE 754 doubles. Very large integers (beyond 2⁵³) lose precision silently, so two documents that differ only in the last digits of a large integer may compare as equal after parsing.
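A quick way to see the precision problem, plus a hedged mitigation: after parsing, flag every integer outside the range Number.isSafeInteger accepts, since its original digits may not have survived. unsafeIntegerPaths is a hypothetical helper. A flagged value is only a warning (2⁵³ itself, for instance, is exactly representable); the dependable fix is to transmit such values as JSON strings.

```javascript
// 2^53 + 1 cannot be represented as a double, so it parses to 2^53
const a = JSON.parse('{ "id": 9007199254740993 }');
const b = JSON.parse('{ "id": 9007199254740992 }');
console.log(a.id === b.id); // true, even though the source text differs

// Hypothetical helper: list paths of integers outside the safe range (|n| > 2^53 - 1),
// whose original digits may not have survived JSON.parse()
function unsafeIntegerPaths(value, path = '$', out = []) {
  if (typeof value === 'number') {
    if (Number.isInteger(value) && !Number.isSafeInteger(value)) out.push(path);
  } else if (Array.isArray(value)) {
    value.forEach((v, i) => unsafeIntegerPaths(v, path + '[' + i + ']', out));
  } else if (value !== null && typeof value === 'object') {
    for (const [k, v] of Object.entries(value)) unsafeIntegerPaths(v, path + '.' + k, out);
  }
  return out;
}

console.log(unsafeIntegerPaths(a)); // [ '$.id' ]
```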

Frequently Asked Questions

How do I compare two JSON files?

Parse both, normalise them (sort keys, consistent indentation), then run a structural diff rather than a plain text diff — or paste both into JSON Diff, which does this and shows a colour-coded side-by-side view.

Why doesn't a plain text diff work for JSON?

Because JSON objects are unordered and whitespace is insignificant. A text diff flags re-ordered keys and reformatting as changes; a JSON-aware diff ignores both and reports only real data differences.

What's a semantic JSON diff?

One that walks both parsed structures and compares values at the same key path regardless of order, producing counts like "3 added, 1 removed, 2 changed." It's key-order-agnostic by construction.

Can comparing JSON lose numeric precision?

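Yes. JSON.parse() converts numbers to IEEE 754 doubles, so integers beyond 2⁵³ can compare as equal even when their last digits differ. RFC 8785 (the JSON Canonicalization Scheme) addresses related issues by defining a single canonical serialisation for JSON, including how keys are ordered and how numbers are written.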

Try the JSON Diff Tool

JSON Diff on fixjson.org implements everything described above: it parses both documents, normalises with sorted keys, runs the LCS line diff, and displays a side-by-side view with colour-coded additions, deletions, and modifications — plus a summary row showing added, removed, changed, and unchanged field counts. It also supports YAML documents. Everything runs in your browser; no data is sent to any server.