
RFC 4648 — The Base64 Standard

fixjson.org · October 2006

Before RFC 4648

Base64 encoding predates its formal standardisation by many years. MIME (RFC 2045, 1996) specified it for embedding binary content in email, but only within that context, and the same basic encoding was reinvented elsewhere with slight variations: different line length limits, different handling of whitespace, different alphabet characters for values 62 and 63, and different rules for padding.

RFC 3548 (2003) was the first attempt to consolidate the Base64 variants, along with Base32 and Base16, into a single document, but it left gaps and ambiguities. RFC 4648, published in October 2006, obsoleted RFC 3548 and became the definitive IETF reference for the Base16, Base32, and Base64 encodings.

Standard Base64

Base64 encodes binary data as ASCII text using an alphabet of 64 characters: A–Z (values 0–25), a–z (26–51), 0–9 (52–61), + (62), and / (63). Each character carries 6 bits, so every three input bytes (24 bits) become four output characters, expanding the data size by approximately 33%.
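To make the three-bytes-to-four-characters mapping concrete, here is a minimal Python sketch (Python is an illustrative choice; the article itself shows no code) that encodes a single three-byte group by hand. The function name `encode_group` is hypothetical, and the sketch deliberately ignores padding and inputs of other lengths.

```python
# Standard Base64 alphabet from RFC 4648, indexed by 6-bit value.
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def encode_group(group: bytes) -> str:
    """Encode exactly three bytes (24 bits) as four Base64 characters."""
    if len(group) != 3:
        raise ValueError("expected exactly three bytes")
    # Pack the three bytes into one 24-bit integer.
    n = (group[0] << 16) | (group[1] << 8) | group[2]
    # Slice it into four 6-bit indices, most significant first.
    return "".join(ALPHABET[(n >> shift) & 0x3F] for shift in (18, 12, 6, 0))


print(encode_group(b"Man"))  # TWFu
```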

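When the input length is not a multiple of three, the final group is completed with padding characters (=): one leftover byte produces two characters followed by ==, and two leftover bytes produce three characters followed by =, so the output length is always a multiple of four. RFC 4648 defines the standard alphabet and these padding rules precisely, and it also resolves the older ambiguity about whether whitespace in Base64 strings should be ignored or treated as an error.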
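A short sketch using Python's standard `base64` module can check the padding arithmetic against a real encoder. The helpers `encoded_length` and `pad_count` are hypothetical names for this illustration, not part of any library.

```python
import base64


def encoded_length(n: int) -> int:
    """Padded Base64 length for n input bytes: 4 * ceil(n / 3)."""
    return 4 * ((n + 2) // 3)


def pad_count(n: int) -> int:
    """Number of '=' characters: 0, 2, or 1 when n % 3 is 0, 1, or 2."""
    return (3 - n % 3) % 3


for data in (b"a", b"ab", b"abc"):
    encoded = base64.b64encode(data).decode("ascii")
    print(data, encoded)
# b'a' YQ==
# b'ab' YWI=
# b'abc' YWJj
```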

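Per RFC 4648 (Section 3.3), implementations MUST reject encoded data that contains characters outside the base alphabet, line breaks and other whitespace included, unless the specification referring to RFC 4648 explicitly states otherwise. MIME is the classic exception: RFC 2045 tells decoders to ignore characters outside the alphabet, which is why lenient decoders that skip them remain common for legacy compatibility.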
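Python's `base64.b64decode` happens to expose both behaviours through its `validate` parameter. The wrapper names below (`decode_lenient`, `decode_strict`, `is_strictly_valid`) are hypothetical, and this shows one library's approach rather than anything RFC 4648 prescribes.

```python
import base64
import binascii


def decode_lenient(text: str) -> bytes:
    # validate=False (the default) discards characters outside the alphabet,
    # the "ignore them" behaviour MIME asks for.
    return base64.b64decode(text)


def decode_strict(text: str) -> bytes:
    # validate=True raises binascii.Error on any non-alphabet character,
    # matching RFC 4648's default requirement.
    return base64.b64decode(text, validate=True)


def is_strictly_valid(text: str) -> bool:
    try:
        decode_strict(text)
        return True
    except binascii.Error:
        return False


wrapped = "aGVs\r\nbG8="  # a line break inside the data, as in MIME bodies
print(decode_lenient(wrapped))     # b'hello'
print(is_strictly_valid(wrapped))  # False
```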

Base64url: The URL-Safe Variant

Standard Base64 uses + and /, which are special characters in URLs: / separates path segments, and + is read as a space in form-encoded query strings. RFC 4648 Section 5 defines Base64url, which substitutes - for + and _ for /, so Base64url strings can appear in URLs, HTTP headers, and filenames without percent-encoding.
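The substitution is easy to see with Python's `base64` module. The two input bytes here are an arbitrary illustration, chosen so that their encoding lands on values 62 and 63.

```python
import base64

data = b"\xfb\xff"  # encodes to values 62, 63, 60

standard = base64.b64encode(data)          # b'+/8='
url_safe = base64.urlsafe_b64encode(data)  # b'-_8='

# Base64url is the same encoding with two alphabet characters swapped.
swap = bytes.maketrans(b"+/", b"-_")
print(standard, url_safe, standard.translate(swap) == url_safe)
```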

RFC 4648 also lets a referring specification omit padding when the data length is known implicitly (Sections 3.2 and 5), a provision widely used in JWT and JOSE, where padding is always omitted. Dropping it shortens tokens and avoids the = character, which is also special in URLs because it separates names from values in query strings.
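A minimal sketch of unpadded Base64url in Python, assuming the caller already knows the data is unpadded; `b64url_encode_nopad` and `b64url_decode_nopad` are hypothetical helper names. Decoding can restore the `=` characters from the length alone, because a valid unpadded string has a length of 0, 2, or 3 modulo 4.

```python
import base64


def b64url_encode_nopad(data: bytes) -> str:
    """Base64url with trailing '=' removed, as JWS/JWT use it."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode_nopad(text: str) -> bytes:
    """Re-add the padding implied by the length, then decode."""
    # -len % 4 gives 2 for length 2 mod 4, 1 for length 3 mod 4, 0 otherwise.
    # A length of 1 mod 4 is never valid and will still fail to decode.
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


print(b64url_encode_nopad(b"a"))  # YQ
print(b64url_decode_nopad("YQ"))  # b'a'
```

A decoder that insists on padding, such as Python's plain `base64.b64decode`, rejects `"YQ"` outright, which is why JWT libraries restore the padding (or decode without requiring it) before decoding.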

Base64url has become one of the most widely used encodings in web development through its adoption in JSON Web Tokens, where the header, payload, and signature of a signed token are each Base64url-encoded and joined with dots. For a deeper look at a critical misconception, see Base64 Encoding Is Not Encryption.
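To show that all three parts are plain Base64url, the sketch below builds an HS256-signed token with Python's standard library and then reads its claims back without the key. The key, claims, and helper names are made up for illustration; a real application should use a maintained JWT library and verify the signature before trusting any claim.

```python
import base64
import hashlib
import hmac
import json


def b64url(data: bytes) -> str:
    # Base64url without padding, as JWS requires.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(segment: str) -> bytes:
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


# Hypothetical key and claims, for illustration only.
key = b"demo-secret"
signing_input = ".".join([
    b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode()),
    b64url(json.dumps({"sub": "user-123"}).encode()),
])
signature = hmac.new(key, signing_input.encode("ascii"), hashlib.sha256).digest()
token = signing_input + "." + b64url(signature)

# Reading the claims needs no key: Base64url is an encoding, not encryption.
header_seg, payload_seg, sig_seg = token.split(".")
claims = json.loads(b64url_decode(payload_seg))
print(claims)  # {'sub': 'user-123'}
```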

Base32, Base16, and the Padding Question

RFC 4648 also standardises Base32 and Base16. Base32 uses a 32-character alphabet (A–Z and 2–7, omitting the digits 0, 1, 8, and 9) and suits contexts where case insensitivity or reduced ambiguity between visually similar characters (0/O, 1/I) matters, such as the TOTP (Time-based One-Time Password) secret keys that authenticator apps display. Crockford's Base32, used for human-readable identifiers, is a related but distinct scheme with its own alphabet rather than the RFC 4648 one.
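Python's `base64` module implements the RFC 4648 Base32 alphabet (not Crockford's). A quick sketch of encoding, padding, and case-insensitive decoding:

```python
import base64

encoded = base64.b32encode(b"hello")  # b'NBSWY3DP' (5 bytes -> 8 characters)
short = base64.b32encode(b"a")        # b'ME======' (padded to 8 characters)

# casefold=True lets the decoder accept the lowercase form a user might type.
decoded = base64.b32decode(encoded.lower(), casefold=True)
print(encoded, short, decoded)
```

Base32 works in 5-byte groups that become 8 characters, so its padding can run to six `=` characters rather than Base64's two.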

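Base16 is simply hexadecimal encoding: each byte maps to exactly two hex characters, so no padding is ever needed. RFC 4648's alphabet table uses uppercase A–F, but the RFC describes Base16 as the standard case-insensitive hex encoding; in practice, decoders differ on whether they accept lowercase by default.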
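Python illustrates the divergence: `base64.b16encode` emits uppercase, `bytes.hex()` emits lowercase, and `base64.b16decode` rejects lowercase unless `casefold=True` is passed. A small sketch, with an arbitrary three-byte input:

```python
import base64
import binascii

data = b"\x00\xab\xff"
upper = base64.b16encode(data)  # b'00ABFF'
lower = data.hex()              # '00abff'

decoded = base64.b16decode(upper)
decoded_lower = base64.b16decode(lower, casefold=True)

# Without casefold=True, lowercase hex is rejected.
try:
    base64.b16decode(lower)
except binascii.Error as exc:
    print("rejected:", exc)
```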

The padding question, whether = characters are required, was a significant source of interoperability friction before RFC 4648. The RFC's answer (Section 3.2): implementations must include padding unless the specification referring to RFC 4648 explicitly states otherwise. JWT omits it for Base64url; MIME requires it for standard Base64.


Sources
