URL Encoding: The Complete Guide
Have you ever clicked a link and seen strange sequences like %20 or %3A in your browser's address bar? Or perhaps you've struggled with a URL that works in your browser but breaks in your code? These situations all relate to URL encoding, and understanding it thoroughly will save you countless hours of debugging.
In this guide, we'll explore everything you need to know about URL encoding. We'll start with why it exists, walk through the mechanics of percent encoding, examine the key differences between JavaScript's encoding functions, and finish with practical debugging techniques. Let's begin.
Why URL Encoding Exists
URLs were designed with a limited character set in mind. The original specification allowed only a small set of characters that were considered safe and unambiguous across different systems and contexts.
Think about it: a URL needs to be parsed by browsers, servers, and various network equipment. It might be transmitted over different protocols, stored in databases, or embedded in HTML. At each step, certain characters could cause problems.
Consider this URL:
https://example.com/search?q=coffee & tea
Those spaces create ambiguity. Where does the URL end? Is "tea" part of the query or something else? And the ampersand could be read as the start of a new parameter.
URL encoding solves this by replacing problematic characters with safe representations. The properly encoded version looks like this:
https://example.com/search?q=coffee%20%26%20tea
Now there's no ambiguity. Every system along the way knows exactly what this URL means.
Understanding Reserved and Unreserved Characters
URLs divide characters into two categories, and understanding this distinction is fundamental.
Unreserved Characters
These characters can appear anywhere in a URL without encoding:
- Letters: A-Z and a-z
- Digits: 0-9
- Four special characters: hyphen (-), period (.), underscore (_), tilde (~)
These 66 characters will never cause problems and never need encoding.
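As a quick sanity check on that count, here is a minimal sketch that tests every ASCII character against the unreserved set. The `isUnreserved` helper and its regular expression are illustrative names chosen for this example, not part of any standard API.

```javascript
// Hypothetical helper: true if `ch` is one of the unreserved characters
// listed above (letters, digits, hyphen, period, underscore, tilde).
const UNRESERVED_PATTERN = /^[A-Za-z0-9\-._~]$/;

function isUnreserved(ch) {
  return UNRESERVED_PATTERN.test(ch);
}

// Count the unreserved characters in the ASCII range.
let unreservedCount = 0;
for (let code = 0; code < 128; code++) {
  if (isUnreserved(String.fromCharCode(code))) {
    unreservedCount++;
  }
}

console.log(unreservedCount);   // 66
console.log(isUnreserved('~')); // true
console.log(isUnreserved('&')); // false
```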
Reserved Characters
These characters have special meaning in URLs:
: / ? # [ ] @ ! $ & ' ( ) * + , ; =
Each serves a specific purpose:
- : separates the scheme from the rest (https://example.com)
- / separates path segments (/users/profile/settings)
- ? marks the start of the query string
- # marks the start of the fragment
- & separates query parameters (?page=1&sort=name)
- = separates parameter names from values (page=1)
When these characters appear in data rather than as structural delimiters, they must be encoded. This is where many developers run into trouble.
Percent Encoding: The Mechanism
URL encoding is technically called percent encoding because it uses the percent sign followed by two hexadecimal digits representing the character's byte value.
Here's how it works:
- Take the character you need to encode
- Convert it to its UTF-8 byte sequence
- Represent each byte as %XX where XX is the hexadecimal value
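The three steps above can be sketched in a few lines. This is only an illustration of the mechanism: the `percentEncodeEveryByte` name is made up for this example, and unlike the built-in functions it encodes every byte, including characters that would normally be left alone.

```javascript
// Hypothetical helper: percent-encode every byte of `text`, with no
// exceptions for safe characters, to make the mechanism visible.
function percentEncodeEveryByte(text) {
  // Step 2: convert the text to its UTF-8 byte sequence.
  const bytes = new TextEncoder().encode(text);

  // Step 3: write each byte as % followed by two uppercase hex digits.
  return Array.from(bytes, (byte) =>
    '%' + byte.toString(16).toUpperCase().padStart(2, '0')
  ).join('');
}

console.log(percentEncodeEveryByte(' ')); // %20
console.log(percentEncodeEveryByte('&')); // %26
console.log(percentEncodeEveryByte('A')); // %41
```

In real code you would reach for encodeURIComponent(), which applies the same byte-to-%XX rule but skips characters that are safe to leave as they are.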
Let's trace through some examples:
Space character:
- ASCII/UTF-8 value: 32 (decimal) = 20 (hexadecimal)
- Encoded form: %20
Ampersand (&):
- ASCII/UTF-8 value: 38 (decimal) = 26 (hexadecimal)
- Encoded form: %26
Equals sign (=):
- ASCII/UTF-8 value: 61 (decimal) = 3D (hexadecimal)
- Encoded form: %3D
Encoding Non-ASCII Characters
What about characters outside the basic ASCII range? This is where UTF-8 encoding comes in. Characters like accented letters or emoji are first converted to their UTF-8 byte sequence, then each byte is percent-encoded.
Take the word "cafe" in its French spelling, "café", with an accent on the final letter:
The character é is encoded in UTF-8 as two bytes: C3 A9. So in a URL, it becomes:
caf%C3%A9
A symbol such as the check mark (✓, U+2713) takes three UTF-8 bytes, so it is encoded as three percent-escapes:
%E2%9C%93
This multi-byte encoding is important to understand when debugging encoding issues with international text.
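To see the byte sequences for yourself, a small sketch like the following can help. The `utf8HexBytes` helper is a hypothetical name for this example; the characters are written as Unicode escapes so the result does not depend on how the source file is saved.

```javascript
// Hypothetical helper: list the UTF-8 bytes of `text` as uppercase hex.
function utf8HexBytes(text) {
  return Array.from(new TextEncoder().encode(text), (byte) =>
    byte.toString(16).toUpperCase().padStart(2, '0')
  );
}

const eAcute = '\u00E9';          // é (precomposed form)
const checkMark = '\u2713';       // ✓
const grinningFace = '\u{1F600}'; // an emoji that needs four UTF-8 bytes

console.log(utf8HexBytes(eAcute));               // [ 'C3', 'A9' ]
console.log(encodeURIComponent('caf' + eAcute)); // caf%C3%A9
console.log(encodeURIComponent(checkMark));      // %E2%9C%93
console.log(encodeURIComponent(grinningFace));   // %F0%9F%98%80
```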
JavaScript's Encoding Functions
JavaScript provides several functions for URL encoding, and using the wrong one is one of the most common mistakes I see. Let's examine each carefully.
encodeURI()
The encodeURI() function is designed to encode an entire URI. It preserves characters that have special meaning in URLs because it assumes you're encoding a complete, valid URL.
const url = 'https://example.com/path?query=hello world';
console.log(encodeURI(url));
// Output: https://example.com/path?query=hello%20world
Notice that the space was encoded, but the colon, slashes, question mark, and equals sign were preserved. That's intentional because those characters are part of the URL structure.
Characters NOT encoded by encodeURI:
A-Z a-z 0-9 - _ . ~ ! # $ & ' ( ) * + , / : ; = ? @
When to use encodeURI: Use it when you have a complete URL and just need to ensure any spaces or special characters in it are properly encoded. This is relatively rare in practice.
encodeURIComponent()
The encodeURIComponent() function is more aggressive. It leaves the unreserved characters alone and encodes every reserved character except ! ' ( ) *, because it's designed for encoding data that will become part of a URL.
const searchTerm = 'coffee & tea';
console.log(encodeURIComponent(searchTerm));
// Output: coffee%20%26%20tea
The ampersand was encoded because in the context of a query string component, it's data, not a delimiter.
Characters NOT encoded by encodeURIComponent:
A-Z a-z 0-9 - _ . ~ ! ' ( ) *
When to use encodeURIComponent: This is what you'll use most often. Use it whenever you're constructing a URL and need to encode:
- Query parameter values
- Query parameter names (if they might contain special characters)
- Path segments
- Any piece of data that will be embedded in a URL
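If you want to confirm the two "NOT encoded" lists rather than memorize them, the following sketch runs each reserved character through both functions and reports which ones come back unchanged. The variable names are illustrative only.

```javascript
// The reserved characters listed earlier in this guide.
const reservedChars = ":/?#[]@!$&'()*+,;=".split('');

// A character is "kept" if the function returns it unchanged.
const keptByEncodeURI = reservedChars.filter((ch) => encodeURI(ch) === ch);
const keptByEncodeURIComponent = reservedChars.filter(
  (ch) => encodeURIComponent(ch) === ch
);

console.log(keptByEncodeURI.join(' '));
// : / ? # @ ! $ & ' ( ) * + , ; =
console.log(keptByEncodeURIComponent.join(' '));
// ! ' ( ) *
```

Note that the square brackets appear in neither result: both functions encode [ and ].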
The Critical Difference
Let me illustrate why choosing the right function matters:
const userInput = 'Tom & Jerry';

// Building a search URL incorrectly:
const badUrl = 'https://example.com/search?q=' + encodeURI(userInput);
console.log(badUrl);
// Output: https://example.com/search?q=Tom%20&%20Jerry
// Problem! The & wasn't encoded, so we now have two parameters

// Building it correctly:
const goodUrl = 'https://example.com/search?q=' + encodeURIComponent(userInput);
console.log(goodUrl);
// Output: https://example.com/search?q=Tom%20%26%20Jerry
// Correct! The entire input is preserved as one parameter value
In the first case, a server parsing this URL would see two parameters: q with the value "Tom " (note the trailing space) and a second parameter named " Jerry" with an empty value. That's not what we intended at all.
decodeURI() and decodeURIComponent()
For completeness, here are the decoding counterparts:
decodeURI('https://example.com/search?q=hello%20world');
// Output: https://example.com/search?q=hello world

decodeURIComponent('hello%20world');
// Output: hello world
These reverse the encoding process. Use decodeURIComponent() for decoding parameter values and decodeURI() for complete URLs.
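The two decoders differ in the same way the encoders do: decodeURI() leaves escapes for structural characters such as %26 untouched, while decodeURIComponent() decodes them. Both throw a URIError on malformed input, so a small wrapper can be useful when the input is untrusted. The `tryDecodeComponent` name below is a hypothetical helper for this sketch.

```javascript
const encodedValue = 'a%26b%20c';

console.log(decodeURI(encodedValue));          // a%26b c
console.log(decodeURIComponent(encodedValue)); // a&b c

// Hypothetical helper: return null instead of throwing on malformed input.
function tryDecodeComponent(value) {
  try {
    return decodeURIComponent(value);
  } catch (err) {
    if (err instanceof URIError) {
      return null; // e.g. a lone % with no hex digits after it
    }
    throw err; // anything else is unexpected, so rethrow it
  }
}

console.log(tryDecodeComponent('100%'));   // null
console.log(tryDecodeComponent('100%25')); // 100%
```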
What About escape() and unescape()?
You might encounter these older functions in legacy code:
escape('hello world');     // "hello%20world"
unescape('hello%20world'); // "hello world"
Do not use these functions. They are deprecated and don't handle Unicode correctly. They use a non-standard encoding scheme that can cause problems with international characters. Always use encodeURIComponent() and decodeURIComponent() instead.
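For anyone maintaining legacy code, this short comparison shows where the two schemes diverge. It calls escape() only to demonstrate its output; the variable names are illustrative.

```javascript
const eAcute = '\u00E9';    // é
const checkMark = '\u2713'; // ✓

// escape() works on UTF-16 code units, not UTF-8 bytes.
console.log(escape(eAcute));    // %E9
console.log(escape(checkMark)); // %u2713

// encodeURIComponent() produces standard UTF-8 percent encoding.
console.log(encodeURIComponent(eAcute));    // %C3%A9
console.log(encodeURIComponent(checkMark)); // %E2%9C%93
```

Neither %E9 nor %u2713 is valid UTF-8 percent encoding, which is why values produced by escape() tend to break when a standards-based decoder receives them.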
Building URLs Safely
Now that we understand the encoding functions, let's look at patterns for building URLs safely.
Manual Construction
The traditional approach involves string concatenation with encoding:
function buildSearchUrl(baseUrl, params) {
  const queryString = Object.keys(params)
    .map(key => {
      const encodedKey = encodeURIComponent(key);
      const encodedValue = encodeURIComponent(params[key]);
      return `${encodedKey}=${encodedValue}`;
    })
    .join('&');
  return `${baseUrl}?${queryString}`;
}

const url = buildSearchUrl('https://api.example.com/search', {
  q: 'coffee & tea',
  category: 'hot drinks',
  sort: 'price:asc'
});
// Output: https://api.example.com/search?q=coffee%20%26%20tea&category=hot%20drinks&sort=price%3Aasc
Using the URL API
Modern JavaScript provides the URL and URLSearchParams APIs, which handle encoding automatically:
const url = new URL('https://api.example.com/search');
url.searchParams.set('q', 'coffee & tea');
url.searchParams.set('category', 'hot drinks');
url.searchParams.set('sort', 'price:asc');
console.log(url.toString());
// Output: https://api.example.com/search?q=coffee+%26+tea&category=hot+drinks&sort=price%3Aasc
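Notice something interesting? URLSearchParams encodes spaces as + instead of %20. Form-style query parsers accept both (the + convention comes from the HTML form specification), but the difference matters when decoding: decodeURIComponent() leaves + as a literal plus sign, so a value written by URLSearchParams is best read back with URLSearchParams too.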
The URL API is my recommended approach for most cases. It handles encoding automatically and provides a clean interface for manipulating URL parts.
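The following sketch shows the + versus %20 difference on the decoding side. It assumes a single hypothetical parameter named q and slices the raw value out by hand purely for demonstration.

```javascript
const params = new URLSearchParams();
params.set('q', 'coffee & tea');

const formEncoded = params.toString();
console.log(formEncoded); // q=coffee+%26+tea

// URLSearchParams treats + as a space when reading values back.
console.log(new URLSearchParams(formEncoded).get('q')); // coffee & tea

// decodeURIComponent() does not: the plus signs survive as literal +.
const rawValue = formEncoded.slice('q='.length);
console.log(decodeURIComponent(rawValue)); // coffee+&+tea

// If you must decode by hand, convert + to %20 first.
console.log(decodeURIComponent(rawValue.replace(/\+/g, '%20'))); // coffee & tea
```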
Encoding Path Segments
Path segments need encoding too, but be careful not to encode the slashes that separate them:
// Wrong: encoding the whole path
const badPath = encodeURIComponent('/users/John Doe/profile');
// Output: %2Fusers%2FJohn%20Doe%2Fprofile (slashes are encoded!)

// Right: encoding each segment separately
const segments = ['users', 'John Doe', 'profile'];
const goodPath = '/' + segments.map(encodeURIComponent).join('/');
// Output: /users/John%20Doe/profile
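Encoding per segment also pays off when a segment itself contains a slash. Here is a minimal round-trip sketch; `buildPath` and `parsePath` are hypothetical helper names, and the sample segments are made up for illustration.

```javascript
// Hypothetical helper: join raw segments into an encoded path.
function buildPath(segments) {
  return '/' + segments.map((segment) => encodeURIComponent(segment)).join('/');
}

// Hypothetical helper: split an encoded path back into raw segments.
// Splitting happens before decoding, so an encoded slash (%2F) inside
// a segment is never mistaken for a separator.
function parsePath(path) {
  return path
    .split('/')
    .filter((segment) => segment !== '')
    .map((segment) => decodeURIComponent(segment));
}

const albumPath = buildPath(['artists', 'AC/DC', 'Back in Black']);
console.log(albumPath);            // /artists/AC%2FDC/Back%20in%20Black
console.log(parsePath(albumPath)); // [ 'artists', 'AC/DC', 'Back in Black' ]
```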
Common Mistakes and How to Avoid Them
Let me walk you through the mistakes I see most often, along with their solutions.
Mistake 1: Double Encoding
This happens when you encode a value that's already encoded:
const alreadyEncoded = 'hello%20world';
const doubleEncoded = encodeURIComponent(alreadyEncoded);
// Output: hello%2520world
// The % became %25, so %20 became %2520
When you decode this, you get hello%20world instead of hello world.
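Solution: Only encode once, at the point where you construct the URL, and keep track of whether each value you handle is raw or already encoded. If you genuinely cannot tell, a decode-then-encode helper can serve as a last-resort heuristic. Be aware that it will alter raw values that happen to contain something that looks like an escape sequence (a literal %41 in the input becomes A):
function safeEncode(value) {
  // Decode first in case it's already encoded, then encode
  try {
    return encodeURIComponent(decodeURIComponent(value));
  } catch {
    // Decoding threw, so the value contains a % that is not part of
    // a valid escape; treat it as raw text
    return encodeURIComponent(value);
  }
}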
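To make the behavior of this helper concrete, the sketch below repeats the function and runs it on four inputs. The sample strings are illustrative; the last one shows the case where the heuristic silently changes the data.

```javascript
// Same helper as above, repeated so this example runs on its own.
function safeEncode(value) {
  try {
    return encodeURIComponent(decodeURIComponent(value));
  } catch {
    return encodeURIComponent(value);
  }
}

console.log(safeEncode('hello world'));   // hello%20world
console.log(safeEncode('hello%20world')); // hello%20world (not double encoded)
console.log(safeEncode('50% off'));       // 50%25%20off (decode threw, raw text encoded)

// Caveat: raw text that merely looks encoded is decoded too.
console.log(safeEncode('type %41 here')); // type%20A%20here (the literal %41 is lost)
```

Because of that last case, treat the helper as a stopgap rather than a substitute for knowing where encoding happens.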
Mistake 2: Using encodeURI for Query Parameters
We covered this earlier, but it's worth emphasizing:
// Wrong
const badUrl = 'https://api.example.com?name=' + encodeURI('A & B');
// Result: https://api.example.com?name=A%20&%20B (& not encoded!)

// Right
const goodUrl = 'https://api.example.com?name=' + encodeURIComponent('A & B');
// Result: https://api.example.com?name=A%20%26%20B
Mistake 3: Forgetting to Encode User Input
Any data that comes from users must be encoded:
// Dangerous if userName contains special characters
const unsafeUrl = `https://api.example.com/users/${userName}`;

// Safe
const safeUrl = `https://api.example.com/users/${encodeURIComponent(userName)}`;
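This isn't just about preventing errors; it's also a security consideration. Unencoded input containing characters such as /, ?, or # can change which path is requested or add query parameters the code never intended, which opens the door to injection attacks.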
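Here is a small sketch of what that looks like in practice. The user name is a made-up hostile input, and the URL API is used only to show how each resulting URL would be parsed.

```javascript
// Hypothetical hostile input: tries to smuggle in a query parameter.
const hostileName = 'alice?admin=true';

// Without encoding, the ? starts a real query string.
const unsafeUserUrl = new URL(`https://api.example.com/users/${hostileName}`);
console.log(unsafeUserUrl.pathname);                  // /users/alice
console.log(unsafeUserUrl.searchParams.get('admin')); // true

// With encoding, the whole input stays inside one path segment.
const safeUserUrl = new URL(
  `https://api.example.com/users/${encodeURIComponent(hostileName)}`
);
console.log(safeUserUrl.pathname);                  // /users/alice%3Fadmin%3Dtrue
console.log(safeUserUrl.searchParams.has('admin')); // false
```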
Mistake 4: Encoding the Entire URL
Sometimes developers encode an entire URL, including its structure:
// Wrong
const encoded = encodeURIComponent('https://example.com/path?query=value');
// Result: https%3A%2F%2Fexample.com%2Fpath%3Fquery%3Dvalue
// This is no longer a valid URL!

// Right: only encode the parts that need encoding
const url = `https://example.com/path?query=${encodeURIComponent('value with spaces')}`;
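There is one situation where encoding a whole URL is the right call: when that URL is itself data, for example a redirect target passed as a query parameter. In the sketch below, the `next` parameter name and the login URL are assumptions made up for illustration.

```javascript
// The URL we want to send the user back to; here it is data, not structure.
const returnTo = 'https://example.com/path?query=value&page=2';

// Encode it as a single parameter value of the outer URL.
const loginUrl = `https://example.com/login?next=${encodeURIComponent(returnTo)}`;
console.log(loginUrl);
// https://example.com/login?next=https%3A%2F%2Fexample.com%2Fpath%3Fquery%3Dvalue%26page%3D2

// Reading the parameter back yields the original URL intact.
const recovered = new URL(loginUrl).searchParams.get('next');
console.log(recovered === returnTo); // true
```

Without the encoding, the & inside the inner URL would split it, and page=2 would become a parameter of the login URL instead.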
Debugging Encoded URLs
When things go wrong with URL encoding, here's my systematic approach to debugging.
Step 1: Visually Inspect the URL
Look at the URL in your browser's address bar or in your logs. Common patterns to look for:
- %25 indicates double encoding (the % itself was encoded)
- Missing encoding for special characters (&, =, ? in data)
- Garbled text for international characters
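The first pattern in that list is easy to check mechanically. This is a rough heuristic sketch, with `looksDoubleEncoded` as a hypothetical helper name: it flags %25 followed by two hex digits, which is what an encoded escape sequence looks like. It can produce false positives when a user really did type something like %20 as text.

```javascript
// %25 followed by two hex digits suggests an escape that was encoded again.
const DOUBLE_ENCODED_PATTERN = /%25[0-9A-Fa-f]{2}/;

// Hypothetical helper: heuristic check for double encoding in a URL string.
function looksDoubleEncoded(url) {
  return DOUBLE_ENCODED_PATTERN.test(url);
}

console.log(looksDoubleEncoded('https://example.com/search?q=hello%2520world')); // true
console.log(looksDoubleEncoded('https://example.com/search?q=hello%20world'));   // false
console.log(looksDoubleEncoded('https://example.com/search?q=100%25'));          // false
```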
Step 2: Decode and Compare
Use the browser console to decode the URL and see what you actually have:
const suspiciousUrl = 'https://example.com/search?q=hello%2520world';
const rawValue = suspiciousUrl.split('?q=')[1]; // "hello%2520world"
const decoded = decodeURIComponent(rawValue);
console.log(decoded); // "hello%20world" - aha, double encoded!
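When a value has passed through several systems, it can help to count how many times it decodes before it stops changing. The sketch below does that with a hypothetical `countEncodingLayers` helper; the cap of five rounds is an arbitrary safety limit chosen for this example.

```javascript
// Hypothetical helper: decode repeatedly until the value stops changing,
// and report how many rounds that took.
function countEncodingLayers(value, maxLayers = 5) {
  let current = value;
  let layers = 0;

  while (layers < maxLayers) {
    let decodedOnce;
    try {
      decodedOnce = decodeURIComponent(current);
    } catch {
      break; // malformed escape: stop and report what we have
    }
    if (decodedOnce === current) {
      break; // nothing left to decode
    }
    current = decodedOnce;
    layers++;
  }

  return { layers, decoded: current };
}

console.log(countEncodingLayers('hello%2520world')); // { layers: 2, decoded: 'hello world' }
console.log(countEncodingLayers('hello%20world'));   // { layers: 1, decoded: 'hello world' }
console.log(countEncodingLayers('hello'));           // { layers: 0, decoded: 'hello' }
```

A result of two or more layers is a strong hint that encoding is being applied in more than one place.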
Step 3: Trace the Encoding Path
Walk through your code and identify every place where encoding happens. Look for:
- Libraries that might encode automatically
- Server-side encoding that adds to client-side encoding
- Middleware that transforms URLs
Step 4: Test with Problematic Characters
When testing URL handling, always test with characters that commonly cause problems:
const testCases = [
  'hello world',          // space
  'Tom & Jerry',          // ampersand
  'price=100',            // equals sign
  'path/to/file',         // slash
  '50% off',              // percent sign
  'search?q=test',        // question mark
  'user+tag@example.com', // plus sign
  'über',                 // non-ASCII
];

testCases.forEach(input => {
  const encoded = encodeURIComponent(input);
  const decoded = decodeURIComponent(encoded);
  console.log(`${input} -> ${encoded} -> ${decoded}`);
  console.assert(input === decoded, 'Round trip failed!');
});
Quick Reference
Let me leave you with a quick reference for common scenarios:
| Scenario | Function |
|---|---|
| Encoding a query parameter value | encodeURIComponent() |
| Encoding a path segment | encodeURIComponent() |
| Encoding a complete URL with spaces | encodeURI() |
| Building a URL from scratch | Use the URL API |
| Decoding a parameter value | decodeURIComponent() |
And here's a handy encoding reference for common characters:
| Character | Encoded |
|---|---|
| Space | %20 (or + in form-encoded query strings) |
| & | %26 |
| = | %3D |
| ? | %3F |
| / | %2F |
| # | %23 |
| % | %25 |
| + | %2B |
Conclusion
URL encoding might seem like a small detail, but getting it wrong can cause frustrating bugs that are hard to diagnose. The key principles to remember are:
- Reserved characters have special meaning in URLs and must be encoded when used as data
- Use encodeURIComponent() for encoding data that becomes part of URLs
- Use encodeURI() only for encoding complete URLs (rare)
- Encode once, at the point of URL construction
- Test with special characters early and often
With these principles in mind, you'll be well-equipped to handle any URL encoding challenge that comes your way. The next time you see a %20 in your address bar, you'll know exactly what's going on and why it's there.
