Which characters must be encoded in HTML?

Five characters have special meaning in HTML and must always be encoded when used as literal text: the ampersand (&), less-than sign (<), greater-than sign (>), double quote ("), and single quote/apostrophe ('). The ampersand starts entity references, the angle brackets define tags, and the quotes delimit attribute values. Failing to encode any of these can break your HTML structure or create security vulnerabilities.

HTML Entity Encoder & Decoder Online

Convert special characters to HTML entities or decode entity references back to their original characters. Protect your web pages from XSS vulnerabilities and rendering issues. Everything runs locally in your browser with zero data sent to any server.

Common HTML Entity Reference

Character	Named Entity	Numeric Entity	Description
&	&amp;	&#38;	Ampersand
<	&lt;	&#60;	Less-than sign
>	&gt;	&#62;	Greater-than sign
"	&quot;	&#34;	Double quotation mark
'	&apos;	&#39;	Apostrophe / single quote
 	&nbsp;	&#160;	Non-breaking space
©	&copy;	&#169;	Copyright symbol
®	&reg;	&#174;	Registered trademark
™	&trade;	&#8482;	Trademark symbol
—	&mdash;	&#8212;	Em dash
–	&ndash;	&#8211;	En dash
…	&hellip;	&#8230;	Horizontal ellipsis
«	&laquo;	&#171;	Left double angle quote
»	&raquo;	&#187;	Right double angle quote
€	&euro;	&#8364;	Euro sign
£	&pound;	&#163;	Pound sterling sign
¥	&yen;	&#165;	Yen / Yuan sign
¢	&cent;	&#162;	Cent sign
§	&sect;	&#167;	Section sign
°	&deg;	&#176;	Degree symbol
±	&plusmn;	&#177;	Plus-minus sign
×	&times;	&#215;	Multiplication sign
÷	&divide;	&#247;	Division sign
≠	&ne;	&#8800;	Not equal to
≤	&le;	&#8804;	Less-than or equal to
≥	&ge;	&#8805;	Greater-than or equal to
∞	&infin;	&#8734;	Infinity
←	&larr;	&#8592;	Left arrow
→	&rarr;	&#8594;	Right arrow

Why Use This Tool

Instant Client-Side Processing

All HTML entity encoding and decoding happens entirely within your web browser using native JavaScript APIs. There is no server round-trip, no upload delay, and no limitation on input size imposed by a backend service. The tool leverages the browser's built-in DOM parsing capabilities to ensure accurate and standards-compliant conversion of every character. Whether you are encoding a single line of HTML or an entire page of markup, the result appears in milliseconds. Because the computation never leaves your device, performance does not depend on your internet connection speed.

Complete XSS Protection

Cross-site scripting remains one of the most prevalent web security vulnerabilities. This tool helps developers sanitize user-generated content by converting dangerous characters into safe HTML entity references before they reach the browser's HTML parser. The encoding process neutralizes script injection vectors by transforming angle brackets, quotes, and ampersands into their entity equivalents. The browser renders these entities as visible text rather than executable markup, which prevents injected markup from running when the data is placed in HTML text or quoted attribute values. Use this tool to verify your output encoding strategy, test edge cases, and build safer web applications from the ground up.

Flexible Encoding Modes

Choose between two encoding strategies depending on your use case. The default mode encodes only the five special HTML characters (ampersand, less-than, greater-than, double quote, and single quote) that can break HTML structure or create security vulnerabilities. The full encoding mode converts every non-ASCII character into its numeric entity reference, which is useful when you need maximum compatibility with legacy systems, email clients, or environments that do not support UTF-8 encoding. Both modes produce valid HTML that renders correctly across all modern browsers and older systems alike.
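
The two modes can be approximated outside the browser. The following is a minimal sketch in Python, not the tool's own implementation: the function names `encode_default` and `encode_all` are hypothetical, and the standard library's `html.escape` writes the apostrophe as `&#x27;` rather than `&apos;` or `&#39;`.

```python
import html


def encode_default(text: str) -> str:
    """Escape only the five reserved characters: & < > " and '."""
    return html.escape(text, quote=True)


def encode_all(text: str) -> str:
    """Escape the reserved characters, then replace every non-ASCII
    character with a decimal numeric reference such as &#233;."""
    escaped = html.escape(text, quote=True)
    # The "xmlcharrefreplace" error handler emits &#NNN; for anything
    # that cannot be represented in ASCII.
    return escaped.encode("ascii", "xmlcharrefreplace").decode("ascii")


print(encode_default('<b>"hi"</b>'))
print(encode_all("café ©"))
```

The second function produces pure ASCII output, which is the property that matters for legacy systems that mishandle UTF-8.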

Comprehensive Entity Reference

The built-in reference table provides quick access to the most commonly used HTML entities organized by category. Each entry shows the rendered character, its named entity, its numeric entity code, and a description. The table covers essential characters including the five reserved HTML characters, typographic symbols such as em dashes, en dashes, and ellipses, international quotation marks, currency symbols for global commerce, mathematical operators, and directional arrows. Use this reference as a quick lookup when writing HTML by hand or debugging entity-related rendering issues in your web applications.

How HTML Entity Encoder Works

Select your mode. Click the Encode tab to convert special characters into HTML entities, or click the Decode tab to convert entity references back into readable characters. Toggle the Encode all characters option if you need every non-ASCII character converted to its numeric entity reference for maximum compatibility with legacy systems and non-UTF-8 environments.
Enter your content. Type or paste your text, HTML markup, or entity-encoded string into the input textarea. The character count updates in real time as you type. You can paste entire HTML documents, code snippets, or any text that contains special characters needing conversion.
Process and copy. Click the Encode or Decode button to transform your content. The result appears instantly in the output area along with a character count. Click Copy Output or the Copy button inside the output area to send the result to your clipboard for pasting into your HTML source code, template files, database entries, or any other destination.
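
The Decode step described above can be sketched in Python with the standard library. This is an illustration under assumptions, not the tool's browser code, and `decode_entities` is a hypothetical name.

```python
import html


def decode_entities(text: str) -> str:
    """Convert named and numeric entity references back to characters."""
    return html.unescape(text)


encoded = html.escape('<p class="note">Fish & chips</p>')
print(encoded)                   # entity-encoded markup, safe to display
print(decode_entities(encoded))  # the original markup again
```

Encoding followed by decoding returns the original string, which is a handy check when testing either direction.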

Pro Tip

You only need to memorize four essential HTML entities for everyday development: &amp; (ampersand), &lt; (less-than), &gt; (greater-than), and &quot; (double quote). These four cover most everyday encoding needs for preventing XSS and rendering issues.

Common Mistake

Double-encoding entities: writing &amp;amp; instead of &amp;. This happens when you encode content that has already been encoded, resulting in visible entity codes in the rendered page instead of the intended characters. Always check whether your framework auto-encodes output before manually encoding.
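
To see the mistake in isolation, here is a small Python sketch (the variable names are illustrative). Each call to `html.escape` escapes the ampersands produced by the previous call.

```python
import html

raw = "Tom & Jerry"
once = html.escape(raw)    # correct: encoded exactly once
twice = html.escape(once)  # mistake: the & in &amp; is escaped again

print(once)
print(twice)
# A browser decodes one level only, so the double-encoded string
# shows the entity code on the page instead of a plain ampersand.
print(html.unescape(twice))
```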

Real-World Use Cases

Web Developer

Paulo sanitizes user-generated content before rendering it in HTML templates. He encodes forum posts and comments to prevent XSS attacks, ensuring that angle brackets and quotes are displayed as text rather than interpreted as markup.

CMS Administrator

Hannah migrates content between WordPress and a custom CMS. She decodes legacy entity-encoded content back to readable characters, then re-encodes only the necessary reserved characters for the new platform's template engine.

Email Developer

Kofi builds HTML email templates that must render across dozens of email clients. He encodes special characters and symbols as named entities to guarantee consistent display in clients that don't fully support UTF-8 character encoding.

Understanding HTML Entities: A Complete Guide

HTML entities are a fundamental mechanism in web development that allows authors to represent reserved characters, invisible characters, and symbols that cannot be easily typed on a standard keyboard. Every HTML entity begins with an ampersand character and ends with a semicolon. Between these delimiters, the entity is identified either by a human-readable name (a named entity) or by the character's Unicode code point expressed in decimal or hexadecimal notation (a numeric entity). The HTML specification defines more than 2,000 named entities for commonly used characters, while numeric entities can represent any of the more than 143,000 characters in the Unicode standard.

Why Encoding Matters for Web Security

The most critical reason to encode HTML entities is security. Cross-site scripting, commonly abbreviated as XSS, is an attack technique where malicious actors inject executable scripts into web pages viewed by other users. When a web application takes user input and inserts it directly into the page without encoding, an attacker can craft input containing script tags or event handler attributes that the browser will execute as code. HTML entity encoding defeats this attack vector by converting the angle brackets, quotes, and ampersands that form the building blocks of HTML tags into entity references that the browser displays as plain visible text. Every major web security framework and guideline, including the OWASP Top Ten, identifies output encoding as a primary defense against injection attacks. By encoding user-supplied data before rendering it in HTML context, developers ensure that the browser treats the content as data rather than as executable instructions.
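
As an illustration of output encoding, the sketch below escapes user-supplied values before placing them in an HTML fragment. It is a minimal Python example under assumptions: `render_comment` and the markup it produces are hypothetical, and real applications would normally rely on a template engine's auto-escaping.

```python
import html


def render_comment(author: str, body: str) -> str:
    """Build an HTML fragment, escaping every user-supplied value."""
    return (
        f'<div class="comment"><strong>{html.escape(author)}</strong>'
        f"<p>{html.escape(body)}</p></div>"
    )


payload = '<script>alert("xss")</script>'
# The script tag arrives in the page as inert, visible text.
print(render_comment("mallory", payload))
```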

Named Entities vs. Numeric Entities

HTML supports two forms of entity references. Named entities use a predefined alias that describes the character, such as &amp; for the ampersand, &copy; for the copyright symbol, or &mdash; for the em dash. These are easy to read and remember, making source code more maintainable. However, only a subset of Unicode characters have assigned named entities. Numeric entities can represent any Unicode character using its code point. Decimal numeric entities follow the pattern &# followed by the decimal code point and a semicolon, such as &#169; for the copyright symbol. Hexadecimal numeric entities use the prefix &#x followed by the hexadecimal code point, such as &#xA9;. When working with characters that lack a named entity, numeric references are the only option. Both formats are universally supported across all modern web browsers and HTML parsers.
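
The decimal and hexadecimal patterns can be derived directly from a character's code point. A short Python sketch, where `numeric_forms` is a hypothetical helper:

```python
import html


def numeric_forms(char: str) -> tuple[str, str]:
    """Return the decimal and hexadecimal references for one character."""
    code_point = ord(char)
    return f"&#{code_point};", f"&#x{code_point:X};"


decimal_ref, hex_ref = numeric_forms("©")
print(decimal_ref, hex_ref)

# The named, decimal, and hexadecimal forms all decode to the same character.
assert html.unescape("&copy;") == html.unescape(decimal_ref) == html.unescape(hex_ref)
```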

The Role of UTF-8 in Modern Web Development

With the widespread adoption of UTF-8 as the dominant character encoding on the web, the need for HTML entities has evolved but not disappeared. UTF-8 can encode every Unicode character directly, which means characters like accented letters, CJK ideographs, and emoji can appear in HTML source code without entity encoding, provided the document declares its encoding correctly using the <meta charset="UTF-8"> tag. Despite this, the five reserved HTML characters must still be encoded regardless of the document's character set because they have syntactic meaning in HTML. Beyond security, entities remain valuable for inserting characters that are invisible or difficult to distinguish visually in source code, such as non-breaking spaces, zero-width joiners, and soft hyphens. Entities also provide a fallback mechanism for environments that do not properly handle UTF-8, such as certain email clients, legacy content management systems, and older database configurations. Understanding the interplay between character encoding and HTML entities is essential for any developer building robust, internationalized web applications.
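
One way to make invisible characters explicit in source code is to swap them for their named entities. The Python sketch below is illustrative only: the `INVISIBLE` table and `reveal_invisible` are assumptions, and the table covers just the three characters mentioned above.

```python
import html

# Characters that are hard to see in an editor, mapped to named entities.
INVISIBLE = {
    "\u00a0": "&nbsp;",  # non-breaking space
    "\u00ad": "&shy;",   # soft hyphen
    "\u200d": "&zwj;",   # zero-width joiner
}


def reveal_invisible(text: str) -> str:
    """Replace invisible characters with their entity references."""
    return "".join(INVISIBLE.get(ch, ch) for ch in text)


print(reveal_invisible("10\u00a0km"))
# Decoding the result gives back the original invisible character.
assert html.unescape(reveal_invisible("10\u00a0km")) == "10\u00a0km"
```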

FAQ

What are HTML entities?

HTML entities are special sequences of characters used to represent reserved or hard-to-type characters in HTML documents. Every entity starts with an ampersand (&) and ends with a semicolon (;). For example, &lt; represents the less-than sign (<), which would otherwise be interpreted as the beginning of an HTML tag by the browser. HTML entities serve three main purposes: they allow you to safely display characters that have special meaning in HTML markup (like angle brackets and ampersands), they provide a way to insert characters that cannot be typed directly on most keyboards (like copyright symbols, em dashes, and currency signs), and they ensure correct rendering across different character encoding configurations. The HTML specification defines over 2,000 named entities, and numeric entity references can represent any of the 143,000-plus characters in the Unicode standard. Understanding entities is a fundamental skill for web developers, content authors, and anyone who works with HTML source code directly.

Why should I encode HTML entities in my web pages?

Encoding HTML entities is essential for both security and correct rendering of web content. From a security perspective, encoding prevents cross-site scripting (XSS) attacks, which are among the most common and dangerous web vulnerabilities. When user-supplied text is inserted into a web page without encoding, an attacker can inject malicious scripts that steal cookies, session tokens, personal data, or redirect users to phishing sites. By encoding special characters like <, >, &, ", and ', you ensure the browser treats them as displayable text rather than executable markup. From a rendering perspective, unencoded ampersands and angle brackets can cause browsers to misinterpret your content, leading to broken layouts, missing text, or invalid HTML. Every security framework recommends output encoding as a primary defense against injection attacks, and HTML validators require proper encoding for the document to pass validation. Whether you are building a blog, an e-commerce platform, or a web application, encoding HTML entities is a non-negotiable best practice.

What is the difference between named and numeric HTML entities?

Named entities use a human-readable alias that describes the character, such as &amp; for the ampersand, &copy; for the copyright symbol, or &hearts; for the heart symbol. They are easier to read and understand in source code, making maintenance simpler. However, only a subset of Unicode characters have named entities defined in the HTML specification. Numeric entities use the character's Unicode code point and can represent any character in the entire Unicode standard. Decimal numeric entities follow the format &# plus the decimal code point plus a semicolon (for example, &#169; for the copyright symbol). Hexadecimal numeric entities use &#x plus the hex code point (for example, &#xA9;). In practice, named entities are preferred for common characters because they are self-documenting, while numeric entities are used for characters that lack a named entity or when maximum compatibility with older parsers is required. Both forms are supported by all modern web browsers.
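
The "named when available, numeric otherwise" preference can be sketched in Python. This is an assumption-laden example: `to_entity` is a hypothetical helper, and the standard library's `codepoint2name` table covers only the HTML 4 entity names, not the full modern list.

```python
from html.entities import codepoint2name


def to_entity(char: str) -> str:
    """Prefer the named entity; fall back to a decimal numeric reference."""
    code_point = ord(char)
    name = codepoint2name.get(code_point)
    return f"&{name};" if name else f"&#{code_point};"


print(to_entity("©"))  # has a name in the table
print(to_entity("☃"))  # no name in the table, so numeric is used
```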

How does HTML entity encoding prevent XSS attacks?

Cross-site scripting (XSS) attacks rely on injecting HTML or JavaScript code into a web page that other users will view. The attack works because the browser cannot distinguish between legitimate markup written by the developer and malicious markup injected through user input. HTML entity encoding breaks this attack chain by converting the characters that form HTML syntax into entity references that the browser renders as visible text. When a less-than sign becomes &lt;, the browser displays the character "<" on the page instead of interpreting it as the start of a tag. Similarly, encoding quotes prevents attackers from breaking out of HTML attribute values to inject event handlers like onmouseover or onerror. This technique is called output encoding or contextual escaping, and it is recommended by every major web security standard including the OWASP Application Security Verification Standard, the OWASP XSS Prevention Cheat Sheet, and the W3C security guidelines. It should be applied every time user-controlled data is inserted into an HTML context.
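
The attribute breakout described above can be reproduced in a few lines. A minimal Python sketch with illustrative variable names, assuming the value is placed inside a double-quoted attribute:

```python
import html

user_value = '" onmouseover="alert(1)'

# Unescaped: the leading quote closes the attribute and the rest
# becomes a new event-handler attribute.
unsafe = f'<input value="{user_value}">'

# Escaped: the quotes become &quot;, so the whole string stays
# inside the value attribute.
safe = f'<input value="{html.escape(user_value, quote=True)}">'

print(unsafe)
print(safe)
```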

Which characters must always be encoded in HTML?

Five characters have special syntactic meaning in HTML and must always be encoded when they appear as literal text content or attribute values. The ampersand (&) must be encoded as &amp; because it signals the beginning of an entity reference. The less-than sign (<) must be encoded as &lt; because it signals the beginning of an HTML tag. The greater-than sign (>) should be encoded as &gt; to prevent ambiguity in closing contexts. The double quotation mark (") must be encoded as &quot; when used inside attribute values delimited by double quotes. The single quotation mark or apostrophe (') must be encoded as &apos; or &#39; when used inside attribute values delimited by single quotes. Failing to encode any of these characters can result in broken HTML, incorrect page rendering, or exploitable security vulnerabilities depending on the context in which the unencoded character appears.
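
Written by hand, the five replacements look like the Python sketch below. `escape_html` is a hypothetical helper, shown mainly to highlight one detail: the ampersand has to be replaced first, otherwise the ampersands introduced by the other replacements would be escaped a second time.

```python
def escape_html(text: str) -> str:
    """Replace the five reserved characters with entity references."""
    replacements = [
        ("&", "&amp;"),  # must come first, see the note above
        ("<", "&lt;"),
        (">", "&gt;"),
        ('"', "&quot;"),
        ("'", "&#39;"),
    ]
    for char, entity in replacements:
        text = text.replace(char, entity)
    return text


print(escape_html("a < b & c"))
```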

What is the relationship between HTML entities and UTF-8 encoding?

UTF-8 is a character encoding that can represent every character in the Unicode standard directly as a sequence of bytes. When an HTML document declares <meta charset="UTF-8">, the browser knows how to interpret the raw bytes of the file as specific characters, allowing you to include accented letters, CJK characters, emoji, and other symbols directly in your source code without using entities. However, the five reserved HTML characters (ampersand, less-than, greater-than, double quote, and single quote) must still be encoded regardless of the character set because they control HTML syntax. HTML entities remain useful even in a UTF-8 world for several reasons: they provide a way to include invisible characters like non-breaking spaces and zero-width joiners that are impossible to distinguish from regular spaces in a text editor; they serve as a fallback for systems that do not handle UTF-8 correctly, such as certain email templates and legacy content management systems; and they make source code more explicit when dealing with characters that look similar to other characters.

What are the most commonly used HTML entities?

The most frequently used HTML entities fall into several categories. The five essential entities for security and correct HTML are &amp; (ampersand), &lt; (less-than), &gt; (greater-than), &quot; (double quote), and &apos; (apostrophe). Typographic entities are widely used in professional content: &mdash; (em dash), &ndash; (en dash), &hellip; (ellipsis), &laquo; and &raquo; (angle quotation marks), and &nbsp; (non-breaking space). Legal and business symbols include &copy; (copyright), &reg; (registered trademark), and &trade; (trademark). Currency entities such as &euro; (euro), &pound; (pound), &yen; (yen), and &cent; (cent) are common in e-commerce and financial content. Mathematical entities like &times; (multiplication), &divide; (division), &plusmn; (plus-minus), &ne; (not equal), &le; (less-than or equal), &ge; (greater-than or equal), and &infin; (infinity) appear frequently in technical and scientific documentation.

Essential HTML Entities

HTML entities represent characters that are reserved in HTML or difficult to type directly. The table below lists the 20 most commonly needed entities organized by category, along with their named reference, numeric code, and rendered character.

Character	Named Entity	Numeric Code	Description	Category
&	`&amp;`	`&#38;`	Ampersand	Essential
<	`&lt;`	`&#60;`	Less than	Essential
>	`&gt;`	`&#62;`	Greater than	Essential
"	`&quot;`	`&#34;`	Double quote	Essential
'	`&apos;`	`&#39;`	Apostrophe	Essential
 	`&nbsp;`	`&#160;`	Non-breaking space	Spacing
©	`&copy;`	`&#169;`	Copyright	Legal
®	`&reg;`	`&#174;`	Registered	Legal
™	`&trade;`	`&#8482;`	Trademark	Legal
—	`&mdash;`	`&#8212;`	Em dash	Typography
–	`&ndash;`	`&#8211;`	En dash	Typography
…	`&hellip;`	`&#8230;`	Horizontal ellipsis	Typography
€	`&euro;`	`&#8364;`	Euro sign	Currency
£	`&pound;`	`&#163;`	Pound sign	Currency
¥	`&yen;`	`&#165;`	Yen / Yuan sign	Currency
×	`&times;`	`&#215;`	Multiplication sign	Math
÷	`&divide;`	`&#247;`	Division sign	Math
±	`&plusmn;`	`&#177;`	Plus-minus	Math
°	`&deg;`	`&#176;`	Degree symbol	Science
∞	`&infin;`	`&#8734;`	Infinity	Math

When to use entities: Always encode &, <, >, and " inside HTML attributes and text content to prevent parsing errors and cross-site scripting (XSS) vulnerabilities. For other characters, entities are optional if your document uses UTF-8 encoding, but they remain useful for characters that are difficult to type or visually ambiguous in source code.
