unicode-to-utf8
Converting Unicode to UTF-8 transforms code points (for example U+1F600, 😀) into the byte sequence that UTF-8 uses to represent them (F0 9F 98 80). UTF-8 is variable-width: ASCII characters (U+0000 to U+007F) take 1 byte, and all other characters take 2 to 4 bytes. This is useful for low-level string handling, debugging encoding bugs, and understanding how text becomes bytes. The ZTools Unicode to UTF-8 converter accepts code-point notation (U+0041, 0x41, 65), literal characters, or escape sequences (\u0041, \U0001F600), and outputs the byte sequence in hex, binary, or decimal.
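To illustrate the accepted input notations, here is a minimal Python sketch of how such inputs could be normalized to code points. The helper name `parse_code_points` and its rules (for instance, treating a digits-only input as a decimal code point rather than as literal characters) are assumptions made for illustration, not the converter's actual implementation.

```python
import re

MAX_CODE_POINT = 0x10FFFF


def parse_code_points(text: str) -> list[int]:
    """Hypothetical helper: turn one input string into a list of code points."""
    s = text.strip()

    # Hex notations: U+0041, 0x41, \u0041, \U0001F600
    hex_patterns = [
        r"[Uu]\+([0-9A-Fa-f]{1,6})",
        r"0[xX]([0-9A-Fa-f]{1,6})",
        r"\\u([0-9A-Fa-f]{4})",
        r"\\U([0-9A-Fa-f]{8})",
    ]
    for pattern in hex_patterns:
        match = re.fullmatch(pattern, s)
        if match:
            return [_validated(int(match.group(1), 16))]

    # Assumption: a digits-only input is a decimal code point, e.g. 65 or 128512.
    if re.fullmatch(r"[0-9]+", s):
        return [_validated(int(s))]

    # Anything else is treated as literal characters, one code point each.
    return [ord(ch) for ch in text]


def _validated(cp: int) -> int:
    """Reject values outside the Unicode code space."""
    if cp > MAX_CODE_POINT:
        raise ValueError(f"code point out of range: {cp:#x}")
    return cp
```

With this sketch, `parse_code_points("U+0041")`, `parse_code_points("0x41")`, and `parse_code_points("65")` all yield the same single code point, which can then be encoded to UTF-8.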
Use cases
Debug a string encoding issue
Your code expects a 4-byte UTF-8 sequence for an emoji but receives 2 bytes. The converter shows the canonical UTF-8 bytes to compare against.
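The same comparison can be scripted. The sketch below uses Python's built-in `str.encode` as the reference encoder; the function name `check_utf8` is hypothetical and only illustrates the idea.

```python
def check_utf8(char: str, received: bytes) -> str:
    """Compare received bytes with the canonical UTF-8 encoding of `char`."""
    expected = char.encode("utf-8")
    if received == expected:
        return "ok"
    return (
        f"mismatch: expected {expected.hex(' ')} ({len(expected)} bytes), "
        f"got {received.hex(' ')} ({len(received)} bytes)"
    )


# A truncated emoji: only the first two of the four expected bytes arrived.
report = check_utf8("\U0001F600", b"\xf0\x9f")
print(report)
```

A length mismatch like this often points to truncation or to a buffer that was sized in characters rather than bytes.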
Build URL-percent encoding
😀 (U+1F600) → F0 9F 98 80 → %F0%9F%98%80 in URL form.
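As a sketch of the same chain in Python, the manual version below escapes every UTF-8 byte, while the standard library's `urllib.parse.quote` produces the same result for this character.

```python
from urllib.parse import quote, unquote

char = "\U0001F600"  # 😀

# Manual: encode to UTF-8 first, then write each byte as %XX.
manual = "".join(f"%{byte:02X}" for byte in char.encode("utf-8"))

# Standard library: quote() uses UTF-8 by default.
library = quote(char, safe="")

print(manual)   # %F0%9F%98%80
print(library)  # %F0%9F%98%80

# Round trip back to the original character.
assert unquote(library) == char
```

Note that `quote` leaves unreserved ASCII characters such as letters and digits unescaped, whereas the manual loop escapes every byte.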
Embed Unicode in a code template
You need the exact byte sequence to write to a binary file. The converter gives it as hex.
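One possible way to use the hex output in Python is to parse it with `bytes.fromhex` and write it in binary mode. The file name and the temporary directory here are placeholders for illustration.

```python
import tempfile
from pathlib import Path

# Hex as shown by the converter for U+1F600.
utf8_bytes = bytes.fromhex("F0 9F 98 80")

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "emoji.bin"  # placeholder file name
    path.write_bytes(utf8_bytes)    # binary write, no text re-encoding
    read_back = path.read_bytes()

print(read_back.hex(" "))  # f0 9f 98 80
```

Writing through `write_bytes` (or a file opened with mode `"wb"`) avoids any implicit text encoding step.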
Teach UTF-8 encoding
Show how each code point maps to bytes — students see the variable-width pattern (1-4 bytes).
How it works
- Paste a code point or character: U+1F600, 0x1F600, decimal 128512, the character 😀, or an escape such as \u00e9.
- Encode: the UTF-8 byte sequence follows from the code point's range: 1 byte for U+0000-U+007F, 2 bytes for U+0080-U+07FF, 3 bytes for U+0800-U+FFFF, and 4 bytes for U+10000-U+10FFFF.
- Display: the bytes are shown in hex, binary, and decimal, along with the URL percent-encoded form.
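The encode step can be sketched by hand to show where the bytes come from. This is a minimal illustration of the standard UTF-8 bit layout, not the converter's own code; the function name `encode_utf8` is hypothetical.

```python
def encode_utf8(cp: int) -> bytes:
    """Encode one code point using the standard UTF-8 bit layout."""
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        # Surrogates and out-of-range values have no valid UTF-8 form.
        raise ValueError(f"not an encodable code point: {cp:#x}")
    if cp <= 0x7F:        # 0xxxxxxx
        return bytes([cp])
    if cp <= 0x7FF:       # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp <= 0xFFFF:      # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([
            0xE0 | (cp >> 12),
            0x80 | ((cp >> 6) & 0x3F),
            0x80 | (cp & 0x3F),
        ])
    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([
        0xF0 | (cp >> 18),
        0x80 | ((cp >> 12) & 0x3F),
        0x80 | ((cp >> 6) & 0x3F),
        0x80 | (cp & 0x3F),
    ])


print(encode_utf8(0x1F600).hex(" "))  # f0 9f 98 80
```

The leading byte announces the sequence length through its high bits, and every continuation byte starts with `10`, which is why a decoder can resynchronize in the middle of a stream.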
Examples
Input: U+0041 (A)
Output: 1 byte: 0x41. ASCII range. URL %41.
Input: U+00E9 (é)
Output: 2 bytes: 0xC3 0xA9. URL %C3%A9.
Input: U+20AC (€)
Output: 3 bytes: 0xE2 0x82 0xAC. URL %E2%82%AC.
Input: U+1F600 (😀)
Output: 4 bytes: 0xF0 0x9F 0x98 0x80. URL %F0%9F%98%80.
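These examples can be reproduced with a few lines of Python. The helper `describe` is a hypothetical name, and the output formatting is only an approximation of the hex, binary, and decimal views described above.

```python
def describe(char: str) -> dict[str, str]:
    """Return hex, binary, decimal, and percent-encoded views of the UTF-8 bytes."""
    data = char.encode("utf-8")
    return {
        "hex": " ".join(f"0x{b:02X}" for b in data),
        "binary": " ".join(f"{b:08b}" for b in data),
        "decimal": " ".join(str(b) for b in data),
        "url": "".join(f"%{b:02X}" for b in data),
    }


for ch in ("A", "\u00e9", "\u20ac", "\U0001F600"):
    views = describe(ch)
    print(f"U+{ord(ch):04X}: {views['hex']} | {views['url']}")
```

For U+20AC this yields `0xE2 0x82 0xAC` in hex and `226 130 172` in decimal.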
Frequently asked questions
Why is é 2 bytes but A is 1?
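UTF-8 is variable-width. Code points U+0000 to U+007F fit in 1 byte, which keeps UTF-8 backward compatible with ASCII. Everything above that takes 2 to 4 bytes per code point. This saves space for English text but costs more for other scripts: accented Latin, Greek, and Cyrillic letters take 2 bytes, and most CJK characters take 3.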
How is this different from URL encoding?
URL encoding (percent-encoding) writes each UTF-8 byte as %XX, where XX is the byte value in hex. So the UTF-8 bytes 0xC3 0xA9 become %C3%A9 in a URL. The bytes are the same; only the presentation differs.
Does it support combining characters?
Yes — combining accents (e.g. é as e + combining acute U+0301) encode as separate code points, each with its own UTF-8 sequence.
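A short Python sketch makes the difference visible. It compares the precomposed é (U+00E9) with the decomposed form (e followed by U+0301) and uses the standard `unicodedata` module to normalize between them.

```python
import unicodedata

composed = "\u00e9"      # é as a single code point
decomposed = "e\u0301"   # e + combining acute accent

composed_bytes = composed.encode("utf-8")
decomposed_bytes = decomposed.encode("utf-8")

print(composed_bytes.hex(" "))    # c3 a9
print(decomposed_bytes.hex(" "))  # 65 cc 81

# The two strings render alike but differ until normalized.
same_after_nfc = unicodedata.normalize("NFC", decomposed) == composed
print(same_after_nfc)  # True
```

If two visually identical strings produce different bytes, comparing their normalized forms is a reasonable first check.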
Privacy?
All conversion happens in your browser.
Tips
- For URL encoding, convert to UTF-8 bytes first, then percent-escape. Don't encode code points directly.
- For binary files, write the byte sequence directly. Don't round-trip through string types that might re-encode.
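- For most emoji, expect 4 bytes, because they sit in the supplementary planes (above U+FFFF). A few older symbols such as ❤ (U+2764) are in the Basic Multilingual Plane and take 3 bytes, and emoji built from several code points take more.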
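As a quick check of emoji byte counts, the sketch below measures three samples in Python. The sample selection is illustrative: one supplementary-plane emoji, one symbol from the Basic Multilingual Plane, and one sequence joined with zero-width joiners (U+200D).

```python
samples = {
    "grinning face": "\U0001F600",
    "heavy black heart": "\u2764",
    # man + ZWJ + woman + ZWJ + girl, rendered as one family emoji
    "family (ZWJ sequence)": "\U0001F468\u200D\U0001F469\u200D\U0001F467",
}

byte_counts = {name: len(text.encode("utf-8")) for name, text in samples.items()}

for name, count in byte_counts.items():
    print(f"{name}: {count} bytes")
```

The single supplementary-plane emoji takes 4 bytes, the BMP symbol takes 3, and the joined sequence takes 18 (three 4-byte emoji plus two 3-byte joiners).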
Try it now
The full Unicode to UTF-8 converter runs in your browser at https://ztools.zaions.com/unicode-to-utf8. There is no signup and no upload, and no data leaves your device.
Last updated: 2026-05-06 · Author: Ahsan Mahmood