utf8-to-hex
A UTF-8 to hex converter encodes a text string as the sequence of hexadecimal byte values its UTF-8 representation produces β useful for inspecting non-ASCII characters byte-by-byte, debugging encoding mismatches, and embedding raw bytes in places that accept hex literals (Wireshark filter rules, hex-edited binary files, low-level network packets). The ZTools UTF-8 to Hex converter handles full Unicode (BMP plus supplementary planes β emoji, CJK extensions), supports multiple separator styles (space, comma, none, 0x prefix), and is paired with an inverse Hex-to-UTF-8 converter for round-trip verification.
Use casesβ
Debugging encoding bugsβ
Mystery character renders as "Γ’β¬β’". Convert the original UTF-8 to hex; compare to what the system reads. Mismatches reveal the wrong-decode path.
Wireshark / network analysisβ
Filter rules need byte-level patterns (e.g. tcp contains 48:65:6c:6c:6f). Convert "Hello" to hex once.
Embedding bytes in codeβ
A test fixture or migration script embeds a known string as \\x48\\x65\\x6c\\x6c\\x6f for byte-precision. Convert text once.
Educational demosβ
Demonstrate that "β¬" is 3 UTF-8 bytes (E2 82 AC). Concrete byte view makes the abstract concrete.
How it worksβ
- Paste text β Any Unicode β Latin, CJK, emoji, etc.
- Encode UTF-8 β Each codepoint encodes to 1β4 bytes per UTF-8 rules.
- Format as hex β Each byte β two-digit hex. Pick separator: space (
48 65 6c), comma (48,65,6c), 0x (0x48 0x65 0x6c), or no separator. - Copy β Output to clipboard. Round-trip via Hex-to-UTF-8 to verify.
- Inspect codepoints β Optional view shows codepoint, character, byte sequence side-by-side.
Examplesβ
Input: Hello (UTF-8 β hex, space-separated)
Output: 48 65 6c 6c 6f
Input: β¬ (UTF-8)
Output: e2 82 ac
Input: π (UTF-8, 4 bytes)
Output: f0 9f 9a 80
Frequently asked questionsβ
Why is "Hello" 5 bytes but "hΓ©llo" is 6?
ASCII characters fit in 1 byte each. "Γ©" (U+00E9) requires 2 bytes in UTF-8 (c3 a9). So "hΓ©llo" is 6 bytes.
How big can emoji be?
4 bytes in UTF-8 (codepoints in the supplementary planes). Newer emoji like π (U+1F680) take 4 bytes.
Can I convert directly to UTF-16 hex?
Yes β switch the encoding selector to UTF-16 BE / LE. Output is hex of those bytes.
What if my text has a BOM?
A leading BOM (EF BB BF for UTF-8) appears in the hex output. Toggle "strip BOM" to omit it.
Are uppercase or lowercase hex digits used?
Selectable. Lowercase is the modern convention; some legacy systems prefer uppercase.
Does it handle invalid UTF-8?
Input is assumed valid Unicode text from your browser; output is always valid UTF-8 hex. For decoding broken byte streams, use Hex-to-UTF-8 in lenient mode.
Tipsβ
- For Wireshark filters, no-separator hex (
48656c6c6f) is the right format. - For embedded code literals,
\\xNNstyle is most common β switch the format to that. - Round-trip via Hex-to-UTF-8 always β round-trip mismatches reveal off-by-one or BOM issues.
- For mojibake debugging, also dump the bytes the system thinks it has β comparing the two reveals the wrong-decode step.
- Use the codepoint table for in-depth investigation β sometimes a "weird character" is a confusable Unicode lookalike.
Try it nowβ
The full utf8-to-hex runs in your browser at https://ztools.zaions.com/utf8-to-hex β no signup, no upload, no data leaves your device.
Last updated: 2026-05-05 Β· Author: Ahsan Mahmood Β· Edit this page on GitHub