Skip to main content

utf8-to-hex

A UTF-8 to hex converter encodes a text string as the sequence of hexadecimal byte values its UTF-8 representation produces β€” useful for inspecting non-ASCII characters byte-by-byte, debugging encoding mismatches, and embedding raw bytes in places that accept hex literals (Wireshark filter rules, hex-edited binary files, low-level network packets). The ZTools UTF-8 to Hex converter handles full Unicode (BMP plus supplementary planes β€” emoji, CJK extensions), supports multiple separator styles (space, comma, none, 0x prefix), and is paired with an inverse Hex-to-UTF-8 converter for round-trip verification.

Use cases​

Debugging encoding bugs​

Mystery character renders as "Ò€ℒ". Convert the original UTF-8 to hex; compare to what the system reads. Mismatches reveal the wrong-decode path.

Wireshark / network analysis​

Filter rules need byte-level patterns (e.g. tcp contains 48:65:6c:6c:6f). Convert "Hello" to hex once.

Embedding bytes in code​

A test fixture or migration script embeds a known string as \\x48\\x65\\x6c\\x6c\\x6f for byte-precision. Convert text once.

Educational demos​

Demonstrate that "€" is 3 UTF-8 bytes (E2 82 AC). Concrete byte view makes the abstract concrete.

How it works​

  1. Paste text β€” Any Unicode β€” Latin, CJK, emoji, etc.
  2. Encode UTF-8 β€” Each codepoint encodes to 1–4 bytes per UTF-8 rules.
  3. Format as hex β€” Each byte β†’ two-digit hex. Pick separator: space (48 65 6c), comma (48,65,6c), 0x (0x48 0x65 0x6c), or no separator.
  4. Copy β€” Output to clipboard. Round-trip via Hex-to-UTF-8 to verify.
  5. Inspect codepoints β€” Optional view shows codepoint, character, byte sequence side-by-side.

Examples​

Input: Hello (UTF-8 β†’ hex, space-separated)

Output: 48 65 6c 6c 6f


Input: € (UTF-8)

Output: e2 82 ac


Input: πŸš€ (UTF-8, 4 bytes)

Output: f0 9f 9a 80

Frequently asked questions​

Why is "Hello" 5 bytes but "hΓ©llo" is 6?

ASCII characters fit in 1 byte each. "Γ©" (U+00E9) requires 2 bytes in UTF-8 (c3 a9). So "hΓ©llo" is 6 bytes.

How big can emoji be?

4 bytes in UTF-8 (codepoints in the supplementary planes). Newer emoji like πŸš€ (U+1F680) take 4 bytes.

Can I convert directly to UTF-16 hex?

Yes β€” switch the encoding selector to UTF-16 BE / LE. Output is hex of those bytes.

What if my text has a BOM?

A leading BOM (EF BB BF for UTF-8) appears in the hex output. Toggle "strip BOM" to omit it.

Are uppercase or lowercase hex digits used?

Selectable. Lowercase is the modern convention; some legacy systems prefer uppercase.

Does it handle invalid UTF-8?

Input is assumed valid Unicode text from your browser; output is always valid UTF-8 hex. For decoding broken byte streams, use Hex-to-UTF-8 in lenient mode.

Tips​

  • For Wireshark filters, no-separator hex (48656c6c6f) is the right format.
  • For embedded code literals, \\xNN style is most common β€” switch the format to that.
  • Round-trip via Hex-to-UTF-8 always β€” round-trip mismatches reveal off-by-one or BOM issues.
  • For mojibake debugging, also dump the bytes the system thinks it has β€” comparing the two reveals the wrong-decode step.
  • Use the codepoint table for in-depth investigation β€” sometimes a "weird character" is a confusable Unicode lookalike.

Try it now​

The full utf8-to-hex runs in your browser at https://ztools.zaions.com/utf8-to-hex β€” no signup, no upload, no data leaves your device.

Open the tool β†—


Last updated: 2026-05-05 Β· Author: Ahsan Mahmood Β· Edit this page on GitHub