Unicode Inspector: complete usage guide
Inspect Unicode text locally by listing code points, UTF-16 units, UTF-8 bytes, invisible characters, combining marks, bidirectional controls, and non-ASCII characters.
What this tool does
It walks text by Unicode code point instead of raw UTF-16 code units, so emoji and supplementary-plane characters keep correct indexes.
It displays code point, UTF-16 units, UTF-8 bytes, category, name hint, and flags for each character.
It highlights invisible characters, combining marks, bidirectional controls, right-to-left scripts, and non-ASCII text.
It exports the character table as JSON for local debugging notes and reproducible bug reports.
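The per-character table described above can be approximated in a few lines. This is a minimal sketch, not the tool's implementation: the function name `inspect_text`, the JSON field names, and the two flag sets are assumptions made for illustration, and the tool's real lists of invisible and bidirectional characters may differ.

```python
import json
import unicodedata

# Hypothetical flag sets; the tool's actual lists are not documented here.
INVISIBLE = {0x00AD, 0x200B, 0x200C, 0x200D, 0x2060, 0xFEFF}
BIDI_CONTROLS = {0x061C, 0x200E, 0x200F,
                 *range(0x202A, 0x202F), *range(0x2066, 0x206A)}


def inspect_text(text):
    """Build one row per code point (Python str iterates by code point)."""
    rows = []
    for index, ch in enumerate(text):
        cp = ord(ch)
        category = unicodedata.category(ch)
        # "surrogatepass" lets a stray lone surrogate be encoded instead of raising.
        utf16 = ch.encode("utf-16-be", "surrogatepass")
        utf8 = ch.encode("utf-8", "surrogatepass")
        flags = []
        if cp in INVISIBLE:
            flags.append("invisible")
        if category.startswith("M"):  # Mn, Mc, Me are the mark categories
            flags.append("combining")
        if cp in BIDI_CONTROLS:
            flags.append("bidi-control")
        if cp > 0x7F:
            flags.append("non-ascii")
        rows.append({
            "index": index,
            "codePoint": f"U+{cp:04X}",
            "utf16": [f"{int.from_bytes(utf16[i:i + 2], 'big'):04X}"
                      for i in range(0, len(utf16), 2)],
            "utf8": [f"{b:02X}" for b in utf8],
            "category": category,
            "name": unicodedata.name(ch, "<unnamed>"),
            "flags": flags,
        })
    return rows


# JSON export of the table, similar in spirit to the tool's export.
print(json.dumps(inspect_text("admin\u200b@example.com")[5], indent=2))
```

The printed row is the one for U+200B, which is where the invisible flag and the UTF-8 bytes E2 80 8B appear.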
Typical use cases
- Debug strings that look identical but compare differently.
- Find zero-width or non-breaking characters in tokens, URLs, code, or config files.
- Inspect emoji and supplementary-plane characters before byte-level processing.
- Review right-to-left or bidirectional control characters in copied text.
- Document exact Unicode code points in localization and parser bugs.
Input examples
Zero-width sample
admin\u200B@example.com
Combining mark sample
A\u0301
Output examples
Code point row
U+200B Zero Width Space, invisible flag, UTF-8 E2 80 8B.
Emoji index
A supplementary-plane character such as U+1F600 is counted as one code point and two UTF-16 units (D83D DE00).
Stats
Code points, UTF-16 units, UTF-8 bytes, lines, controls, invisible characters, and non-ASCII count.
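Most of these totals are easy to cross-check outside the tool. The sketch below is an approximation under stated assumptions: `text_stats` and its key names are hypothetical, lines are counted by `\n` only, and the count of format characters (category Cf) is a rough stand-in for the tool's invisible-character count, which may use a different list.

```python
import unicodedata


def text_stats(text):
    """Summary counts for a string; key names are illustrative only."""
    return {
        "codePoints": len(text),
        # Two bytes per UTF-16 unit, so halve the encoded length.
        "utf16Units": len(text.encode("utf-16-le", "surrogatepass")) // 2,
        "utf8Bytes": len(text.encode("utf-8", "surrogatepass")),
        # Assumption: an empty string has zero lines, otherwise newlines + 1.
        "lines": (text.count("\n") + 1) if text else 0,
        "controls": sum(unicodedata.category(c) == "Cc" for c in text),
        "formatChars": sum(unicodedata.category(c) == "Cf" for c in text),
        "nonAscii": sum(ord(c) > 0x7F for c in text),
    }


print(text_stats("hi\n\U0001F600\u200b"))
```

If the code point count and the UTF-16 unit count differ, the text contains at least one supplementary-plane character.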
Common errors and fixes
Counting UTF-16 units as characters
Use code point indexes when debugging emoji and supplementary-plane text.
Missing zero-width differences in diffs
Check invisible flags and code point rows before assuming text is identical.
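When two strings render the same but compare differently, locating the first differing code point is usually enough to find the hidden character. A minimal sketch, assuming a hypothetical helper named `first_difference`:

```python
from itertools import zip_longest


def _label(ch):
    """Format a character as U+XXXX, or mark the end of the shorter string."""
    return "<end>" if ch is None else f"U+{ord(ch):04X}"


def first_difference(a, b):
    """Return (code point index, label in a, label in b), or None if equal."""
    for index, (x, y) in enumerate(zip_longest(a, b)):
        if x != y:
            return index, _label(x), _label(y)
    return None


clean = "admin@example.com"
suspect = "admin\u200b@example.com"
print(first_difference(clean, suspect))
```

Here the mismatch is at index 5, where the clean string has U+0040 and the suspect string has U+200B.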
Confusing composed and decomposed accents
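A decomposed accent shows up as a base letter row followed by a combining mark row such as U+0301, while the precomposed form is a single row such as U+00C1.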
Ignoring bidirectional controls
Review bidi flags in copied source, logs, and UI strings.
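A quick scripted check for bidirectional controls can complement the visual review. The sketch below relies on the bidirectional class reported by Python's `unicodedata` module; the function name `bidi_report` and the two labels are assumptions, and the tool's own flagging rules may be broader or narrower.

```python
import unicodedata

# Bidi classes of the explicit embedding, override, and isolate controls.
EXPLICIT_CLASSES = {"LRE", "RLE", "LRO", "RLO", "PDF", "LRI", "RLI", "FSI", "PDI"}
# Directional marks report an ordinary class (L, R, AL), so list them explicitly.
DIRECTIONAL_MARKS = {0x061C, 0x200E, 0x200F}


def bidi_report(text):
    """Return (index, code point, kind) for bidi controls and RTL characters."""
    findings = []
    for index, ch in enumerate(text):
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class in EXPLICIT_CLASSES or ord(ch) in DIRECTIONAL_MARKS:
            kind = "bidi-control"
        elif bidi_class in {"R", "AL"}:
            kind = "rtl"
        else:
            continue
        findings.append((index, f"U+{ord(ch):04X}", kind))
    return findings


print(bidi_report("ab\u202ecd\u05d0"))
```

Any `bidi-control` finding in source code or log text deserves a closer look, because these characters can make the displayed order differ from the stored order.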
Sharing sensitive strings in screenshots
Redact tokens, emails, and secrets before exporting Unicode evidence.
Security and privacy notes
For the shared privacy terminology, local processing model, external-request labels, and DevTools verification workflow, see the Trust Center.
- Unicode inspection runs locally and does not upload text.
- Text may contain secrets even when the issue is only a hidden character.
- Exported JSON should be redacted before public issue reports.
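One possible way to redact a sample while keeping the Unicode evidence intact is to mask only ASCII letters and digits before inspecting or exporting. This is a sketch of that idea, not a feature of the tool; `redact_ascii_alnum` is a hypothetical helper, and the masked result still reveals length, punctuation, and any non-ASCII letters, so review it before sharing.

```python
def redact_ascii_alnum(text, mask="x"):
    """Mask ASCII letters and digits; leave every other code point untouched."""
    return "".join(
        mask if (ch.isascii() and ch.isalnum()) else ch
        for ch in text
    )


original = "admin\u200b@example.com"
redacted = redact_ascii_alnum(original)

# The hidden character survives at the same code point index.
assert redacted.index("\u200b") == original.index("\u200b")
print(redacted.encode("unicode_escape").decode("ascii"))
```

Because the mask is applied one code point at a time, indexes in the redacted sample line up with indexes in the original.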
Step-by-step workflow
- Feed Unicode Inspector the smallest reproducible sample you can collect from the real issue.
- Review the first findings and separate confirmed signals from assumptions or environment-specific noise.
- Compare a clean baseline sample against the problematic input when you need to isolate regressions.
- Keep one redacted output snapshot with the key findings for tickets, runbooks, or incident handoff.
Quality checklist before sharing output
- Confirm Unicode Inspector findings still reproduce with the same input and assumptions.
- Check that the sample includes enough surrounding context to support the conclusion you are drawing.
- Translate notable findings into concrete next checks, ownership, or remediation notes.
- Redact private hosts, tokens, certificates, or customer identifiers before sharing analysis output.
Operational notes
Unicode Inspector is most effective when it produces a focused, reproducible evidence bundle that can be handed to the next engineer without extra cleanup.
Frequently asked questions
Does it count emoji correctly?
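Yes. The scanner iterates by code point, so a supplementary-plane emoji appears as one row instead of two surrogate halves. An emoji built from several code points, such as a ZWJ sequence or a base emoji with a skin-tone modifier, is still listed as one row per code point.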
Can it find zero-width characters?
Yes. Common zero-width and invisible characters are flagged explicitly.
Does it normalize text?
No. It inspects the original text so you can see exact code points.
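Seeing the raw code points matters because normalization can make two different inputs equal. The sketch below uses Python's standard `unicodedata.normalize`; the helper names `code_points` and `equal_after_nfc` are hypothetical.

```python
import unicodedata


def code_points(text):
    """List the code points of a string in U+XXXX form, without normalizing."""
    return [f"U+{ord(ch):04X}" for ch in text]


def equal_after_nfc(a, b):
    """True when two strings match once both are normalized to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


composed = "\u00c1"      # precomposed capital A with acute
decomposed = "A\u0301"   # capital A followed by a combining acute accent

print(code_points(composed), code_points(decomposed))
print(composed == decomposed, equal_after_nfc(composed, decomposed))
```

The two strings are unequal as stored but equal after NFC, which is the situation a non-normalizing inspector makes visible.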
What is the difference between UTF-16 units and code points?
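A code point is a single Unicode value, while a UTF-16 unit is one 16-bit piece of its encoding. Characters up to U+FFFF take one UTF-16 unit. Characters above U+FFFF are encoded as a surrogate pair, so they use two UTF-16 units but still represent one code point.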
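Offsets reported by environments that index strings in UTF-16 units, such as JavaScript, need converting before they can be matched against code point indexes. A minimal sketch, assuming a hypothetical helper named `utf16_to_code_point_index`:

```python
def utf16_to_code_point_index(text, utf16_index):
    """Map a UTF-16 unit offset to the index of the code point containing it."""
    units_seen = 0
    for cp_index, ch in enumerate(text):
        width = 2 if ord(ch) > 0xFFFF else 1  # surrogate pair or single unit
        if utf16_index < units_seen + width:
            return cp_index
        units_seen += width
    return len(text)  # offset at or past the end of the text


sample = "a\U0001F600b"
# "b" sits at UTF-16 offset 3 but at code point index 2.
print(utf16_to_code_point_index(sample, 3))
```

An offset that lands on either half of a surrogate pair maps to the same code point index, so the result never points into the middle of a character.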
Can it help with localization bugs?
Yes. Exact code points make it easier to reproduce rendering, sorting, and parser issues.