How to Find and Remove Hidden Unicode Characters in Your Text

If a line of text looks perfectly normal but refuses to behave — a password is rejected, a search returns nothing, a CSV column splits in the wrong place, or a compiler throws a mysterious error — the cause is very often a hidden Unicode character you cannot see. This guide explains what these characters are, how to find them, and how to remove them safely. You can do everything described here with the Invisible Character Detector on the home page, which runs entirely in your browser.

The short answer

Paste the suspect text into the detector, look at the highlighted view to see exactly where the hidden characters sit, tick the clean-up options you want, and copy the cleaned result. That is the whole workflow. The rest of this article explains why the problem happens so often and how to avoid re-introducing it.

Why invisible characters end up in your text

Unicode defines more than a hundred code points that are either completely invisible or that look almost identical to a character you already know. They are legitimate (each was added for a real typographic, linguistic or formatting reason), but they travel silently when you copy and paste. The usual sources are word processors, which insert non-breaking spaces and curly "smart" quotes automatically; PDFs, which scatter zero-width characters and unusual spaces as a side effect of layout; web pages, where a designer used a non-breaking space to stop a phrase from wrapping; chat and email apps, which add directional marks for languages such as Arabic and Hebrew; and programs that prepend a byte-order mark (U+FEFF) to the start of a file. Most of these characters show nothing at all on screen, and the rest, such as curly quotes, are easy to mistake for their plain counterparts, which is exactly why they are so hard to track down by eye.
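As a rough illustration of how such characters can be spotted programmatically, here is a minimal Python sketch built on the standard unicodedata module. The function name find_hidden and the choice of what to flag (format characters, control characters other than tab and line breaks, and any space other than U+0020) are assumptions made for this example; they are not the detector's actual rules and the list is not exhaustive.

```python
import unicodedata

def find_hidden(text):
    """Return (index, 'U+XXXX', name) for each character that is easy to miss.

    The categories flagged here are an assumption for illustration only.
    """
    hits = []
    for i, ch in enumerate(text):
        cat = unicodedata.category(ch)
        is_format = cat == "Cf"                         # zero-width chars, bidi marks, BOM, tags
        is_control = cat == "Cc" and ch not in "\t\n\r"  # control codes, minus everyday whitespace
        is_odd_space = cat == "Zs" and ch != " "        # no-break space, thin space, and so on
        if is_format or is_control or is_odd_space:
            name = unicodedata.name(ch, "<unnamed>")
            hits.append((i, f"U+{ord(ch):04X}", name))
    return hits

sample = "price\u00a0list\u200b"
for hit in find_hidden(sample):
    print(hit)
# (5, 'U+00A0', 'NO-BREAK SPACE')
# (10, 'U+200B', 'ZERO WIDTH SPACE')
```

Curly quotes and similar look-alikes are ordinary punctuation as far as Unicode categories go, so a check like this one would not flag them; they need an explicit list.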

Why a character you cannot see still breaks things

Software does not read text the way a person does. A search box, a programming language, a database and a spreadsheet generally compare text byte by byte, or code point by code point, unless they have been built to normalise it first. To them, a normal space (U+0020) and a non-breaking space (U+00A0) are simply two different characters, just as "a" and "b" are different. So a value that reads "Total: 100" but contains a hidden non-breaking space will not match the same value typed with an ordinary space. A zero-width space (U+200B) tucked between two letters of a keyword makes that keyword unrecognisable to a parser even though it looks untouched. This is why "it looks fine but doesn't work" is the classic symptom.
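The mismatch is easy to reproduce. The short Python sketch below uses made-up sample values (typed, pasted and keyword are names chosen for this example) to show two strings that look the same on screen but differ as code points and as bytes.

```python
typed = "Total: 100"          # ordinary space, U+0020
pasted = "Total:\u00a0100"    # no-break space, U+00A0, looks the same on screen

print(typed == pasted)                   # False
print(len(typed), len(pasted))           # 10 10
print(typed.encode("utf-8"))             # b'Total: 100'
print(pasted.encode("utf-8"))            # b'Total:\xc2\xa0100'

keyword = "ret\u200burn"      # zero-width space hidden inside "return"
print(keyword == "return", len(keyword))  # False 7
```

Both strings have the same length in characters, so counting characters alone will not expose a swapped space; it takes a comparison of the code points or bytes themselves.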

Step by step: cleaning your text

1. Capture the exact text. Copy the problem text directly from wherever it is failing — the form field, the code editor, the spreadsheet cell — rather than retyping it, so you preserve the hidden characters you are hunting.

2. Scan it. Paste it into the detector. The summary line tells you how many characters were flagged, and the highlighted view replaces every hidden character with a small badge showing its hex code, so you can see precisely where the intruders are.

3. Read the breakdown. The table groups the findings by code point and names each one — for example "Zero-Width Space (U+200B)" or "No-Break Space (U+00A0)" — and tells you which category it belongs to. This is useful when you only want to remove some kinds of character and keep others.

4. Choose your clean-up. By default the tool removes invisible, control, bidirectional and tag characters, and converts unusual spaces to ordinary ones. If you are preparing text for code, JSON or a CSV file, also enable "straighten smart quotes, dashes and ellipses" so that curly quotes become straight ones. If you are cleaning prose for publication, you may prefer to leave the typography intact.

5. Copy the result. Use the copy button and paste the cleaned text back where it belongs. Re-run the scan on the output if you want to confirm it now comes back clean.
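For readers who want to script a similar clean-up, the steps above can be approximated in a few lines of Python. This is a sketch under stated assumptions, not the detector's own code: the function name clean_text, the straighten flag and the replacement table are hypothetical, and the rules (drop format and control characters, turn unusual spaces into ordinary ones, optionally straighten typography) only loosely mirror the defaults described in step 4.

```python
import unicodedata

# Hypothetical replacement table for the optional "straighten" step.
STRAIGHTEN = str.maketrans({
    "\u2018": "'", "\u2019": "'",    # curly single quotes
    "\u201c": '"', "\u201d": '"',    # curly double quotes
    "\u2013": "-", "\u2014": "-",    # en and em dashes
    "\u2026": "...",                 # ellipsis
})

def clean_text(text, straighten=False):
    """Remove hidden characters; optionally straighten quotes, dashes and ellipses."""
    out = []
    for ch in text:
        cat = unicodedata.category(ch)
        if cat == "Cf":                          # invisible, bidirectional and tag characters
            continue
        if cat == "Cc" and ch not in "\t\n\r":   # control codes, keeping tab and line breaks
            continue
        if cat == "Zs":                          # any kind of space becomes U+0020
            ch = " "
        out.append(ch)
    cleaned = "".join(out)
    return cleaned.translate(STRAIGHTEN) if straighten else cleaned

dirty = "\ufeffname,\u201cAda\u201d\u00a0Lovelace\u200b"
print(clean_text(dirty))                   # curly quotes kept
print(clean_text(dirty, straighten=True))  # name,"Ada" Lovelace
```

One design caveat: dropping every format character also removes the zero-width joiner (U+200D) and non-joiner (U+200C), which some scripts and emoji sequences rely on, so a blanket rule like this suits code and data better than multilingual prose.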

How to stop it happening again

A few habits prevent most recurrences. When moving text between programs, paste as plain text (Ctrl/Cmd+Shift+V in many apps) so formatting characters are stripped on the way in. Turn off automatic "smart quotes" and automatic hyphenation in your word processor if the text is destined for code or data. When saving files that other tools will read, choose "UTF-8" rather than "UTF-8 with BOM" unless the consumer specifically wants the byte-order mark. And when a value mysteriously fails to match, make checking for hidden characters one of the first things you try rather than the last.
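The byte-order mark point can be checked directly. The Python sketch below writes a small file with a BOM to a temporary location (the file name data.csv and its contents are made up for the example), then reads it back two ways. It relies on Python's standard "utf-8-sig" codec, which writes a BOM when encoding and skips one, if present, when decoding.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "data.csv")

# Simulate a program that saves as "UTF-8 with BOM".
with open(path, "w", encoding="utf-8-sig") as f:
    f.write("id,name\n1,Ada\n")

with open(path, "rb") as f:
    raw = f.read()
print(raw[:3])                 # b'\xef\xbb\xbf', the BOM as bytes

# Read as plain UTF-8: the BOM survives as U+FEFF glued to the first header.
with open(path, encoding="utf-8") as f:
    with_bom = f.read()
print(repr(with_bom[:3]))      # '\ufeffid'

# Read as "utf-8-sig": the BOM is dropped if present.
with open(path, encoding="utf-8-sig") as f:
    without_bom = f.read()
print(repr(without_bom[:2]))   # 'id'
```

Reading with "utf-8-sig" is a reasonable defensive default when the origin of a file is unknown, since it behaves like plain UTF-8 when no BOM is present.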

A note on safety

Because hidden characters are a known vector for confusing both people and software — bidirectional overrides can make source code display differently from how it compiles, and tag characters can carry a concealed message — it is worth scanning any text you receive from an untrusted source before you run or trust it. The detector makes that a two-second check. For more on the individual character types and the debate around AI text watermarks, see our field guide. When in doubt, scan first and clean second; it costs nothing and saves a great deal of confusion.
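To make the two risks above concrete, here is a small Python sketch that reports bidirectional control characters and decodes any text carried in tag characters. The function name audit and the exact set of code points checked are assumptions for this example. The decoding step relies on the fact that the tag characters U+E0020 to U+E007E mirror printable ASCII at an offset of 0xE0000.

```python
# Bidirectional controls: embeddings/overrides, isolates, and the standalone marks.
BIDI_CONTROLS = (
    set(range(0x202A, 0x202F))      # U+202A..U+202E
    | set(range(0x2066, 0x206A))    # U+2066..U+2069
    | {0x200E, 0x200F, 0x061C}      # LRM, RLM, Arabic letter mark
)

def audit(text):
    """Return (positions of bidi controls, message hidden in tag characters)."""
    bidi = [(i, f"U+{ord(c):04X}") for i, c in enumerate(text)
            if ord(c) in BIDI_CONTROLS]
    # Shift each tag character back down into the ASCII range to read it.
    hidden = "".join(chr(ord(c) - 0xE0000) for c in text
                     if 0xE0020 <= ord(c) <= 0xE007E)
    return bidi, hidden

# Build a sample: visible "ok", an invisible "hi" in tag characters,
# then a space and a right-to-left override.
smuggled = "".join(chr(0xE0000 + ord(c)) for c in "hi")
suspect = "ok" + smuggled + " \u202e"

print(audit(suspect))   # ([(5, 'U+202E')], 'hi')
```

An empty list and an empty string from a check like this do not prove a text is safe; they only rule out these two specific tricks.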