⚠️ JavaScript Disabled

For the best experience, please enable JavaScript. However, you can still read all content on this page. Some interactive features may not be available.

.noscript-blog { max-width: 800px; margin: 0 auto; padding: 2rem; font-family: system-ui, sans-serif; } .noscript-blog h1 { font-size: 2rem; margin-bottom: 1rem; } .noscript-blog .meta { color: #666; margin-bottom: 2rem; } .noscript-blog .content { line-height: 1.6; }

How Invisible Unicode Attacks Break NLP Models

Wed Dec 03 2025 16:00:00 GMT-0800 (Pacific Standard Time)

Disclaimer:

This post analyzes security research presented at the IEEE Symposium on Security and Privacy. The techniques discussed are for educational and defensive purposes only.

(And Why Your Codebase Is Vulnerable)

We've already celebrated the landmark paper "Bad Characters: Imperceptible NLP Attacks" as the moment the industry woke up to invisible threats. But celebrating the paper isn't enough we need to understand the mechanics, because the threat has only evolved since Nicholas Boucher took the stage at IEEE S&P three years ago.

In that groundbreaking talk, Boucher and his team (from Cambridge, Toronto, and Edinburgh) demonstrated a fundamental vulnerability: Text-based machine learning is broken.

Unlike images, where adversarial attacks require complex pixel manipulation, text models can be fooled by characters you can't even see.

@[youtube-player](c5FDHqfMcO4, true)

In the talk the focus was on breaking translation and toxicity models. Today, the stakes are much higher.

The same "imperceptible perturbations" that Boucher demonstrated are now being weaponized against:

LLM Prompt Injection: As we covered in our analysis of PromptFoo's research, invisible characters can now backdoor AI-generated code.
Supply Chain Attacks: The "Trojan Source" vulnerability (CVE-2021-42574) showed us that these characters can hide malicious logic in source code that passes human review.
RAG Poisoning: Injecting invisible characters into knowledge bases to corrupt Retrieval-Augmented Generation systems.
AI-IDE rule files backdooring: Attackers exploit characters to communicate by posting simple messages on social media sites with the words 'rule file example'. This creates a backdoor to the LLM that powers the user's IDE.

The core issue remains the same: The gap between what humans see (pixels) and what machines see (bytes).

Boucher's team identified four distinct classes of attacks that are still relevant today:

Invisible Characters: Zero-width spaces that alter tokenization unseen.
Homoglyphs: Visually identical characters from different scripts.
Reorderings: Bidi controls that swap logical byte order.
Deletions: Backspace characters that logically remove visible content.

Attack Techniques Visualization

Why This Matters Now

It affects Google Translate, Facebook's AI models, IBM's toxicity detectors, and if you work in Machine Learning, likely your own internal NLP pipelines.

But more importantly, as we move to Agentic AI and Auto-Coding systems, these "glitches" become security holes. If an AI agent misinterprets a command because of an invisible character, it could delete the wrong database or deploy the wrong code.

The Genetic Algorithm: Evolution of an Attack

With this, the researchers developed a genetic algorithm to optimize the attacks. By iteratively injecting, swapping, and reordering invisible characters, they could find the exact combination that breaks a specific model with minimal perturbation.

Remarkably, they found that models often break with just 1 to 3 perturbations. It doesn't take a massive amount of garbage data to confuse these systems just a few well-placed "bad characters."

This part is an AD, I know I know, but we got to pay the bills:

The Solution: Bad Character Scanner

This research validates the core mission of Bad Character Scanner. We built our tool specifically to detect these kinds of invisible threats.

While the researchers propose "Input Sanitization" as a defense, implementing it correctly across an entire organization is a massive challenge. That's where we come in.

Machine-Level Scanning: Our scanner operates at the byte level, not the visual level. It sees the invisible characters, the homoglyphs, and the Bidi overrides that your IDE and your eyes miss.
Pre-Compilation Detection: By scanning your source code and data files before they enter your build pipeline or ML training set, we prevent model poisoning at the source.
Specific Detection: We have dedicated scanners for:
- Bidirectional Overrides: Detecting the reordering attacks.
- Homoglyphs: Identifying mixed-script confusion attacks.
- Invisible Characters: Flagging zero-width spaces and other non-printing anomalies.
- Malformed UTF-8: Catching invalid byte sequences that can crash parsers.

References

"Bad Characters: Imperceptible NLP Attacks" - Nicholas Boucher, Ilia Shumailov, Ross Anderson, Nicolas Papernot. IEEE Symposium on Security and Privacy 2022. imperceptible.ml
Trojan Source - trojansource.codes