We're the Only Ones Who Know What We Don't Know — and We Have the Math to Prove It

Sat Mar 14 2026 17:00:00 GMT-0700 (Pacific Daylight Time)

⚠️ Disclaimer: The BCS Industry Blog is an independent, volunteer-run publication for educational and entertainment purposes only. The views and opinions expressed in blog posts are solely those of the individual authors and do not represent the official policy, position, or advice of Bad Character Scanner (BCS) or its affiliates. The blog is operated independently and is not subject to the jurisdiction or editorial control of BCS. All content is provided "as is" and should not be construed as professional or legal advice.

The following is an editorial opinion piece.

We're proud of what our scanner found. In a recent test against a corpus of known Unicode threats, Bad Character Scanner detected approximately 2,200 threats. That's a real number. That's real work.

But here's the part most tools won't tell you:

We believe there are over 2,600 threats in that same dataset.

And we know this — not through guesswork, not through marketing copy — but through mathematics.

The Honest Number Nobody Talks About

Almost every cybersecurity tool on the market will tell you what it found. Very few will tell you what it missed. Fewer still can prove they missed something without having found it first.

We can.

Our FMM™ (Fractal Morphological Machine) engine includes a statistical layer built on what we call GT Problem (GTP) estimation, named for the German tank problem: a family of statistical estimators developed to solve a WWII intelligence problem. The original problem: given that you've captured a series of enemy tanks with sequential serial numbers, how many tanks does the enemy really have?

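The math is elegant and sound. Assume serial numbers run from 1 to N and every tank is equally likely to be captured. If m is the highest serial number observed across k captures, the minimum-variance unbiased estimate of the total population N is: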

N̂ = m + (m/k) − 1
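As a quick illustration of the formula above, here is a minimal Python sketch of the classic estimator. The function name `estimate_population` is ours, chosen for illustration; it is not taken from the scanner, and it assumes serial numbers start at 1 and are sampled without replacement.

```python
def estimate_population(serials):
    """Estimate the population size N from observed serial numbers.

    Implements N_hat = m + (m / k) - 1, where m is the largest
    observed serial number and k is the number of observations.
    """
    k = len(serials)
    if k == 0:
        raise ValueError("need at least one observed serial number")
    m = max(serials)
    return m + m / k - 1


# Four captured serial numbers; the highest is 60.
print(estimate_population([19, 40, 42, 60]))  # 74.0
```

With four captures and a maximum of 60, the estimate is 60 + 60/4 − 1 = 74. The estimate moves closer to the observed maximum as more serial numbers are captured.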

We apply an equivalent logic to Unicode threat detection. Every threat we find is a "captured serial number." The position of each threat in the text is its serial number. And the GTP estimator gives us a statistical estimate, not a guess, of how many threats we likely haven't captured yet.

The result is what we call Manifold Coverage — currently reported at approximately 57.8% in our free scanner.

That means we believe we're finding roughly 6 in every 10 threats. Not 10 in 10.

Why We Tell You This

"We're moving from an era of accidental contamination to intentional weaponization."

— J. Shoy, Bad Character Scanner

That quote is ours. We mean it. And if we mean it, then we have an obligation to be honest about the limits of what we see.

The uncomfortable truth of cybersecurity in the age of AI is this: there is no scanner that finds everything. Every detection engine has a coverage boundary. The real question is whether you know where that boundary is — or whether you're operating under the assumption that silence means safety.

Silence doesn't mean safety. It often means the opposite.

Most tools give you a clean report and a confident score. We give you a clean report, a confident score, and an honest estimate of the gap between what we found and what we think exists. That gap is not a failure. It is a feature. It is the most important number on the page.

The Math Behind the Humility

Our GTP engine runs during every scan. After all detection layers complete, it:

Collects the positions of every detected threat as "serial numbers"
Applies the UMVU (Uniformly Minimum Variance Unbiased) estimator across the full text manifold
Projects a total threat population and compares it against the confirmed count
Reports the coverage ratio and an estimated number of hidden threats still in the text

When coverage is high (above ~90%), the engine reports confidence that the scan was comprehensive. When coverage is low, it warns you — plainly, in the results — that material risk may remain undetected.
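The last two steps and the ~90% threshold can be sketched roughly as follows. This is a simplified illustration, not the engine's code: the article does not describe how threat positions are turned into a projected total, so the sketch takes the projection as an input. The names `CoverageReport` and `coverage_report` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CoverageReport:
    confirmed: int        # threats actually detected
    projected: float      # estimated total threats in the text
    coverage: float       # confirmed / projected, between 0 and 1
    hidden: float         # estimated threats not yet detected
    comprehensive: bool   # True when coverage exceeds the threshold


def coverage_report(confirmed, projected, threshold=0.90):
    """Compare a confirmed count against a projected total.

    `projected` is assumed to come from an upstream estimator; how that
    projection is produced is outside the scope of this sketch.
    """
    if confirmed < 0 or projected <= 0:
        raise ValueError("confirmed must be >= 0 and projected must be > 0")
    # A projection can never be lower than what was already found.
    projected = max(projected, confirmed)
    coverage = confirmed / projected
    hidden = projected - confirmed
    return CoverageReport(confirmed, projected, coverage, hidden,
                          coverage > threshold)


report = coverage_report(confirmed=2200, projected=2600)
print(f"coverage {report.coverage:.1%}, hidden {report.hidden:.0f}")
# coverage 84.6%, hidden 400
```

The example inputs are the rounded corpus figures quoted at the top of this piece. The article does not give the inputs behind the 57.8% Manifold Coverage figure, so that number is not reproduced here.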

This is what we call Fractal Topology Analysis: looking at the shape of what you've found to infer the shape of what you haven't.

What This Means for Cybersecurity

Very few tools in the security space will ever publish a number like "we got 57.8% coverage." The incentive structure runs the other way: show confidence, show completeness, show a clean dashboard.

We think that's dangerous.

As threats grow more sophisticated — as invisible characters transition from accidental LLM artifacts to intentional weaponization, as homoglyph attacks compound across codebases, as AI assistants become attack surfaces — the gap between "what we found" and "what's actually there" becomes the most important measurement in security.

We feature it publicly. We feature it in the free tool. We think you deserve to see it.

Not because we want to undersell what we've built — 2,200 real detections in a single scan is genuinely impressive — but because the number next to it, the projected number, is what tells you whether you can sleep at night.

Right now, for our free tool, the answer is: you should keep scanning, and so should we.

Where We're Going

We are actively working to close the gap. Our scanner architecture is modular by design, and the GTP projection serves as our north star. Each new detection layer we add — combining diacritical mark clusters, Private Use Area characters, enclosed alphanumerics — narrows the distance between what we find and what we project.
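To make the three character classes named above concrete, here is a small Python sketch that flags them using only the standard library. It is an illustration under our own assumptions, not the scanner's detection logic: the function name `find_flags`, the label strings, and the cluster threshold of three consecutive marks are all hypothetical choices.

```python
import unicodedata

# The Enclosed Alphanumerics block spans U+2460 through U+24FF.
ENCLOSED_ALPHANUMERICS = range(0x2460, 0x2500)


def find_flags(text, min_cluster=3):
    """Return (position, label) pairs for suspicious characters.

    - "combining_cluster": a run of at least `min_cluster` consecutive
      nonspacing marks (general category Mn), reported at the run start
    - "private_use": any character with general category Co
    - "enclosed_alphanumeric": any character in U+2460..U+24FF
    """
    flags = []
    run_start = None
    for i, ch in enumerate(text):
        category = unicodedata.category(ch)
        if category == "Mn":
            if run_start is None:
                run_start = i
            continue
        # A non-mark character closes any open run of marks.
        if run_start is not None and i - run_start >= min_cluster:
            flags.append((run_start, "combining_cluster"))
        run_start = None
        if category == "Co":
            flags.append((i, "private_use"))
        elif ord(ch) in ENCLOSED_ALPHANUMERICS:
            flags.append((i, "enclosed_alphanumeric"))
    # Handle a run of marks that reaches the end of the text.
    if run_start is not None and len(text) - run_start >= min_cluster:
        flags.append((run_start, "combining_cluster"))
    return flags


sample = "a\u0301\u0302\u0303b\ue000\u2460"
print(find_flags(sample))
# [(1, 'combining_cluster'), (5, 'private_use'), (6, 'enclosed_alphanumeric')]
```

A single accent on a letter is normal text, which is why the sketch only flags runs of marks rather than every combining character. The right threshold depends on the languages a codebase is expected to contain.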

When those two numbers converge, we'll know we're close to comprehensive.

Until then, we'll keep telling you the honest number. Both of them.

Published: March 15, 2026
Author: J. Shoy, Independent Contributor