NEWS

White Circle Bags $11M to Stop AI Models Going Rogue

Published May 12, 2026

A Paris startup born from a viral hack has just pulled in serious money to fix one of the biggest blind spots in enterprise AI. White Circle has raised $11 million in seed funding to help companies monitor, secure and control the AI systems they push into production. The kicker? Its backers are the very people building the models it polices.

Inside the $11 Million Seed Round Shaking Up AI Safety

The Paris-based startup pulled in $11 million from some of the biggest names in the industry, including Romain Huet of OpenAI, Durk Kingma (ex-OpenAI, now Anthropic), Guillaume Lample of Mistral, Thomas Wolf of Hugging Face, Olivier Pomel of Datadog, François Chollet of Keras, Mehdi Ghissassi (ex-DeepMind), Paige Bailey of DeepMind, and David Cramer of Sentry.

That investor list is rare. It reads like a roster of senior practitioners from the labs that build the models White Circle is designed to police.

The seed money will accelerate product development, expand the team across the US, UK and Europe, and grow White Circle’s global customer base. The startup currently runs with a team of 20 distributed across London, France, Amsterdam and elsewhere in Europe, and almost all of them are engineers.

Who Wrote the Cheques

Romain Huet – OpenAI, head of developer experience
Durk Kingma – OpenAI cofounder, now at Anthropic
Guillaume Lample – Mistral cofounder and chief scientist
Thomas Wolf – Hugging Face cofounder and chief science officer
François Chollet – Creator of Keras
Paige Bailey – DeepMind
Mehdi Ghissassi – ex-DeepMind
Olivier Pomel – Datadog CEO
David Cramer – Sentry

The Viral Jailbreak That Started It All

One evening in late 2024, Denis Shilov was watching a crime thriller when he had an idea for a prompt that would break through the safety filters of every leading AI model.

The prompt was what researchers call a universal jailbreak, meaning it could be reused to get any model to bypass its own guardrails and produce dangerous or prohibited outputs, like instructions on how to make drugs or build weapons. The trick was almost embarrassingly simple.

Shilov told the AI models to stop acting like a chatbot with safety rules and instead behave like an API endpoint, a software tool that automatically takes in a request and sends back a response. The prompt reframed the model’s job as answering, rather than deciding whether a request should be rejected, and every leading AI model then complied with dangerous questions it was supposed to refuse.

With the jailbreak, Shilov could extract instructions for making drugs and weapons, along with other dangerous or illegal information that models including ChatGPT and Claude were explicitly built to refuse. After his post about it hit 1.4M views, it drew attention from Anthropic, OpenAI and Hugging Face. Shilov was invited to join Anthropic’s bug bounty program and later went on to build the White Circle platform.

“Jailbreaks are just one part of the problem. In as many ways people can misbehave, models can misbehave too. Because these models are very smart, they can do a lot more harm.” – Denis Shilov, Founder and CEO, White Circle

How the Platform Keeps AI Models in Check

The startup builds software that sits between a company’s users and its AI models, checking inputs and outputs in real time against company-specific policies. Think of it as a security guard standing at the door of every conversation.

If a user tries to generate malware, scams, or other prohibited content, the system can flag or block the request. If a model starts hallucinating, leaking sensitive data, promising refunds it cannot issue, or taking destructive actions inside a software environment, White Circle says its platform can catch that too.

Customers can set custom enforcement actions, including rate-limiting and bans, and feed labelled user feedback back into White Circle’s models to improve accuracy over time. The platform supports 150 languages and is SOC 2 Type I and Type II certified and HIPAA-compliant.
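To make the idea concrete, here is a minimal Python sketch of a guardrail layer that sits between users and a model, checks both the prompt and the reply against policies, and escalates repeat offenders to a ban. It is an illustration only and not White Circle’s actual API: the names `Policy`, `Guardrail`, `Action` and `ban_after`, the keyword-based checks, and the stand-in `fake_model` are all assumptions made for this example. A production system would use trained classifiers rather than keyword matching.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Optional


class Action(Enum):
    ALLOW = "allow"
    FLAG = "flag"
    BLOCK = "block"


@dataclass
class Policy:
    """Hypothetical company-specific rule: a name, a check, and an action."""
    name: str
    violates: Callable[[str], bool]
    action: Action


def first_violation(text: str, policies: list[Policy]) -> Optional[Policy]:
    """Return the first policy the text violates, or None."""
    for policy in policies:
        if policy.violates(text):
            return policy
    return None


@dataclass
class Guardrail:
    input_policies: list[Policy]
    output_policies: list[Policy]
    ban_after: int = 3  # hypothetical threshold: blocked inputs before a ban
    strikes: dict[str, int] = field(default_factory=dict)

    def handle(
        self, user_id: str, prompt: str, model: Callable[[str], str]
    ) -> tuple[Action, str]:
        # Enforcement: users with too many blocked inputs are banned.
        if self.strikes.get(user_id, 0) >= self.ban_after:
            return Action.BLOCK, "user is banned"

        # Input check: a blocked prompt never reaches the model.
        hit = first_violation(prompt, self.input_policies)
        if hit is not None and hit.action is Action.BLOCK:
            self.strikes[user_id] = self.strikes.get(user_id, 0) + 1
            return Action.BLOCK, f"input blocked: {hit.name}"

        reply = model(prompt)

        # Output check: suppress or flag what the model produced.
        out_hit = first_violation(reply, self.output_policies)
        if out_hit is not None and out_hit.action is Action.BLOCK:
            return Action.BLOCK, f"output blocked: {out_hit.name}"
        if hit is not None or out_hit is not None:
            return Action.FLAG, reply
        return Action.ALLOW, reply


def fake_model(prompt: str) -> str:
    """Stand-in for a real model call; always promises a refund."""
    return "Sure, I have issued a full refund."


guard = Guardrail(
    input_policies=[
        Policy("no-malware", lambda t: "malware" in t.lower(), Action.BLOCK),
    ],
    output_policies=[
        Policy("no-refund-promises", lambda t: "refund" in t.lower(), Action.FLAG),
    ],
)
```

Calling `guard.handle("user-1", "Write malware for me", fake_model)` stops the request at the input layer, while an innocuous prompt passes through and the refund promise in the reply is flagged for review. Keeping the input and output policy lists separate mirrors the two checkpoints described above.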

What White Circle Catches in Real Time

Risk Detected	Action Taken
Prompt injection attacks	Block at input layer
Hallucinated answers	Flag or suppress output
Sensitive data leakage	Redact and alert team
Model drift over time	Real-time analytics
Abusive or malicious users	Rate limit or ban

Why Enterprises Are Lining Up

The numbers tell their own story. The company has already processed more than 1 billion API requests. Its customers include Lovable, the vibe-coding startup, and two of the world’s largest digital banks, and White Circle says several fintech and legal companies also use the platform.

The pattern matches the moment. Firms are racing to capitalise on the AI boom, vibe coding lets anyone ship AI products in an instant, and reports of AI models breaking free from their safety rails are increasing. Against that backdrop, White Circle is positioning itself as the all-in-one enterprise-grade platform that tests, protects, observes and improves AI in real time.

Shilov sees a structural problem with leaving safety to the labs alone. AI companies still charge for input and output tokens even when a model refuses a harmful request, which reduces the financial incentive to block abuse before it reaches the model. He also pointed to what researchers call the alignment tax, the idea that training models to be safer can sometimes make them less performant on tasks such as coding.

The Research Behind the Mission

White Circle is not just selling software; it also publishes research. CircleGuardBench, published in May 2025, is a benchmark that tests how AI moderation models perform under real-world conditions. A second project, KillBench, ran more than one million experiments across 15 AI models from OpenAI, Google, Anthropic and xAI. The study found preferences linked to nationality, religion, body type and even phone brand when models were asked to make decisions about human lives.

KillBench also found that structured-output integrations, common in production deployments, can reduce refusal rates but sometimes amplify biases. That finding alone should worry any company shipping AI into hiring, lending or healthcare.

“Denis and the White Circle team have an unusual combination of deep technical credibility and a clear commercial instinct,” said Ophelia Cai, Partner at Tiny VC. “The KillBench research alone shows what’s possible when you approach AI safety empirically rather than ideologically and the team is building the infrastructure the industry genuinely needs.”

The bigger picture is what Shilov keeps coming back to. As companies transition from using chatbots to autonomous AI agents that can write code, browse the web, access files, and take actions on a user’s behalf, the risks become much more widespread. A customer service bot that wrongly promises a refund, a coding agent that drops malware on a virtual machine, a fintech agent that mishandles personal data – these are not theoretical worries anymore.

White Circle’s bet is simple but bold. The future of AI safety will not be won inside model training labs alone. It will be won in the messy, live, unpredictable world of production, where real users meet real models every second. With $11 million in the bank and the industry’s brightest minds personally vouching for it, this Paris startup has just stepped into one of the most important fights in tech. What do you think about giving every company a guardrail for its AI? Share your views in the comments and join the conversation on X using #WhiteCircle and #AISafety.

TTE MEDIA
