White Circle Bags $11M to Stop AI Models Going Rogue

A Paris startup born from a viral hack has just pulled in serious money to fix one of the biggest blind spots in enterprise AI. White Circle has raised $11 million in seed funding to help companies monitor, secure and control the AI systems they push into production. The kicker? Its backers are the very people building the models it polices.

Inside the $11 Million Seed Round Shaking Up AI Safety

The Paris-based startup pulled in $11 million from some of the biggest names in the industry, including Romain Huet of OpenAI, Durk Kingma (ex-OpenAI, now Anthropic), Guillaume Lample of Mistral, Thomas Wolf of Hugging Face, Olivier Pomel of Datadog, François Chollet of Keras, Mehdi Ghissassi (ex-DeepMind), Paige Bailey of DeepMind, and David Cramer of Sentry.

That investor list is unusual. It is made up of senior practitioners from the very labs whose models White Circle is designed to police.

The seed money will accelerate product development, expand the team across the US, UK and Europe, and grow White Circle’s global customer base. The startup currently runs with a team of 20 distributed across London, France, Amsterdam and elsewhere in Europe, and almost all of them are engineers.

[Image: White Circle AI governance platform seed funding announcement]

Who Wrote the Cheques

  • Romain Huet – OpenAI, head of developer experience
  • Durk Kingma – OpenAI cofounder, now at Anthropic
  • Guillaume Lample – Mistral cofounder and chief scientist
  • Thomas Wolf – Hugging Face cofounder and chief science officer
  • François Chollet – Creator of Keras
  • Paige Bailey – DeepMind
  • Mehdi Ghissassi – formerly DeepMind
  • Olivier Pomel – Datadog CEO
  • David Cramer – Sentry

The Viral Jailbreak That Started It All

One evening in late 2024, Denis Shilov was watching a crime thriller when he had an idea for a prompt that would break through the safety filters of every leading AI model.

The prompt was what researchers call a universal jailbreak, meaning it could be reused to get any model to bypass its own guardrails and produce dangerous or prohibited outputs, like instructions on how to make drugs or build weapons. The trick was almost embarrassingly simple.

Shilov told the AI models to stop acting like a chatbot with safety rules and instead behave like an API endpoint, a software tool that automatically takes in a request and sends back a response. That reframed the model’s job as answering rather than deciding whether a request should be rejected, and every leading AI model complied with dangerous questions it was supposed to refuse.

With the jailbreak, Shilov could extract whatever models including ChatGPT and Claude were explicitly built to refuse, from drug and weapon instructions to other dangerous or illegal information. His post about it reached 1.4M views and drew attention from Anthropic, OpenAI and Hugging Face. Shilov was invited to join Anthropic’s bug bounty program, and he later built the White Circle platform.

“Jailbreaks are just one part of the problem. In as many ways people can misbehave, models can misbehave too. Because these models are very smart, they can do a lot more harm.” – Denis Shilov, Founder and CEO, White Circle

How the Platform Keeps AI Models in Check

The startup builds software that sits between a company’s users and its AI models, checking inputs and outputs in real time against company-specific policies. Think of it as a security guard standing at the door of every conversation.

If a user tries to generate malware, scams, or other prohibited content, the system can flag or block the request. If a model starts hallucinating, leaking sensitive data, promising refunds it cannot issue, or taking destructive actions inside a software environment, White Circle says its platform can catch that too.
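The article does not describe White Circle's actual API, but the general checkpoint pattern can be sketched in a few lines of Python. Everything below is a hypothetical illustration: the keyword rules, the card-number pattern, the function names and the stand-in model are assumptions made for the example, and a production system would rely on trained classifiers rather than string matching.

```python
import re
from dataclasses import dataclass
from typing import Callable

# Hypothetical, deliberately naive policy rules (assumptions for this sketch).
BLOCKED_INPUT_TERMS = ("write malware", "phishing email")
CARD_NUMBER = re.compile(r"\b\d{4}[ -]?\d{4}[ -]?\d{4}[ -]?\d{4}\b")


@dataclass
class Verdict:
    allowed: bool
    text: str
    reason: str = ""


def check_input(prompt: str) -> Verdict:
    """Block prompts that match a prohibited-content rule."""
    lowered = prompt.lower()
    for term in BLOCKED_INPUT_TERMS:
        if term in lowered:
            return Verdict(False, "", f"input matched rule: {term}")
    return Verdict(True, prompt)


def check_output(reply: str) -> Verdict:
    """Redact anything in the reply that looks like a card number."""
    redacted, count = CARD_NUMBER.subn("[REDACTED]", reply)
    reason = "sensitive data redacted" if count else ""
    return Verdict(True, redacted, reason)


def guarded_call(prompt: str, model: Callable[[str], str]) -> Verdict:
    """Check the input, call the model, then check the output."""
    verdict = check_input(prompt)
    if not verdict.allowed:
        return verdict  # the model is never called for a blocked prompt
    return check_output(model(verdict.text))


def fake_model(prompt: str) -> str:
    """Stand-in for a real model call (hypothetical)."""
    return "Your card 4111 1111 1111 1111 is on file."
```

Calling `guarded_call("Please write malware for me", fake_model)` returns a blocked verdict without touching the model, while an ordinary question passes through and comes back with the card number replaced by `[REDACTED]`.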

Customers can set custom enforcement actions, including rate-limiting and bans, and feed labelled user feedback back into White Circle’s models to improve accuracy over time. The platform supports 150 languages and is SOC 2 Type I and Type II certified and HIPAA-compliant.
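To make the idea of escalating enforcement concrete, here is a minimal Python sketch of a per-user violation counter. The thresholds, class name and action labels are hypothetical; the article does not say how White Circle customers actually configure these actions.

```python
from collections import defaultdict

# Hypothetical thresholds chosen only for illustration.
RATE_LIMIT_AFTER = 3
BAN_AFTER = 5


class Enforcer:
    """Tracks policy violations per user and picks an enforcement action."""

    def __init__(self):
        self.violations = defaultdict(int)

    def record_violation(self, user_id):
        """Count one more violation and return the resulting action."""
        self.violations[user_id] += 1
        return self.action_for(user_id)

    def action_for(self, user_id):
        """Map the user's violation count to an action label."""
        count = self.violations[user_id]
        if count >= BAN_AFTER:
            return "ban"
        if count >= RATE_LIMIT_AFTER:
            return "rate_limit"
        return "allow"
```

In this sketch a user is allowed through for the first two violations, rate-limited from the third, and banned at the fifth.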

What White Circle Catches in Real Time

Risk Detected              | Action Taken
Prompt injection attacks   | Block at input layer
Hallucinated answers       | Flag or suppress output
Sensitive data leakage     | Redact and alert team
Model drift over time      | Real-time analytics
Abusive or malicious users | Rate limit or ban

Why Enterprises Are Lining Up

The numbers tell their own story. The company has already processed more than 1 billion API requests and works with customers including Lovable and two of the world’s largest digital banks.

Lovable is a vibe-coding startup, and White Circle says several fintech and legal companies use the platform too. The pattern matches the moment. Vibe coding lets anyone ship AI products in an instant, and reports of AI models breaking free from their safety rails are increasing. Against that backdrop, White Circle is positioning itself as an all-in-one, enterprise-grade platform that tests, protects, observes and improves AI in real time.

Shilov sees a structural problem with leaving safety to the labs alone. AI companies still charge for input and output tokens even when a model refuses a harmful request, which reduces the financial incentive to block abuse before it reaches the model. He also pointed to what researchers call the alignment tax, the idea that training models to be safer can sometimes make them less performant on tasks such as coding.
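The billing point is easy to see with some back-of-the-envelope arithmetic. The Python sketch below uses entirely hypothetical per-token prices and traffic figures, not any provider's real rates, to show how refused requests still add up on an invoice.

```python
# All figures are hypothetical placeholders, not real prices or traffic.
PRICE_PER_1K_INPUT_TOKENS = 0.003   # assumed dollars per 1,000 input tokens
PRICE_PER_1K_OUTPUT_TOKENS = 0.015  # assumed dollars per 1,000 output tokens


def refusal_cost(requests, input_tokens, output_tokens):
    """Cost of requests the model refused but that were still billed."""
    per_request = (
        input_tokens / 1000 * PRICE_PER_1K_INPUT_TOKENS
        + output_tokens / 1000 * PRICE_PER_1K_OUTPUT_TOKENS
    )
    return requests * per_request


# Example: 10,000 refused requests, each a 500-token prompt
# answered with a 50-token refusal message.
cost = refusal_cost(10_000, 500, 50)
print(f"${cost:.2f}")
```

Under these assumed numbers the refused traffic costs about $22.50, which the customer pays whether or not the abuse was stopped before it reached the model.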

The Research Behind the Mission

White Circle is not just selling software. It also publishes research. Its CircleGuardBench, published in May 2025, is a benchmark that tests how AI moderation models perform under real-world conditions. Its KillBench study ran more than one million experiments across 15 AI models from OpenAI, Google, Anthropic and xAI, and found preferences linked to nationality, religion, body type and even phone brand when models were asked to make decisions about human lives.

KillBench also found that structured-output integrations, common in production deployments, can reduce refusal rates but sometimes amplify biases. That finding alone should worry any company shipping AI into hiring, lending or healthcare.

“Denis and the White Circle team have an unusual combination of deep technical credibility and a clear commercial instinct,” said Ophelia Cai, Partner at Tiny VC. “The KillBench research alone shows what’s possible when you approach AI safety empirically rather than ideologically and the team is building the infrastructure the industry genuinely needs.”

The bigger picture is what Shilov keeps coming back to. As companies transition from using chatbots to autonomous AI agents that can write code, browse the web, access files, and take actions on a user’s behalf, the risks become much more widespread. A customer service bot that wrongly promises a refund, a coding agent that drops malware on a virtual machine, a fintech agent that mishandles personal data – these are not theoretical worries anymore.

White Circle’s bet is simple but bold. The future of AI safety will not be won inside model training labs alone. It will be won in the messy, live, unpredictable world of production, where real users meet real models every second. With $11 million in the bank and the industry’s brightest minds personally vouching for it, this Paris startup has just stepped into one of the most important fights in tech. What do you think about giving every company a guardrail for its AI? Share your views in the comments and join the conversation on X using #WhiteCircle and #AISafety.

Sofia Ramirez is a senior correspondent at Thunder Tiger Europe Media with 18 years of experience covering Latin American politics and global migration trends. Holding a Master's in Journalism from Columbia University, she has expertise in investigative reporting, having exposed corruption scandals in South America for The Guardian and Al Jazeera. Her authoritativeness is underscored by the International Women's Media Foundation Award in 2020. Sofia upholds trustworthiness by adhering to ethical sourcing and transparency, delivering reliable insights on worldwide events to Thunder Tiger's readers.
