At 10:30 AM on May 13, 2024, OpenAI launched GPT-4o. It was the company's most capable model ever, shipped with safety mechanisms the team had spent months hardening.
By 2:29 PM that same day, it was broken.
Four hours. That's how long it took a pseudonymous hacker going by "Pliny the Prompter" to extract nuclear weapon plans, meth recipes, and restricted medical advice from OpenAI's flagship model. He posted the method publicly on X, where anyone could copy-paste it. (Source: VentureBeat | Original jailbreak tweet by @elder_plinius)
He's done this to every major model since.
The pattern never changes. A lab drops a model and announces robust safety testing. Pliny jailbreaks it within hours, sometimes minutes, and posts the exploit to his 100,000+ X followers. The lab patches it. He moves on to the next model.
OpenAI. Anthropic. Google. Meta. xAI. Not one has held for 24 hours.
Eliezer Yudkowsky, arguably the most prominent voice in AI safety research, put it ...
