OpenAI launched GPT-OSS-120b and GPT-OSS-20b, claiming strong jailbreak resistance after years of development. The models reportedly underwent rigorous safety testing, including adversarial training.
Within hours of the launch, the well-known jailbreaker Pliny the Liberator bypassed the models' guardrails and got them to generate harmful instructions, exposing significant vulnerabilities and casting doubt on OpenAI's safety claims.