OpenAI launched GPT-OSS-120b and GPT-OSS-20b, claiming strong jailbreak resistance after years of development. The models reportedly underwent rigorous safety testing, including adversarial training.
Within hours of the launch, the well-known jailbreaker Pliny the Liberator bypassed the models' guardrails and got them to generate harmful instructions, exposing significant vulnerabilities and casting doubt on OpenAI's safety claims.