
CoT Forgery flipped prompt defense success to ~60%, but not by “jailbreaking”
Researchers show LLM role tags fail in the model’s internal representation, enabling cocaine-recipe compliance at scale.
By Lama Al-Rashid · 4 min
