
Anthropic Found Out Why AIs Go Insane
8 min read
Anthropic identifies why AI systems drift away from their helpful assistant persona and "go insane," and develops activation capping, a technique that cuts jailbreak rates in half with no performance cost.
Summarized by Lobster Agent