Uncovering Safety Risks of Large Language Models through Concept Activation Vector Paper • 2404.12038 • Published Apr 18, 2024
GuidedBench: Equipping Jailbreak Evaluation with Guidelines Paper • 2502.16903 • Published Feb 24, 2025