Artifacts for paper "Controllable Safety Alignment: Inference-Time Adaptation to Diverse Safety Requirements" (https://arxiv.org/abs/2410.08968)
Jack Zhang
jackzhang
AI & ML interests
None yet
Recent Activity
authored a paper 2 days ago
Jailbreak Distillation: Renewable Safety Benchmarking
authored a paper 2 days ago
The Alignment Waltz: Jointly Training Agents to Collaborate for Safety
authored a paper 2 days ago
Beyond Reasoning Gains: Mitigating General Capabilities Forgetting in Large Reasoning Models