Jack Zhang

jackzhang

http://jackz.io/

AI & ML interests

None yet

Recent Activity

authored a paper 2 days ago

Jailbreak Distillation: Renewable Safety Benchmarking

authored a paper 2 days ago

The Alignment Waltz: Jointly Training Agents to Collaborate for Safety

authored a paper 2 days ago

Beyond Reasoning Gains: Mitigating General Capabilities Forgetting in Large Reasoning Models

Organizations

Collections 1

Papers 12

models 5

datasets 38

jackzhang/ManyIH-Bench

Viewer • Updated 5 days ago • 853 • 51

jackzhang/mbpp-sanitized-withsig

Viewer • Updated Mar 8 • 427 • 15

jackzhang/mbpp-processed

Viewer • Updated Jan 20 • 500 • 6

jackzhang/JBDistill-Bench

Viewer • Updated Aug 24, 2025 • 1k • 15

jackzhang/wjharm-or79k-stage2

Viewer • Updated Aug 8, 2025 • 79.5k • 10

jackzhang/wjharm-or79k-stage1

Viewer • Updated Aug 8, 2025 • 79.5k • 10

jackzhang/cosalign_train_simplified

Viewer • Updated Jul 30, 2025 • 125k • 19

jackzhang/cosalign_test_simplfied

Viewer • Updated Jul 30, 2025 • 3.2k • 24

jackzhang/wjharm-or79k

Viewer • Updated Jul 24, 2025 • 159k • 13

jackzhang/wjtrain_prompts-advonly-held500

Viewer • Updated Jul 9, 2025 • 161k • 5

Collections 1

Controllable Safety Alignment: Inference-Time Adaptation to Diverse Safety Requirements

jackzhang/Llama3.1-8B-Instruct-CoSAlign

jackzhang/CoSAlign-Test

jackzhang/CoSApien

models 5

jackzhang/Llama3.1-8B-Instruct-CoSAlign

jackzhang/newsspan.normalized.quip.100-1.bf

jackzhang/cnndm.normalized.quip.50-1.bf

jackzhang/newsspan.normalized.quip.50-1.bf

jackzhang/newsspan.normalized.quip.25-1.bf
