SDPO under Continual Learning
Meng Wang
Moenupa
AI & ML interests
MLLM Post-Training & Alignment
Recent Activity
Moenupa/Qwen3-4B-Thinking-2507-SDPO1restart-Math (model updated about 10 hours ago)
Moenupa/Qwen3-4B-Thinking-2507-SDPO0restart-Math (model updated about 11 hours ago)
Moenupa/Qwen3-4B-Thinking-2507-SDPO0-MathChemToolCode (model updated about 12 hours ago)